Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers

2013-06-28 Thread Maxime Ripard
Hi Siarhei,

> > > If we can be absolutely sure that nothing else may ever change
> > > the TIMER_CNT64_CTL_REG, then its default value can be probably
> > > cached instead of doing expensive read from the hardware register
> > > each time?
> > 
> > Since it's a free-running counter, its value will always change, so the
> > caching will bring no additions at all, right?
> 
> Sorry, 'caching' was not a very good description for something that is
> already a compile time constant. I mean just replace
> 
> u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG);
> 
> with 
> 
> u32 reg = TIMER_CNT64_CTL_CLR;
> 
> Because we know that the TIMER_CNT64_CTL_REG is already supposed
> to have the default TIMER_CNT64_CTL_CLR value (initialized in the
> 'sun4i_timer_init' function) between calls to 'sun4i_timer_sched_read'.
> Inside of 'sun4i_timer_sched_read' we set an extra TIMER_CNT64_CTL_RL
> bit in this register, but wait until it clears, effectively reverting
> TIMER_CNT64_CTL_REG register back to the default TIMER_CNT64_CTL_CLR
> value.
> 
> Removing this extra HW register read can save roughly a hundred of CPU
> cycles here and provide a ~10% overall improvement for gettimeofday
> (these estimates are based on the earlier benchmarks done with the
> Allwinner 3.4 kernel).
> 
> Or maybe I'm overlooking something?

Ah, I get what you mean now. We don't even need to bother about
TIMER_CNT64_CTL_CLR now, since it's suppose to be cleared once the
counter is reset.

However, I'd very much prefer to take the safer approach for now, and
try to optimise afterwards.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


signature.asc
Description: Digital signature


Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers

2013-06-27 Thread Siarhei Siamashka
On Thu, 27 Jun 2013 19:02:28 +0200
Maxime Ripard  wrote:

> Hi Siarhei,
> 
> On Thu, Jun 27, 2013 at 01:17:29PM +0300, Siarhei Siamashka wrote:
> > On Wed, 26 Jun 2013 23:16:55 +0200
> > Maxime Ripard  wrote:
> > 
> > > The A10 and the A13 has a 64 bits free running counter that we can use
> > > as a clocksource and a sched clock, that were both not used yet on these
> > > platforms.
> > > 
> > > Signed-off-by: Maxime Ripard 
> > > ---
> > >  drivers/clocksource/sun4i_timer.c | 27 +++
> > >  1 file changed, 27 insertions(+)
> > > 
> > > diff --git a/drivers/clocksource/sun4i_timer.c 
> > > b/drivers/clocksource/sun4i_timer.c
> > > index bdf34d9..1d2eaa0 100644
> > > --- a/drivers/clocksource/sun4i_timer.c
> > > +++ b/drivers/clocksource/sun4i_timer.c
> > > @@ -23,6 +23,8 @@
> > >  #include 
> > >  #include 
> > >  
> > > +#include 
> > > +
> > >  #define TIMER_IRQ_EN_REG 0x00
> > >  #define TIMER_IRQ_EN(val)BIT(val)
> > >  #define TIMER_IRQ_ST_REG 0x04
> > > @@ -34,6 +36,11 @@
> > >  #define TIMER_CNTVAL_REG(val)(0x10 * val + 0x18)
> > >  
> > >  #define TIMER_SCAL   16
> > > +#define TIMER_CNT64_CTL_REG  0xa0
> > > +#define TIMER_CNT64_CTL_CLR  BIT(0)
> > > +#define TIMER_CNT64_CTL_RL   BIT(1)
> > > +#define TIMER_CNT64_LOW_REG  0xa4
> > > +#define TIMER_CNT64_HIGH_REG 0xa8
> > >  
> > >  static void __iomem *timer_base;
> > >  
> > > @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = {
> > >   .dev_id = &sun4i_clockevent,
> > >  };
> > >  
> > > +static u32 sun4i_timer_sched_read(void)
> > > +{
> > > + u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG);
> > 
> > If we can be absolutely sure that nothing else may ever change
> > the TIMER_CNT64_CTL_REG, then its default value can be probably
> > cached instead of doing expensive read from the hardware register
> > each time?
> 
> Since it's a free-running counter, its value will always change, so the
> caching will bring no additions at all, right?

Sorry, 'caching' was not a very good description for something that is
already a compile time constant. I mean just replace

u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG);

with 

u32 reg = TIMER_CNT64_CTL_CLR;

Because we know that the TIMER_CNT64_CTL_REG is already supposed
to have the default TIMER_CNT64_CTL_CLR value (initialized in the
'sun4i_timer_init' function) between calls to 'sun4i_timer_sched_read'.
Inside of 'sun4i_timer_sched_read' we set an extra TIMER_CNT64_CTL_RL
bit in this register, but wait until it clears, effectively reverting
TIMER_CNT64_CTL_REG register back to the default TIMER_CNT64_CTL_CLR
value.

Removing this extra HW register read can save roughly a hundred of CPU
cycles here and provide a ~10% overall improvement for gettimeofday
(these estimates are based on the earlier benchmarks done with the
Allwinner 3.4 kernel).

Or maybe I'm overlooking something?

-- 
Best regards,
Siarhei Siamashka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers

2013-06-27 Thread Maxime Ripard
Hi Siarhei,

On Thu, Jun 27, 2013 at 01:17:29PM +0300, Siarhei Siamashka wrote:
> On Wed, 26 Jun 2013 23:16:55 +0200
> Maxime Ripard  wrote:
> 
> > The A10 and the A13 has a 64 bits free running counter that we can use
> > as a clocksource and a sched clock, that were both not used yet on these
> > platforms.
> > 
> > Signed-off-by: Maxime Ripard 
> > ---
> >  drivers/clocksource/sun4i_timer.c | 27 +++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/drivers/clocksource/sun4i_timer.c 
> > b/drivers/clocksource/sun4i_timer.c
> > index bdf34d9..1d2eaa0 100644
> > --- a/drivers/clocksource/sun4i_timer.c
> > +++ b/drivers/clocksource/sun4i_timer.c
> > @@ -23,6 +23,8 @@
> >  #include 
> >  #include 
> >  
> > +#include 
> > +
> >  #define TIMER_IRQ_EN_REG   0x00
> >  #define TIMER_IRQ_EN(val)  BIT(val)
> >  #define TIMER_IRQ_ST_REG   0x04
> > @@ -34,6 +36,11 @@
> >  #define TIMER_CNTVAL_REG(val)  (0x10 * val + 0x18)
> >  
> >  #define TIMER_SCAL 16
> > +#define TIMER_CNT64_CTL_REG0xa0
> > +#define TIMER_CNT64_CTL_CLRBIT(0)
> > +#define TIMER_CNT64_CTL_RL BIT(1)
> > +#define TIMER_CNT64_LOW_REG0xa4
> > +#define TIMER_CNT64_HIGH_REG   0xa8
> >  
> >  static void __iomem *timer_base;
> >  
> > @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = {
> > .dev_id = &sun4i_clockevent,
> >  };
> >  
> > +static u32 sun4i_timer_sched_read(void)
> > +{
> > +   u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG);
> 
> If we can be absolutely sure that nothing else may ever change
> the TIMER_CNT64_CTL_REG, then its default value can be probably
> cached instead of doing expensive read from the hardware register
> each time?

Since it's a free-running counter, its value will always change, so the
caching will bring no additions at all, right?

> The gettimeofday() abusers will feel a bit less pain. ARM does not
> currently enjoy the VDSO optimized gettimeofday, so the software
> which has been only tested on x86 may get a nasty surprise (an order
> of magnitude higher gettimeofday overhead).
> 
> > +   writel(reg | TIMER_CNT64_CTL_RL, timer_base + TIMER_CNT64_CTL_REG);
> > +   while (readl(timer_base + TIMER_CNT64_CTL_REG) & TIMER_CNT64_CTL_REG);
> 
> Some may think that this particular loop looks like a performance
> bottleneck, but it is very rarely run for more than one iteration.
> In fact, most of the time it just happens to be a single HW register
> read.

Thanks for your insight on this.

It does make me more eager to merge the simpler approach first, and then
try to take some shortcuts if needed and safe enough.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


signature.asc
Description: Digital signature


Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers

2013-06-27 Thread Siarhei Siamashka
On Wed, 26 Jun 2013 23:16:55 +0200
Maxime Ripard  wrote:

> The A10 and the A13 has a 64 bits free running counter that we can use
> as a clocksource and a sched clock, that were both not used yet on these
> platforms.
> 
> Signed-off-by: Maxime Ripard 
> ---
>  drivers/clocksource/sun4i_timer.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/clocksource/sun4i_timer.c 
> b/drivers/clocksource/sun4i_timer.c
> index bdf34d9..1d2eaa0 100644
> --- a/drivers/clocksource/sun4i_timer.c
> +++ b/drivers/clocksource/sun4i_timer.c
> @@ -23,6 +23,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  #define TIMER_IRQ_EN_REG 0x00
>  #define TIMER_IRQ_EN(val)BIT(val)
>  #define TIMER_IRQ_ST_REG 0x04
> @@ -34,6 +36,11 @@
>  #define TIMER_CNTVAL_REG(val)(0x10 * val + 0x18)
>  
>  #define TIMER_SCAL   16
> +#define TIMER_CNT64_CTL_REG  0xa0
> +#define TIMER_CNT64_CTL_CLR  BIT(0)
> +#define TIMER_CNT64_CTL_RL   BIT(1)
> +#define TIMER_CNT64_LOW_REG  0xa4
> +#define TIMER_CNT64_HIGH_REG 0xa8
>  
>  static void __iomem *timer_base;
>  
> @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = {
>   .dev_id = &sun4i_clockevent,
>  };
>  
> +static u32 sun4i_timer_sched_read(void)
> +{
> + u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG);

If we can be absolutely sure that nothing else may ever change
the TIMER_CNT64_CTL_REG, then its default value can be probably
cached instead of doing expensive read from the hardware register
each time?

The gettimeofday() abusers will feel a bit less pain. ARM does not
currently enjoy the VDSO optimized gettimeofday, so the software
which has been only tested on x86 may get a nasty surprise (an order
of magnitude higher gettimeofday overhead).

> + writel(reg | TIMER_CNT64_CTL_RL, timer_base + TIMER_CNT64_CTL_REG);
> + while (readl(timer_base + TIMER_CNT64_CTL_REG) & TIMER_CNT64_CTL_REG);

Some may think that this particular loop looks like a performance
bottleneck, but it is very rarely run for more than one iteration.
In fact, most of the time it just happens to be a single HW register
read.

> +
> + return readl(timer_base + TIMER_CNT64_LOW_REG);
> +}
> +
> +static cycle_t sun4i_timer_clksrc_read(struct clocksource *c)
> +{
> + return sun4i_timer_sched_read();
> +}
> +
>  static void __init sun4i_timer_init(struct device_node *node)
>  {
>   unsigned long rate = 0;
> @@ -117,6 +138,12 @@ static void __init sun4i_timer_init(struct device_node 
> *node)
>  
>   rate = clk_get_rate(clk);
>  
> + writel(TIMER_CNT64_CTL_CLR, timer_base + TIMER_CNT64_CTL_REG);
> + setup_sched_clock(sun4i_timer_sched_read, 32, clk_get_rate(clk));
> + clocksource_mmio_init(timer_base + TIMER_CNT64_LOW_REG, node->name,
> +   clk_get_rate(clk), 300, 32,
> +   sun4i_timer_clksrc_read);
> +
>   writel(rate / (TIMER_SCAL * HZ),
>  timer_base + TIMER_INTVAL_REG(0));
>  



-- 
Best regards,
Siarhei Siamashka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/