Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers
Hi Siarhei, > > > If we can be absolutely sure that nothing else may ever change > > > the TIMER_CNT64_CTL_REG, then its default value can be probably > > > cached instead of doing expensive read from the hardware register > > > each time? > > > > Since it's a free-running counter, its value will always change, so the > > caching will bring no additions at all, right? > > Sorry, 'caching' was not a very good description for something that is > already a compile time constant. I mean just replace > > u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG); > > with > > u32 reg = TIMER_CNT64_CTL_CLR; > > Because we know that the TIMER_CNT64_CTL_REG is already supposed > to have the default TIMER_CNT64_CTL_CLR value (initialized in the > 'sun4i_timer_init' function) between calls to 'sun4i_timer_sched_read'. > Inside of 'sun4i_timer_sched_read' we set an extra TIMER_CNT64_CTL_RL > bit in this register, but wait until it clears, effectively reverting > TIMER_CNT64_CTL_REG register back to the default TIMER_CNT64_CTL_CLR > value. > > Removing this extra HW register read can save roughly a hundred of CPU > cycles here and provide a ~10% overall improvement for gettimeofday > (these estimates are based on the earlier benchmarks done with the > Allwinner 3.4 kernel). > > Or maybe I'm overlooking something? Ah, I get what you mean now. We don't even need to bother about TIMER_CNT64_CTL_CLR now, since it's suppose to be cleared once the counter is reset. However, I'd very much prefer to take the safer approach for now, and try to optimise afterwards. Maxime -- Maxime Ripard, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com signature.asc Description: Digital signature
Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers
On Thu, 27 Jun 2013 19:02:28 +0200 Maxime Ripard wrote: > Hi Siarhei, > > On Thu, Jun 27, 2013 at 01:17:29PM +0300, Siarhei Siamashka wrote: > > On Wed, 26 Jun 2013 23:16:55 +0200 > > Maxime Ripard wrote: > > > > > The A10 and the A13 has a 64 bits free running counter that we can use > > > as a clocksource and a sched clock, that were both not used yet on these > > > platforms. > > > > > > Signed-off-by: Maxime Ripard > > > --- > > > drivers/clocksource/sun4i_timer.c | 27 +++ > > > 1 file changed, 27 insertions(+) > > > > > > diff --git a/drivers/clocksource/sun4i_timer.c > > > b/drivers/clocksource/sun4i_timer.c > > > index bdf34d9..1d2eaa0 100644 > > > --- a/drivers/clocksource/sun4i_timer.c > > > +++ b/drivers/clocksource/sun4i_timer.c > > > @@ -23,6 +23,8 @@ > > > #include > > > #include > > > > > > +#include > > > + > > > #define TIMER_IRQ_EN_REG 0x00 > > > #define TIMER_IRQ_EN(val)BIT(val) > > > #define TIMER_IRQ_ST_REG 0x04 > > > @@ -34,6 +36,11 @@ > > > #define TIMER_CNTVAL_REG(val)(0x10 * val + 0x18) > > > > > > #define TIMER_SCAL 16 > > > +#define TIMER_CNT64_CTL_REG 0xa0 > > > +#define TIMER_CNT64_CTL_CLR BIT(0) > > > +#define TIMER_CNT64_CTL_RL BIT(1) > > > +#define TIMER_CNT64_LOW_REG 0xa4 > > > +#define TIMER_CNT64_HIGH_REG 0xa8 > > > > > > static void __iomem *timer_base; > > > > > > @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = { > > > .dev_id = &sun4i_clockevent, > > > }; > > > > > > +static u32 sun4i_timer_sched_read(void) > > > +{ > > > + u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG); > > > > If we can be absolutely sure that nothing else may ever change > > the TIMER_CNT64_CTL_REG, then its default value can be probably > > cached instead of doing expensive read from the hardware register > > each time? > > Since it's a free-running counter, its value will always change, so the > caching will bring no additions at all, right? Sorry, 'caching' was not a very good description for something that is already a compile time constant. I mean just replace u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG); with u32 reg = TIMER_CNT64_CTL_CLR; Because we know that the TIMER_CNT64_CTL_REG is already supposed to have the default TIMER_CNT64_CTL_CLR value (initialized in the 'sun4i_timer_init' function) between calls to 'sun4i_timer_sched_read'. Inside of 'sun4i_timer_sched_read' we set an extra TIMER_CNT64_CTL_RL bit in this register, but wait until it clears, effectively reverting TIMER_CNT64_CTL_REG register back to the default TIMER_CNT64_CTL_CLR value. Removing this extra HW register read can save roughly a hundred of CPU cycles here and provide a ~10% overall improvement for gettimeofday (these estimates are based on the earlier benchmarks done with the Allwinner 3.4 kernel). Or maybe I'm overlooking something? -- Best regards, Siarhei Siamashka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers
Hi Siarhei, On Thu, Jun 27, 2013 at 01:17:29PM +0300, Siarhei Siamashka wrote: > On Wed, 26 Jun 2013 23:16:55 +0200 > Maxime Ripard wrote: > > > The A10 and the A13 has a 64 bits free running counter that we can use > > as a clocksource and a sched clock, that were both not used yet on these > > platforms. > > > > Signed-off-by: Maxime Ripard > > --- > > drivers/clocksource/sun4i_timer.c | 27 +++ > > 1 file changed, 27 insertions(+) > > > > diff --git a/drivers/clocksource/sun4i_timer.c > > b/drivers/clocksource/sun4i_timer.c > > index bdf34d9..1d2eaa0 100644 > > --- a/drivers/clocksource/sun4i_timer.c > > +++ b/drivers/clocksource/sun4i_timer.c > > @@ -23,6 +23,8 @@ > > #include > > #include > > > > +#include > > + > > #define TIMER_IRQ_EN_REG 0x00 > > #define TIMER_IRQ_EN(val) BIT(val) > > #define TIMER_IRQ_ST_REG 0x04 > > @@ -34,6 +36,11 @@ > > #define TIMER_CNTVAL_REG(val) (0x10 * val + 0x18) > > > > #define TIMER_SCAL 16 > > +#define TIMER_CNT64_CTL_REG0xa0 > > +#define TIMER_CNT64_CTL_CLRBIT(0) > > +#define TIMER_CNT64_CTL_RL BIT(1) > > +#define TIMER_CNT64_LOW_REG0xa4 > > +#define TIMER_CNT64_HIGH_REG 0xa8 > > > > static void __iomem *timer_base; > > > > @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = { > > .dev_id = &sun4i_clockevent, > > }; > > > > +static u32 sun4i_timer_sched_read(void) > > +{ > > + u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG); > > If we can be absolutely sure that nothing else may ever change > the TIMER_CNT64_CTL_REG, then its default value can be probably > cached instead of doing expensive read from the hardware register > each time? Since it's a free-running counter, its value will always change, so the caching will bring no additions at all, right? > The gettimeofday() abusers will feel a bit less pain. ARM does not > currently enjoy the VDSO optimized gettimeofday, so the software > which has been only tested on x86 may get a nasty surprise (an order > of magnitude higher gettimeofday overhead). > > > + writel(reg | TIMER_CNT64_CTL_RL, timer_base + TIMER_CNT64_CTL_REG); > > + while (readl(timer_base + TIMER_CNT64_CTL_REG) & TIMER_CNT64_CTL_REG); > > Some may think that this particular loop looks like a performance > bottleneck, but it is very rarely run for more than one iteration. > In fact, most of the time it just happens to be a single HW register > read. Thanks for your insight on this. It does make me more eager to merge the simpler approach first, and then try to take some shortcuts if needed and safe enough. Maxime -- Maxime Ripard, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com signature.asc Description: Digital signature
Re: [linux-sunxi] [PATCH 2/8] clocksource: sun4i: Add clocksource and sched clock drivers
On Wed, 26 Jun 2013 23:16:55 +0200 Maxime Ripard wrote: > The A10 and the A13 has a 64 bits free running counter that we can use > as a clocksource and a sched clock, that were both not used yet on these > platforms. > > Signed-off-by: Maxime Ripard > --- > drivers/clocksource/sun4i_timer.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/drivers/clocksource/sun4i_timer.c > b/drivers/clocksource/sun4i_timer.c > index bdf34d9..1d2eaa0 100644 > --- a/drivers/clocksource/sun4i_timer.c > +++ b/drivers/clocksource/sun4i_timer.c > @@ -23,6 +23,8 @@ > #include > #include > > +#include > + > #define TIMER_IRQ_EN_REG 0x00 > #define TIMER_IRQ_EN(val)BIT(val) > #define TIMER_IRQ_ST_REG 0x04 > @@ -34,6 +36,11 @@ > #define TIMER_CNTVAL_REG(val)(0x10 * val + 0x18) > > #define TIMER_SCAL 16 > +#define TIMER_CNT64_CTL_REG 0xa0 > +#define TIMER_CNT64_CTL_CLR BIT(0) > +#define TIMER_CNT64_CTL_RL BIT(1) > +#define TIMER_CNT64_LOW_REG 0xa4 > +#define TIMER_CNT64_HIGH_REG 0xa8 > > static void __iomem *timer_base; > > @@ -96,6 +103,20 @@ static struct irqaction sun4i_timer_irq = { > .dev_id = &sun4i_clockevent, > }; > > +static u32 sun4i_timer_sched_read(void) > +{ > + u32 reg = readl(timer_base + TIMER_CNT64_CTL_REG); If we can be absolutely sure that nothing else may ever change the TIMER_CNT64_CTL_REG, then its default value can be probably cached instead of doing expensive read from the hardware register each time? The gettimeofday() abusers will feel a bit less pain. ARM does not currently enjoy the VDSO optimized gettimeofday, so the software which has been only tested on x86 may get a nasty surprise (an order of magnitude higher gettimeofday overhead). > + writel(reg | TIMER_CNT64_CTL_RL, timer_base + TIMER_CNT64_CTL_REG); > + while (readl(timer_base + TIMER_CNT64_CTL_REG) & TIMER_CNT64_CTL_REG); Some may think that this particular loop looks like a performance bottleneck, but it is very rarely run for more than one iteration. In fact, most of the time it just happens to be a single HW register read. > + > + return readl(timer_base + TIMER_CNT64_LOW_REG); > +} > + > +static cycle_t sun4i_timer_clksrc_read(struct clocksource *c) > +{ > + return sun4i_timer_sched_read(); > +} > + > static void __init sun4i_timer_init(struct device_node *node) > { > unsigned long rate = 0; > @@ -117,6 +138,12 @@ static void __init sun4i_timer_init(struct device_node > *node) > > rate = clk_get_rate(clk); > > + writel(TIMER_CNT64_CTL_CLR, timer_base + TIMER_CNT64_CTL_REG); > + setup_sched_clock(sun4i_timer_sched_read, 32, clk_get_rate(clk)); > + clocksource_mmio_init(timer_base + TIMER_CNT64_LOW_REG, node->name, > + clk_get_rate(clk), 300, 32, > + sun4i_timer_clksrc_read); > + > writel(rate / (TIMER_SCAL * HZ), > timer_base + TIMER_INTVAL_REG(0)); > -- Best regards, Siarhei Siamashka -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/