Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-07 Thread Tony Lindgren
* John Stultz john.stu...@linaro.org [150706 10:53]:
 On Mon, Jul 6, 2015 at 8:46 AM, Thomas Gleixner t...@linutronix.de wrote:
  On Mon, 6 Jul 2015, Tony Lindgren wrote:
  * Thomas Gleixner t...@linutronix.de [150706 07:20]:
   On Mon, 6 Jul 2015, Tony Lindgren wrote:
  The timekeeping accuracy issue certainly needs some thinking, and
  also the resolution between the clocksources can be different.. In the
  test case I have the slow timer is always on and of a lower resolution
  than the ARM global timer being used during runtime.
 
  Got some handy timer test in mind you want me to run to provide data
  on the accuracy?
 
  John Stultz might have something.
 
 You can turn on ntp stats in your ntp.conf and chart the loopstats
 data w/ gnuplot:
 
 set terminal png
 set output loopstat.png
 plot /var/log/ntpstats/loopstats.dateoftest using 2:3 with linespoints
 
 
 I also have  the drift-log.py and graph-log.py scripts I use here:
 https://github.com/johnstultz-work/timetests
 
 But those aren't really ready for mass consumption, as they're
 probably not very cross-distro compatible.

Great thanks, I'll play a bit with those to see how bad the jitter gets.

Regrds,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-07 Thread Tony Lindgren
* John Stultz john.stu...@linaro.org [150706 10:55]:
 On Mon, Jul 6, 2015 at 10:34 AM, Thomas Gleixner t...@linutronix.de wrote:
  On Mon, 6 Jul 2015, John Stultz wrote:
  On Mon, Jul 6, 2015 at 12:12 AM, Tony Lindgren t...@atomide.com wrote:
   Some persistent clocksources can be on a slow external bus. For shorter
   latencies for RT use, let's allow toggling the clocksource during idle
   between a faster non-persistent runtime clocksource and a slower 
   persistent
   clocksource.
 
  So yea, as Thomas mentioned, this will cause problems for timekeeping
  accuracy, since we end up resetting the ntp state when we change
  clocksource (additionally you gain a bit of error each time you switch
  clocksources). So you'll most likely see trouble w/ ntpd steering the
  clock.
 
  I'm not sure its quite clear in the description as to the need here.
  Could you expand a bit as to the rational for why?  I assume it has to
  do with you have a high-res clocksource that is good for fine-grained
  clock_gettime() calls, but wraps quickly, making it poor for long idle
  times. So you're trying to swap to a less fine grained clocksource
  that won't wrap so fast?
 
  The persistent-clock-like name muddies things further, since the
  persistent-clock is used for suspend, while it looks like this is just
  for idle times.
 
  Though that are idle states where the cpu is powered off so the fast
  clock source is powered off as well
 
  We all know how well that works from the x86/TSC horror. It's just the
  same thing again with a different arch prefix.
 
 Right.. and it might be telling that on x86 systems with this issue,
 we don't play games with it and that sort of hardware just has to use
 the slower and less fine-grained clocksources all the time.

Yes it's similar to the old x86 TSC vs ACPI PM timer issue. We have
ARM global timer powered off during idle.

It may be best to just use kernel cmdline to select the ARM global
timer and then disable any deeper power states if ARM global timer
is selected.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Tony Lindgren
Some persistent clocksources can be on a slow external bus. For shorter
latencies for RT use, let's allow toggling the clocksource during idle
between a faster non-persistent runtime clocksource and a slower persistent
clocksource.

Cc: Felipe Balbi ba...@ti.com
Cc: John Stultz john.stu...@linaro.org
Cc: Nishanth Menon n...@ti.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Yingjoe Chen yingjoe.c...@mediatek.com
Cc: Rafael J. Wysocki rafael.j.wyso...@intel.com
Cc: Peter Zijlstra pet...@infradead.org
Signed-off-by: Tony Lindgren t...@atomide.com
---

Anybody got better ideas for something like last_idled_cpu() type check
at the end of this patch?

---
 include/linuxt-email-lkml-omap/clocksource.h |  2 ++
 kernel/time/clocksource.c   | 60 +++--
 kernel/time/timekeeping.c   | 13 +-
 3 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 278dd27..7e5ff99 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -232,6 +232,8 @@ static inline void __clocksource_update_freq_khz(struct 
clocksource *cs, u32 khz
 
 
 extern int timekeeping_notify(struct clocksource *clock);
+extern int clocksource_pm_enter(void);
+extern void clocksource_pm_exit(void);
 
 extern cycle_t clocksource_mmio_readl_up(struct clocksource *);
 extern cycle_t clocksource_mmio_readl_down(struct clocksource *);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 841b72f..69dc307 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -93,6 +93,8 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 
to, u32 maxsec)
 /*[Clocksource internal variables]-
  * curr_clocksource:
  * currently selected clocksource.
+ * runtime_clocksource:
+ * preferred clocksource for runtime, can be local and non-persistent
  * clocksource_list:
  * linked list with the registered clocksources
  * clocksource_mutex:
@@ -101,6 +103,7 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 
to, u32 maxsec)
  * Name of the user-specified clocksource.
  */
 static struct clocksource *curr_clocksource;
+static struct clocksource *runtime_clocksource;
 static LIST_HEAD(clocksource_list);
 static DEFINE_MUTEX(clocksource_mutex);
 static char override_name[CS_NAME_LEN];
@@ -525,7 +528,8 @@ static inline void clocksource_update_max_deferment(struct 
clocksource *cs)
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
 
-static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
+static struct clocksource *clocksource_find_best(bool oneshot, bool persistent,
+bool skipcur)
 {
struct clocksource *cs;
 
@@ -540,6 +544,8 @@ static struct clocksource *clocksource_find_best(bool 
oneshot, bool skipcur)
list_for_each_entry(cs, clocksource_list, list) {
if (skipcur  cs == curr_clocksource)
continue;
+   if (persistent  !(cs-flags  CLOCK_SOURCE_SUSPEND_NONSTOP))
+   continue;
if (oneshot  !(cs-flags  CLOCK_SOURCE_VALID_FOR_HRES))
continue;
return cs;
@@ -553,7 +559,7 @@ static void __clocksource_select(bool skipcur)
struct clocksource *best, *cs;
 
/* Find the best suitable clocksource */
-   best = clocksource_find_best(oneshot, skipcur);
+   best = clocksource_find_best(oneshot, false, skipcur);
if (!best)
return;
 
@@ -802,6 +808,56 @@ int clocksource_unregister(struct clocksource *cs)
 }
 EXPORT_SYMBOL(clocksource_unregister);
 
+/**
+ * clocksource_pm_enter - change to a persistent clocksource before idle
+ *
+ * Changes system to use a persistent clocksource for idle. Intended to
+ * be called from CPUidle from the last active CPU.
+ */
+int clocksource_pm_enter(void)
+{
+   bool oneshot = tick_oneshot_mode_active();
+   struct clocksource *best;
+
+   if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
+ Unable to get clocksource_mutex))
+   return -EINTR;
+
+   best = clocksource_find_best(oneshot, true, false);
+   if (best) {
+   if (curr_clocksource != best 
+   !timekeeping_notify(best)) {
+   runtime_clocksource = curr_clocksource;
+   curr_clocksource = best;
+   }
+   }
+   mutex_unlock(clocksource_mutex);
+
+   return !!best;
+}
+
+/**
+ * clocksource_pm_exit - change to a runtime clocksrouce after idle
+ *
+ * Changes system to use the best clocksource for runtime. Intended to
+ * be called after waking up from CPUidle on the first active CPU.
+ */
+void clocksource_pm_exit(void)
+{
+   if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
+ Unable to get clocksource_mutex))
+   return;
+
+   if (runtime_clocksource) {
+ 

Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner
On Mon, 6 Jul 2015, Tony Lindgren wrote:
 * Thomas Gleixner t...@linutronix.de [150706 07:20]:
  On Mon, 6 Jul 2015, Tony Lindgren wrote:
 The timekeeping accuracy issue certainly needs some thinking, and
 also the resolution between the clocksources can be different.. In the
 test case I have the slow timer is always on and of a lower resolution
 than the ARM global timer being used during runtime.
 
 Got some handy timer test in mind you want me to run to provide data
 on the accuracy?

John Stultz might have something.
  
   +/**
   + * clocksource_pm_enter - change to a persistent clocksource before idle
   + *
   + * Changes system to use a persistent clocksource for idle. Intended to
   + * be called from CPUidle from the last active CPU.
   + */
   +int clocksource_pm_enter(void)
   +{
   + bool oneshot = tick_oneshot_mode_active();
   + struct clocksource *best;
   +
   + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
   +   Unable to get clocksource_mutex))
   + return -EINTR;
  
  This trylock serves which purpose?
 
 Well we don't want to start changing clocksource if something is
 running like you pointed out.

Well yes, but 
 
  I really cannot see how this is proper serialized.
 
 We need to allow this only from the last cpu before hitting idle.

And I cannot see anything which does so.

cpu0cpu1
is_idle 
go_idle()
  clocksource_pm_enter()
lock(cs_mutex);
wakeup()
clocksource_pm_exit()
  trylock fails 

...  
unlock(cs_mutex);
  
-- Crap!

   @@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)

 if (tk-tkr_mono.clock == clock)
 return 0;
   - stop_machine(change_clocksource, clock, NULL);
   +
   + /*
   +  * We may want to toggle between a fast and a persistent
   +  * clocksource from CPUidle on the last active CPU and can't
   +  * use stop_machine at that point.
   +  */
   + if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 
  
  Can you please explain how this code gets called from an offline cpu?
 
 Last cpu getting idled..

That does not make any sense at all. How is idle related to the online
mask? An idle cpu is still online.

   + !rcu_is_watching())
  
  So pick some random combination of conditions and define that it is
  correct, right? How on earth does !rcu_watching() tell that this is
  the last running cpu.
 
 We have called rcu_idle_enter() from cpuidle_idle_call(). Do you have
 some better test in mind when the last cpu is about hit idle?

The cpuidle code should know that. And if it does not know, it better
should keep track of that information and based on it provide the
proper serialization, so the call into the timekeeping code is not a
subject to guess work and race conditions.

Thanks,

tglx

  
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread John Stultz
On Mon, Jul 6, 2015 at 12:12 AM, Tony Lindgren t...@atomide.com wrote:
 Some persistent clocksources can be on a slow external bus. For shorter
 latencies for RT use, let's allow toggling the clocksource during idle
 between a faster non-persistent runtime clocksource and a slower persistent
 clocksource.

So yea, as Thomas mentioned, this will cause problems for timekeeping
accuracy, since we end up resetting the ntp state when we change
clocksource (additionally you gain a bit of error each time you switch
clocksources). So you'll most likely see trouble w/ ntpd steering the
clock.

I'm not sure its quite clear in the description as to the need here.
Could you expand a bit as to the rational for why?  I assume it has to
do with you have a high-res clocksource that is good for fine-grained
clock_gettime() calls, but wraps quickly, making it poor for long idle
times. So you're trying to swap to a less fine grained clocksource
that won't wrap so fast?

The persistent-clock-like name muddies things further, since the
persistent-clock is used for suspend, while it looks like this is just
for idle times.

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread John Stultz
On Mon, Jul 6, 2015 at 10:34 AM, Thomas Gleixner t...@linutronix.de wrote:
 On Mon, 6 Jul 2015, John Stultz wrote:
 On Mon, Jul 6, 2015 at 12:12 AM, Tony Lindgren t...@atomide.com wrote:
  Some persistent clocksources can be on a slow external bus. For shorter
  latencies for RT use, let's allow toggling the clocksource during idle
  between a faster non-persistent runtime clocksource and a slower persistent
  clocksource.

 So yea, as Thomas mentioned, this will cause problems for timekeeping
 accuracy, since we end up resetting the ntp state when we change
 clocksource (additionally you gain a bit of error each time you switch
 clocksources). So you'll most likely see trouble w/ ntpd steering the
 clock.

 I'm not sure its quite clear in the description as to the need here.
 Could you expand a bit as to the rational for why?  I assume it has to
 do with you have a high-res clocksource that is good for fine-grained
 clock_gettime() calls, but wraps quickly, making it poor for long idle
 times. So you're trying to swap to a less fine grained clocksource
 that won't wrap so fast?

 The persistent-clock-like name muddies things further, since the
 persistent-clock is used for suspend, while it looks like this is just
 for idle times.

 Though that are idle states where the cpu is powered off so the fast
 clock source is powered off as well

 We all know how well that works from the x86/TSC horror. It's just the
 same thing again with a different arch prefix.

Right.. and it might be telling that on x86 systems with this issue,
we don't play games with it and that sort of hardware just has to use
the slower and less fine-grained clocksources all the time.

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Tony Lindgren
* Thomas Gleixner t...@linutronix.de [150706 08:48]:
 On Mon, 6 Jul 2015, Tony Lindgren wrote:
  * Thomas Gleixner t...@linutronix.de [150706 07:20]:
   On Mon, 6 Jul 2015, Tony Lindgren wrote:
  The timekeeping accuracy issue certainly needs some thinking, and
  also the resolution between the clocksources can be different.. In the
  test case I have the slow timer is always on and of a lower resolution
  than the ARM global timer being used during runtime.
  
  Got some handy timer test in mind you want me to run to provide data
  on the accuracy?
 
 John Stultz might have something.
   
+/**
+ * clocksource_pm_enter - change to a persistent clocksource before 
idle
+ *
+ * Changes system to use a persistent clocksource for idle. Intended to
+ * be called from CPUidle from the last active CPU.
+ */
+int clocksource_pm_enter(void)
+{
+   bool oneshot = tick_oneshot_mode_active();
+   struct clocksource *best;
+
+   if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
+ Unable to get clocksource_mutex))
+   return -EINTR;
   
   This trylock serves which purpose?
  
  Well we don't want to start changing clocksource if something is
  running like you pointed out.
 
 Well yes, but 
  
   I really cannot see how this is proper serialized.
  
  We need to allow this only from the last cpu before hitting idle.
 
 And I cannot see anything which does so.
 
 cpu0  cpu1
   is_idle 
 go_idle()
   clocksource_pm_enter()
 lock(cs_mutex);
   wakeup()
   clocksource_pm_exit()
 trylock fails 
 
 ...  
 unlock(cs_mutex);
   
 -- Crap!

OK you're right, this only works with cpuidle and using
drivers/cpuidle/coupled.c.
 
@@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)
 
if (tk-tkr_mono.clock == clock)
return 0;
-   stop_machine(change_clocksource, clock, NULL);
+
+   /*
+* We may want to toggle between a fast and a persistent
+* clocksource from CPUidle on the last active CPU and can't
+* use stop_machine at that point.
+*/
+   if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 
   
   Can you please explain how this code gets called from an offline cpu?
  
  Last cpu getting idled..
 
 That does not make any sense at all. How is idle related to the online
 mask? An idle cpu is still online.

Oops yeah that's a bogus test, cpu off != offlined.
 
+   !rcu_is_watching())
   
   So pick some random combination of conditions and define that it is
   correct, right? How on earth does !rcu_watching() tell that this is
   the last running cpu.
  
  We have called rcu_idle_enter() from cpuidle_idle_call(). Do you have
  some better test in mind when the last cpu is about hit idle?
 
 The cpuidle code should know that. And if it does not know, it better
 should keep track of that information and based on it provide the
 proper serialization, so the call into the timekeeping code is not a
 subject to guess work and race conditions.

OK I agree. Based on your comments this clearly needs to be limited to
cpuidle. And thanks for your comments.

Regards,

Tony
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread John Stultz
On Mon, Jul 6, 2015 at 8:46 AM, Thomas Gleixner t...@linutronix.de wrote:
 On Mon, 6 Jul 2015, Tony Lindgren wrote:
 * Thomas Gleixner t...@linutronix.de [150706 07:20]:
  On Mon, 6 Jul 2015, Tony Lindgren wrote:
 The timekeeping accuracy issue certainly needs some thinking, and
 also the resolution between the clocksources can be different.. In the
 test case I have the slow timer is always on and of a lower resolution
 than the ARM global timer being used during runtime.

 Got some handy timer test in mind you want me to run to provide data
 on the accuracy?

 John Stultz might have something.

You can turn on ntp stats in your ntp.conf and chart the loopstats
data w/ gnuplot:

set terminal png
set output loopstat.png
plot /var/log/ntpstats/loopstats.dateoftest using 2:3 with linespoints


I also have  the drift-log.py and graph-log.py scripts I use here:
https://github.com/johnstultz-work/timetests

But those aren't really ready for mass consumption, as they're
probably not very cross-distro compatible.

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner
On Mon, 6 Jul 2015, John Stultz wrote:
 On Mon, Jul 6, 2015 at 12:12 AM, Tony Lindgren t...@atomide.com wrote:
  Some persistent clocksources can be on a slow external bus. For shorter
  latencies for RT use, let's allow toggling the clocksource during idle
  between a faster non-persistent runtime clocksource and a slower persistent
  clocksource.
 
 So yea, as Thomas mentioned, this will cause problems for timekeeping
 accuracy, since we end up resetting the ntp state when we change
 clocksource (additionally you gain a bit of error each time you switch
 clocksources). So you'll most likely see trouble w/ ntpd steering the
 clock.
 
 I'm not sure its quite clear in the description as to the need here.
 Could you expand a bit as to the rational for why?  I assume it has to
 do with you have a high-res clocksource that is good for fine-grained
 clock_gettime() calls, but wraps quickly, making it poor for long idle
 times. So you're trying to swap to a less fine grained clocksource
 that won't wrap so fast?
 
 The persistent-clock-like name muddies things further, since the
 persistent-clock is used for suspend, while it looks like this is just
 for idle times.

Though that are idle states where the cpu is powered off so the fast
clock source is powered off as well

We all know how well that works from the x86/TSC horror. It's just the
same thing again with a different arch prefix.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Tony Lindgren
Hi,

* Thomas Gleixner t...@linutronix.de [150706 07:20]:
 On Mon, 6 Jul 2015, Tony Lindgren wrote:
 
  Some persistent clocksources can be on a slow external bus. For shorter
  latencies for RT use, let's allow toggling the clocksource during idle
  between a faster non-persistent runtime clocksource and a slower persistent
  clocksource.
 
 I really cannot follow that RT argument here. The whole switchover
 causes latencies itself and whats worse is, that this breaks
 timekeeping accuracy because there is no way to switch clocksources
 atomically without loss.

It would be during deeper idle states.. But yeah the RT use would be
better replaced in the description with with lower runtime timer
latency. 

The timekeeping accuracy issue certainly needs some thinking, and
also the resolution between the clocksources can be different.. In the
test case I have the slow timer is always on and of a lower resolution
than the ARM global timer being used during runtime.

Got some handy timer test in mind you want me to run to provide data
on the accuracy?
 
  ---
   include/linuxt-email-lkml-omap/clocksource.h |  2 ++
 
 Interesting file name.

Heh that needs to go back to sed land :)
 
   extern int timekeeping_notify(struct clocksource *clock);
  +extern int clocksource_pm_enter(void);
  +extern void clocksource_pm_exit(void);
 
 Unfortunately you are not providing the call site, so I cannot see
 from which context this is going to be called.
 
 I fear its from the guts of the idle code probably with interrupts
 disabled etc , right?

Yes from the last cpu active in cpuidle. Here's the related snippet
in my case:

--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -111,6 +111,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device 
*dev,
 (cx-mpu_logic_state == PWRDM_POWER_OFF);
 
tick_broadcast_enter();
+   clocksource_pm_enter();
 
/*
 * Call idle CPU PM enter notifier chain so that
@@ -167,6 +168,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device 
*dev,
if (dev-cpu == 0  mpuss_can_lose_context)
cpu_cluster_pm_exit();
 
+   clocksource_pm_exit();
tick_broadcast_exit();
 
 fail:

   
  +/**
  + * clocksource_pm_enter - change to a persistent clocksource before idle
  + *
  + * Changes system to use a persistent clocksource for idle. Intended to
  + * be called from CPUidle from the last active CPU.
  + */
  +int clocksource_pm_enter(void)
  +{
  +   bool oneshot = tick_oneshot_mode_active();
  +   struct clocksource *best;
  +
  +   if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
  + Unable to get clocksource_mutex))
  +   return -EINTR;
 
 This trylock serves which purpose?

Well we don't want to start changing clocksource if something is
running like you pointed out.

  +   best = clocksource_find_best(oneshot, true, false);
  +   if (best) {
  +   if (curr_clocksource != best 
  +   !timekeeping_notify(best)) {
  +   runtime_clocksource = curr_clocksource;
  +   curr_clocksource = best;
  +   }
  +   }
  +   mutex_unlock(clocksource_mutex);
  +
  +   return !!best;
  +}
  +
  +/**
  + * clocksource_pm_exit - change to a runtime clocksrouce after idle
  + *
  + * Changes system to use the best clocksource for runtime. Intended to
  + * be called after waking up from CPUidle on the first active CPU.
  + */
  +void clocksource_pm_exit(void)
  +{
  +   if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
  + Unable to get clocksource_mutex))
  +   return;
  +
  +   if (runtime_clocksource) {
  +   if (curr_clocksource != runtime_clocksource 
  +   !timekeeping_notify(runtime_clocksource)) {
  +   curr_clocksource = runtime_clocksource;
  +   runtime_clocksource = NULL;
  +   }
  +   }
  +   mutex_unlock(clocksource_mutex);
  +}
 
 I really cannot see how this is proper serialized.

We need to allow this only from the last cpu before hitting idle.
 
   #ifdef CONFIG_SYSFS
   /**
* sysfs_show_current_clocksources - sysfs interface for current 
  clocksource
  diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
  index bca3667..0379260 100644
  --- a/kernel/time/timekeeping.c
  +++ b/kernel/time/timekeeping.c
  @@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)
   
  if (tk-tkr_mono.clock == clock)
  return 0;
  -   stop_machine(change_clocksource, clock, NULL);
  +
  +   /*
  +* We may want to toggle between a fast and a persistent
  +* clocksource from CPUidle on the last active CPU and can't
  +* use stop_machine at that point.
  +*/
  +   if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 
 
 Can you please explain how this code gets called from an offline cpu?

Last cpu getting idled..
 
  +   !rcu_is_watching())
 
 So 

Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner
On Mon, 6 Jul 2015, Tony Lindgren wrote:

 Some persistent clocksources can be on a slow external bus. For shorter
 latencies for RT use, let's allow toggling the clocksource during idle
 between a faster non-persistent runtime clocksource and a slower persistent
 clocksource.

I really cannot follow that RT argument here. The whole switchover
causes latencies itself and whats worse is, that this breaks
timekeeping accuracy because there is no way to switch clocksources
atomically without loss.

 ---
  include/linuxt-email-lkml-omap/clocksource.h |  2 ++

Interesting file name.

  extern int timekeeping_notify(struct clocksource *clock);
 +extern int clocksource_pm_enter(void);
 +extern void clocksource_pm_exit(void);

Unfortunately you are not providing the call site, so I cannot see
from which context this is going to be called.

I fear its from the guts of the idle code probably with interrupts
disabled etc , right?
  
 +/**
 + * clocksource_pm_enter - change to a persistent clocksource before idle
 + *
 + * Changes system to use a persistent clocksource for idle. Intended to
 + * be called from CPUidle from the last active CPU.
 + */
 +int clocksource_pm_enter(void)
 +{
 + bool oneshot = tick_oneshot_mode_active();
 + struct clocksource *best;
 +
 + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
 +   Unable to get clocksource_mutex))
 + return -EINTR;

This trylock serves which purpose?

 +
 + best = clocksource_find_best(oneshot, true, false);
 + if (best) {
 + if (curr_clocksource != best 
 + !timekeeping_notify(best)) {
 + runtime_clocksource = curr_clocksource;
 + curr_clocksource = best;
 + }
 + }
 + mutex_unlock(clocksource_mutex);
 +
 + return !!best;
 +}
 +
 +/**
 + * clocksource_pm_exit - change to a runtime clocksrouce after idle
 + *
 + * Changes system to use the best clocksource for runtime. Intended to
 + * be called after waking up from CPUidle on the first active CPU.
 + */
 +void clocksource_pm_exit(void)
 +{
 + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
 +   Unable to get clocksource_mutex))
 + return;
 +
 + if (runtime_clocksource) {
 + if (curr_clocksource != runtime_clocksource 
 + !timekeeping_notify(runtime_clocksource)) {
 + curr_clocksource = runtime_clocksource;
 + runtime_clocksource = NULL;
 + }
 + }
 + mutex_unlock(clocksource_mutex);
 +}

I really cannot see how this is proper serialized.

  #ifdef CONFIG_SYSFS
  /**
   * sysfs_show_current_clocksources - sysfs interface for current clocksource
 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index bca3667..0379260 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)
  
   if (tk-tkr_mono.clock == clock)
   return 0;
 - stop_machine(change_clocksource, clock, NULL);
 +
 + /*
 +  * We may want to toggle between a fast and a persistent
 +  * clocksource from CPUidle on the last active CPU and can't
 +  * use stop_machine at that point.
 +  */
 + if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 

Can you please explain how this code gets called from an offline cpu?

 + !rcu_is_watching())

So pick some random combination of conditions and define that it is
correct, right? How on earth does !rcu_watching() tell that this is
the last running cpu.

 + change_clocksource(clock);
 + else
 + stop_machine(change_clocksource, clock, NULL);

This patch definitely earns a place in my ugly code museum under the
category 'Does not explode in my face, so it must be correct'.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html