from:"Adam Belay"

Re: [-mm patch] make "struct menu_governor" static (again)

2007-08-27 Thread Adam Belay

This is already fixed in the most recent ACPI CPUIDLE tree.

Thanks,
Adam

On Mon, 2007-08-27 at 23:27 +0200, Adrian Bunk wrote:
> On Wed, Aug 22, 2007 at 02:06:48AM -0700, Andrew Morton wrote:
> >...
> > Changes since 2.6.23-rc2-mm2:
> >...
> >  git-acpi.patch
> >...
> >  git trees
> >...
> 
> "struct menu_governor" needlessly again became global.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> cb33b296204127cf50df54b84b2d79e152fb924b 
> diff --git a/drivers/cpuidle/governors/menu.c 
> b/drivers/cpuidle/governors/menu.c
> index f5a8865..8d3fdc5 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -117,7 +117,7 @@ static int menu_enable_device(struct cpuidle_device *dev)
>   return 0;
>  }
>  
> -struct cpuidle_governor menu_governor = {
> +static struct cpuidle_governor menu_governor = {
>   .name = "menu",
>   .rating =   20,
>   .enable =   menu_enable_device,
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.21-rc6-mm1

2007-04-10 Thread Adam Belay

On Tue, 2007-04-10 at 15:20 -0700, Venki Pallipadi wrote:
> On Mon, Apr 09, 2007 at 07:40:52PM +0200, Rafael J. Wysocki wrote:
> > On Monday, 9 April 2007 18:14, Pallipadi, Venkatesh wrote:
> > > 
> > > >-Original Message-
> > > >From: Rafael J. Wysocki [mailto:[EMAIL PROTECTED] 
> > > >Sent: Monday, April 09, 2007 9:08 AM
> > > >To: Andrew Morton
> > > >Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; 
> > > >[EMAIL PROTECTED]; Pallipadi, Venkatesh
> > > >Subject: Re: 2.6.21-rc6-mm1
> > > >
> > > >On Sunday, 8 April 2007 23:35, Andrew Morton wrote:
> > > >> 
> > > >> 
> > > >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2
> > > >.6.21-rc6/2.6.21-rc6-mm1/
> > > >> 
> > > >> 
> > > >> - Lots of x86 updates
> > > >> 
> > > >> - This is a 25MB diff against mainline, which is rather large.
> > > >
> > > >The cpuidle thing tends to hang my x86-64 machines on boot.
> > > >
> > > 
> > > Hi Rafael,
> > > 
> > > At what point during boot does it hang?
> > 
> > When mounting the root filesystem.  It hangs completely, even the magic 
> > SysRq
> > doesn't work
> > 
> 
> Rafael: Below patch should fix the hang.
> Len: Please include this patch in acpi-test.
> 
> Thanks,
> Venki
> 
> Prevent hang on x86-64, when ACPI processor driver is added as a module on
> a system that does not support C-states.
> 
> x86-64 expects all idle handlers to enable interrupts before returning from
> idle handler. This is due to enter_idle(), exit_idle() races. Make
> cpuidle_idle_call() conform to this when there is no pm_idle_old.
> 
> Also, cpuidle look at the return values of attch_driver() and set
> current_driver to NULL if attach fails on all CPUs.

My vote would be to instead remove enter_idle() and exit_idle() from
x86-64, just as was done with i386.  Performance monitoring
infrastructure shouldn't be interfering with the idle interrupt
delivery, as that could only hurt performance...  Besides, there's
probably a better way of doing this than an idle notifier anyway.

-Adam



Re: [RFC][PATCH 3/3] add the 'menu' cpuidle governor

2007-03-27 Thread Adam Belay

On Mon, 2007-03-26 at 13:36 +0800, Shaohua Li wrote:
> Hi,
> On Sat, 2007-03-24 at 03:47 -0400, Adam Belay wrote:
> > This patch adds the 'menu' governor, as was described in my first email.
> > 
> 
> > +/**
> > + * menu_select - selects the next idle state to enter
> > + * @dev: the CPU
> > + */
> > +static int menu_select(struct cpuidle_device *dev)
> > +{
> > +   struct menu_device *data = &__get_cpu_var(menu_devices);
> > +   int i, expected_us, max_state = dev->state_count;
> > +
> > +   /* discard BM history because it is sticky */
> > +   cpuidle_get_bm_activity();
> Why discard BM history here? This way the next bm check almost always
> return 0.

Yes, although in testing it detects BM activity more often than one
might think, I agree, this is probably too aggressive.  At the time, I
was trying to avoid situations where BM_STS goes high early during a
long busy period and as a result becomes stale.

> BTW, bm activity is global (Not cpu specific), we'd better account it
> system wide.

Yes, but do we need to support BM_STS in the SMP case?

> 
> > +   /* determine the expected residency time */
> > +   expected_us = (s32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000;
> > +   expected_us = min(expected_us, data->break_last_us);
> > +
> > +   /* determine the maximum state compatible with current BM status */
> > +   if (cpuidle_get_bm_activity())
> > +   data->bm_elapsed_us = 0;
> > +   if (data->bm_elapsed_us <= data->bm_holdoff_us)
> > +   max_state = data->deepest_bm_state + 1;
> > +
> > +   /* find the deepest idle state that satisfies our constraints */
> > +   for (i = 1; i < max_state; i++) {
> > +   struct cpuidle_state *s = &dev->states[i];
> > +   if (s->target_residency > expected_us)
> > +   break;
> > +   if (s->exit_latency > system_latency_constraint())
> > +   break;
> > +   }
> > +
> > +   data->last_state_idx = i - 1;
> > +   data->idle_jiffies = tick_nohz_get_idle_jiffies();
> > +   return i - 1;
> > +}
> > +
> > +/**
> > + * menu_reflect - attempts to guess what happened after entry
> > + * @dev: the CPU
> > + *
> > + * NOTE: it's important to be fast here because this operation will add to
> > + *   the overall exit latency.
> > + */
> > +static void menu_reflect(struct cpuidle_device *dev)
> > +{
> > +   struct menu_device *data = &__get_cpu_var(menu_devices);
> > +   int last_idx = data->last_state_idx;
> > +   int measured_us = cpuidle_get_last_residency(dev);
> > +   struct cpuidle_state *target = &dev->states[last_idx];
> > +
> > +   /*
> > +* Ugh, this idle state doesn't support residency measurements, so we
> > +* are basically lost in the dark.  As a compromise, assume we slept
> > +* for one full standard timer tick.  However, be aware that this
> > +* could potentially result in a suboptimal state transition.
> > +*/
> > +   if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
> > +   measured_us = USEC_PER_SEC / HZ;
> > +
> > +   data->bm_elapsed_us += measured_us;
> > +   data->break_elapsed_us += measured_us;
> See the system state: idle->running->idle
> Looks the bm_elapsed_us and break_elapsed_us account ignored the running
> state between the two idles. Eg, the 'running' might generate a lot of
> bm activity, then maybe we should reset bm_elapsed_us in the next
> 'idle'.

I ignore the time between idle states because I'm only interested in
accounting the idle sleep behavior.  A more sophisticated strategy might
also account the running time between idles in some way.  However, it is
worth noting that a busy system has the indirect effect of shortening
the idle residency times. 

I think removing the BM_STS clear attempt at the beginning should help
to reset bm_elapsed_us after sufficiently long busy periods.
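
To make the BM_STS point concrete, here is a minimal userspace sketch,
not the kernel code.  It assumes the status bit is sticky and that the
activity check behaves as read-and-clear.  fake_bm_sts,
bm_read_and_clear() and bm_limited() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sticky BM_STS bit: set on any bus-master
 * activity, and it stays set until software clears it. */
static bool fake_bm_sts;

/* Models a read-and-clear check (assumed behaviour of the BM query). */
static bool bm_read_and_clear(void)
{
    bool was_set = fake_bm_sts;

    fake_bm_sts = false;
    return was_set;
}

struct bm_track {
    int bm_elapsed_us;  /* idle time accumulated since BM activity was seen */
    int bm_holdoff_us;  /* below this, deep states are avoided */
};

/* Returns true when selection should be limited to BM-safe states. */
static bool bm_limited(struct bm_track *t, bool discard_first)
{
    assert(t->bm_holdoff_us >= 0);

    if (discard_first)
        (void)bm_read_and_clear();  /* this read throws the history away */

    if (bm_read_and_clear())
        t->bm_elapsed_us = 0;

    return t->bm_elapsed_us <= t->bm_holdoff_us;
}
```

With discard_first set, activity that happened during the preceding busy
period is lost before the real check runs, so bm_elapsed_us is not
reset.  Without it, the same activity resets the counter and the
holdoff applies.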

> 
> Thanks,
> Shaohua

Thanks for the feedback.

-Adam



[RFC][PATCH 2/3] export time until next timer interrupt using NOHZ

2007-03-24 Thread Adam Belay

This patch exposes information about the time remaining until the next
timer interrupt expires by utilizing the dynticks infrastructure.  It
also modifies the main idle loop to allow dynticks to handle
non-interrupt break events (e.g. DMA).  Finally, it exposes sleep ticks
information to external code.  Thomas Gleixner is responsible for much
of the code in this patch.  However, I've made some additional changes,
so I'm probably responsible if there are any bugs or oversights :)
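
The arithmetic behind the exported value can be sketched in userspace
as follows.  This is an illustration under stated assumptions, not the
patch itself: sleep_length_ns(), fallback_sleep_length_ns() and
expected_us() are hypothetical helpers, and the HZ value is picked
arbitrarily.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_USEC 1000LL
#define HZ 250   /* assumed tick rate, for illustration only */

/* Time remaining until the next programmed timer event, in nanoseconds.
 * Mirrors the idea of subtracting "now" from the next event time. */
static int64_t sleep_length_ns(int64_t next_event_ns, int64_t now_ns)
{
    assert(next_event_ns >= now_ns);
    return next_event_ns - now_ns;
}

/* Without dynticks the best available guess is one full tick. */
static int64_t fallback_sleep_length_ns(void)
{
    return NSEC_PER_SEC / HZ;
}

/* Convert to microseconds, the unit a governor would compare against. */
static int32_t expected_us(int64_t sleep_ns)
{
    return (int32_t)(sleep_ns / NSEC_PER_USEC);
}
```

The sketch divides before narrowing to 32 bits.  Casting the nanosecond
value to a 32-bit type first would wrap for sleeps longer than roughly
2.1 seconds, which dynticks makes possible.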

Thanks,
Adam

 arch/i386/kernel/process.c |3 ++-
 include/linux/tick.h   |   10 ++
 kernel/softirq.c   |5 -
 kernel/time/tick-sched.c   |   24 
 4 files changed, 36 insertions(+), 6 deletions(-)


diff -urN a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c2007-03-23 23:02:16.0 -0400
+++ b/arch/i386/kernel/process.c2007-03-24 01:48:33.0 -0400
@@ -174,13 +174,14 @@
 
/* endless idle loop with no priority at all */
while (1) {
-   tick_nohz_stop_sched_tick();
while (!need_resched()) {
void (*idle)(void);
 
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
 
+   tick_nohz_stop_sched_tick();
+
rmb();
idle = pm_idle;
 
diff -urN a/include/linux/tick.h b/include/linux/tick.h
--- a/include/linux/tick.h  2007-03-23 23:03:03.0 -0400
+++ b/include/linux/tick.h  2007-03-24 01:39:03.0 -0400
@@ -40,6 +40,7 @@
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
  * @idle_entrytime:Time when the idle call was entered
  * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped
+ * @sleep_length:  Duration of the current idle sleep
  */
 struct tick_sched {
struct hrtimer  sched_timer;
@@ -52,6 +53,7 @@
unsigned long   idle_sleeps;
ktime_t idle_entrytime;
ktime_t idle_sleeptime;
+   ktime_t sleep_length;
unsigned long   last_jiffies;
unsigned long   next_jiffies;
ktime_t idle_expires;
@@ -100,10 +102,18 @@
 extern void tick_nohz_stop_sched_tick(void);
 extern void tick_nohz_restart_sched_tick(void);
 extern void tick_nohz_update_jiffies(void);
+extern ktime_t tick_nohz_get_sleep_length(void);
+extern unsigned long tick_nohz_get_idle_jiffies(void);
 # else
 static inline void tick_nohz_stop_sched_tick(void) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
 static inline void tick_nohz_update_jiffies(void) { }
+static inline ktime_t tick_nohz_get_sleep_length(void)
+{
+   ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
+
+   return len;
+}
 # endif /* !NO_HZ */
 
 #endif
diff -urN a/kernel/softirq.c b/kernel/softirq.c
--- a/kernel/softirq.c  2007-03-23 23:03:03.0 -0400
+++ b/kernel/softirq.c  2007-03-24 01:54:11.0 -0400
@@ -303,11 +303,6 @@
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
 
-#ifdef CONFIG_NO_HZ
-   /* Make sure that timer wheel updates are propagated */
-   if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
-   tick_nohz_stop_sched_tick();
-#endif
preempt_enable_no_resched();
 }
 
diff -urN a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c  2007-03-23 23:03:03.0 -0400
+++ b/kernel/time/tick-sched.c  2007-03-24 01:44:55.0 -0400
@@ -153,6 +153,7 @@
unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
struct tick_sched *ts;
ktime_t last_update, expires, now, delta;
+   struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
int cpu;
 
local_irq_save(flags);
@@ -250,11 +251,34 @@
 out:
ts->next_jiffies = next_jiffies;
ts->last_jiffies = last_jiffies;
+   ts->sleep_length = ktime_sub(dev->next_event, now);
 end:
local_irq_restore(flags);
 }
 
 /**
+ * tick_nohz_get_sleep_length - return the length of the current sleep
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_sleep_length(void)
+{
+   struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+   return ts->sleep_length;
+}
+
+/**
+ * tick_nohz_get_idle_jiffies - returns the current idle jiffie count
+ */
+unsigned long tick_nohz_get_idle_jiffies(void)
+{
+   struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+   return ts->idle_jiffies;
+}
+
+/**
  * nohz_restart_sched_tick - restart the idle tick from the idle task
  *
  * Restart the idle tick when the CPU is woken up from idle



[RFC][PATCH 3/3] add the 'menu' cpuidle governor

2007-03-24 Thread Adam Belay

This patch adds the 'menu' governor, as was described in my first email.

Thanks,
Adam

 Kconfig|   11 +++
 governors/Makefile |1
 governors/menu.c   |  152 +
 3 files changed, 164 insertions(+)


diff -urN a/drivers/cpuidle/governors/Makefile 
b/drivers/cpuidle/governors/Makefile
--- a/drivers/cpuidle/governors/Makefile2007-03-23 23:09:45.0 
-0400
+++ b/drivers/cpuidle/governors/Makefile2007-03-24 02:10:29.0 
-0400
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o
+obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o
diff -urN a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
--- a/drivers/cpuidle/governors/menu.c  1969-12-31 19:00:00.0 -0500
+++ b/drivers/cpuidle/governors/menu.c  2007-03-23 23:51:15.0 -0400
@@ -0,0 +1,152 @@
+/*
+ * menu.c - the menu idle governor
+ *
+ * Copyright (C) 2006-2007 Adam Belay <[EMAIL PROTECTED]>
+ *
+ * This code is licenced under the GPL.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define BM_HOLDOFF 20000   /* 20 ms */
+
+struct menu_device {
+   int last_state_idx;
+   int deepest_bm_state;
+
+   int break_last_us;
+   int break_elapsed_us;
+
+   int bm_elapsed_us;
+   int bm_holdoff_us;
+
+   unsigned long   idle_jiffies;
+};
+
+static DEFINE_PER_CPU(struct menu_device, menu_devices);
+
+/**
+ * menu_select - selects the next idle state to enter
+ * @dev: the CPU
+ */
+static int menu_select(struct cpuidle_device *dev)
+{
+   struct menu_device *data = &__get_cpu_var(menu_devices);
+   int i, expected_us, max_state = dev->state_count;
+
+   /* discard BM history because it is sticky */
+   cpuidle_get_bm_activity();
+
+   /* determine the expected residency time */
+   expected_us = (s32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000;
+   expected_us = min(expected_us, data->break_last_us);
+
+   /* determine the maximum state compatible with current BM status */
+   if (cpuidle_get_bm_activity())
+   data->bm_elapsed_us = 0;
+   if (data->bm_elapsed_us <= data->bm_holdoff_us)
+   max_state = data->deepest_bm_state + 1;
+
+   /* find the deepest idle state that satisfies our constraints */
+   for (i = 1; i < max_state; i++) {
+   struct cpuidle_state *s = &dev->states[i];
+   if (s->target_residency > expected_us)
+   break;
+   if (s->exit_latency > system_latency_constraint())
+   break;
+   }
+
+   data->last_state_idx = i - 1;
+   data->idle_jiffies = tick_nohz_get_idle_jiffies();
+   return i - 1;
+}
+
+/**
+ * menu_reflect - attempts to guess what happened after entry
+ * @dev: the CPU
+ *
+ * NOTE: it's important to be fast here because this operation will add to
+ *   the overall exit latency.
+ */
+static void menu_reflect(struct cpuidle_device *dev)
+{
+   struct menu_device *data = &__get_cpu_var(menu_devices);
+   int last_idx = data->last_state_idx;
+   int measured_us = cpuidle_get_last_residency(dev);
+   struct cpuidle_state *target = &dev->states[last_idx];
+
+   /*
+* Ugh, this idle state doesn't support residency measurements, so we
+* are basically lost in the dark.  As a compromise, assume we slept
+* for one full standard timer tick.  However, be aware that this
+* could potentially result in a suboptimal state transition.
+*/
+   if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
+   measured_us = USEC_PER_SEC / HZ;
+
+   data->bm_elapsed_us += measured_us;
+   data->break_elapsed_us += measured_us;
+
+   /*
+* Did something other than the timer interrupt cause the break event?
+*/
+   if (tick_nohz_get_idle_jiffies() == data->idle_jiffies) {
+   data->break_last_us = data->break_elapsed_us;
+   data->break_elapsed_us = 0;
+   }
+}
+
+/**
+ * menu_scan_device - scans a CPU's states and does setup
+ * @dev: the CPU
+ */
+static void menu_scan_device(struct cpuidle_device *dev)
+{
+   struct menu_device *data = &per_cpu(menu_devices, dev->cpu);
+   int i;
+
+   data->last_state_idx = 0;
+   data->break_last_us = 0;
+   data->break_elapsed_us = 0;
+   data->bm_elapsed_us = 0;
+   data->bm_holdoff_us = BM_HOLDOFF;
+
+   for (i = 1; i < dev->state_count; i++)
+   if (dev->states[i].flags & CPUIDLE_FLAG_CHECK_BM)
+   break;
+   data->deepest_bm_state = i - 1;
+}
+
+struct cpuidle_governor menu_governor = {
+   .name = "menu",
+   .scan = menu_scan_device,
+

[RFC][PATCH 0/3] A Dynticks Aware Processor Idle PM Governor

2007-03-24 Thread Adam Belay

Hi All,

Here is my first take at implementing an idle PM governor that takes
full advantage of NO_HZ. I call it the 'menu' governor because it
considers the full list of idle states before each entry.

I've kept the implementation fairly simple. It attempts to guess the
next residency time and then chooses a state that would meet at least
the break-even point between power savings and entry cost. To this end,
it selects the deepest idle state that satisfies the following
constraints:
1. If the idle time elapsed since bus master activity was detected
is below a threshold (currently 20 ms), then limit the selection
to C2-type or above.
2. Do not choose a state with a break-even residency that exceeds
the expected time remaining until the next timer interrupt.
3. Do not choose a state with a break-even residency that exceeds
the elapsed time between the last pair of break events,
excluding timer interrupts.
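
The three constraints above can be sketched as a small standalone
function.  This is a simplified userspace illustration, not the code in
patch 3/3: struct idle_state, struct select_input and pick_state() are
hypothetical, and states are assumed to be ordered from shallowest to
deepest with state 0 always acceptable.

```c
#include <assert.h>

/* Hypothetical, simplified idle-state descriptor (microseconds). */
struct idle_state {
    int target_residency_us;  /* break-even residency */
    int exit_latency_us;
};

struct select_input {
    int state_count;       /* number of entries in states[] */
    int deepest_bm_state;  /* deepest state usable during BM activity */
    int bm_elapsed_us;     /* idle time since BM activity was seen */
    int bm_holdoff_us;     /* threshold from constraint 1 */
    int next_timer_us;     /* constraint 2 */
    int break_last_us;     /* constraint 3 */
    int latency_limit_us;  /* assumed system-wide latency bound */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Returns the index of the deepest state that satisfies all constraints. */
static int pick_state(const struct idle_state *states,
                      const struct select_input *in)
{
    int expected_us = min_int(in->next_timer_us, in->break_last_us);
    int max_state = in->state_count;
    int i;

    assert(in->state_count >= 1);
    assert(in->deepest_bm_state < in->state_count);

    /* Constraint 1: recent bus-master activity caps the depth. */
    if (in->bm_elapsed_us <= in->bm_holdoff_us)
        max_state = in->deepest_bm_state + 1;

    /* Constraints 2 and 3 are folded into expected_us. */
    for (i = 1; i < max_state; i++) {
        if (states[i].target_residency_us > expected_us)
            break;
        if (states[i].exit_latency_us > in->latency_limit_us)
            break;
    }
    return i - 1;
}
```

The loop stops at the first state that fails, so it only finds the
deepest valid state when residency and latency grow with depth.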

This governor has an advantage over the "ladder" governor because it
proactively checks how much time remains until the next timer interrupt
using the tick infrastructure. Also, it handles device interrupt
activity more intelligently by not including timer interrupts in break
event calculations. Finally, it doesn't make policy decisions using the
number of state entries, which can have variable residency times (NO_HZ
makes these potentially very large), and instead only considers sleep
time deltas.
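
A rough sketch of the break-event bookkeeping described above, written
as standalone C.  struct break_track and note_wakeup() are hypothetical,
and the woke_by_timer flag stands in for whatever test the real code
uses to tell timer wakeups from other break events.

```c
#include <assert.h>
#include <stdbool.h>

struct break_track {
    int break_last_us;     /* idle time between the last two non-timer breaks */
    int break_elapsed_us;  /* idle time accumulated since the last such break */
};

/* Called after each idle exit.  slept_us is the measured residency;
 * woke_by_timer is a hypothetical flag saying the tick ended the sleep. */
static void note_wakeup(struct break_track *t, int slept_us, bool woke_by_timer)
{
    assert(slept_us >= 0);

    /* Only sleep time is accounted; running time in between is ignored. */
    t->break_elapsed_us += slept_us;

    /* A non-timer break closes the current interval. */
    if (!woke_by_timer) {
        t->break_last_us = t->break_elapsed_us;
        t->break_elapsed_us = 0;
    }
}
```

Two timer wakeups of 4000 us followed by a device interrupt after
another 1500 us of sleep leave break_last_us at 9500, so timer ticks
lengthen the measured interval instead of ending it.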

The menu governor can be selected during runtime using the cpuidle sysfs
interface like so:
"echo "menu" > /sys/devices/system/cpu/cpuidle/current_governor"

This patchset applies against 2.6.21-rc4 plus the latest from the acpi
testing tree, which is available here:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.21/acpi-test-20070126-2.6.21-rc4.diff.bz2

I'd really appreciate any comments, benchmarks, or suggestions.

Cheers,
Adam


[RFC][PATCH 1/3] cpuidle governor API changes

2007-03-24 Thread Adam Belay

This patch prepares cpuidle for the menu governor.  It adds an optional
stage after idle state entry to give the governor an opportunity to
check why the state was exited.  Also it makes sure the idle loop
returns after each state entry, allowing the appropriate dynticks code
to run.
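
The resulting control flow can be sketched in standalone C as below.
The types and names (struct idle_dev, struct governor_ops,
idle_call_once() and the toy_* helpers) are hypothetical simplifications
for illustration, and the need_resched() check is left out.

```c
#include <assert.h>
#include <stddef.h>

struct idle_dev;

struct idle_state_ops {
    int (*enter)(struct idle_dev *dev);  /* returns residency in us */
    long time_us;
    long usage;
};

struct governor_ops {
    int  (*select)(struct idle_dev *dev);   /* required */
    void (*reflect)(struct idle_dev *dev);  /* optional */
};

struct idle_dev {
    struct idle_state_ops *states;
    int last_residency_us;
};

/* One pass: pick a state, enter it once, update stats, then let the
 * governor look at the outcome.  The caller's idle loop decides whether
 * to come back, which is where the dynticks code gets its chance to run. */
static void idle_call_once(struct idle_dev *dev, const struct governor_ops *gov)
{
    struct idle_state_ops *target;

    assert(gov != NULL && gov->select != NULL);

    target = &dev->states[gov->select(dev)];
    dev->last_residency_us = target->enter(dev);
    target->time_us += dev->last_residency_us;
    target->usage++;

    if (gov->reflect)
        gov->reflect(dev);
}

/* Toy governor and state, used only to exercise the flow. */
static int reflect_calls;
static int toy_select(struct idle_dev *dev) { (void)dev; return 1; }
static void toy_reflect(struct idle_dev *dev) { (void)dev; reflect_calls++; }
static int toy_enter(struct idle_dev *dev) { (void)dev; return 250; }
```

Because reflect is optional, a governor that needs no post-exit
accounting can leave the pointer NULL and only supply select.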

Thanks,
Adam

 drivers/cpuidle/cpuidle.c  |   33 ++---
 drivers/cpuidle/governor.c |2 +-
 drivers/cpuidle/governors/ladder.c |2 +-
 include/linux/cpuidle.h|4 ++--
 4 files changed, 18 insertions(+), 23 deletions(-)


diff -urN a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
--- a/drivers/cpuidle/cpuidle.c 2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/cpuidle.c 2007-03-24 00:22:09.0 -0400
@@ -30,12 +30,10 @@
  * cpuidle_idle_call - the main idle loop
  *
  * NOTE: no locks or semaphores should be used here
- * FIXME: DYNTICKS handling
  */
 static void cpuidle_idle_call(void)
 {
struct cpuidle_device *dev = &__get_cpu_var(cpuidle_devices);
-
struct cpuidle_state *target_state;
int next_state;
 
@@ -46,24 +44,21 @@
return;
}
 
-   if (cpuidle_curr_governor->prepare_idle)
-   cpuidle_curr_governor->prepare_idle(dev);
-
-   while(!need_resched()) {
-   next_state = cpuidle_curr_governor->select_state(dev);
-   if (need_resched())
-   break;
-
-   target_state = &dev->states[next_state];
-
-   dev->last_residency = target_state->enter(dev, target_state);
-   dev->last_state = target_state;
-   target_state->time += dev->last_residency;
-   target_state->usage++;
+   /* ask the governor for the next state */
+   next_state = cpuidle_curr_governor->select(dev);
+   if (need_resched())
+   return;
+   target_state = &dev->states[next_state];
 
-   if (dev->status != CPUIDLE_STATUS_DOIDLE)
-   break;
-   }
+   /* enter the state and update stats */
+   dev->last_residency = target_state->enter(dev, target_state);
+   dev->last_state = target_state;
+   target_state->time += dev->last_residency;
+   target_state->usage++;
+
+   /* give the governor an opportunity to reflect on the outcome */
+   if (cpuidle_curr_governor->reflect)
+   cpuidle_curr_governor->reflect(dev);
 }
 
 /**
diff -urN a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
--- a/drivers/cpuidle/governor.c2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/governor.c2007-03-24 00:31:04.0 -0400
@@ -124,7 +124,7 @@
 {
int ret = -EEXIST;
 
-   if (!gov || !gov->select_state)
+   if (!gov || !gov->select)
return -EINVAL;
 
mutex_lock(&cpuidle_lock);
diff -urN a/drivers/cpuidle/governors/ladder.c 
b/drivers/cpuidle/governors/ladder.c
--- a/drivers/cpuidle/governors/ladder.c2007-03-23 23:09:45.0 
-0400
+++ b/drivers/cpuidle/governors/ladder.c2007-03-23 23:26:06.0 
-0400
@@ -202,7 +202,7 @@
.init = ladder_init_device,
.exit = ladder_exit_device,
.scan = ladder_scan_device,
-   .select_state = ladder_select_state,
+   .select =   ladder_select_state,
.owner =THIS_MODULE,
 };
 
diff -urN a/include/linux/cpuidle.h b/include/linux/cpuidle.h
--- a/include/linux/cpuidle.h   2007-03-23 23:09:46.0 -0400
+++ b/include/linux/cpuidle.h   2007-03-23 23:24:02.0 -0400
@@ -158,8 +158,8 @@
void (*exit)(struct cpuidle_device *dev);
void (*scan)(struct cpuidle_device *dev);
 
-   void (*prepare_idle)(struct cpuidle_device *dev);
-   int  (*select_state)(struct cpuidle_device *dev);
+   int  (*select)  (struct cpuidle_device *dev);
+   void (*reflect) (struct cpuidle_device *dev);
 
struct module   *owner;
 };



[RFC][PATCH 0/3] A Dynticks Aware Processor Idle PM Governor

2007-03-24 Thread Adam Belay

Hi All,

Here is my first take at implementing an idle PM governor that takes
full advantage of NO_HZ. I call it the 'menu' governor because it
considers the full list of idle states before each entry.

This governor has an advantage over ladder governor because it
proactively checks how much time remains until the next timer interrupt
using the tick infrastructure. Also, it handles device interrupt
activity more intelligently by not including timer interrupts in break
event calculations. Finally, it doesn't make policy decisions using the
number of state entries, which can have variable residency times (NO_HZ
makes these potentially very large), and instead only considers sleep
time deltas.

The menu governor can be selected during runtime using the cpuidle sysfs
interface like so:
echo menu /sys/devices/system/cpu/cpuidle/current_governor

I'd really appreciate any comments, benchmarks, or suggestions.

Cheers,
Adam

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

[RFC][PATCH 1/3] cpuidle governor API changes

2007-03-24 Thread Adam Belay

This patch prepares cpuidle for the menu governor.  It adds an optional
stage after idle state entry to give the governor an opportunity to
check why the state was exited.  Also it makes sure the idle loop
returns after each state entry, allowing the appropriate dynticks code
to run.

Thanks,
Adam

 drivers/cpuidle/cpuidle.c  |   33 ++---
 drivers/cpuidle/governor.c |2 +-
 drivers/cpuidle/governors/ladder.c |2 +-
 include/linux/cpuidle.h|4 ++--
 4 files changed, 18 insertions(+), 23 deletions(-)


diff -urN a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
--- a/drivers/cpuidle/cpuidle.c 2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/cpuidle.c 2007-03-24 00:22:09.0 -0400
@@ -30,12 +30,10 @@
  * cpuidle_idle_call - the main idle loop
  *
  * NOTE: no locks or semaphores should be used here
- * FIXME: DYNTICKS handling
  */
 static void cpuidle_idle_call(void)
 {
struct cpuidle_device *dev = __get_cpu_var(cpuidle_devices);
-
struct cpuidle_state *target_state;
int next_state;
 
@@ -46,24 +44,21 @@
 		return;
 	}
 
-	if (cpuidle_curr_governor->prepare_idle)
-		cpuidle_curr_governor->prepare_idle(dev);
-
-	while(!need_resched()) {
-		next_state = cpuidle_curr_governor->select_state(dev);
-		if (need_resched())
-			break;
-
-		target_state = &dev->states[next_state];
-
-		dev->last_residency = target_state->enter(dev, target_state);
-		dev->last_state = target_state;
-		target_state->time += dev->last_residency;
-		target_state->usage++;
+	/* ask the governor for the next state */
+	next_state = cpuidle_curr_governor->select(dev);
+	if (need_resched())
+		return;
+	target_state = &dev->states[next_state];
 
-		if (dev->status != CPUIDLE_STATUS_DOIDLE)
-			break;
-	}
+	/* enter the state and update stats */
+	dev->last_residency = target_state->enter(dev, target_state);
+	dev->last_state = target_state;
+	target_state->time += dev->last_residency;
+	target_state->usage++;
+
+	/* give the governor an opportunity to reflect on the outcome */
+	if (cpuidle_curr_governor->reflect)
+		cpuidle_curr_governor->reflect(dev);
 }
 
 /**
diff -urN a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
--- a/drivers/cpuidle/governor.c 2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/governor.c 2007-03-24 00:31:04.0 -0400
@@ -124,7 +124,7 @@
 {
 	int ret = -EEXIST;
 
-	if (!gov || !gov->select_state)
+	if (!gov || !gov->select)
 		return -EINVAL;
 
 	mutex_lock(&cpuidle_lock);
diff -urN a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
--- a/drivers/cpuidle/governors/ladder.c 2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/governors/ladder.c 2007-03-23 23:26:06.0 -0400
@@ -202,7 +202,7 @@
.init = ladder_init_device,
.exit = ladder_exit_device,
.scan = ladder_scan_device,
-   .select_state = ladder_select_state,
+   .select =   ladder_select_state,
.owner =THIS_MODULE,
 };
 
diff -urN a/include/linux/cpuidle.h b/include/linux/cpuidle.h
--- a/include/linux/cpuidle.h   2007-03-23 23:09:46.0 -0400
+++ b/include/linux/cpuidle.h   2007-03-23 23:24:02.0 -0400
@@ -158,8 +158,8 @@
void (*exit)(struct cpuidle_device *dev);
void (*scan)(struct cpuidle_device *dev);
 
-   void (*prepare_idle)(struct cpuidle_device *dev);
-   int  (*select_state)(struct cpuidle_device *dev);
+   int  (*select)  (struct cpuidle_device *dev);
+   void (*reflect) (struct cpuidle_device *dev);
 
struct module   *owner;
 };



[RFC][PATCH 2/3] export time until next timer interrupt using NOHZ

2007-03-24 Thread Adam Belay

This patch exposes information about the time remaining until the next
timer interrupt expires by utilizing the dynticks infrastructure.  It
also modifies the main idle loop to allow dynticks to handle
non-interrupt break events (e.g. DMA).  Finally, it exposes sleep ticks
information to external code.  Thomas Gleixner is responsible for much
of the code in this patch.  However, I've made some additional changes,
so I'm probably responsible if there are any bugs or oversights :)
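
For illustration, the quantity being exported boils down to simple arithmetic. The following is a user-space sketch with hypothetical helper names and plain 64-bit nanosecond timestamps instead of ktime_t; it is not the code from the patch. With dynticks the sleep length is the distance to the next programmed timer event, and without it the only sensible estimate is one periodic tick.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Time left until the next programmed timer event, in nanoseconds.
 * next_event_ns and now_ns are hypothetical monotonic timestamps.
 */
static int64_t sleep_length_ns(int64_t next_event_ns, int64_t now_ns)
{
	return next_event_ns - now_ns;
}

/* Without dynticks the best available guess is one periodic tick. */
static int64_t sleep_length_fallback_ns(int hz)
{
	assert(hz > 0);
	return NSEC_PER_SEC / hz;
}

/* Convert to microseconds for comparison against per-state residency values. */
static int32_t ns_to_us(int64_t ns)
{
	return (int32_t)(ns / 1000);
}
```

At HZ=250, for example, the fallback works out to 4 ms, the same value a dynticks system would report when the next timer is exactly one tick away.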

Thanks,
Adam

 arch/i386/kernel/process.c |3 ++-
 include/linux/tick.h   |   10 ++
 kernel/softirq.c   |5 -
 kernel/time/tick-sched.c   |   24 
 4 files changed, 36 insertions(+), 6 deletions(-)


diff -urN a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c 2007-03-23 23:02:16.0 -0400
+++ b/arch/i386/kernel/process.c 2007-03-24 01:48:33.0 -0400
@@ -174,13 +174,14 @@
 
/* endless idle loop with no priority at all */
while (1) {
-   tick_nohz_stop_sched_tick();
while (!need_resched()) {
void (*idle)(void);
 
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
 
+   tick_nohz_stop_sched_tick();
+
rmb();
idle = pm_idle;
 
diff -urN a/include/linux/tick.h b/include/linux/tick.h
--- a/include/linux/tick.h  2007-03-23 23:03:03.0 -0400
+++ b/include/linux/tick.h  2007-03-24 01:39:03.0 -0400
@@ -40,6 +40,7 @@
  * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
  * @idle_entrytime:Time when the idle call was entered
  * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped
+ * @sleep_length:  Duration of the current idle sleep
  */
 struct tick_sched {
struct hrtimer  sched_timer;
@@ -52,6 +53,7 @@
unsigned long   idle_sleeps;
ktime_t idle_entrytime;
ktime_t idle_sleeptime;
+   ktime_t sleep_length;
unsigned long   last_jiffies;
unsigned long   next_jiffies;
ktime_t idle_expires;
@@ -100,10 +102,18 @@
 extern void tick_nohz_stop_sched_tick(void);
 extern void tick_nohz_restart_sched_tick(void);
 extern void tick_nohz_update_jiffies(void);
+extern ktime_t tick_nohz_get_sleep_length(void);
+extern unsigned long tick_nohz_get_idle_jiffies(void);
 # else
 static inline void tick_nohz_stop_sched_tick(void) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
 static inline void tick_nohz_update_jiffies(void) { }
+static inline ktime_t tick_nohz_get_sleep_length(void)
+{
+   ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
+
+   return len;
+}
 # endif /* !NO_HZ */
 
 #endif
diff -urN a/kernel/softirq.c b/kernel/softirq.c
--- a/kernel/softirq.c  2007-03-23 23:03:03.0 -0400
+++ b/kernel/softirq.c  2007-03-24 01:54:11.0 -0400
@@ -303,11 +303,6 @@
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
-#ifdef CONFIG_NO_HZ
-	/* Make sure that timer wheel updates are propagated */
-	if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
-		tick_nohz_stop_sched_tick();
-#endif
preempt_enable_no_resched();
 }
 
diff -urN a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c  2007-03-23 23:03:03.0 -0400
+++ b/kernel/time/tick-sched.c  2007-03-24 01:44:55.0 -0400
@@ -153,6 +153,7 @@
unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
struct tick_sched *ts;
ktime_t last_update, expires, now, delta;
+   struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
int cpu;
 
local_irq_save(flags);
@@ -250,11 +251,34 @@
 out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
+	ts->sleep_length = ktime_sub(dev->next_event, now);
 end:
 	local_irq_restore(flags);
 }
 
 /**
+ * tick_nohz_get_sleep_length - return the length of the current sleep
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_sleep_length(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	return ts->sleep_length;
+}
+
+/**
+ * tick_nohz_get_idle_jiffies - returns the current idle jiffie count
+ */
+unsigned long tick_nohz_get_idle_jiffies(void)
+{
+	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+	return ts->idle_jiffies;
+}
+
+/**
  * nohz_restart_sched_tick - restart the idle tick from the idle task
  *
  * Restart the idle tick when the CPU is woken up from idle



[RFC][PATCH 3/3] add the 'menu' cpuidle governor

2007-03-24 Thread Adam Belay

This patch adds the 'menu' governor, as was described in my first email.

Thanks,
Adam

 Kconfig|   11 +++
 governors/Makefile |1
 governors/menu.c   |  152 +
 3 files changed, 164 insertions(+)


diff -urN a/drivers/cpuidle/governors/Makefile b/drivers/cpuidle/governors/Makefile
--- a/drivers/cpuidle/governors/Makefile 2007-03-23 23:09:45.0 -0400
+++ b/drivers/cpuidle/governors/Makefile 2007-03-24 02:10:29.0 -0400
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o
+obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o
diff -urN a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
--- a/drivers/cpuidle/governors/menu.c  1969-12-31 19:00:00.0 -0500
+++ b/drivers/cpuidle/governors/menu.c  2007-03-23 23:51:15.0 -0400
@@ -0,0 +1,152 @@
+/*
+ * menu.c - the menu idle governor
+ *
+ * Copyright (C) 2006-2007 Adam Belay <[EMAIL PROTECTED]>
+ *
+ * This code is licenced under the GPL.
+ */
+
+#include <linux/kernel.h>
+#include <linux/cpuidle.h>
+#include <linux/latency.h>
+#include <linux/time.h>
+#include <linux/ktime.h>
+#include <linux/tick.h>
+#include <linux/hrtimer.h>
+
+#define BM_HOLDOFF	20000	/* 20 ms, expressed in microseconds */
+
+struct menu_device {
+	int		last_state_idx;
+	int		deepest_bm_state;
+
+	int		break_last_us;
+	int		break_elapsed_us;
+
+	int		bm_elapsed_us;
+	int		bm_holdoff_us;
+
+	unsigned long	idle_jiffies;
+};
+
+static DEFINE_PER_CPU(struct menu_device, menu_devices);
+
+/**
+ * menu_select - selects the next idle state to enter
+ * @dev: the CPU
+ */
+static int menu_select(struct cpuidle_device *dev)
+{
+	struct menu_device *data = &__get_cpu_var(menu_devices);
+	int i, expected_us, max_state = dev->state_count;
+
+	/* discard BM history because it is sticky */
+	cpuidle_get_bm_activity();
+
+	/* determine the expected residency time */
+	expected_us = (s32) ktime_to_ns(tick_nohz_get_sleep_length()) / 1000;
+	expected_us = min(expected_us, data->break_last_us);
+
+	/* determine the maximum state compatible with current BM status */
+	if (cpuidle_get_bm_activity())
+		data->bm_elapsed_us = 0;
+	if (data->bm_elapsed_us <= data->bm_holdoff_us)
+		max_state = data->deepest_bm_state + 1;
+
+	/* find the deepest idle state that satisfies our constraints */
+	for (i = 1; i < max_state; i++) {
+		struct cpuidle_state *s = &dev->states[i];
+		if (s->target_residency > expected_us)
+			break;
+		if (s->exit_latency > system_latency_constraint())
+			break;
+	}
+
+	data->last_state_idx = i - 1;
+	data->idle_jiffies = tick_nohz_get_idle_jiffies();
+	return i - 1;
+}
+
+/**
+ * menu_reflect - attempts to guess what happened after entry
+ * @dev: the CPU
+ *
+ * NOTE: it's important to be fast here because this operation will add to
+ *       the overall exit latency.
+ */
+static void menu_reflect(struct cpuidle_device *dev)
+{
+	struct menu_device *data = &__get_cpu_var(menu_devices);
+	int last_idx = data->last_state_idx;
+	int measured_us = cpuidle_get_last_residency(dev);
+	struct cpuidle_state *target = &dev->states[last_idx];
+
+	/*
+	 * Ugh, this idle state doesn't support residency measurements, so we
+	 * are basically lost in the dark.  As a compromise, assume we slept
+	 * for one full standard timer tick.  However, be aware that this
+	 * could potentially result in a suboptimal state transition.
+	 */
+	if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
+		measured_us = USEC_PER_SEC / HZ;
+
+	data->bm_elapsed_us += measured_us;
+	data->break_elapsed_us += measured_us;
+
+	/*
+	 * Did something other than the timer interrupt cause the break event?
+	 */
+	if (tick_nohz_get_idle_jiffies() == data->idle_jiffies) {
+		data->break_last_us = data->break_elapsed_us;
+		data->break_elapsed_us = 0;
+	}
+}
+
+/**
+ * menu_scan_device - scans a CPU's states and does setup
+ * @dev: the CPU
+ */
+static void menu_scan_device(struct cpuidle_device *dev)
+{
+	struct menu_device *data = &per_cpu(menu_devices, dev->cpu);
+	int i;
+
+	data->last_state_idx = 0;
+	data->break_last_us = 0;
+	data->break_elapsed_us = 0;
+	data->bm_elapsed_us = 0;
+	data->bm_holdoff_us = BM_HOLDOFF;
+
+	for (i = 1; i < dev->state_count; i++)
+		if (dev->states[i].flags & CPUIDLE_FLAG_CHECK_BM)
+			break;
+	data->deepest_bm_state = i - 1;
+}
+
+struct cpuidle_governor menu_governor = {
+	.name =		"menu",
+	.scan =		menu_scan_device,
+	.select =	menu_select,
+	.reflect =	menu_reflect

Re: [PATCH 1/3] Introducing cpuidle: core cpuidle infrastructure

2007-02-13 Thread Adam Belay

On Tue, 2007-02-13 at 05:31 -0800, Venkatesh Pallipadi wrote:
> On Mon, Feb 12, 2007 at 08:22:01PM -0500, Dave Jones wrote:
> > On Mon, Feb 12, 2007 at 10:39:25AM -0800, Venkatesh Pallipadi wrote:
> >  > 
> >  > Introducing 'cpuidle', a new CPU power management infrastructure to manage
> >  > idle CPUs in a clean and efficient manner.
> >  > cpuidle separates out the drivers that can provide support for multiple types
> >  > of idle states and policy governors that decide on what idle state to use
> >  > at run time.
> >  > A cpuidle driver can support multiple idle states based on parameters like
> >  > varying power consumption, wakeup latency, etc (ACPI C-states for example).
> >  > A cpuidle governor can be usage model specific (laptop, server,
> >  > laptop on battery etc).
> >  > Main advantage of the infrastructure being, it allows independent development
> >  > of drivers and governors and allows for better CPU power management.
> >  > 
> >  > A huge thanks to Adam Belay and Shaohua Li who were part of this mini-project
> >  > since its beginning and are greatly responsible for this patchset.
> > 
> > interesting.  Though I wonder about giving admins _more_ knobs to twiddle.
> > It took cpufreq a long time to settle down in this area, and typically
> > 'ondemand' was the answer in the end for 99.9% of people.   I question the usefulness
> > for the whole multiple governors interface, because in the case of cpuidle
> > there shouldn't be any real trade-off between one algorithm and another afaics?
> > So why can't we just have one, that just 'does the right thing' ?
> > The only differentiator that I can think of would be latency, but that seems
> > to be a) covered in a different tunable, and b) probably wouldn't affect
> > most people enough where it matters.
> > 
> 
> Agreed. In long term, I think cpuidle will also have one governor that will be
> used in most of the cases. But, we have to go through the process of
> experimenting with different governors, just like cpufreq and let the best
> governor win. I think this interface helps to experiment with new
> governors in a non-disruptive way. I mean, any new experiments will not have
> side effects on people already using currently established drivers in
> distributions.
> 
> Also, one of the things we are looking at is to have ratings for different
> drivers and governors (similar to time subsystem), with which we can control
> best driver and best governor for a platform from inside the kernel, instead
> of depending on admin/init script to do the right thing.
> 
> Having said that, I do feel we may need a different governor for things like
> handhelds. I heard them saying there idle routines has more than one
> dimension of low power-high latency idle states. But, that do not suggest the
> need for runtime switch in sysfs, as it will still be one proper governor for
> a platform.

Learning from the past, I think a good comparison would be the support
for several block IO schedulers (e.g. deadline, cfq, anticipatory, etc).
The added flexibility of a pluggable architecture allowed for a lot of
innovation and experimentation that might not have happened otherwise.
There even is a "noop" scheduler that makes sense for some hardware
devices but not others.  In short, Linux processor idle power management
support needs some growing room to find its "ondemand" equivalent.

In my opinion, the best sort of a tunable would be a variable that
indicates userspace's intentions to the cpuidle governor.  Maybe
something to the effect of the following...
- Maximum Performance
- Balanced (attempt to do well in both)
- Maximum Battery-life

Of course governors can have their own specific tunables, but it would
probably be best to not touch them in the typical use-case.

Thanks,
Adam



Re: [PATCH] Custom IORESOURCE Class

2005-08-08 Thread Adam Belay

On Mon, Aug 08, 2005 at 09:00:21AM -0700, Greg KH wrote:
> On Mon, Aug 08, 2005 at 11:11:45AM -0700, Matthew Gilbert wrote:
> > Below is a patch that adds an additional resource class to the platform 
> > resource types. This is to support additional resources that need to be passed
> > to drivers without overloading the existing specific types. In my case, I need
> > to send clock information to the driver to enable power management. 
> > 
> > Signed-off-by: Matthew Gilbert <[EMAIL PROTECTED]>
> 
> Hm, you do realize that Pat's no longer the driver core maintainer?  :)
> 
> Anyway, Russell and Adam, any objections to this patch?

I'm not sure if I agree with this patch.  "struct resource" is used primarily for
I/O resource assignment.  Although I agree we may need to add new IORESOURCE types,
I'm not sure if clock data belongs here.  I don't think "start" and "end" would be
useful for most platform data.  Could you provide more information about this
specific issue and resource type?  Maybe we could create a new sysfs attribute?

Thanks,
Adam

Re: [RFC][PATCH] Add PCI<->PCI bridge driver [4/9]

2005-07-15 Thread Adam Belay

On Fri, 2005-07-15 at 09:58 +0100, Russell King wrote:
> On Thu, Jul 14, 2005 at 04:55:19AM -0400, Adam Belay wrote:
> > This patch adds a basic PCI<->PCI bridge driver that utilizes the new
> > PCI bus class API.
> 
> Thanks.  I think this breaks Cardbus.
> 
> The whole point of the way PCI is _presently_ organised is that it allows
> busses to be configured and setup _before_ the devices are made available
> to drivers.  This breaks that completely:

Hi Russell,

I'm aware of this issue.  These changes are major and will need more
than one pass to be correct.  I'll be redoing most of the bus
configuration code in the next patch set.  I have a strategy for proper
device and bus configuration.  These are my current thoughts:

1.) When bound to their devices, PCI bridge drivers will add their current
devices to the bus device list, but will not register them with the
driver model.

2.) The bus class driver will initiate a procedure similar to
pci_bus_add_devices(), but only for host (root) bridges and hot-plugged
devices.

pci_register_bus_devices(struct pci_bus *bus)
{
- register all bios configured bridges
- call pci_register_bus_devices() for each previously registered bridge
- register remaining uninitialized bridges and call
pci_register_bus_devices() for each bridge as it's registered.
}

pci_register_devices(struct pci_bus *bus)
{
- register all remaining PCI devices, including those of child pci buses
}

* pci_register_bus_devices() will be called first followed by
pci_register_devices().
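
The ordering can be illustrated with a small, self-contained sketch. It is deliberately simplified and entirely hypothetical: `struct node`, `register_bus_devices()` and `register_devices()` are user-space stand-ins rather than PCI core functions, and the distinction between BIOS-configured and uninitialized bridges is left out. It shows only the two-pass property, namely that every bridge is registered before any ordinary device.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical device node; a bridge is a node that has a child bus. */
struct node {
	int is_bridge;
	int registered_order;		/* 0 = not registered yet */
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

static int next_order = 1;

/* Pass 1: register bridges, descending into each bridge's bus as we go. */
static void register_bus_devices(struct node *bus)
{
	size_t i;

	assert(bus != NULL);

	for (i = 0; i < bus->nr_children; i++) {
		struct node *n = bus->children[i];

		if (!n->is_bridge)
			continue;
		if (!n->registered_order)
			n->registered_order = next_order++;
		register_bus_devices(n);
	}
}

/* Pass 2: register everything still unregistered, including child buses. */
static void register_devices(struct node *bus)
{
	size_t i;

	assert(bus != NULL);

	for (i = 0; i < bus->nr_children; i++) {
		struct node *n = bus->children[i];

		if (!n->registered_order)
			n->registered_order = next_order++;
		if (n->is_bridge)
			register_devices(n);
	}
}
```

Running the first pass and then the second on a root bus containing one device and one bridge registers the bridge first, then the device next to it, then the device behind it.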

3.) Bridge windows will not be configured until a child device is
enabled.  In other words, resource configuration is lazy much like we
handle PCI IRQ routing.  We will, however, verify the validity of BIOS
assignments.  If the assignments are incorrect, the bridge will be
disabled and then reconfigured when needed.

Thanks,
Adam

