Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-15 Thread Arun R Bharadwaj
* Andi Kleen a...@firstfloor.org [2009-10-14 09:18:38]:

  How about something like this..
  If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call
  which is called from cpu_idle() should call default_idle without
  involving the registering cpuidle steps. This should prevent bloating
  up of the kernel for archs which dont want to use cpuidle.
 
 On x86 some people want small kernel too, so selecting it on a architecture
 granuality is not good. Also you can make it default, you just need
 to slim it down first.
 

No, I dont mean selecting it on an architecture granularity.

At compile time, if CONFIG_CPU_IDLE is disabled, the arch can redefine
cpuidle_idle_call. For e.g. in arch/x86/kernel/process.c

#ifndef CONFIG_CPU_IDLE
void cpuidle_idle_call(void)
{
if (local_idle)
local_idle();
else
default_idle();
}
#endif

where local_idle points to the idle routine selected using
select_idle_routine() which can be poll, mwait, c1e.

So this way, we still preserve the exact same functionality as before
and we also remove the ugly pm_idle exported function pointer and we
avoid unnecessary code bloat for platforms who do not want to use
cpuidle.

--arun
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-14 Thread Arun R Bharadwaj
* Andi Kleen a...@firstfloor.org [2009-10-12 20:00:05]:

 Peter Zijlstra a.p.zijls...@chello.nl writes:
 
  So does it make sense to have a set of sets?
 
  Why not integrate them all into one set to be ruled by this governor
  thing?
 
 cpuidle is currently optional, that is why the two level hierarchy
 is there so that you can still have simple idle selection without it.
 
 % size drivers/cpuidle/*.o
textdata bss dec hex filename
55141416  4469741b3e drivers/cpuidle/built-in.o
 
 Adding it unconditionally would add ~7k to everyone who wants idle functions.
 
 I think making it unconditional would require putting it on a serious
 diet first.
 

Hi Andi,

Yes, this is a valid point.

How about something like this..
If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call
which is called from cpu_idle() should call default_idle without
involving the registering cpuidle steps. This should prevent bloating
up of the kernel for archs which dont want to use cpuidle.

--arun
 -Andi
 -- 
 a...@linux.intel.com -- Speaking for myself only.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-14 Thread Andi Kleen
 How about something like this..
 If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call
 which is called from cpu_idle() should call default_idle without
 involving the registering cpuidle steps. This should prevent bloating
 up of the kernel for archs which dont want to use cpuidle.

On x86 some people want small kernel too, so selecting it on a architecture
granuality is not good. Also you can make it default, you just need
to slim it down first.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-12 Thread Andi Kleen
Peter Zijlstra a.p.zijls...@chello.nl writes:

 So does it make sense to have a set of sets?

 Why not integrate them all into one set to be ruled by this governor
 thing?

cpuidle is currently optional, that is why the two level hierarchy
is there so that you can still have simple idle selection without it.

% size drivers/cpuidle/*.o
   textdata bss dec hex filename
   55141416  4469741b3e drivers/cpuidle/built-in.o

Adding it unconditionally would add ~7k to everyone who wants idle functions.

I think making it unconditional would require putting it on a serious
diet first.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-09 Thread Arun R Bharadwaj
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 14:25:37]:

 On Thu, 2009-10-08 at 17:31 +0530, Arun R Bharadwaj wrote:
  
   Uhm, no, it would mean ACPI putting its idle routines on the same level
   as all others.
   
  
  Putting them all on the same level would mean, we need an
  enable/disable routine to enable only the currently active routines.
 
 What's this enable/disable stuff about?
 
  Also, the way governor works is that, it assumes that idle routines
  are indexed in the increasing order of power benefit that can be got
  out of the state. So this would get messed up.
 
 Right, which is why I initially had a power-savings field in my
 proposal, so it could weight the power savings vs the wakeup latency.
 
   http://lkml.org/lkml/2009/8/27/159
 
 There it was said that was exactly what these governors were doing,
 seems its not.
 
   Sounds like something is wrong alright. If you can register an idle
   routine you should be able to unregister it too.
  
  
  Yes, we can register and unregister in a clean way now.
  Consider this. We have a set of routines A, B, C currently registered.
  Now a module comes and registers D and E, and later on at some point
  of time wants to unregister. So how do you keep track of what all idle
  routines the module registered and unregister only those?
  Best way to do that is a stack, which is how I have currently
  implemented.
 
 Right, so destroy that inner set thing, that way we only have one
 left ;-)
 

I'm not convinced with your argument. Why dont we do this
incrementally. Right now, this set of sets mechanism works fine and
doesn't look like it has any obvious flaws in it. We have a clean
register/unregister mechanism which solves all the earlier problems we
started out to solve.

We can gradually build on this and try to come up with a single set
of idle states for a governor to choose from.

thanks,
arun
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Peter Zijlstra
On Thu, 2009-10-08 at 15:20 +0530, Arun R Bharadwaj wrote:
 * Arun R Bharadwaj a...@linux.vnet.ibm.com [2009-10-08 15:18:28]:
 
 Implement a list based registering mechanism for architectures which
 have multiple sets of idle routines which are to be registered.
 
 Currently, in x86 it is done by merely setting pm_idle = idle_routine
 and managing this pm_idle pointer is messy.
 
 To give an example of how this mechanism works:
 In x86, initially, idle routine is selected from the set of poll/mwait/
 c1e/default idle loops. So the selected idle loop is registered in cpuidle
 as one idle state cpuidle devices. Once ACPI comes up, it registers
 another set of idle states on top of this state. Again, suppose a module
 registers another set of idle loops, it is added to this list.
 
 This provides a clean way of registering and unregistering idle state
 routines.

So cpuidle didn't already have a list of idle functions it takes an
appropriate one from?

Then what does this governor do?

Also, does this imply the governor doesn't consider these idle routines?

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Arun R Bharadwaj
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:36:02]:

 On Thu, 2009-10-08 at 15:20 +0530, Arun R Bharadwaj wrote:
  * Arun R Bharadwaj a...@linux.vnet.ibm.com [2009-10-08 15:18:28]:
  
  Implement a list based registering mechanism for architectures which
  have multiple sets of idle routines which are to be registered.
  
  Currently, in x86 it is done by merely setting pm_idle = idle_routine
  and managing this pm_idle pointer is messy.
  
  To give an example of how this mechanism works:
  In x86, initially, idle routine is selected from the set of poll/mwait/
  c1e/default idle loops. So the selected idle loop is registered in cpuidle
  as one idle state cpuidle devices. Once ACPI comes up, it registers
  another set of idle states on top of this state. Again, suppose a module
  registers another set of idle loops, it is added to this list.
  
  This provides a clean way of registering and unregistering idle state
  routines.
 
 So cpuidle didn't already have a list of idle functions it takes an
 appropriate one from?
 

No.. As of now, cpuidle supported only one _set_ of idle states that
can be registered. So in this one set, it would choose the appropriate
idle state. But this list mechanism(actually a stack) allows for
multiple sets.

This is needed because we have a hierarchy of idle states discovery
in x86. First, select_idle_routine() would select poll/mwait/default/c1e.
It doesn't know of existance of ACPI. Later when ACPI comes up,
it registers a set of routines on top of the earlier set.

 Then what does this governor do?


The governor would only select the best idle state available from the
set of states which is at the top of the stack. (In the above case, it
would only consider the states registered by ACPI).

If the top-of-the-stack set of idle states is unregistered, the next
set of states on the stack are considered.

 Also, does this imply the governor doesn't consider these idle routines?


As i said above, governor would only consider the idle routines which
are at the top of the stack.

Hope this gave a better idea..

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Peter Zijlstra
On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote:
 
  So cpuidle didn't already have a list of idle functions it takes an
  appropriate one from?
  
 
 No.. As of now, cpuidle supported only one _set_ of idle states that
 can be registered. So in this one set, it would choose the appropriate
 idle state. But this list mechanism(actually a stack) allows for
 multiple sets.
 
 This is needed because we have a hierarchy of idle states discovery
 in x86. First, select_idle_routine() would select poll/mwait/default/c1e.
 It doesn't know of existance of ACPI. Later when ACPI comes up,
 it registers a set of routines on top of the earlier set.
 
  Then what does this governor do?
 
 
 The governor would only select the best idle state available from the
 set of states which is at the top of the stack. (In the above case, it
 would only consider the states registered by ACPI).
 
 If the top-of-the-stack set of idle states is unregistered, the next
 set of states on the stack are considered.
 
  Also, does this imply the governor doesn't consider these idle routines?
 
 
 As i said above, governor would only consider the idle routines which
 are at the top of the stack.
 
 Hope this gave a better idea..

So does it make sense to have a set of sets?

Why not integrate them all into one set to be ruled by this governor
thing?

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Peter Zijlstra
On Thu, 2009-10-08 at 16:31 +0530, Arun R Bharadwaj wrote:
 * Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:50:33]:
 
  On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote:
   
So cpuidle didn't already have a list of idle functions it takes an
appropriate one from?

   
   No.. As of now, cpuidle supported only one _set_ of idle states that
   can be registered. So in this one set, it would choose the appropriate
   idle state. But this list mechanism(actually a stack) allows for
   multiple sets.
   
   This is needed because we have a hierarchy of idle states discovery
   in x86. First, select_idle_routine() would select poll/mwait/default/c1e.
   It doesn't know of existance of ACPI. Later when ACPI comes up,
   it registers a set of routines on top of the earlier set.
   
Then what does this governor do?
   
   
   The governor would only select the best idle state available from the
   set of states which is at the top of the stack. (In the above case, it
   would only consider the states registered by ACPI).
   
   If the top-of-the-stack set of idle states is unregistered, the next
   set of states on the stack are considered.
   
Also, does this imply the governor doesn't consider these idle routines?
   
   
   As i said above, governor would only consider the idle routines which
   are at the top of the stack.
   
   Hope this gave a better idea..
  
  So does it make sense to have a set of sets?
  
  Why not integrate them all into one set to be ruled by this governor
  thing?
  
 
 Right now there is a clean hierarchy. So breaking that would mean
 putting the registration of all idle routines under ACPI. 

Uhm, no, it would mean ACPI putting its idle routines on the same level
as all others.

 So, if ACPI
 fails to come up or if ACPI is not supported, that would lead to
 problems.

I think the problem is that ACPI is thinking its special, that should be
rectified, its not.

  Because if that happens now, we can fallback to the
 initially registered set.

I'm thinking its all daft and we should be having one set of idle
routines, if ACPI fails (a tautology if ever there was one) we simply
wouldn't have its idle routines to pick from.

 Also, if a module wants to register a set of routines later on, that
 cant be added to the initially registered set. So i think we need this
 set of sets.

Sounds like something is wrong alright. If you can register an idle
routine you should be able to unregister it too.

What about making ACPI register its idle routines too, 1 for each C
state, and have the governor make a selection out of the full set?

That also allows you to do away with this default_idle() nonsense and
simply panic the box when there are no registered idle routines when the
box wants to go idle.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Arun R Bharadwaj
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 13:25:10]:

 On Thu, 2009-10-08 at 16:31 +0530, Arun R Bharadwaj wrote:
  * Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:50:33]:
  
   On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote:

 So cpuidle didn't already have a list of idle functions it takes an
 appropriate one from?
 

No.. As of now, cpuidle supported only one _set_ of idle states that
can be registered. So in this one set, it would choose the appropriate
idle state. But this list mechanism(actually a stack) allows for
multiple sets.

This is needed because we have a hierarchy of idle states discovery
in x86. First, select_idle_routine() would select 
poll/mwait/default/c1e.
It doesn't know of existance of ACPI. Later when ACPI comes up,
it registers a set of routines on top of the earlier set.

 Then what does this governor do?


The governor would only select the best idle state available from the
set of states which is at the top of the stack. (In the above case, it
would only consider the states registered by ACPI).

If the top-of-the-stack set of idle states is unregistered, the next
set of states on the stack are considered.

 Also, does this imply the governor doesn't consider these idle 
 routines?


As i said above, governor would only consider the idle routines which
are at the top of the stack.

Hope this gave a better idea..
   
   So does it make sense to have a set of sets?
   
   Why not integrate them all into one set to be ruled by this governor
   thing?
   
  
  Right now there is a clean hierarchy. So breaking that would mean
  putting the registration of all idle routines under ACPI. 
 
 Uhm, no, it would mean ACPI putting its idle routines on the same level
 as all others.
 

Putting them all on the same level would mean, we need an
enable/disable routine to enable only the currently active routines.

Also, the way governor works is that, it assumes that idle routines
are indexed in the increasing order of power benefit that can be got
out of the state. So this would get messed up.

  So, if ACPI
  fails to come up or if ACPI is not supported, that would lead to
  problems.
 
 I think the problem is that ACPI is thinking its special, that should be
 rectified, its not.
 
   Because if that happens now, we can fallback to the
  initially registered set.
 
 I'm thinking its all daft and we should be having one set of idle
 routines, if ACPI fails (a tautology if ever there was one) we simply
 wouldn't have its idle routines to pick from.
 
  Also, if a module wants to register a set of routines later on, that
  cant be added to the initially registered set. So i think we need this
  set of sets.
 
 Sounds like something is wrong alright. If you can register an idle
 routine you should be able to unregister it too.


Yes, we can register and unregister in a clean way now.
Consider this. We have a set of routines A, B, C currently registered.
Now a module comes and registers D and E, and later on at some point
of time wants to unregister. So how do you keep track of what all idle
routines the module registered and unregister only those?
Best way to do that is a stack, which is how I have currently
implemented.

 What about making ACPI register its idle routines too, 1 for each C
 state, and have the governor make a selection out of the full set?
 
 That also allows you to do away with this default_idle() nonsense and
 simply panic the box when there are no registered idle routines when the
 box wants to go idle.
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.

2009-10-08 Thread Peter Zijlstra
On Thu, 2009-10-08 at 17:31 +0530, Arun R Bharadwaj wrote:
 
  Uhm, no, it would mean ACPI putting its idle routines on the same level
  as all others.
  
 
 Putting them all on the same level would mean, we need an
 enable/disable routine to enable only the currently active routines.

What's this enable/disable stuff about?

 Also, the way governor works is that, it assumes that idle routines
 are indexed in the increasing order of power benefit that can be got
 out of the state. So this would get messed up.

Right, which is why I initially had a power-savings field in my
proposal, so it could weight the power savings vs the wakeup latency.

  http://lkml.org/lkml/2009/8/27/159

There it was said that was exactly what these governors were doing,
seems its not.

  Sounds like something is wrong alright. If you can register an idle
  routine you should be able to unregister it too.
 
 
 Yes, we can register and unregister in a clean way now.
 Consider this. We have a set of routines A, B, C currently registered.
 Now a module comes and registers D and E, and later on at some point
 of time wants to unregister. So how do you keep track of what all idle
 routines the module registered and unregister only those?
 Best way to do that is a stack, which is how I have currently
 implemented.

Right, so destroy that inner set thing, that way we only have one
left ;-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev