Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
* Andi Kleen a...@firstfloor.org [2009-10-14 09:18:38]: How about something like this.. If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call which is called from cpu_idle() should call default_idle without involving the registering cpuidle steps. This should prevent bloating up of the kernel for archs which dont want to use cpuidle. On x86 some people want small kernel too, so selecting it on a architecture granuality is not good. Also you can make it default, you just need to slim it down first. No, I dont mean selecting it on an architecture granularity. At compile time, if CONFIG_CPU_IDLE is disabled, the arch can redefine cpuidle_idle_call. For e.g. in arch/x86/kernel/process.c #ifndef CONFIG_CPU_IDLE void cpuidle_idle_call(void) { if (local_idle) local_idle(); else default_idle(); } #endif where local_idle points to the idle routine selected using select_idle_routine() which can be poll, mwait, c1e. So this way, we still preserve the exact same functionality as before and we also remove the ugly pm_idle exported function pointer and we avoid unnecessary code bloat for platforms who do not want to use cpuidle. --arun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
* Andi Kleen a...@firstfloor.org [2009-10-12 20:00:05]: Peter Zijlstra a.p.zijls...@chello.nl writes: So does it make sense to have a set of sets? Why not integrate them all into one set to be ruled by this governor thing? cpuidle is currently optional, that is why the two level hierarchy is there so that you can still have simple idle selection without it. % size drivers/cpuidle/*.o textdata bss dec hex filename 55141416 4469741b3e drivers/cpuidle/built-in.o Adding it unconditionally would add ~7k to everyone who wants idle functions. I think making it unconditional would require putting it on a serious diet first. Hi Andi, Yes, this is a valid point. How about something like this.. If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call which is called from cpu_idle() should call default_idle without involving the registering cpuidle steps. This should prevent bloating up of the kernel for archs which dont want to use cpuidle. --arun -Andi -- a...@linux.intel.com -- Speaking for myself only. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
How about something like this.. If the arch does not enable CONFIG_CPU_IDLE, the cpuidle_idle_call which is called from cpu_idle() should call default_idle without involving the registering cpuidle steps. This should prevent bloating up of the kernel for archs which dont want to use cpuidle. On x86 some people want small kernel too, so selecting it on a architecture granuality is not good. Also you can make it default, you just need to slim it down first. -Andi -- a...@linux.intel.com -- Speaking for myself only. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
Peter Zijlstra a.p.zijls...@chello.nl writes: So does it make sense to have a set of sets? Why not integrate them all into one set to be ruled by this governor thing? cpuidle is currently optional, that is why the two level hierarchy is there so that you can still have simple idle selection without it. % size drivers/cpuidle/*.o textdata bss dec hex filename 55141416 4469741b3e drivers/cpuidle/built-in.o Adding it unconditionally would add ~7k to everyone who wants idle functions. I think making it unconditional would require putting it on a serious diet first. -Andi -- a...@linux.intel.com -- Speaking for myself only. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 14:25:37]: On Thu, 2009-10-08 at 17:31 +0530, Arun R Bharadwaj wrote: Uhm, no, it would mean ACPI putting its idle routines on the same level as all others. Putting them all on the same level would mean, we need an enable/disable routine to enable only the currently active routines. What's this enable/disable stuff about? Also, the way governor works is that, it assumes that idle routines are indexed in the increasing order of power benefit that can be got out of the state. So this would get messed up. Right, which is why I initially had a power-savings field in my proposal, so it could weight the power savings vs the wakeup latency. http://lkml.org/lkml/2009/8/27/159 There it was said that was exactly what these governors were doing, seems its not. Sounds like something is wrong alright. If you can register an idle routine you should be able to unregister it too. Yes, we can register and unregister in a clean way now. Consider this. We have a set of routines A, B, C currently registered. Now a module comes and registers D and E, and later on at some point of time wants to unregister. So how do you keep track of what all idle routines the module registered and unregister only those? Best way to do that is a stack, which is how I have currently implemented. Right, so destroy that inner set thing, that way we only have one left ;-) I'm not convinced with your argument. Why dont we do this incrementally. Right now, this set of sets mechanism works fine and doesn't look like it has any obvious flaws in it. We have a clean register/unregister mechanism which solves all the earlier problems we started out to solve. We can gradually build on this and try to come up with a single set of idle states for a governor to choose from. thanks, arun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
On Thu, 2009-10-08 at 15:20 +0530, Arun R Bharadwaj wrote: * Arun R Bharadwaj a...@linux.vnet.ibm.com [2009-10-08 15:18:28]: Implement a list based registering mechanism for architectures which have multiple sets of idle routines which are to be registered. Currently, in x86 it is done by merely setting pm_idle = idle_routine and managing this pm_idle pointer is messy. To give an example of how this mechanism works: In x86, initially, idle routine is selected from the set of poll/mwait/ c1e/default idle loops. So the selected idle loop is registered in cpuidle as one idle state cpuidle devices. Once ACPI comes up, it registers another set of idle states on top of this state. Again, suppose a module registers another set of idle loops, it is added to this list. This provides a clean way of registering and unregistering idle state routines. So cpuidle didn't already have a list of idle functions it takes an appropriate one from? Then what does this governor do? Also, does this imply the governor doesn't consider these idle routines? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:36:02]: On Thu, 2009-10-08 at 15:20 +0530, Arun R Bharadwaj wrote: * Arun R Bharadwaj a...@linux.vnet.ibm.com [2009-10-08 15:18:28]: Implement a list based registering mechanism for architectures which have multiple sets of idle routines which are to be registered. Currently, in x86 it is done by merely setting pm_idle = idle_routine and managing this pm_idle pointer is messy. To give an example of how this mechanism works: In x86, initially, idle routine is selected from the set of poll/mwait/ c1e/default idle loops. So the selected idle loop is registered in cpuidle as one idle state cpuidle devices. Once ACPI comes up, it registers another set of idle states on top of this state. Again, suppose a module registers another set of idle loops, it is added to this list. This provides a clean way of registering and unregistering idle state routines. So cpuidle didn't already have a list of idle functions it takes an appropriate one from? No.. As of now, cpuidle supported only one _set_ of idle states that can be registered. So in this one set, it would choose the appropriate idle state. But this list mechanism(actually a stack) allows for multiple sets. This is needed because we have a hierarchy of idle states discovery in x86. First, select_idle_routine() would select poll/mwait/default/c1e. It doesn't know of existance of ACPI. Later when ACPI comes up, it registers a set of routines on top of the earlier set. Then what does this governor do? The governor would only select the best idle state available from the set of states which is at the top of the stack. (In the above case, it would only consider the states registered by ACPI). If the top-of-the-stack set of idle states is unregistered, the next set of states on the stack are considered. Also, does this imply the governor doesn't consider these idle routines? As i said above, governor would only consider the idle routines which are at the top of the stack. Hope this gave a better idea.. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote: So cpuidle didn't already have a list of idle functions it takes an appropriate one from? No.. As of now, cpuidle supported only one _set_ of idle states that can be registered. So in this one set, it would choose the appropriate idle state. But this list mechanism(actually a stack) allows for multiple sets. This is needed because we have a hierarchy of idle states discovery in x86. First, select_idle_routine() would select poll/mwait/default/c1e. It doesn't know of existance of ACPI. Later when ACPI comes up, it registers a set of routines on top of the earlier set. Then what does this governor do? The governor would only select the best idle state available from the set of states which is at the top of the stack. (In the above case, it would only consider the states registered by ACPI). If the top-of-the-stack set of idle states is unregistered, the next set of states on the stack are considered. Also, does this imply the governor doesn't consider these idle routines? As i said above, governor would only consider the idle routines which are at the top of the stack. Hope this gave a better idea.. So does it make sense to have a set of sets? Why not integrate them all into one set to be ruled by this governor thing? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
On Thu, 2009-10-08 at 16:31 +0530, Arun R Bharadwaj wrote: * Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:50:33]: On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote: So cpuidle didn't already have a list of idle functions it takes an appropriate one from? No.. As of now, cpuidle supported only one _set_ of idle states that can be registered. So in this one set, it would choose the appropriate idle state. But this list mechanism(actually a stack) allows for multiple sets. This is needed because we have a hierarchy of idle states discovery in x86. First, select_idle_routine() would select poll/mwait/default/c1e. It doesn't know of existance of ACPI. Later when ACPI comes up, it registers a set of routines on top of the earlier set. Then what does this governor do? The governor would only select the best idle state available from the set of states which is at the top of the stack. (In the above case, it would only consider the states registered by ACPI). If the top-of-the-stack set of idle states is unregistered, the next set of states on the stack are considered. Also, does this imply the governor doesn't consider these idle routines? As i said above, governor would only consider the idle routines which are at the top of the stack. Hope this gave a better idea.. So does it make sense to have a set of sets? Why not integrate them all into one set to be ruled by this governor thing? Right now there is a clean hierarchy. So breaking that would mean putting the registration of all idle routines under ACPI. Uhm, no, it would mean ACPI putting its idle routines on the same level as all others. So, if ACPI fails to come up or if ACPI is not supported, that would lead to problems. I think the problem is that ACPI is thinking its special, that should be rectified, its not. Because if that happens now, we can fallback to the initially registered set. I'm thinking its all daft and we should be having one set of idle routines, if ACPI fails (a tautology if ever there was one) we simply wouldn't have its idle routines to pick from. Also, if a module wants to register a set of routines later on, that cant be added to the initially registered set. So i think we need this set of sets. Sounds like something is wrong alright. If you can register an idle routine you should be able to unregister it too. What about making ACPI register its idle routines too, 1 for each C state, and have the governor make a selection out of the full set? That also allows you to do away with this default_idle() nonsense and simply panic the box when there are no registered idle routines when the box wants to go idle. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
* Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 13:25:10]: On Thu, 2009-10-08 at 16:31 +0530, Arun R Bharadwaj wrote: * Peter Zijlstra a.p.zijls...@chello.nl [2009-10-08 12:50:33]: On Thu, 2009-10-08 at 16:12 +0530, Arun R Bharadwaj wrote: So cpuidle didn't already have a list of idle functions it takes an appropriate one from? No.. As of now, cpuidle supported only one _set_ of idle states that can be registered. So in this one set, it would choose the appropriate idle state. But this list mechanism(actually a stack) allows for multiple sets. This is needed because we have a hierarchy of idle states discovery in x86. First, select_idle_routine() would select poll/mwait/default/c1e. It doesn't know of existance of ACPI. Later when ACPI comes up, it registers a set of routines on top of the earlier set. Then what does this governor do? The governor would only select the best idle state available from the set of states which is at the top of the stack. (In the above case, it would only consider the states registered by ACPI). If the top-of-the-stack set of idle states is unregistered, the next set of states on the stack are considered. Also, does this imply the governor doesn't consider these idle routines? As i said above, governor would only consider the idle routines which are at the top of the stack. Hope this gave a better idea.. So does it make sense to have a set of sets? Why not integrate them all into one set to be ruled by this governor thing? Right now there is a clean hierarchy. So breaking that would mean putting the registration of all idle routines under ACPI. Uhm, no, it would mean ACPI putting its idle routines on the same level as all others. Putting them all on the same level would mean, we need an enable/disable routine to enable only the currently active routines. Also, the way governor works is that, it assumes that idle routines are indexed in the increasing order of power benefit that can be got out of the state. So this would get messed up. So, if ACPI fails to come up or if ACPI is not supported, that would lead to problems. I think the problem is that ACPI is thinking its special, that should be rectified, its not. Because if that happens now, we can fallback to the initially registered set. I'm thinking its all daft and we should be having one set of idle routines, if ACPI fails (a tautology if ever there was one) we simply wouldn't have its idle routines to pick from. Also, if a module wants to register a set of routines later on, that cant be added to the initially registered set. So i think we need this set of sets. Sounds like something is wrong alright. If you can register an idle routine you should be able to unregister it too. Yes, we can register and unregister in a clean way now. Consider this. We have a set of routines A, B, C currently registered. Now a module comes and registers D and E, and later on at some point of time wants to unregister. So how do you keep track of what all idle routines the module registered and unregister only those? Best way to do that is a stack, which is how I have currently implemented. What about making ACPI register its idle routines too, 1 for each C state, and have the governor make a selection out of the full set? That also allows you to do away with this default_idle() nonsense and simply panic the box when there are no registered idle routines when the box wants to go idle. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v8 PATCH 2/8]: cpuidle: implement a list based approach to register a set of idle routines.
On Thu, 2009-10-08 at 17:31 +0530, Arun R Bharadwaj wrote: Uhm, no, it would mean ACPI putting its idle routines on the same level as all others. Putting them all on the same level would mean, we need an enable/disable routine to enable only the currently active routines. What's this enable/disable stuff about? Also, the way governor works is that, it assumes that idle routines are indexed in the increasing order of power benefit that can be got out of the state. So this would get messed up. Right, which is why I initially had a power-savings field in my proposal, so it could weight the power savings vs the wakeup latency. http://lkml.org/lkml/2009/8/27/159 There it was said that was exactly what these governors were doing, seems its not. Sounds like something is wrong alright. If you can register an idle routine you should be able to unregister it too. Yes, we can register and unregister in a clean way now. Consider this. We have a set of routines A, B, C currently registered. Now a module comes and registers D and E, and later on at some point of time wants to unregister. So how do you keep track of what all idle routines the module registered and unregister only those? Best way to do that is a stack, which is how I have currently implemented. Right, so destroy that inner set thing, that way we only have one left ;-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev