Re: [linux-pm] Power Management framework proposal
sorry for the delay in responding

On Wed, 25 Jul 2007, Jerome Glisse wrote:
> [EMAIL PROTECTED] wrote:
>> On Wed, 25 Jul 2007, Jerome Glisse wrote:
>>> On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>>>> For instance on graphics card you could do the following (maybe
>>>>> more):
>>>>>  -change GPU clock
>>>>>  -change memory clock
>>>>>  -disable part of engine
>>>>>  -disable unit
>>>>> i truly don't think you can make a common interface for all this,
>>>>> moreover there might be constraints on how you can change things
>>>>> (GPU & memory clock might need to follow a given ratio). So you
>>>>> definitely need knowledge in the user space program to handle this.
>>>>
>>>> sure you can, just enumerate all the options the driver writer wants
>>>> to offer as options. yes this could be a lengthy list, so what?
>>>
>>> My point was that your interface, by trying to fit square pegs into
>>> round holes, will fail to expose all the subtlety of each device,
>>> which might in the end lead to wrong power management decisions. So i
>>> believe we can't sum up power management to a list of modes whose
>>> attributes are power consumption & capacity.
>>
>> it's possible (which is part of the reason I started the thread), but
>> so far there hasn't been anything identified that is a really bad fit.
>
> Tell me how i do this in your model: GPU/VRAM memory clock changes
> power consumption of the card and the power consumption is often not a
> trivial function of both of these parameters (i even here simplify the
> problem by omitting pipeline shutdown). So how, with two separate mode
> lists (one for GPU speed, another one for VRAM speed), can you provide
> consumption information while this consumption depends on the other
> settings? Then if you give as a solution to make only one list you end
> up with a bigger list than previously needed (nrGPUmodes * nrVRAMmodes).
> do you expect the user to go through a lengthy list to find what he
> wants? (remember that we will have to add pipeline power off, pll
> tweaking or many other ways of saving power on such card).

yes I expect that it would be a large list in some conditions. but one
purpose of this API is to make these options able to be discovered by
software. right now nothing could be done at all without driver specific
knowledge. even a lengthy list can be better than that.

presenting the list to the user directly is a last resort, only for
experimentation or when nothing else wants to deal with devices of that
type.

with a description field (which I didn't include initially, but seems
obviously needed now) it should be fairly easy to create descriptions
that let the software see that there are multiple factors involved.

> So by choosing this power consumption as a unit of measure you end up
> in non trivial case. There is also the question of overclocking

if the driver supports overclocking then list it in the modes (nothing
says that % capacity couldn't go over 100% for example)

> , and other points already identified where unfortunately a global
> design such as your proposal does not seem to fit properly: local power
> decision (ethernet, wifi card, ... can power down themselves if they
> are doing nothing but the place where you can know this is the driver)

if they power themselves down with no notice to the system they should
power themselves back up with no need for the rest of the system to tell
them either. so this can either be ignored or presented as a mode between
off and on that enables this behavior.

> , there is also the child/parent relation, how to estimate power usage
> (on some configurations one device's consumption can be marginal toward
> all other things while on others this same device can be the most power
> hungry device)... I see all this as bad fit.

ahh, here we see a disconnect. I was not intending for the power field to
be that exact. there are just too many variables.

for example: even for a cpu, the power used isn't exactly tied to the
clock speed and voltage, the mix of commands that the cpu is running will
affect the power it eats, sometimes by a significant amount.

it was intended to be an ordering factor and approximate the power used
so that things could make a performance/power tradeoff with a good chance
of making a reasonable choice. it's not intended for 'make this laptop
use 24w of power instead of 25w of power'

>>> And there is no way to design an abstraction given that all hw we will
>>> have to deal with is too different and does not follow any standard
>>> (besides ACPI there are other ways to save power: brightness,
>>> gpu/memory clock, pll, ...) so i don't see how one might give a common
>>> view of things which are fundamentally different in how they affect
>>> consumption (same end result with many different paths leading to it).
>>
>> so you are saying that the power management software must know the
>> details of each and every driver, and if you add a new driver you must
>> change the power management software
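One way to picture the single flat mode list discussed above is the sketch below. It is only an illustration, not an interface anyone in the thread specified: the `Mode` fields, the `combined_modes` and `cheapest_mode` helpers, the capacity formula, and the power formula are all hypothetical. It shows a driver enumerating every GPU/VRAM clock pair with an approximate power figure and a description, and software using that figure purely as an ordering factor.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str          # opaque identifier the driver would accept
    capacity_pct: int  # rough capability; could exceed 100 for overclocking
    power_mw: int      # approximate power, meant for ordering, not accounting
    description: str   # free text telling software which factors are involved


def hypothetical_power_mw(gpu_mhz, vram_mhz):
    # Made-up, deliberately non-additive estimate: the last term couples both
    # clocks. Real figures would have to come from the driver writer.
    return 2000 + 20 * gpu_mhz + 10 * vram_mhz + (gpu_mhz * vram_mhz) // 100


def combined_modes(gpu_clocks, vram_clocks, estimate_power):
    """Build one list with len(gpu_clocks) * len(vram_clocks) entries."""
    top_gpu, top_vram = max(gpu_clocks), max(vram_clocks)
    modes = []
    for gpu, vram in product(gpu_clocks, vram_clocks):
        # Assumption: capacity is limited by whichever clock is relatively lower.
        capacity = round(100 * min(gpu / top_gpu, vram / top_vram))
        modes.append(Mode(
            name=f"gpu{gpu}-vram{vram}",
            capacity_pct=capacity,
            power_mw=estimate_power(gpu, vram),
            description=f"GPU clock {gpu} MHz, VRAM clock {vram} MHz",
        ))
    return sorted(modes, key=lambda m: m.power_mw)


def cheapest_mode(modes, min_capacity_pct):
    """Pick the lowest-power mode that still offers the requested capacity."""
    candidates = [m for m in modes if m.capacity_pct >= min_capacity_pct]
    return min(candidates, key=lambda m: m.power_mw, default=None)
```

The list grows multiplicatively, as objected above, but software never has to show it to a user: it can filter on capacity and sort on the approximate power.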
Re: [linux-pm] Power Management framework proposal
Hi!

> > let me give you a real world example then, and the numbers I'm using
> > are ballpark the same as you'll find in a (mobile) core 2 duo
> > datasheet, I just rounded them a little so that the math works out
> > nice.
> >
> > power at full speed: 34W
> > power at half speed: 24W
> > power at idle: 1W
>
> I have usually seen different numbers, for example:
>
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Trust Arjan, modern cpus work as he describes.

> Although this paper speaks about thermal design power instead of power
> consumption, i suppose that it should be roughly equal.
>
> For example Athlon 64 3700 (ADA3700AEP5AR):
>
> 2.4 GHz, 1.5 V -> 89 W
> 2.2 GHz, 1.4 V -> 72 W
> 2.0 GHz, 1.3 V -> 53 W
> 1.8 GHz, 1.2 V -> 39 W
> 1.0 GHz, 1.1 V -> 22 W

I guess that means athlon 64 is 'old'.

> Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power
> consumption shows that it is more efficient to be busy for 2 time units
> on 1 GHz than be busy for 1 time unit and be idle for 1 time unit
> on 2 GHz.
>
> 1 GHz:
> both cores idle: 48 W
> one core busy:   57 W
> two cores busy:  66 W

2 sec decoding video at both cores: 132J

> 2 GHz:
> both cores idle: 54 W
> one core busy:   78 W
> two cores busy:  95 W

1 sec decode @ 2GHz + 1 sec idle @ 1GHz: 143J

So even on your hw difference is not too big... and take a look at
numbers from core2duo.

Actually...

4 sec decode @ 1 core @ 1GHz: 57*4 = 228J

1 sec decode @ 2 cores @ 2GHz, then idle: 95 + 48*3 = 95 + 144 = 239J...

Ok, so it is still win, but even smaller one..

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
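The energy comparison above can be recomputed with a few lines. This is just the arithmetic from the quoted whole-PC measurements, under the same simplifying assumption the message makes: the work scales linearly with clock speed and core count, and the machine idles at 1 GHz once the work is done. The `POWER_W` table and `energy_j` helper are names made up for this sketch.

```python
# Whole-PC power draw in watts for the Athlon X2 system, as quoted above.
POWER_W = {
    (1, "idle"): 48, (1, "one_busy"): 57, (1, "two_busy"): 66,
    (2, "idle"): 54, (2, "one_busy"): 78, (2, "two_busy"): 95,
}


def energy_j(phases):
    """Sum energy over (ghz, state, seconds) phases; 1 W for 1 s is 1 J."""
    return sum(POWER_W[(ghz, state)] * seconds for ghz, state, seconds in phases)


# Two cores' worth of work: run slowly for 2 s, or race for 1 s and idle 1 s.
slow_two_cores = energy_j([(1, "two_busy", 2)])                  # 66 * 2
race_two_cores = energy_j([(2, "two_busy", 1), (1, "idle", 1)])  # 95 + 48

# The same work on one slow core for 4 s, or race on both cores and idle 3 s.
slow_one_core = energy_j([(1, "one_busy", 4)])                   # 57 * 4
race_then_idle = energy_j([(2, "two_busy", 1), (1, "idle", 3)])  # 95 + 48 * 3
```

With these inputs the slow runs come to 132 J and 228 J and the race-to-idle runs to 143 J and 239 J, so on this hardware running slowly wins by 11 J in both comparisons, a smaller share of the total in the longer one.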
Re: [linux-pm] Power Management framework proposal
[EMAIL PROTECTED] wrote:
> On Wed, 25 Jul 2007, Jerome Glisse wrote:
>> On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>> will each plugin have its own interface? or will you have one
>>> interface to access the plugins and then the plugins do things behind
>>> the scenes? I'll bet that the API for the plugins is common, and if so
>>> then it could be similar to the API that I suggested.
>>
>> I take here ohm as a reference (this comes from my limited
>> understanding of this daemon so there might be inaccuracies): drivers
>> export through HAL their power management tuning capacity. Then an ohm
>> plugin would use HAL to give a higher view of this capacity and also
>> manage policy, preference, permission, ... Last consumer in power
>> management food chain would be a user interface which will communicate
>> with ohm (and with all ohm plugins) so desktop writers (gnome, kde,
>> ...) can write some kind of power management center where each ohm
>> plugin can have its own panel. So in the end the user gets one place to
>> do all its power management, which is the goal i think you are trying
>> to aim for.
>
> no. I am talking about the interface to the drivers that things like
> HAL would use

Ok, i was just trying to stress that the end result is the same from the
user point of view.

>>>> For instance on graphics card you could do the following (maybe
>>>> more):
>>>>  -change GPU clock
>>>>  -change memory clock
>>>>  -disable part of engine
>>>>  -disable unit
>>>> i truly don't think you can make a common interface for all this,
>>>> moreover there might be constraints on how you can change things
>>>> (GPU & memory clock might need to follow a given ratio). So you
>>>> definitely need knowledge in the user space program to handle this.
>>>
>>> sure you can, just enumerate all the options the driver writer wants
>>> to offer as options. yes this could be a lengthy list, so what?
>>
>> My point was that your interface, by trying to fit square pegs into
>> round holes, will fail to expose all the subtlety of each device,
>> which might in the end lead to wrong power management decisions. So i
>> believe we can't sum up power management to a list of modes whose
>> attributes are power consumption & capacity.
>
> it's possible (which is part of the reason I started the thread), but
> so far there hasn't been anything identified that is a really bad fit.

Tell me how i do this in your model: GPU/VRAM memory clock changes power
consumption of the card and the power consumption is often not a trivial
function of both of these parameters (i even here simplify the problem by
omitting pipeline shutdown). So how, with two separate mode lists (one
for GPU speed, another one for VRAM speed), can you provide consumption
information while this consumption depends on the other settings? Then if
you give as a solution to make only one list you end up with a bigger
list than previously needed (nrGPUmodes * nrVRAMmodes). do you expect the
user to go through a lengthy list to find what he wants? (remember that
we will have to add pipeline power off, pll tweaking or many other ways
of saving power on such card).

So by choosing this power consumption as a unit of measure you end up in
non trivial case. There is also the question of overclocking, and other
points already identified where unfortunately a global design such as
your proposal does not seem to fit properly: local power decision
(ethernet, wifi card, ... can power down themselves if they are doing
nothing but the place where you can know this is the driver), there is
also the child/parent relation, how to estimate power usage (on some
configurations one device's consumption can be marginal toward all other
things while on others this same device can be the most power hungry
device)... I see all this as bad fit.

>> And there is no way to design an abstraction given that all hw we will
>> have to deal with is too different and does not follow any standard
>> (besides ACPI there are other ways to save power: brightness,
>> gpu/memory clock, pll, ...) so i don't see how one might give a common
>> view of things which are fundamentally different in how they affect
>> consumption (same end result with many different paths leading to it).
>
> so you are saying that the power management software must know the
> details of each and every driver, and if you add a new driver you must
> change the power management software before it can do anything
> (including allowing manual control of the modes)

You have to provide an ohm plugin (in an ohm world) where policy for this
device will be handled and this plugin needs to be designed knowing what
the hw exports through HAL. Yes it's painful but you don't want to put
policy in the driver and to do policy you need knowledge of the things
you deal with.

> seems to me I heard similar arguments several years ago about the CPU
> speed settings, it turns out that the cpufreq interface works really
> well for them and the software that's controlling things no longer
> needs to know the details of every CPU. why did
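For readers who have not used it, the cpufreq interface mentioned above is exposed as sysfs files under each CPU's `cpufreq` directory. The sketch below reads three of the standard attributes; the `read_cpufreq` helper and its return layout are made up for illustration, and `scaling_available_frequencies` is not provided by every cpufreq driver, so missing files are treated as "unknown" rather than as errors.

```python
from pathlib import Path


def read_cpufreq(cpu=0, root="/sys/devices/system/cpu"):
    """Return what cpufreq advertises for one CPU, without CPU-specific code."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"

    def fields(name):
        # Each attribute is a small text file of whitespace-separated values.
        try:
            return (base / name).read_text().split()
        except OSError:
            return []  # attribute absent on this system or driver

    current = fields("scaling_cur_freq")
    governor = fields("scaling_governor")
    return {
        "available_khz": [int(v) for v in fields("scaling_available_frequencies")],
        "current_khz": int(current[0]) if current else None,
        "governor": governor[0] if governor else None,
    }
```

The point made in the thread is visible here: the caller learns the available speeds from the kernel and needs no knowledge of the particular CPU model.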
Re: [linux-pm] Power Management framework proposal
On Wed, 25 Jul 2007, Jerome Glisse wrote:
> On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> On Tue, 24 Jul 2007, Jerome Glisse wrote:
>>> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>>> On Mon, 23 Jul 2007, Igor Stoppa wrote:
>>>>> again, HAL / OHM / Mobilin
>>>>
>>>> I was trying to define the lower level interfaces that these tools
>>>> need. today they can only know what is possible by reading the
>>>> source code for each driver and implementing the driver-specific
>>>> interfaces necessary to set things, I was proposing a common
>>>> interface that tools like this could use instead of requiring all
>>>> the driver-specific knowledge.
>>>>
>>>> in a nutshell (and I know this is probably not detailed enough to be
>>>> acceptable)
>>>>
>>>> 1. the software needs to know what the interconnects and
>>>>    dependencies between devices are (supposedly this is provided
>>>>    via sysfs)
>>>>
>>>> 2. the software needs to know what type of device this is (again,
>>>>    supposedly this is provided via sysfs)
>>>>
>>>> 3. the software needs to know what modes exist for a driver/piece of
>>>>    hardware. to make any decisions this information needs to provide
>>>>    some information about the capability of the mode and the power
>>>>    consumed in that mode. in addition there will need to be flags to
>>>>    indicate any special restrictions of a mode
>>>>
>>>> 4. the software needs to know the cost of switching from any mode to
>>>>    any other mode. since some transitions will interact with other
>>>>    devices there will need to be flags to indicate such requirements
>>>>    for specific transitions.
>>>>
>>>> 5. the software needs to be able to find out what mode a device is
>>>>    in.
>>>>
>>>> 6. the software needs to be able to tell the driver to switch to a
>>>>    different mode (I think it would be a very good thing if going to
>>>>    a particular mode was always the same command, no matter what
>>>>    mode it is currently in)
>>>>
>>>> 7. the software needs to figure out the desire of the user.
>>>>
>>>> my proposal was addressing items #3-#6. it isn't trying to decide
>>>> what to do, simply to allow the software that _is_ trying to decide
>>>> what to do a way to find out what it can do.
>>>>
>>>> David Lang
>>>
>>> I believe a central place where user can set/change hw state to save
>>> power or to increase computational power is definitely a goal to
>>> pursue. But i truly think that the OHM approach is the best one, ie
>>> using plugins so that one can make a plugin specific for each device.
>>> The point is that i believe there is no way to do an abstract
>>> interface for this and trying to do so will end up doing ugly code
>>> and any interface would fail to encompass all possible tweaks that
>>> might exist for all devices.
>>
>> will each plugin have its own interface? or will you have one
>> interface to access the plugins and then the plugins do things behind
>> the scenes? I'll bet that the API for the plugins is common, and if so
>> then it could be similar to the API that I suggested.
>
> I take here ohm as a reference (this comes from my limited
> understanding of this daemon so there might be inaccuracies): drivers
> export through HAL their power management tuning capacity. Then an ohm
> plugin would use HAL to give a higher view of this capacity and also
> manage policy, preference, permission, ... Last consumer in power
> management food chain would be a user interface which will communicate
> with ohm (and with all ohm plugins) so desktop writers (gnome, kde, ...)
> can write some kind of power management center where each ohm plugin
> can have its own panel. So in the end the user gets one place to do all
> its power management, which is the goal i think you are trying to aim
> for.

no. I am talking about the interface to the drivers that things like HAL
would use

>>> For instance on graphics card you could do the following (maybe
>>> more):
>>>  -change GPU clock
>>>  -change memory clock
>>>  -disable part of engine
>>>  -disable unit
>>> i truly don't think you can make a common interface for all this,
>>> moreover there might be constraints on how you can change things
>>> (GPU & memory clock might need to follow a given ratio). So you
>>> definitely need knowledge in the user space program to handle this.
>>
>> sure you can, just enumerate all the options the driver writer wants
>> to offer as options. yes this could be a lengthy list, so what?
>
> My point was that your interface, by trying to fit square pegs into
> round holes, will fail to expose all the subtlety of each device, which
> might in the end lead to wrong power management decisions. So i believe
> we can't sum up power management to a list of modes whose attributes
> are power consumption & capacity.

it's possible (which is part of the reason I started the thread), but so
far there
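Items #3 to #6 of the quoted proposal can be sketched as a tiny in-memory model. Nothing below is a real kernel or HAL interface: `ToyDevice`, its method names, and the microsecond cost table are all hypothetical, chosen only to show the four operations side by side (list modes, query switching cost, read the current mode, and request a mode with the same command regardless of the starting mode).

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    modes: dict           # item 3: mode name -> (capacity_pct, approx_power_mw)
    switch_cost_us: dict  # item 4: (from_mode, to_mode) -> cost in microseconds
    current: str          # item 5: the mode the device is in right now

    def list_modes(self):
        """Item 3: everything the driver chooses to advertise."""
        return dict(self.modes)

    def switch_cost(self, src, dst):
        """Item 4: cost of going from src to dst; staying put is free."""
        return 0 if src == dst else self.switch_cost_us[(src, dst)]

    def get_mode(self):
        """Item 5: report the current mode."""
        return self.current

    def set_mode(self, target):
        """Item 6: same call whatever the current mode; returns the cost paid."""
        if target not in self.modes:
            raise ValueError(f"unknown mode: {target}")
        cost = self.switch_cost(self.current, target)
        self.current = target
        return cost
```

Policy (item #7, what the user actually wants) stays outside this object, which matches the stated intent that the interface only lets deciding software find out what it can do.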
Re: [linux-pm] Power Management framework proposal
On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Tue, 24 Jul 2007, Jerome Glisse wrote:
>> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>> On Mon, 23 Jul 2007, Igor Stoppa wrote:
>>>> again, HAL / OHM / Mobilin
>>>
>>> I was trying to define the lower level interfaces that these tools
>>> need. today they can only know what is possible by reading the source
>>> code for each driver and implementing the driver-specific interfaces
>>> necessary to set things, I was proposing a common interface that
>>> tools like this could use instead of requiring all the
>>> driver-specific knowledge.
>>>
>>> in a nutshell (and I know this is probably not detailed enough to be
>>> acceptable)
>>>
>>> 1. the software needs to know what the interconnects and dependencies
>>>    between devices are (supposedly this is provided via sysfs)
>>>
>>> 2. the software needs to know what type of device this is (again,
>>>    supposedly this is provided via sysfs)
>>>
>>> 3. the software needs to know what modes exist for a driver/piece of
>>>    hardware. to make any decisions this information needs to provide
>>>    some information about the capability of the mode and the power
>>>    consumed in that mode. in addition there will need to be flags to
>>>    indicate any special restrictions of a mode
>>>
>>> 4. the software needs to know the cost of switching from any mode to
>>>    any other mode. since some transitions will interact with other
>>>    devices there will need to be flags to indicate such requirements
>>>    for specific transitions.
>>>
>>> 5. the software needs to be able to find out what mode a device is in.
>>>
>>> 6. the software needs to be able to tell the driver to switch to a
>>>    different mode (I think it would be a very good thing if going to
>>>    a particular mode was always the same command, no matter what mode
>>>    it is currently in)
>>>
>>> 7. the software needs to figure out the desire of the user.
>>>
>>> my proposal was addressing items #3-#6. it isn't trying to decide
>>> what to do, simply to allow the software that _is_ trying to decide
>>> what to do a way to find out what it can do.
>>>
>>> David Lang
>>
>> I believe a central place where user can set/change hw state to save
>> power or to increase computational power is definitely a goal to
>> pursue. But i truly think that the OHM approach is the best one, ie
>> using plugins so that one can make a plugin specific for each device.
>> The point is that i believe there is no way to do an abstract
>> interface for this and trying to do so will end up doing ugly code and
>> any interface would fail to encompass all possible tweaks that might
>> exist for all devices.
>
> will each plugin have its own interface? or will you have one interface
> to access the plugins and then the plugins do things behind the scenes?
> I'll bet that the API for the plugins is common, and if so then it
> could be similar to the API that I suggested.

I take here ohm as a reference (this comes from my limited understanding
of this daemon so there might be inaccuracies): drivers export through
HAL their power management tuning capacity. Then an ohm plugin would use
HAL to give a higher view of this capacity and also manage policy,
preference, permission, ... Last consumer in power management food chain
would be a user interface which will communicate with ohm (and with all
ohm plugins) so desktop writers (gnome, kde, ...) can write some kind of
power management center where each ohm plugin can have its own panel. So
in the end the user gets one place to do all its power management, which
is the goal i think you are trying to aim for.

>> For instance on graphics card you could do the following (maybe more):
>>  -change GPU clock
>>  -change memory clock
>>  -disable part of engine
>>  -disable unit
>> i truly don't think you can make a common interface for all this,
>> moreover there might be constraints on how you can change things
>> (GPU & memory clock might need to follow a given ratio). So you
>> definitely need knowledge in the user space program to handle this.
>
> sure you can, just enumerate all the options the driver writer wants to
> offer as options. yes this could be a lengthy list, so what?

My point was that your interface, by trying to fit square pegs into round
holes, will fail to expose all the subtlety of each device, which might
in the end lead to wrong power management decisions. So i believe we
can't sum up power management to a list of modes whose attributes are
power consumption & capacity.

And there is no way to design an abstraction given that all hw we will
have to deal with is too different and does not follow any standard
(besides ACPI there are other ways to save power: brightness, gpu/memory
clock, pll, ...) so i don't see how one might give a common view of
things which are fundamentally different in how they affect consumption
(same end result with many different paths leading to it).

best,
Jerome Glisse
Re: [linux-pm] Power Management framework proposal
On Tue, 24 Jul 2007, Igor Stoppa wrote:
> On Tue, 2007-07-24 at 10:43 +0200, ext Jerome Glisse wrote:
> > I believe a central place where the user can set/change hw state to save power or to increase computational power is definitely a goal to pursue. But I truly think that the OHM approach is the best one, i.e. using plugins so that one can make a plugin specific to each device. The point is that I believe there is no way to do an abstract interface for this; trying to do so will end up producing ugly code, and any interface would fail to encompass all the possible tweaks that might exist for all devices.
> >
> > For instance, on a graphics card you could do the following (maybe more):
> > -change GPU clock
> > -change memory clock
> > -disable part of engine
> > -disable unit
> > I truly don't think you can make a common interface for all this; moreover, there might be constraints on how you can change things (GPU & memory clock might need to follow a given ratio). So you definitely need knowledge in the user space program to handle this.
>
> Even simpler case: LCD backlight can come in many flavors, both in terms of brightness levels and fixed amount of current required to keep it ON.
>
> Trying to abstract such details from the decision-making makes little sense. Isolating that into a separate module, instead, brings the best of both worlds:
> -containment of the HW-specific code
> -leveraging every possible, no matter how exotic, power saving mode available.

huh?? in the proposal that I made all the HW-specific code would be in the device driver. I was just proposing a way for the driver to advertise what it is able to do. why would you want to pull the code out into a separate module?

many levels of backlight with different power consumption is trivial to do.

backlight 1
mode   %capability (aka brightness)   %power
0        0                              0
1      100                            100
2       75                             75
3       50                             50
4       25                             25

backlight 2
mode   %capability (aka brightness)   %power
0        0                              0
1      100                            100
2       80                             50
3       60                             30
3       40                             20
4       25                             15

backlight 2
mode   %capability (aka brightness)   %power
0        0                              0
1      100                            100
2       50                             90

why do you think the decision-making logic needs to know the details of the hardware? if you can abstract the details out then the same control logic can be used for different things. if you mix the hardware knowledge into the control logic then you have to change the control section every time you want to support a new piece of hardware.

David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
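The argument that one piece of control logic can serve different hardware can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Mode` record, the `pick_mode` helper and the table names `LINEAR` and `FIXED_COST` are hypothetical and not part of any proposed kernel interface. The numbers are taken from the first and last backlight tables in the message (the middle table is left out because it lists mode 3 twice).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    mode: int        # driver-defined mode number
    capability: int  # percent of full capability (brightness, for a backlight)
    power: int       # percent of full power


# First table from the message: power tracks brightness linearly.
LINEAR = [Mode(0, 0, 0), Mode(1, 100, 100), Mode(2, 75, 75),
          Mode(3, 50, 50), Mode(4, 25, 25)]

# Last table from the message: dimming to 50% still costs 90% power.
FIXED_COST = [Mode(0, 0, 0), Mode(1, 100, 100), Mode(2, 50, 90)]


def pick_mode(modes, min_capability):
    """Return the lowest-power mode with at least min_capability percent.

    Hypothetical policy helper: it only looks at the advertised numbers,
    never at what kind of device produced them.
    """
    candidates = [m for m in modes if m.capability >= min_capability]
    if not candidates:
        raise ValueError("no mode satisfies the request")
    # Ties on power are broken in favour of the more capable mode.
    return min(candidates, key=lambda m: (m.power, -m.capability))


if __name__ == "__main__":
    for name, table in (("linear", LINEAR), ("fixed-cost", FIXED_COST)):
        print(name, pick_mode(table, 40))
```

The same `pick_mode` call handles both backlights; whether such a table captures enough of a real device is exactly what the thread is debating.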
Re: [linux-pm] Power Management framework proposal
On Tue, 24 Jul 2007, Jerome Glisse wrote:
> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > On Mon, 23 Jul 2007, Igor Stoppa wrote:
> > > again, HAL / OHM / Mobilin
> >
> > I was trying to define the lower-level interfaces that these tools need. today they can only know what is possible by reading the source code for each driver and implementing the driver-specific interfaces necessary to set things. I was proposing a common interface that tools like this could use instead of requiring all the driver-specific knowledge.
> >
> > in a nutshell (and I know this is probably not detailed enough to be acceptable)
> >
> > 1. the software needs to know what the interconnects and dependencies between devices are (supposedly this is provided via sysfs)
> >
> > 2. the software needs to know what type of device this is (again, supposedly this is provided via sysfs)
> >
> > 3. the software needs to know what modes exist for a driver/piece of hardware. to make any decisions this information needs to say something about the capability of the mode and the power consumed in that mode. in addition there will need to be flags to indicate any special restrictions of a mode
> >
> > 4. the software needs to know the cost of switching from any mode to any other mode. since some transitions will interact with other devices there will need to be flags to indicate such requirements for specific transitions.
> >
> > 5. the software needs to be able to find out what mode a device is in.
> >
> > 6. the software needs to be able to tell the driver to switch to a different mode (I think it would be a very good thing if going to a particular mode was always the same command, no matter what mode it is currently in)
> >
> > 7. the software needs to figure out the desire of the user.
> >
> > my proposal was addressing items #3-#6. it isn't trying to decide what to do, simply to allow the software that _is_ trying to decide what to do a way to find out what it can do.
> >
> > David Lang
>
> I believe a central place where the user can set/change hw state to save power or to increase computational power is definitely a goal to pursue. But I truly think that the OHM approach is the best one, i.e. using plugins so that one can make a plugin specific to each device. The point is that I believe there is no way to do an abstract interface for this; trying to do so will end up producing ugly code, and any interface would fail to encompass all the possible tweaks that might exist for all devices.

will each plugin have its own interface? or will you have one interface to access the plugins and then the plugins do things behind the scenes?

I'll bet that the API for the plugins is common, and if so then it could be similar to the API that I suggested.

> For instance, on a graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> I truly don't think you can make a common interface for all this; moreover, there might be constraints on how you can change things (GPU & memory clock might need to follow a given ratio). So you definitely need knowledge in the user space program to handle this.

sure you can, just enumerate all the options the driver writer wants to offer as options. yes this could be a lengthy list, so what?

David Lang
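To make the "just enumerate all the options" reply concrete, here is a minimal sketch of a driver flattening two clock settings into one mode list while honouring a ratio constraint of the kind Jerome mentions. Everything in it is an assumption made for illustration: the clock values, the `max_ratio` rule, and the names `allowed` and `enumerate_modes` do not come from any real driver.

```python
from itertools import product

# Hypothetical clock steps, in MHz.
GPU_CLOCKS_MHZ = [200, 400, 600]
MEM_CLOCKS_MHZ = [200, 300, 400, 600]


def allowed(gpu, mem, max_ratio=2.0):
    """Hypothetical constraint: neither clock may exceed max_ratio times the other."""
    return gpu <= max_ratio * mem and mem <= max_ratio * gpu


def enumerate_modes(gpu_clocks, mem_clocks):
    """Flatten every permitted (GPU, memory) clock pair into a numbered mode list."""
    modes = []
    for gpu, mem in product(gpu_clocks, mem_clocks):
        if allowed(gpu, mem):
            modes.append({
                "mode": len(modes),
                "gpu_mhz": gpu,
                "mem_mhz": mem,
                "description": f"GPU {gpu} MHz / memory {mem} MHz",
            })
    return modes


if __name__ == "__main__":
    for m in enumerate_modes(GPU_CLOCKS_MHZ, MEM_CLOCKS_MHZ):
        print(m["mode"], m["description"])
```

With these made-up values, 12 raw combinations shrink to 10 permitted modes; the list grows multiplicatively with each extra knob, which is the length concern raised in the thread.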
Re: [linux-pm] Power Management framework proposal
On Tue, 2007-07-24 at 10:43 +0200, ext Jerome Glisse wrote:
> I believe a central place where user can set/change hw state to save power or to increase computational power is definitely a goal to pursue. But i truly think that the OHM approach is the best one ie using plugins so that one can make a plugin specific for each device. The point is that i believe there is no way to do an abstract interface for this and trying to do so will end up doing ugly code and any interface would fail to encompass all possible tweak that might exist for all devices.
>
> For instance on graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> i truly don't think you can make a common interface for all this, more over there might be constraint on how you can change things (GPU & memory clock might need to follow a given ratio). So you definitely need knowledge in the user space program to handle this.

Even simpler case: LCD backlight can come in many flavors, both in terms of brightness levels and fixed amount of current required to keep it ON.

Trying to abstract such details from the decision-making makes little sense. Isolating that into a separate module, instead, brings the best of both worlds:
-containment of the HW-specific code
-leveraging every possible, no matter how exotic, power saving mode available.

--
Cheers, Igor

Igor Stoppa <[EMAIL PROTECTED]>
(Nokia Multimedia - CP - OSSO / Helsinki, Finland)
Re: [linux-pm] Power Management framework proposal
On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Mon, 23 Jul 2007, Igor Stoppa wrote:
> > again, HAL / OHM / Mobilin
>
> I was trying to define the lower-level interfaces that these tools need. today they can only know what is possible by reading the source code for each driver and implementing the driver-specific interfaces necessary to set things. I was proposing a common interface that tools like this could use instead of requiring all the driver-specific knowledge.
>
> in a nutshell (and I know this is probably not detailed enough to be acceptable)
>
> 1. the software needs to know what the interconnects and dependencies between devices are (supposedly this is provided via sysfs)
>
> 2. the software needs to know what type of device this is (again, supposedly this is provided via sysfs)
>
> 3. the software needs to know what modes exist for a driver/piece of hardware. to make any decisions this information needs to say something about the capability of the mode and the power consumed in that mode. in addition there will need to be flags to indicate any special restrictions of a mode
>
> 4. the software needs to know the cost of switching from any mode to any other mode. since some transitions will interact with other devices there will need to be flags to indicate such requirements for specific transitions.
>
> 5. the software needs to be able to find out what mode a device is in.
>
> 6. the software needs to be able to tell the driver to switch to a different mode (I think it would be a very good thing if going to a particular mode was always the same command, no matter what mode it is currently in)
>
> 7. the software needs to figure out the desire of the user.
>
> my proposal was addressing items #3-#6. it isn't trying to decide what to do, simply to allow the software that _is_ trying to decide what to do a way to find out what it can do.
>
> David Lang

I believe a central place where the user can set/change hw state to save power or to increase computational power is definitely a goal to pursue. But I truly think that the OHM approach is the best one, i.e. using plugins so that one can make a plugin specific to each device. The point is that I believe there is no way to do an abstract interface for this; trying to do so will end up producing ugly code, and any interface would fail to encompass all the possible tweaks that might exist for all devices.

For instance, on a graphics card you could do the following (maybe more):
-change GPU clock
-change memory clock
-disable part of engine
-disable unit
I truly don't think you can make a common interface for all this; moreover, there might be constraints on how you can change things (GPU & memory clock might need to follow a given ratio). So you definitely need knowledge in the user space program to handle this.

best,
Jerome Glisse
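Items 3 to 6 of the quoted list (enumerate modes, report transition costs, query the current mode, switch mode with one uniform command) could be sketched roughly as follows. This is a user-space toy under stated assumptions, not the interface anyone in the thread specified: the class names, the field names, the flag strings and the microsecond cost unit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    mode_id: int
    capability_pct: int            # item 3: capability of the mode
    power_pct: int                 # item 3: power consumed in the mode
    flags: frozenset = frozenset()  # item 3: special restrictions


@dataclass(frozen=True)
class Transition:
    cost_us: int                   # item 4: cost of the switch (unit is an assumption)
    flags: frozenset = frozenset()  # item 4: e.g. other devices that are affected


class Device:
    """Hypothetical stand-in for what a driver would advertise."""

    def __init__(self, name, modes, transitions, initial):
        self.name = name
        self._modes = {m.mode_id: m for m in modes}
        self._transitions = dict(transitions)  # (src, dst) -> Transition
        self._current = initial

    def list_modes(self):               # item 3
        return sorted(self._modes.values(), key=lambda m: m.mode_id)

    def transition(self, src, dst):     # item 4
        return self._transitions[(src, dst)]

    def current_mode(self):             # item 5
        return self._current

    def set_mode(self, mode_id):        # item 6: same call from any starting mode
        if mode_id not in self._modes:
            raise ValueError(f"{self.name}: unknown mode {mode_id}")
        self._current = mode_id


if __name__ == "__main__":
    dev = Device(
        "example0",
        modes=[Mode(0, 0, 0), Mode(1, 100, 100, frozenset({"needs-bus-clock"}))],
        transitions={(0, 1): Transition(500), (1, 0): Transition(50)},
        initial=0,
    )
    dev.set_mode(1)
    print(dev.current_mode(), dev.transition(0, 1).cost_us)
```

Items 1, 2 and 7 are deliberately absent, matching the statement that the proposal covers only #3-#6.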
Re: [linux-pm] Power Management framework proposal
On Wed, 25 Jul 2007, Jerome Glisse wrote:
> On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > On Tue, 24 Jul 2007, Jerome Glisse wrote:
> > > On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > > On Mon, 23 Jul 2007, Igor Stoppa wrote:
> > > > > again, HAL / OHM / Mobilin
> > > >
> > > > I was trying to define the lower-level interfaces that these tools need. today they can only know what is possible by reading the source code for each driver and implementing the driver-specific interfaces necessary to set things. I was proposing a common interface that tools like this could use instead of requiring all the driver-specific knowledge.
> > > >
> > > > in a nutshell (and I know this is probably not detailed enough to be acceptable)
> > > >
> > > > 1. the software needs to know what the interconnects and dependencies between devices are (supposedly this is provided via sysfs)
> > > >
> > > > 2. the software needs to know what type of device this is (again, supposedly this is provided via sysfs)
> > > >
> > > > 3. the software needs to know what modes exist for a driver/piece of hardware. to make any decisions this information needs to say something about the capability of the mode and the power consumed in that mode. in addition there will need to be flags to indicate any special restrictions of a mode
> > > >
> > > > 4. the software needs to know the cost of switching from any mode to any other mode. since some transitions will interact with other devices there will need to be flags to indicate such requirements for specific transitions.
> > > >
> > > > 5. the software needs to be able to find out what mode a device is in.
> > > >
> > > > 6. the software needs to be able to tell the driver to switch to a different mode (I think it would be a very good thing if going to a particular mode was always the same command, no matter what mode it is currently in)
> > > >
> > > > 7. the software needs to figure out the desire of the user.
> > > >
> > > > my proposal was addressing items #3-#6. it isn't trying to decide what to do, simply to allow the software that _is_ trying to decide what to do a way to find out what it can do.
> > > >
> > > > David Lang
> > >
> > > I believe a central place where the user can set/change hw state to save power or to increase computational power is definitely a goal to pursue. But I truly think that the OHM approach is the best one, i.e. using plugins so that one can make a plugin specific to each device. The point is that I believe there is no way to do an abstract interface for this; trying to do so will end up producing ugly code, and any interface would fail to encompass all the possible tweaks that might exist for all devices.
> >
> > will each plugin have its own interface? or will you have one interface to access the plugins and then the plugins do things behind the scenes?
> >
> > I'll bet that the API for the plugins is common, and if so then it could be similar to the API that I suggested.
>
> I take OHM here as a reference (this comes from my limited understanding of the daemon, so there might be inaccuracies): drivers export their power management tuning capabilities through HAL, then an OHM plugin uses HAL to give a higher-level view of these capabilities and also manages policy, preferences, permissions, ... The last consumer in the power management food chain would be a user interface which communicates with OHM (and with all OHM plugins), so desktop writers (GNOME, KDE, ...) can write some kind of power management center where each OHM plugin can have its own panel. So in the end the user gets one place to do all of their power management, which I think is the goal you are aiming for.

no. I am talking about the interface to the drivers that things like HAL would use

> > > For instance, on a graphics card you could do the following (maybe more):
> > > -change GPU clock
> > > -change memory clock
> > > -disable part of engine
> > > -disable unit
> > > I truly don't think you can make a common interface for all this; moreover, there might be constraints on how you can change things (GPU & memory clock might need to follow a given ratio). So you definitely need knowledge in the user space program to handle this.
> >
> > sure you can, just enumerate all the options the driver writer wants to offer as options. yes this could be a lengthy list, so what?
>
> My point was that your interface, by trying to fit square pegs into round holes, will fail to expose all the subtleties of each device, which might in the end lead to wrong power management decisions. So I believe we can't reduce power management to a list of modes whose attributes are power consumption & capacity.

it's possible (which is part of the reason I started the thread), but so far there hasn't been anything identified that is a really bad fit.

> And there is no way to design an abstraction given that all hw we will have to deal
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Arjan van de Ven wrote:

> On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:
> > > > only if the transitions don't cost anything significant,
> > >
> > > these are second order effects though. On a pc, the transition costs are quite low (as I said, single or low double digit microseconds).
> >
> > including pausing all drivers before the transition and unpausing them afterwards?
>
> on a PC you don't need to do that.

that's not what the OMAP documentation I was told to read said. it specifically lists a requirement to pause drivers before the clock change and unpause them afterwards.

> > > this works for all systems where the idle power is lower than the power you save by dropping speed... and that is almost all of them in the PC world.
> >
> > if you can idle the system as a whole I agree with you fully. most PC hardware (including the mobile stuff) doesn't change its power consumption much with load.
>
> even if the rest of the PC is unchanging (which it's not), it is just an offset to both sides of the equation, and the same on both sides at that.

but a constant added to both sides makes the relative savings less.

> > at Usenix there was a presentation (I don't remember if it was by Amazon or Google) about this subject, showing that current PC hardware only goes down to 50% power when idle (short of switching power modes) and that they and other big companies were pushing vendors to improve their hardware, aiming to get the idle power down to 10% (again without suspending anything). so there's some chance that this will change before too long.
>
> on servers and such, there is a huge offset, sure, but still the effect is there. And it really isn't 50%.

their measurements and graphs say otherwise.

> > > now you can argue that 0.5 seconds is a really really long time, and you'd be right. so for really really short stints (say a timer interrupt) you don't want to change the voltage at all (nor would you want to change the plls to change frequency for that matter). But once you start changing those, you might as well go full speed.
> >
> > this assumes that you can cache 1 second of video, if you have more real-time requirements you have a much harder time (say video conferencing where you don't get the frame until just before you need to display it)
>
> the same basic math holds for just 1 frame at a fixed rate. At some point transition costs will get you (and that's where things like ondemand delayed speedup will save us); but to get back to your interface, the interface doesn't nearly give the info needed to make these decisions...

what is it missing?

it lets you find out what modes are available and (in relative terms) how much capability and power is available in each mode

it lets you find out what the transition costs are from any mode to any other mode

David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
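As an illustration of the two things the reply above says the interface exposes (a list of modes with relative capability and power, and a mode-to-mode transition cost matrix), here is a minimal Python sketch. The mode names, percentages, microsecond costs, and helper functions are hypothetical; the thread does not define a concrete data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    capacity_pct: int  # relative capability, 100 = nominal full speed
    power_pct: int     # relative power draw, 100 = nominal full power


# Hypothetical mode list for one device.
MODES = [Mode("off", 0, 0), Mode("half", 50, 70), Mode("full", 100, 100)]

# TRANSITION_US[i][j]: assumed cost in microseconds of going from
# MODES[i] to MODES[j]. The values are invented for the example.
TRANSITION_US = [
    [0, 5000, 5000],
    [100, 0, 20],
    [100, 20, 0],
]


def cheapest_mode(required_capacity_pct):
    """Lowest-power mode that still offers the requested relative capacity."""
    candidates = [m for m in MODES if m.capacity_pct >= required_capacity_pct]
    return min(candidates, key=lambda m: m.power_pct)


def transition_cost_us(src, dst):
    """Look up the cost of switching between two modes by name."""
    names = [m.name for m in MODES]
    return TRANSITION_US[names.index(src)][names.index(dst)]
```

A governor, in the kernel or in userspace, could then pick `cheapest_mode(40)` and check `transition_cost_us("full", "half")` before deciding whether a switch is worth it. Whether this information is sufficient is exactly what the thread is arguing about.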
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Igor Stoppa wrote:

> On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:
> [snip]
> > this is another one. I'd be happy to get pointers to prior ones to learn from.
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html
>
> This is probably one of the latest. Previously there was some clash between powerop and oppoints that led to people running away from too much confusion.

thanks, I'll read through that

> > > Unfortunately, while it's true that there are significant similarities, there are also notable differences; as far as I know the USB subsystem is the one that gets closer to what we have in the embedded arena, since it can have complex cases of parent-child powering and wakeup.
> >
> > this API is not trying to represent the parent-child hierarchy. as far as I know that's documented in sysfs (or is supposed to be). this is just an attempt to make it so that as you are going through the hierarchy you don't have to use vastly different APIs to control the different functions.
>
> You are going to end up with parent-child relationships, or user-consumer. Devices don't exist in the void, but are interconnected.

correct, but the interconnections are already documented via sysfs aren't they? if they are why should this new API need to worry about that?

> > I suspect that most (if not all) of the previous One Solutions have tried to completely handle all the details of their original case, and then branch out to the other cases.
> >
> > this attempt is working from the other direction. the user of this API doesn't care how something is done, it just wants to know what's possible and how to tell the system to switch modes.
>
> True, but you are ending up in the same situation: too much abstraction makes the governing system clumsy and inefficient.

I see it as going the opposite direction, today there is no abstraction, you need to know all the details of everything. I proposed an abstraction to avoid needing to know all the details, this may end up being just as bad, but it's not the same situation :-)

> > other than just me searching through the lists, do you have a pointer to some of the differences between the different types that are seen as being so large that they can't be unified?
>
> I'll be more detailed in further replies to following emails from this same thread that have already piled up.

thanks, even though I'm dropping the proposal it's always useful to learn more.

> > > > while I was describing the issues to my roommates over dinner I realized that the same type of functions are needed for the CPU clocks.
> > > >
> > > > if you have an accepted framework in place there that can do what I described, please consider extending it to cover other types of devices and drivers.
> > >
> > > That is not part of the fw: the fw simply expresses parent-child clock distribution and keeps usecounts so that unused clocks are automatically gated.
> > >
> > > The actual clock tree description is platform/arch/board specific and doesn't affect the framework. You can just roll your own version for x86 by providing a description of the methods used to switch on/off every individual clock on your board.
> > >
> > > So what you are asking for is that somebody writes an x86 version of the clock fw.
> >
> > this is more than just setting the clocks on everything (although setting clocks seems like it fits well into the model) because some power modes are not easily represented just as clocks.
>
> The very same idea of power mode is something that can maybe fit some simple peripherals (simple as in not fine-grained controllable in terms of what is on and what is off), but it certainly doesn't fit modern SoCs nicely (see OMAP), since at a certain point in time you don't really know what the power consumption is, because many resources are automatically gated by HW on an on-need basis. And you don't want to switch this feature off.

it seems to me that you can either get some figure of power consumption for a mode (even if it's just relative power consumption compared to other modes) or you have no way of planning what to do because you have no clue what the results of your actions are.

> > > As for latencies, well, only a few clocks really have significant impact. Most notably the main system oscillator. Everything else has 0 latency since it ends up in opening/closing a clock gate.
> > >
> > > Powering a device on/off will certainly introduce more latency, but either the powering is supported by the hw, to make it quick, or it has to go through most, if not all, of the usual initialisation sequence; in that case it probably makes sense to avoid controlling it from kernelspace, since it will be slow and won't require decisions made with microsecond precision.
> >
> > and many devices support both a quick almost-off mode and a slow almost-off mode (as well as a completely off mode), with the slow mode eating less power, but taking longer to wake up from. that's the reason for providing the matrix to let the program making the decision decide if it's worth the time delays
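The trade-off quoted above, between a quick almost-off mode, a slow almost-off mode, and a completely off mode, can be sketched as a small decision function. This is only an illustration: the mode names, power figures, wake-up latencies, and the assumption that a device draws a fixed power while waking are all invented, not taken from any hardware discussed in the thread.

```python
# name: (power in W while asleep, wake-up latency in s). Made-up values.
SLEEP_MODES = {
    "quick_almost_off": (0.50, 0.001),
    "slow_almost_off": (0.10, 0.200),
    "off": (0.00, 2.000),
}

# Assumed power draw while the device is waking up, in W.
WAKE_POWER_W = 2.0


def best_sleep_mode(idle_s, max_wake_s):
    """Pick the mode with the lowest energy for an idle period of idle_s
    seconds, among modes whose wake-up latency is at most max_wake_s.

    Raises ValueError if no mode meets the latency bound.
    """
    usable = {n: pw for n, pw in SLEEP_MODES.items() if pw[1] <= max_wake_s}

    def energy_j(name):
        power_w, wake_s = usable[name]
        return power_w * idle_s + WAKE_POWER_W * wake_s

    return min(usable, key=energy_j)
```

With these numbers a 0.1 s idle period favours the quick mode, a 10 s period favours the slow one, and only a much longer period pays for a full power-off. That is the kind of decision the transition matrix is meant to support.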
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Ondrej Zajicek wrote:

> On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote:
> > let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice.
> >
> > power at full speed: 34W
> > power at half speed: 24W
> > power at idle: 1W
>
> I have usually seen different numbers, for example:
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf
>
> Although this paper speaks about thermal design power instead of power consumption, I suppose that it should be roughly equal. For example Athlon 64 3700 (ADA3700AEP5AR):
>
> 2.4 GHz, 1.5 V -> 89 W
> 2.2 GHz, 1.4 V -> 72 W
> 2.0 GHz, 1.3 V -> 53 W
> 1.8 GHz, 1.2 V -> 39 W
> 1.0 GHz, 1.1 V -> 22 W
>
> Even my measurement on a PC (Athlon X2, VIA K8T890) of complete PC power consumption shows that it is more efficient to be busy for 2 time units at 1 GHz than to be busy for 1 time unit and idle for 1 time unit at 2 GHz.
>
> 1 GHz:
>   both cores idle: 48 W
>   one core busy:   57 W
>   two cores busy:  66 W
>
> 2 GHz:
>   both cores idle: 54 W
>   one core busy:   78 W
>   two cores busy:  95 W

what Arjan is saying is one time unit at 2GHz with both cores busy, then one time unit at 1GHz with both cores idle (this would be 95W + 48W = 143W over two time units, vs 66W + 66W = 132W over two time units for staying busy at 1GHz)

still a win for running at 1GHz, but a smaller one

or better still, one time unit at 2GHz with both cores busy, one time unit in sleep mode. in this case if the sleep mode is any good at all it wins.

David Lang
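The comparison in the message above can be checked with a few lines of Python. The wattages are the whole-PC measurements quoted in the thread; the schedule names and the helper functions are made up for the example, and it assumes, as the quoted measurement does, that the job needs 1 time unit at 2 GHz or 2 time units at 1 GHz, with transition costs ignored.

```python
# Whole-PC power draw in watts, copied from the measurements quoted above
# (Athlon X2, VIA K8T890). Keys are (clock in GHz, load state).
WATTS = {
    (1, "idle"): 48, (1, "one_busy"): 57, (1, "two_busy"): 66,
    (2, "idle"): 54, (2, "one_busy"): 78, (2, "two_busy"): 95,
}


def cost(schedule):
    """Sum of per-time-unit power over a schedule (energy in watt-time-units)."""
    return sum(WATTS[step] for step in schedule)


# Assumption: the job keeps both cores busy for 1 time unit at 2 GHz
# or for 2 time units at 1 GHz (perfect frequency scaling).
stay_at_1ghz = cost([(1, "two_busy"), (1, "two_busy")])
race_then_idle_at_2ghz = cost([(2, "two_busy"), (2, "idle")])
race_then_drop_to_1ghz = cost([(2, "two_busy"), (1, "idle")])


def sleep_break_even_watts():
    """Sleep power below which 'race at 2 GHz, then sleep' beats staying at 1 GHz."""
    return stay_at_1ghz - WATTS[(2, "two_busy")]


if __name__ == "__main__":
    print(stay_at_1ghz, race_then_idle_at_2ghz, race_then_drop_to_1ghz)
    print(sleep_break_even_watts())
```

This gives 132 for staying at 1 GHz, 149 for racing and idling at 2 GHz, and 143 for racing and then dropping to 1 GHz idle. On these figures a sleep state wins as soon as it keeps the whole PC below 37 W for the second time unit.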
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:
> > > only if the transitions don't cost anything significant,
> >
> > these are second order effects though. On a pc, the transition costs are quite low (as I said, single or low double digit microseconds).
>
> including pausing all drivers before the transition and unpausing them afterwards?

on a PC you don't need to do that.

> > > and the computation capacity per watt of power is the same at all frequencies. the chip performance numbers I've been seeing (which I admit are mostly embedded datasheets) indicate that neither of these hold true.
> >
> > let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice.
> >
> > power at full speed: 34W
> > power at half speed: 24W
> > power at idle: 1W
>
> are these numbers for the CPU itself or for a larger chunk?

the cpu at full load.

> > this works for all systems where the idle power is lower than the power you save by dropping speed... and that is almost all of them in the PC world.
>
> if you can idle the system as a whole I agree with you fully. most PC hardware (including the mobile stuff) doesn't change its power consumption much with load.

even if the rest of the PC is unchanging (which it's not), it is just an offset to both sides of the equation, and the same on both sides at that.

> at Usenix there was a presentation (I don't remember if it was by Amazon or Google) about this subject, showing that current PC hardware only goes down to 50% power when idle (short of switching power modes) and that they and other big companies were pushing vendors to improve their hardware, aiming to get the idle power down to 10% (again without suspending anything). so there's some chance that this will change before too long.

on servers and such, there is a huge offset, sure, but still the effect is there. And it really isn't 50%.

> > now you can argue that 0.5 seconds is a really really long time, and you'd be right. so for really really short stints (say a timer interrupt) you don't want to change the voltage at all (nor would you want to change the plls to change frequency for that matter). But once you start changing those, you might as well go full speed.
>
> this assumes that you can cache 1 second of video, if you have more real-time requirements you have a much harder time (say video conferencing where you don't get the frame until just before you need to display it)

the same basic math holds for just 1 frame at a fixed rate. At some point transition costs will get you (and that's where things like ondemand delayed speedup will save us); but to get back to your interface, the interface doesn't nearly give the info needed to make these decisions...

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
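The argument above can be put into numbers with the rounded CPU figures quoted in the message (34 W at full speed, 24 W at half speed, 1 W idle). The sketch below assumes a job that takes 0.5 s at full speed and therefore 1 s at half speed, inside a 1 s window; that workload and the `transition_j` parameter are assumptions added for illustration.

```python
FULL_W, HALF_W, IDLE_W = 34.0, 24.0, 1.0  # rounded CPU figures from the thread


def energy_joules(busy_w, busy_s, window_s, transition_j=0.0):
    """Energy over a window: busy at busy_w for busy_s seconds, idle for the
    rest, plus an optional lump-sum cost for switching speed."""
    return busy_w * busy_s + IDLE_W * (window_s - busy_s) + transition_j


# Assumed workload: 0.5 s of work at full speed, i.e. 1 s at half speed.
race_to_idle = energy_joules(FULL_W, 0.5, 1.0)
run_at_half_speed = energy_joules(HALF_W, 1.0, 1.0)

# Transition overhead (in joules) at which the two strategies break even.
break_even_transition_j = run_at_half_speed - race_to_idle

if __name__ == "__main__":
    print(race_to_idle, run_at_half_speed, break_even_transition_j)
```

Under these assumptions racing to idle costs 17.5 J against 24 J at half speed, so the transition would have to cost more than 6.5 J per window before running slowly wins. Adding a constant platform offset to both strategies leaves the absolute difference unchanged but shrinks the relative saving, which is the point made in the replies.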
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:
[snip]
> this is another one. I'd be happy to get pointers to prior ones to learn from.

https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html

This is probably one of the latest. Previously there was some clash between powerop and oppoints that led to people running away from too much confusion.

> > Unfortunately, while it's true that there are significant similarities, there are also notable differences; as far as I know the USB subsystem is the one that gets closer to what we have in the embedded arena, since it can have complex cases of parent-child powering and wakeup.
>
> this API is not trying to represent the parent-child hierarchy. as far as I know that's documented in sysfs (or is supposed to be). this is just an attempt to make it so that as you are going through the hierarchy you don't have to use vastly different APIs to control the different functions.

You are going to end up with parent-child relationships, or user-consumer. Devices don't exist in the void, but are interconnected.

> I suspect that most (if not all) of the previous One Solutions have tried to completely handle all the details of their original case, and then branch out to the other cases.
>
> this attempt is working from the other direction. the user of this API doesn't care how something is done, it just wants to know what's possible and how to tell the system to switch modes.

True, but you are ending up in the same situation: too much abstraction makes the governing system clumsy and inefficient.

> other than just me searching through the lists, do you have a pointer to some of the differences between the different types that are seen as being so large that they can't be unified?

I'll be more detailed in further replies to following emails from this same thread that have already piled up.

> > > while I was describing the issues to my roommates over dinner I realized that the same type of functions are needed for the CPU clocks.
> > >
> > > if you have an accepted framework in place there that can do what I described, please consider extending it to cover other types of devices and drivers.
> >
> > That is not part of the fw: the fw simply expresses parent-child clock distribution and keeps usecounts so that unused clocks are automatically gated.
> >
> > The actual clock tree description is platform/arch/board specific and doesn't affect the framework. You can just roll your own version for x86 by providing a description of the methods used to switch on/off every individual clock on your board.
> >
> > So what you are asking for is that somebody writes an x86 version of the clock fw.
>
> this is more than just setting the clocks on everything (although setting clocks seems like it fits well into the model) because some power modes are not easily represented just as clocks.

The very same idea of power mode is something that can maybe fit some simple peripherals (simple as in not fine-grained controllable in terms of what is on and what is off), but it certainly doesn't fit modern SoCs nicely (see OMAP), since at a certain point in time you don't really know what the power consumption is, because many resources are automatically gated by HW on an on-need basis. And you don't want to switch this feature off.

> > As for latencies, well, only a few clocks really have significant impact. Most notably the main system oscillator. Everything else has 0 latency since it ends up in opening/closing a clock gate.
> >
> > Powering a device on/off will certainly introduce more latency, but either the powering is supported by the hw, to make it quick, or it has to go through most, if not all, of the usual initialisation sequence; in that case it probably makes sense to avoid controlling it from kernelspace, since it will be slow and won't require decisions made with microsecond precision.
>
> and many devices support both a quick almost-off mode and a slow almost-off mode (as well as a completely off mode), with the slow mode eating less power, but taking longer to wake up from. that's the reason for providing the matrix to let the program making the decision decide if it's worth the time delays to get the power savings
>
> as I note in another message, this API isn't intended to be strictly kernelspace or strictly userspace. for the ondemand speed governor you are changing the settings quickly and so probably want to do so in the kernel, however some people may be satisfied with slower controls and so could have them in userspace (an extreme example of this would be turning off wireless cards that aren't in use to save power and improve security)

So you are going to have 2 APIs: one for kernelspace (evolution of CPUfreq) and one for userspace, which seems more and more likely to be an extension to HAL.

Are you informed on OHM and Intel Moblin?
http://ohm.freedesktop.org/wiki/
http://www.moblin.org/index.html
Re: [linux-pm] Power Management framework proposal
On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote:
> let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice.
>
> power at full speed: 34W
> power at half speed: 24W
> power at idle: 1W

I have usually seen different numbers, for example:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Although this paper speaks about thermal design power instead of power consumption, I suppose that it should be roughly equal. For example Athlon 64 3700 (ADA3700AEP5AR):

2.4 GHz, 1.5 V -> 89 W
2.2 GHz, 1.4 V -> 72 W
2.0 GHz, 1.3 V -> 53 W
1.8 GHz, 1.2 V -> 39 W
1.0 GHz, 1.1 V -> 22 W

Even my measurement on a PC (Athlon X2, VIA K8T890) of complete PC power consumption shows that it is more efficient to be busy for 2 time units at 1 GHz than to be busy for 1 time unit and idle for 1 time unit at 2 GHz.

1 GHz:
  both cores idle: 48 W
  one core busy:   57 W
  two cores busy:  66 W

2 GHz:
  both cores idle: 54 W
  one core busy:   78 W
  two cores busy:  95 W

--
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: [EMAIL PROTECTED], jabber: [EMAIL PROTECTED])
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote: only if the transitions don't cost anything significant, these are second order effects though. On a pc, the transition costs are quite low (as I said, single or low double digit microseconds). including pausing all drivers before the transition and unpausing them aftrwords? on a PC you don't need to do that. and the computation capacity per watt of power is the same at all frequencies. the chip performance numbers I've been seeing (which I admit are mostly embedded datasheets) indicate that neither of these hold true. let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice. power at full speed: 34W power at half speed: 24W power at idle: 1W are these numbers for the CPU itself or for the a larger chunk? the cpu at full load. this works for all systems where the idle power is more lower than the power you save by dropping speed... and that is almost all of them in the PC world. if you can idle the system as a whole I agree with you fully. most PC hardware (including the mobile stuff) doesn't change it's power consumption much with load. even if the rest of the PC is unchanging (which it's not), it is just an offset to both sides of the equation, and the same on both sides at that. at Usenix there was a presentiation (I don't remember if it was by Amazon or Google) about this subject, showing that current PC hardware only goes down to 50% power when idle (short of switching power modes) and that they and other big companies were pushing vendors to improve their hardware, aiming to get the idle power down to 10% (again without suspending anything). so there's some chance that this will change before too long. on servers and such, there is a huge offset, sure, but still the effect is there. And it really isn't 50%. 
now you can argue that 0.5 seconds is a really really long time, and you'd be right. so for really really short stints (say a timer interrupt) you don't want to change the voltage at all (nor would you want to change the plls to change frequency for that matter). But once you start chaning those, you might as well go full speed. this assumes that you can cache 1 second of video, if you have more real-time requirements you have a much harder time (say video confrancing where you don't get the frame until just before you need to display it) the same basic math holds for just 1 frame at a fixed rate. At some point transition costs will get you (and that's where things like ondemand delayed speedup will save us); but to get back to your interface, the interface doesnt nearly give the info needed to make these decisions... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Ondrej Zajicek wrote: On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote: let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice. power at full speed: 34W power at half speed: 24W power at idle: 1W I have usually seen different numbers, for example: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf Although this paper speaks about thermal design power instead of power consumption, i suppose that it should be roughly equal. For example Athlon 64 3700 (ADA3700AEP5AR): 2.4 GHz, 1.5 V - 89 W 2.2 GHz, 1.4 V - 72 W 2.0 GHz, 1.3 V - 53 W 1.8 GHz, 1.2 V - 39 W 1.0 GHz, 1.1 V - 22 W Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power consumption shows that it is more efficient to be busy for 2 time units on 1 GHz than be busy for 1 time unit and be idle for 1 time unit on 2 GHz. 1 GHz: both cores idle:48 W one core busy: 57 W two cores busy: 66 W 2 GHz: both cores idle:54 W one core busy: 78 W two cores busy: 95 W what Arjan is saying is one time unit at 2GHz with both cores busy, one time unit at 1GHz with both cores idle (this would be 132w/two time units vs 143W/two time units) still a win for running a 1GHz, but a smaller one or better still, one time unit at 2GHz with both cores busy, one time unit in sleep mode, in this case if the sleep mode is any good at all it wins. David Lang - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Igor Stoppa wrote: On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote: [snip] this is another one. I'd be happy to get pointers to prior ones to learn from. https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html This is probably one of the latest. Previously there was some clash between powerop and oppoints that lead to people running away from too much confusion. thanks, I'll read through that Unfortunately, while it's true that there are significant similarities, there are also notable differencies; as far as i know the USB subsystem is the one that gets closer to what we have in the embedded arena, since it can have complex cases of parent-child powering and wakeup. this API is not trying to represent the parent-child hierarchy. as far as I know that's documented in sysfs (or is supposed to be). this is just an attempt to make it so that as you are going through the hierarchy you don't have to use vastly different API's to control the different functions. You are going to end up with parent child relationships, or user-consumer. Devices don't exist in the void, but are interconnected. correct, but the interconnections are already documented via sysfs aren't they? if they are why should this new API need to worry about that? I suspect that most (if not all) of the previous One Solutions have tried to completely handle all the details of their original case, and then branch out to the other cases. this attempt is working from the other direction. the user of this API doesn't care how something is done, it just wants to know what's possible and how to tell the system to switch modes. True, but you are endding up in the same situation: too much abstraction makes the governing system clumsy and inefficient. I see it as going the opposite direction, today there is no abstraction, you need to know all the details fo everything. 
I proposed an abstraction to avoid needing to kow all the details, this may nd up being just as bad, but it's not the same situation :-) other then just me searching through the lists, do you have a pointer to some of the differences between the different types that are seen as being so large that they can't be unified? I'll be more detailed in further replies to following emails from this same thread that have already piled up. thanks, even though I'm dropping the proposal it's always useful to learn more. while I was describing the issues to my roomates over dinner I realized that the same type of functions are needed for the CPU clocks. if you have an accepted framework in place there that can do what I described, please consider extending it to cover other types of devices and drivers. That is not part of the fw: the fw simply expresses parent-child clock distribution and keeps usecounts so that unused clocks are automatically gated. The actual clock tree description is platform/arch/board specific and doesn't affect the framework. You can just roll your own version for x86 by providing a description of the methods used to switch on/off every individual clock on your board. So what you are asking for is that somebody writes an x86 version of the clock fw. this is more then just setting the clocks on everything (although setting clocks seems like it fits well into the model) becouse some power modes are not easily represented just as clocks. The very same idea of power mode is something that can maybe fit some simple peripherals (simple as not fine grained contraollable in terms of what is on and what is off), but certainly it doesn't fit nicely modern SoC (see OMAP) since ata certain point of time you don't really know what is the power consumption because many resources are automatically gated by HW on an on-need basis. And you don't want to switch this feature off. 
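The clock framework behaviour Igor describes (a parent-child clock tree with use counts, where unused clocks are gated automatically) might be sketched roughly as follows. This is not the kernel's clock API: the `Clock` class, its methods and the example tree are hypothetical, written in Python only to show the bookkeeping.

```python
class Clock:
    """One node in a clock tree; gated whenever nobody is using it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usecount = 0

    @property
    def gated(self):
        return self.usecount == 0

    def enable(self):
        # The first user of a clock also needs the parent to be running.
        if self.usecount == 0 and self.parent is not None:
            self.parent.enable()
        self.usecount += 1

    def disable(self):
        if self.usecount == 0:
            raise RuntimeError(f"{self.name}: disable without enable")
        self.usecount -= 1
        # The last user going away releases the parent as well.
        if self.usecount == 0 and self.parent is not None:
            self.parent.disable()


# Hypothetical board description: oscillator -> bus -> two peripherals.
osc = Clock("osc")
bus = Clock("bus", parent=osc)
uart = Clock("uart", parent=bus)
spi = Clock("spi", parent=bus)

uart.enable()
spi.enable()
uart.disable()  # spi still holds bus and osc open
print(uart.gated, bus.gated, osc.gated)
```

In this sketch only the tree description is board specific; the counting logic stays the same for any platform, which is the split Igor points out.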
it seems to me that you can either get some figure of power consumption for a mode (even if it's just relative power consumption compared to other modes) or you have no way of planning what to do becouse you have no clue what the results of your actions are. As for latencies, well, only few clocks really have significant impact. Most notably the main system oscillator. Everything else has 0 latency since it ends up in opening/closing a clock gate. Powering device on/off will certainly introduce more latency, but either the powering is supported by the hw, to make it quick or it has to go through most, if not all of he usual initialisation sequence; in that case it probably makes sense to avoid controlling it from kernelspace, since it will be slow and won't require dedcisions made with us precision. and many devices support both a quick almost-off mode and a slow almost-off mode (as well as a completely off mode), with the slow mode eating less power, but takeing longer to wake up from. that's the reason for providing the matrix to let the program makeing the decision decide if it's worth the time delays
Re: [linux-pm] Power Management framework proposal
On Mon, 23 Jul 2007, Arjan van de Ven wrote:
> On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:
> > > > only if the transitions don't cost anything significant,
> > >
> > > these are second order effects though. On a pc, the transition costs
> > > are quite low (as I said, single or low double digit microseconds).
> >
> > including pausing all drivers before the transition and unpausing them
> > afterwards?
>
> on a PC you don't need to do that.

that's not what the OMAP documentation I was told to read said. it
specifically lists a requirement to pause drivers before the clock change
and unpause them afterwards.

> > > this works for all systems where the idle power is lower than the
> > > power you save by dropping speed... and that is almost all of them in
> > > the PC world.
> >
> > if you can idle the system as a whole I agree with you fully. most PC
> > hardware (including the mobile stuff) doesn't change its power
> > consumption much with load.
>
> even if the rest of the PC is unchanging (which it's not), it is just an
> offset to both sides of the equation, and the same on both sides at that.

but a constant added to both sides makes the relative savings less.

> > at Usenix there was a presentation (I don't remember if it was by
> > Amazon or Google) about this subject, showing that current PC hardware
> > only goes down to 50% power when idle (short of switching power modes)
> > and that they and other big companies were pushing vendors to improve
> > their hardware, aiming to get the idle power down to 10% (again without
> > suspending anything). so there's some chance that this will change
> > before too long.
>
> on servers and such, there is a huge offset, sure, but still the effect
> is there. And it really isn't 50%.

their measurements and graphs say otherwise.

> > > now you can argue that 0.5 seconds is a really really long time, and
> > > you'd be right. so for really really short stints (say a timer
> > > interrupt) you don't want to change the voltage at all (nor would you
> > > want to change the plls to change frequency for that matter). But once
> > > you start changing those, you might as well go full speed.
> >
> > this assumes that you can cache 1 second of video, if you have more
> > real-time requirements you have a much harder time (say video
> > conferencing where you don't get the frame until just before you need
> > to display it)
>
> the same basic math holds for just 1 frame at a fixed rate. At some point
> transition costs will get you (and that's where things like ondemand
> delayed speedup will save us); but to get back to your interface, the
> interface doesn't nearly give the info needed to make these decisions...

what is it missing?

it lets you find out what modes are available and (in relative terms) how
much capability and power is available in each mode.

it lets you find out what the transition costs are from any mode to any
other mode.

David Lang
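The two things David says the interface exposes (a list of modes with relative capability and power, plus a transition cost from any mode to any other) could be modelled along these lines. This is a sketch, not the proposed API itself: the `Mode` type, the field names, the three example modes and every number in the tables are invented for illustration. The power figure is taken to mean power at full load in that mode, as David suggests elsewhere in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    capability_pct: int  # relative capability, 100 = full
    power_pct: int       # relative power draw at full load in this mode


# Hypothetical device: none of these numbers come from real hardware.
MODES = [
    Mode("off", 0, 0),
    Mode("half", 50, 70),
    Mode("full", 100, 100),
]

# TRANSITION_US[src][dst]: made-up cost in microseconds of switching modes.
TRANSITION_US = {
    "off":  {"off": 0,  "half": 5000, "full": 5000},
    "half": {"off": 50, "half": 0,    "full": 20},
    "full": {"off": 50, "half": 20,   "full": 0},
}


def pick_mode(modes, needed_capability_pct):
    """Cheapest mode that is capable enough; ties go to the faster mode."""
    capable = [m for m in modes if m.capability_pct >= needed_capability_pct]
    return min(capable, key=lambda m: (m.power_pct, -m.capability_pct))


def switch_cost_us(src, dst):
    """Look up the transition cost between two mode names."""
    return TRANSITION_US[src][dst]


current = "off"
target = pick_mode(MODES, needed_capability_pct=40)
print(target.name, switch_cost_us(current, target.name))  # half 5000
```

A single power number per mode is exactly what Arjan objects to: a race-to-idle policy also needs idle power per mode, which this sketch does not model.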
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:
> On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:
> > > > this strategy should work well on the normal unpredictable workload
> > > > that most people deal with, but there are some cases where the
> > > > workload becomes pretty predictable (media players for example)
> > > > where there really is less variation, and a need for a constant
> > > > availability of the cpu, so it may actually save a smidge of power
> > > > to run below the highest freq that the voltage allows rather than
> > > > running faster and being idle more cycles.
> > >
> > > that actually is the example showcase of race-to-idle where you
> > > absolutely want to run at the highest frequency..
> >
> > only if the transitions don't cost anything significant,
>
> these are second order effects though. On a pc, the transition costs are
> quite low (as I said, single or low double digit microseconds).

including pausing all drivers before the transition and unpausing them
afterwards?

> > and the computation capacity per watt of power is the same at all
> > frequencies. the chip performance numbers I've been seeing (which I
> > admit are mostly embedded datasheets) indicate that neither of these
> > hold true.
>
> let me give you a real world example then, and the numbers I'm using are
> ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
> just rounded them a little so that the math works out nice.
>
> power at full speed: 34W
> power at half speed: 24W
> power at idle:        1W

are these numbers for the CPU itself or for a larger chunk? I could
easily see these numbers for motherboard (including CPU and RAM), but it
would surprise me if these numbers are for the CPU itself. I'm used to
seeing datasheets that have a much more linear voltage/freq (and therefore
a quadratic voltage/power) curve. in some cases the voltage requirements
drop faster than the frequency.

> assume media playback, and a dumb one, that takes half a second to decode
> a second of media. (again to make the math simple)
>
> at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
> at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J
>
> this works for all systems where the idle power is lower than the power
> you save by dropping speed... and that is almost all of them in the PC
> world.

if you can idle the system as a whole I agree with you fully. most PC
hardware (including the mobile stuff) doesn't change its power
consumption much with load.

at Usenix there was a presentation (I don't remember if it was by Amazon
or Google) about this subject, showing that current PC hardware only goes
down to 50% power when idle (short of switching power modes) and that they
and other big companies were pushing vendors to improve their hardware,
aiming to get the idle power down to 10% (again without suspending
anything). so there's some chance that this will change before too long.

> now you can argue that 0.5 seconds is a really really long time, and
> you'd be right. so for really really short stints (say a timer interrupt)
> you don't want to change the voltage at all (nor would you want to change
> the plls to change frequency for that matter). But once you start
> changing those, you might as well go full speed.

this assumes that you can cache 1 second of video, if you have more
real-time requirements you have a much harder time (say video
conferencing where you don't get the frame until just before you need to
display it)

David Lang
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:
> >> the fact that you want to run at the max frequency for a given voltage is
> >
> > no I want to run at the max frequency PERIOD. On just about any PC, it's
> > more power efficient to go full speed when executing code, and then idle
> > for as long as you can. (there are some second order effects that make
> > this a bit more complex, but as first order approach it's a sound
> > approach). Voltage follows, and that's fine.
>
> this seems to be contradicted by the fact that AMD is listing the ability
> for each core to run at a different clock speed on the new 4-core chips as
> an advantage.

that's a marketing thing mostly.. they all still run at the same voltage
anyway.

> if you always want to run at the max frequency PERIOD then
> why bother engineering the ability to do otherwise? (as opposed to just
> shutting down unused cores)

multicore changes the rules a little but not all that much. (the idle
power is higher if not all cores are idle at the same time. Yet... each
core individually trying to be idle as quickly as possible is the best
way to get to the highest "all cores idle" time, unless there is some
really special/weird synchronization)

> >> this strategy should work well on the normal unpredictable workload that
> >> most people deal with, but there are some cases where the workload becomes
> >> pretty predictable (media players for example) where there really is less
> >> variation, and a need for a constant availability of the cpu, so it may
> >> actually save a smidge of power to run below the highest freq that the
> >> voltage allows rather than running faster and being idle more cycles.
> >
> > that actually is the example showcase of race-to-idle where you
> > absolutely want to run at the highest frequency..
>
> only if the transitions don't cost anything significant,

these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds). They are
not zero, and that is why you see things like ondemand ramp up only after
a little time, as a guesstimate to make sure it's not just a really short
lived code execution.

> and the
> computation capacity per watt of power is the same at all frequencies. the
> chip performance numbers I've been seeing (which I admit are mostly
> embedded datasheets) indicate that neither of these hold true.

let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle:        1W

assume media playback, and a dumb one, that takes half a second to decode
a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the power
you save by dropping speed... and that is almost all of them in the PC
world.

now you can argue that 0.5 seconds is a really really long time, and you'd
be right. so for really really short stints (say a timer interrupt) you
don't want to change the voltage at all (nor would you want to change the
plls to change frequency for that matter). But once you start changing
those, you might as well go full speed.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
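Arjan's arithmetic, and the idle power at which his conclusion would flip, can be reproduced with a short Python sketch. The function names are made up here, and the busy fractions (0.5 of each second at half speed, 0.25 at full speed) are taken from his two formulas rather than from any datasheet.

```python
def energy_per_second(busy_w, idle_w, busy_fraction):
    """Joules used in one second: busy part plus idle part."""
    return busy_w * busy_fraction + idle_w * (1.0 - busy_fraction)


def break_even_idle_w(slow_w, slow_fraction, fast_w, fast_fraction):
    """Idle power at which both strategies use the same energy."""
    return (fast_w * fast_fraction - slow_w * slow_fraction) / (
        fast_fraction - slow_fraction
    )


half_speed = energy_per_second(24, 1, 0.5)    # 12.5 J
full_speed = energy_per_second(34, 1, 0.25)   # 9.25 J
limit = break_even_idle_w(24, 0.5, 34, 0.25)  # 14.0 W

print(half_speed, full_speed, limit)
```

With these particular numbers race-to-idle wins as long as idle power stays below 14 W. Purely as an illustration, an idle draw of 17 W would give 20.5 J at half speed against 21.25 J at full speed, so the slower setting would come out ahead.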
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:
> > I disagree with you here. for each frequency setting you can say how
> > much power the cpu/system is expected to use (especially as a
> > percentage of the full power mode). creating this value requires you to
> > take two things into account, the voltage you are running things at (by
> > far the biggest effect), and the minor difference that the frequency
> > makes at that voltage (possibly small enough to ignore entirely).
> >
> > the API I proposed has no problem with there being multiple modes that
> > have the same %power but with different %capability numbers.
>
> how do you deal with the "power at idle" vs "power at full load".. you
> need both at each level to pick the best one, as well as relative
> performance etc.

what I was thinking was to use power at full load for the power rating of
each mode.

> > the fact that you want to run at the max frequency for a given voltage is
>
> no I want to run at the max frequency PERIOD. On just about any PC, it's
> more power efficient to go full speed when executing code, and then idle
> for as long as you can. (there are some second order effects that make
> this a bit more complex, but as first order approach it's a sound
> approach). Voltage follows, and that's fine.

this seems to be contradicted by the fact that AMD is listing the ability
for each core to run at a different clock speed on the new 4-core chips as
an advantage. if you always want to run at the max frequency PERIOD then
why bother engineering the ability to do otherwise? (as opposed to just
shutting down unused cores)

another example is the 80 core demo chip that Intel has been making press
about. it can run at 1Tflop on 25W of power and 2Tflop at 150W of power.
running at max freq for a 1Tflop workload would have you eating ~75W of
power (the numbers may be off, I'm going from memory, but the cost in
power of doubling the speed was _far_ more than double the power
requirements)

> > this strategy should work well on the normal unpredictable workload
> > that most people deal with, but there are some cases where the workload
> > becomes pretty predictable (media players for example) where there
> > really is less variation, and a need for a constant availability of the
> > cpu, so it may actually save a smidge of power to run below the highest
> > freq that the voltage allows rather than running faster and being idle
> > more cycles.
>
> that actually is the example showcase of race-to-idle where you
> absolutely want to run at the highest frequency..

only if the transitions don't cost anything significant, and the
computation capacity per watt of power is the same at all frequencies. the
chip performance numbers I've been seeing (which I admit are mostly
embedded datasheets) indicate that neither of these hold true.

David Lang
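The ~75W figure for the 80-core example follows from a duty-cycle average. A minimal Python sketch, assuming the chip draws nothing while idle (the message gives no idle figure) and using the from-memory wattages quoted above; `average_power` is a name invented here:

```python
def average_power(busy_w, idle_w, duty):
    """Average draw in watts when busy for `duty` of the time, idle otherwise."""
    return busy_w * duty + idle_w * (1.0 - duty)


# Sustained 1 Tflop workload, two ways of serving it.
steady = average_power(busy_w=25, idle_w=0, duty=1.0)   # run at 1 Tflop all the time
race = average_power(busy_w=150, idle_w=0, duty=0.5)    # run at 2 Tflop half the time

print(steady, race)
```

Any non-zero idle power only widens the gap in favour of the slower mode, since the steady case never idles.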
Re: [linux-pm] Power Management framework proposal
> I disagree with you here. for each frequency setting you can say how much
> power the cpu/system is expected to use (especially as a percentage of the
> full power mode). creating this value requires you to take two things into
> account, the voltage you are running things at (by far the biggest
> effect), and the minor difference that the frequency makes at that voltage
> (possibly small enough to ignore entirely).
>
> the API I proposed has no problem with there being multiple modes that
> have the same %power but with different %capability numbers.

how do you deal with the "power at idle" vs "power at full load".. you
need both at each level to pick the best one, as well as relative
performance etc.

>
> I'm willing to bet that the current cpufreq software just looks at the
> voltage as the value that tells you how much power the thing is going to
> use at that setting

it doesn't.

>
> the fact that you want to run at the max frequency for a given voltage is

no I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as first order approach it's a sound
approach). Voltage follows, and that's fine.

> a reasonable strategy, but it's a power saving _strategy_, not a
> capability of the hardware and the API I'm mentioning should be enough to
> let you pick the highest performance setting that has the same power
> rating as the minimum performance you need (or for that matter to go one
> step further and go with the most efficient setting in terms of
> performance/power that has a performance number higher than what you need,
> which could actually be better)

why would I care about voltage? Most PCs don't expose it, and that's fine,
they can switch to the voltage needed REALLY quickly (single or double
digit microseconds). PCs in fact only expose numbered states (P0 to P7 at
most), and some number that you can use to show the user, but doesn't mean
anything beyond that. Some people interpret it as "frequency", and that's
nice, but it doesn't really mean that. You really don't know anything
beyond that and that's ok.

As I said before, as a general strategy you want "highest speed when
running code" for race-to-idle, with some 2nd order effects for when you
execute code really shortly coming out of idle; in which case you don't
want to do a voltage transition twice (most cpus have the idle voltage be
the lowest-execute voltage as well).

> this strategy should work well on the normal unpredictable workload that
> most people deal with, but there are some cases where the workload becomes
> pretty predictable (media players for example) where there really is less
> variation, and a need for a constant availability of the cpu, so it may
> actually save a smidge of power to run below the highest freq that the
> voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:
> > > > son anyway)
> > >
> > > I don't think you have got it right: the only info being passed is
> > > the standard cpufreq list of frequencies; everything else is part of
> > > the cpufreq driver.
> >
> > to make the decisions the software making the decision needs to know
> > how much power would be used at each freq setting.
>
> power used at a certain frequency is not a single variable. In fact, on
> most laptops and other similarly power aware devices, it's in fact better
> for power consumption to always go to the maximum frequency as quickly as
> possible, so that you can be idle for the longest possible time after
> that. Good luck finding a generic way to represent such things in a
> (userspace) interface

I disagree with you here. for each frequency setting you can say how much
power the cpu/system is expected to use (especially as a percentage of the
full power mode). creating this value requires you to take two things into
account, the voltage you are running things at (by far the biggest
effect), and the minor difference that the frequency makes at that voltage
(possibly small enough to ignore entirely).

the API I proposed has no problem with there being multiple modes that
have the same %power but with different %capability numbers.

I'm willing to bet that the current cpufreq software just looks at the
voltage as the value that tells you how much power the thing is going to
use at that setting

the fact that you want to run at the max frequency for a given voltage is
a reasonable strategy, but it's a power saving _strategy_, not a
capability of the hardware and the API I'm mentioning should be enough to
let you pick the highest performance setting that has the same power
rating as the minimum performance you need (or for that matter to go one
step further and go with the most efficient setting in terms of
performance/power that has a performance number higher than what you need,
which could actually be better)

the fact that you currently want to use this strategy doesn't mean that
the other possible modes don't exist, and even if you don't use them now
they should be available within the API (including the cpufreq api)

this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.

David Lang
Re: [linux-pm] Power Management framework proposal
> >> son anyway)
> >
> > I don't think you have got it right: the only info being passed is the
> > standard cpufreq list of frequencies; everything else is part of the
> > cpufreq driver.
>
> to make the decisions the software making the decision needs to know how
> much power would be used at each freq setting.

power used at a certain frequency is not a single variable. In fact, on
most laptops and other similarly power aware devices, it's in fact better
for power consumption to always go to the maximum frequency as quickly as
possible, so that you can be idle for the longest possible time after
that. Good luck finding a generic way to represent such things in a
(userspace) interface
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Igor Stoppa wrote: On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote: On Sun, 22 Jul 2007, Igor Stoppa wrote: [snip] Could you elaborate on how your proposal is incompatible with enhancing the clock framework? It's not that I think it's incompatible with any existing powersaving tools (in fact I hope it's not) it's that I think that this (or something similar) could be made to cover all thevarious power options instead of CPU's having one interface, ACPI capable drivers having another, embeded devices presenting a third, etc this was triggered by the mess of different function calls for different purposes that are used for the suspend functions where you have a bunch of different functions that are each supposed to be called at a specific time from a specific mode during the suspend process. with all these different functions driver writes tend to not bother implementing any of them, and it seems like there is a fairly steady stream of new functions that end up being needed. the initial intent was to just change this into a generic set of calls that every driver writer would implement the minimum set of, and make it trivially extensable to future capabilities of hardware. Every now and then there is some attempt to find One solution to bind them all: x86, SoC, ACPI ... you name it. this is another one. I'd be happy to get pointers to prior ones to learn from. Unfortunately, while it's true that there are significant similarities, there are also notable differencies; as far as i know the USB subsystem is the one that gets closer to what we have in the embedded arena, since it can have complex cases of parent-child powering and wakeup. this API is not trying to represent the parent-child hierarchy. as far as I know that's documented in sysfs (or is supposed to be). this is just an attempt to make it so that as you are going through the hierarchy you don't have to use vastly different API's to control the different functions. 
I suspect that most (if not all) of the previous One Solutions have tried to completely handle all the details of their original case, and then branch out to the other cases. this attempt is working from the other direction. the user of this API doesn't care how something is done, it just wants to know what's possible and how to tell the system to switch modes. other then just me searching through the lists, do you have a pointer to some of the differences between the different types that are seen as being so large that they can't be unified? while I was describing the issues to my roomates over dinner I realized that the same type of functions are needed for the CPU clocks. if you have an accepted framework in place there that can do what I described, please consider extending it to cover other types of devices and drivers. That is not part of the fw: the fw simply expresses parent-child clock distribution and keeps usecounts so that unused clocks are automatically gated. The actual clock tree description is platform/arch/board specific and doesn't affect the framework. You can just roll your own version for x86 by providing a description of the methods used to switch on/off every individual clock on your board. So what you are asking for is that somebody writes an x86 version of the clock fw. this is more then just setting the clocks on everything (although setting clocks seems like it fits well into the model) becouse some power modes are not easily represented just as clocks. As for latencies, well, only few clocks really have significant impact. Most notably the main system oscillator. Everything else has 0 latency since it ends up in opening/closing a clock gate. 
Powering device on/off will certainly introduce more latency, but either the powering is supported by the hw, to make it quick or it has to go through most, if not all of he usual initialisation sequence; in that case it probably makes sense to avoid controlling it from kernelspace, since it will be slow and won't require dedcisions made with us precision. and many devices support both a quick almost-off mode and a slow almost-off mode (as well as a completely off mode), with the slow mode eating less power, but takeing longer to wake up from. that's the reason for providing the matrix to let the program makeing the decision decide if it's worth the time delays to get the power savings as I note in anther message, this SPI isn't intended to be strictly kernelspace or strictly userspace. for the ondemand speed governer you are changing the settings quickly and so probably want to do so in the kernel, however some people may be satisfied with slower controls and so could have them in userspace (an extreme example of this would be turning off wireless cards that aren't in use to save power and improve security) I think you are passing too much info up the chain to the part makeing the decision (that part doesn't need to
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote:
> On Sun, 22 Jul 2007, Igor Stoppa wrote:

[snip]

> > Could you elaborate on how your proposal is incompatible with enhancing
> > the clock framework?
>
> It's not that I think it's incompatible with any existing powersaving
> tools (in fact I hope it's not)
>
> it's that I think that this (or something similar) could be made to cover
> all the various power options instead of CPUs having one interface, ACPI
> capable drivers having another, embedded devices presenting a third, etc.
>
> this was triggered by the mess of different function calls for different
> purposes that are used for the suspend functions, where you have a bunch of
> different functions that are each supposed to be called at a specific time
> from a specific mode during the suspend process. with all these different
> functions driver writers tend to not bother implementing any of them, and
> it seems like there is a fairly steady stream of new functions that end up
> being needed. the initial intent was to just change this into a generic
> set of calls that every driver writer would implement the minimum set of,
> and make it trivially extensible to future capabilities of hardware.

Every now and then there is some attempt to find One solution to bind
them all: x86, SoC, ACPI ... you name it.

Unfortunately, while it's true that there are significant similarities,
there are also notable differences; as far as I know the USB subsystem is
the one that gets closest to what we have in the embedded arena, since it
can have complex cases of parent-child powering and wakeup.

> one other effect of this is that driver writers would see the mode
> interface from day one rather than just completely ignoring it. right now
> device driver authors tend to think "why worry about figuring out how to
> implement 'prepare to suspend', 'late suspend', 'suspend', 'quiesce but
> don't suspend', etc" if they aren't really interested in working on
> suspend; it's not really clear what each of these should do even after
> reading the docs on it. however listing the power modes that a device can
> be in, documenting the cost of switching between them, and implementing
> the transition is something very straightforward for the device driver
> author to do (and they don't have to worry about the details of how and
> when the various modes get used, that's up to the suspend/powersaving
> software to figure out). as such I expect the driver support for
> powersaving modes to improve. in fact, I expect that some driver writers
> will implement a whole bunch of modes, just to show off the features of
> the hardware. and even if nothing uses the modes right now at least they
> are implemented and documented for future use (and it should be trivial to
> have a test routine that just runs every driver you have hardware for
> through every mode transition to make sure that they all work, so the less
> commonly used modes shouldn't bitrot too badly)

What you are saying can be summarised as making the driver model more
expressive.

> while I was describing the issues to my roommates over dinner I realized
> that the same type of functions are needed for the CPU clocks.
>
> if you have an accepted framework in place there that can do what I
> described, please consider extending it to cover other types of devices
> and drivers.

That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.

As for latencies, well, only a few clocks really have a significant
impact, most notably the main system oscillator. Everything else has 0
latency since it ends up in opening/closing a clock gate.

Powering a device on/off will certainly introduce more latency, but either
the powering is supported by the hw, to make it quick, or it has to go
through most, if not all, of the usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require decisions made with microsecond
precision.

> I want sanity and functionality far more than credit :-)

I want to avoid redesigning the wheel: the current version is not round
yet, but re-starting from a triangle every time is far less appealing.

> thanks for the link. I've read through it, and it looks like there is a
> lot of the same ideas in your proposal.

Unless some new hw/technology shows up, I'm afraid the available set of
ideas is very limited :-)

> I think you are passing too much
> info up the chain to the part making the decision (that part doesn't need
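As a rough illustration of the usecount idea described above (parent-child clock distribution, with unused clocks gated automatically), here is a minimal sketch. It is not the actual clock framework code: the `Clock` class, its methods, and the clock names are all made up for the example, and it is written in Python only to keep it short.

```python
class Clock:
    """One node in a hypothetical clock tree (illustrative names only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usecount = 0

    @property
    def gated(self):
        # A clock with no users is considered gated (switched off).
        return self.usecount == 0

    def enable(self):
        # The first user of a clock also needs the parent to be running.
        if self.usecount == 0 and self.parent is not None:
            self.parent.enable()
        self.usecount += 1

    def disable(self):
        if self.usecount == 0:
            raise RuntimeError(f"{self.name}: unbalanced disable")
        self.usecount -= 1
        # Dropping the last user releases the parent as well.
        if self.usecount == 0 and self.parent is not None:
            self.parent.disable()


# Hypothetical board description: oscillator -> PLL -> two peripheral clocks.
osc = Clock("sys_osc")
pll = Clock("pll", parent=osc)
uart = Clock("uart_clk", parent=pll)
mmc = Clock("mmc_clk", parent=pll)

uart.enable()
mmc.enable()
uart.disable()
still_running = not pll.gated        # mmc_clk still needs the PLL
mmc.disable()
all_gated = osc.gated and pll.gated  # nothing uses the tree any more
```

The board-specific part in this sketch is only the tree description at the bottom; the counting logic itself does not change between platforms, which is the point being made above.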
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Igor Stoppa wrote:

> Hi,
> On Sat, 2007-07-21 at 23:49 -0700, ext [EMAIL PROTECTED] wrote:
> > I'm deliberately breaking the threading on this so that people who have
> > tuned out the hibernation thread can take a look at this.
> >
> > below is the proposal that I made at the bottom of one of the posts on
> > the hibernation thread.
>
> I have the impression that you are trying to describe a mix of the clock
> and latency frameworks.
>
> Could you elaborate on how your proposal is incompatible with enhancing
> the clock framework?

It's not that I think it's incompatible with any existing powersaving
tools (in fact I hope it's not)

it's that I think that this (or something similar) could be made to cover
all the various power options instead of CPUs having one interface, ACPI
capable drivers having another, embedded devices presenting a third, etc.

this was triggered by the mess of different function calls for different
purposes that are used for the suspend functions, where you have a bunch of
different functions that are each supposed to be called at a specific time
from a specific mode during the suspend process. with all these different
functions driver writers tend to not bother implementing any of them, and
it seems like there is a fairly steady stream of new functions that end up
being needed. the initial intent was to just change this into a generic
set of calls that every driver writer would implement the minimum set of,
and make it trivially extensible to future capabilities of hardware.

one other effect of this is that driver writers would see the mode
interface from day one rather than just completely ignoring it. right now
device driver authors tend to think "why worry about figuring out how to
implement 'prepare to suspend', 'late suspend', 'suspend', 'quiesce but
don't suspend', etc" if they aren't really interested in working on
suspend; it's not really clear what each of these should do even after
reading the docs on it. however listing the power modes that a device can
be in, documenting the cost of switching between them, and implementing
the transition is something very straightforward for the device driver
author to do (and they don't have to worry about the details of how and
when the various modes get used, that's up to the suspend/powersaving
software to figure out). as such I expect the driver support for
powersaving modes to improve. in fact, I expect that some driver writers
will implement a whole bunch of modes, just to show off the features of
the hardware. and even if nothing uses the modes right now at least they
are implemented and documented for future use (and it should be trivial to
have a test routine that just runs every driver you have hardware for
through every mode transition to make sure that they all work, so the less
commonly used modes shouldn't bitrot too badly)

while I was describing the issues to my roommates over dinner I realized
that the same type of functions are needed for the CPU clocks.

if you have an accepted framework in place there that can do what I
described, please consider extending it to cover other types of devices
and drivers. I want sanity and functionality far more than credit :-)

David Lang

> It looks like you are proposing a brand new shiny thing that frankly I
> would be happy to leave alone, unless it is crystal clear that the clock
> fw cannot be improved.
>
> The clock fw is used for OMAP and other architectures (including SH,
> iirc) and so far it has provided very good support for our power
> management needs (Nokia 770 and N800).
>
> Currently we are working on DVFS for OMAP2 (see slides presented at the
> linux-pm summit for OLS 2007 http://tinyurl.com/28tact ) and even if the
> current prototype is not actively involving the clock fw, our final goal
> is to make it capable of supporting atomic transactions for changing the
> core parameters.

thanks for the link. I've read through it, and it looks like there is a
lot of the same ideas in your proposal. I think you are passing too much
info up the chain to the part making the decision (that part doesn't need
to know the details of the voltage/freq choices, the %power/%capability
numbers I suggested are in many ways more what they are making decisions
on anyway)

in the slideshow you list, in the sequence of changing the cpu speed, pre
and post notifications to drivers. what exactly are the drivers expected
to do with the notification? are you asking them to pause and then
re-initialize for the new power level?

> OMAP3 will require a suspend to ram implementation where the content of
> system memory is retained, while parts or all of the SoC are switched
> off.
>
> The plan is still to have a clock fw based implementation (plus
> interaction with the power rails, of course).
>
> I think these are good examples of the non-ACPI systems you are
> mentioning.

yes, I think they are.

> To make any proposal that has some chance of being accepted, you have to
> compare it against the
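To make the "list the modes, document the switching cost, implement the transition" idea from this message a little more concrete, here is a small sketch of what such a mode table and the suggested "run every mode transition" test routine might look like. Everything in it is hypothetical: the `Mode` and `Device` names, the fields, and the microsecond cost table are assumptions made for illustration, not an interface anyone in the thread actually specified.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Mode:
    name: str
    power_pct: int       # rough % of full-power consumption
    capability_pct: int  # rough % of full capability


class Device:
    """Hypothetical driver-side view: a mode list plus a transition-cost table."""

    def __init__(self, name, modes, cost_us):
        self.name = name
        self.modes = {m.name: m for m in modes}
        self.cost_us = cost_us  # (src, dst) -> switch time in microseconds
        self.current = modes[0].name

    def set_mode(self, target):
        # A transition the driver did not document is treated as unsupported.
        if (self.current, target) not in self.cost_us:
            raise ValueError(f"{self.name}: no transition {self.current} -> {target}")
        self.current = target


def exercise_all_transitions(dev):
    """Try every ordered pair of modes and return the pairs that failed."""
    failures = []
    for src, dst in permutations(dev.modes, 2):
        dev.current = src  # force the starting mode for the test
        try:
            dev.set_mode(dst)
        except ValueError:
            failures.append((src, dst))
    return failures


modes = [Mode("on", 100, 100), Mode("low", 40, 50), Mode("off", 0, 0)]
full = {(a.name, b.name): 100 for a in modes for b in modes if a != b}
nic = Device("nic0", modes, full)
# Same device, but the driver "forgot" the off -> low transition.
broken = Device("nic1", modes, {k: v for k, v in full.items() if k != ("off", "low")})

ok_failures = exercise_all_transitions(nic)
broken_failures = exercise_all_transitions(broken)
```

The caller only sees mode names, the two percentages, and the cost table; how a transition is carried out stays inside the driver, which matches the split of responsibilities argued for above.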
Re: [linux-pm] Power Management framework proposal
Hi,

On Sat, 2007-07-21 at 23:49 -0700, ext [EMAIL PROTECTED] wrote:
> I'm deliberately breaking the threading on this so that people who have
> tuned out the hibernation thread can take a look at this.
>
> below is the proposal that I made at the bottom of one of the posts on the
> hibernation thread.

I have the impression that you are trying to describe a mix of the clock
and latency frameworks.

Could you elaborate on how your proposal is incompatible with enhancing
the clock framework?

It looks like you are proposing a brand new shiny thing that frankly I
would be happy to leave alone, unless it is crystal clear that the clock
fw cannot be improved.

The clock fw is used for OMAP and other architectures (including SH, iirc)
and so far it has provided very good support for our power management
needs (Nokia 770 and N800).

Currently we are working on DVFS for OMAP2 (see slides presented at the
linux-pm summit for OLS 2007 http://tinyurl.com/28tact ) and even if the
current prototype is not actively involving the clock fw, our final goal
is to make it capable of supporting atomic transactions for changing the
core parameters.

OMAP3 will require a suspend to ram implementation where the content of
system memory is retained, while parts or all of the SoC are switched off.

The plan is still to have a clock fw based implementation (plus
interaction with the power rails, of course).

I think these are good examples of the non-ACPI systems you are
mentioning.

To make any proposal that has some chance of being accepted, you have to
compare it against the existing solution, explaining:

-what it is bringing in terms of new functionalities
-how it is different
-why the current implementation cannot simply be enhanced

You can refer to the linux-pm archives for examples of failed attempts
over the last year or so, just search for "framework" in the subject.

--
Cheers, Igor

Igor Stoppa <[EMAIL PROTECTED]>
(Nokia Multimedia - CP - OSSO / Helsinki, Finland)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Igor Stoppa wrote:

> On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote:
> > On Sun, 22 Jul 2007, Igor Stoppa wrote:
>
> [snip]
>
> > > Could you elaborate on how your proposal is incompatible with
> > > enhancing the clock framework?
> >
> > It's not that I think it's incompatible with any existing powersaving
> > tools (in fact I hope it's not)
> >
> > it's that I think that this (or something similar) could be made to
> > cover all the various power options instead of CPUs having one
> > interface, ACPI capable drivers having another, embedded devices
> > presenting a third, etc.
> >
> > this was triggered by the mess of different function calls for
> > different purposes that are used for the suspend functions, where you
> > have a bunch of different functions that are each supposed to be called
> > at a specific time from a specific mode during the suspend process.
> > with all these different functions driver writers tend to not bother
> > implementing any of them, and it seems like there is a fairly steady
> > stream of new functions that end up being needed. the initial intent
> > was to just change this into a generic set of calls that every driver
> > writer would implement the minimum set of, and make it trivially
> > extensible to future capabilities of hardware.
>
> Every now and then there is some attempt to find One solution to bind
> them all: x86, SoC, ACPI ... you name it.

this is another one. I'd be happy to get pointers to prior ones to learn
from.

> Unfortunately, while it's true that there are significant similarities,
> there are also notable differences; as far as I know the USB subsystem
> is the one that gets closest to what we have in the embedded arena,
> since it can have complex cases of parent-child powering and wakeup.

this API is not trying to represent the parent-child hierarchy. as far as
I know that's documented in sysfs (or is supposed to be). this is just an
attempt to make it so that as you are going through the hierarchy you
don't have to use vastly different APIs to control the different
functions.

I suspect that most (if not all) of the previous One Solutions have tried
to completely handle all the details of their original case, and then
branch out to the other cases. this attempt is working from the other
direction. the user of this API doesn't care how something is done, it
just wants to know what's possible and how to tell the system to switch
modes.

other than just me searching through the lists, do you have a pointer to
some of the differences between the different types that are seen as being
so large that they can't be unified?

> > while I was describing the issues to my roommates over dinner I
> > realized that the same type of functions are needed for the CPU clocks.
> >
> > if you have an accepted framework in place there that can do what I
> > described, please consider extending it to cover other types of devices
> > and drivers.
>
> That is not part of the fw: the fw simply expresses parent-child clock
> distribution and keeps usecounts so that unused clocks are automatically
> gated.
>
> The actual clock tree description is platform/arch/board specific and
> doesn't affect the framework. You can just roll your own version for x86
> by providing a description of the methods used to switch on/off every
> individual clock on your board.
>
> So what you are asking for is that somebody writes an x86 version of the
> clock fw.

this is more than just setting the clocks on everything (although setting
clocks seems like it fits well into the model) because some power modes
are not easily represented just as clocks.

> As for latencies, well, only a few clocks really have a significant
> impact, most notably the main system oscillator. Everything else has 0
> latency since it ends up in opening/closing a clock gate.
>
> Powering a device on/off will certainly introduce more latency, but
> either the powering is supported by the hw, to make it quick, or it has
> to go through most, if not all, of the usual initialisation sequence; in
> that case it probably makes sense to avoid controlling it from
> kernelspace, since it will be slow and won't require decisions made with
> microsecond precision.

and many devices support both a quick almost-off mode and a slow
almost-off mode (as well as a completely off mode), with the slow mode
eating less power, but taking longer to wake up from. that's the reason
for providing the matrix: to let the program making the decision decide if
it's worth the time delays to get the power savings.

as I note in another message, this API isn't intended to be strictly
kernelspace or strictly userspace. for the ondemand speed governor you are
changing the settings quickly and so probably want to do so in the kernel,
however some people may be satisfied with slower controls and so could
have them in userspace (an extreme example of this would be turning off
wireless cards that aren't in use to save power and improve security)

> > I think you are passing too much info up the chain to the part making
> > the decision (that part doesn't need to
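The trade-off described above, a quick almost-off mode against a slower one that eats less power, can be sketched as a small energy comparison over an expected idle period. This is only an illustration under made-up assumptions: the mode names, the milliwatt and microsecond figures, and the simplification that the device draws full power for the whole wake-up time are all invented for the example.

```python
from collections import namedtuple

# Hypothetical idle modes: residual power draw and time needed to wake up.
IdleMode = namedtuple("IdleMode", "name power_mw wake_us")

ON_POWER_MW = 1000  # assumed draw while fully on (and while waking up)

MODES = [
    IdleMode("on", 1000, 0),
    IdleMode("quick-almost-off", 200, 100),
    IdleMode("slow-almost-off", 50, 10_000),
    IdleMode("off", 0, 500_000),
]


def idle_energy(mode, idle_us):
    """Energy (mW * us, i.e. nJ) spent over an idle gap of idle_us."""
    # Time in the low-power state, plus wake-up time charged at full power.
    return mode.power_mw * (idle_us - mode.wake_us) + ON_POWER_MW * mode.wake_us


def best_idle_mode(modes, idle_us, max_wake_us):
    """Lowest-energy mode whose wake-up time fits the gap and the latency limit."""
    usable = [m for m in modes if m.wake_us <= max_wake_us and m.wake_us <= idle_us]
    return min(usable, key=lambda m: idle_energy(m, idle_us))


short_gap = best_idle_mode(MODES, idle_us=20_000, max_wake_us=20_000).name
long_gap = best_idle_mode(MODES, idle_us=100_000, max_wake_us=20_000).name
```

With these invented numbers the quick mode wins for the 20 ms gap and the slow mode wins for the 100 ms gap, which is the kind of decision the transition matrix is meant to enable, whether the decision maker lives in the kernel or in userspace.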
Re: [linux-pm] Power Management framework proposal
> > > [...]s on anyway)
> >
> > I don't think you have got it right: the only info being passed is the
> > standard cpufreq list of frequencies; everything else is part of the
> > cpufreq driver.
>
> to make the decisions the software making the decision needs to know how
> much power would be used at each freq setting.

power used at a certain frequency is not a single variable. In fact, on
most laptops and other similarly power-aware devices, it's better for
power consumption to always go to the maximum frequency as quickly as
possible, so that you can be idle for the longest possible time after
that. Good luck finding a generic way to represent such things in a
(userspace) interface.
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:

> > > > [...]s on anyway)
> > >
> > > I don't think you have got it right: the only info being passed is
> > > the standard cpufreq list of frequencies; everything else is part of
> > > the cpufreq driver.
> >
> > to make the decisions the software making the decision needs to know
> > how much power would be used at each freq setting.
>
> power used at a certain frequency is not a single variable. In fact, on
> most laptops and other similarly power-aware devices, it's better for
> power consumption to always go to the maximum frequency as quickly as
> possible, so that you can be idle for the longest possible time after
> that. Good luck finding a generic way to represent such things in a
> (userspace) interface.

I disagree with you here. for each frequency setting you can say how much
power the cpu/system is expected to use (especially as a percentage of the
full power mode). creating this value requires you to take two things into
account: the voltage you are running things at (by far the biggest
effect), and the minor difference that the frequency makes at that voltage
(possibly small enough to ignore entirely). the API I proposed has no
problem with there being multiple modes that have the same %power but with
different %capability numbers.

I'm willing to bet that the current cpufreq software just looks at the
voltage as the value that tells you how much power the thing is going to
use at that setting.

the fact that you want to run at the max frequency for a given voltage is
a reasonable strategy, but it's a power saving _strategy_, not a
capability of the hardware, and the API I'm mentioning should be enough to
let you pick the highest performance setting that has the same power
rating as the minimum performance you need (or for that matter to go one
step further and go with the most efficient setting in terms of
performance/power that has a performance number higher than what you need,
which could actually be better)

the fact that you currently want to use this strategy doesn't mean that
the other possible modes don't exist, and even if you don't use them now
they should be available within the API (including the cpufreq api)

this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.

David Lang
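The two selection strategies described in this message can be sketched as follows. This is an illustration only: the `Mode` tuple, the function names, and the %power/%capability figures are invented for the example and are not taken from cpufreq or from any real hardware.

```python
from collections import namedtuple

# Hypothetical mode table; the numbers are made up for illustration.
Mode = namedtuple("Mode", "name power_pct capability_pct")

MODES = [
    Mode("low-voltage-slow", 40, 30),
    Mode("low-voltage-fast", 40, 50),  # same %power, higher %capability
    Mode("mid", 80, 75),
    Mode("full", 100, 100),
]


def cheapest_then_fastest(modes, need_pct):
    """Find the lowest power rating that meets the need, then take the
    highest-capability mode sharing that power rating."""
    ok = [m for m in modes if m.capability_pct >= need_pct]
    floor = min(m.power_pct for m in ok)
    return max((m for m in ok if m.power_pct == floor),
               key=lambda m: m.capability_pct)


def most_efficient(modes, need_pct):
    """Among modes that meet the need, take the best capability per unit power."""
    ok = [m for m in modes if m.capability_pct >= need_pct]
    return max(ok, key=lambda m: m.capability_pct / m.power_pct)


light_a = cheapest_then_fastest(MODES, 25).name
light_b = most_efficient(MODES, 25).name
heavy_a = cheapest_then_fastest(MODES, 60).name
heavy_b = most_efficient(MODES, 60).name
```

For the light load both strategies land on the faster of the two low-voltage modes; for the heavier load they diverge with these particular numbers, which shows that the choice is a policy applied on top of the mode list rather than a property of the hardware description.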
Re: [linux-pm] Power Management framework proposal
> I disagree with you here. for each frequency setting you can say how
> much power the cpu/system is expected to use (especially as a percentage
> of the full power mode). creating this value requires you to take two
> things into account: the voltage you are running things at (by far the
> biggest effect), and the minor difference that the frequency makes at
> that voltage (possibly small enough to ignore entirely). the API I
> proposed has no problem with there being multiple modes that have the
> same %power but with different %capability numbers.

how do you deal with the power at idle vs power at full load? you need
both at each level to pick the best one, as well as relative performance
etc.

> I'm willing to bet that the current cpufreq software just looks at the
> voltage as the value that tells you how much power the thing is going to
> use at that setting

it doesn't.

> the fact that you want to run at the max frequency for a given voltage
> is

no, I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as a first order approach it's sound).
Voltage follows, and that's fine.

> a reasonable strategy, but it's a power saving _strategy_, not a
> capability of the hardware, and the API I'm mentioning should be enough
> to let you pick the highest performance setting that has the same power
> rating as the minimum performance you need (or for that matter to go one
> step further and go with the most efficient setting in terms of
> performance/power that has a performance number higher than what you
> need, which could actually be better)

why would I care about voltage? Most PCs don't expose it, and that's fine;
they can switch to the voltage needed REALLY quickly (single or double
digit microseconds). PCs in fact only expose numbered states (P0 to P7 at
most), and some number that you can show to the user but that doesn't mean
anything beyond that. Some people interpret it as frequency, and that's
nice, but it doesn't really mean that. You really don't know anything
beyond that and that's ok. As I said before, as a general strategy you
want the highest speed when running code, for race-to-idle, with some 2nd
order effects for when you execute code only very briefly after coming
out of idle; in which case you don't want to do a voltage transition twice
(most cpus have the idle voltage be the lowest-execute voltage as well).

> this strategy should work well on the normal unpredictable workload that
> most people deal with, but there are some cases where the workload
> becomes pretty predictable (media players for example) where there
> really is less variation, and a need for a constant availability of the
> cpu, so it may actually save a smidge of power to run below the highest
> freq that the voltage allows rather than running faster and being idle
> more cycles.

that actually is the example showcase of race-to-idle where you absolutely
want to run at the highest frequency.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via
http://www.linuxfirmwarekit.org
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:

I disagree with you here. for each frequency setting you can say how much power the cpu/system is expected to use (especially as a percentage of the full power mode). creating this value requires you to take two things into account, the voltage you are running things at (by far the biggest effect), and the minor difference that the frequency makes at that voltage (possibly small enough to ignore entirely). the API I proposed has no problem with there being multiple modes that have the same %power but with different %capability numbers.

how do you deal with the power at idle vs power at full load.. you need both at each level to pick the best one, as well as relative performance etc.

what I was thinking was to use power at full load for the power rating of each mode.

the fact that you want to run at the max frequency for a given voltage is

no I want to run at the max frequency PERIOD. On just about any PC, it's more power efficient to go full speed when executing code, and then idle for as long as you can. (there are some second order effects that make this a bit more complex, but as a first order approach it's a sound approach). Voltage follows, and that's fine.

this seems to be contradicted by the fact that AMD is listing the ability for each core to run at a different clock speed on the new 4-core chips as an advantage.

if you always want to run at the max frequency PERIOD then why bother engineering the ability to do otherwise? (as opposed to just shutting down unused cores)

another example is the 80 core demo chip that Intel has been making press about. it can run at 1Tflop on 25w of power and 2Tflop at 150w of power.
running at max freq for a 1Tflop workload would have you eating ~75w of power (the numbers may be off, I'm going from memory, but the cost in power of doubling the speed was _far_ more than double the power requirements)

this strategy should work well on the normal unpredictable workload that most people deal with, but there are some cases where the workload becomes pretty predictable (media players for example) where there really is less variation, and a need for a constant availability of the cpu, so it may actually save a smidge of power to run below the highest freq that the voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you absolutely want to run at the highest frequency..

only if the transitions don't cost anything significant, and the computation capacity per watt of power is the same at all frequencies. the chip performance numbers I've been seeing (which I admit are mostly embedded datasheets) indicate that neither of these hold true.

David Lang
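The ~75w figure above follows from a simple duty-cycle calculation, sketched here. The function name is made up for this example, and the sketch assumes the chip draws nothing while idle and that switching between running and idle is free; the wattage and Tflop figures are the ones quoted in the message, which the author says are from memory.

```python
def average_power_w(peak_power_w, peak_rate, needed_rate, idle_power_w=0.0):
    """Average power when the work is done in bursts at the peak rate and the
    chip idles the rest of the time (assumes free, instant transitions)."""
    if not 0 <= needed_rate <= peak_rate:
        raise ValueError("needed_rate must lie between 0 and peak_rate")
    duty = needed_rate / peak_rate  # fraction of time spent running
    return duty * peak_power_w + (1.0 - duty) * idle_power_w


# Figures from the message: 2 Tflop at 150 W, 1 Tflop at 25 W.
bursty_w = average_power_w(peak_power_w=150.0, peak_rate=2.0, needed_rate=1.0)
steady_w = 25.0  # running continuously at the 1 Tflop setting

print(f"burst at 2 Tflop, idle half the time: {bursty_w:.0f} W")
print(f"steady at 1 Tflop: {steady_w:.0f} W")
```

Under these assumptions, racing to idle on this chip would cost three times the power of running steadily at the lower setting. Any non-zero idle power would widen the gap further.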
Re: [linux-pm] Power Management framework proposal
On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:

the fact that you want to run at the max frequency for a given voltage is

no I want to run at the max frequency PERIOD. On just about any PC, it's more power efficient to go full speed when executing code, and then idle for as long as you can. (there are some second order effects that make this a bit more complex, but as a first order approach it's a sound approach). Voltage follows, and that's fine.

this seems to be contradicted by the fact that AMD is listing the ability for each core to run at a different clock speed on the new 4-core chips as an advantage.

that's a marketing thing mostly.. they all still run at the same voltage anyway.

if you always want to run at the max frequency PERIOD then why bother engineering the ability to do otherwise? (as opposed to just shutting down unused cores)

multicore changes the rules a little but not all that much. (the idle power is higher if not all cores are idle at the same time. Yet... each core individually trying to be idle as quickly as possible is the best way to get to the highest all cores idle time, unless there is some really special/weird synchronization)

this strategy should work well on the normal unpredictable workload that most people deal with, but there are some cases where the workload becomes pretty predictable (media players for example) where there really is less variation, and a need for a constant availability of the cpu, so it may actually save a smidge of power to run below the highest freq that the voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you absolutely want to run at the highest frequency..

only if the transitions don't cost anything significant,

these are second order effects though. On a pc, the transition costs are quite low (as I said, single or low double digit microseconds).
They are not zero, and that is why you see things like ondemand ramp up only after a little time, as a guesstimate to make sure it's not just a really short lived code execution.

and the computation capacity per watt of power is the same at all frequencies. the chip performance numbers I've been seeing (which I admit are mostly embedded datasheets) indicate that neither of these hold true.

let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W

assume media playback, and a dumb one, that takes half a second to decode a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the power you save by dropping speed... and that is almost all of them in the PC world.

now you can argue that 0.5 seconds is a really really long time, and you'd be right. so for really really short stints (say a timer interrupt) you don't want to change the voltage at all (nor would you want to change the plls to change frequency for that matter). But once you start changing those, you might as well go full speed.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
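The arithmetic in the example above can be reproduced with a few lines. This is only a sketch: the function name is invented here, the wattages are the rounded figures from the message, and it reads the example as 0.5 s of decoding per second of media at half speed and 0.25 s at full speed, with transitions costing nothing.

```python
def energy_j(active_power_w, idle_power_w, busy_s, period_s=1.0):
    """Energy in joules over one period: busy at active power, idle otherwise."""
    if not 0 <= busy_s <= period_s:
        raise ValueError("busy_s must lie between 0 and period_s")
    return busy_s * active_power_w + (period_s - busy_s) * idle_power_w


IDLE_W = 1.0  # power at idle, from the message

# Half speed: 24 W while decoding for 0.5 s of every second.
half_speed_j = energy_j(24.0, IDLE_W, busy_s=0.5)
# Full speed: 34 W while decoding for 0.25 s of every second.
full_speed_j = energy_j(34.0, IDLE_W, busy_s=0.25)

print(f"half speed: {half_speed_j} J per second of media")
print(f"full speed: {full_speed_j} J per second of media")
```

With these inputs full speed uses 3.25 J less per second of media, which is the race-to-idle argument in numbers.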
Re: [linux-pm] Power Management framework proposal
On Sun, 22 Jul 2007, Arjan van de Ven wrote:

On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:

this strategy should work well on the normal unpredictable workload that most people deal with, but there are some cases where the workload becomes pretty predictable (media players for example) where there really is less variation, and a need for a constant availability of the cpu, so it may actually save a smidge of power to run below the highest freq that the voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you absolutely want to run at the highest frequency..

only if the transitions don't cost anything significant,

these are second order effects though. On a pc, the transition costs are quite low (as I said, single or low double digit microseconds).

including pausing all drivers before the transition and unpausing them afterwards?

and the computation capacity per watt of power is the same at all frequencies. the chip performance numbers I've been seeing (which I admit are mostly embedded datasheets) indicate that neither of these hold true.

let me give you a real world example then, and the numbers I'm using are ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W

are these numbers for the CPU itself or for a larger chunk? I could easily see these numbers for motherboard (including CPU and RAM), but it would surprise me if these numbers are for the CPU itself.

I'm used to seeing datasheets that have a much more linear voltage/freq (and therefore a quadratic voltage/power) curve. in some cases the voltage requirements drop faster than the frequency.

assume media playback, and a dumb one, that takes half a second to decode a second of media.
(again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the power you save by dropping speed... and that is almost all of them in the PC world.

if you can idle the system as a whole I agree with you fully. most PC hardware (including the mobile stuff) doesn't change its power consumption much with load. at Usenix there was a presentation (I don't remember if it was by Amazon or Google) about this subject, showing that current PC hardware only goes down to 50% power when idle (short of switching power modes) and that they and other big companies were pushing vendors to improve their hardware, aiming to get the idle power down to 10% (again without suspending anything). so there's some chance that this will change before too long.

now you can argue that 0.5 seconds is a really really long time, and you'd be right. so for really really short stints (say a timer interrupt) you don't want to change the voltage at all (nor would you want to change the plls to change frequency for that matter). But once you start changing those, you might as well go full speed.

this assumes that you can cache 1 second of video, if you have more real-time requirements you have a much harder time (say video conferencing where you don't get the frame until just before you need to display it)

David Lang
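How much the idle power matters to the quoted example can be checked with a small sketch. The function names are invented here, and combining the quoted 34W/24W figures with a 50% or 10% idle level is purely illustrative: the message applies those percentages to whole systems, not to the CPU numbers in the example. The sketch also assumes half speed takes exactly twice as long and that transitions are free.

```python
def energy_j(active_power_w, idle_power_w, busy_s, period_s=1.0):
    """Energy in joules over one period: busy at active power, idle otherwise."""
    return busy_s * active_power_w + (period_s - busy_s) * idle_power_w


def break_even_idle_w(full_power_w, half_power_w):
    """Idle power below which full speed plus idling beats half speed.

    With busy time t at full speed and 2t at half speed, the energy difference
    (half minus full) is t * (2*P_half - P_full - P_idle), so full speed wins
    exactly when P_idle < 2*P_half - P_full.
    """
    return 2.0 * half_power_w - full_power_w


FULL_W, HALF_W = 34.0, 24.0  # figures quoted in the message

print(f"break-even idle power: {break_even_idle_w(FULL_W, HALF_W)} W")

# Illustrative idle levels: 50% and 10% of full power.
for idle_fraction in (0.5, 0.1):
    idle_w = idle_fraction * FULL_W
    half_j = energy_j(HALF_W, idle_w, busy_s=0.5)
    full_j = energy_j(FULL_W, idle_w, busy_s=0.25)
    winner = "full speed" if full_j < half_j else "half speed"
    print(f"idle {idle_w:.1f} W: half {half_j:.2f} J, full {full_j:.2f} J -> {winner}")
```

Under these assumptions the break-even idle power is 14 W, so at an idle level of 50% of full power the half-speed setting comes out slightly ahead, while at 10% racing to idle wins again.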