Re: [linux-pm] Power Management framework proposal

2007-07-29 Thread david

sorry for the delay in responding

On Wed, 25 Jul 2007, Jerome Glisse wrote:

[EMAIL PROTECTED] wrote:

 On Wed, 25 Jul 2007, Jerome Glisse wrote:

>  On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >   For instance on graphics card you could do the following (maybe more):
> > >   -change GPU clock
> > >   -change memory clock
> > >   -disable part of engine
> > >   -disable unit
> > >   i truly don't think you can make a common interface for all this, more
> > >   over there might be constraint on how you can change things (GPU &
> > >   memory clock might need to follow a given ratio). So you definitely
> > >   need knowledge in the user space program to handle this.
> > 
> >   sure you can, just enumerate all the options the driver writer wants to
> >   offer as options. yes this could be a lengthy list, so what?
> 
>  My point was that your interface by trying to fit square pegs into round hole
>  will fail to expose all subtility of each device which might in the end bring
>  to wrong power management decision. So i believe we can't sum up
>  power management to list of mode whose attribute are power consumption
>  & capacity.


 it's possible (which is part of the reason I started the thread), but so
 far there hasn't been anything identified that is a really bad fit.


Tell me how i do this in your model:
GPU/VRAM memory clock change power consumption of the card and
the power consumption is often not a trivial function of both of this parameters
(i even here simplify the problem by omitting pipeline shutdown). So how with
two different separate mode list (one for GPU speed another one for VRAM speed)
can you provide consumption information while this consumption depends on the
others settings. Then if you give as a solution to make only one list you end up
with a more bigger list than previously needed (nrGPUmodes * nrVRAMmodes)
do you expect the user to go through a lengthy list to find what he wants ?
(remember that we will have to add pipeline power off, pll tweaking or many
others way of saving power on such card).


yes I expect that it would be a large list in some conditions. but one 
purpose of this API is to make these options discoverable by software. 
right now nothing can be done at all without driver-specific knowledge. 
even a lengthy list is better than that.


presenting the list to the user directly is a last resort, only for 
experimentation or when nothing else wants to deal with devices of that 
type.


with a description field (which I didn't include initially, but seems 
obviously needed now) it should be fairly easy to create descriptions that 
let the software see that there are multiple factors involved.
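
A rough sketch of what such a combined list with a description field might look like for a driver that pairs GPU and VRAM clocks. Everything in it is an assumption made for illustration: the names, the clock steps, the set of offered pairs and the capacity/power figures are invented, and none of this is an existing kernel interface.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str
    capacity_pct: int   # rough capability relative to the nominal maximum
    approx_power: int   # rough ordering value, not a measured wattage
    description: str    # free text so software can see what the mode changes


# Hypothetical clock steps (MHz) a driver writer might choose to offer.
GPU_CLOCKS = [200, 400]
VRAM_CLOCKS = [300, 600]

# Hypothetical per-pair estimates supplied by the driver writer:
# (gpu, vram) -> (capacity_pct, approx_power). The estimate is given per pair
# because power is not a simple function of either clock alone. Pairs that
# are missing here are not offered (e.g. they break a clock ratio rule).
ESTIMATES = {
    (200, 300): (50, 10),
    (400, 300): (80, 18),
    (400, 600): (100, 25),
}


def enumerate_modes():
    """Return one flat mode list covering every offered GPU/VRAM pair."""
    modes = []
    for gpu, vram in product(GPU_CLOCKS, VRAM_CLOCKS):
        if (gpu, vram) not in ESTIMATES:
            continue  # pair not offered by this hypothetical driver
        capacity, power = ESTIMATES[(gpu, vram)]
        modes.append(Mode(
            name=f"gpu{gpu}_vram{vram}",
            capacity_pct=capacity,
            approx_power=power,
            description=f"gpu={gpu}MHz vram={vram}MHz",
        ))
    return modes


if __name__ == "__main__":
    for mode in enumerate_modes():
        print(mode)
```

The list grows with the number of offered pairs, but software can still walk it and read the description to see that two clocks are involved.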



So by choosing this power consumption as a unit of measure you end up
in non trivial case. There is also the question of overclocking


if the driver supports overclocking then list it in the modes (nothing 
says that % capacity couldn't go over 100% for example)


, and other points already identified where unfortunately a global 
design such as your proposal does not seems to fit properly: local power 
decision (ethernet, wifi card, ... can power down them self is they are 
doing nothings but the place where you can know this is the driver)


if they power themselves down with no notice to the system they should 
power themselves back up with no need for the rest of the system to tell 
them either. so this can either be ignored or presented as a mode between 
off and on that enables this behavior.



, there is also the child/parent relation, how to
estimate power usage (on some configuration one device consumption can
be marginal toward all others things while on other this same device can be
the most power hungry device)... I see all this as bad fit.


ahh, here we see a disconnect. I was not intending for the power field to 
be that exact. there are just too many variables. for example: even for a 
cpu, the power used isn't exactly tied to the clock speed and voltage, the 
mix of instructions that the cpu is running will affect the power it eats, 
sometimes by a significant amount. the field was intended to be an ordering 
factor and an approximation of the power used, so that software could make 
a performance/power tradeoff with a good chance of making a reasonable 
choice.


it's not intended for 'make this laptop use 24w of power instead of 25w of 
power'
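
To illustrate the "ordering factor" idea, here is a minimal sketch of how software might use an approximate power value only to rank modes. The function name, the tuple layout and the example figures are hypothetical and not part of the proposal.

```python
def pick_mode(modes, min_capacity_pct):
    """Pick the lowest-power mode that still offers the requested capacity.

    modes is a list of (name, capacity_pct, approx_power) tuples. approx_power
    is only used to order the candidates, never as an exact wattage.
    Returns None when no mode is capable enough.
    """
    candidates = [m for m in modes if m[1] >= min_capacity_pct]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m[2])


# Hypothetical mode list; capacity above 100% stands for an overclocked mode.
EXAMPLE_MODES = [
    ("off", 0, 0),
    ("half", 50, 24),
    ("full", 100, 34),
    ("overclock", 110, 45),
]

if __name__ == "__main__":
    print(pick_mode(EXAMPLE_MODES, 40))
```

Because only the relative order of the power values matters here, a driver could be off by a few watts without changing the choice.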


>  And there is no way to design an abstraction given that all hw we will have
>  to deal with are too much different and do not follow any standard things
>  (beside ACPI there is other way to save power brightness, gpu/memory
>  clock, pll, ...) so i don't see how one might give a common view of things
>  which are fundamentally different in how they affect consumption (same end
>  result with many different paths leading to it).

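 so you are saying that the power management software must know the details
 of each and every driver, and if you add a new driver you must change the
 power management software before it can do anything (including allowing
 manual control of the modes)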

Re: [linux-pm] Power Management framework proposal

2007-07-27 Thread Pavel Machek
Hi!

> > let me give you a real world example then, and the numbers I'm using are
> > ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
> > just rounded them a little so that the math works out nice.
> > 
> > power at full speed: 34W
> > power at half speed: 24W
> > power at idle: 1W
> 
> I have usually seen different numbers, for example:
> 
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Trust Arjan, modern cpus work as he describes.

> Although this paper speaks about thermal design power instead of power
> consumption, i suppose that it should be roughly equal.
> 
> For example Athlon 64 3700 (ADA3700AEP5AR):
> 
> 2.4 GHz, 1.5 V -> 89 W
> 2.2 GHz, 1.4 V -> 72 W
> 2.0 GHz, 1.3 V -> 53 W
> 1.8 GHz, 1.2 V -> 39 W
> 1.0 GHz, 1.1 V -> 22 W

I guess that means athlon 64 is 'old'.

> Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power
> consumption shows that it is more efficient to be busy for 2 time units
> on 1 GHz than be busy for 1 time unit and be idle for 1 time unit
> on 2 GHz.
> 
> 1 GHz:
> both cores idle:  48 W
> one core busy:    57 W
> two cores busy:   66 W

2 sec decoding video at both cores: 132J

> 2 GHz:
> both cores idle:  54 W
> one core busy:    78 W
> two cores busy:   95 W

1 sec decode @ 2GHz + 1 sec idle @ 1GHz: 143J

So even on your hw the difference is not too big... and take a look at
the numbers from core2duo.

Actually...

4 sec decode @ 1 core @ 1GHz: 57*4 = 228J
1 sec decode @ 2 cores @ 2GHz, then 3 sec idle @ 1GHz: 95 + 48*3 = 95+144 = 239J...

Ok, so it is still a win, but an even smaller one..
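
The comparison can be recomputed from the whole-PC measurements quoted above with a few lines. This is only a sketch: the table layout and function name are made up, and it assumes, as the comparison itself does, that the decode work scales linearly with clock speed and core count.

```python
# Whole-system power draw in watts, taken from the quoted measurements:
# POWER_W[clock_ghz][number_of_busy_cores]
POWER_W = {
    1: {0: 48, 1: 57, 2: 66},
    2: {0: 54, 1: 78, 2: 95},
}


def energy_joules(phases):
    """Sum the energy over (clock_ghz, busy_cores, seconds) phases."""
    return sum(POWER_W[ghz][busy] * secs for ghz, busy, secs in phases)


# Work that takes 2 s on both cores at 1 GHz.
slow_both = energy_joules([(1, 2, 2)])                        # 66*2
fast_then_idle = energy_joules([(2, 2, 1), (1, 0, 1)])        # 95 + 48

# The same amount of work on one core at 1 GHz (4 s), versus both cores
# at 2 GHz for 1 s followed by 3 s idle at 1 GHz.
slow_single = energy_joules([(1, 1, 4)])                      # 57*4
fast_then_long_idle = energy_joules([(2, 2, 1), (1, 0, 3)])   # 95 + 48*3

if __name__ == "__main__":
    print(slow_both, fast_then_idle, slow_single, fast_then_long_idle)
```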
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-25 Thread Jerome Glisse

[EMAIL PROTECTED] wrote:

On Wed, 25 Jul 2007, Jerome Glisse wrote:


On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
 will each plugin have it's own interface? or will you have one interface
 to access the plugins and then the plugins do things behind the scenes?

 I'll bet that the API for the plugins is common, and if so then it could
 be similar to the API that I suggested.


I take here ohm as a reference (this come from my limited understanding of
this daemon so there might be inaccuracy) driver export through HAL
there power management tunning capacity, Then an ohm plugin would use
HAL to give a higher
view of this capacity and also manage policy, preference, permission, ...

Last consumer in power management food chain would be an user interface which
will communicate with ohm (and with all ohm plugin) so desktop writter (gnome,
kde, ...) can write some kind of power management center where each ohm plugin
can have its own panel. So in the end the user got one place to do all its
power management which is the goal i think you are trying to aim.


no. I am talking about the interface to the drivers that things like HAL
would use


Ok, i was just trying to stress that the end result is the same from the
user's point of view.

>  For instance on graphics card you could do the following (maybe more):
>  -change GPU clock
>  -change memory clock
>  -disable part of engine
>  -disable unit
>  i truly don't think you can make a common interface for all this, more
>  over there might be constraint on how you can change things (GPU &
>  memory clock might need to follow a given ratio). So you definitely
>  need knowledge in the user space program to handle this.

 sure you can, just enumerate all the options the driver writer wants to
 offer as options. yes this could be a lengthy list, so what?



My point was that your interface by trying to fit square pegs into round hole
will fail to expose all subtility of each device which might in the end bring
to wrong power management decision. So i believe we can't sum up
power management to list of mode whose attribute are power consumption
& capacity.


it's possible (which is part of the reason I started the thread), but 
so far there hasn't been anything identified that is a really bad fit.



Tell me how i do this in your model:
GPU and VRAM clocks both change the power consumption of the card, and
the power consumption is often not a trivial function of these two
parameters (i even simplify the problem here by omitting pipeline shutdown).
So how, with two separate mode lists (one for GPU speed, another for VRAM
speed), can you provide consumption information when the consumption of one
depends on the setting of the other? And if your solution is to make only one
list, you end up with a much bigger list (nrGPUmodes * nrVRAMmodes). Do you
expect the user to go through a lengthy list to find what he wants?
(remember that we will have to add pipeline power off, pll tweaking and many
other ways of saving power on such a card).

So by choosing power consumption as the unit of measure you end up
with non-trivial cases. There is also the question of overclocking, and other
points already identified where unfortunately a global design such as your
proposal does not seem to fit properly: local power decisions (ethernet, wifi
card, ... can power themselves down if they are doing nothing, but the place
where you can know this is the driver), the child/parent relation, and how to
estimate power usage (on some configurations one device's consumption can
be marginal compared to everything else, while on others this same device can
be the most power hungry device)... I see all this as a bad fit.

And there is no way to design an abstraction given that all hw we will have
to deal with are too much different and do not follow any standard things
(beside ACPI there is other way to save power brightness, gpu/memory
clock, pll, ...) so i don't see how one might give a common view of things
which are fundamentally different in how they affect consumption (same end
result with many different paths leading to it).


so you are saying that the power management software must know the 
details of each and every driver, and if you add a new driver you must 
change the power management software before it can do anything 
(including allowing manual control of the modes)


You have to provide an ohm plugin (in an ohm world) where the policy for
this device will be handled, and this plugin needs to be designed knowing
what the hw exports through HAL. Yes, it's painful, but you don't want to put
policy in the driver, and to do policy you need knowledge of the things you
deal with.

seems to me I heard similar arguments several years ago about the CPU 
speed settings, it turns out that the cpufreq interface works really 
well for them and the software that's controlling things no longer 
needs to know the details of every CPU.
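
As a small illustration of the cpufreq point, here is a sketch of how a tool can read the same sysfs attributes regardless of which CPU driver is loaded. The attribute names are the standard cpufreq sysfs files. The function name and the `root` parameter are assumptions added so the code can be pointed at a test directory, and not every driver provides `scaling_available_frequencies`.

```python
from pathlib import Path

# Standard cpufreq sysfs attributes; some drivers do not provide all of them.
CPUFREQ_FILES = (
    "scaling_governor",
    "scaling_cur_freq",
    "scaling_min_freq",
    "scaling_max_freq",
    "scaling_available_frequencies",
)


def read_cpufreq(cpu=0, root="/sys/devices/system/cpu"):
    """Return the cpufreq attributes of one CPU as a dict of strings.

    Attributes that are missing or unreadable are reported as None, so the
    caller does not need to know which driver sits underneath.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in CPUFREQ_FILES:
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq().items():
        print(f"{key}: {value}")
```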

Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread david

On Wed, 25 Jul 2007, Jerome Glisse wrote:


On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

 On Tue, 24 Jul 2007, Jerome Glisse wrote:

>  On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >   On Mon, 23 Jul 2007, Igor Stoppa wrote:
> > >   again, HAL / OHM / Mobilin
> > 
> >   I was trying to define the lower level interfaces that these tools need.
> >   today they can only know what is possible by reading the source code for
> >   each driver and implementing the driver-specific interfaces nessasary to
> >   set things, I was proposing a common interface that tools like this could
> >   use instead of requiring all the driver-specific knowledge.
> > 
> > 
> >   in a nutshell (and I know this is probably not detailed to be acceptable)
> > 
> >   1. the software needs to know what the interconnects and dependancies
> >   between devices are (supposedly this is provided via sysfs)
> > 
> >   2. the software needs to know what type of device this is (again,
> >   supposedly this is provided via sysfs)
> > 
> >   3. the software needs to know what modes exist for a driver/piece of
> >   hardware. to make any decisions this infomation needs to provide some
> >   information about the capability of the mode and the power consumed in
> >   that mode. in addition there will need to be flags to indicate any
> >   special restrictions of a mode
> > 
> >   4. the software needs to know the cost of switching from any mode to any
> >   other mode. since some transitions will interact with other devices
> >   there will need to be flags to indicate such requirements for specific
> >   transitions.
> > 
> >   5. the software needs to be able to find out what mode a device is in.
> > 
> >   6. the software needs to be able to tell the driver to switch to a
> >   different mode (I think it would be a very good thing if going to a
> >   particular mode was always the same command, no matter what mode it is
> >   currently in)
> > 
> >   7. the software needs to figure out the desire of the user.
> > 
> >   my proposal was addressing items #3-#6. it isn't trying to decide what to
> >   do, simply to allow the software that _is_ trying to decide what to do a
> >   way to find out what it can do.
> > 
> >   David Lang
> 
>  I believe a central place where user can set/change hw state to save
>  power or to increase computational power is definitely a goal to pursue.
>  But i truly think that the OHM approach is the best one ie using plugins
>  so that one can make a plugin specific for each device. The point is that
>  i believe there is no way to do an abstract interface for this and trying to
>  do so will endup doing ugly code and any interface would fail to encompass
>  all possible tweak that might exist for all devices.

 will each plugin have it's own interface? or will you have one interface
 to access the plugins and then the plugins do things behind the scenes?

 I'll bet that the API for the plugins is common, and if so then it could
 be similar to the API that I suggested.
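
Items #3-#6 of the quoted proposal could be sketched roughly as the interface below. This is only an illustration under stated assumptions: the class, method names, units and example values are all hypothetical, and the restriction flags mentioned in items #3 and #4 are left out for brevity.

```python
class ModeDevice:
    """Hypothetical driver-side view of items #3-#6 of the proposal."""

    def __init__(self, modes, switch_cost_ms, initial):
        self._modes = dict(modes)            # name -> (capacity_pct, approx_power)
        self._cost = dict(switch_cost_ms)    # (src, dst) -> switching cost in ms
        self._current = initial

    def list_modes(self):
        """Item 3: which modes exist, with rough capacity and power."""
        return dict(self._modes)

    def transition_cost(self, src, dst):
        """Item 4: the cost of switching from any mode to any other mode."""
        if src == dst:
            return 0
        return self._cost[(src, dst)]

    def current_mode(self):
        """Item 5: which mode the device is in right now."""
        return self._current

    def set_mode(self, name):
        """Item 6: the same call whatever the current mode is."""
        if name not in self._modes:
            raise ValueError(f"unknown mode: {name}")
        self._current = name


# Invented example device with two modes.
example = ModeDevice(
    modes={"off": (0, 0), "on": (100, 5)},
    switch_cost_ms={("off", "on"): 200, ("on", "off"): 10},
    initial="off",
)
```

Policy (item #7) stays outside this class: the caller reads the mode list and the costs and decides what to request.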


I take here ohm as a reference (this come from my limited understanding of
this daemon so there might be inaccuracy) driver export through HAL
there power management tunning capacity, Then an ohm plugin would use
HAL to give a higher
view of this capacity and also manage policy, preference, permission, ...

Last consumer in power management food chain would be an user interface which
will communicate with ohm (and with all ohm plugin) so desktop writter (gnome,
kde, ...) can write some kind of power management center where each ohm plugin
can have its own panel. So in the end the user got one place to do all its
power management which is the goal i think you are trying to aim.


no. I am talking about the interface to the drivers that things like HAL
would use



>  For instance on graphics card you could do the following (maybe more):
>  -change GPU clock
>  -change memory clock
>  -disable part of engine
>  -disable unit
>  i truly don't think you can make a common interface for all this, more
>  over there might be constraint on how you can change things (GPU &
>  memory clock might need to follow a given ratio). So you definitely
>  need knowledge in the user space program to handle this.

 sure you can, just enumerate all the options the driver writer wants to
 offer as options. yes this could be a lengthy list, so what?



My point was that your interface by trying to fit square pegs into round hole
will fail to expose all subtility of each device which might in the end bring
to wrong power management decision. So i believe we can't sum up
power management to list of mode whose attribute are power consumption
& capacity.


it's possible (which is part of the reason I started the thread), but so 
far there hasn't been anything identified that is a really bad fit.

Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread Jerome Glisse

On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Tue, 24 Jul 2007, Jerome Glisse wrote:

> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>  On Mon, 23 Jul 2007, Igor Stoppa wrote:
>> >  again, HAL / OHM / Mobilin
>>
>>  I was trying to define the lower level interfaces that these tools need.
>>  today they can only know what is possible by reading the source code for
>>  each driver and implementing the driver-specific interfaces nessasary to
>>  set things, I was proposing a common interface that tools like this could
>>  use instead of requiring all the driver-specific knowledge.
>>
>>
>>  in a nutshell (and I know this is probably not detailed to be acceptable)
>>
>>  1. the software needs to know what the interconnects and dependancies
>>  between devices are (supposedly this is provided via sysfs)
>>
>>  2. the software needs to know what type of device this is (again,
>>  supposedly this is provided via sysfs)
>>
>>  3. the software needs to know what modes exist for a driver/piece of
>>  hardware. to make any decisions this infomation needs to provide some
>>  information about the capability of the mode and the power consumed in
>>  that mode. in addition there will need to be flags to indicate any
>>  special restrictions of a mode
>>
>>  4. the software needs to know the cost of switching from any mode to any
>>  other mode. since some transitions will interact with other devices
>>  there will need to be flags to indicate such requirements for specific
>>  transitions.
>>
>>  5. the software needs to be able to find out what mode a device is in.
>>
>>  6. the software needs to be able to tell the driver to switch to a
>>  different mode (I think it would be a very good thing if going to a
>>  particular mode was always the same command, no matter what mode it is
>>  currently in)
>>
>>  7. the software needs to figure out the desire of the user.
>>
>>  my proposal was addressing items #3-#6. it isn't trying to decide what to
>>  do, simply to allow the software that _is_ trying to decide what to do a
>>  way to find out what it can do.
>>
>>  David Lang
>
> I believe a central place where the user can set/change hw state to save
> power or to increase computational power is definitely a goal to pursue.
> But i truly think that the OHM approach is the best one, i.e. using plugins
> so that one can make a plugin specific for each device. The point is that
> i believe there is no way to do an abstract interface for this, and trying
> to do so will end up producing ugly code, and any interface would fail to
> encompass all possible tweaks that might exist for all devices.

will each plugin have its own interface? or will you have one interface
to access the plugins and then the plugins do things behind the scenes?

I'll bet that the API for the plugins is common, and if so then it could
be similar to the API that I suggested.


I take ohm as a reference here (this comes from my limited understanding of
this daemon, so there might be inaccuracies): drivers export their power
management tuning capabilities through HAL. An ohm plugin would then use
HAL to give a higher-level view of these capabilities and also manage
policy, preferences, permissions, ...

The last consumer in the power management food chain would be a user
interface which communicates with ohm (and with all ohm plugins), so desktop
writers (gnome, kde, ...) can write some kind of power management center where
each ohm plugin can have its own panel. So in the end the user gets one place
to do all their power management, which is the goal i think you are aiming for.


> For instance on a graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> i truly don't think you can make a common interface for all this; moreover
> there might be constraints on how you can change things (GPU &
> memory clock might need to follow a given ratio). So you definitely
> need knowledge in the user space program to handle this.

sure you can, just enumerate all the options the driver writer wants to
offer as options. yes this could be a lengthy list, so what?



My point was that your interface, by trying to fit square pegs into round
holes, will fail to expose all the subtleties of each device, which might in
the end lead to wrong power management decisions. So i believe we can't
reduce power management to a list of modes whose attributes are power
consumption & capacity.

And there is no way to design an abstraction, given that the hw we will have
to deal with is too varied and does not follow any standard (besides ACPI
there are other ways to save power: brightness, gpu/memory clock, pll, ...),
so i don't see how one might give a common view of things which are
fundamentally different in how they affect consumption (same end result with
many different paths leading to it).

best,
Jerome Glisse

Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread david

On Tue, 24 Jul 2007, Igor Stoppa wrote:


> On Tue, 2007-07-24 at 10:43 +0200, ext Jerome Glisse wrote:
>
>> I believe a central place where the user can set/change hw state to save
>> power or to increase computational power is definitely a goal to pursue.
>> But i truly think that the OHM approach is the best one, i.e. using plugins
>> so that one can make a plugin specific for each device. The point is that
>> i believe there is no way to do an abstract interface for this, and trying
>> to do so will end up producing ugly code, and any interface would fail to
>> encompass all possible tweaks that might exist for all devices.
>>
>> For instance on a graphics card you could do the following (maybe more):
>> -change GPU clock
>> -change memory clock
>> -disable part of engine
>> -disable unit
>> i truly don't think you can make a common interface for all this; moreover
>> there might be constraints on how you can change things (GPU &
>> memory clock might need to follow a given ratio). So you definitely
>> need knowledge in the user space program to handle this.
>
> Even simpler case: LCD backlight can come in many flavors, both in terms
> of brightness levels and fixed amount of current required to keep it ON.
>
> Trying to abstract such details from the decision-making makes little
> sense.
> Isolating that into a separate module, instead, brings the best of both
> worlds:
> -containment of the HW-specific code
> -leveraging every possible, no matter how exotic, power saving mode
> available.


huh??

in the proposal that I made all the HW specific code would be in the 
device driver. I was just proposing a way for the driver to advertise what 
it is able to do.


why would you want to pull the code out into a separate module?

many levels of backlight with different power consumption is trivial to 
do.


backlight 1

mode  %capability      %power
      aka brightness
0       0                0
1     100              100
2      75               75
3      50               50
4      25               25

backlight 2

mode  %capability      %power
      aka brightness
0       0                0
1     100              100
2      80               50
3      60               30
4      40               20
5      25               15

backlight 3

mode  %capability      %power
      aka brightness
0       0                0
1     100              100
2      50               90


why do you think the decision making logic needs to know the details of
the hardware? if you can abstract the details out then the same control 
logic can be used for different things. if you infuse the hardware 
knowledge with the control logic then you have to change the control 
section every time you want to support a new piece of hardware.
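
As a rough illustration of the point above (not part of the original proposal), the sketch below feeds the three backlight tables to one hardware-agnostic selection routine. The names Mode, TABLES and cheapest_mode are made up for this example, modes are identified by their position in each list, and "lowest power that still meets a minimum capability" is only one assumed policy among many.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    capability: int  # percent of full capability (brightness here)
    power: int       # percent of full power draw


# (capability, power) pairs copied from the three backlight tables above.
# A mode is identified by its position in the list (an assumption).
TABLES = [
    [Mode(0, 0), Mode(100, 100), Mode(75, 75), Mode(50, 50), Mode(25, 25)],
    [Mode(0, 0), Mode(100, 100), Mode(80, 50), Mode(60, 30), Mode(40, 20),
     Mode(25, 15)],
    [Mode(0, 0), Mode(100, 100), Mode(50, 90)],
]


def cheapest_mode(modes, min_capability):
    """Return (index, mode) of the lowest-power mode meeting min_capability."""
    candidates = [(i, m) for i, m in enumerate(modes)
                  if m.capability >= min_capability]
    if not candidates:
        raise ValueError("no mode offers the requested capability")
    return min(candidates, key=lambda c: c[1].power)


# The same control logic runs unchanged against every table.
for table in TABLES:
    print(cheapest_mode(table, 50))
```

The selection routine never looks at what kind of hardware produced a table, which is the separation argued for here. Whether two percentages per mode are enough information is exactly what the thread is debating.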


David Lang


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread david

On Tue, 24 Jul 2007, Jerome Glisse wrote:


> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
>>  On Mon, 23 Jul 2007, Igor Stoppa wrote:
>>>  again, HAL / OHM / Mobilin
>>
>>  I was trying to define the lower level interfaces that these tools need.
>>  today they can only know what is possible by reading the source code for
>>  each driver and implementing the driver-specific interfaces necessary to
>>  set things. I was proposing a common interface that tools like this could
>>  use instead of requiring all the driver-specific knowledge.
>>
>>
>>  in a nutshell (and I know this is probably not detailed enough to be acceptable)
>>
>>  1. the software needs to know what the interconnects and dependencies
>>  between devices are (supposedly this is provided via sysfs)
>>
>>  2. the software needs to know what type of device this is (again,
>>  supposedly this is provided via sysfs)
>>
>>  3. the software needs to know what modes exist for a driver/piece of
>>  hardware. to make any decisions this information needs to include the
>>  capability of the mode and the power consumed in that mode. in addition
>>  there will need to be flags to indicate any special restrictions of a
>>  mode
>>
>>  4. the software needs to know the cost of switching from any mode to any
>>  other mode. since some transitions will interact with other devices
>>  there will need to be flags to indicate such requirements for specific
>>  transitions.
>>
>>  5. the software needs to be able to find out what mode a device is in.
>>
>>  6. the software needs to be able to tell the driver to switch to a
>>  different mode (I think it would be a very good thing if going to a
>>  particular mode was always the same command, no matter what mode it is
>>  currently in)
>>
>>  7. the software needs to figure out the desire of the user.
>>
>>  my proposal was addressing items #3-#6. it isn't trying to decide what to
>>  do, simply to allow the software that _is_ trying to decide what to do a
>>  way to find out what it can do.
>>
>>  David Lang
>
> I believe a central place where the user can set/change hw state to save
> power or to increase computational power is definitely a goal to pursue.
> But i truly think that the OHM approach is the best one, i.e. using plugins
> so that one can make a plugin specific for each device. The point is that
> i believe there is no way to do an abstract interface for this, and trying
> to do so will end up producing ugly code, and any interface would fail to
> encompass all possible tweaks that might exist for all devices.


will each plugin have its own interface? or will you have one interface
to access the plugins and then the plugins do things behind the scenes?


I'll bet that the API for the plugins is common, and if so then it could 
be similar to the API that I suggested.
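
To make items #3-#6 of the list quoted above more concrete, here is a minimal sketch of what such a common interface could look like from the caller's side. It is an illustration only: the thread does not define any API, so ModeInfo, Device, every method name and the unit-less transition cost are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeInfo:
    capability: int                 # item 3: percent of full capability
    power: int                      # item 3: percent of full power draw
    flags: frozenset = frozenset()  # item 3: special restrictions, if any


class Device:
    """Hypothetical driver-side view of one device (all names are assumptions)."""

    def __init__(self, modes, costs, current):
        self._modes = dict(modes)   # mode id -> ModeInfo
        self._costs = dict(costs)   # (from id, to id) -> switching cost
        if current not in self._modes:
            raise ValueError(f"unknown mode {current!r}")
        self._current = current

    def list_modes(self):
        """Item 3: every mode the driver chooses to advertise."""
        return dict(self._modes)

    def transition_cost(self, src, dst):
        """Item 4: cost of switching from src to dst (0 when already there)."""
        return 0 if src == dst else self._costs[(src, dst)]

    def current_mode(self):
        """Item 5: the mode the device is in right now."""
        return self._current

    def set_mode(self, mode_id):
        """Item 6: the same call whatever the current mode is."""
        if mode_id not in self._modes:
            raise ValueError(f"unknown mode {mode_id!r}")
        self._current = mode_id


# Example device: a three-mode backlight with made-up transition costs.
backlight = Device(
    modes={0: ModeInfo(0, 0), 1: ModeInfo(100, 100), 2: ModeInfo(50, 90)},
    costs={(a, b): 1 for a in range(3) for b in range(3) if a != b},
    current=1,
)
backlight.set_mode(2)
```

A plugin-based design could sit on top of the same four calls. The open question in the thread is whether per-device plugins need more than this.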



> For instance on a graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> i truly don't think you can make a common interface for all this; moreover
> there might be constraints on how you can change things (GPU &
> memory clock might need to follow a given ratio). So you definitely
> need knowledge in the user space program to handle this.


sure you can, just enumerate all the options the driver writer wants to 
offer as options. yes this could be a lengthy list, so what?
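
One way to read this reply is that the driver flattens its independent knobs into a single list and simply leaves out the combinations it does not want to offer, so a constraint such as a clock ratio stays inside the driver. A minimal sketch, assuming made-up clock steps and a made-up "memory clock between 1x and 2x the GPU clock" rule (neither comes from real hardware or from the thread):

```python
from itertools import product

GPU_MHZ = [100, 200, 400]        # hypothetical GPU clock steps
MEM_MHZ = [100, 200, 400, 800]   # hypothetical memory clock steps


def ratio_ok(gpu, mem):
    """Assumed constraint: memory clock between 1x and 2x the GPU clock."""
    return gpu <= mem <= 2 * gpu


def enumerate_modes(gpu_steps, mem_steps, allowed=ratio_ok):
    """Flatten both clock lists into one numbered list of valid combinations."""
    valid = [(g, m) for g, m in product(gpu_steps, mem_steps) if allowed(g, m)]
    return list(enumerate(valid))


MODES = enumerate_modes(GPU_MHZ, MEM_MHZ)
for mode_id, (gpu, mem) in MODES:
    print(mode_id, gpu, mem)
```

With these assumed values, 6 of the 12 possible combinations survive the filter. A per-mode power figure would still have to be supplied by the driver writer for each surviving combination; none is shown because the thread gives no such numbers.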


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread Igor Stoppa
On Tue, 2007-07-24 at 10:43 +0200, ext Jerome Glisse wrote:

> I believe a central place where the user can set/change hw state to save
> power or to increase computational power is definitely a goal to pursue.
> But i truly think that the OHM approach is the best one, i.e. using plugins
> so that one can make a plugin specific for each device. The point is that
> i believe there is no way to do an abstract interface for this, and trying
> to do so will end up producing ugly code, and any interface would fail to
> encompass all possible tweaks that might exist for all devices.
> 
> For instance on a graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> i truly don't think you can make a common interface for all this; moreover
> there might be constraints on how you can change things (GPU &
> memory clock might need to follow a given ratio). So you definitely
> need knowledge in the user space program to handle this.

Even simpler case: LCD backlight can come in many flavors, both in terms
of brightness levels and fixed amount of current required to keep it ON.

Trying to abstract such details from the decision-making makes little
sense.
Isolating that into a separate module, instead, brings the best of both
worlds:
-containment of the HW-specific code
-leveraging every possible, no matter how exotic, power saving mode
available.


-- 
Cheers, Igor

Igor Stoppa <[EMAIL PROTECTED]>
(Nokia Multimedia - CP - OSSO / Helsinki, Finland)


Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread Jerome Glisse

On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> On Mon, 23 Jul 2007, Igor Stoppa wrote:
>> again, HAL / OHM / Mobilin
>
> I was trying to define the lower level interfaces that these tools need.
> today they can only know what is possible by reading the source code for
> each driver and implementing the driver-specific interfaces necessary to
> set things. I was proposing a common interface that tools like this could
> use instead of requiring all the driver-specific knowledge.
>
>
> in a nutshell (and I know this is probably not detailed enough to be acceptable)
>
> 1. the software needs to know what the interconnects and dependencies
> between devices are (supposedly this is provided via sysfs)
>
> 2. the software needs to know what type of device this is (again,
> supposedly this is provided via sysfs)
>
> 3. the software needs to know what modes exist for a driver/piece of
> hardware. to make any decisions this information needs to include the
> capability of the mode and the power consumed in that mode. in addition
> there will need to be flags to indicate any special restrictions of a
> mode
>
> 4. the software needs to know the cost of switching from any mode to any
> other mode. since some transitions will interact with other devices
> there will need to be flags to indicate such requirements for specific
> transitions.
>
> 5. the software needs to be able to find out what mode a device is in.
>
> 6. the software needs to be able to tell the driver to switch to a
> different mode (I think it would be a very good thing if going to a
> particular mode was always the same command, no matter what mode it is
> currently in)
>
> 7. the software needs to figure out the desire of the user.
>
> my proposal was addressing items #3-#6. it isn't trying to decide what to
> do, simply to allow the software that _is_ trying to decide what to do a
> way to find out what it can do.
>
> David Lang


I believe a central place where the user can set/change hw state to save
power or to increase computational power is definitely a goal to pursue.
But i truly think that the OHM approach is the best one, i.e. using plugins
so that one can make a plugin specific for each device. The point is that
i believe there is no way to do an abstract interface for this, and trying
to do so will end up producing ugly code, and any interface would fail to
encompass all possible tweaks that might exist for all devices.

For instance on a graphics card you could do the following (maybe more):
-change GPU clock
-change memory clock
-disable part of engine
-disable unit
i truly don't think you can make a common interface for all this; moreover
there might be constraints on how you can change things (GPU &
memory clock might need to follow a given ratio). So you definitely
need knowledge in the user space program to handle this.

best,
Jerome Glisse


Re: [linux-pm] Power Management framework proposal

2007-07-24 Thread david

On Wed, 25 Jul 2007, Jerome Glisse wrote:


On 7/24/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 On Tue, 24 Jul 2007, Jerome Glisse wrote:

  On 7/23/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
On Mon, 23 Jul 2007, Igor Stoppa wrote:
 again, HAL / OHM / Mobilin
  
I was trying to define the lower level interfaces that these tools 
need.
today they can only know what is possible by reading the source code 
for
each driver and implementing the driver-specific interfaces nessasary 
to
set things, I was proposing a common interface that tools like this 
could

use instead of requiring all the driver-specific knowledge.
  
  
in a nutshell (and I know this is probably not detailed enough to be
acceptable)

1. the software needs to know what the interconnects and dependencies
   between devices are (supposedly this is provided via sysfs)

2. the software needs to know what type of device this is (again,
   supposedly this is provided via sysfs)

3. the software needs to know what modes exist for a driver/piece of
   hardware. to make any decisions this information needs to provide some
   information about the capability of the mode and the power consumed in
   that mode. in addition there will need to be flags to indicate any
   special restrictions of a mode

4. the software needs to know the cost of switching from any mode to any
   other mode. since some transitions will interact with other devices
   there will need to be flags to indicate such requirements for specific
   transitions.

5. the software needs to be able to find out what mode a device is in.

6. the software needs to be able to tell the driver to switch to a
   different mode (I think it would be a very good thing if going to a
   particular mode was always the same command, no matter what mode it is
   currently in)

7. the software needs to figure out the desire of the user.

my proposal was addressing items #3-#6. it isn't trying to decide what to
do, simply to allow the software that _is_ trying to decide what to do a
way to find out what it can do.
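
A minimal sketch of what items #3-#6 could look like, written in Python only
for brevity. The names (`Mode`, `Device`, `list_modes`, `transition_cost`,
`get_mode`, `set_mode`), the flag strings and all the numbers are hypothetical;
nothing here is an existing kernel, sysfs or HAL interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One mode a driver chooses to expose (item #3)."""
    name: str
    capability: int                  # relative capability, 0 = does nothing
    power: int                       # relative power draw, 0 = fully off
    flags: frozenset = frozenset()   # special restrictions of this mode


@dataclass
class Device:
    modes: dict         # mode name -> Mode
    transitions: dict   # (from, to) -> (relative cost, flags), item #4
    current: str        # name of the mode the device is in, item #5

    def list_modes(self):
        return list(self.modes.values())

    def transition_cost(self, src, dst):
        return self.transitions[(src, dst)]

    def get_mode(self):
        return self.current

    def set_mode(self, dst):
        """Item #6: the same call whatever mode we start from."""
        if dst not in self.modes:
            raise ValueError(f"unknown mode: {dst}")
        self.current = dst


# Hypothetical wireless card with made-up relative numbers.
_modes = [
    Mode("on", capability=100, power=100),
    Mode("standby", capability=0, power=10),
    Mode("off", capability=0, power=0, flags=frozenset({"loses_state"})),
]
wifi = Device(
    modes={m.name: m for m in _modes},
    transitions={
        ("on", "standby"): (1, frozenset()),
        ("standby", "on"): (5, frozenset()),
        ("on", "off"): (2, frozenset()),
        ("standby", "off"): (2, frozenset()),
        ("off", "on"): (50, frozenset({"needs_bus_powered"})),
        ("off", "standby"): (50, frozenset({"needs_bus_powered"})),
    },
    current="on",
)
```

The policy code (item #7) would only ever talk to `list_modes`,
`transition_cost`, `get_mode` and `set_mode`, and would not need to know
which driver sits behind the `Device`.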
  
David Lang
 
  I believe a central place where the user can set/change hw state to save
  power or to increase computational power is definitely a goal to pursue.
  But i truly think that the OHM approach is the best one, i.e. using plugins
  so that one can make a plugin specific for each device. The point is that
  i believe there is no way to do an abstract interface for this, and trying
  to do so will end up producing ugly code, and any interface would fail to
  encompass all possible tweaks that might exist for all devices.

 will each plugin have its own interface? or will you have one interface
 to access the plugins and then the plugins do things behind the scenes?

 I'll bet that the API for the plugins is common, and if so then it could
 be similar to the API that I suggested.


I take here ohm as a reference (this comes from my limited understanding of
this daemon, so there might be inaccuracies): drivers export through HAL
their power management tuning capabilities. Then an ohm plugin would use
HAL to give a higher-level view of these capabilities and also manage policy,
preferences, permissions, ...

The last consumer in the power management food chain would be a user
interface which will communicate with ohm (and with all ohm plugins), so
desktop writers (gnome, kde, ...) can write some kind of power management
center where each ohm plugin can have its own panel. So in the end the user
gets one place to do all his power management, which is the goal i think you
are aiming for.


no. I am talking about the interface to the drivers that things like HAL
would use



  For instance on graphics card you could do the following (maybe more):
  -change GPU clock
  -change memory clock
  -disable part of engine
  -disable unit
  i truly don't think you can make a common interface for all this; moreover
  there might be constraints on how you can change things (GPU &
  memory clock might need to follow a given ratio). So you definitely
  need knowledge in the user space program to handle this.

 sure you can, just enumerate all the options the driver writer wants to
 offer as options. yes this could be a lengthy list, so what?
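
A small sketch of "just enumerate all the options" for the GPU case: the
driver lists every GPU/memory clock pair that satisfies its ratio constraint.
The clock values and the 2:1 limit are invented for illustration; a real
driver would supply its own.

```python
from itertools import product

# Hypothetical clock steps in MHz.
GPU_CLOCKS = [100, 200, 400]
MEM_CLOCKS = [100, 200, 400]
MAX_RATIO = 2   # assumed constraint: neither clock may exceed 2x the other


def allowed(gpu, mem):
    """True when the pair respects the assumed ratio constraint."""
    return max(gpu, mem) <= MAX_RATIO * min(gpu, mem)


def combined_modes():
    """Every (gpu, mem) pair the driver would offer as a single mode."""
    return [(g, m) for g, m in product(GPU_CLOCKS, MEM_CLOCKS) if allowed(g, m)]


for gpu, mem in combined_modes():
    print(f"gpu={gpu}MHz mem={mem}MHz")
```

With these numbers the list has 7 entries instead of the full 3 x 3 = 9,
because (100, 400) and (400, 100) violate the ratio. The list grows with the
product of the two clock tables, which is the size concern raised in the
thread.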



My point was that your interface, by trying to fit square pegs into round
holes, will fail to expose all the subtleties of each device, which might in
the end lead to wrong power management decisions. So i believe we can't
reduce power management to a list of modes whose attributes are power
consumption & capacity.


it's possible (which is part of the reason I started the thread), but so 
far there hasn't been anything identified that is a really bad fit.



And there is no way to design an abstraction given that all hw we will have
to deal 

Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Arjan van de Ven wrote:


On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:


only if the transitions don't cost anything significant,


these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).


including pausing all drivers before the transition and unpausing them
afterwards?


on a PC you don't need to do that.


that's not what the OMAP documentation I was told to read said. it
specifically lists a requirement to pause drivers before the clock change
and unpause them afterwards.



this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.


if you can idle the system as a whole I agree with you fully. most PC
hardware (including the mobile stuff) doesn't change its power
consumption much with load.


even if the rest of the PC is unchanging (which it's not), it is just an
offset to both sides of the equation, and the same on both sides at
that.


but a constant added to both sides makes the relative savings less.
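
To make the offset argument concrete, here is a rough calculation using the
rounded CPU figures Arjan quotes elsewhere in this thread (34 W at full speed,
24 W at half speed, 1 W idle). The workload (0.5 s at full speed or 1 s at
half speed) and the 30 W rest-of-system offset are assumptions for
illustration only.

```python
FULL_W, HALF_W, IDLE_W = 34.0, 24.0, 1.0   # Arjan's rounded CPU figures


def energy_race_to_idle(offset_w=0.0):
    """Joules for 0.5 s at full speed followed by 0.5 s idle."""
    return 0.5 * (FULL_W + offset_w) + 0.5 * (IDLE_W + offset_w)


def energy_half_speed(offset_w=0.0):
    """Joules for the same work spread over 1 s at half speed."""
    return 1.0 * (HALF_W + offset_w)


def relative_saving(offset_w=0.0):
    """Fraction of energy saved by racing to idle instead of running slow."""
    slow = energy_half_speed(offset_w)
    return (slow - energy_race_to_idle(offset_w)) / slow


for offset in (0.0, 30.0):   # 30 W is an assumed rest-of-system draw
    print(f"offset {offset:4.1f} W: "
          f"race-to-idle {energy_race_to_idle(offset):.1f} J, "
          f"half speed {energy_half_speed(offset):.1f} J, "
          f"saving {100 * relative_saving(offset):.1f}%")
```

The absolute difference stays at 6.5 J in both cases, but as a share of the
total it drops from about 27% to about 12% once the constant 30 W is added,
which is both Arjan's point (same offset on both sides) and David's (smaller
relative saving).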


 at Usenix there was a presentation (I don't
remember if it was by Amazon or Google) about this subject, showing that
current PC hardware only goes down to 50% power when idle (short of
switching power modes) and that they and other big companies were pushing
vendors to improve their hardware, aiming to get the idle power down to
10% (again without suspending anything). so there's some chance that this
will change before too long.


on servers and such, there is a huge offset, sure, but still the effect
is there. And it really isn't 50%.


their measurements and graphs say otherwise.


now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.


this assumes that you can cache 1 second of video, if you have more
real-time requirements you have a much harder time (say video
conferencing where you don't get the frame until just before you need to
display it)


the same basic math holds for just 1 frame at a fixed rate. At some
point transition costs will get you (and that's where things like
ondemand delayed speedup will save us); but to get back to your
interface, the interface doesn't nearly give the info needed to make
these decisions...


what is it missing?

it lets you find out what modes are available and (in relative terms) how
much capability and power is available in each mode


it lets you find out what the transition costs are from any mode to any 
other mode


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Igor Stoppa wrote:


On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:

[snip]


this is another one. I'd be happy to get pointers to prior ones to learn
from.


https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html

This is probably one of the latest. Previously there was some clash
between powerop and oppoints that led to people running away from too
much confusion.


thanks, I'll read through that


Unfortunately, while it's true that there are significant similarities,
there are also notable differences; as far as i know the USB subsystem
is the one that gets closest to what we have in the embedded arena, since
it can have complex cases of parent-child powering and wakeup.


this API is not trying to represent the parent-child hierarchy. as far as
I know that's documented in sysfs (or is supposed to be). this is just an
attempt to make it so that as you are going through the hierarchy you
don't have to use vastly different API's to control the different
functions.


You are going to end up with parent child relationships, or
user-consumer.
Devices don't exist in the void, but are interconnected.


correct, but the interconnections are already documented via sysfs aren't 
they? if they are why should this new API need to worry about that?



I suspect that most (if not all) of the previous One Solutions have tried
to completely handle all the details of their original case, and then
branch out to the other cases.

this attempt is working from the other direction. the user of this API
doesn't care how something is done, it just wants to know what's possible
and how to tell the system to switch modes.


True, but you are ending up in the same situation: too much abstraction
makes the governing system clumsy and inefficient.


I see it as going the opposite direction, today there is no abstraction,
you need to know all the details of everything. I proposed an abstraction
to avoid needing to know all the details, this may end up being just as bad,
but it's not the same situation :-)



other than just me searching through the lists, do you have a pointer to
some of the differences between the different types that are seen as being
so large that they can't be unified?


I'll be more detailed in further replies to following emails from this
same thread that have already piled up.


thanks, even though I'm dropping the proposal it's always useful to learn 
more.



while I was describing the issues to my roommates over dinner I realized
that the same type of functions are needed for the CPU clocks.

if you have an accepted framework in place there that can do what I
described, please consider extending it to cover other types of devices
and drivers.


That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.


this is more than just setting the clocks on everything (although setting
clocks seems like it fits well into the model) because some power modes
are not easily represented just as clocks.


The very idea of a power mode is something that can maybe fit some
simple peripherals (simple as in not finely controllable in terms of
what is on and what is off), but it certainly doesn't fit a modern
SoC nicely (see OMAP), since at a certain point in time you don't really
know what the power consumption is, because many resources are
automatically gated by HW on an on-need basis.
And you don't want to switch this feature off.


it seems to me that you can either get some figure of power consumption
for a mode (even if it's just relative power consumption compared to other
modes) or you have no way of planning what to do because you have no clue
what the results of your actions are.



As for latencies, well, only few clocks really have significant impact.
Most notably the main system oscillator. Everything else has 0 latency
since it ends up in opening/closing a clock gate.

Powering a device on/off will certainly introduce more latency, but either
the powering is supported by the hw, to make it quick, or it has to go
through most, if not all, of the usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require decisions made with us
precision.


and many devices support both a quick almost-off mode and a slow
almost-off mode (as well as a completely off mode), with the slow mode
eating less power, but taking longer to wake up from. that's the reason
for providing the matrix to let the program making the decision decide if
it's worth the time delays 

Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Ondrej Zajicek wrote:


On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote:

let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W


I have usually seen different numbers, for example:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Although this paper speaks about thermal design power instead of power
consumption, i suppose that it should be roughly equal.

For example Athlon 64 3700 (ADA3700AEP5AR):

2.4 GHz, 1.5 V -> 89 W
2.2 GHz, 1.4 V -> 72 W
2.0 GHz, 1.3 V -> 53 W
1.8 GHz, 1.2 V -> 39 W
1.0 GHz, 1.1 V -> 22 W


Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power
consumption shows that it is more efficient to be busy for 2 time units
on 1 GHz than be busy for 1 time unit and be idle for 1 time unit
on 2 GHz.

1 GHz:
both cores idle:48 W
one core busy:  57 W
two cores busy: 66 W

2 GHz:
both cores idle:54 W
one core busy:  78 W
two cores busy: 95 W


what Arjan is saying is one time unit at 2GHz with both cores busy, one
time unit at 1GHz with both cores idle (this would be 143W/two time units
vs 132W/two time units for staying busy at 1GHz). still a win for running
at 1GHz, but a smaller one


or better still, one time unit at 2GHz with both cores busy, one time unit 
in sleep mode, in this case if the sleep mode is any good at all it wins.
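
The comparison above can be checked with a few lines, using Ondrej's
whole-PC measurements for the both-cores case. The sketch assumes that two
time units at 1 GHz do the same work as one time unit at 2 GHz, and treats
"watts times one time unit" as the unit of energy.

```python
# Whole-PC power in watts from Ondrej's measurements (both cores).
POWER_W = {
    (1, "idle"): 48, (1, "busy"): 66,
    (2, "idle"): 54, (2, "busy"): 95,
}


def energy(schedule):
    """Sum the power over a schedule of (GHz, state) steps, one time unit each."""
    return sum(POWER_W[step] for step in schedule)


slow_and_steady = energy([(1, "busy"), (1, "busy")])       # busy at 1 GHz throughout
race_then_idle_2ghz = energy([(2, "busy"), (2, "idle")])   # Ondrej's comparison
race_then_idle_1ghz = energy([(2, "busy"), (1, "idle")])   # drop the clock once idle

# Racing to a sleep mode wins once the sleep mode draws less than this.
SLEEP_BREAK_EVEN_W = slow_and_steady - POWER_W[(2, "busy")]

print(slow_and_steady, race_then_idle_2ghz, race_then_idle_1ghz, SLEEP_BREAK_EVEN_W)
```

On these numbers the three schedules cost 132, 149 and 143, and a sleep mode
only has to get the whole PC below 37 W for "race, then sleep" to come out
ahead.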


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread Arjan van de Ven
On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:
> >>
> >> only if the transitions don't cost anything significant,
> >
> > these are second order effects though. On a pc, the transition costs are
> > quite low (as I said, single or low double digit microseconds).
> 
> including pausing all drivers before the transition and unpausing them 
> afterwards?

on a PC you don't need to do that.

> 
> >> and the
> >> computation capacity per watt of power is the same at all frequencies. the
> >> chip performance numbers I've been seeing (which I admit are mostly
> >> embedded datasheets) indicate that neither of these hold true.
> >
> > let me give you a real world example then, and the numbers I'm using are
> > ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
> > just rounded them a little so that the math works out nice.
> >
> > power at full speed: 34W
> > power at half speed: 24W
> > power at idle: 1W
> 
> are these numbers for the CPU itself or for the a larger chunk?

the cpu at full load.

> > this works for all systems where the idle power is lower than the
> > power you save by dropping speed... and that is almost all of them in
> > the PC world.
> 
> if you can idle the system as a whole I agree with you fully. most PC 
> hardware (including the mobile stuff) doesn't change its power 
> consumption much with load.

even if the rest of the PC is unchanging (which it's not), it is just an
offset to both sides of the equation, and the same on both sides at
that.

>  at Usenix there was a presentation (I don't 
> remember if it was by Amazon or Google) about this subject, showing that 
> current PC hardware only goes down to 50% power when idle (short of 
> switching power modes) and that they and other big companies were pushing 
> vendors to improve their hardware, aiming to get the idle power down to 
> 10% (again without suspending anything). so there's some chance that this 
> will change before too long.

on servers and such, there is a huge offset, sure, but still the effect
is there. And it really isn't 50%.

> 
> > now you can argue that 0.5 seconds is a really really long time, and
> > you'd be right. so for really really short stints (say a timer
> > interrupt) you don't want to change the voltage at all (nor would you
> > want to change the plls to change frequency for that matter). But once
> > you start changing those, you might as well go full speed.
> 
> this assumes that you can cache 1 second of video, if you have more 
> real-time requirements you have a much harder time (say video
> conferencing where you don't get the frame until just before you need to
> display it)

the same basic math holds for just 1 frame at a fixed rate. At some
point transition costs will get you (and that's where things like
ondemand delayed speedup will save us); but to get back to your
interface, the interface doesn't nearly give the info needed to make
these decisions...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread Igor Stoppa
On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:

[snip]

> this is another one. I'd be happy to get pointers to prior ones to learn 
> from.

https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html

This is probably one of the latest. Previously there was some clash
between powerop and oppoints that led to people running away from too
much confusion.

> > Unfortunately, while it's true that there are significant similarities,
> > there are also notable differences; as far as i know the USB subsystem
> > is the one that gets closest to what we have in the embedded arena, since
> > it can have complex cases of parent-child powering and wakeup.
> 
> this API is not trying to represent the parent-child hierarchy. as far as 
> I know that's documented in sysfs (or is supposed to be). this is just an 
> attempt to make it so that as you are going through the hierarchy you 
> don't have to use vastly different API's to control the different 
> functions.

You are going to end up with parent child relationships, or
user-consumer.
Devices don't exist in the void, but are interconnected.

> I suspect that most (if not all) of the previous One Solutions have tried 
> to completely handle all the details of their original case, and then 
> branch out to the other cases.
> 
> this attempt is working from the other direction. the user of this API 
> doesn't care how something is done, it just wants to know what's possible 
> and how to tell the system to switch modes.

True, but you are ending up in the same situation: too much abstraction
makes the governing system clumsy and inefficient.

> other than just me searching through the lists, do you have a pointer to 
> some of the differences between the different types that are seen as being 
> so large that they can't be unified?

I'll be more detailed in further replies to following emails from this
same thread that have already piled up.

> >> while I was describing the issues to my roommates over dinner I realized
> >> that the same type of functions are needed for the CPU clocks.
> >>
> >> if you have an accepted framework in place there that can do what I
> >> described, please consider extending it to cover other types of devices
> >> and drivers.
> >
> > That is not part of the fw: the fw simply expresses parent-child clock
> > distribution and keeps usecounts so that unused clocks are automatically
> > gated.
> >
> > The actual clock tree description is platform/arch/board specific and
> > doesn't affect the framework. You can just roll your own version for x86
> > by providing a description of the methods used to switch on/off every
> > individual clock on your board.
> >
> > So what you are asking for is that somebody writes an x86 version of the
> > clock fw.
> 
> this is more than just setting the clocks on everything (although setting 
> clocks seems like it fits well into the model) because some power modes 
> are not easily represented just as clocks.

The very idea of a power mode is something that can maybe fit some
simple peripherals (simple as in not finely controllable in terms of
what is on and what is off), but it certainly doesn't fit a modern
SoC nicely (see OMAP), since at a certain point in time you don't really
know what the power consumption is, because many resources are
automatically gated by HW on an on-need basis.
And you don't want to switch this feature off.

> > As for latencies, well, only few clocks really have significant impact.
> > Most notably the main system oscillator. Everything else has 0 latency
> > since it ends up in opening/closing a clock gate.
> >
> > Powering a device on/off will certainly introduce more latency, but either
> > the powering is supported by the hw, to make it quick, or it has to go
> > through most, if not all, of the usual initialisation sequence; in that
> > case it probably makes sense to avoid controlling it from kernelspace,
> > since it will be slow and won't require decisions made with us
> > precision.
> 
> and many devices support both a quick almost-off mode and a slow 
> almost-off mode (as well as a completely off mode), with the slow mode 
> eating less power, but taking longer to wake up from. that's the reason 
> for providing the matrix to let the program making the decision decide if 
> it's worth the time delays to get the power savings
> 
> as I note in another message, this API isn't intended to be strictly 
> kernelspace or strictly userspace. for the ondemand speed governor you are 
> changing the settings quickly and so probably want to do so in the kernel, 
> however some people may be satisfied with slower controls and so could 
> have them in userspace (an extreme example of this would be turning off 
> wireless cards that aren't in use to save power and improve security)
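
A sketch of how a policy program might use such a matrix to choose between
the quick and slow almost-off modes. Every number below is invented, and
`SleepMode` and `best_mode` are hypothetical names, not part of any proposed
interface. The model simply adds a one-off switching energy to the draw over
the expected idle period and rejects modes that wake up too slowly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepMode:
    name: str
    power_w: float    # draw while sitting in the mode
    switch_j: float   # energy spent entering and leaving the mode
    wake_s: float     # time needed to wake up again


# Hypothetical figures for one device.
MODES = [
    SleepMode("on-idle", 2.0, 0.0, 0.0),
    SleepMode("quick-almost-off", 0.5, 0.3, 0.001),
    SleepMode("slow-almost-off", 0.1, 2.0, 0.5),
    SleepMode("off", 0.0, 10.0, 5.0),
]


def best_mode(idle_s, max_wake_s):
    """Cheapest mode for an idle period, among those that wake fast enough."""
    candidates = [m for m in MODES if m.wake_s <= max_wake_s]
    return min(candidates, key=lambda m: m.switch_j + m.power_w * idle_s)


for idle_s, max_wake_s in [(0.1, 1.0), (1.0, 1.0), (10.0, 1.0), (600.0, 10.0)]:
    print(idle_s, max_wake_s, best_mode(idle_s, max_wake_s).name)
```

Short idle periods are not worth any transition, medium ones favour the
quick mode, and only long ones repay the slow mode or a full power-off. This
is the kind of decision the transition costs are meant to support.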

So you are going to have 2 APIs: one for kernelspace (evolution of
CPUfreq) and one for userspace, which seems more and more likely to be
an extension to HAL.
Are you 

Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread Ondrej Zajicek
On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote:
> let me give you a real world example then, and the numbers I'm using are
> ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
> just rounded them a little so that the math works out nice.
> 
> power at full speed: 34W
> power at half speed: 24W
> power at idle: 1W

I have usually seen different numbers, for example:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Although this paper speaks about thermal design power instead of power
consumption, i suppose that it should be roughly equal.

For example Athlon 64 3700 (ADA3700AEP5AR):

2.4 GHz, 1.5 V -> 89 W
2.2 GHz, 1.4 V -> 72 W
2.0 GHz, 1.3 V -> 53 W
1.8 GHz, 1.2 V -> 39 W
1.0 GHz, 1.1 V -> 22 W
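
One way to read this table is as watts per GHz, i.e. roughly the energy per
unit of work if work scales linearly with clock speed. That scaling is an
assumption, and the figures are TDP rather than measured consumption, as
noted above.

```python
# (GHz, volts, watts) rows from the Athlon 64 3700 table quoted above.
P_STATES = [
    (2.4, 1.5, 89),
    (2.2, 1.4, 72),
    (2.0, 1.3, 53),
    (1.8, 1.2, 39),
    (1.0, 1.1, 22),
]


def watts_per_ghz(row):
    """Power divided by clock speed for one table row."""
    ghz, _volts, watts = row
    return watts / ghz


for row in P_STATES:
    print(f"{row[0]:.1f} GHz: {watts_per_ghz(row):.1f} W per GHz")

most_efficient = min(P_STATES, key=watts_per_ghz)
```

On this reading the cost per unit of work falls from about 37 W per GHz at
2.4 GHz to about 22 W per GHz at the two lowest steps, so capacity per watt
is clearly not constant across frequencies on this chip.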


Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power
consumption shows that it is more efficient to be busy for 2 time units
on 1 GHz than be busy for 1 time unit and be idle for 1 time unit
on 2 GHz.

1 GHz:
both cores idle:48 W
one core busy:  57 W
two cores busy: 66 W

2 GHz:
both cores idle:54 W
one core busy:  78 W
two cores busy: 95 W

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: [EMAIL PROTECTED], jabber: [EMAIL PROTECTED])
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread Igor Stoppa
On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:

[snip]

So you are going to have 2 APIs: one for kernelspace (evolution of
CPUfreq) and one for userspace, which seems more and more likely to be
an extension to HAL.
Are you informed on OHM and Intel Moblin?
http://ohm.freedesktop.org/wiki/
http://www.moblin.org/index.html
 
  I 

Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread Arjan van de Ven
On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:
 
  only if the transitions don't cost anything significant,
 
  these are second order effects though. On a pc, the transition costs are
  quite low (as I said, single or low double digit microseconds).
 
 including pausing all drivers before the transition and unpausing them 
 aftrwords?

on a PC you don't need to do that.

 
  and the
  computation capacity per watt of power is the same at all frequencies. the
  chip performance numbers I've been seeing (which I admit are mostly
  embedded datasheets) indicate that neither of these hold true.
 
  let me give you a real world example then, and the numbers I'm using are
  ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
  just rounded them a little so that the math works out nice.
 
  power at full speed: 34W
  power at half speed: 24W
  power at idle: 1W
 
 are these numbers for the CPU itself or for the a larger chunk?

the cpu at full load.

  this works for all systems where the idle power is more lower than the
  power you save by dropping speed... and that is almost all of them in
  the PC world.
 
 if you can idle the system as a whole I agree with you fully. most PC 
 hardware (including the mobile stuff) doesn't change it's power 
 consumption much with load.

even if the rest of the PC is unchanging (which it's not), it is just an
offset to both sides of the equation, and the same on both sides at
that.

  at Usenix there was a presentiation (I don't 
 remember if it was by Amazon or Google) about this subject, showing that 
 current PC hardware only goes down to 50% power when idle (short of 
 switching power modes) and that they and other big companies were pushing 
 vendors to improve their hardware, aiming to get the idle power down to 
 10% (again without suspending anything). so there's some chance that this 
 will change before too long.

on servers and such, there is a huge offset, sure, but still the effect
is there. And it really isn't 50%.

 
  now you can argue that 0.5 seconds is a really really long time, and
  you'd be right. so for really really short stints (say a timer
  interrupt) you don't want to change the voltage at all (nor would you
  want to change the plls to change frequency for that matter). But once
  you start changing those, you might as well go full speed.
 
 this assumes that you can cache 1 second of video, if you have more
 real-time requirements you have a much harder time (say video conferencing
 where you don't get the frame until just before you need to display it)

the same basic math holds for just 1 frame at a fixed rate. At some
point transition costs will get you (and that's where things like
ondemand delayed speedup will save us); but to get back to your
interface, the interface doesn't nearly give the info needed to make
these decisions...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Ondrej Zajicek wrote:


On Sun, Jul 22, 2007 at 09:19:17PM -0700, Arjan van de Ven wrote:

let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W


I have usually seen different numbers, for example:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30430.pdf

Although this paper speaks about thermal design power instead of power
consumption, i suppose that it should be roughly equal.

For example Athlon 64 3700 (ADA3700AEP5AR):

2.4 GHz, 1.5 V - 89 W
2.2 GHz, 1.4 V - 72 W
2.0 GHz, 1.3 V - 53 W
1.8 GHz, 1.2 V - 39 W
1.0 GHz, 1.1 V - 22 W
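
As a rough cross-check of the table above, dividing each rated power by its clock gives the energy spent per cycle (W/GHz is numerically nJ per cycle). This is only a sketch: it treats the TDP figures as if they were actual consumption, which is already flagged above as an approximation, and the names used are made up for illustration.

```python
# (clock in GHz, core voltage in V, thermal design power in W), from the table above
OPERATING_POINTS = [
    (2.4, 1.5, 89),
    (2.2, 1.4, 72),
    (2.0, 1.3, 53),
    (1.8, 1.2, 39),
    (1.0, 1.1, 22),
]

def nj_per_cycle(ghz, watts):
    """W / GHz is numerically nanojoules per clock cycle."""
    return watts / ghz

for ghz, volts, watts in OPERATING_POINTS:
    print(f"{ghz:.1f} GHz @ {volts:.1f} V: {nj_per_cycle(ghz, watts):.1f} nJ/cycle")

# Operating point with the lowest energy per cycle while busy.
most_efficient = min(OPERATING_POINTS, key=lambda p: nj_per_cycle(p[0], p[2]))
```

On these figures the cost per cycle falls from about 37 nJ at 2.4 GHz to about 22 nJ at the two lowest points. That only covers the busy side of the argument; it says nothing about what the chip draws while idle.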


Even my measurement on PC (Athlon X2, VIA K8T890) of complete PC power
consumption shows that it is more efficient to be busy for 2 time units
on 1 GHz than be busy for 1 time unit and be idle for 1 time unit
on 2 GHz.

1 GHz:
both cores idle: 48 W
one core busy:   57 W
two cores busy:  66 W

2 GHz:
both cores idle: 54 W
one core busy:   78 W
two cores busy:  95 W


what Arjan is saying is one time unit at 2GHz with both cores busy, then one
time unit at 1GHz with both cores idle. that would be 143W over two time
units vs 132W over two time units busy at 1GHz: still a win for running at
1GHz, but a smaller one


or better still, one time unit at 2GHz with both cores busy, one time unit 
in sleep mode, in this case if the sleep mode is any good at all it wins.
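
The comparisons above can be replayed with a few lines of arithmetic. This is a sketch using only the whole-PC measurements quoted in this message; the schedule format and function name are invented for the illustration, and "energy" is in watt-time-units because the length of a time unit is not given.

```python
# Whole-PC power measurements quoted above (watts), keyed by clock speed.
POWER_W = {
    "1GHz": {"idle": 48, "one_busy": 57, "two_busy": 66},
    "2GHz": {"idle": 54, "one_busy": 78, "two_busy": 95},
}

def energy(schedule):
    """Sum power * duration over (freq, state, time_units) steps."""
    return sum(POWER_W[freq][state] * units for freq, state, units in schedule)

# Ondrej's comparison: one core busy, staying at the same clock while idle.
slow_one_core = energy([("1GHz", "one_busy", 2)])
fast_one_core = energy([("2GHz", "one_busy", 1), ("2GHz", "idle", 1)])

# David's comparison: both cores busy at 2 GHz, then idling at 1 GHz.
slow_two_cores = energy([("1GHz", "two_busy", 2)])
fast_two_cores = energy([("2GHz", "two_busy", 1), ("1GHz", "idle", 1)])

# Whole-PC sleep power below which "run fast, then sleep" beats staying at 1 GHz.
break_even_sleep_w = slow_two_cores - POWER_W["2GHz"]["two_busy"]

print(slow_one_core, fast_one_core)    # 114 132
print(slow_two_cores, fast_two_cores)  # 132 143
print(break_even_sleep_w)              # 37
```

So on this machine the sleep-mode variant wins as soon as the whole PC draws less than 37 W while asleep, ignoring any cost of entering and leaving the sleep state.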


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Igor Stoppa wrote:


On Sun, 2007-07-22 at 14:21 -0700, ext [EMAIL PROTECTED] wrote:

[snip]


this is another one. I'd be happy to get pointers to prior ones to learn
from.


https://lists.linux-foundation.org/pipermail/linux-pm/2007-March/011204.html

This is probably one of the latest. Previously there was some clash
between powerop and oppoints that lead to people running away from too
much confusion.


thanks, I'll read through that


Unfortunately, while it's true that there are significant similarities,
there are also notable differences; as far as i know the USB subsystem
is the one that gets closer to what we have in the embedded arena, since
it can have complex cases of parent-child powering and wakeup.


this API is not trying to represent the parent-child hierarchy. as far as
I know that's documented in sysfs (or is supposed to be). this is just an
attempt to make it so that as you are going through the hierarchy you
don't have to use vastly different API's to control the different
functions.


You are going to end up with parent child relationships, or
user-consumer.
Devices don't exist in the void, but are interconnected.


correct, but the interconnections are already documented via sysfs aren't 
they? if they are why should this new API need to worry about that?



I suspect that most (if not all) of the previous One Solutions have tried
to completely handle all the details of their original case, and then
branch out to the other cases.

this attempt is working from the other direction. the user of this API
doesn't care how something is done, it just wants to know what's possible
and how to tell the system to switch modes.


True, but you are ending up in the same situation: too much abstraction
makes the governing system clumsy and inefficient.


I see it as going the opposite direction, today there is no abstraction,
you need to know all the details of everything. I proposed an abstraction
to avoid needing to know all the details, this may end up being just as bad,
but it's not the same situation :-)



other than just me searching through the lists, do you have a pointer to
some of the differences between the different types that are seen as being
so large that they can't be unified?


I'll be more detailed in further replies to following emails from this
same thread that have already piled up.


thanks, even though I'm dropping the proposal it's always useful to learn 
more.



while I was describing the issues to my roommates over dinner I realized
that the same type of functions are needed for the CPU clocks.

if you have an accepted framework in place there that can do what I
described, please consider extending it to cover other types of devices
and drivers.


That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.
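
The use-count behaviour described above (parent-child clock distribution, with unused clocks gated automatically) can be sketched roughly as follows. This is an illustration only, not the kernel clock framework's API: the class, method names and the example clock tree are all hypothetical.

```python
class Clock:
    """Toy clock node: enabling a child keeps its parent running."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usecount = 0

    @property
    def gated(self):
        # A clock nobody uses is considered gated (switched off).
        return self.usecount == 0

    def enable(self):
        # The first user of this clock also takes a reference on the parent.
        if self.usecount == 0 and self.parent is not None:
            self.parent.enable()
        self.usecount += 1

    def disable(self):
        if self.usecount == 0:
            raise RuntimeError(f"{self.name}: unbalanced disable")
        self.usecount -= 1
        # The last user leaving releases the parent, which may then gate too.
        if self.usecount == 0 and self.parent is not None:
            self.parent.disable()


# Hypothetical board description: oscillator -> PLL -> two peripherals.
osc = Clock("osc")
pll = Clock("pll", parent=osc)
uart = Clock("uart", parent=pll)
i2c = Clock("i2c", parent=pll)

uart.enable()
i2c.enable()
uart.disable()
pll_running_with_one_user = not pll.gated
i2c.disable()
all_gated = osc.gated and pll.gated and uart.gated and i2c.gated
```

In this picture the board-specific part is only the tree description at the bottom; the counting logic does not change from one platform to the next.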


this is more than just setting the clocks on everything (although setting
clocks seems like it fits well into the model) because some power modes
are not easily represented just as clocks.


The very same idea of power mode is something that can maybe fit some
simple peripherals (simple as in not fine-grained controllable in terms of
what is on and what is off), but it certainly doesn't fit modern SoCs
nicely (see OMAP), since at a certain point in time you don't really know
what the power consumption is because many resources are automatically
gated by HW on an on-need basis.
And you don't want to switch this feature off.


it seems to me that you can either get some figure of power consumption
for a mode (even if it's just relative power consumption compared to other
modes) or you have no way of planning what to do because you have no clue
what the results of your actions are.



As for latencies, well, only few clocks really have significant impact.
Most notably the main system oscillator. Everything else has 0 latency
since it ends up in opening/closing a clock gate.

Powering device on/off will certainly introduce more latency, but either
the powering is supported by the hw, to make it quick or it has to go
through most, if not all, of the usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require decisions made with microsecond
precision.


and many devices support both a quick almost-off mode and a slow
almost-off mode (as well as a completely off mode), with the slow mode
eating less power, but taking longer to wake up from. that's the reason
for providing the matrix to let the program making the decision decide if
it's worth the time delays 

Re: [linux-pm] Power Management framework proposal

2007-07-23 Thread david

On Mon, 23 Jul 2007, Arjan van de Ven wrote:


On Sun, 2007-07-22 at 22:25 -0700, [EMAIL PROTECTED] wrote:


only if the transitions don't cost anything significant,


these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).


including pausing all drivers before the transition and unpausing them
afterwards?


on a PC you don't need to do that.


that's not what the OMAP documentation I was told to read said. it
specifically lists a requirement to pause drivers before the clock change
and unpause them afterwards.



this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.


if you can idle the system as a whole I agree with you fully. most PC
hardware (including the mobile stuff) doesn't change its power
consumption much with load.


even if the rest of the PC is unchanging (which it's not), it is just an
offset to both sides of the equation, and the same on both sides at
that.


but a constant added to both sides makes the relative savings less.
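
Both points can be seen in a small calculation. The sketch below reuses the ballpark CPU figures from earlier in this thread (34 W full speed, 24 W half speed, 1 W idle, one second of media) and adds a constant platform draw; the offset values are purely hypothetical.

```python
HALF_SPEED_J = 0.5 * 24 + 0.5 * 1    # 12.5 J per second of media
FULL_SPEED_J = 0.25 * 34 + 0.75 * 1  # 9.25 J per second of media

def savings(offset_w, period_s=1.0):
    """Return (joules saved, fraction saved) with a constant platform draw added."""
    slow = HALF_SPEED_J + offset_w * period_s
    fast = FULL_SPEED_J + offset_w * period_s
    return slow - fast, (slow - fast) / slow

for offset_w in (0, 10, 20, 40):  # hypothetical platform offsets in watts
    saved_j, fraction = savings(offset_w)
    print(f"offset {offset_w:2d} W: saves {saved_j:.2f} J ({fraction:.0%})")
```

The absolute saving stays at 3.25 J whatever the offset, which is the "same on both sides" argument, while the relative saving shrinks from 26% with no offset to 10% with a 20 W offset, which is the objection raised here.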


 at Usenix there was a presentation (I don't
remember if it was by Amazon or Google) about this subject, showing that
current PC hardware only goes down to 50% power when idle (short of
switching power modes) and that they and other big companies were pushing
vendors to improve their hardware, aiming to get the idle power down to
10% (again without suspending anything). so there's some chance that this
will change before too long.


on servers and such, there is a huge offset, sure, but still the effect
is there. And it really isn't 50%.


their measurements and graphs say otherwise.


now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.


this assumes that you can cache 1 second of video, if you have more
real-time requirements you have a much harder time (say video conferencing
where you don't get the frame until just before you need to display it)


the same basic math holds for just 1 frame at a fixed rate. At some
point transition costs will get you (and that's where things like
ondemand delayed speedup will save us); but to get back to your
interface, the interface doesn't nearly give the info needed to make
these decisions...


what is it missing?

it lets you find out what modes are available and (in relative terms) how
much capability and power is available in each mode


it lets you find out what the transition costs are from any mode to any 
other mode
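
For concreteness, the two pieces of information listed above could be represented roughly like this. This is only a sketch of the idea, not the proposed kernel interface: the type, the field names, the example device and every number in it are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mode:
    name: str
    capability_pct: int  # relative capability, 100 = full
    power_pct: int       # relative power at full load, 100 = full

# Hypothetical device exposing three modes.
MODES = [
    Mode("full", 100, 100),
    Mode("half", 50, 70),
    Mode("off", 0, 3),
]

# TRANSITION_US[i][j]: cost in microseconds of going from MODES[i] to MODES[j].
TRANSITION_US = [
    [0, 20, 50],
    [20, 0, 50],
    [5000, 5000, 0],
]

def transition_cost_us(src, dst):
    """Look up the cost of switching between two modes by name."""
    names = [m.name for m in MODES]
    return TRANSITION_US[names.index(src)][names.index(dst)]

print(transition_cost_us("off", "full"))
```

Note that a table like this carries a single power figure per mode, so it cannot by itself distinguish power at idle from power at full load within one mode.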


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:


this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.


that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..


only if the transitions don't cost anything significant,


these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).


including pausing all drivers before the transition and unpausing them
afterwards?



and the
computation capacity per watt of power is the same at all frequencies. the
chip performance numbers I've been seeing (which I admit are mostly
embedded datasheets) indicate that neither of these hold true.


let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W


are these numbers for the CPU itself or for a larger chunk? I could
easily see these numbers for motherboard (including CPU and RAM), but it
would surprise me if these numbers are for the CPU itself. I'm used to
seeing datasheets that have a much more linear voltage/freq (and therefore
a quadratic voltage/power) curve. in some cases the voltage requirements
drop faster than the frequency.



assume media playback, and a dumb one, that takes half a second to
decode a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.


if you can idle the system as a whole I agree with you fully. most PC 
hardware (including the mobile stuff) doesn't change its power
consumption much with load. at Usenix there was a presentation (I don't
remember if it was by Amazon or Google) about this subject, showing that 
current PC hardware only goes down to 50% power when idle (short of 
switching power modes) and that they and other big companies were pushing 
vendors to improve their hardware, aiming to get the idle power down to 
10% (again without suspending anything). so there's some chance that this 
will change before too long.



now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.


this assumes that you can cache 1 second of video, if you have more 
real-time requirements you have a much harder time (say video conferencing
where you don't get the frame until just before you need to display it)


David Lang



Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven
On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:
> >> the fact that you want to run at the max frequency for a given voltage is
> >
> > no I want to run at the max frequency PERIOD. On just about any PC, it's
> > more power efficient to go full speed when executing code, and then idle
> > for as long as you can. (there are some second order effects that make
> > this a bit more complex, but as first order approach it's a sound
> > approach). Voltage follows, and that's fine.
> 
> this seems to be contradicted by the fact that AMD is listing the ability 
> for each core to run at a different clock speed on the new 4-core chips as 
> an advantage.

that's a marketing thing mostly.. they all still run at the same voltage
anyway.

>  if you always want to run at the max frequency PERIOD then 
> why bother engineering the ability to do otherwise? (as opposed to just 
> shutting down unused cores)

multicore changes the rules a little but not all that much. (the idle
power is higher if not all cores are idle at the same time. Yet... each
core individually trying to be idle as quickly as possible is the best
way to get to the highest "all cores idle" time, unless there is some
really special/weird synchronization)


> >> this strategy should work well on the normal unpredictable workload that
> >> most people deal with, but there are some cases where the workload becomes
> >> pretty predictable (media players for example) where there really is less
> >> variation, and a need for a constant availability of the cpu, so it may
> >> actually save a smidge of power to run below the highest freq that the
> >> voltage allows rather than running faster and being idle more cycles.
> >
> > that actually is the example showcase of race-to-idle where you
> > absolutely want to run at the highest frequency..
> 
> only if the transitions don't cost anything significant, 

these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).
They are not zero, and that is why you see things like ondemand ramp up
only after a little time, as a guesstimate to make sure it's not just a
really short lived code execution.

> and the 
> computation capacity per watt of power is the same at all frequencies. the 
> chip performance numbers I've been seeing (which I admit are mostly 
> embedded datasheets) indicate that neither of these hold true.

let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W

assume media playback, and a dumb one, that takes half a second to
decode a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.
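
The arithmetic above generalises to a one-line energy model. The sketch below reproduces the two figures and then solves for the idle power at which the two strategies break even; it assumes, as the example does, that the workload takes exactly twice as long at half speed, and the function name is made up for the illustration.

```python
def energy_j(busy_s, active_w, idle_w, period_s=1.0):
    """Energy over one period: busy at active_w, the rest idle at idle_w."""
    return busy_s * active_w + (period_s - busy_s) * idle_w

half_speed = energy_j(0.5, 24, 1)    # 12.5 J
full_speed = energy_j(0.25, 34, 1)   # 9.25 J

# Idle power at which both strategies cost the same energy:
#   0.25*34 + 0.75*I == 0.5*24 + 0.5*I   =>   I == (0.5*24 - 0.25*34) / 0.25
break_even_idle_w = (0.5 * 24 - 0.25 * 34) / (0.5 - 0.25)

print(half_speed, full_speed, break_even_idle_w)  # 12.5 9.25 14.0
```

With these rounded datasheet numbers, racing to idle wins for any idle power below 14 W; transition costs are ignored here and would lower that threshold.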

now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


I disagree with you here. for each frequency setting you can say how much
power the cpu/system is expected to use (especially as a percentage of the
full power mode). creating this value requires you to take two things into
account, the voltage you are running things at (by far the biggest
effect), and the minor difference that the frequency makes at that voltage
(possibly small enough to ignore entirely).

the API I proposed has no problem with there being multiple modes that
have the same %power but with different %capability numbers.


how do you deal with the "power at idle" vs "power at full load".. you
need both at each level to pick the best one, as well as relative
performance etc.


what I was thinking was to use power at full load for the power rating of
each mode.



the fact that you want to run at the max frequency for a given voltage is


no I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as first order approach it's a sound
approach). Voltage follows, and that's fine.


this seems to be contradicted by the fact that AMD is listing the ability 
for each core to run at a different clock speed on the new 4-core chips as 
an advantage. if you always want to run at the max frequency PERIOD then 
why bother engineering the ability to do otherwise? (as opposed to just 
shutting down unused cores)


another example is the 80 core demo chip that Intel has been making press
about. it can run at 1Tflop on 25w of power and 2Tflop at 150w of power. 
running at max freq for a 1Tflop workload would have you eating ~75w of 
power (the numbers may be off, I'm going from memory, but the cost in 
power of doubling the speed was _far_ more than double the power
requirements)
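
The ~75w estimate follows from a simple duty-cycle calculation. The sketch below uses the figures as recalled in this message (which may be off, as noted) and assumes the chip draws nothing at all while idle, which is the most favourable case for running at max frequency; the function name is hypothetical.

```python
LOW_W, LOW_TFLOPS = 25, 1      # figures as recalled in the message above
HIGH_W, HIGH_TFLOPS = 150, 2

def average_power_w(workload_tflops, idle_w=0.0):
    """Average draw when running at the high setting and idling the rest of the time."""
    duty = workload_tflops / HIGH_TFLOPS
    return duty * HIGH_W + (1 - duty) * idle_w

race_to_idle_w = average_power_w(1)
print(race_to_idle_w, LOW_W)  # 75.0 25
```

Even with a free idle state, a steady 1Tflop workload costs three times as much at the high setting on these numbers, because power grew sixfold for a doubling of throughput.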



this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.


that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..


only if the transitions don't cost anything significant, and the 
computation capacity per watt of power is the same at all frequencies. the 
chip performance numbers I've been seeing (which I admit are mostly 
embedded datasheets) indicate that neither of these hold true.


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven

> I disagree with you here. for each frequency setting you can say how much 
> power the cpu/system is expected to use (especially as a percentage of the 
> full power mode). creating this value requires you to take two things into 
> account, the voltage you are running things at (by far the biggest 
> effect), and the minor difference that the frequency makes at that voltage 
> (possibly small enough to ignore entirely).
> 
> the API I proposed has no problem with there being multiple modes that 
> have the same %power but with different %capability numbers.

how do you deal with the "power at idle" vs "power at full load".. you
need both at each level to pick the best one, as well as relative
performance etc.

> 
> I'm willing to bet that the current cpufreq software just looks at the 
> voltage as the value that tells you how much power the thing is going to 
> use at that setting

it doesn't. 
> 
> the fact that you want to run at the max frequency for a given voltage is 

no I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as first order approach it's a sound
approach). Voltage follows, and that's fine.


> a reasonable strategy, but it's a power saving _strategy_, not a 
> capability of the hardware and the API I'm mentioning should be enough to 
> let you pick the highest performance setting that has the same power 
> rating as the minimum performance you need (or for that matter to go one 
> step further and go with the most efficient setting in terms of 
> performance/power that has a performance number higher than what you need, 
> which could actually be better)

why would I care about voltage? Most PCs don't expose it, and that's
fine, they can switch to the voltage needed REALLY quickly (single or
double digit microseconds). PCs in fact only expose numbered states (P0
to P7 at most), and some number that you can use to show the user, but
doesn't mean anything beyond that. Some people interpret it as
"frequency", and that's nice, but it doesn't really mean that. You
really don't know anything beyond that

and that's ok. As I said before, as a general strategy you want "highest
speed when running code" for race-to-idle, with some 2nd order effects
for when you execute code really shortly coming out of idle; in which
case you don't want to do a voltage transition twice (most cpus have the
idle voltage be the lowest-execute voltage as well).



> this strategy should work well on the normal unpredictable workload that 
> most people deal with, but there are some cases where the workload becomes 
> pretty predictable (media players for example) where there really is less 
> variation, and a need for a constant availability of the cpu, so it may 
> actually save a smidge of power to run below the highest freq that the 
> voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


son anyway)


I don't think you have got it right: the only info being passed is the
standard cpufreq list of frequencies; everything else is part of the
cpufreq driver.


to make the decisions the software making the decision needs to know how
much power would be used at each freq setting.


power used at a certain frequency is not a single variable.
In fact, on most laptops and other similarly power aware devices, it's
in fact better for power consumption to always go to the maximum
frequency as quickly as possible, so that you can be idle for the
longest possible time after that. Good luck finding a generic way to
represent such things in a (userspace) interface


I disagree with you here. for each frequency setting you can say how much 
power the cpu/system is expected to use (especially as a percentage of the 
full power mode). creating this value requires you to take two things into 
account, the voltage you are running things at (by far the biggest 
effect), and the minor difference that the frequency makes at that voltage 
(possibly small enough to ignore entirely).


the API I proposed has no problem with there being multiple modes that 
have the same %power but with different %capability numbers.


I'm willing to bet that the current cpufreq software just looks at the 
voltage as the value that tells you how much power the thing is going to 
use at that setting


the fact that you want to run at the max frequency for a given voltage is
a reasonable strategy, but it's a power saving _strategy_, not a 
capability of the hardware and the API I'm mentioning should be enough to 
let you pick the highest performance setting that has the same power 
rating as the minimum performance you need (or for that matter to go one 
step further and go with the most efficient setting in terms of
performance/power that has a performance number higher than what you need,
which could actually be better)
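
The two selection strategies described in this paragraph could look roughly like this on top of a list of (%capability, %power) modes. It is a sketch under invented numbers, not cpufreq code: the mode names, the table and both function names are hypothetical.

```python
# (name, capability_pct, power_pct); all values invented for illustration.
MODES = [
    ("p0", 100, 100),
    ("p1", 80, 70),
    ("p2", 60, 70),
    ("p3", 40, 45),
]

def _meeting(modes, needed_pct):
    ok = [m for m in modes if m[1] >= needed_pct]
    if not ok:
        raise ValueError("no mode offers the requested capability")
    return ok

def cheapest_then_fastest(modes, needed_pct):
    """Lowest power rating that meets the need, then the fastest mode at that rating."""
    ok = _meeting(modes, needed_pct)
    min_power = min(m[2] for m in ok)
    return max((m for m in ok if m[2] == min_power), key=lambda m: m[1])

def most_efficient(modes, needed_pct):
    """Best capability per unit of power among modes that meet the need."""
    return max(_meeting(modes, needed_pct), key=lambda m: m[1] / m[2])

print(cheapest_then_fastest(MODES, 50)[0])  # p1: same power rating as p2, more capability
print(cheapest_then_fastest(MODES, 30)[0])  # p3: the lowest power rating that suffices
print(most_efficient(MODES, 30)[0])         # p1: best capability/power ratio
```

Both policies live entirely in the caller; the table only has to report what the hardware offers, which is the separation between strategy and capability argued for above.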


the fact that you currently want to use this strategy doesn't mean that 
the other possible modes don't exist, and even if you don't use them now 
they should be available within the API (including the cpufreq api)


this strategy should work well on the normal unpredictable workload that 
most people deal with, but there are some cases where the workload becomes 
pretty predictable (media players for example) where there really is less 
variation, and a need for a constant availability of the cpu, so it may 
actually save a smidge of power to run below the highest freq that the 
voltage allows rather than running faster and being idle more cycles.


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven
> >> son anyway)
> >
> > I don't think you have got it right: the only info being passed is the
> > standard cpufreq list of frequencies; everything else is part of the
> > cpufreq driver.
> 
> to make the decisions the software making the decision needs to know how 
> much power would be used at each freq setting.

power used at a certain frequency is not a single variable. 
In fact, on most laptops and other similarly power aware devices, it's
in fact better for power consumption to always go to the maximum
frequency as quickly as possible, so that you can be idle for the
longest possible time after that. Good luck finding a generic way to
represent such things in a (userspace) interface




Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Igor Stoppa wrote:


On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote:

On Sun, 22 Jul 2007, Igor Stoppa wrote:


[snip]


Could you elaborate on how your proposal is incompatible with enhancing
the clock framework?


It's not that I think it's incompatible with any existing powersaving
tools (in fact I hope it's not)

it's that I think that this (or something similar) could be made to cover
all the various power options instead of CPUs having one interface, ACPI
capable drivers having another, embedded devices presenting a third, etc

this was triggered by the mess of different function calls for different
purposes that are used for the suspend functions where you have a bunch of
different functions that are each supposed to be called at a specific time
from a specific mode during the suspend process. with all these different
functions driver writers tend to not bother implementing any of them, and
it seems like there is a fairly steady stream of new functions that end up
being needed. the initial intent was to just change this into a generic
set of calls that every driver writer would implement the minimum set of,
and make it trivially extensible to future capabilities of hardware.


Every now and then there is some attempt to find One solution to bind
them all: x86, SoC, ACPI ... you name it.


this is another one. I'd be happy to get pointers to prior ones to learn 
from.



Unfortunately, while it's true that there are significant similarities,
there are also notable differences; as far as i know the USB subsystem
is the one that gets closer to what we have in the embedded arena, since
it can have complex cases of parent-child powering and wakeup.


this API is not trying to represent the parent-child hierarchy. as far as 
I know that's documented in sysfs (or is supposed to be). this is just an 
attempt to make it so that as you are going through the hierarchy you 
don't have to use vastly different API's to control the different 
functions.


I suspect that most (if not all) of the previous One Solutions have tried 
to completely handle all the details of their original case, and then 
branch out to the other cases.


this attempt is working from the other direction. the user of this API 
doesn't care how something is done, it just wants to know what's possible 
and how to tell the system to switch modes.


other than just me searching through the lists, do you have a pointer to 
some of the differences between the different types that are seen as being 
so large that they can't be unified?



while I was describing the issues to my roommates over dinner I realized
that the same type of functions are needed for the CPU clocks.

if you have an accepted framework in place there that can do what I
described, please consider extending it to cover other types of devices
and drivers.


That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.


this is more than just setting the clocks on everything (although setting 
clocks seems like it fits well into the model) because some power modes 
are not easily represented just as clocks.



As for latencies, well, only a few clocks really have significant impact.
Most notably the main system oscillator. Everything else has 0 latency
since it ends up in opening/closing a clock gate.

Powering device on/off will certainly introduce more latency, but either
the powering is supported by the hw to make it quick, or it has to go
through most, if not all, of the usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require decisions made with microsecond
precision.


and many devices support both a quick almost-off mode and a slow 
almost-off mode (as well as a completely off mode), with the slow mode 
eating less power, but taking longer to wake up from. that's the reason 
for providing the matrix to let the program making the decision decide if 
it's worth the time delays to get the power savings
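
a rough sketch of that decision, assuming each mode is reduced to a (name, power, wake-up time) entry. the function name pick_idle_mode, the tuple layout and all of the numbers are made up for illustration; none of this is an existing interface:

```python
def pick_idle_mode(modes, max_wake_us):
    """Return the name of the lowest-power mode that wakes within max_wake_us.

    modes is a list of (name, power_pct, wake_us) tuples (hypothetical layout).
    Returns None when no mode meets the latency budget.
    """
    candidates = [m for m in modes if m[2] <= max_wake_us]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m[1])[0]


# Hypothetical numbers for one device: power as % of full power,
# wake-up time in microseconds.
MODES = [
    ("on", 100, 0),
    ("quick-almost-off", 20, 50),
    ("slow-almost-off", 5, 5_000),
    ("off", 0, 2_000_000),
]
```

the same table could be consulted by an in-kernel governor with a tight latency budget or by a userspace tool that can tolerate seconds of delay; only the max_wake_us argument changes.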


as I note in another message, this API isn't intended to be strictly 
kernelspace or strictly userspace. for the ondemand speed governor you are 
changing the settings quickly and so probably want to do so in the kernel, 
however some people may be satisfied with slower controls and so could 
have them in userspace (an extreme example of this would be turning off 
wireless cards that aren't in use to save power and improve security)




I think you are passing too much
info up the chain to the part making the decision (that part doesn't need
to 

Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Igor Stoppa
On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote:
> On Sun, 22 Jul 2007, Igor Stoppa wrote:

[snip]

> > Could you elaborate on how your proposal is incompatible with enhancing
> > the clock framework?
> 
> It's not that I think it's incompatible with any existing powersaving 
> tools (in fact I hope it's not)
> 
> it's that I think that this (or something similar) could be made to cover 
> all the various power options instead of CPUs having one interface, ACPI 
> capable drivers having another, embedded devices presenting a third, etc
> 
> this was triggered by the mess of different function calls for different 
> purposes that are used for the suspend functions where you have a bunch of 
> different functions that are each supposed to be called at a specific time 
> from a specific mode during the suspend process. with all these different 
> functions driver writers tend to not bother implementing any of them, and 
> it seems like there is a fairly steady stream of new functions that end up 
> being needed. the initial intent was to just change this into a generic 
> set of calls that every driver writer would implement the minimum set of, 
> and make it trivially extensible to future capabilities of hardware.

Every now and then there is some attempt to find One solution to bind
them all: x86, SoC, ACPI ... you name it.

Unfortunately, while it's true that there are significant similarities,
there are also notable differences; as far as i know the USB subsystem
is the one that comes closest to what we have in the embedded arena, since
it can have complex cases of parent-child powering and wakeup.

> one other effect of this is that driver writers would see the mode 
> interface from day one rather than just completely ignoring it. right now 
> device driver authors tend to think "why worry about figuring out how to 
> implement 'prepare to suspend', 'late suspend', 'suspend', 'quiesce but 
> don't suspend', etc" if they aren't really interested in working on 
> suspend, it's not really clear what each of these should do even after 
> reading the docs on it. however listing the power modes that a device can 
> be in, documenting the cost of switching between them, and implementing 
> the transition is something very straightforward for the device driver 
> author to do (and they don't have to worry about the details of how and 
> when the various modes get used, that's up to the suspend/powersaving 
> software to figure out). as such I expect the driver support for 
> powersaving modes to improve. in fact, I expect that some driver writers 
> will implement a whole bunch of modes, just to show off the features of 
> the hardware. and even if nothing uses the modes right now at least they 
> are implemented and documented for future use (and it should be trivial to 
> have a test routine that just runs every driver you have hardware for 
> through every mode transition to make sure that they all work, so the less 
> commonly used modes shouldn't bitrot too badly)

What you are saying can be summarised as making the driver model more
expressive.

> while I was describing the issues to my roommates over dinner I realized 
> that the same type of functions are needed for the CPU clocks.
> 
> if you have an accepted framework in place there that can do what I 
> described, please consider extending it to cover other types of devices 
> and drivers.

That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.
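
To illustrate the usecount idea only, here is a minimal sketch. It is not the actual clock fw code; the Clock class, its method names and the example tree are hypothetical:

```python
class Clock:
    """Hypothetical clock node that is gated whenever nobody uses it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usecount = 0

    @property
    def enabled(self):
        return self.usecount > 0

    def enable(self):
        # The first user of a clock also takes a reference on its parent,
        # so the whole path up to the root oscillator keeps running.
        if self.usecount == 0 and self.parent is not None:
            self.parent.enable()
        self.usecount += 1

    def disable(self):
        if self.usecount == 0:
            raise RuntimeError(f"{self.name}: unbalanced disable")
        self.usecount -= 1
        # Last user gone: this clock is gated and the parent reference dropped.
        if self.usecount == 0 and self.parent is not None:
            self.parent.disable()
```

The tree itself (which clock feeds which) would be the platform/board specific part; the counting logic stays the same.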

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.

As for latencies, well, only a few clocks really have significant impact.
Most notably the main system oscillator. Everything else has 0 latency
since it ends up in opening/closing a clock gate.

Powering device on/off will certainly introduce more latency, but either
the powering is supported by the hw to make it quick, or it has to go
through most, if not all, of the usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require decisions made with microsecond
precision.

> I want sanity and functionality far more than credit :-)

I want to avoid redesigning the wheel: the current version is not round
yet, but re-starting from a triangle every time is far less appealing.

> thanks for the link. I've read through it, and it looks like there is a 
> lot of the same ideas in your proposal. 

Unless some new hw/technology shows up, I'm afraid the available set of
ideas is very limited :-)

> I think you are passing too much 
> info up the chain to the part making the decision (that part doesn't need 
> 

Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Igor Stoppa wrote:


Hi,
On Sat, 2007-07-21 at 23:49 -0700, ext
[EMAIL PROTECTED] wrote:

I'm deliberately breaking the threading on this so that people who have
tuned out the hibernation thread can take a look at this.

below is the proposal that I made at the bottom of one of the posts on the
hibernation thread.


I have the impression that you are trying to describe a mix of the clock
and latency frameworks.

Could you elaborate on how your proposal is incompatible with enhancing
the clock framework?


It's not that I think it's incompatible with any existing powersaving 
tools (in fact I hope it's not)


it's that I think that this (or something similar) could be made to cover 
all the various power options instead of CPUs having one interface, ACPI 
capable drivers having another, embedded devices presenting a third, etc


this was triggered by the mess of different function calls for different 
purposes that are used for the suspend functions where you have a bunch of 
different functions that are each supposed to be called at a specific time 
from a specific mode during the suspend process. with all these different 
functions driver writers tend to not bother implementing any of them, and 
it seems like there is a fairly steady stream of new functions that end up 
being needed. the initial intent was to just change this into a generic 
set of calls that every driver writer would implement the minimum set of, 
and make it trivially extensible to future capabilities of hardware.


one other effect of this is that driver writers would see the mode 
interface from day one rather than just completely ignoring it. right now 
device driver authors tend to think "why worry about figuring out how to 
implement 'prepare to suspend', 'late suspend', 'suspend', 'quiesce but 
don't suspend', etc" if they aren't really interested in working on 
suspend, it's not really clear what each of these should do even after 
reading the docs on it. however listing the power modes that a device can 
be in, documenting the cost of switching between them, and implementing 
the transition is something very straightforward for the device driver 
author to do (and they don't have to worry about the details of how and 
when the various modes get used, that's up to the suspend/powersaving 
software to figure out). as such I expect the driver support for 
powersaving modes to improve. in fact, I expect that some driver writers 
will implement a whole bunch of modes, just to show off the features of 
the hardware. and even if nothing uses the modes right now at least they 
are implemented and documented for future use (and it should be trivial to 
have a test routine that just runs every driver you have hardware for 
through every mode transition to make sure that they all work, so the less 
commonly used modes shouldn't bitrot too badly)
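
a minimal sketch of what such a per-driver mode list and the "run every transition" test might look like. everything here (PowerMode, FakeDriver, the field names, the cost table) is hypothetical and only illustrates the idea; it is not an existing kernel interface:

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class PowerMode:
    name: str
    power_pct: int       # power drawn, as % of full power (assumed unit)
    capability_pct: int  # capability offered, as % of full capability


class FakeDriver:
    """Hypothetical driver exposing its modes and the cost of switching."""

    def __init__(self, modes, cost_us):
        self.modes = {m.name: m for m in modes}
        self.cost_us = cost_us  # (from, to) -> switch time in microseconds
        self.current = modes[0].name

    def set_mode(self, name):
        if name not in self.modes:
            raise ValueError(f"unknown mode {name!r}")
        if name != self.current and (self.current, name) not in self.cost_us:
            raise ValueError(f"no transition {self.current} -> {name}")
        self.current = name


def exercise_all_transitions(driver):
    """Try every ordered pair of modes and return the pairs that failed."""
    failures = []
    for src, dst in permutations(driver.modes, 2):
        # The sketch forces the starting state directly; real hardware
        # would have to be walked there through working transitions.
        driver.current = src
        try:
            driver.set_mode(dst)
        except ValueError:
            failures.append((src, dst))
    return failures
```

the point of the sketch is that the test loop needs no knowledge of the device: it only uses the mode list the driver published.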


while I was describing the issues to my roommates over dinner I realized 
that the same type of functions are needed for the CPU clocks.


if you have an accepted framework in place there that can do what I 
described, please consider extending it to cover other types of devices 
and drivers.


I want sanity and functionality far more than credit :-)

David Lang


It looks like you are proposing a brand new shiny thing that frankly I
would be happy to leave alone, unless it is crystal clear that the clock
fw cannot be improved.

The clock fw is used for OMAP and other architectures (including SH,
iirc) and so far it has provided very good support for our power
management needs (Nokia 770 and N800).

Currently we are working on DVFS for OMAP2 (see slides presented at the
linux-pm summit for OLS 2007 http://tinyurl.com/28tact ) and even if the
current prototype is not actively involving the clock fw, our final goal
is to make it capable of supporting atomic transactions for changing the
core parameters.


thanks for the link. I've read through it, and it looks like there is a 
lot of the same ideas in your proposal. I think you are passing too much 
info up the chain to the part making the decision (that part doesn't need 
to know the details of the voltage/freq choices, the %power/%capability 
numbers I suggested are in many ways more what they are making decisions 
on anyway)


in the slideshow, the sequence for changing the cpu speed includes pre- and 
post-notifying drivers. what exactly are the drivers expected to do with 
the notification? are you asking them to pause and then re-initialize for 
the new power level?



OMAP3 will require suspend to ram implementation where the content of
system memory is retained, while parts or all the SoC are switched off.
The plan is still to have a clock fw based implementation (plus
interaction with the power rails, of course).

I think these are good examples of the non-ACPI systems you are
mentioning.


yes, I think they are.


To make any proposal that has some chance of being accepted, you have to
compare it against the 

Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Igor Stoppa
Hi,
On Sat, 2007-07-21 at 23:49 -0700, ext
[EMAIL PROTECTED] wrote:
> I'm deliberately breaking the threading on this so that people who have 
> tuned out the hibernation thread can take a look at this.
> 
> below is the proposal that I made at the bottom of one of the posts on the 
> hibernation thread.

I have the impression that you are trying to describe a mix of the clock
and latency frameworks.

Could you elaborate on how your proposal is incompatible with enhancing
the clock framework? 

It looks like you are proposing a brand new shiny thing that frankly I
would be happy to leave alone, unless it is crystal clear that the clock
fw cannot be improved.

The clock fw is used for OMAP and other architectures (including SH,
iirc) and so far it has provided very good support for our power
management needs (Nokia 770 and N800).

Currently we are working on DVFS for OMAP2 (see slides presented at the
linux-pm summit for OLS 2007 http://tinyurl.com/28tact ) and even if the
current prototype is not actively involving the clock fw, our final goal
is to make it capable of supporting atomic transactions for changing the
core parameters.

OMAP3 will require suspend to ram implementation where the content of
system memory is retained, while parts or all the SoC are switched off.
The plan is still to have a clock fw based implementation (plus
interaction with the power rails, of course).

I think these are good examples of the non-ACPI systems you are
mentioning.

To make any proposal that has some chance of being accepted, you have to
compare it against the existing solution, explaining:

-what it is bringing in terms of new functionality
-how it is different
-why the current implementation cannot simply be enhanced

You can refer to the linux-pm archives for examples of failed attempts
over the last year or so, just search for "framework" in the subject.

-- 
Cheers, Igor

Igor Stoppa <[EMAIL PROTECTED]>
(Nokia Multimedia - CP - OSSO / Helsinki, Finland)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Igor Stoppa wrote:


On Sun, 2007-07-22 at 01:58 -0700, ext [EMAIL PROTECTED] wrote:

On Sun, 22 Jul 2007, Igor Stoppa wrote:


[snip]


Could you elaborate on how your proposal is incompatible with enhancing
the clock framework?


It's not that I think it's incompatible with any existing powersaving
tools (in fact I hope it's not)

it's that I think that this (or something similar) could be made to cover
all thevarious power options instead of CPU's having one interface, ACPI
capable drivers having another, embeded devices presenting a third, etc

this was triggered by the mess of different function calls for different
purposes that are used for the suspend functions where you have a bunch of
different functions that are each supposed to be called at a specific time
from a specific mode during the suspend process. with all these different
functions driver writes tend to not bother implementing any of them, and
it seems like there is a fairly steady stream of new functions that end up
being needed. the initial intent was to just change this into a generic
set of calls that every driver writer would implement the minimum set of,
and make it trivially extensable to future capabilities of hardware.


Every now and then there is some attempt to find One solution to bind
them all: x86, SoC, ACPI ... you name it.


this is another one. I'd be happy to get pointers to prior ones to learn 
from.



Unfortunately, while it's true that there are significant similarities,
there are also notable differencies; as far as i know the USB subsystem
is the one that gets closer to what we have in the embedded arena, since
it can have complex cases of parent-child powering and wakeup.


this API is not trying to represent the parent-child hierarchy. as far as 
I know that's documented in sysfs (or is supposed to be). this is just an 
attempt to make it so that as you are going through the hierarchy you 
don't have to use vastly different API's to control the different 
functions.


I suspect that most (if not all) of the previous One Solutions have tried 
to completely handle all the details of their original case, and then 
branch out to the other cases.


this attempt is working from the other direction. the user of this API 
doesn't care how something is done, it just wants to know what's possible 
and how to tell the system to switch modes.


other then just me searching through the lists, do you have a pointer to 
some of the differences between the different types that are seen as being 
so large that they can't be unified?



while I was describing the issues to my roomates over dinner I realized
that the same type of functions are needed for the CPU clocks.

if you have an accepted framework in place there that can do what I
described, please consider extending it to cover other types of devices
and drivers.


That is not part of the fw: the fw simply expresses parent-child clock
distribution and keeps usecounts so that unused clocks are automatically
gated.

The actual clock tree description is platform/arch/board specific and
doesn't affect the framework. You can just roll your own version for x86
by providing a description of the methods used to switch on/off every
individual clock on your board.

So what you are asking for is that somebody writes an x86 version of the
clock fw.


this is more then just setting the clocks on everything (although setting 
clocks seems like it fits well into the model) becouse some power modes 
are not easily represented just as clocks.



As for latencies, well, only few clocks really have significant impact.
Most notably the main system oscillator. Everything else has 0 latency
since it ends up in opening/closing a clock gate.

Powering device on/off will certainly introduce more latency, but either
the powering is supported by the hw, to make it quick or it has to go
through most, if not all of he usual initialisation sequence; in that
case it probably makes sense to avoid controlling it from kernelspace,
since it will be slow and won't require dedcisions made with us
precision.


and many devices support both a quick almost-off mode and a slow 
almost-off mode (as well as a completely off mode), with the slow mode 
eating less power, but taking longer to wake up from. that's the reason 
for providing the matrix to let the program making the decision decide if 
it's worth the time delays to get the power savings
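
As a rough illustration of the matrix described above, here is a minimal Python sketch. The mode names, power percentages and wake latencies are made-up values, and `pick_sleep_mode` is a hypothetical helper, not part of any existing kernel or userspace interface. It only shows how a decision-maker could trade wake-up delay against power savings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepMode:
    name: str
    power_pct: float        # power drawn, as % of full power (hypothetical)
    wake_latency_us: float  # time needed to wake up again (hypothetical)


# Hypothetical table a driver might export for one device.
MODES = [
    SleepMode("quick-almost-off", power_pct=10.0, wake_latency_us=50.0),
    SleepMode("slow-almost-off", power_pct=3.0, wake_latency_us=20_000.0),
    SleepMode("off", power_pct=0.0, wake_latency_us=2_000_000.0),
]


def pick_sleep_mode(modes, max_wake_latency_us):
    """Return the lowest-power mode whose wake latency fits the budget.

    Returns None when no mode can wake up quickly enough.
    """
    candidates = [m for m in modes if m.wake_latency_us <= max_wake_latency_us]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.power_pct)
```

A caller that can tolerate 50 ms of wake-up delay would call `pick_sleep_mode(MODES, 50_000)` and get the slow almost-off mode; one that needs the device back within 100 us gets the quick mode.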


as I note in another message, this API isn't intended to be strictly 
kernelspace or strictly userspace. for the ondemand speed governor you are 
changing the settings quickly and so probably want to do so in the kernel, 
however some people may be satisfied with slower controls and so could 
have them in userspace (an extreme example of this would be turning off 
wireless cards that aren't in use to save power and improve security)




I think you are passing too much
info up the chain to the part making the decision (that part doesn't need
to 

Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven
  son anyway)
 
  I don't think you have got it right: the only info being passed is the
  standard cpufreq list of frequencies; everything else is part of the
  cpufreq driver.
 
 to make the decisions the software making the decision needs to know how 
 much power would be used at each freq setting.

power used at a certain frequency is not a single variable. 
In fact, on most laptops and other similarly power aware devices, it's
in fact better for power consumption to always go to the maximum
frequency as quickly as possible, so that you can be idle for the
longest possible time after that. Good luck finding a generic way to
represent such things in a (userspace) interface


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


son anyway)


I don't think you have got it right: the only info being passed is the
standard cpufreq list of frequencies; everything else is part of the
cpufreq driver.


to make the decisions the software making the decision needs to know how
much power would be used at each freq setting.


power used at a certain frequency is not a single variable.
In fact, on most laptops and other similarly power aware devices, it's
in fact better for power consumption to always go to the maximum
frequency as quickly as possible, so that you can be idle for the
longest possible time after that. Good luck finding a generic way to
represent such things in a (userspace) interface


I disagree with you here. for each frequency setting you can say how much 
power the cpu/system is expected to use (especially as a percentage of the 
full power mode). creating this value requires you to take two things into 
account, the voltage you are running things at (by far the biggest 
effect), and the minor difference that the frequency makes at that voltage 
(possibly small enough to ignore entirely).
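
One way to turn voltage and frequency into a "% of full power" figure is the textbook first-order approximation for dynamic CMOS power, P ~ V^2 * f. The sketch below assumes that approximation; it ignores leakage and everything outside the CPU, so how much the frequency term matters on a real part has to come from measurements. The function name and the example operating points are hypothetical.

```python
def relative_power_pct(voltage, freq, full_voltage, full_freq):
    """Estimate power as a percentage of the full-power mode.

    Uses the first-order dynamic power approximation P ~ V^2 * f.
    Static (leakage) power is deliberately ignored in this sketch.
    """
    return 100.0 * (voltage / full_voltage) ** 2 * (freq / full_freq)


# Hypothetical operating points: (volts, MHz)
FULL = (1.2, 2000.0)
LOW = (1.0, 1000.0)

full_pct = relative_power_pct(*FULL, *FULL)
low_pct = relative_power_pct(*LOW, *FULL)
```

With these made-up points the low mode comes out at roughly 35% of full power, and most of the reduction per step comes from the squared voltage term.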


the API I proposed has no problem with there being multiple modes that 
have the same %power but with different %capability numbers.


I'm willing to bet that the current cpufreq software just looks at the 
voltage as the value that tells you how much power the thing is going to 
use at that setting


the fact that you want to run at the max frequency for a given voltage is 
a reasonable strategy, but it's a power saving _strategy_, not a 
capability of the hardware and the API I'm mentioning should be enough to 
let you pick the highest performance setting that has the same power 
rating as the minimum performance you need (or for that matter to go one 
step further and go with the most efficient setting in terms of 
performance/power that has a performance number higher than what you need, 
which could actually be better)
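
The two selection strategies described above can be sketched in a few lines of Python. The mode list, its %power and %capability numbers, and both function names are hypothetical; the point is only that a flat list of (power, capability) pairs is enough input for either policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    power_pct: float       # % of full power (hypothetical)
    capability_pct: float  # % of full performance (hypothetical)


MODES = [
    Mode("m0", 100.0, 100.0),
    Mode("m1", 60.0, 90.0),
    Mode("m2", 60.0, 70.0),
    Mode("m3", 30.0, 40.0),
    Mode("m4", 30.0, 35.0),
]


def cheapest_then_fastest(modes, needed_pct):
    """Find the lowest power rating that meets the need, then return the
    highest-capability mode that shares that power rating."""
    ok = [m for m in modes if m.capability_pct >= needed_pct]
    if not ok:
        return None
    min_power = min(m.power_pct for m in ok)
    same_power = [m for m in modes if m.power_pct == min_power]
    return max(same_power, key=lambda m: m.capability_pct)


def most_efficient(modes, needed_pct):
    """Return the mode with the best capability/power ratio among those
    that meet the need (modes rated at zero power are skipped)."""
    ok = [m for m in modes
          if m.capability_pct >= needed_pct and m.power_pct > 0]
    if not ok:
        return None
    return max(ok, key=lambda m: m.capability_pct / m.power_pct)
```

With this made-up table and a need of 35%, the first policy picks `m3` (30% power) while the second picks `m1` (60% power but the best performance per unit of power), which is the difference between the two approaches.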


the fact that you currently want to use this strategy doesn't mean that 
the other possible modes don't exist, and even if you don't use them now 
they should be available within the API (including the cpufreq api)


this strategy should work well on the normal unpredictable workload that 
most people deal with, but there are some cases where the workload becomes 
pretty predictable (media players for example) where there really is less 
variation, and a need for a constant availability of the cpu, so it may 
actually save a smidge of power to run below the highest freq that the 
voltage allows rather than running faster and being idle more cycles.


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven

 I disagree with you here. for each frequency setting you can say how much 
 power the cpu/system is expected to use (especially as a percentage of the 
 full power mode). creating this value requires you to take two things into 
 account, the voltage you are running things at (by far the biggest 
 effect), and the minor difference that the frequency makes at that voltage 
 (possibly small enough to ignore entirely).
 
 the API I proposed has no problem with there being multiple modes that 
 have the same %power but with different %capability numbers.

how do you deal with the power at idle vs power at full load.. you
need both at each level to pick the best one, as well as relative
performance etc.

 
 I'm willing to bet that the current cpufreq software just looks at the 
 voltage as the value that tells you how much power the thing is going to 
 use at that setting

it doesn't. 
 
 the fact that you want to run at the max frequency for a given voltage is 

no I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as first order approach it's a sound
approach). Voltage follows, and that's fine.


 a reasonable strategy, but it's a power saving _strategy_, not a 
 capability of the hardware and the API I'm mentioning should be enough to 
 let you pick the highest performance setting that has the same power 
 rating as the minimum performance you need (or for that matter to go one 
 step further and go with the most efficient setting in terms of 
 performance/power that has a performance number higher than what you need, 
 which could actually be better)

why would I care about voltage? Most PCs don't expose it, and that's
fine, they can switch to the voltage needed REALLY quickly (single or
double digit microseconds). PCs in fact only expose numbered states (P0
to P7 at most), and some number that you can use to show the user, but
doesn't mean anything beyond that. Some people interpret it as
frequency, and that's nice, but it doesn't really mean that. You
really don't know anything beyond that

and that's ok. As I said before, as a general strategy you want highest
speed when running code for race-to-idle, with some 2nd order effects
for when you execute code really shortly coming out of idle; in which
case you don't want to do a voltage transition twice (most cpus have the
idle voltage be the lowest-execute voltage as well).



 this strategy should work well on the normal unpredictable workload that 
 most people deal with, but there are some cases where the workload becomes 
 pretty predictable (media players for example) where there really is less 
 variation, and a need for a constant availability of the cpu, so it may 
 actually save a smidge of power to run below the highest freq that the 
 voltage allows rather than running faster and being idle more cycles.

that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


I disagree with you here. for each frequency setting you can say how much
power the cpu/system is expected to use (especially as a percentage of the
full power mode). creating this value requires you to take two things into
account, the voltage you are running things at (by far the biggest
effect), and the minor difference that the frequency makes at that voltage
(possibly small enough to ignore entirely).

the API I proposed has no problem with there being multiple modes that
have the same %power but with different %capability numbers.


how do you deal with the power at idle vs power at full load.. you
need both at each level to pick the best one, as well as relative
performance etc.


what I was thinking was to use power at full load for the power rating of 
each mode.



the fact that you want to run at the max frequency for a given voltage is


no I want to run at the max frequency PERIOD. On just about any PC, it's
more power efficient to go full speed when executing code, and then idle
for as long as you can. (there are some second order effects that make
this a bit more complex, but as first order approach it's a sound
approach). Voltage follows, and that's fine.


this seems to be contradicted by the fact that AMD is listing the ability 
for each core to run at a different clock speed on the new 4-core chips as 
an advantage. if you always want to run at the max frequency PERIOD then 
why bother engineering the ability to do otherwise? (as opposed to just 
shutting down unused cores)


another example is the 80 core demo chip that Intel has been making press 
about. it can run at 1Tflop on 25w of power and 2Tflop at 150w of power. 
running at max freq for a 1Tflop workload would have you eating ~75w of 
power (the numbers may be off, I'm going from memory, but the cost in 
power of doubling the speed was _far_ more than double the power 
requirements)
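
The ~75w figure follows from a simple duty-cycle average. The Python sketch below reproduces it using the numbers quoted above (which are from memory and may be off) and assumes, purely for illustration, that the chip draws nothing while idle. `average_power_w` is a hypothetical helper.

```python
def average_power_w(active_power_w, active_throughput, needed_throughput,
                    idle_power_w=0.0):
    """Average power when running flat out part of the time and idling
    for the rest. Idle power defaults to 0 W, which is an assumption."""
    duty = needed_throughput / active_throughput
    if duty > 1.0:
        raise ValueError("mode cannot sustain the requested throughput")
    return duty * active_power_w + (1.0 - duty) * idle_power_w


# 1 Tflop workload on the 2 Tflop / 150 W mode: busy half of the time.
race_to_idle_w = average_power_w(150.0, 2.0, 1.0)
# Same workload on the 1 Tflop / 25 W mode: busy all of the time.
steady_w = average_power_w(25.0, 1.0, 1.0)
```

Under these assumptions racing to idle averages 75 W against 25 W for the steady mode; any non-zero idle power makes the race-to-idle figure higher still.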



this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.


that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..


only if the transitions don't cost anything significant, and the 
computation capacity per watt of power is the same at all frequencies. the 
chip performance numbers I've been seeing (which I admit are mostly 
embedded datasheets) indicate that neither of these hold true.


David Lang


Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread Arjan van de Ven
On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:
  the fact that you want to run at the max frequency for a given voltage is
 
  no I want to run at the max frequency PERIOD. On just about any PC, it's
  more power efficient to go full speed when executing code, and then idle
  for as long as you can. (there are some second order effects that make
  this a bit more complex, but as first order approach it's a sound
  approach). Voltage follows, and that's fine.
 
 this seems to be contradicted by the fact that AMD is listing the ability 
 for each core to run at a different clock speed on the new 4-core chips as 
 an advantage.

that's a marketing thing mostly.. they all still run at the same voltage
anyway.

  if you always want to run at the max frequency PERIOD then 
 why bother engineering the ability to do otherwise? (as opposed to just 
 shutting down unused cores)

multicore changes the rules a little but not all that much. (the idle
power is higher if not all cores are idle at the same time. Yet... each
core individually trying to be idle as quickly as possible is the best
way to get to the highest all cores idle time, unless there is some
really special/weird synchronization)


  this strategy should work well on the normal unpredictable workload that
  most people deal with, but there are some cases where the workload becomes
  pretty predictable (media players for example) where there really is less
  variation, and a need for a constant availability of the cpu, so it may
  actually save a smidge of power to run below the highest freq that the
  voltage allows rather than running faster and being idle more cycles.
 
  that actually is the example showcase of race-to-idle where you
  absolutely want to run at the highest frequency..
 
 only if the transitions don't cost anything significant, 

these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).
They are not zero, and that is why you see things like ondemand ramp up
only after a little time, as a guesstimate to make sure it's not just a
really short lived code execution.

 and the 
 computation capacity per watt of power is the same at all frequencies. the 
 chip performance numbers I've been seeing (which I admit are mostly 
 embedded datasheets) indicate that neither of these hold true.

let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W

assume media playback, and a dumb one, that takes half a second to
decode a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.
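
The arithmetic above can be checked with a short Python sketch. The power figures are the rounded ones from this mail; `energy_per_period_j` is a hypothetical helper, and transition costs are ignored.

```python
def energy_per_period_j(active_power_w, idle_power_w, busy_s, period_s=1.0):
    """Energy in joules used over one period: busy_s seconds at active
    power, the remainder at idle power. Transition costs are ignored."""
    return busy_s * active_power_w + (period_s - busy_s) * idle_power_w


# Decoding one second of media takes 0.5 s at half speed, 0.25 s at full.
half_speed_j = energy_per_period_j(24.0, 1.0, busy_s=0.5)
full_speed_j = energy_per_period_j(34.0, 1.0, busy_s=0.25)

print(half_speed_j)  # 12.5
print(full_speed_j)  # 9.25
```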

now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org



Re: [linux-pm] Power Management framework proposal

2007-07-22 Thread david

On Sun, 22 Jul 2007, Arjan van de Ven wrote:


On Sun, 2007-07-22 at 21:04 -0700, [EMAIL PROTECTED] wrote:


this strategy should work well on the normal unpredictable workload that
most people deal with, but there are some cases where the workload becomes
pretty predictable (media players for example) where there really is less
variation, and a need for a constant availability of the cpu, so it may
actually save a smidge of power to run below the highest freq that the
voltage allows rather than running faster and being idle more cycles.


that actually is the example showcase of race-to-idle where you
absolutely want to run at the highest frequency..


only if the transitions don't cost anything significant,


these are second order effects though. On a pc, the transition costs are
quite low (as I said, single or low double digit microseconds).


including pausing all drivers before the transition and unpausing them 
afterwards?



and the
computation capacity per watt of power is the same at all frequencies. the
chip performance numbers I've been seeing (which I admit are mostly
embedded datasheets) indicate that neither of these hold true.


let me give you a real world example then, and the numbers I'm using are
ballpark the same as you'll find in a (mobile) core 2 duo datasheet, I
just rounded them a little so that the math works out nice.

power at full speed: 34W
power at half speed: 24W
power at idle: 1W


are these numbers for the CPU itself or for a larger chunk? I could 
easily see these numbers for motherboard (including CPU and RAM), but it 
would surprise me if these numbers are for the CPU itself. I'm used to 
seeing datasheets that have a much more linear voltage/freq (and therefore 
a quadratic voltage/power) curve. in some cases the voltage requirements 
drop faster than the frequency.



assume media playback, and a dumb one, that takes half a second to
decode a second of media. (again to make the math simple)

at half speed: Energy for a second is 0.5 * 24 + 0.5 * 1 = 12.5 J
at full speed: Energy for a second is 0.25 * 34 + 0.75 * 1 = 9.25 J

this works for all systems where the idle power is lower than the
power you save by dropping speed... and that is almost all of them in
the PC world.


if you can idle the system as a whole I agree with you fully. most PC 
hardware (including the mobile stuff) doesn't change its power 
consumption much with load. at Usenix there was a presentation (I don't 
remember if it was by Amazon or Google) about this subject, showing that 
current PC hardware only goes down to 50% power when idle (short of 
switching power modes) and that they and other big companies were pushing 
vendors to improve their hardware, aiming to get the idle power down to 
10% (again without suspending anything). so there's some chance that this 
will change before too long.
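
To show how much the idle figure matters, the Python sketch below reruns the quoted example with different idle powers. It reuses the 34W/24W numbers from the quoted mail; applying the 50% and 10% idle ratios to those numbers is purely an assumption for illustration, and both helper names are hypothetical.

```python
def energy_j(active_power_w, idle_power_w, busy_s, period_s=1.0):
    """Energy over one period: busy at active power, rest at idle power."""
    return busy_s * active_power_w + (period_s - busy_s) * idle_power_w


def break_even_idle_w(full_power_w, slow_power_w, speed_ratio):
    """Idle power at which the slow mode and race-to-idle use equal energy.

    speed_ratio is slow speed / full speed (0.5 for half speed). Derived by
    equating the two energy expressions; transition costs are ignored.
    """
    return (slow_power_w - speed_ratio * full_power_w) / (1.0 - speed_ratio)


FULL_W, HALF_W = 34.0, 24.0
threshold_w = break_even_idle_w(FULL_W, HALF_W, 0.5)

for idle_w in (1.0, 0.10 * FULL_W, 0.50 * FULL_W):
    half = energy_j(HALF_W, idle_w, busy_s=0.5)
    full = energy_j(FULL_W, idle_w, busy_s=0.25)
    winner = "race-to-idle" if full < half else "half speed"
    print(f"idle {idle_w:5.1f} W: half {half:6.2f} J, full {full:6.2f} J -> {winner}")
```

With these numbers the break-even idle power is 14 W: at 1 W or 10% idle power racing to idle wins, while at 50% idle power (17 W) staying at half speed uses less energy.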



now you can argue that 0.5 seconds is a really really long time, and
you'd be right. so for really really short stints (say a timer
interrupt) you don't want to change the voltage at all (nor would you
want to change the plls to change frequency for that matter). But once
you start changing those, you might as well go full speed.


this assumes that you can cache 1 second of video. if you have more 
real-time requirements you have a much harder time (say video conferencing 
where you don't get the frame until just before you need to display it)


David Lang
