Thomas De Schampheleire wrote:
> Hey,
> I looked at both sunpm.c and us_drv.c (as suggested by sarito).
> So, if I understand correctly, the code in sunpm.c allows devices
> to be put in a lower power level when they exceed a certain
> threshold of idle time. They are brought back up again when needed.
> If we consider the off state as a power level as well, I
> believe it is possible to extend sunpm.c so that it handles this off
> state correctly. This could be done using the cpu.c functions you
> mentioned.
sunpm.c does not require modification to handle an "off" power state.
The pm framework considers a power level of 0 to be off.
The us driver doesn't support an off state (doesn't advertise a 0
level in its pm-components property) because on the platforms that
it runs on (SPARC workstations), there is no "off" state of the
processor (no way to turn off the processor).
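To make the "0 means off" convention concrete, here is a minimal user-space sketch that checks whether a pm-components style list advertises a 0 level. The entry layout (a "NAME=..." string followed by "level=label" strings) follows pm-components(9P); the function name advertises_off() and the sample labels are hypothetical and are not taken from sunpm.c or us_drv.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if any "level=label" entry advertises level 0 (off),
 * 0 otherwise. Entries starting with "NAME=" name a component
 * and are skipped.
 */
int
advertises_off(const char *const *entries, size_t n)
{
	assert(entries != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *e = entries[i];
		char *end;
		long level;

		if (strncmp(e, "NAME=", 5) == 0)
			continue;
		level = strtol(e, &end, 10);
		if (end != e && *end == '=' && level == 0)
			return (1);
	}
	return (0);
}

/* Hypothetical samples: a device with an off level, a CPU without one. */
const char *const disk_like[] = {
	"NAME=Spindle Motor", "0=Stopped", "1=Full Speed"
};
const char *const cpu_like[] = {
	"NAME=CPU Speed", "1=Slowest", "2=Half", "3=Full"
};
```

With these samples, advertises_off(disk_like, 3) returns 1 and advertises_off(cpu_like, 4) returns 0, which mirrors the difference between a device that can be turned off and the CPUs the us driver handles.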
> Now, while this can take down the processor when idle, it will
> possibly be woken up again by the scheduler very shortly. So that's
> where the story of the power-aware scheduler comes in. We could for
> example envision a strategy that works opposite to load balancing.
> Instead of trying to balance, try to put as many tasks as possible on
> the currently available (active) processors, and only when a
> certain condition is reached (not sure yet which parameters, but I
> think the cpu load and queue length are of importance, right?) will a
> sleeping processor be woken up and assigned a task.
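The packing strategy described above could be sketched roughly as follows. This is a user-space illustration only, not dispatcher code: struct cpu_view, pick_cpu() and the max_queue threshold are all hypothetical, and queue length stands in for whatever load condition would really be used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU view; not the kernel's cpu_t. */
struct cpu_view {
	int active;	/* 1 if powered up and schedulable */
	int queue_len;	/* runnable tasks queued on this CPU */
};

/*
 * Pick a CPU for a new task. Prefer the busiest active CPU whose
 * queue is still below max_queue (packing tasks together), and wake
 * a sleeping CPU only when every active one is saturated. Returns
 * the CPU index, or -1 if all CPUs are active and saturated.
 */
int
pick_cpu(struct cpu_view *cpus, size_t ncpus, int max_queue)
{
	int best = -1, sleeper = -1;

	assert(cpus != NULL && max_queue > 0);
	for (size_t i = 0; i < ncpus; i++) {
		if (!cpus[i].active) {
			if (sleeper < 0)
				sleeper = (int)i;
			continue;
		}
		if (cpus[i].queue_len < max_queue &&
		    (best < 0 || cpus[i].queue_len > cpus[best].queue_len))
			best = (int)i;
	}
	if (best < 0 && sleeper >= 0) {
		cpus[sleeper].active = 1;	/* wake it up */
		best = sleeper;
	}
	if (best >= 0)
		cpus[best].queue_len++;
	return (best);
}
```

A load balancer would pick the least loaded CPU instead; choosing the most loaded one that still has room is what keeps the remaining CPUs idle long enough to be powered down.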
> For the us_drv.c, if I understand correctly, currently it only
> supports cpu frequency scaling, and no power states. Is that correct?
Frequency scaling is a power state (reducing the clock frequency
greatly reduces power consumption). us_drv.c does not handle a 0 power
state (off) because the hardware it supports can't do it.
> It says that it is not DDI-compatible. What are the implications of this?
This means that it uses interfaces that aren't in the DDI.
This puts it at risk if the behavior of the interfaces it uses
"illegally" changes (or if they are removed). Aside from this it is
a normal device driver.
> There do seem to be us_attach() and us_detach() methods.
Yes, it is a normal driver. It will, however, only attach successfully
if the device node exports a clock-divisors property (only workstations),
because that is the model it implements.
> What exactly does the us stand for?
UltraSPARC (IIi or III).
> Can these functions be used as an alternative
> to cpu_add_unit() and _del_unit() for example?
No function in any device driver can be called directly
from outside the driver.
The us_attach and us_detach functions are called through
the driver's dev_ops vector by the device configuration framework.
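As a rough illustration of calling through an ops vector, here is a much-reduced user-space sketch. The real struct dev_ops has more entry points, and devo_attach/devo_detach take a dev_info_t pointer and a command rather than an instance number; struct mini_dev_ops, demo_attach(), demo_detach() and the framework_*() wrappers are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a driver's ops vector. */
struct mini_dev_ops {
	int (*attach)(int instance);
	int (*detach)(int instance);
};

#define	MAX_INST	4
static int attached[MAX_INST];	/* per-instance state, private to the driver */

/* static: these symbols are not visible outside the "driver" */
static int
demo_attach(int instance)
{
	if (instance < 0 || instance >= MAX_INST || attached[instance])
		return (-1);
	attached[instance] = 1;
	return (0);
}

static int
demo_detach(int instance)
{
	if (instance < 0 || instance >= MAX_INST || !attached[instance])
		return (-1);
	attached[instance] = 0;
	return (0);
}

/* The only thing the driver exports is the vector itself. */
const struct mini_dev_ops demo_ops = { demo_attach, demo_detach };

/* The "framework" side only ever goes through the vector. */
int
framework_attach(const struct mini_dev_ops *ops, int instance)
{
	assert(ops != NULL && ops->attach != NULL);
	return (ops->attach(instance));
}

int
framework_detach(const struct mini_dev_ops *ops, int instance)
{
	assert(ops != NULL && ops->detach != NULL);
	return (ops->detach(instance));
}
```

Code outside the driver never names demo_attach() or demo_detach(); it can only reach them indirectly, in the way us_attach and us_detach are reached by the configuration framework.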
The us driver only deals with the power level (frequency) of a cpu.
It was created to allow workstations to meet the EPA's EnergyStar
guidelines by reducing their power consumption. It is not involved
in adding or removing cpus from the system because no workstations
support that functionality (at least not in 2000, when it was done, and
I'm not aware of any that do now either; the x86 cpus in the current
workstations that I know of can't be turned off either, they can only
have their frequency and voltage reduced).
I understand that a new opensolaris project has been started to
do this over now that there are more interesting cpus and
more general concern for power consumption on the horizon.
-sarito
Thanks, Thomas
On 3/15/07, Eric Saxe <[EMAIL PROTECTED]> wrote:
Hi Thomas,
Thomas De Schampheleire wrote:
>> Ah. :) So DR on x86 doesn't exist yet. DR allows deletion of the CPU
>> from the system, to the extent where the CPU can be
>> powered off. This obviously requires support from the hardware. On x86,
>> taking the CPU offline (where it is essentially parked in the C1
>> state down in the idle() loop) is as low as we can go right now.
>>
>
> I don't think I mentioned this before, but the architecture we are
> currently targeting is SPARC. DR does exist on that, right?
>
Right. On the SPARC based systems where DR is possible, there are kernel
DR modules that implement DR platform support for system boards, I/O,
etc. cfgadm(1M) is the command that can be used to do DR of system
components. cfgadm has a number of "plugins" that do the ioctl(2)s to
the DR kernel modules. For example, take a look at the cfgadm_sbd(1M) man
page. There is also a set of interfaces (see config_admin(3CFGADM)). The
DR drivers handle the platform specific aspects of DR, and call into the
platform independent portions of the kernel to "delete" or "add" CPUs
to/from the system. (See cpu_add_unit() / cpu_del_unit() in cpu.c.)
>> What a coincidence. We've been thinking about this as
>> well. :) We think that having a power management aware
>> dispatcher is important for a few reasons:
>> - Existing performance optimizing dispatcher
>> policy that emphasizes spreading out load across the system
>> (optimizing for CMT, for example) tends to
>> minimize availability of power manageable CPU resources. Having
>> a "default" policy where the dispatcher considers both throughput
>> and power efficiency would be better.
>> - CPU PM policy will be more effective if the dispatcher cooperates.
>> - The system will perform better (overall) given
>> a PM aware dispatcher since it won't be trying to schedule things
>> to run on CPUs that have been clocked down.
>> In the next week or so, I'd like to kick off a project
>> to go down the road of making the dispatcher power aware. Perhaps
>> that would be a good way for us (and interested others) to
>> collaborate?
>>
>
> A collaboration would certainly be interesting and useful. However
> there are a few things that come to mind:
> - I am not sure whether this would be architecture-specific. The
> dispatcher code is common, but power management would probably not...
> If you are looking into x86 and I'm into SPARC, then this could
> either be a problem, or complementary.
>
The mechanisms at play for doing CPU power management seem to be
platform specific. The policies, and the abstractions used to implement
them, I'm looking to make platform independent.
Ideally, you would want to be able to apply the same policy in the same
way regardless of what the hardware looks like.
> - Timeline: my work is in the context of a master thesis, which
> should be finished before July 2007. By then, the actual
> implementation should hopefully be already finished for some time so
> I have the time to concentrate on writing the report.
> I am not sure how long a complete implementation would take, or
> which timeline you had in mind. In case yours is much longer than,
> say, May, then we might need to keep separate projects (with of
> course the possibility to communicate).
>
What we have in mind is a bit longer term, I think. But I think there is
still opportunity for collaboration.
> - The purpose of my thesis project is to implement a periodic
> shutdown of processors in OpenSolaris: each period (period length is
> a parameter) the idleness of all processors will be evaluated and
> idle processors will be taken offline. The whole idea is to save
> energy for battery-powered devices.
>
This sounds similar to what's currently done for SPARC desktop
power management, except the processors aren't shut down, but are brought
into a lower power consuming state. That code also looks at the idleness
of the system's CPUs, power managing them lower when idle. (See
common/os/sunpm.c and power.conf(4).)
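The idle-threshold idea can be sketched in a few lines of user-space C. This is only an illustration of the policy, not the code in sunpm.c: struct cpu_pm, pm_scan() and pm_mark_busy() are hypothetical names, time is a plain counter in seconds, and a min_level of 0 would correspond to the "take it offline" variant discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU power state; not a kernel structure. */
struct cpu_pm {
	int level;	/* current power level */
	int min_level;	/* lowest level this CPU supports */
	long last_busy;	/* time of last activity, in seconds */
};

/*
 * Periodic scan: drop every CPU that has been idle for at least
 * threshold seconds to its lowest level. Returns how many CPUs
 * were lowered by this scan.
 */
int
pm_scan(struct cpu_pm *cpus, size_t n, long now, long threshold)
{
	int lowered = 0;

	assert(cpus != NULL && threshold > 0);
	for (size_t i = 0; i < n; i++) {
		if (now - cpus[i].last_busy >= threshold &&
		    cpus[i].level > cpus[i].min_level) {
			cpus[i].level = cpus[i].min_level;
			lowered++;
		}
	}
	return (lowered);
}

/* Activity brings the CPU back to full power and resets its idle time. */
void
pm_mark_busy(struct cpu_pm *cpu, long now, int full_level)
{
	assert(cpu != NULL);
	cpu->last_busy = now;
	cpu->level = full_level;
}
```

The scan period and the idle threshold are separate knobs here: the period decides how often pm_scan() runs, while the threshold decides how long a CPU must have been idle before it is lowered.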
> Obviously, implementing this is much more specific than creating a
> complete well-written power-aware scheduler. Due to my time
> constraints, the latter could be unfeasible for me to cooperate on. I
> think it all depends on your plans and timeline.
Let's keep in touch. Depending on what you want to do, you might be able
to leverage some of our work in progress.
Thanks,
-Eric
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code