[tesla-dev] CPU power management policies

Mark Haywood Fri, 08 Jun 2007 16:44:10 -0400

David Vengerov wrote:
> Mark Haywood wrote:
>
>> David Vengerov wrote:
>>
>>> Thanks, Bart. It seems, then, that there are two types of policies that 
>>> can be deployed in a system. The first type of policy decreases CPU 
>>> clock frequency if the CPU utilization drops below 100% and 
>>> increases the frequency as the CPU utilization rises. The 
>>> interesting question with this policy is what frequency f should be 
>>> used (as a fraction of the maximum) when a certain CPU utilization 
>>> is observed. 
>>
>>
>> Actually, for x86 this is already defined for you. CPUs cannot
>> necessarily be set to an arbitrary frequency. Usually, only a limited
>> number of frequencies are supported, and those frequencies are
>> exported to the OS via the ACPI _PSS objects. 
>
> Yes, there are only a few possible frequencies to choose from, but
> which one should be chosen? After a large jump in CPU utilization, it
> is not optimal to keep stepping through them one at a time; jumping
> directly to a suitable frequency might be more appropriate.
A combination of user policy and utilization thresholds based on
frequency percentages would probably help make that decision. I think
it would take some experimentation, plus tools for measuring power and
performance, to settle on the right behavior.
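To make the stepping-versus-jumping question concrete, here is a minimal sketch of a type 1 policy that jumps straight to a target P-state instead of walking through the table one entry at a time. It assumes a hypothetical list of supported frequencies (standing in for what _PSS would report), a hypothetical 80% target utilization and 95% "saturated" cutoff, and that busy time scales linearly with frequency, which, as Bart pointed out, does not hold for every workload. The function name and parameters are illustrative only.

```python
from typing import Sequence


def pick_pstate(freqs_mhz: Sequence[int], cur_mhz: int, util: float,
                target_util: float = 0.8, saturated: float = 0.95) -> int:
    """Choose the next CPU frequency from a discrete set of P-states.

    Jumps directly to the lowest supported frequency expected to keep
    utilization at or below target_util, instead of stepping one
    P-state at a time. Assumes busy cycles scale linearly with
    frequency, which is only an approximation for real workloads.
    """
    if not freqs_mhz:
        raise ValueError("need at least one supported frequency")
    if not 0.0 <= util <= 1.0:
        raise ValueError("util must be in [0, 1]")
    if util >= saturated:
        # At (near) 100% busy the true demand is unknown, so go straight
        # to the highest frequency rather than stepping up gradually.
        return max(freqs_mhz)
    demand_mhz = util * cur_mhz          # busy cycles per second, in MHz
    needed_mhz = demand_mhz / target_util
    for f in sorted(freqs_mhz):          # try the lowest frequency first
        if f >= needed_mhz:
            return f
    return max(freqs_mhz)                # demand exceeds every P-state
```

For example, with supported frequencies of 2000, 1600, 1200 and 800 MHz, a CPU running at 2000 MHz that is 30% busy would be sent directly to 800 MHz rather than stepping down through 1600 and 1200.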
>
>>
>>> As Bart pointed out, different workloads will respond differently to 
>>> decreases in CPU frequency (the CPU utilization may rise 
>>> proportionately or it may not rise at all). The best way to approach 
>>> this problem, I think, is to ask the user to specify (or choose from 
>>> several options) a utility curve describing the accepted performance 
>>> degradation vs. a decrease in the consumed power.
>>
>>
>> Yes, I believe the Hardware Abstraction Layer (HAL) specification 
>> takes this approach:
>>
>> http://people.freedesktop.org/~david/hal-spec/hal-spec.html#interface-cpufreq
>>  
>>
>>
>> See the [GS]etCPUFreqPerformance method. 
>
> I didn't see any mention of performance vs. power utility functions
> there. Is it buried somewhere in the spec?
Maybe I've misunderstood you. The description of the HAL 
SetCPUFreqPerformance method is "Sets the performance of the dynamic 
scaling mechanism. This method summarizes and abstracts all the 
different settings which can be taken for dynamic frequency adjustments, 
like at which load to switch up frequency or how many steps the 
mechanism should traverse until reaching the maximum frequency. The 
higher the value, the more performance you get. Respectively, the higher 
the value, the sooner and the more often the frequency is switched up."


So I read this as the method a user would use "to specify (or
choose from several options) a utility curve."

So, if I understood you earlier, what you're suggesting (different
power vs. performance options) is already accepted within the community.
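One way to picture how a single knob like SetCPUFreqPerformance can stand in for a utility curve is to map it onto the two lower-level settings the HAL description mentions: the load at which to switch up, and how many steps to take before reaching the maximum frequency. The sketch below is hypothetical throughout: the 1..100 range, the 95%/50% threshold endpoints, the linear interpolation and the names ScalingParams and params_for_performance are illustrative assumptions, not HAL's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingParams:
    up_threshold: float   # switch frequency up when utilization exceeds this
    steps_to_max: int     # P-state steps taken to reach maximum frequency


def params_for_performance(value: int, num_pstates: int) -> ScalingParams:
    """Hypothetical mapping of a 1..100 'performance' knob onto the
    settings the HAL description summarizes. Higher values switch up
    sooner (lower threshold) and in fewer, larger steps."""
    if not 1 <= value <= 100:
        raise ValueError("value must be in 1..100")
    if num_pstates < 2:
        raise ValueError("need at least two P-states")
    t = (value - 1) / 99                     # 0.0 = power saving, 1.0 = performance
    up_threshold = 0.95 - t * (0.95 - 0.50)  # interpolate between the endpoints
    max_steps = num_pstates - 1              # walk every P-state at the low end
    steps = max(1, round(max_steps - t * (max_steps - 1)))
    return ScalingParams(round(up_threshold, 3), steps)
```

With four P-states, a value of 1 yields a 95% threshold and three steps to the top, while a value of 100 yields a 50% threshold and a single jump to the maximum frequency.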

>
>>> Then, the application feedback can be used to tune the policy that 
>>> sets CPU frequency based on observed CPU utilization so as to 
>>> maximize the user utility. What do you think about this approach? Is 
>>> this something you would like to experiment with?
>>
>>
>> Sure. We should have a good start once I integrate the current 
>> Enhanced SpeedStep support into Nevada (before the end of the month). 
>> I think we'd want to decouple the CPU driver from the Solaris Power 
>> Management framework (initially, anyway) so that we could have finer 
>> control. 
>
> What kind of application feedback will be available in Nevada?
Unfortunately, very little. The implementation mirrors what was done for 
Solaris SPARC CPU power management. There is no direct feedback from the 
CPU driver itself (other than some kstat output). All application 
feedback would come via the Solaris Power Management framework ioctls 
(many of which are undocumented and not public). Note that I only
said we would have "a good start". There is much room for improvement.
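Since little direct feedback is available yet, the following is only a sketch of how application feedback could tune a type 1 policy, as David proposes: hill-climb the utilization threshold to maximize utility = performance - lam * power. The measure callback is a hypothetical stand-in for whatever feedback channel eventually exists (no Solaris interface is implied), the linear weighting by lam is one simple assumption about the shape of the user's utility curve, and tune_threshold and its parameters are illustrative names.

```python
from typing import Callable, Tuple

# Hypothetical feedback hook: threshold -> (performance, power in watts).
Measure = Callable[[float], Tuple[float, float]]


def tune_threshold(measure: Measure, lam: float, start: float = 0.8,
                   step: float = 0.05, rounds: int = 20,
                   lo: float = 0.5, hi: float = 0.95) -> float:
    """Hill-climb the utilization threshold to maximize
    utility = performance - lam * power, using measured feedback."""
    def utility(th: float) -> float:
        perf, watts = measure(th)
        return perf - lam * watts

    best = start
    best_u = utility(best)
    for _ in range(rounds):
        improved = False
        # Probe one step below and one step above the current best,
        # clamped to the allowed threshold range.
        for cand in (best - step, best + step):
            cand = min(hi, max(lo, cand))
            u = utility(cand)
            if u > best_u:
                best, best_u, improved = cand, u, True
        if not improved:
            step /= 2  # no neighbor is better: refine around the current best
    return best
```

A larger lam expresses a stronger preference for saving power, which pushes the tuned threshold toward running at lower frequencies for longer.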

>
>>
>>
>>> The second type of policy decides whether the load on several CPUs 
>>> should be "compacted" into fewer CPUs so as to create some idle CPUs 
>>> that can be kept running at the minimum frequency. This decision can 
>>> be made based on the current and recent utilization of the CPUs, 
>>> their run queue lengths, etc. The final choice among policies of
>>> this type should be based on the application feedback and on the
>>> specified performance vs. power utility curve, so that the policy
>>> that maximizes the resulting utility is chosen. Do you think this
>>> approach is also worth evaluating?
>>
>>
>> Yes, I do. You seem to be implying that these two approaches are 
>> mutually exclusive? Why wouldn't we want to do both? 
>
> We should do both. I just wanted to point out that a policy of type 1 
> does not require implementation of a policy of type 2.
Ok.
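As a sketch of the type 2 (compaction) policy, the consolidation step can be treated as bin packing. The code below uses the first-fit decreasing heuristic, a standard technique swapped in for illustration rather than anything proposed in this thread, and it considers only utilization, not run queue length. Loads are assumed to be fractions of one CPU, and the 75% per-CPU cap is a hypothetical headroom parameter.

```python
from typing import List


def compact(loads: List[float], cap: float = 0.75) -> List[List[int]]:
    """Pack loads onto as few CPUs as possible (first-fit decreasing).

    Each load is the fraction of one CPU a thread needs. Returns one list
    of load indices per CPU that stays busy; any remaining CPUs could be
    left idle at their minimum frequency.
    """
    if not 0.0 < cap <= 1.0:
        raise ValueError("cap must be in (0, 1]")
    bins: List[List[int]] = []   # load indices placed on each busy CPU
    totals: List[float] = []     # summed load on each busy CPU
    # Place the largest loads first; this is what makes first-fit "decreasing".
    for i in sorted(range(len(loads)), key=lambda k: loads[k], reverse=True):
        load = loads[i]
        for b, total in enumerate(totals):
            if total + load <= cap:
                bins[b].append(i)
                totals[b] += load
                break
        else:
            # No busy CPU has room (or the load alone exceeds cap):
            # bring another CPU into use.
            bins.append([i])
            totals.append(load)
    return bins
```

For instance, five threads with loads of 0.5, 0.2, 0.1, 0.4 and 0.02 fit on two CPUs under a 75% cap, so on a four-CPU system the other two could be held at their minimum frequency.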

Mark
