[tesla-dev] Adaptive Optimizations page update (ML)

David Vengerov Tue, 05 Jun 2007 13:43:07 -0700

Eric Saxe wrote:

> David Vengerov wrote:
>
>> Eric Saxe wrote:
>>
>>> If there are N or more running threads in an N-CPU system, then 
>>> utilization is at 100%. Generally, I'm thinking that the CPUs should 
>>> all be clocked up, if we want to maximize performance. There's not 
>>> much opportunity to squander power in this case. It's really only 
>>> the "partial utilization" scenario, where power is being directed at 
>>> the part of the system that isn't being used. 
>>
>> If you hold this view, then the policy I described previously that 
>> decides on the clock rate of idle CPUs based on how their number has 
>> fluctuated in the past should be very relevant and allows us to find 
>> the desired tradeoff between maximizing the system's performance 
>> (by never clocking down the idle CPUs) and minimizing the power 
>> consumption (by running idle CPUs at the lowest power and increasing 
>> their clock rate only when they receive some threads). The policy I 
>> am envisioning should be able to clock down CPUs to different extents 
>> based on estimated probabilities of some of them receiving threads in 
>> the near future.
>
>
> Where I think this is especially relevant is where there exists a 
> non-trivial amount of time to bring online additional resources 
> (latency). If ML techniques can help predict when utilization will 
> increase/decrease, then perhaps the latency associated with bringing 
> online/offline the additional capacity can be better hidden. In the 
> data center context, where the unit of power management may be 
> suspended / powered off systems, this could be significant.


Right. What latency do you expect in AMD and Intel systems? Do you think 
it will be more efficient for the power management algorithm to poll 
CPUs at regular time intervals enquiring about their workload, or would 
it be better for each CPU to notify the algorithm immediately whenever 
it becomes idle and whenever it switches from idle to working state? In 
the former case, the time interval between polls will result in an 
additional latency before the CPU power can be changed. In the latter 
case no additional latency will be experienced. However, there might 
still be a cost associated with changing the CPU clock frequency. Do you 
envision any such cost being present? If so, then we will want to 
minimize the number of CPU power changes. For example, if an 8-CPU 
system usually has 4 or 5 threads running on it but the number of 
threads occasionally increases to 8 for a brief time, then the algorithm 
might not want to clock up the last CPU fully, since it is expected to 
become idle very soon. Conversely, if the system usually has 8 threads 
running on it but this number can briefly drop to 4, then it might not 
be necessary to clock down the idle CPUs, since they will become busy 
again soon.
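
To make the two scenarios above concrete, here is a minimal sketch of 
one possible policy. It is only an illustration under stated 
assumptions, not a proposed implementation: the class name ClockPolicy, 
its methods, the sliding window, and the 80th-percentile rule are all 
hypothetical choices made for this example. The idea is to keep as many 
CPUs at full clock as were needed in most recent samples, so that a 
brief spike or dip in the thread count does not trigger a clock change.

```python
from collections import deque


class ClockPolicy:
    """Hypothetical sketch: choose how many CPUs to keep at full clock
    from a sliding window of recent running-thread counts."""

    def __init__(self, n_cpus, window=10, percentile=80):
        self.n_cpus = n_cpus
        self.percentile = percentile          # assumed tuning knob
        self.history = deque(maxlen=window)   # most recent samples only

    def observe(self, running_threads):
        # A CPU count above n_cpus cannot be served, so cap the sample.
        self.history.append(min(running_threads, self.n_cpus))

    def cpus_at_full_clock(self):
        if not self.history:
            return self.n_cpus                # no data: favor performance
        ordered = sorted(self.history)
        # Index of the chosen percentile, clamped to the last element.
        idx = min(len(ordered) - 1, self.percentile * len(ordered) // 100)
        return ordered[idx]


# Usually 4 or 5 threads, with one brief spike to 8.
spiky = ClockPolicy(n_cpus=8)
for count in [4, 5, 4, 5, 4, 5, 4, 5, 8, 4]:
    spiky.observe(count)
print(spiky.cpus_at_full_clock())   # 5: the spike is ignored

# Usually 8 threads, with one brief drop to 4.
busy = ClockPolicy(n_cpus=8)
for count in [8, 8, 8, 8, 4, 8, 8, 8, 8, 8]:
    busy.observe(count)
print(busy.cpus_at_full_clock())    # 8: the dip is ignored
```

A higher percentile favors performance, and a lower one favors power 
savings. Whether samples arrive by polling or by notification from each 
CPU only changes who calls observe().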

David
