[tesla-dev] Adaptive Optimizations page update (ML)

Mark Haywood Wed, 06 Jun 2007 21:43:12 -0400

Mark Haywood wrote:
> Dana H. Myers wrote:
>> Eric Saxe wrote:
>>> David Vengerov wrote:
>>>> Eric Saxe wrote:
>>>>
>>>>> David Vengerov wrote:
>>>>>
>>>>>> Eric Saxe wrote:
>>>>>>
>>>>>>> If there are N or more running threads in an N-CPU system, then 
>>>>>>> utilization is at 100%. Generally, I'm thinking that the CPUs 
>>>>>>> should all be clocked up, if we want to maximize performance. 
>>>>>>> There's not much opportunity to squander power in this case. It's 
>>>>>>> really only in the "partial utilization" scenario that power is 
>>>>>>> being directed at the part of the system that isn't being used. 
>>>>>>
>>>>>> If you hold this view, then the policy I described previously that 
>>>>>> decides on the clock rate of idle CPUs based on how their number 
>>>>>> has fluctuated in the past should be very relevant and allows us 
>>>>>> to find the desired tradeoff between maximizing the system's 
>>>>>> performance (by never clocking down the idle CPUs) and minimizing 
>>>>>> the power consumption (by running idle CPUs at the lowest power 
>>>>>> and increasing their clock rate only when they receive some 
>>>>>> threads). The policy I am envisioning should be able to clock down 
>>>>>> CPUs to different extents based on estimated probabilities of some 
>>>>>> of them receiving threads in the near future.
>>>>>
>>>>>
>>>>> Where I think this is especially relevant is where there exists a 
>>>>> non-trivial amount of time to bring online additional resources 
>>>>> (latency). If ML techniques can help predict when utilization will 
>>>>> increase/decrease, then perhaps the latency associated with 
>>>>> bringing online/offline the additional capacity can be better 
>>>>> hidden. In the data center context, where the unit of power 
>>>>> management may be suspended / powered off systems, this could be 
>>>>> significant. 
>>>>
>>>> Right. What latency do you expect in AMD and Intel systems? 
>>>
>>> For P-state (frequency/voltage scaling) of processors, I believe the 
>>> latencies involved are very low.
>> Doesn't it depend on the specific chip?  I believe Intel Enhanced SpeedStep
>> parts are pretty quick to change state, but Opteron changes are more involved.
>> Mark - any observations here?
> 
> Yes, it does depend upon the chip. Intel claims to have much less 
> latency than AMD with their newer chips, which is why they believe that 
> changing P-states multiple times a second is optimal. I don't know the 
> numbers off the top of my head though.


Actually, I lie. Off the top of my head I was pretty sure that Intel was 
claiming 10 milliseconds and I just googled to confirm it:

http://www.intel.com/cd/ids/developer/asmo-na/eng/195910.htm?page=4

I don't know about Opteron. I haven't measured it. I'm not sure it 
matters, as I don't believe we'll implement P-state support for Opteron. 
Greyhound though ...
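
For concreteness, here is a rough sketch of the kind of policy David describes
in the quoted text: clock idle CPUs down to different extents based on an
estimated probability that each will receive a thread soon. Everything in it is
an assumption for illustration, not anything implemented or proposed in this
thread: the class name IdleClockPolicy, the frequency table, the sampling
horizon, and the simple empirical estimate (the fraction of past windows in
which the busy-CPU count rose by at least k, not conditioned on the current
load). A real ML policy would presumably use a better predictor.

```python
from collections import deque


class IdleClockPolicy:
    """Hypothetical sketch: choose clock rates for idle CPUs from past demand."""

    def __init__(self, n_cpus, freqs_mhz, horizon=5, history=200):
        self.n_cpus = n_cpus
        self.freqs = sorted(freqs_mhz)        # assumed P-state table, low to high
        self.horizon = horizon                # look-ahead, in samples
        self.samples = deque(maxlen=history)  # recent busy-CPU counts

    def observe(self, busy):
        """Record one sample of how many CPUs are currently running threads."""
        self.samples.append(min(busy, self.n_cpus))

    def wake_probability(self, k):
        """Estimate P(busy count rises by at least k within `horizon` samples)."""
        s = list(self.samples)
        windows = len(s) - self.horizon
        if windows <= 0:
            return 1.0  # no history yet: favor performance, assume demand
        hits = 0
        for i in range(windows):
            peak = max(s[i + 1:i + 1 + self.horizon])
            if peak - s[i] >= k:
                hits += 1
        return hits / windows

    def idle_clock_plan(self, busy_now):
        """Return one clock rate per idle CPU, most-likely-needed CPU first."""
        n_idle = self.n_cpus - busy_now
        plan = []
        for k in range(1, n_idle + 1):
            p = self.wake_probability(k)  # k-th idle CPU is needed if demand rises by k
            level = min(int(p * len(self.freqs)), len(self.freqs) - 1)
            plan.append(self.freqs[level])
        return plan


policy = IdleClockPolicy(n_cpus=4, freqs_mhz=[1000, 2000, 3000], horizon=2)
for busy in [1, 2] * 4:           # load that alternates between 1 and 2 busy CPUs
    policy.observe(busy)
print(policy.idle_clock_plan(1))  # [2000, 1000, 1000]
```

With no history the sketch keeps every idle CPU at the top clock rate, which
matches the "never clock down" end of the tradeoff; as history accumulates,
CPUs that are unlikely to be needed drift toward the lowest rate. How far ahead
the policy has to look would depend on the transition latency discussed above.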
