Hi Aubrey,

The current governor has a bug:  it does not detect quickly
enough when a busy CPU goes idle.  The current algorithm
requires 5 idle cycles (cpupm_cs_sample_tunable) before it
re-evaluates which C-state to go to over the next period.
5 idle/busy cycles can be a very long time when the CPU becomes
really idle.  :-(    The CPU continues to burn power in C1 when
it could be going to a deeper C-state.  Your proposal fixes this.  :-)
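
To make the delay concrete, here is a minimal sketch of cycle-counted
sampling.  It is not the actual cpupm code:  CS_SAMPLE_CYCLES stands in
for cpupm_cs_sample_tunable, and cs_gov_t, cs_gov_cycle() and the 50%
idle threshold are names and values made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpupm_cs_sample_tunable. */
#define CS_SAMPLE_CYCLES 5

/* Hypothetical per-CPU governor state, for illustration only. */
typedef struct cs_gov {
    uint32_t cycles;   /* idle/busy cycles seen in the current sample */
    uint64_t idle_ns;  /* idle time accumulated in the current sample */
    uint64_t busy_ns;  /* busy time accumulated in the current sample */
    int      cstate;   /* C-state chosen at the last evaluation */
} cs_gov_t;

/*
 * Account for one completed idle/busy cycle and return the C-state to
 * use.  The choice is only re-evaluated every CS_SAMPLE_CYCLES cycles,
 * so a CPU that has just gone idle keeps the old (shallow) state until
 * that many idle periods have elapsed, however long each one is.
 */
int
cs_gov_cycle(cs_gov_t *g, uint64_t idle_ns, uint64_t busy_ns)
{
    g->idle_ns += idle_ns;
    g->busy_ns += busy_ns;

    if (++g->cycles >= CS_SAMPLE_CYCLES) {
        uint64_t total = g->idle_ns + g->busy_ns;

        /* Assumed policy: C3 when at least 50% idle, otherwise C1. */
        if (total != 0 && (g->idle_ns * 100) / total >= 50)
            g->cstate = 3;
        else
            g->cstate = 1;

        /* Start a new sample. */
        g->cycles = 0;
        g->idle_ns = 0;
        g->busy_ns = 0;
    }
    return (g->cstate);
}
```

Starting from a governor that last chose C1, with idle periods of about
one second each, the first four calls still return 1.  Only the fifth
call re-evaluates and returns 3, so in this sketch the CPU sits in C1
for roughly four seconds it could have spent in a deeper state.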

Are there any other tests we need to run for putback?

Regards,
Bill

 
On 05/25/09 20:32, Li, Aubrey wrote:
> Hi Bill,
> Bill.Holler wrote:
>
>   
>> Li, Aubrey wrote:
>>     
>>> Bill.Holler wrote:
>>>
>>>
>>>       
>>>> Hi Aubrey,
>>>>
>>>> Time based sample periods were initially investigated, but they
>>>> performed poorly with "ping pong" type workloads such as
>>>> producer consumer etc.  The problem was it took too long to
>>>> recognize a load change when the CPU had very short idle
>>>> and load periods.  The current idle-rate based sampling shows
>>>> very little to no regression on benchmarks such as libmicro.
>>>>
>>>> How does the proposed change look in libmicro?
>>>>
>>>> We may need to use a hybrid governor which looks at both
>>>> idle rate and a fixed sample period.
>>>>
>>>> Thank you,
>>>> Bill
>>>>
>>>>         
>>> The initial ladder governor we used should have good performance
>>> with "ping pong" type workloads but a poor perf/power tradeoff.
>>>
>>> If the sample period is too short, we can't filter out a transient
>>> flick, so an idle CPU is resident in C1 rather than C3 and, especially,
>>> the package
>>>
>>>       
>> Yes.  While tuning the "putback" c-state algorithm it was
>> noticed that sampling two or more consecutive idle periods
>> made a *huge* difference in reducing power without any
>> performance issues.  Sampling just one period did not detect
>> what Eric calls "transient busy".  The CPU seems busy
>> when sampled over one idle/busy period, but really it is not.
>>
>> The 100ms interval you propose may be long enough to
>> ensure the sample period spans multiple idle/busy cycles
>> when there is a transient busy flick.
>>
>>
>>     
>>> c-state residency is poor. And if the sample period is too long, we
>>> may have bad latency issues with "ping pong" workloads. So a good
>>> tradeoff is desired; the interval suggested in the patch is a good value
>>> for SPECpower. I'll send libmicro results to you next week.
>>>
>>> A hybrid governor may be better, depending on how we implement it :)
>>>
>>>       
>> If we need to we can add idle-rate sampling to more quickly
>> notice when a CPU becomes busy.  We are on holiday until Tuesday.  :-)
>>
>>     
>
> I'm afraid idle-rate sampling will capture a transient busy period and
> hold the CPU in C1 even though the system is not actually busy.
>
> Thanks,
> -Aubrey
>   

