Hi Aubrey,

The current governor has a bug: it does not detect quickly enough
when a busy CPU goes idle. The current algorithm requires 5 idle
cycles (cpupm_cs_sample_tunable) before it re-evaluates which C-state
to use over the next period. Five idle/busy cycles can be a very long
time once the CPU becomes truly idle. :-( The CPU continues to burn
power in C1 when it could be going to a deeper C-state.
Your proposal fixes this. :-)
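
As a rough illustration of the difference (a sketch only, not the
actual cpupm code): the struct, function names, and constant names
below are made up for this example. Only the count of 5 and the
100ms interval come from this thread. The first check re-evaluates
after a fixed number of idle entries, the second after a fixed
amount of time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: 5 mirrors cpupm_cs_sample_tunable, 100ms is the proposed interval. */
#define CS_SAMPLE_CYCLES 5
#define CS_SAMPLE_NSEC   100000000LL

/* Hypothetical per-CPU sampling state. */
typedef struct cs_sample {
	int64_t start;    /* time the current sample window began, in ns */
	int     idle_cnt; /* idle entries seen in the current window */
} cs_sample_t;

/* Count-based: called on each idle entry, due only on every 5th call. */
static bool
count_based_due(cs_sample_t *s)
{
	if (++s->idle_cnt < CS_SAMPLE_CYCLES)
		return (false);
	s->idle_cnt = 0;
	return (true);
}

/* Time-based: called on each idle entry, due once the window is 100ms old. */
static bool
time_based_due(cs_sample_t *s, int64_t now)
{
	if (now - s->start < CS_SAMPLE_NSEC)
		return (false);
	s->start = now;
	s->idle_cnt = 0;
	return (true);
}
```

If a mostly idle CPU only wakes once per second, the count-based
check is not due until the 5th idle entry, while the time-based
check is due on the first idle entry after the 100ms window expires.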

Are there any other tests we need to run for putback?

Regards,
Bill

On 05/25/09 20:32, Li, Aubrey wrote:
> Hi Bill,
>
> Bill.Holler wrote:
>
>> Li, Aubrey wrote:
>>
>>> Bill.Holler wrote:
>>>
>>>> Hi Aubrey,
>>>>
>>>> Time based sample periods were initially investigated, but they
>>>> performed poorly with "ping pong" type workloads such as
>>>> producer consumer etc. The problem was it took too long to
>>>> recognize a load change when the CPU had very short idle
>>>> and load periods. The current idle-rate based sampling shows
>>>> very little to no regression on benchmarks such as libmicro.
>>>>
>>>> How does the proposed change look in libmicro?
>>>>
>>>> We may need to use a hybrid governor which looks at both
>>>> idle rate and a fixed sample period.
>>>>
>>>> Thank you,
>>>> Bill
>>>>
>>> The initial ladder governor we used should have the good performance
>>> with "ping pong" type workload but poor perf/power tradeoff.
>>>
>>> If the sample period is too short, we can't avoid transient flick so
>>> that we have C1 residency when idle, not in C3, and especially, the
>>> package
>>>
>> Yes. While tuning the "putback" c-state algorithm it was
>> noticed that sampling two or more consecutive idle periods
>> made a *huge* difference in reducing power without any
>> performance issues. Sampling just one period did not detect
>> what Eric calls "transient busy". The CPU seems busy
>> when sampled over one idle/busy period, but really it is not.
>>
>> The 100ms interval you propose may be long enough to
>> ensure the sample period spans multiple idle/busy cycles
>> when there is a transient busy flick.
>>
>>> c-state residency is poor. And if the sample period is too long, we
>>> may have bad latency issue with "ping pong" workload. So a good
>>> tradeoff is desired, the suggested interval in patch is a good value
>>> for SPECpower. I'll send libmicro result to you next week.
>>>
>>> A hybrid governor may be better, depends on how we implement it, :)
>>>
>> If we need to we can add idle-rate sampling to more quickly
>> notice when a CPU becomes busy. We are on holiday until Tuesday. :-)
>>
>
> I'm afraid idle-rate sampling will capture transient busy and throttle CPU
> into C1 but actually the system is not.
>
> Thanks,
> -Aubrey
>
