Bill.Holler wrote:
> Hi Aubrey,
>
> The current governor has a problem (bug): it does not detect
> when a busy CPU goes idle fast enough.  The current algorithm
> requires 5 idle cycles (cpupm_cs_sample_tunable) before it
> re-evaluates which C-state to go to over the next period.
> 5 idle/busy cycles can be a very long time when the CPU becomes
> really idle.  :-(  The CPU continues to burn power in C1 when
> it could be going to a deeper C-state.  Your proposal fixes this.
> :-)
>
> Are there any other tests we need to run for putback?
>
> Regards,
> Bill
Do we need to run the regular PERF-PIT/ON-PIT?

Thanks,
-Aubrey

>
> On 05/25/09 20:32, Li, Aubrey wrote:
>> Hi Bill,
>> Bill.Holler wrote:
>>
>>> Li, Aubrey wrote:
>>>
>>>> Bill.Holler wrote:
>>>>
>>>>> Hi Aubrey,
>>>>>
>>>>> Time based sample periods were initially investigated, but they
>>>>> performed poorly with "ping pong" type workloads such as
>>>>> producer consumer etc.  The problem was it took too long to
>>>>> recognize a load change when the CPU had very short idle
>>>>> and load periods.  The current idle-rate based sampling shows
>>>>> very little to no regression on benchmarks such as libmicro.
>>>>>
>>>>> How does the proposed change look in libmicro?
>>>>>
>>>>> We may need to use a hybrid governor which looks at both
>>>>> idle rate and a fixed sample period.
>>>>>
>>>>> Thank you,
>>>>> Bill
>>>>>
>>>> The initial ladder governor we used should have the good
>>>> performance with "ping pong" type workload but poor perf/power
>>>> tradeoff.
>>>>
>>>> If the sample period is too short, we can't avoid transient flick
>>>> so that we have C1 residency when idle, not in C3, and especially,
>>>> the package
>>>>
>>> Yes.  While tuning the "putback" c-state algorithm it was
>>> noticed that sampling two or more consecutive idle periods
>>> made a *huge* difference in reducing power without any
>>> performance issues.  Sampling just one period did not detect
>>> what Eric calls "transient busy".  The CPU seems busy
>>> when sampled over one idle/busy period, but really it is not.
>>>
>>> The 100ms interval you propose may be long enough to
>>> ensure the sample period spans multiple idle/busy cycles
>>> when there is a transient busy flick.
>>>
>>>> c-state residency is poor.  And if the sample period is too long, we
>>>> may have bad latency issue with "ping pong" workload.  So a good
>>>> tradeoff is desired, the suggested interval in patch is a good
>>>> value for SPECpower.  I'll send libmicro result to you next week.
>>>>
>>>> A hybrid governor may be better, depends on how we implement it, :)
>>>>
>>> If we need to we can add idle-rate sampling to more quickly
>>> notice when a CPU becomes busy.  We are on holiday until Tuesday.
>>> :-)
>>>
>>
>> I'm afraid idle-rate sampling will capture transient busy and
>> throttle CPU into C1 but actually the system is not.
>>
>> Thanks,
>> -Aubrey
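
For readers following the thread, the tradeoff discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the governor code: the function names, the trace format, and the example cycle lengths are all hypothetical. Only the 5-cycle count (cpupm_cs_sample_tunable) and the 100ms interval come from the thread. It compares how long each sampling scheme waits before re-evaluating the C-state after a load change.

```python
# Toy model only: the names and numbers below are illustrative assumptions,
# not the actual cpupm governor code.
SAMPLE_CYCLES = 5              # stands in for cpupm_cs_sample_tunable
SAMPLE_INTERVAL_US = 100_000   # the 100ms interval proposed in the thread


def count_based_delay_us(cycles, sample_cycles=SAMPLE_CYCLES):
    """Microseconds until re-evaluation when the governor waits for
    `sample_cycles` complete idle/busy cycles.

    `cycles` is a list of (idle_us, busy_us) pairs seen after a load change.
    Returns None if the trace ends before enough cycles have completed.
    """
    if len(cycles) < sample_cycles:
        return None
    return sum(idle + busy for idle, busy in cycles[:sample_cycles])


def time_based_delay_us(cycles, interval_us=SAMPLE_INTERVAL_US):
    """Microseconds until re-evaluation when the governor re-evaluates at
    the first cycle boundary at or after `interval_us`.

    Returns None if the trace ends first.
    """
    elapsed = 0
    for idle, busy in cycles:
        elapsed += idle + busy
        if elapsed >= interval_us:
            return elapsed
    return None


# CPU has become mostly idle: 1 s idle, 1 ms busy per cycle (assumed values).
mostly_idle = [(1_000_000, 1_000)] * 10
# "Ping pong" workload: 50 us idle, 50 us busy per cycle (assumed values).
ping_pong = [(50, 50)] * 2_000

idle_count = count_based_delay_us(mostly_idle)
idle_time = time_based_delay_us(mostly_idle)
pp_count = count_based_delay_us(ping_pong)
pp_time = time_based_delay_us(ping_pong)
print(idle_count, idle_time, pp_count, pp_time)
```

In this model the cycle-count scheme reacts slowly once the CPU is really idle, because five long cycles must pass. The fixed interval reacts slowly to the ping-pong trace, because many short cycles fit inside one interval. That is the motivation for the hybrid governor mentioned above.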
