Hi Bill,

Bill.Holler wrote:
> Li, Aubrey wrote:
>> Bill.Holler wrote:
>>
>>> Hi Aubrey,
>>>
>>> Time based sample periods were initially investigated, but they
>>> performed poorly with "ping pong" type workloads such as
>>> producer consumer etc. The problem was it took too long to
>>> recognize a load change when the CPU had very short idle
>>> and load periods. The current idle-rate based sampling shows
>>> very little to no regression on benchmarks such as libmicro.
>>>
>>> How does the proposed change look in libmicro?
>>>
>>> We may need to use a hybrid governor which looks at both
>>> idle rate and a fixed sample period.
>>>
>>> Thank you,
>>> Bill
>>>
>>
>> The initial ladder governor we used should have the good performance
>> with "ping pong" type workload but poor perf/power tradeoff.
>>
>> If the sample period is too short, we can't avoid transient flick so
>> that we have C1 residency when idle, not in C3, and especially, the
>> package
>>
>
> Yes. While tuning the "putback" c-state algorithm it was
> noticed that sampling two or more consecutive idle periods
> made a *huge* difference in reducing power without any
> performance issues. Sampling just one period did not detect
> what Eric calls "transient busy". The CPU seems busy
> when sampled over one idle/busy period, but really it is not.
>
> The 100ms interval you propose may be long enough to
> ensure the sample period spans multiple idle/busy cycles
> when there is a transient busy flick.
>
>
>> c-state residency is poor. And if the sample period is too long, we
>> may have bad latency issue with "ping pong" workload. So a good
>> tradeoff is desired, the suggested interval in patch is a good value
>> for SPECpower. I'll send libmicro result to you next week.
>>
>> A hybrid governor may be better, depends on how we implement it, :)
>>
>
> If we need to we can add idle-rate sampling to more quickly
> notice when a CPU becomes busy. We are on holiday until Tuesday. :-)
>

I'm afraid idle-rate sampling will capture transient busy and throttle
the CPU into C1 even though the system is not actually busy.

Thanks,
-Aubrey
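
P.S. To make the "transient busy" concern concrete, here is a minimal Python sketch. It is not the putback algorithm or the patch under discussion. The names `idle_fraction` and `pick_cstate`, the 0.7 idle threshold, and the sample numbers are all made up for illustration. It only shows how a decision based on one idle/busy period can differ from one based on several consecutive periods.

```python
from typing import List, Tuple

# One idle/busy cycle as (idle_us, busy_us). Hypothetical representation.
Period = Tuple[int, int]


def idle_fraction(periods: List[Period]) -> float:
    """Fraction of total time spent idle across the given cycles."""
    idle = sum(p[0] for p in periods)
    total = sum(p[0] + p[1] for p in periods)
    return idle / total if total else 0.0


def pick_cstate(history: List[Period], window: int, min_idle: float = 0.7) -> str:
    """Pick C3 when the last `window` cycles were mostly idle, else C1.

    The 0.7 threshold is an assumed value, not taken from any governor.
    """
    recent = history[-window:]
    return "C3" if idle_fraction(recent) >= min_idle else "C1"


# Two long idle cycles followed by one short, mostly busy cycle
# (the "transient busy" flick).
history = [(9000, 1000), (9500, 500), (200, 800)]

one_period = pick_cstate(history, window=1)     # looks only at the flick
three_periods = pick_cstate(history, window=3)  # looks at the wider trend
```

With these made-up numbers the single-period sample sees a CPU that is 20% idle and stays in C1, while the three-period sample sees about 89% idle and allows C3. A longer window has the opposite cost, as noted above: it reacts more slowly when a "ping pong" workload really does become busy.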
