On 12/04/08 17:29, Li, Aubrey wrote:
> Bill Holler wrote:
>
>   
>> Hi Tesla Dev,
>>
>> I have been experimenting not going into C3 or C2 when
>> the number of non-idle CPUs in the cpu-partition exceeds
>> a threshold.  This is an attempt to regain the high load performance.
>> These numbers are on a 2-socket system.
>>
>> With C3 threshold = 40% active and C2 threshold = 60% active
>> the libmicro fork_1000 benchmark completes in 80/100 the time.
>>
>> With C3 threshold = 20% active and C2 threshold = 30% active
>> the libmicro fork_1000 benchmark completes in 50/100 the time.
>> These numbers are still about 80/100 slower than with c-states
>> totally disabled. 
>>
>>
>> I am also going to experiment with cpu idle/wakeup rate.
>> I suspect this may be more important for performance than
>> the number of active cpus.
>>
>>     
>
> How do you calculate the active percent?
> Are you using a polling mechanism?
> Looking forward to reviewing the code...
>
> Thanks,
> -Aubrey
>   

Each CPU's idle loop keeps a count of the number of times it was
entered in the last 10 milliseconds. If this count exceeds 10, the
CPU will not enter C3. It is very lightweight. :-) The count is almost
always 0 or 1 on an idle system. Periods longer than 10 milliseconds
hurt libmicro performance. Counts larger than 18 also hurt
performance.
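
Until the real diff is out, here is a rough sketch of the idea, not the
actual code from my test repository. The names (idle_stats_t,
idle_c3_allowed), the nanosecond timestamps, and the fixed-window reset
are all assumptions made for illustration; the real idle loop may track
the last 10 milliseconds differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the description above. */
#define IDLE_WINDOW_NS   (10ULL * 1000 * 1000)  /* 10 millisecond window */
#define IDLE_ENTRY_LIMIT 10                     /* no C3 above this count */

/* Hypothetical per-CPU bookkeeping; not a real kernel structure. */
typedef struct idle_stats {
	uint64_t window_start_ns;   /* when the current window began */
	uint32_t entries;           /* idle-loop entries in this window */
} idle_stats_t;

/*
 * Call once per idle-loop entry with the current time.
 * Returns true if this CPU may enter C3, false if it has been
 * going idle too often within the current window.
 */
bool
idle_c3_allowed(idle_stats_t *st, uint64_t now_ns)
{
	/* Assumption: start a fresh window once 10 ms have elapsed. */
	if (now_ns - st->window_start_ns >= IDLE_WINDOW_NS) {
		st->window_start_ns = now_ns;
		st->entries = 0;
	}

	st->entries++;

	return (st->entries <= IDLE_ENTRY_LIMIT);
}
```

The state is per-CPU and is only touched by that CPU's own idle loop,
so a scheme like this needs no locking or cross-CPU polling.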

My current test repository also has per-PG idle callbacks and
system-wide idle cpu count cstate-throttling. Neither of these
was nearly as useful. A diff will be available as soon as the
PG idle callback code has been removed.

Bill

