On 12/04/08 17:29, Li, Aubrey wrote:
> Bill Holler wrote:
>
>> Hi Tesla Dev,
>>
>> I have been experimenting not going into C3 or C2 when
>> the number of non-idle CPUs in the cpu-partition exceeds
>> a threshold. This is an attempt to regain the high load performance.
>> These numbers are on a 2-socket system.
>>
>> With C3 threshold = 40% active and C2 threshold = 60% active
>> the libmicro fork_1000 benchmark completes in 80/100 the time.
>>
>> With C3 threshold = 20% active and C2 threshold = 30% active
>> the libmicro fork_1000 benchmark completes in 50/100 the time.
>> These numbers are still about 80/100 slower than with c-states
>> totally disabled.
>>
>> I am also going to experiment with cpu idle/wakeup rate.
>> I suspect this may be more important for performance than
>> the number of active cpus.
>>
>
> How do you calculate the active percent?
> Are you using a polling mechanism?
> Looking forward to reviewing the code...
>
> Thanks,
> -Aubrey
>
Each CPU's idle loop keeps a count of the number of times it was
entered in the last 10 milliseconds. If this count exceeds 10, the
CPU will not enter C3. It is very lightweight. :-)
The count is almost always 0 or 1 on an idle system.

Periods longer than 10 milliseconds hurt libmicro performance.
Counts larger than 18 also hurt performance.

My current test repository also has per-PG idle callbacks and
system-wide idle CPU count C-state throttling. Neither of these
was nearly as useful. A diff will be available as soon as the
PG idle callback code has been removed.

Bill
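
Until the diff is posted, here is a rough sketch of the counting idea described above. It is not the actual patch: the structure, the function name, and the use of a fixed (rather than sliding) 10 millisecond window are all assumptions made for illustration. Only the 10 ms period and the limit of 10 entries come from the description.

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_WINDOW_NS   (10ULL * 1000 * 1000)  /* 10 ms period, from the text */
#define IDLE_ENTRY_LIMIT 10                     /* more than 10 entries: no C3 */

/* Hypothetical per-CPU bookkeeping; not the real kernel structure. */
typedef struct idle_stats {
	uint64_t window_start_ns;  /* when the current window began */
	uint32_t entries;          /* idle-loop entries seen in this window */
} idle_stats_t;

/*
 * Record one entry into the idle loop at time now_ns and report
 * whether C3 is still allowed. Assumption: the window is reset
 * wholesale once 10 ms have elapsed, which only approximates
 * "entries in the last 10 milliseconds".
 */
static bool
idle_enter_allows_c3(idle_stats_t *st, uint64_t now_ns)
{
	if (now_ns - st->window_start_ns >= IDLE_WINDOW_NS) {
		st->window_start_ns = now_ns;
		st->entries = 0;
	}
	st->entries++;
	return (st->entries <= IDLE_ENTRY_LIMIT);
}
```

The idle loop would call something like this once per entry, passing a timestamp from a high-resolution clock, and pick a shallower state when it returns false. The cost is one subtraction, one compare and one increment on CPU-local data, which is why the approach stays cheap.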
