Hi Tesla Dev,

I have been experimenting with not entering C3 or C2 when the number of non-idle CPUs in the cpu-partition exceeds a threshold. This is an attempt to regain high-load performance. The numbers below are from a 2-socket system.
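To make the policy concrete, here is a rough sketch in Python. It is not the actual kernel change, only an illustration of the decision. The function name, its parameters, and the choice to compare with a strict "exceeds" are assumptions made for the example, as is falling back to C1 when both thresholds are exceeded.

```python
def deepest_allowed_cstate(active_cpus, total_cpus,
                           c3_threshold_pct=40, c2_threshold_pct=60):
    """Return the deepest C-state (1, 2 or 3) an idling CPU may enter.

    Hypothetical helper: active_cpus is the number of non-idle CPUs in
    the partition, total_cpus is the partition size, and the thresholds
    are percentages of active CPUs.
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    if not 0 <= active_cpus <= total_cpus:
        raise ValueError("active_cpus must be between 0 and total_cpus")

    # Integer arithmetic avoids floating-point surprises at the boundary.
    scaled_active = active_cpus * 100

    # Assumption: above the C2 threshold only C1 is allowed.
    if scaled_active > c2_threshold_pct * total_cpus:
        return 1
    # Above the C3 threshold, C2 is the deepest state allowed.
    if scaled_active > c3_threshold_pct * total_cpus:
        return 2
    # Lightly loaded partition: C3 is allowed.
    return 3
```

For example, on a 16-CPU partition with the 40%/60% thresholds, 8 active CPUs (50%) would allow C2 but not C3, while the 20%/30% thresholds would restrict the same partition to C1.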
With C3 threshold = 40% active and C2 threshold = 60% active, the libmicro fork_1000 benchmark completes in 80% of the time. With C3 threshold = 20% active and C2 threshold = 30% active, it completes in 50% of the time. These numbers are still about 80% slower than with C-states totally disabled.

I am also going to experiment with the CPU idle/wakeup rate. I suspect this may be more important for performance than the number of active CPUs.

Regards,
Bill
