On 3/13/09 10:29 AM, "Scott Carey" <sc...@richrelevance.com> wrote:


-----------------
Now, with 0ms delay, no threading change:
Throughput is 136000/min @ 184 users, response time 13ms.  Response time has not 
jumped too drastically yet, but linear performance gains stopped at about 130 
users.  ProcArrayLock is busy, very busy.  CPU: 35% user, 11% system, 54% idle

With 0ms delay, and lock modification 2 (wake some, but not all)
Throughput is 161000/min @ 328 users, response time 28ms.  At 184 users (the 
same load as before the change), throughput is 147000/min with response time 
0.12ms.  Performance scales linearly to 144 users, then the gains slow, 
increasing only slightly with more concurrency after that.
The throughput increase is between 15% and 25%.


Forgot some data:  with the second test above, CPU: 48% user, 18% sys, 35% 
idle.   CPU utilization increased from 46% in the first test to 65%; the 
corresponding throughput increase was not as large, but that is expected on an 
8-threads-per-core server, since memory bandwidth and cache resources, at a 
minimum, are shared, and only trivial tasks can scale to 100%.

Based on the above, I would guess that attaining closer to 100% utilization 
(it's hard to get past 90% with that many cores no matter what) will probably 
give another 10 to 15% improvement at most, to maybe 180000/min throughput.

It's also rather interesting that the 2000-connection case with wait times gets 
170000/min throughput and beats the 328-users-with-0-delay result above.  I 
suspect the 'wake all' version is just faster.  I would love to see a 'wake all 
shared, leave exclusives at front of queue' version, since that would not allow 
lock starvation.
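As a rough illustration of one reading of that proposal (this is my own sketch, not PostgreSQL's actual lwlock code; the names and queue representation are invented for the example): on release, wake every shared waiter wherever it sits in the queue, but leave exclusive waiters in their original positions, so a steady stream of shared acquirers can never starve a waiting exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Waiter modes, mirroring shared vs. exclusive acquire requests.
 * Identifiers are illustrative, not PostgreSQL's. */
typedef enum { WAIT_SHARED, WAIT_EXCLUSIVE } WaitMode;

/*
 * "Wake all shared, leave exclusives at front of queue" policy:
 * mark wake[i] = true for every shared waiter; exclusive waiters
 * stay queued in order, so they keep advancing toward the front
 * and cannot be starved.  Returns the number of waiters woken.
 */
static size_t pick_waiters_to_wake(const WaitMode *queue, size_t n,
                                   bool *wake)
{
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        wake[i] = (queue[i] == WAIT_SHARED);
        if (wake[i])
            woken++;
    }
    return woken;
}
```

A real implementation would of course do this under the lock's wait-list spinlock and then signal each chosen waiter's semaphore; the sketch only shows the selection policy.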
