Well, it doesn't look like this patch caused any harm for the default case.
I'm only seeing about a 10% gain in throughput using two listener threads on a
16-core machine. Not earth-shattering, not bad.
There is a slight drop in throughput for a single listener thread compared to
the pre-patched code. It's around 1%, consistent enough not to be a
measurement error, but not really significant.
Eh. 10% was on a pretty lightly loaded test. Under heavy load the advantage is
only 1.2%. Hardly seems worth the trouble.
At least the advantage always outweighs the above-mentioned 1% loss. I.e.,
cancelling both effects out, we're still ahead overall.
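For anyone who wants to try this themselves, here's a minimal sketch of the
relevant configuration, assuming the patched build exposes the listener count
as a listener-threads keyword in slapd.conf (check your build's slapd.conf(5);
the directive name and default may differ):

    # slapd.conf fragment (sketch; directive name assumed, verify against
    # your build's slapd.conf(5))
    # number of connection-manager (listener) threads; default is 1
    listener-threads  2
    # size of the worker thread pool, independent of the listener count
    threads  16

The worker pool does the actual operation processing; the listener threads
only accept connections and dispatch events, so raising the count should only
matter once a single listener becomes the dispatch bottleneck.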
For anyone curious, the slamd reports from these test runs are available on
http://highlandsun.com/hyc/slamd/
Comparing the results, with a single listener thread there are several points
where it is obviously scaling poorly. With two listener threads, those weak
spots in the single-listener graphs are gone and everything runs smoothly up
to the peak load.
E.g. comparing single listener
http://highlandsun.com/hyc/slamd/squeeze/singlenew/jobs/optimizing_job_20100807225841-94461048.html
vs double listener
http://highlandsun.com/hyc/slamd/squeeze/double/jobs/optimizing_job_20100808012435-00746454.html
at 56 client threads, the double-listener slapd is 37.6% faster. Dunno why 56
clients is a magic number for the single listener; it jumps back up to a more
reasonable throughput at 64 client threads, where the double listener is only 11.7% faster.
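(In case the arithmetic isn't obvious, those percentages are just the relative
throughput difference between the two jobs at the same load point, roughly:

    % faster = ((double-listener ops/sec) - (single-listener ops/sec))
               / (single-listener ops/sec) * 100

so 37.6% means the double-listener run pushed about 1.376x the single-listener
throughput at 56 client threads.)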
When looking for a performance bottleneck in a system, it always helps to
search in the right component.......
Tossing out the 4 old load generator machines and replacing them with two
8-core servers (and using slamd 2.0.1 instead of 2.0.0) paints quite a
different picture.
http://highlandsun.com/hyc/slamd/squeeze/doublenew/jobs/optimizing_job_20100809225241-75794623.html
With the old client machines the latency went up to the 2-3 msec range at peak
load; with the new machines it stays under 0.9 msec. So basically the slowdowns
were due to the load generators getting overloaded, not any part of slapd
getting overloaded.
The shape of the graph still looks odd with this kernel. (The column for 3
threads per client is out of whack.) But the results are so consistent I don't
think there's any measurement error to blame.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/