On Wed, Aug 24, 2016 at 11:54 AM, Heikki Linnakangas <[email protected]> wrote:
> On 08/23/2016 06:18 PM, Heikki Linnakangas wrote:
>> On 08/22/2016 08:38 PM, Andres Freund wrote:
>>> On 2016-08-22 20:32:42 +0300, Heikki Linnakangas wrote:
>>>> I remember seeing ProcArrayLock contention very visible earlier, but I
>>>> can't hit that now. I suspect you'd still see contention on bigger
>>>> hardware, though; my laptop has only 4 cores. I'll have to find a real
>>>> server for the next round of testing.
>>>
>>> Yea, I think that's true. I can just about see ProcArrayLock contention
>>> on my more powerful laptop; to see it really bad you need bigger
>>> hardware / higher concurrency.
>>
>> As soon as I sent my previous post, Vladimir Borodin kindly offered
>> access to a 32-core server for performance testing. Thanks Vladimir!
>>
>> I installed Greg Smith's pgbench-tools kit on that server, and ran some
>> tests. I'm seeing some benefit on the "pgbench -N" workload, but only
>> after modifying the test script to use "-M prepared", and using Unix
>> domain sockets instead of TCP to connect. Apparently those things add
>> enough overhead to mask out the little difference.
>>
>> Attached is a graph with the results. Full results are available at
>> https://hlinnaka.iki.fi/temp/csn-4-results/. In short, the patch
>> improved throughput, measured in TPS, with >= 32 or so clients. The
>> biggest difference was with 44 clients, which saw about 5% improvement.
>>
>> So, not phenomenal, but it's something. I suspect that with more cores,
>> the difference would become more clear.
>>
>> Like on cue, Alexander Korotkov just offered access to a 72-core
>> system :-). Thanks! I'll run the same tests on that.
>
> And here are the results on the 72-core machine (thanks again,
> Alexander!). The test setup was the same as on the 32-core machine,
> except that I ran it with more clients since the system has more CPU
> cores. In summary, in the best case the patch increases throughput by
> about 10%. That peak is with 64 clients. Interestingly, as the number of
> clients increases further, the gain evaporates, and the CSN version
> actually performs worse than unpatched master. I don't know why that is.
> One theory is that by eliminating one bottleneck, we're now hitting
> another bottleneck which doesn't degrade as gracefully when there's
> contention.

Did you try to identify this second bottleneck with perf or something similar?

It would also be nice to run pgbench -S. And it would be worth checking
something like 10% writes / 90% reads, which is quite a typical workload in
real life, I believe.
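For example, something along these lines (a rough sketch, assuming the
pgbench that ships with 9.6/master, where built-in scripts can be mixed
with weights; the socket directory, database name, run length, and client
count below are only placeholders, not the exact setup you used):

# ~90% reads / ~10% writes, prepared statements, Unix domain socket
pgbench -h /tmp -M prepared -c 64 -j 64 -T 300 \
    -b select-only@9 -b simple-update@1 pgbench

Since the built-in "select-only" script is the same transaction as -S and
"simple-update" is the same as -N, this just mixes the read and write
workloads you already ran at a 9:1 ratio.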
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company