On Wed, Nov 16, 2011 at 9:33 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Tue, Nov 15, 2011 at 6:24 AM, Simon Riggs <si...@2ndquadrant.com> wrote: >> Patch adds cacheline padding around parts of XLogCtl. >> >> Idea from way back, but never got performance tested in a situation >> where WALInsertLock was the bottleneck. >> >> So this is speculative, in need of testing to investigate value. > > I kicked off a round of benchmarking around this. I'll post results > in a few hours when the run finishes.
I thought that the best way to see any possible benefit of this patch would be to use permanent tables (since unlogged tables won't generated enough XLogInsert traffic to throttle the system) and lots of clients. So I ran tests at 32 clients and at 80 clients. There appeared to be a very small speedup. resultswp.master.32:tps = 10275.238580 (including connections establishing) resultswp.master.32:tps = 10231.634195 (including connections establishing) resultswp.master.32:tps = 10220.907852 (including connections establishing) resultswp.master.32:tps = 10115.534399 (including connections establishing) resultswp.master.32:tps = 10048.268430 (including connections establishing) resultswp.xloginsert-padding.32:tps = 10310.046371 (including connections establishing) resultswp.xloginsert-padding.32:tps = 10269.839056 (including connections establishing) resultswp.xloginsert-padding.32:tps = 10259.268030 (including connections establishing) resultswp.xloginsert-padding.32:tps = 10242.861923 (including connections establishing) resultswp.xloginsert-padding.32:tps = 10167.947496 (including connections establishing) Taking the median of those five results, the patch seems to have sped things up by 0.3%. At 80 clients: resultswp.master.80:tps = 7540.388586 (including connections establishing) resultswp.master.80:tps = 7502.803369 (including connections establishing) resultswp.master.80:tps = 7469.338925 (including connections establishing) resultswp.master.80:tps = 7505.377256 (including connections establishing) resultswp.master.80:tps = 7509.402704 (including connections establishing) resultswp.xloginsert-padding.80:tps = 7541.305973 (including connections establishing) resultswp.xloginsert-padding.80:tps = 7568.942936 (including connections establishing) resultswp.xloginsert-padding.80:tps = 7533.128618 (including connections establishing) resultswp.xloginsert-padding.80:tps = 7558.258839 (including connections establishing) resultswp.xloginsert-padding.80:tps = 7562.585635 (including connections establishing) Again taking the median of the five results, that's about an 0.7% speedup. I also tried this over top of my flexlock patch. I figured that the benefit ought to be greater there, because with ProcArrayLock contention reduced, WALInsertLock contention should be the whole banana. But: resultswp.flexlock.32:tps = 11628.279679 (including connections establishing) resultswp.flexlock.32:tps = 11556.254524 (including connections establishing) resultswp.flexlock.32:tps = 11606.840931 (including connections establishing) resultswp.flexlock-xloginsert-padding.32:tps = 11357.904459 (including connections establishing) resultswp.flexlock-xloginsert-padding.32:tps = 11226.538104 (including connections establishing) resultswp.flexlock-xloginsert-padding.32:tps = 11187.932415 (including connections establishing) That's about a 3.3% slowdown. resultswp.flexlock.80:tps = 11560.508772 (including connections establishing) resultswp.flexlock.80:tps = 11673.255674 (including connections establishing) resultswp.flexlock.80:tps = 11597.229252 (including connections establishing) resultswp.flexlock-xloginsert-padding.80:tps = 11092.549726 (including connections establishing) resultswp.flexlock-xloginsert-padding.80:tps = 11179.927207 (including connections establishing) resultswp.flexlock-xloginsert-padding.80:tps = 11112.771332 (including connections establishing) That's even a little worse. One possible explanation is that I don't believe that what you've done here is actually sufficient to guarantee alignment on cache-line boundary - ShmemAlloc() only guarantees MAXALIGN alignment unless the allocation is at least BLCKSZ bytes. So depending on exactly how much shared memory gets allocated before XLogCtlData gets allocated, it's possible that the change could actually end up splitting all of the things you're trying to put on their own cache line across two cache lines. Might be worth fixing that and running this through its paces again. (I wonder if we shouldn't just align every shared memory allocation to 64 or 128 bytes. It wouldn't waste much memory and it would make us much more resistant to performance changes caused by minor modifications to the shared memory layout.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers