On Wed, Nov 16, 2011 at 9:33 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Nov 15, 2011 at 6:24 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Patch adds cacheline padding around parts of XLogCtl.
>> Idea from way back, but never got performance tested in a situation
>> where WALInsertLock was the bottleneck.
>> So this is speculative, in need of testing to investigate value.
> I kicked off a round of benchmarking around this.  I'll post results
> in a few hours when the run finishes.

I thought that the best way to see any possible benefit of this patch
would be to use permanent tables (since unlogged tables won't
generate enough XLogInsert traffic to throttle the system) and lots
of clients.  So I ran tests at 32 clients and at 80 clients.  There
appeared to be a very small speedup.

resultswp.master.32:tps = 10275.238580 (including connections establishing)
resultswp.master.32:tps = 10231.634195 (including connections establishing)
resultswp.master.32:tps = 10220.907852 (including connections establishing)
resultswp.master.32:tps = 10115.534399 (including connections establishing)
resultswp.master.32:tps = 10048.268430 (including connections establishing)
resultswp.xloginsert-padding.32:tps = 10310.046371 (including connections establishing)
resultswp.xloginsert-padding.32:tps = 10269.839056 (including connections establishing)
resultswp.xloginsert-padding.32:tps = 10259.268030 (including connections establishing)
resultswp.xloginsert-padding.32:tps = 10242.861923 (including connections establishing)
resultswp.xloginsert-padding.32:tps = 10167.947496 (including connections establishing)

Taking the median of those five results, the patch seems to have sped
things up by 0.3%.  At 80 clients:

resultswp.master.80:tps = 7540.388586 (including connections establishing)
resultswp.master.80:tps = 7502.803369 (including connections establishing)
resultswp.master.80:tps = 7469.338925 (including connections establishing)
resultswp.master.80:tps = 7505.377256 (including connections establishing)
resultswp.master.80:tps = 7509.402704 (including connections establishing)
resultswp.xloginsert-padding.80:tps = 7541.305973 (including connections establishing)
resultswp.xloginsert-padding.80:tps = 7568.942936 (including connections establishing)
resultswp.xloginsert-padding.80:tps = 7533.128618 (including connections establishing)
resultswp.xloginsert-padding.80:tps = 7558.258839 (including connections establishing)
resultswp.xloginsert-padding.80:tps = 7562.585635 (including connections establishing)

Again taking the median of the five results, that's about a 0.7% speedup.

I also tried this over top of my flexlock patch.  I figured that the
benefit ought to be greater there, because with ProcArrayLock
contention reduced, WALInsertLock contention should be the whole
banana.  But:

resultswp.flexlock.32:tps = 11628.279679 (including connections establishing)
resultswp.flexlock.32:tps = 11556.254524 (including connections establishing)
resultswp.flexlock.32:tps = 11606.840931 (including connections establishing)
resultswp.flexlock-xloginsert-padding.32:tps = 11357.904459 (including connections establishing)
resultswp.flexlock-xloginsert-padding.32:tps = 11226.538104 (including connections establishing)
resultswp.flexlock-xloginsert-padding.32:tps = 11187.932415 (including connections establishing)

That's about a 3.3% slowdown.

resultswp.flexlock.80:tps = 11560.508772 (including connections establishing)
resultswp.flexlock.80:tps = 11673.255674 (including connections establishing)
resultswp.flexlock.80:tps = 11597.229252 (including connections establishing)
resultswp.flexlock-xloginsert-padding.80:tps = 11092.549726 (including connections establishing)
resultswp.flexlock-xloginsert-padding.80:tps = 11179.927207 (including connections establishing)
resultswp.flexlock-xloginsert-padding.80:tps = 11112.771332 (including connections establishing)

That's even a little worse.

One possible explanation is that I don't believe that what you've done
here is actually sufficient to guarantee alignment on a cache-line
boundary - ShmemAlloc() only guarantees MAXALIGN alignment unless the
allocation is at least BLCKSZ bytes.  So depending on exactly how much
shared memory gets allocated before XLogCtlData gets allocated, it's
possible that the change could actually end up splitting all of the
things you're trying to put on their own cache lines across two cache
lines.  Might be worth fixing that and running this through its paces
again.

(I wonder if we shouldn't just align every shared memory allocation to
64 or 128 bytes.  It wouldn't waste much memory and it would make us
much more resistant to performance changes caused by minor
modifications to the shared memory layout.)
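That could look something like the following sketch (illustrative only
- the toy allocator and the CACHELINEALIGN macro are assumptions, not
the actual ShmemAlloc() code): round every allocation offset up to a
line boundary, the same way MAXALIGN() rounds up to 8 today.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64

/* Round LEN up to the next multiple of the cache line size,
 * analogous to the existing MAXALIGN() macro. */
#define CACHELINEALIGN(LEN) \
	(((uintptr_t) (LEN) + (CACHE_LINE_SIZE - 1)) & \
	 ~((uintptr_t) (CACHE_LINE_SIZE - 1)))

/* Toy bump allocator standing in for the shared-memory segment. */
static _Alignas(CACHE_LINE_SIZE) char segment[1024 * 1024];
static size_t freeoffset = 0;

static void *
shmem_alloc_aligned(size_t size)
{
	size_t		start = CACHELINEALIGN(freeoffset);

	if (start + size > sizeof(segment))
		return NULL;			/* out of shared memory */
	freeoffset = start + size;
	return segment + start;
}
```

Every allocation then begins on its own 64-byte boundary; the worst
case wastes CACHE_LINE_SIZE - 1 bytes per allocation, which is noise
next to the major shared-memory consumers.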

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
