wrote on 06/07/2017 04:06:57 PM:

> > 
> > Did you intend to attach a patch?
> Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...
I meant Friday.

> > > Sokolov Yura has a patch which, to me, looks good for pgbench rw
> > > performance.  Does not do so well with hammerdb (about the same 
> as base) on
> > > single socket and two socket.
> > 
> > Any idea why?  I think we will have to understand *why* certain things
> > help in some situations and not others, not just *that* they do, in
> > order to come up with a good solution to this problem.
Looking at the data now -- the LWLockAcquire philosophy is different. At 
first glance I would have guessed "about the same" as the base, but I 
cannot yet explain why we get super pgbench rw performance and "the same" 
hammerdb performance.
(Data taken from perf cycles when I invoked the performance data gathering 
script, generally in the middle of the run.)
In the hammerdb two-socket run, the ProcArrayLock is the bottleneck in 
LWLockAcquire (about 75% of the calls to LWLockAcquire come from 
GetSnapshotData). With Sokolov's patch, LWLockAcquire (with 
LWLockAttemptLock included) is a little over 9%; pgbench, on the other 
hand, has LWLockAcquire at 1.3%, with GetSnapshotData accounting for only 
11% of the calls to LWLockAcquire.
What I think that means is that there is no ProcArrayLock bottleneck in 
pgbench. GetSnapshotData walks the entire proc chain of PGXACTs, so the 
lock is held a rather long time. My guess is that the other locks are held 
for a much shorter time; Sokolov's patch handles those other locks better 
because of the spinning. We see much more time in LWLockAcquire with 
hammerdb because of the spinning -- with the ProcArrayLock, spinning does 
not help much because of the longer hold time.
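The tradeoff above can be put as a back-of-envelope rule (a sketch only -- the function name and the numbers in the note below are illustrative assumptions, not measurements from these runs):

```c
#include <stdbool.h>

/* Hedged sketch, not PostgreSQL code: spinning on a contended lock
 * pays off only when the expected remaining hold time is shorter
 * than the cost of queueing, sleeping, and being woken back up. */
static bool spinning_pays(double expected_hold_ns, double sleep_wake_ns)
{
    return expected_hold_ns < sleep_wake_ns;
}
```

With illustrative numbers, a short hold (say 200 ns against a 5 us sleep/wake round trip) favors spinning, while a long GetSnapshotData-style ProcArrayLock hold (tens of us) favors blocking right away -- which matches the hammerdb vs. pgbench profiles above.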

The spin count is relatively high (100/2), so I made it much smaller 
(20/2), in the hope that the spin would still cover the short-hold-time 
locks but not be a bother with long hold times.
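To make the mechanism being tuned concrete, here is a generic bounded-spin (test-and-test-and-set) acquire sketch -- all names are hypothetical and this is not the patch's code; SPIN_LIMIT plays the role of the spin count lowered from 100 to 20 above:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define SPIN_LIMIT 20            /* lowered from 100, per the experiment */

typedef struct { atomic_bool locked; } SpinLWLock;  /* illustrative type */

static bool try_lock(SpinLWLock *lk)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&lk->locked, &expected, true);
}

/* Spin briefly so short-hold-time locks are picked up cheaply; on a
 * long hold (e.g. a ProcArrayLock-style lock held while scanning all
 * PGXACTs) the spin budget runs out quickly and we yield instead of
 * burning cycles. */
static void acquire(SpinLWLock *lk)
{
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (!atomic_load(&lk->locked) && try_lock(lk))
                return;
            /* a cpu_relax/pause hint would go here in real code */
        }
        sched_yield();           /* stand-in for queueing on a semaphore */
    }
}

static void release(SpinLWLock *lk)
{
    atomic_store(&lk->locked, false);
}
```

Shrinking SPIN_LIMIT trades a little latency on short-hold locks for less wasted spinning on long-hold ones, which is exactly the balance the 100/2 vs. 20/2 experiment probes.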

Running pgbench with 96 users, throughput was slightly lower at 70K tps vs 
75K tps (vs a base of 40K tps at 96 threads and a peak of 58K tps at 64 
threads); hammerdb two-socket was slightly better (about 3%) than the peak 
base.

What all this tells me is that LWLockAcquire would (probably) benefit from 
some spinning.
> > 
> > -- 
> > Robert Haas
> > EnterpriseDB:
> > The Enterprise PostgreSQL Company
> > 
