Hi Sokolov --

I tried your patch. I only had time to run a few data points on power8.
pgbench rw on two sockets is awesome! It keeps gaining throughput as
threads are added -- in contrast to base and my prototype. I did not run
single-socket pgbench.

Hammerdb on 1 socket was in the same ballpark as the base, but slightly
lower. 2 socket was also in the same ballpark as the base, again slightly
lower. I did not do a series of points (just one at the previous "sweet
spot"), so the "final" results may be better. The multi-part ProcArrayLock
prototype was lower except in the two-socket case. The performance data I
collected for your patch on hammerdb showed the same sort of issues as
the base.

I don't see much point in combining the two because of the ProcArrayLock
downside -- that is, poor single-socket performance. Unless we could
come up with some heuristic to use one part on light loads and two on
heavy (and still stay correct), I don't see it ... With the
combination, what I think we would see is awesome pgbench rw, awesome
hammerdb 2 socket performance, and degraded single socket hammerdb.


From:   Sokolov Yura <y.soko...@postgrespro.ru>
To:     Jim Van Fleet <vanfl...@us.ibm.com>
Cc:     pgsql-hackers@postgresql.org
Date:   06/05/2017 03:28 PM
Subject:        Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into 
multiple parts
Sent by:        pgsql-hackers-ow...@postgresql.org

Excuse me, Jim.

I was tired and misunderstood the proposal: I thought it was about
ProcArray sharding, but the proposal is about ProcArrayLock sharding.

BTW, I just posted an improvement to LWLock:


Would you mind testing against that, and together with that?

On June 5, 2017, at 11:11 PM, Sokolov Yura
<y.soko...@postgrespro.ru> wrote:
Hi, Jim.

How do you ensure transaction order?

- You lock shard A and gather info. You find transaction T1 in progress.
- Then you unlock shard A.
- T1 completes. T2, which depends on T1, also completes. But T2 was on
shard B.
- You lock shard B and gather info from it.
- You didn't see T2 as in progress, so you will then look it up in clog
and find it committed.

Now you see T2 as committed, but T1 as in progress - a clear violation of
transaction order.
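
The interleaving above can be replayed deterministically. The following is
only a toy sketch, not PostgreSQL code: the shard layout, the set standing
in for clog, and every function name are hypothetical, and the "locks" are
implied by the order of the statements rather than taken for real.

```python
def interleaved_snapshot():
    # Two hypothetical shards of the proc array, each holding the XIDs
    # that are currently in progress on that shard.
    shards = {"A": {"T1"}, "B": {"T2"}}
    clog = set()  # stand-in for clog: XIDs recorded as committed

    def commit(xid, shard):
        clog.add(xid)               # record the commit
        shards[shard].discard(xid)  # then leave the proc array

    def status(xid, in_progress):
        if xid in in_progress:
            return "in-progress"
        return "committed" if xid in clog else "unknown"

    in_progress = set()
    in_progress |= shards["A"]  # lock shard A, gather info, unlock shard A
    commit("T1", "A")           # T1 completes
    commit("T2", "B")           # T2, which depends on T1, completes
    in_progress |= shards["B"]  # lock shard B, gather info, unlock shard B

    return {xid: status(xid, in_progress) for xid in ("T1", "T2")}


seen = interleaved_snapshot()
```

In this replay the snapshot reports T1 as in progress and T2 as committed,
which is the ordering problem described above. Holding both shards for the
whole scan would avoid it in this toy model, at the cost of the extra
locking the split is meant to remove.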

Probably you've already solved this issue. If so it would be great to 
learn the solution.

On June 5, 2017, at 10:30 PM, Jim Van Fleet <vanfl...@us.ibm.com> wrote:

I have been experimenting with splitting the ProcArrayLock into parts.
That is, to acquire the ProcArrayLock in shared mode, it is only
necessary to acquire one of the parts in shared mode; to acquire the lock
in exclusive mode, all of the parts must be acquired in exclusive mode.
For those interested, I have attached a design description of the change.
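
As a rough illustration of the acquire rules just described (not the
attached design and not PostgreSQL's LWLock code), here is a minimal sketch.
The RWLock helper, the PartitionedLock class, and the choice of part by
backend id are all assumptions made for the example.

```python
import threading


class RWLock:
    """Tiny reader-writer lock for illustration; no fairness guarantees."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class PartitionedLock:
    """Shared mode takes one part; exclusive mode takes every part."""

    def __init__(self, nparts):
        self.parts = [RWLock() for _ in range(nparts)]

    def acquire_shared(self, backend_id):
        # Hypothetical mapping from a backend to its part.
        part = backend_id % len(self.parts)
        self.parts[part].acquire_shared()
        return part

    def release_shared(self, part):
        self.parts[part].release_shared()

    def acquire_exclusive(self):
        # Take the parts in a fixed order so that two exclusive
        # acquirers cannot deadlock against each other.
        for p in self.parts:
            p.acquire_exclusive()

    def release_exclusive(self):
        for p in reversed(self.parts):
            p.release_exclusive()
```

Shared acquirers on different parts never touch the same lock word, while
an exclusive acquirer pays for one acquisition per part, so the cost of
exclusive mode grows with the number of parts.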

This approach has been quite successful on large systems with the hammerdb
benchmark. With a prototype based on the 10 master source and running on
power8 (model 8335-GCA with 2 sockets, 20 cores), hammerdb improved by 16%;
on Intel (Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 2 sockets, 44 cores)
with a 9.6 base and prototype, hammerdb improved by 4%. (Attached is a set
of spreadsheets for power8.)

The downside is that on smaller configurations (single socket), where
there is less "lock thrashing" in the storage subsystem and there are
multiple LWLocks to take for an exclusive acquire, there is a decided
downturn in performance. On hammerdb, the prototype was 6% worse than the
base on a single-socket power configuration.

If there is interest in this approach, I will submit a patch.

Jim Van Fleet

