Hi everybody!

During the FOSDEM/PGDay 2017 developer meeting I said that I have a special assembly optimization for multicore Power machines. From the answers of other hackers I realized the following.
1. There are some big Power machines with PostgreSQL in production use. Not as many as Intel, but there are some.
2. The community could be interested in a special assembly optimization for Power machines despite the cost of maintaining it.

Power processors use a specific implementation of atomic operations. This implementation is a kind of optimistic locking. The 'lwarx' instruction 'reserves index', but that reservation can be broken by the time 'stwcx' executes, and then we have to retry. So, for instance, a CAS operation on a Power processor is a loop, and a loop of CAS operations is a two-level nested loop. Benchmarks showed that this becomes a real problem for LWLockAttemptLock(). However, one can actually put arbitrary logic between 'lwarx' and 'stwcx' and make it a single loop. The downside is that this logic has to be implemented in assembly. See [1] for experiment details.

The results in [1] contain a lot of junk which isn't relevant anymore. This is why I drew a separate graph.

power8-lwlock-asm-ro.png: results of a read-only pgbench test on an IBM E880, which has 32 physical cores and 256 virtual threads via SMT. The curves have the following meaning.

* 9.5: unpatched PostgreSQL 9.5
* pinunpin-cas: PostgreSQL 9.5 + earlier version of 48354581
* pinunpin-lwlock-asm: PostgreSQL 9.5 + earlier version of 48354581 + LWLock implementation in assembly.

lwlock-power-1.patch is the patch with the assembly implementation of LWLock which I used at that time, rebased to current master.

Using assembly in lwlock.c looks rough. This is why I refactored it by introducing a new atomic operation, pg_atomic_fetch_mask_add_u32 (see lwlock-power-2.patch). It checks that all masked bits are clear and then adds to the variable. This atomic has a special assembly implementation for Power, and a generic implementation with a loop of CAS for other platforms. We would probably have other implementations for other architectures in the future. This level of abstraction is the best I managed to invent.
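To make the semantics of the generic (non-Power) variant concrete, here is a minimal sketch of "check that all masked bits are clear, then add" written as a loop of CAS. It is not the code from lwlock-power-2.patch: the function name fetch_mask_add_u32, its argument order, the return-old-value convention, and the use of C11 <stdatomic.h> instead of PostgreSQL's own atomics API are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the proposed pg_atomic_fetch_mask_add_u32.
 * If none of the bits in "mask" are set in *ptr, atomically add
 * "increment" to *ptr. In both cases return the value observed before
 * any modification, so the caller can test (result & mask) to learn
 * whether the add took place.
 */
static inline uint32_t
fetch_mask_add_u32(_Atomic uint32_t *ptr, uint32_t mask, uint32_t increment)
{
    uint32_t old = atomic_load(ptr);

    /* Retry only while the masked bits stay clear. */
    while ((old & mask) == 0)
    {
        /*
         * On failure, "old" is refreshed with the current value and the
         * mask check is repeated. On Power this CAS is itself a
         * lwarx/stwcx retry loop, which gives the nested loop
         * described above.
         */
        if (atomic_compare_exchange_weak(ptr, &old, old + increment))
            break;
    }
    return old;
}
```

A caller would treat a result with any masked bit set as "not acquired", and a result with all masked bits clear as a successful add.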
Unfortunately, I don't have a big enough Power machine at hand to reproduce those results. Actually, I have no Power machine at hand at all. So, lwlock-power-2.patch was written "blindly". I would very much appreciate it if someone could help me with testing and benchmarking.

1. https://www.postgresql.org/message-id/CAPpHfdsogj38HTDhNMLE56uJy9N8- =gya2nnuwbpujgp2n1...@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company