On Fri, Oct 7, 2016 at 3:38 AM, Robert Haas <robertmh...@gmail.com> wrote:
> I decided to do some testing on hydra (IBM-provided community
> resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
> the newly-enhanced wait event stuff to try to get an idea of what
> we're waiting for during pgbench. I did 30-minute pgbench runs with
> various configurations, but all had max_connections = 200,
> shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
> off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
> log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
> on. During each run, I ran this psql script in another window and
> captured the output:
Nice. Thanks for sharing.

> \t
> select wait_event_type, wait_event from pg_stat_activity where pid !=
> pg_backend_pid()
> \watch 0.5
>
> Then, I used a little shell-scripting to count up the number of times
> each wait event occurred in the output.

Or an INSERT ... SELECT into an unlogged table? No need for extra math
then. (A sketch of what I mean is at the bottom of this email.)

> First, I tried scale factor
> 3000 with 32 clients and got these results:
>
>      1 LWLockTranche | buffer_mapping
> [...]
>  21014 LWLockNamed | WALWriteLock
>  28497 |
>  58279 LWLockTranche | wal_insert
>
> tps = 1150.803133 (including connections establishing)
>
> What jumps out here, at least to me, is that there is furious
> contention on both the wal_insert locks and on WALWriteLock.
> Apparently, the system simply can't get WAL on disk fast enough to
> keep up with this workload. Relation extension locks and
> buffer_content locks are also pretty common, both ahead of
> ClientRead, a relatively uncommon wait event on this test. The load
> average on the system was only about 3 during this test, indicating
> that most processes are in fact spending most of their time off-CPU.

Would increasing the number of WAL insert locks help? In your tests
this stays at 8 the whole time. (See the note on NUM_XLOGINSERT_LOCKS
at the bottom of this email.)

> The first thing I tried was switching to unlogged tables, which
> produces these results:
>
>      1 BufferPin | BufferPin
>      1 LWLockTranche | lock_manager
>      2 LWLockTranche | buffer_mapping
>      8 LWLockNamed | ProcArrayLock
>      9 LWLockNamed | CheckpointerCommLock
>      9 LWLockNamed | CLogControlLock
>     11 LWLockTranche | buffer_content
>     37 LWLockTranche | clog
>    153 Lock | tuple
>    388 LWLockNamed | XidGenLock
>    827 Lock | transactionid
>   1267 Client | ClientRead
>  20631 Lock | extend
>  91767 |
>
> tps = 1223.239416 (including connections establishing)
>
> If you don't look at the TPS number, these results look like a vast
> improvement. The overall amount of time spent not waiting for
> anything is now much higher, and the problematic locks have largely
> disappeared from the picture. However, the load average now shoots up
> to about 30, because most of the time that the backends are "not
> waiting for anything" they are in fact in kernel wait state D; that
> is, they're stuck doing I/O. This suggests that we might want to
> consider advertising a wait state when a backend is doing I/O, so we
> can measure this sort of thing.

Maybe something in fd.c. It may be a good idea to first have a look at
perf output on Linux to see where this contention is actually
happening, use that to decide where to place the wait point, and then
check whether other OSes show a similar pattern. (A rough sketch of
such a wait point is at the bottom of this email as well.)
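To be clearer about the INSERT ... SELECT idea above, here is a rough,
untested sketch; wait_samples is just a table name I am inventing for
the example:

    -- Unlogged, so the sampling generates no WAL of its own.
    CREATE UNLOGGED TABLE wait_samples AS
      SELECT wait_event_type, wait_event
        FROM pg_stat_activity
      WITH NO DATA;

    -- Run this in psql during the benchmark; \watch re-executes the
    -- query buffer every 0.5s, appending one row per backend sampled.
    INSERT INTO wait_samples
      SELECT wait_event_type, wait_event
        FROM pg_stat_activity
       WHERE pid != pg_backend_pid()
    \watch 0.5

    -- Aggregate at the end, giving the same counts as the shell script.
    SELECT count(*) AS samples, wait_event_type, wait_event
      FROM wait_samples
     GROUP BY 2, 3
     ORDER BY 1;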
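On NUM_XLOGINSERT_LOCKS: the number of WAL insert locks is a
compile-time constant, so testing with more of them means patching and
rebuilding:

    /* src/backend/access/transam/xlog.c */
    #define NUM_XLOGINSERT_LOCKS  8

It could be interesting to see how the wal_insert contention above
moves when this is bumped to, say, 16 or 32 on a 64-thread box like
hydra.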
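And for the wait point in fd.c, a rough, untested sketch of what I have
in mind, modeled on how lwlock.c advertises its waits today. WAIT_IO
and WAIT_IO_READ are hypothetical identifiers: a new wait event class
and event would need to be added to pgstat.h, and similar calls would
be needed around write() and fsync():

    /* In FileRead(), around the existing read() call */
    pgstat_report_wait_start(WAIT_IO, WAIT_IO_READ);  /* hypothetical IDs */
    returnCode = read(VfdCache[file].fd, buffer, amount);
    pgstat_report_wait_end();

--
Michael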