On 2016-01-19 12:58:38 -0500, Robert Haas wrote:
> This seems like a problem with the WAL writer quite independent of
> anything else. It seems likely to be inadvertent fallout from this
> patch:
>
> Author: Simon Riggs <si...@2ndquadrant.com>
> Branch: master Release: REL9_2_BR [4de82f7d7] 2011-11-13 09:00:57 +0000
>
>     Wakeup WALWriter as needed for asynchronous commit performance.
>     Previously we waited for wal_writer_delay before flushing WAL. Now
>     we also wake WALWriter as soon as a WAL buffer page has filled.
>     Significant effect observed on performance of asynchronous commits
>     by Robert Haas, attributed to the ability to set hint bits on tuples
>     earlier and so reducing contention caused by clog lookups.
In addition to that, the "powersaving" effort also plays a role: without
the latch we'd not wake up at any meaningful rate at all at the moment.

> If I understand correctly, prior to that commit, WAL writer woke up 5
> times per second and flushed just that often (unless you changed the
> default settings). But as the commit message explained, that turned
> out to suck - you could make performance go up very significantly by
> radically decreasing wal_writer_delay. This commit basically lets it
> flush at maximum velocity - as fast as we finish one flush, we can
> start the next. That must have seemed like a win at the time from the
> way the commit message was written, but you seem to now be seeing the
> opposite effect, where performance is suffering because flushes are
> too frequent rather than too infrequent. I wonder if there's an ideal
> flush rate and what it is, and how much it depends on what hardware
> you have got.

I think the problem isn't really that it's flushing too much WAL in
total; it's that it's flushing WAL in too granular a fashion. I suspect
we want something where we attempt a minimum number of flushes per
second (presumably tied to wal_writer_delay) and, once that rate is
exceeded, require a minimum number of pages per flush. I think we could
even continue to write() the data at the same rate as today; we would
just need to reduce the number of fdatasync()s we issue. We could
possibly also make the eventual fdatasync()s cheaper by hinting the
kernel to write the data out earlier.

Now, the question of what minimum number of pages we want to flush
(setting the wal_writer_delay-triggered flushes aside) isn't easy to
answer. A simple model would be to statically tie it to the size of
wal_buffers; say, don't flush unless at least 10% of XLogBuffers have
been written since the last flush. A more complex approach would be to
measure the continuous WAL writeout rate.
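To make the simple model concrete, here is a rough sketch in Python. It
is not PostgreSQL code, just the decision rule in isolation. The class
FlushPolicy, its fields and should_flush() are names made up for
illustration; the 200 ms value corresponds to the five wakeups per
second mentioned above, and the 10% threshold is the example figure
from the previous paragraph.

```python
from dataclasses import dataclass


@dataclass
class FlushPolicy:
    """Hypothetical model of the proposed WAL writer flush rule."""

    wal_buffers_pages: int          # stand-in for XLogBuffers (pages)
    wal_writer_delay_ms: int = 200  # five wakeups per second
    min_percent: int = 10           # illustrative threshold, not a real GUC

    def min_pages(self) -> int:
        # Integer arithmetic avoids float rounding; always require >= 1 page.
        return max(1, self.wal_buffers_pages * self.min_percent // 100)

    def should_flush(self, pages_written: int, ms_since_flush: int) -> bool:
        """Decide whether to issue a flush (fdatasync) now.

        pages_written: pages written since the last flush.
        ms_since_flush: milliseconds elapsed since the last flush.
        """
        if pages_written == 0:
            return False  # nothing new to make durable
        if ms_since_flush >= self.wal_writer_delay_ms:
            return True   # guarantee a minimum flush rate under activity
        # Between timer-driven flushes, only flush a reasonably large batch.
        return pages_written >= self.min_pages()
```

With an example buffer of 2048 pages, FlushPolicy(2048).min_pages() is
204, so a burst of ten pages written 50 ms after the last flush would
wait, while the same ten pages would be flushed once 200 ms had passed.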

By tying it to both a minimum rate under activity (ensuring things go
to disk fast) and a minimum number of pages to sync (ensuring a
reasonable number of cache flush operations), we should be able to
mostly accommodate the different types of workloads. I think.

Andres