On 10/7/16 10:42 AM, Andres Freund wrote:

On 2016-10-06 20:52:22 -0700, Alfred Perlstein wrote:
This contention on WAL reminds me of another scenario I've heard about that
was similar.

To fix things, the first person to block would be responsible for
writing out all buffers for anyone blocked behind them.

We pretty much do that already. But while that's happening, the other
would-be-writers show up as blocking on the lock.  We do use kind of
an odd locking model for the waiters (LWLockAcquireOrWait()), which
waits for the lock to be released, but doesn't try to acquire it
afterwards. Instead the WAL position is rechecked, and in many cases
we'll be done afterwards, because enough has been written out.
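
The pattern described above (take the lock if it is free, otherwise only wait for its release and then recheck the WAL position) can be sketched as a toy model. This is not PostgreSQL code: the class `WalWriter`, its fields, and the helper `run_backends` are hypothetical names invented for illustration, and the fsync is simulated by a counter.

```python
import threading


class WalWriter:
    """Toy model of leader-based WAL flushing. Not PostgreSQL code."""

    def __init__(self):
        self._cond = threading.Condition()
        self._lock_held = False   # stands in for the WAL write lock
        self._releases = 0        # bumped every time the lock is released
        self.inserted_upto = 0    # highest position placed in the buffer
        self.flushed_upto = 0     # highest position known to be durable
        self.fsync_calls = 0      # only touched by the current lock holder

    def insert(self, nbytes):
        """Append nbytes to the (imaginary) WAL buffer; return the end position."""
        with self._cond:
            self.inserted_upto += nbytes
            return self.inserted_upto

    def _acquire_or_wait(self):
        """Return True if we took the lock, False if we only waited for a release."""
        with self._cond:
            if not self._lock_held:
                self._lock_held = True
                return True
            seen = self._releases
            while self._releases == seen:
                self._cond.wait()
            return False  # released by someone else; we do NOT acquire it

    def _release(self):
        with self._cond:
            self._lock_held = False
            self._releases += 1
            self._cond.notify_all()

    def flush(self, target):
        """Return once everything up to `target` is durable."""
        while True:
            with self._cond:
                if self.flushed_upto >= target:
                    return  # someone else already covered our position
            if not self._acquire_or_wait():
                continue  # lock was released: recheck the position first
            try:
                with self._cond:
                    if self.flushed_upto >= target:
                        return
                    upto = self.inserted_upto  # cover everyone queued so far
                self.fsync_calls += 1          # simulated write + fsync
                with self._cond:
                    self.flushed_upto = upto
                return
            finally:
                self._release()


def run_backends(writer, n, nbytes=100):
    """Start n threads that each insert a record and wait for it to be flushed."""
    def backend():
        writer.flush(writer.insert(nbytes))

    threads = [threading.Thread(target=backend) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this model a single fsync by the lock holder covers every position inserted before it started, so a waiter that wakes up usually finds its target already flushed and returns without issuing its own fsync.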


Andres Freund

Are the batched writes all done before fsync is called?

Are you sure that A only calls fsync after flushing all the buffers from B, C, and D? Or will it fsync twice? Is there instrumentation to show that?

I know there's a tremendous level of skill involved in this code, but I'm simply asking in case there are some tricks.

Another strategy that may work is to intentionally wait and buffer for a few ms between flushes/fsyncs. For example, make sure that the number of flushes per second doesn't exceed some configurable amount: each flush likely eats at least one IOP from the disk, and there is a maximum number of IOPS per disk, so you might as well buffer more if you're exceeding that IOPS count. You'll trade some latency, but gain throughput by doing that.
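
A rough way to see that trade-off is a small simulation. This is only a sketch under stated assumptions: the function name `simulate_capped_flushes` and its parameters are hypothetical, a flush is treated as instantaneous, and each flush makes every commit that has arrived so far durable.

```python
def simulate_capped_flushes(commit_times, max_flushes_per_sec):
    """Return (flush_count, mean_commit_latency_seconds).

    Assumptions of this toy model: flushes take no time, and one flush
    covers every commit that arrived at or before the flush time.
    """
    min_interval = 1.0 / max_flushes_per_sec
    times = sorted(commit_times)
    flushes = 0
    total_latency = 0.0
    next_allowed = float("-inf")  # earliest time the next flush may run
    i = 0
    while i < len(times):
        # Flush as soon as there is a pending commit and the cap allows it.
        flush_at = max(times[i], next_allowed)
        # Everything that arrived by then rides along in the same flush.
        while i < len(times) and times[i] <= flush_at:
            total_latency += flush_at - times[i]
            i += 1
        flushes += 1
        next_allowed = flush_at + min_interval
    mean_latency = total_latency / len(times) if times else 0.0
    return flushes, mean_latency


# Four commits arriving 1 ms apart.
commits = [0.000, 0.001, 0.002, 0.003]
capped = simulate_capped_flushes(commits, max_flushes_per_sec=100)
uncapped = simulate_capped_flushes(commits, max_flushes_per_sec=1_000_000)
print("capped:", capped[0], "flushes; uncapped:", uncapped[0], "flushes")
```

With the cap at 100 flushes per second the four commits need two flushes instead of four, at the cost of the later commits waiting several milliseconds for the next allowed flush.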

Does this make sense? Again, apologies if this has been covered. Is there a whitepaper, blog post, or other clear way I can examine the algorithm wrt locks/buffering for flushing WAL?

