On Tue, 24 Mar 2009, Sam Mason wrote:

The conceptual idea is to have at most one outstanding flush for the
log going through the filesystem at any one time.

Quoting from src/backend/access/transam/xlog.c, inside XLogFlush:

"Since fsync is usually a horribly expensive operation, we try to piggyback as much data as we can on each fsync: if we see any more data entered into the xlog buffer, we'll write and fsync that too, so that the final value of LogwrtResult.Flush is as large as possible. This gives us some chance of avoiding another fsync immediately after."

The logic implementing that idea takes care of bunching up flushes for any other WAL data that happens to be ready to go at that point. You can see this most easily by doing inserts into a system that's limited by a slow fsync, like a single disk without write cache where you're bound by rotation speed. If you have, say, a 7200RPM disk, that's 120 rotations/second, so no one client can commit faster than 120 times/second. But if you have 10 clients all pushing small inserts in, it's fairly easy to see >500 transactions/second, because a bunch of commits will get batched up during the time the last fsync is waiting for the disk to finish.
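To put rough numbers on that, here is a toy model of the batching, not anything taken from xlog.c. It assumes one flush costs exactly one disk rotation, only one flush is outstanding at a time, and every commit that is already waiting when a flush starts gets carried by that flush. The function name, the work_us parameter (client-side time spent preparing each transaction) and the lock-step clients are all made up for illustration.

```python
def simulate_commits(clients, rpm=7200, work_us=0, duration_us=1_000_000):
    """Count commits completed in duration_us under a toy group-commit model.

    Assumptions (illustrative only): a flush takes one full rotation, at most
    one flush is in flight, and a flush covers every commit already waiting
    when it starts. Times are integer microseconds to avoid float drift.
    """
    flush_us = 60_000_000 // rpm          # one rotation: 8333 us at 7200 RPM
    ready = [work_us] * clients           # when each client next asks to commit
    disk_free_at = 0
    commits = 0
    while True:
        # The next flush starts once the disk is idle and someone is waiting.
        start = max(disk_free_at, min(ready))
        end = start + flush_us
        if end > duration_us:
            break
        for i, t in enumerate(ready):
            if t <= start:                # this commit piggybacks on the flush
                commits += 1
                ready[i] = end + work_us  # client goes off to build the next one
        disk_free_at = end
    return commits


if __name__ == "__main__":
    print(simulate_commits(1))                 # 120
    print(simulate_commits(10, work_us=5000))  # 750
```

A single client tops out at 120 commits/second because it pays for a full rotation every time. Ten clients that each spend a hypothetical 5 ms building a transaction share each flush and land at 750/second in this model, which is in line with the >500 figure above. Real clients don't run in lock-step, so actual numbers will be messier.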

The other idea you'll already find implemented in there is controlled by commit_delay. If at least commit_siblings other transactions are open at the point where a commit is supposed to happen, the committing backend will pause for commit_delay microseconds in hopes that other transactions will jump onboard via the mechanism described above. In practice, it's very hard to tune that usefully. You can use it to help bunch commits together into bigger batches on a really busy system (where not having more than one commit ready is unexpected), but it's not much help outside of that context.
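The decision itself is small enough to write out. This is a sketch of the rule as described here, not a copy of the server code: the function and argument names are invented, and it treats commit_siblings as a minimum count of other open transactions, which is how the documentation describes that setting.

```python
def commit_pause_us(commit_delay_us, commit_siblings, other_open_xacts,
                    fsync_enabled=True):
    """Return how long (in microseconds) a commit would pause before flushing.

    Hypothetical helper for illustration. The pause only happens when a delay
    is configured, fsync is in use, and enough other transactions are open
    that one of them might plausibly join the same flush.
    """
    if (commit_delay_us > 0
            and fsync_enabled
            and other_open_xacts >= commit_siblings):
        return commit_delay_us
    return 0


if __name__ == "__main__":
    # With the stock settings (commit_delay = 0, commit_siblings = 5)
    # nothing ever pauses, no matter how busy the system is.
    print(commit_pause_us(0, 5, other_open_xacts=50))    # 0
    # A hypothetical 100 us delay only kicks in once enough siblings exist.
    print(commit_pause_us(100, 5, other_open_xacts=3))   # 0
    print(commit_pause_us(100, 5, other_open_xacts=8))   # 100
```

The catch is that the pause is paid by every qualifying commit whether or not anybody actually joins the flush, which is a big part of why it's hard to tune.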

Check out the rest of the comments in xlog.c; there's a lot in there that's not really covered in the README. If you turn on WAL_DEBUG and XLOG_DEBUG you can actually watch some of this happen. I found time spent reading the source to that file and src/backend/storage/buffer/bufmgr.c to be really well spent. Some of the most interesting parts of the codebase to understand from a low-level performance tuning perspective are in those two.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)