This code is our pre-9.2 group commit implementation, pretty much in its entirety:
if (CommitDelay > 0 && enableFsync && MinimumActiveBackends(CommitSiblings)) pg_usleep(CommitDelay); This code is placed directly before the RecordTransactionCommit() call of XLogFlush(). It seeks to create a delay of commit_delay immediately prior to flushing when MinimumActiveBackends(CommitSiblings), in the hope that that delay will be enough that when we do eventually reach XLogFlush(), we can fastpath out of it due to Postgres by then having already flushed up to the XLogRecPtr we need flushed. In this way, commits can piggyback off of each other and there won't be what are effectively duplicate write/fsync requests, or at least that's the idea. It is hardly surprising that the practical advice surrounding commit_delay and commit_siblings is only subtlety different from "don't use in production". There is even a big fat caveat attached to this code in a comment: "This needs work still, because on most Unixen, the minimum select() delay is 10msec or more, which is way too long". The original group commit patch that Simon and I submitted deprecated these settings, and I fail to understand why the committed patch didn't do that too. These days, the only way this code could possibly result in a fastpath is here, at transam/xlog.c:2094, with the potentially stale (and likely too stale to be useful when new group commit is in play) LogwrtResult value: /* done already? */ if (XLByteLE(record, LogwrtResult.Flush)) break; Now, previously, we could also fastpath a bit later, at about transam/xlog.c:2114: /* Got the lock */ LogwrtResult = XLogCtl->LogwrtResult; if (!XLByteLE(record, LogwrtResult.Flush)) { ..... } /* control might have fastpathed and missed the above block. We're done now. */ ...but now control won't even reach here unless we have the exclusive lock on WALWriteLock (i.e. we're the leader backend during 9.2's group commit), so fastpathing out of XLogFlush() becomes very rare in 9.2 . One illuminating way of quickly explaining the new group commit code is that it also inserts a delay at approximately the same place (well, more places now, since the delay was previously inserted only at the xact.c callsite of XLogFlush(), and there are plenty more sites than that), only that delay is now just about perfectly adaptive. That isn't quite the full truth, since we *also* have the benefit of *knowing* that there is an active leader/flusher at all times we're delayed. The unusual semantics of LWLockAcquireOrWait() result in the new 9.2 code almost always just causing a delay for most backends when group commit matters (that is, a delay before they reach their final, "productive" iteration of the for(;;) loop for the commit, usually without them ever actually acquiring the exclusive lock/XlogWrite()'ing). Have I missed something? Why do we keep around this foot-gun that now appears to me to be at best useless and at worst harmful? I can see why the temptation to keep this setting around used to exist, since it probably wasn't too hard to get good numbers from extremely synthetic pgbench runs, but I cannot see why the new adaptive implementation wouldn't entirely shadow the old one even in that situation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers