[HACKERS] Why do we still have commit_delay and commit_siblings?

Peter Geoghegan Sun, 13 May 2012 16:18:43 -0700

This code is our pre-9.2 group commit implementation, pretty much in
its entirety:


if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);

This code is placed directly before the RecordTransactionCommit() call
of XLogFlush(). It seeks to create a delay of commit_delay immediately
prior to flushing when MinimumActiveBackends(CommitSiblings), in the
hope that that delay will be enough that when we do eventually reach
XLogFlush(), we can fastpath out of it due to Postgres by then having
already flushed up to the XLogRecPtr we need flushed. In this way,
commits can piggyback off of each other and there won't be what are
effectively duplicate write/fsync requests, or at least that's the
idea.

It is hardly surprising that the practical advice surrounding
commit_delay and commit_siblings is only subtlety different from
"don't use in production". There is even a big fat caveat attached to
this code in a comment: "This needs work still, because on most
Unixen, the minimum select() delay is 10msec or more, which is way too
long".

The original group commit patch that Simon and I submitted deprecated
these settings, and I fail to understand why the committed patch
didn't do that too. These days, the only way this code could possibly
result in a fastpath is here, at transam/xlog.c:2094, with the
potentially stale (and likely too stale to be useful when new group
commit is in play) LogwrtResult value:

                /* done already? */
                if (XLByteLE(record, LogwrtResult.Flush))
                        break;

Now, previously, we could also fastpath a bit later, at about
transam/xlog.c:2114:

                /* Got the lock */
                LogwrtResult = XLogCtl->LogwrtResult;
                if (!XLByteLE(record, LogwrtResult.Flush))
                {
                        .....
                }
                /* control might have fastpathed and missed the above block. 
We're
done now. */

...but now control won't even reach here unless we have the exclusive
lock on WALWriteLock (i.e. we're the leader backend during 9.2's group
commit), so fastpathing out of XLogFlush() becomes very rare in 9.2 .

One illuminating way of quickly explaining the new group commit code
is that it also inserts a delay at approximately the same place (well,
more places now, since the delay was previously inserted only at the
xact.c callsite of XLogFlush(), and there are plenty more sites than
that), only that delay is now just about perfectly adaptive. That
isn't quite the full truth, since we *also* have the benefit of
*knowing* that there is an active leader/flusher at all times we're
delayed. The unusual semantics of LWLockAcquireOrWait() result in the
new 9.2 code almost always just causing a delay for most backends when
group commit matters (that is, a delay before they reach their final,
"productive" iteration of the for(;;) loop for the commit, usually
without them ever actually acquiring the exclusive
lock/XlogWrite()'ing).

Have I missed something? Why do we keep around this foot-gun that now
appears to me to be at best useless and at worst harmful? I can see
why the temptation to keep this setting around used to exist, since it
probably wasn't too hard to get good numbers from extremely synthetic
pgbench runs, but I cannot see why the new adaptive implementation
wouldn't entirely shadow the old one even in that situation.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Why do we still have commit_delay and commit_siblings?

Reply via email to