On 30 May 2012 15:25, Simon Riggs <si...@2ndquadrant.com> wrote:
>> 1. It seems wrong to do it in xact_redo_commit_internal().  It won't
>> matter if commit_siblings>0 since there won't be any other backends
>> with transaction IDs anyway, but if commit_siblings==0 then we'll
>> sleep for no possible benefit.
>
> Agreed
>
>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be
>> sitting on a buffer lock.  But maybe it's a win anyway, or just not
>> worth worrying about.
>
> Agreed.

As I've pointed out, we cannot meaningfully skip the wait for
auxiliary processes alone (except perhaps by having commit_siblings
set sufficiently high).

> The remaining cases aren't worth worrying about, apart from
> SlruPhysicalWritePage() which happens during visibility checks and
> needs to happen as quickly as possible also.

I'm not so sure. It says in that function:

                /*
                 * We must determine the largest async-commit LSN for the page. 
This
                 * is a bit tedious, but since this entire function is a slow 
path
                 * anyway, it seems better to do this here than to maintain a 
per-page
                 * LSN variable (which'd need an extra comparison in the
                 * transaction-commit path).
                 */

> I would say the additional contention from waiting outweighs the
> benefit of the wait in those 3 places, so skipping the wait is wise.

MinimumActiveBackends() reports the "count backends (other than
myself) that are in active transactions", so unnecessary calls will
have to occur when we have active transactions >= CommitSiblings, not
connections >= CommitSiblings as was previously the case.

What if we were to skip the wait during recovery only, by specially
setting CommitDelay to 0 in the start-up process? Would that satisfy
everyone's concerns about unhelpful delays? I'm not sure how this
might interact with hot standby.

The only concrete way I can see of avoiding a possibly useless wait
(setting commit_siblings prudently alone is a very big help) would be
semantics that are not generally acceptable for XLogFlush(): a way of
calling XLogFlush() such that we either become the leader and don't
wait OR join the queue and return immediately under the assumption
that the leader will successfully flush up to our LSN. It is true that
part of the premise of having a leader backend was that "XLogFlush is
pretty much all critical section, so if it fails we're going to PANIC
anyway".

That said, at least some of the problematic XLogFlush() call sites
named require that we verify that we've flushed up to their LSN before
control returns to them. FlushBuffer() does for sure.

> I think if we were to rename the parameters, then they should be called
>
> group_commit_delay
> group_commit_siblings
>
> Since "Group Commit" is the accepted term for this behaviour, even
> though, as you point out that the behaviour isn't just about commit.

+1. I understand that that is the precise definition of group commit
in Oracle, for example, where the feature is explicitly framed as a
trade-off between throughput and latency (any of the other things that
we might have called group commit at various times are not). The fact
that group commit implies that non-commit flushes will also be batched
seems rather academic, on reflection.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to