On Sun, Aug 21, 2016 at 6:08 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Aug 19, 2016 at 6:30 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> I'm wondering if we've hit the point where trying to put all of this in a
>> single GUC is a bad idea... changing that probably means a config
>> compatibility break, but I don't think that's necessarily a bad thing at
>> this point...
>
> Aside from the (IMHO) slightly confusing way that "on" works, which is
> the smaller issue I was raising in this thread, I agree that we might
> eventually want to escape from the assumption that "local apply" (=
> off), local flush, remote write, remote flush, remote apply happen in
> that order and therefore a single linear control knob can describe
> which of those to wait for.
>
> Some pie-in-the-sky thoughts: we currently can't reach
> "group-safe"[1], where you wait only for N servers to have the WAL in
> memory (let's say that for us that means write but not flush): the
> closest we can get is "1-safe and group-safe", using remote_write to
> wait for the standbys to write (= "group-safe"), which implies local
> flush (= "1-safe").  Now that'd be a terrible level to use unless your
> recovery procedure included cluster-wide communication to straighten
> things out, and without any such clusterware it makes a lot of sense
> to have the master flush before sending, and I'm not actually
> proposing we change that, I'm just speculating that someone might
> eventually want it.  We also can't have standbys apply before they
> flush; as far as I know there is no theoretical reason why that
> shouldn't be allowed, except maybe for some special synchronisation
> steps around checkpoint records so that recovery doesn't get too far
> ahead.

Well, in order to remain recoverable, the standby has to obey the
WAL-before-data rule: if it writes out a page with a given LSN, the
WAL up to that LSN had better be flushed to disk first.  In practice,
this means that if you want a standby to remain recoverable without
needing to contact the rest of the cluster, you can't let its minimum
recovery point pass the WAL flush point.  In short, this comes up any
time you evict a dirty buffer, not just around checkpoints.
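
To make that ordering concrete, here's a toy sketch of the invariant
(the names are invented and this is not actual PostgreSQL source; the
real logic lives in the buffer manager and xlog code):

#include <stdio.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Toy standby state: how far WAL has been flushed to disk, and the
 * point recovery must replay past before the data directory is
 * consistent (the minimum recovery point). */
static XLogRecPtr flushed_lsn = 0;
static XLogRecPtr min_recovery_point = 0;

static void
flush_wal_up_to(XLogRecPtr lsn)
{
    if (lsn > flushed_lsn)
        flushed_lsn = lsn;      /* stand-in for fsyncing the WAL */
}

/* Evicting a dirty page stamped with page_lsn: the WAL covering that
 * page must hit disk first, which in turn drags the minimum recovery
 * point forward.  This runs on every eviction, not just at
 * checkpoints. */
static void
evict_dirty_page(XLogRecPtr page_lsn)
{
    flush_wal_up_to(page_lsn);  /* WAL-before-data */
    if (page_lsn > min_recovery_point)
        min_recovery_point = page_lsn;
    printf("wrote page at %llu; min recovery point now %llu\n",
           (unsigned long long) page_lsn,
           (unsigned long long) min_recovery_point);
}

int
main(void)
{
    evict_dirty_page(100);
    evict_dirty_page(250);
    return 0;
}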

> That'd mirror what happens on the master more closely.
> Imagine if you wanted to wait for your transaction to become visible
> on certain other servers, but didn't want to wait for any disks:
> that'd be the distributed equivalent of today's "off", but today's
> "remote_apply" implies local flush and remote flush.  Or more likely
> you'd want some combination: 2-safe or group-safe on some subset of
> servers to satisfy your durability requirements, and applied on some
> other perhaps larger subset of servers for consistency.  But this is
> just water cooler handwaving.

Sure, that stuff would be great, and we'll probably have to redesign
synchronous_commit entirely if and when we get there, but I'm not sure
it makes sense to tinker with it now just for that.  The original
reason I suggested the current design for synchronous_commit was to
avoid forcing people to set yet another GUC in order to use
synchronous replication.  The default of 'on' means that you can just
configure synchronous_standby_names and away you go.  Perhaps a better
design, as we added more values, would have been to keep
synchronous_commit as on/local/off and use a separate GUC, say
synchronous_replication, to define what "on" means: remote_apply,
remote_flush, remote_write, 2safe+groupsafe, or whatever.  And when
synchronous_standby_names = '', the value of synchronous_replication
would be ignored, and synchronous_commit=on would mean the same as
synchronous_commit=local, just as it does today.
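
For illustration only, a rough sketch of how commit waiting might be
resolved under that split (entirely hypothetical: there is no
synchronous_replication GUC, and all of these names are invented):

#include <stdio.h>

typedef enum { SC_OFF, SC_LOCAL, SC_ON } SyncCommit;
typedef enum { SR_REMOTE_WRITE, SR_REMOTE_FLUSH, SR_REMOTE_APPLY } SyncRepLevel;
typedef enum { WAIT_NOTHING, WAIT_LOCAL_FLUSH, WAIT_REMOTE } WaitFor;

/* Hypothetical GUCs, invented for this sketch. */
static SyncCommit synchronous_commit = SC_ON;
static SyncRepLevel synchronous_replication = SR_REMOTE_FLUSH;
static const char *synchronous_standby_names = "";

/* What a committing backend would wait for under the proposed split. */
static WaitFor
effective_commit_wait(void)
{
    if (synchronous_commit == SC_OFF)
        return WAIT_NOTHING;
    /* With no sync standbys named, "on" degrades to "local" and
     * synchronous_replication is ignored, just as today. */
    if (synchronous_commit == SC_LOCAL ||
        synchronous_standby_names[0] == '\0')
        return WAIT_LOCAL_FLUSH;
    return WAIT_REMOTE;         /* then wait per synchronous_replication */
}

int
main(void)
{
    printf("wait = %d (rep level %d)\n",
           effective_commit_wait(), synchronous_replication);
    synchronous_standby_names = "s1";
    printf("wait = %d (rep level %d)\n",
           effective_commit_wait(), synchronous_replication);
    return 0;
}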

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

