Here’s version 3, then, which piggybacks on the existing setting:

synchronous_commit = on | off | local | fallback

Here, “fallback” now means “fall back from sync replication when no
(suitable) standbys are connected”.
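To make the proposed semantics concrete, here is a toy Python model of when a commit would wait for a standby under each value. It is a sketch under stated assumptions, not the patch's implementation: it assumes synchronous_standby_names is configured, treats “off” and “local” alike (neither waits for a standby), and the function name commit_waits_for_standby is hypothetical.

```python
VALID_MODES = {"on", "off", "local", "fallback"}


def commit_waits_for_standby(mode: str, suitable_standby_connected: bool) -> bool:
    """Toy model: does a commit wait for a standby acknowledgement?

    Hypothetical helper; assumes synchronous_standby_names is set.
    "fallback" is the value proposed in this patch, not a released one.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown synchronous_commit value: {mode!r}")
    if mode == "on":
        # Classic sync replication: always wait, so commits block
        # when no suitable standby is connected.
        return True
    if mode == "fallback":
        # Wait only while a suitable standby is connected; otherwise
        # fall back to standalone (local-only) commits.
        return suitable_standby_connected
    # "off" and "local" never wait for a standby.
    return False
```

The only difference from "on" is the no-standby case: "on" keeps the master waiting, while "fallback" lets it keep committing without a standby.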

This change was made based on input from Guillaume Lelarge.

> That said, I agree it's not necessarily reasonable to try to defend
> against that in a two node cluster.

That’s what I’ve been trying to say all along, but I didn’t give enough
context earlier, so I understand how the discussion took a turn there.

You can always walk up to any setup and say “hey, if you nuke that
site from orbit and crash that other thing, and ...” ;) I’m just
kidding, of course, but you get the point. Nothing is absolute.

So we get back to the three possible failure scenarios in our two-node
setup:

1. The master fails
   - Okay, promote the standby.

2. The standby fails
   - Okay, the system still works, but you no longer have data
     redundancy. Deal with it.

3. Both fail, together or one after the other.

I’ve stated that cases 1 and 2 together cover well over 99.9% of what’s
expected in my setup on any given day.

But case 3 is what we’ve been talking about. And in that case there is
no reason to just go ahead and promote the standby, because, granted,
it could be lagging behind if the master switched to standalone mode
just before going down itself.

As long as you do not prematurely, or rather instinctively, promote the
standby when it has *possibly* lagged behind, you’re fine and there is
no risk of data loss. The data might be sitting on a crashed or
otherwise unavailable master, but it’s not lost. Promoting the standby,
however, is basically saying “forget the master and its data, continue
from where the standby currently is”.

Granted, this is operationally more complicated than plain synchronous
replication, where you can always promote the standby after a master
failure, knowing that all committed data is guaranteed to have been
replicated.

> I'm worried that the interface seems a bit
> fragile and that it's hard to "be sure".

With this setup, you can’t promote the standby without first checking
whether the replication link was disconnected before the master failed.
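The check above could look roughly like this toy Python sketch. It assumes the operator has some record (for example, standby or monitoring logs) of when the standby lost its connection; the ReplicationHistory type and safe_to_promote function are hypothetical and not part of PostgreSQL or the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationHistory:
    """What the operator knows about the replication link (hypothetical)."""
    # When the standby lost its connection, or None if it was still
    # streaming synchronously when the master failed.
    standby_disconnected_at: Optional[float]
    master_failed_at: float


def safe_to_promote(h: ReplicationHistory) -> bool:
    """Promote only if the link was not lost before the master failed.

    If the standby disconnected first, the master may have switched to
    standalone mode and committed transactions the standby never got.
    """
    if h.standby_disconnected_at is None:
        return True
    # Link lost at (or after) the master failure: the master never ran
    # in standalone mode, so the standby has every committed transaction.
    return h.standby_disconnected_at >= h.master_failed_at
```

Comparing timestamps from different hosts assumes reasonably synchronized clocks; when in doubt, the conservative answer is not to promote and to recover the master's data instead.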

For me, the benefits outweigh this one drawback: in the most plausible
outcomes, I get stronger replication guarantees on the standby than
with async replication, and better master availability than with sync
replication.



Attachment: sync-standalone-v3.patch

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)