Okay, here’s version 3 then, which piggy-backs on the existing flag:

    synchronous_commit = on | off | local | fallback

Where “fallback” now means “fall back from sync replication when no (suitable) standbys are connected”. This was done on input from Guillaume Lelarge.

> That said, I agree it's not necessarily reasonable to try to defend
> against that in a two node cluster.

That’s what I’ve been trying to say all along, but I didn’t give enough context before, so I understand we took a turn there.

You can always walk up to any setup and say “hey, if you nuke that site from orbit and crash that other thing, and ...” ;) I’m just kidding of course, but you get the point. Nothing is absolute.

And so we get back to the three likelihoods in our two-node setup:

1. The master fails - Okay, promote the standby.
2. The standby fails - Okay, the system still works but you no longer have data redundancy. Deal with it.
3. Both fail, together or one after the other.

I’ve stated that 1 and 2 together cover way more than 99.9% of what’s expected in my setup on any given day.

But 3 is what we’ve been talking about ... And well, in that case there is no reason to just go ahead and promote a standby because, granted, it could be lagging behind if the master decided to switch to standalone mode just before going down itself.

As long as you do not prematurely, or rather instinctively, promote the standby when it has *possibly* lagged behind, you’re good and there is no risk of data loss. The data might be sitting on a crashed or otherwise unavailable master, but it’s not lost. Promoting the standby, however, is basically saying “forget the master and its data, continue from where the standby is currently at”.

Now granted, this is operationally harder/more complicated than plain synchronous replication, where you can always, in any case, just promote the standby after a master failure, knowing that all data is guaranteed to be replicated.

> I'm worried that the interface seems a bit
> fragile and that it's hard to "be sure".

With this setup, you can’t promote the standby without first checking whether the replication link was disconnected prior to the master failure.

For me, the benefits outweigh this one drawback, because I get a stronger standby replication guarantee than with async replication and more master availability than with sync replication in the most plausible outcomes.

Cheers,
/A
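
P.S. To make the promotion rule above concrete, here is a minimal Python sketch of the decision an operator (or a failover script) would have to make in the two-node setup. It is only an illustration of the reasoning in this mail, not part of the patch. The names `ClusterState`, `Action`, `decide` and the `link_lost_before_master_failure` flag are all hypothetical, and how you would actually learn whether the link was lost before the master went down is left open here.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NOTHING = "both nodes are up, nothing to do"
    RUN_STANDALONE = "master keeps running without data redundancy"
    PROMOTE_STANDBY = "promote the standby"
    DO_NOT_PROMOTE_YET = "do not promote; the master may hold newer data"


@dataclass
class ClusterState:
    master_up: bool
    standby_up: bool
    # Assumption for this sketch: True if the master had already fallen
    # back to standalone mode (replication link lost) before it failed.
    link_lost_before_master_failure: bool = False


def decide(state: ClusterState) -> Action:
    """Map the scenarios from the mail onto an operator action."""
    if state.master_up and state.standby_up:
        return Action.NOTHING
    if state.master_up:
        # Scenario 2: the standby failed, the master falls back.
        return Action.RUN_STANDALONE
    # From here on the master is down.
    if not state.standby_up or state.link_lost_before_master_failure:
        # Scenario 3: the standby is gone too, or it has possibly
        # lagged behind. The data is not lost, it is on the master.
        return Action.DO_NOT_PROMOTE_YET
    # Scenario 1: the master failed while the standby was in sync.
    return Action.PROMOTE_STANDBY


if __name__ == "__main__":
    print(decide(ClusterState(master_up=False, standby_up=True)).value)
```

The only case that differs from plain synchronous replication is the `DO_NOT_PROMOTE_YET` branch: with "fallback" you have to rule out a prior disconnect before promoting.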

Attachment: sync-standalone-v3.patch