On 07/03/2015 03:12 AM, Sawada Masahiko wrote:
> Thanks. So we can choose the next master server by checking the
> progress of each server, if hot standby is enabled.
> Such a procedure is needed even with today's replication.
> 
> I think that the #2 problem which is Josh pointed out seems to be solved;
>     1. I need to ensure that data is replicated to X places.
>     2. I need to *know* which places data was synchronously replicated
> to when the master goes down.
> And we can address #1 problem using quorum commit.

It's not solved. I still have no way of knowing whether a replica was in
sync or not at the time the master went down.

Now, you and others have argued persuasively that there are valuable use
cases for quorum commit even without solving that particular issue, but
there's a big difference between "we can work around this problem" and
"the problem is solved."  I forked the subject line because I think that
the inability to identify synch replicas under failover conditions is a
serious problem with synch rep *today*, and pretending that it doesn't
exist doesn't help us even if we don't fix it in 9.6.

Let me give you three cases where our lack of information on the replica
side about whether it thinks it's in sync or not causes synch rep to
fail to protect data.  The first case is one I've actually seen in
production, and the other two are hypothetical but entirely plausible.

Case #1: two synchronous replica servers have the application name
"synchreplica".  An admin uses the wrong Chef template, and deploys a
server which was supposed to be an async replica with the same
recovery.conf template, and it ends up in the "synchreplica" group as
well. Due to restarts (pushing out an update release), the new server
ends up seizing and keeping sync. Then the master dies.  Because the new
server wasn't supposed to be a sync replica in the first place, it is
not checked; they just fail over to the furthest ahead of the two
original synch replicas, neither of which was actually in synch.
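To make the misconfiguration concrete, here's a minimal sketch of how it
arises (file contents are illustrative, not from the actual incident):
every standby deployed from the same recovery.conf template reports the
same application_name, so the master's synchronous_standby_names can't
tell the intended sync replicas from the rogue one.

```
# recovery.conf, shared by every standby built from the Chef template
primary_conninfo = 'host=master application_name=synchreplica'

# postgresql.conf on the master: any standby named "synchreplica"
# is eligible to be the sync standby
synchronous_standby_names = 'synchreplica'
```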

Case #2: "2 { local, london, nyc }" setup.  At 2am, the links between
data centers become unreliable, such that the on-call sysadmin disables
synch rep because commits on the master are intolerably slow.  Then, at
10am, the links between data centers fail entirely.  The day shift, not
knowing that the night shift disabled sync, fail over to London thinking
that they can do so with zero data loss.
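For readers following along, "2 { local, london, nyc }" is shorthand for
the quorum mini-language being floated in this thread; roughly (exact
syntax still under debate, not a released feature):

```
# master's postgresql.conf: commit waits for acks from any 2 of the 3
synchronous_standby_names = '2 {local, london, nyc}'

# what the night shift effectively did to un-stick commits:
# synchronous_standby_names = ''
```

The point is that the night shift's change lives only on the master, so
nothing on the London side records that sync had been disabled.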

Case #3 "1 { london, frankfurt }, 1 { sydney, tokyo }" multi-group
priority setup.  We lose communication with everything but Europe.  How
can we decide whether to wait to get sydney back, or to promote London
immediately?
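In the same proposed notation, that multi-group priority setup would look
something like this (again, hypothetical syntax from the proposal):

```
# wait for 1 ack from the European group AND 1 from the APAC group
synchronous_standby_names = '1 {london, frankfurt}, 1 {sydney, tokyo}'
```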

I could come up with numerous other situations, but all three of the
entirely reasonable cases above show how having the knowledge of what
time a replica thought it was last in sync is vital to preventing bad
failovers and data loss, and to knowing the quantity of data loss when
it can't be prevented.
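To spell out the failure mode: the failover procedure available today
amounts to comparing WAL positions across the candidates and promoting
the furthest-ahead one, along the lines of the sketch below (illustrative
Python, hypothetical standby names and LSN values). Nothing in this
comparison reveals whether any candidate was actually in sync when the
master died.

```python
# Pick the "furthest ahead" failover candidate by comparing the WAL
# positions each standby reports (e.g. from pg_last_xlog_replay_location()
# on 9.x). This is all the information failover tooling has today.

def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN string like '16/B374D848' to an integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def furthest_ahead(candidates):
    """Return the name of the standby whose replayed LSN is greatest."""
    return max(candidates, key=lambda name: lsn_to_int(candidates[name]))

# Hypothetical replay positions collected from the surviving standbys.
candidates = {
    "synchreplica-1": "16/B374D848",
    "synchreplica-2": "16/B374D9F0",
}
print(furthest_ahead(candidates))  # prints "synchreplica-2"
# ...but neither value tells us whether either standby was in sync.
```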

It's an issue *now* that the only data we have about the state of sync
rep is on the master, and dies with the master.  And it severely limits
the actual utility of our synch rep.  People implement synch rep in the
first place because the "best effort" of asynch rep isn't good enough
for them, and yet when it comes to failover we're just telling them
"give it your best effort".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

