Zeugswetter Andreas SB SD wrote:
> 
> > > From our previous discussion of 2-phase commit, there was concern that
> > > the failure modes of 2-phase commit were not solvable.  However, I think
> > > multi-master replication is going to have similar non-solvable failure
> > > modes, yet people still want multi-master replication.
> > 
> > No.  The real problem with 2PC in my mind is that its failure modes
> > occur *after* you have promised commit to one or more parties.  In
> > multi-master, if you fail you know it before you have told the client
> > his data is committed.
> 
> Hmm? The application cannot take the first-phase commit as its commit
> confirmation. It needs to wait for the second-phase commit. The second phase
> is only finished when all coservers have reported back. 2PC is synchronous.
> 
> The problems with 2PC arise when, after the second-phase commit has been sent
> to all servers and before all have reported back, one of them becomes
> unreachable or goes down (did it receive and apply the second-phase commit
> or not?). Such a transaction must stay open until the coserver is reachable
> again or an administrator has committed or aborted it.
> 
> It is multi master replication that usually has an asynchronous mode for
> performance, and there the trouble starts.

Let me diagram this so we can see the issues.  Normal operation is:

        Master          Slave
        ------          -----
        commit ready-->
                        <--OK
        commit done--->
                        <--OK
        completed

One possible failure is:

        Master          Slave
        ------          -----
        commit ready-->
                        <--OK
        commit done--->
                        dies here
        stuck waiting

Another possible failure is:

        Master          Slave
        ------          -----
        commit ready-->
                        <--OK
        dies here
                        stuck waiting

Are these the issues?  Can't we just add GUC timeouts so that the master's
commit fails and the slave stops waiting?  I suppose a problem is:

        Master          Slave
        ------          -----
        commit ready-->
                        <--OK
        sleep
                        stuck waiting, times out
        commit done
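
To make that race concrete, here is a rough sketch in Python. It is only a
toy model with a logical clock, not PostgreSQL code: the Slave class, the
timeout field (standing in for a hypothetical GUC setting), and run() are
names invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Slave:
    prepared_at: float       # logical time at which the slave answered OK
    timeout: float           # hypothetical GUC-style timeout, in seconds
    state: str = "prepared"

    def tick(self, now: float) -> None:
        # Give up on its own once the timeout passes with no "commit done".
        if self.state == "prepared" and now - self.prepared_at >= self.timeout:
            self.state = "aborted"

    def on_commit_done(self, now: float) -> str:
        # The timeout is evaluated before the late message is handled.
        self.tick(now)
        if self.state == "prepared":
            self.state = "committed"
        return self.state


def run(master_delay: float, slave_timeout: float) -> Tuple[str, str]:
    """Master gets OK at t=0, sleeps master_delay, then sends "commit done"."""
    slave = Slave(prepared_at=0.0, timeout=slave_timeout)
    master_state = "committed"   # the master never learns of the timeout
    slave_state = slave.on_commit_done(now=master_delay)
    return master_state, slave_state
```

With run(1.0, 5.0) both sides end up committed.  With run(10.0, 5.0) the
master has committed but the slave has already aborted, which is the
inconsistency in the diagram above.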

Could we allow slaves to check whether the master's backend is still alive,
perhaps by asking the postmaster, similar to what we do with the cancel
signal?  That way, the slave would never time out, and would keep waiting
for as long as the master was alive.
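
A minimal sketch of that wait loop, again in Python and again only a model:
poll_message and is_master_alive are hypothetical callbacks (the second one
stands in for "ask the postmaster"), and max_polls exists only so the
example terminates.  What the slave should do once it learns the master is
gone is left open here, so the function just reports "in doubt".

```python
from typing import Callable, Optional


def wait_for_commit_done(
    poll_message: Callable[[], Optional[str]],
    is_master_alive: Callable[[], bool],
    max_polls: int = 1000,
) -> str:
    """Wait for "commit done" without a fixed timeout.

    Instead of giving up after N seconds, the slave checks after every
    empty poll whether the master's backend still exists.
    """
    for _ in range(max_polls):
        if poll_message() == "commit done":
            return "committed"
        if not is_master_alive():
            # Master is gone and never sent "commit done": the outcome
            # is unknown to the slave and must be resolved elsewhere.
            return "in doubt"
    # Master is alive but slow; a real slave would simply keep waiting.
    return "still waiting"
```

A slow but live master then never triggers a spurious abort on the slave;
only a master that has actually gone away ends the wait.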

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
