> The first problem I noticed is that the slave never seems to realize
> that the master has gone away.  Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.
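For what it's worth, the hung-walreceiver symptom described above can sometimes be narrowed at the TCP level: primary_conninfo is a libpq connection string, so the standby's connection can request aggressive keepalives.  A hedged sketch (host and values are illustrative, and this assumes libpq's keepalive parameters are available in the build in question):

```
# recovery.conf on the standby -- keepalive values are illustrative only
primary_conninfo = 'host=master.example.com port=5432 user=replication keepalives=1 keepalives_idle=30 keepalives_interval=10 keepalives_count=3'
```

With settings like these the standby's kernel would declare the connection dead after roughly a minute of silence, rather than waiting indefinitely.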
Yes, I've noticed this.  That was the reason for forcing walreceiver to
shut down on a restart, per prior discussion and patches.  This needs to
be on the open items list ... possibly it'll be fixed by Simon's
keepalive patch?  Or is it just a tcp_keepalive issue?

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.  So far I've only been able to reproduce this
> with fsync=off, but I believe it's possible anyway,

... and some users will turn fsync off.  This is, in fact, one of the
primary uses for streaming replication: durability via replicas.

> and this just makes it more likely.  After the most recent crash, the
> master thought pg_current_xlog_location() was 1/86CD4000; the slave
> thought pg_last_xlog_receive_location() was 1/8733C000.  After
> reconnecting to the master, the slave then thought that
> pg_last_xlog_receive_location() was 1/87000000.

So, *in this case*, detecting out-of-sequence xlogs (and PANICing)
would have actually prevented the slave from being corrupted.

My question, though, is: is detecting out-of-sequence xlogs *enough*?
Are there any crash conditions on the master which would cause the
master to reuse the same locations for different records, for example?
I don't think so, but I'd like to be certain.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
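For what it's worth, the out-of-sequence condition is easy to state mechanically: an xlog location of the form X/YYYYYYYY is just a 64-bit position split into two hex fields, so "slave ahead of master" is a plain numeric comparison.  A rough sketch using the positions quoted above (parse_lsn is a hypothetical helper, not PostgreSQL code):

```python
# Sketch: compare xlog locations of the form 'logid/offset' (both hex).
# parse_lsn is a made-up helper for illustration, not part of PostgreSQL.

def parse_lsn(lsn: str) -> int:
    """Convert 'X/YYYYYYYY' into a single comparable 64-bit integer."""
    logid, offset = lsn.split("/")
    return (int(logid, 16) << 32) | int(offset, 16)

# Positions reported after the crash described above:
master = parse_lsn("1/86CD4000")  # pg_current_xlog_location() on the master
slave = parse_lsn("1/8733C000")   # pg_last_xlog_receive_location() on the slave

# The slave being numerically ahead of the master is exactly the
# out-of-sequence condition a PANIC check would have to catch:
slave_is_ahead = slave > master
```

The comparison itself is cheap; the open question above is whether it is a sufficient test, i.e. whether a crashed master could ever legitimately reuse a location for a different record.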