Re: [HACKERS] streaming replication breaks horribly if master crashes

Tom Lane Wed, 16 Jun 2010 13:57:17 -0700

Robert Haas <robertmh...@gmail.com> writes:
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away.  Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.


TCP timeout is the answer there.
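
For illustration only (this is not from the original message): one way a receiving process can notice a vanished peer is to enable TCP keepalive on its connection, so the kernel probes an idle link and eventually reports an error. The sketch below is a generic socket example in Python, not the walreceiver's code; the helper name `enable_keepalive` and the timing values are assumptions, and the per-socket tuning options are only set on platforms that expose them.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive so a silently dead peer is eventually detected.

    idle/interval/count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (available on Linux);
    # apply each one only if this platform defines the constant.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With values like these, an idle connection to a crashed host would be declared dead after roughly idle + interval * count seconds instead of waiting on the system-wide defaults, which are typically much longer.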

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.

So indeed we'd better change walsender to not get ahead of the fsync'd
position.  And probably also warn people to not disable fsync on the
master, unless they're willing to write it off and fail over at any
system crash.
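
For illustration only (this is not from the original message): the rule "do not stream past the fsync'd position" amounts to capping the send pointer at the flushed position rather than the written one. The sketch below is a toy model in Python, not walsender code; the function name `sendable_upto` and the plain integer positions are hypothetical stand-ins for WAL locations.

```python
def sendable_upto(sent_pos, written_pos, flushed_pos):
    """Return the position up to which WAL may safely be streamed.

    sent_pos    -- how far the sender has already streamed
    written_pos -- how far WAL has been written (possibly only to OS cache)
    flushed_pos -- how far WAL has been fsync'd to durable storage
    """
    # Never stream data that could vanish if the sending server crashes now.
    limit = min(written_pos, flushed_pos)
    # Never move the send pointer backwards.
    return max(sent_pos, limit)
```

With `sent_pos=100`, `written_pos=500`, and `flushed_pos=300`, the sender may advance only to 300; the remaining 200 bytes wait until they have been flushed, so after a crash the standby cannot be ahead of what the master can recover.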

> I don't know what to do about this, but I'm pretty sure we can't ship it 
> as-is.

Doesn't seem tremendously insoluble from here ...

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
