On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
> On 05/03/2014 01:07 AM, Andres Freund wrote:
> > On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
> >> Just got a report of a replication issue with 9.2.8 from a community
> >> member:
> >>
> >> Here's the sequence:
> >>
> >> 1) A --> B (sync rep)
> >>
> >> 2) Shut down B
> >>
> >> 3) Shut down A
> >>
> >> 4) Start up B as a master
> >>
> >> 5) Start up A as sync replica of B
> >>
> >> 6) A successfully joins B as a sync replica, even though its transaction
> >> log is 1016 bytes *ahead* of B.
> >>
> >> 7) Transactions written to B all hang
> >>
> >> 8) Xlog on A is now corrupt, although the database itself is OK
> >
> > This is fundamentally borked practice.
> >
> >> Now, the above sequence happened because of the user misunderstanding
> >> what sync rep really means. However, A should not have been able to
> >> connect with B in replication mode, especially in sync rep mode; that
> >> should have failed. Any thoughts on why it didn't?
> >
> > I'd guess that B, while starting up, has written further WAL records
> > bringing it further ahead of A.
>
> Apparently not; from what I've seen pg_stat_replication even *shows*
> that the replica is ahead of the master. Further, Postgres should have
> recognized that there was a timeline branch point before A's last
> record, no?

There wasn't any timeline increase because, as far as I understand the
above, there wasn't any promotion. The cluster was shut down, and
recovery.conf was simply removed on B and created on A.

To me this is an operator error. We could try to defend against it more
vigorously, but that's hard to do without breaking actual use cases.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
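
To illustrate the kind of defensive check mentioned above, here is a minimal Python sketch that compares two WAL locations in the textual `hi/lo` hexadecimal form before a node is attached as a standby. The function names and the example locations are hypothetical, chosen only so that the gap comes out to the 1016 bytes from the report; this is not how PostgreSQL itself decides anything. It assumes a location can be treated as a plain 64-bit byte position. I believe releases before 9.3 left part of each logical log file unused, so exact byte counts spanning such a boundary may differ there, but the ordering comparison is unaffected.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL location such as '0/30003F8' into a 64-bit position.

    Assumption: the two hex halves are simply the high and low 32 bits.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_lead_bytes(master_lsn: str, standby_lsn: str) -> int:
    """Return how far the standby's WAL extends past the master's.

    A positive result means the would-be standby is *ahead* of the master.
    """
    return parse_lsn(standby_lsn) - parse_lsn(master_lsn)


def safe_to_attach(master_lsn: str, standby_lsn: str) -> bool:
    """Hypothetical pre-flight check: refuse a standby that is ahead.

    This only catches the obvious case. Being at or behind the master's
    position does not prove that both nodes share the same WAL history.
    """
    return standby_lead_bytes(master_lsn, standby_lsn) <= 0


if __name__ == "__main__":
    # Hypothetical locations: B (new master) and A (old master, now standby).
    b_location = "0/3000000"
    a_location = "0/30003F8"
    print(standby_lead_bytes(b_location, a_location))
    print(safe_to_attach(b_location, a_location))
```

Without a promotion both nodes stay on the same timeline, so a position comparison like this is about the only cheap signal available, and it stops working as soon as the new master writes enough WAL of its own to overtake the old one.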