On Nov 4, 2013, at 11:06, Heikki Linnakangas wrote:

> On 01.11.2013 11:42, Mika Eloranta wrote:
>> pg_receivexlog calculated the xlog segment number incorrectly
>> when started after the previous instance was interrupted.
>>
>> Resuming streaming only worked when the physical wal segment
>> counter was zero, i.e. for the first 256 segments or so.
>
> Oops. Fixed, thanks for the report!
>
> It's a bit scary that this bug went unnoticed for this long; it was
> introduced quite early in the 9.3 development cycle. Seems that I did all the
> testing of streaming timeline changes with pg_receivexlog later in 9.3 cycle
> with segment numbers < 256, and no-one else have done long-running tests with
> pg_receivexlog either.
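To make the "first 256 segments" arithmetic concrete, here is a minimal Python sketch of how a 9.3-style WAL file name maps to a 64-bit segment number. It is not the actual pg_receivexlog code or the fix, and it assumes the default 16 MB segment size. The middle eight hex digits of the file name hold segno / 256 and the last eight hold segno % 256, so 256 segments (256 * 16 MB = 4 GiB of WAL) share each middle value. segno_buggy is a hypothetical reconstruction of the kind of mistake described above: it treats the middle field as the high 32 bits of the segment number, which gives the right answer only while that field is zero. All function names here are illustrative.

```python
# Sketch only: WAL file name <-> segment number arithmetic, assuming the
# default 16 MB segment size. Function names are hypothetical.
WAL_SEG_SIZE = 16 * 1024 * 1024                     # bytes per segment (assumed default)
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE   # 4 GiB / 16 MB = 256


def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segno_correct(name):
    # The name encodes segno // 256 and segno % 256 (with 16 MB segments).
    _tli, log, seg = parse_wal_name(name)
    return log * SEGMENTS_PER_XLOGID + seg


def segno_buggy(name):
    # Hypothetical mistake: treating "log" as the high 32 bits of segno.
    # Correct only while log == 0, i.e. for segments 0..255.
    _tli, log, seg = parse_wal_name(name)
    return (log << 32) | seg


def wal_name(tli, segno):
    """Build the file name for segment number segno on timeline tli."""
    return "%08X%08X%08X" % (tli, segno // SEGMENTS_PER_XLOGID,
                             segno % SEGMENTS_PER_XLOGID)


def lsn_to_segno(lsn):
    """Convert an 'XXXXXXXX/YYYYYYYY' LSN to the segment containing it."""
    hi, lo = lsn.split("/")
    return ((int(hi, 16) << 32) | int(lo, 16)) // WAL_SEG_SIZE
```

For example, segment 255 ("0000000100000000000000FF") comes out as 255 either way, but segment 256 ("000000010000000100000000") is read as 4294967296 by the buggy version. Likewise, wal_name(1, lsn_to_segno("2D/15000000")) returns "000000010000002D00000015", the file holding LSN 2D/15000000 on timeline 1.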
Thanks for the fix, Heikki!

It sounds like PostgreSQL 9.3.x and/or pg_receivexlog are not yet used in many places; otherwise this would probably have been found earlier.

Affected versions:

$ git tag --contains dfda6eba
REL9_3_0
REL9_3_1
REL9_3_BETA1
REL9_3_BETA2
REL9_3_RC1

What makes this a really sneaky and severe problem is the way it stays dormant for a period of time after a fresh db init or pg_upgrade. Here's how I bumped into it:

1. An old PostgreSQL 9.2 db is running, with pg_receivexlog streaming extra backups to a remote box.
2. pg_upgrade to 9.3.1.
3. pg_receivexlog from the upgraded DB still works OK and handles restarts fine, because the xlog indexes were reset back to zero by pg_upgrade.
4. The xlog history eventually grows beyond 256 * 16MB.
5. pg_receivexlog gets interrupted for whatever reason (it is stopped, killed, or crashes, or the host is restarted).
6. A new pg_receivexlog instance fails to resume streaming, and there is no easy workaround that would maintain an uninterrupted, gapless xlog history.

Initially, before I had analysed the problem any further, I had to stash the xlogs, restart pg_receivexlog and then trigger new pg_basebackups.

Regardless of this bug, I find that pg_receivexlog (and pg_basebackup) are excellent tools and people should use them more!

PS. Something like "pg_receivexlog --start-pos=2D/15000000" might be nice for overriding the streaming start position.

--
Mika Eloranta
Ohmu Ltd. http://www.ohmu.fi/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)