On Nov 4, 2013, at 11:06, Heikki Linnakangas wrote:
> On 01.11.2013 11:42, Mika Eloranta wrote:
>> pg_receivexlog calculated the xlog segment number incorrectly
>> when started after the previous instance was interrupted.
>> 
>> Resuming streaming only worked when the physical wal segment
>> counter was zero, i.e. for the first 256 segments or so.
> 
> Oops. Fixed, thanks for the report!
> 
> It's a bit scary that this bug went unnoticed for this long; it was 
> introduced quite early in the 9.3 development cycle. Seems that I did all the 
> testing of streaming timeline changes with pg_receivexlog later in the 9.3 cycle 
> with segment numbers < 256, and no-one else has done long-running tests with 
> pg_receivexlog either.

Thanks for the fix, Heikki!

It sounds like PostgreSQL 9.3.x and/or pg_receivexlog are not yet used in 
a lot of places. Otherwise this would probably have been found earlier.

Affected versions:

$ git tag --contains dfda6eba
REL9_3_0
REL9_3_1
REL9_3_BETA1
REL9_3_BETA2
REL9_3_RC1

What makes this a really sneaky and severe problem is the way it stays dormant 
for a period of time after a fresh db init or pg_upgrade. Here's how I bumped 
into it:

1. Old PostgreSQL 9.2 DB running, with pg_receivexlog streaming extra backups 
to a remote box.
2. pg_upgrade to 9.3.1.
3. pg_receivexlog from the upgraded DB still works ok and handles restarts 
fine, because the xlog indexes were reset back to zero at pg_upgrade.
4. xlog history eventually grows over 256 * 16MB (4GB).
5. pg_receivexlog gets interrupted for whatever reason (gets stopped, killed, 
crashes, host is restarted).
6. A new pg_receivexlog instance fails to resume streaming and there is no easy 
workaround that would maintain an uninterrupted, gapless xlog history.
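
The arithmetic behind steps 3 and 4 can be sketched roughly as below. This is 
only an illustration, assuming the default 16MB segment size and the usual 
24-hex-digit xlog file name layout (timeline, log, seg). All function names 
are mine, and the "broken" formula is a made-up example of a calculation that 
is only right while the log part is zero, not the actual pg_receivexlog code.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # assumed default: 16MB
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE    # 256 with 16MB segments


def parse_wal_file_name(name):
    """Split a 24-hex-digit xlog file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % (name,))
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_number(name):
    """Linear segment number: the log part counts blocks of 256 segments."""
    _tli, log, seg = parse_wal_file_name(name)
    return log * SEGMENTS_PER_XLOGID + seg


def segment_start_position(name):
    """Start position of the segment, printed as hi/lo hex like 2D/15000000."""
    pos = segment_number(name) * WAL_SEGMENT_SIZE
    return "%X/%X" % (pos >> 32, pos & 0xFFFFFFFF)


def hypothetical_broken_segment_number(name):
    """Made-up faulty formula (NOT the real bug): fine only while log == 0."""
    _tli, log, seg = parse_wal_file_name(name)
    return (log << 32) | seg


if __name__ == "__main__":
    for name in ("0000000100000000000000FF", "000000010000000100000000"):
        print(name, segment_number(name),
              hypothetical_broken_segment_number(name))
```

Both formulas agree for every file whose log part is zero, which is why a 
mistake of this kind can stay hidden for the first 256 segments after an 
initdb or pg_upgrade and only show up later.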

Initially, before I had analysed the problem any further, I had to stash the 
xlogs, restart pg_receivexlog and after that trigger new pg_basebackups.

Regardless of this bug, I find that pg_receivexlog (and pg_basebackup) are 
excellent tools and people should use them more!

PS. Something like "pg_receivexlog --start-pos=2D/15000000" might be nice for 
overriding the streaming start position.
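
For what it's worth, mapping such a position to the segment file that contains 
it is simple arithmetic. A rough sketch, again assuming 16MB segments; the 
helper name and the default timeline of 1 are my own assumptions, and 
--start-pos itself is only a suggestion, not an existing option.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # assumed default: 16MB
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE    # 256 with 16MB segments


def wal_file_for_position(position, timeline=1):
    """Return the xlog file name containing a position such as '2D/15000000'.

    The timeline is not part of the position, so it has to be supplied;
    1 is used here purely as an illustrative default.
    """
    hi_text, lo_text = position.split("/")
    pos = (int(hi_text, 16) << 32) | int(lo_text, 16)
    segno = pos // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,   # log part of the file name
        segno % SEGMENTS_PER_XLOGID,    # seg part of the file name
    )


if __name__ == "__main__":
    print(wal_file_for_position("2D/15000000"))
```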

-- 
Mika Eloranta
Ohmu Ltd.  http://www.ohmu.fi/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
