On Wed, Feb 20, 2013 at 5:02 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> On 20.02.2013 17:53, Selena Deckelmann wrote:
>>
>> On Wed, Feb 20, 2013 at 6:23 AM, Magnus
>> Hagander<mag...@hagander.net>wrote:
>>
>>> Selena, was this reasonably reproducible for you? Would it be possible to
>>> get a network trace of it to show if that's the kind of package coming
>>> across, or by hacking up pg_basebackup to print the exact position it was
>>> at when the problem occurred?
>>
>>
>> This is happening with a very busy 700 GB system, so I'm going to rule out
>> a network trace for the moment. The error is occurring "sometime" in
>> the middle of the backup. Last time it was at least 30-40 minutes into a 2
>> hr backup.
>
>
> If you could pinpoint the WAL position where the error happens, that would
> already help somewhat. For starters, put pg_receivexlog to verbose mode, so
> that it will print a line after each WAL segment. If my theory is correct,
> the error should happen at xlogid boundaries, i.e. just after finishing a WAL
> segment whose filename ends with "FE".
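For anyone who wants to check segment names against the "ends with FE" rule
above, here is a minimal sketch. It is not part of pg_receivexlog or
pg_basebackup. It assumes the pre-9.3 naming scheme (24 hex digits: timeline,
xlogid, segment number) and the default 16 MB segment size, under which the
last segment of each xlogid is numbered 0xFE. The function names are
hypothetical.

```python
import re

# Assumption: pre-9.3 WAL segment names are 24 hex digits, split as
# 8 digits of timeline, 8 digits of xlogid, 8 digits of segment number.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")

# Assumption: default 16 MB segments, so the last segment used in each
# xlogid is 0xFE (segment 0xFF is skipped before 9.3).
LAST_SEGMENT_IN_XLOGID = 0xFE


def parse_wal_name(name):
    """Split a WAL segment file name into (timeline, xlogid, segment)."""
    match = WAL_NAME.match(name.upper())
    if match is None:
        raise ValueError("not a WAL segment file name: %r" % (name,))
    timeline, xlogid, segment = (int(part, 16) for part in match.groups())
    return timeline, xlogid, segment


def ends_xlogid(name):
    """Return True if this segment is the last one before an xlogid boundary."""
    _, _, segment = parse_wal_name(name)
    return segment == LAST_SEGMENT_IN_XLOGID
```

Feeding it the segment names printed in verbose mode would show whether the
error lines up with a segment for which ends_xlogid() returns True. Names with
a suffix, such as a segment still being written, are rejected by this sketch
rather than handled.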
Your theory is correct: it happens at xlogid boundaries. The missing
information is that, AFAICT, it can only happen if pg_basebackup is run
against a slave, and never against the master.

I've applied a patch that just accepts this case and ignores it.
Originally I had pg_basebackup write a warning in that case, but on
second thought I think that's just wrong - it would send out warning
messages in cases that are absolutely normal.

I'm not going to bother with a backend-side patch, since this is mostly
harmless (it sends a single packet of an extra 25 bytes in what's usually
a large backup, so it doesn't matter), and it's all gone in 9.3 anyway.
And in 9.1 and earlier, support for running pg_basebackup against a
slave isn't there.

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/