Re: [HACKERS] Funny WAL corruption issue

Chris Travers Fri, 11 Aug 2017 05:54:32 -0700

On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <[email protected]> wrote:

> On 10 August 2017 at 15:26, Chris Travers <[email protected]> wrote:
> >
> >
> > The bitwise comparison is interesting.  Remember the error was:
> >
> > pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 000000000001E39C000000E1, offset
> > 1146880
> ...
> > Since this didn't throw a checksum error (we have data checksums
> > disabled but wal records ISTR have a separate CRC check), would this
> > perhaps indicate that the checksum operated over incorrect data?
>
> No checksum error and this "unexpected pageaddr" doesn't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a wal file and the old wal stream had a record boundary
> on the same byte position. So the previous record checksum passed and
> the following record checksum passes but the record header is for a
> different wal stream position.
>


I expect to test this theory shortly.
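
As a quick sanity check on the addresses in the error message, here is a
minimal sketch that maps an LSN to its segment file name and byte offset.
It assumes the default 16MB segment size and 8kB WAL page size, and it uses
timeline 0 only because that is what the file name in the error shows. The
helper names (parse_lsn, locate, next_page_offset) are made up for
illustration and are not part of any PostgreSQL tool.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024   # assumed default segment size (16MB)
XLOG_BLCKSZ = 8192                # assumed default WAL page size (8kB)


def parse_lsn(text):
    """Turn 'hi/lo' hex notation (e.g. '1E39C/E1117FB8') into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def locate(lsn, timeline=0):
    """Return (segment file name, byte offset within that segment) for an LSN."""
    segs_per_id = 0x100000000 // WAL_SEG_SIZE
    segno = lsn // WAL_SEG_SIZE
    name = "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)
    return name, lsn % WAL_SEG_SIZE


def next_page_offset(offset):
    """Offset of the first page boundary strictly after the given offset."""
    return (offset // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ


if __name__ == "__main__":
    record = locate(parse_lsn("1E39C/E1117FB8"))
    stale = locate(parse_lsn("1E375/61118000"))
    print("record:    ", record, "next page at", next_page_offset(record[1]))
    print("stale page:", stale)
```

Under those assumptions the record falls in segment
000000000001E39C000000E1 and the page following it starts at offset 1146880,
matching the error. The unexpected pageaddr 1E375/61118000 sits at the same
offset, 1146880, of a different segment (000000000001E37500000061), which is
what one would expect to see if the file were a recycled segment whose tail
had not been overwritten yet.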

Assuming it is correct, what can we do to prevent restarts of slaves from
running into it?


> I think you could actually hack xlogdump to ignore this condition and
> keep outputting and you'll see whether the records that follow appear
> to be old wal log data.  I haven't actually tried this though.
>
> --
> greg
>
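
A possible alternative to patching pg_xlogdump, as a rough sketch only: walk
the page headers of the segment file and report every page whose stored
address does not match its position. This assumes 8kB WAL pages, a
little-endian build, and that the page header begins with xlp_magic (uint16),
xlp_info (uint16), xlp_tli (uint32) and xlp_pageaddr (uint64); that layout
should be checked against xlog_internal.h for the version in use. The function
names are hypothetical.

```python
import struct

XLOG_BLCKSZ = 8192  # assumed default WAL page size

# Assumed start of the WAL page header on a little-endian build:
# uint16 xlp_magic, uint16 xlp_info, uint32 xlp_tli, uint64 xlp_pageaddr
PAGE_HEADER = struct.Struct("<HHIQ")


def format_lsn(lsn):
    """Render a 64-bit LSN in the usual 'hi/lo' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def find_stale_pages(segment_bytes, segment_start_lsn):
    """Return (offset, stored pageaddr) for each page whose header address
    differs from the address implied by its position in the segment."""
    stale = []
    last_start = len(segment_bytes) - PAGE_HEADER.size
    for offset in range(0, last_start + 1, XLOG_BLCKSZ):
        _magic, _info, _tli, pageaddr = PAGE_HEADER.unpack_from(segment_bytes, offset)
        if pageaddr != segment_start_lsn + offset:
            stale.append((offset, pageaddr))
    return stale
```

To try it, read the whole segment with open(path, "rb").read() and pass the
LSN of the segment's first byte (1E39C/E1000000 for the file in the error
message). If the recycling theory holds, every page from offset 1146880
onward should show an address from the older stream. Pages that were never
written (all zeroes) would also be reported, with a stored address of 0.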



-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more
