On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote:
> On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote:
> > The bitwise comparison is interesting. Remember the error was:
> > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 000000000001E39C000000E1, offset
> > 1146880
> > Since this didn't throw a checksum error (we have data checksums
> > disabled but WAL records ISTR have a separate CRC check), would this
> > perhaps indicate that the checksum operated over incorrect data?
> No checksum error and this "unexpected pageaddr" don't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a WAL file, and the old WAL stream had a record boundary
> at the same byte position. So the previous record's checksum passes and
> the following record's checksum passes, but the record header is for a
> different WAL stream position.
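The arithmetic behind that theory can be checked against the error itself. This is a minimal sketch, assuming the default 16 MB segment size and the 9.x segment file naming (8 hex digits each for timeline, high 32 bits of the WAL position, and segment number within that 4 GB range). The names `parse_lsn`, `format_lsn`, `expected_pageaddr` and `diagnose` are hypothetical helpers, not PostgreSQL code. It computes the address the page at offset 1146880 of segment 000000000001E39C000000E1 ought to carry and compares it with the pageaddr that was actually found:

```python
# Hypothetical helpers, assuming the default 16 MB WAL segment size and the
# 9.x segment file naming TTTTTTTTXXXXXXXXYYYYYYYY (timeline, high 32 bits of
# the WAL position, segment number within that 4 GB range).
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 with 16 MB segments


def parse_lsn(text: str) -> int:
    """Turn a 'HI/LO' hex pair, as printed in WAL error messages, into an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(pos: int) -> str:
    """Inverse of parse_lsn, using the same %X/%X style."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def expected_pageaddr(segment_name: str, offset: int) -> int:
    """WAL position the page at `offset` inside `segment_name` should record."""
    log_id = int(segment_name[8:16], 16)
    seg_id = int(segment_name[16:24], 16)
    segno = log_id * SEGS_PER_XLOGID + seg_id
    return segno * WAL_SEG_SIZE + offset


def diagnose(segment_name: str, offset: int, found: str) -> dict:
    expected = expected_pageaddr(segment_name, offset)
    actual = parse_lsn(found)
    return {
        "expected": format_lsn(expected),
        "found": format_lsn(actual),
        # Same byte position inside a segment suggests a page left over from
        # an older, recycled segment rather than random damage.
        "same_offset_in_segment": actual % WAL_SEG_SIZE == offset,
        "segments_behind": (expected - actual) // WAL_SEG_SIZE,
    }


# Values taken from the pg_xlogdump error quoted above.
report = diagnose("000000000001E39C000000E1", 1146880, "1E375/61118000")
print(report)
```

With those numbers the expected address is 1E39C/E1118000 against 1E375/61118000 found: the same offset (0x118000) inside the segment, 10112 segments (158 GiB at 16 MB each) earlier in the WAL stream, which is what a recycled segment would look like. The failing record at 1E39C/E1117FB8 starts 72 bytes before that page boundary, consistent with it continuing onto the stale page.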
I expect to test this theory shortly.
Assuming it is correct, what can we do to prevent restarts of slaves from
running into it?
> I think you could actually hack xlogdump to ignore this condition and
> keep outputting, and you'll see whether the records that follow appear
> to be old WAL data. I haven't actually tried this though.
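Short of patching pg_xlogdump, a rough way to see where old data begins is to walk the page headers of a copy of the segment file and flag every page whose stored address does not match its position. This is a sketch under stated assumptions: 8 kB WAL pages, 16 MB segments, a little-endian host, and the 9.3+ page header layout whose first fields are xlp_magic, xlp_info, xlp_tli and xlp_pageaddr. `stale_pages`, `segment_start` and `lsn` are hypothetical names. It only looks at page headers, not individual records, so it complements the xlogdump hack rather than replacing it:

```python
import os
import struct
import sys

# Assumptions (not from the thread): 8 kB WAL pages (XLOG_BLCKSZ), 16 MB
# segments, a little-endian host, and the 9.3+ XLogPageHeaderData layout:
# uint16 xlp_magic, uint16 xlp_info, uint32 xlp_tli, uint64 xlp_pageaddr, ...
WAL_PAGE_SIZE = 8192
WAL_SEG_SIZE = 16 * 1024 * 1024
HEADER = struct.Struct("<HHIQ")  # first 16 bytes of every WAL page header


def segment_start(segment_name: str) -> int:
    """WAL position of byte 0 of a segment named TTTTTTTTXXXXXXXXYYYYYYYY."""
    log_id = int(segment_name[8:16], 16)
    seg_id = int(segment_name[16:24], 16)
    return (log_id << 32) + seg_id * WAL_SEG_SIZE


def lsn(pos: int) -> str:
    """Format a 64-bit WAL position the way PostgreSQL prints it (%X/%X)."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def stale_pages(segment_name: str, data: bytes):
    """Hypothetical helper: yield (offset, expected, found) for each page whose
    xlp_pageaddr does not match its position in this segment. Leftovers of a
    recycled segment show up here, and so do all-zero pages (pageaddr 0)."""
    base = segment_start(segment_name)
    for offset in range(0, len(data) - WAL_PAGE_SIZE + 1, WAL_PAGE_SIZE):
        _magic, _info, _tli, pageaddr = HEADER.unpack_from(data, offset)
        expected = base + offset
        if pageaddr != expected:
            yield offset, expected, pageaddr


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # a *copy* of the segment file, e.g. 000000000001E39C000000E1
    with open(path, "rb") as f:
        data = f.read()
    for offset, expected, found in stale_pages(os.path.basename(path), data):
        print(f"offset {offset}: expected {lsn(expected)}, found {lsn(found)}")
```

Pass it the path to a copy of the segment (the file name must be the 24-character segment name). If the recycling theory holds, the first flagged page should be at offset 1146880, with addresses from the older 1E375/61 segment from there on.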
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor