On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote:

> On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote:
> >
> >
> > The bitwise comparison is interesting. Remember the error was:
> >
> > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 000000000001E39C000000E1, offset
> > 1146880
> ...
> > Since this didn't throw a checksum error (we have data checksums
> > disabled but wal records ISTR have a separate CRC check), would this
> > perhaps indicate that the checksum operated over incorrect data?
>
> No checksum error and this "unexpected pageaddr" doesn't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a wal file and the old wal stream had a record boundary
> on the same byte position. So the previous record checksum passed and
> the following record checksum passes but the record header is for a
> different wal stream position.
>
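
The recycled-segment explanation can be checked against the numbers in the error message itself. The following is a minimal Python sketch, not anything taken from pg_xlogdump: the helper names are made up for illustration, it assumes the default 16 MB WAL segment size, and it reuses the timeline field (all zeros) exactly as it appears in the file name from the error. It computes the page address expected at offset 1146880 of the segment, compares it with the address found in the page header, and derives the name of the segment the found address would belong to.

```python
# Assumption: default 16 MB WAL segments (PostgreSQL 9.x file naming).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Turn 'HI/LO' (both hex) into a 64-bit integer WAL position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Render a 64-bit WAL position in the usual 'HI/LO' hex form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def segment_start(filename):
    """WAL position of the first byte of a 24-hex-digit segment file name."""
    log_id = int(filename[8:16], 16)        # high 32 bits of the position
    seg_in_log = int(filename[16:24], 16)   # segment number within that log id
    return (log_id << 32) | (seg_in_log * WAL_SEGMENT_SIZE)


def segment_name(lsn, timeline=0):
    """Segment file name that would contain the given WAL position."""
    return "%08X%08X%08X" % (
        timeline, lsn >> 32, (lsn & 0xFFFFFFFF) // WAL_SEGMENT_SIZE)


# Values copied from the pg_xlogdump error message.
expected = segment_start("000000000001E39C000000E1") + 1146880
found = parse_lsn("1E375/61118000")

print(format_lsn(expected))                                     # 1E39C/E1118000
print(found % WAL_SEGMENT_SIZE == expected % WAL_SEGMENT_SIZE)  # True
print(segment_name(found))                                      # 000000000001E37500000061
```

The found address sits at the same offset (0x118000) within its segment as the expected one, which is what a page left over from an older segment, 000000000001E37500000061 under these assumptions, would look like after the file was recycled.
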
I expect to test this theory shortly.

Assuming it is correct, what can we do to prevent restarts of slaves from
running into it?

> I think you could actually hack xlogdump to ignore this condition and
> keep outputting and you'll see whether the records that follow appear
> to be old wal log data. I haven't actually tried this though.
>
> --
> greg
>

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
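
A lighter-weight alternative to patching xlogdump is to read only the page headers of the segment and list every page whose recorded address is not the one expected at that offset. The sketch below is an untested-against-real-data illustration, not pg_xlogdump code. It assumes the default 8 kB WAL page size, a little-endian machine, and a page header that begins with xlp_magic (uint16), xlp_info (uint16), xlp_tli (uint32) and xlp_pageaddr (uint64), as in the 9.3 and later sources. The name `stale_pages` is made up for illustration.

```python
import struct

XLOG_BLCKSZ = 8192  # assumption: default WAL page size
# Assumption: little-endian layout of the start of XLogPageHeaderData:
# xlp_magic, xlp_info, xlp_tli, xlp_pageaddr.
PAGE_HEADER = struct.Struct("<HHIQ")


def stale_pages(segment_bytes, segment_start_lsn):
    """Yield (offset, found_pageaddr) for each page whose header address
    differs from the address expected at that offset in the segment."""
    last = len(segment_bytes) - PAGE_HEADER.size
    for offset in range(0, last + 1, XLOG_BLCKSZ):
        _magic, _info, _tli, pageaddr = PAGE_HEADER.unpack_from(
            segment_bytes, offset)
        if pageaddr != segment_start_lsn + offset:
            yield offset, pageaddr
```

To use it, read the segment file in binary mode and pass its bytes together with the segment's starting position, which for 000000000001E39C000000E1 would be (0x1E39C << 32) | 0xE1000000 with 16 MB segments. If the recycling theory holds, the first reported offset should be 1146880 and the reported addresses should all belong to the older WAL stream. Pages that were never written would also be reported, with an address of 0.
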