On Saturday, May 17, 2014, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 05/17/2014 12:28 AM, Jeff Janes wrote:
>
>> More fun with my torn page injection test program on 9.4.
>>
>> 24171 2014-05-16 14:00:44.934 PDT:WARNING: 01000: page verification failed, calculated checksum 21100 but expected 3356
>> 24171 2014-05-16 14:00:44.934 PDT:CONTEXT: xlog redo split_l: rel 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright 192
>> 24171 2014-05-16 14:00:44.934 PDT:LOCATION: PageIsVerified, bufpage.c:145
>> 24171 2014-05-16 14:00:44.934 PDT:FATAL: XX001: invalid page in block 34666 of relation base/16384/16405
>> 24171 2014-05-16 14:00:44.934 PDT:CONTEXT: xlog redo split_l: rel 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright 192
>> 24171 2014-05-16 14:00:44.934 PDT:LOCATION: ReadBuffer_common, bufmgr.c:483
>>
>> I've seen this twice now; both times the checksum failure was for the
>> block labelled "next" in the redo record. Is this another case where the
>> block needs to be reinitialized upon replay?
>
> Hmm, it looks like I fumbled the numbering of the backup blocks in the
> b-tree split WAL record (in 9.4). I blame the comments; the comments where
> the record is generated number the backup blocks starting from 1, but
> XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
> starting from 0.
>
> Attached is a patch that I think fixes them. In addition to the
> rnext-reference, clearing the incomplete-split flag in the child page had
> a similar numbering mishap.

That seems to have fixed it.

Thanks,

Jeff