On Tue, Apr 09, 2024 at 10:25:36AM +0000, Hayato Kuroda (Fujitsu) wrote: >> On Fri, Apr 05, 2024 at 06:22:58AM +0000, Hayato Kuroda (Fujitsu) wrote: >> I'm slightly annoyed by the fact that there is no check on >> is_prim_bucket_same_wrt for buffer 1 in the BLK_NEEDS_REDO case to >> show the symmetry between the insert and replay paths. Shouldn't >> there be at least an assert for that in the branch when there are no >> tuples (imagine an else to cover xldata->ntups == 0)? I mean between >> just before updating the hash page's opaque area when >> is_prev_bucket_same_wrt. > > Indeed, added an Assert() in else part. Was it same as your expectation?
Yep, close enough. Thanks to the insert path we know that there will be no tuples if (is_prim_bucket_same_wrt || is_prev_bucket_same_wrt), and the replay path where the assertion is added. >> I've been thinking about ways to make such cases detectable in an >> automated fashion. The best choice would be >> verifyBackupPageConsistency(), just after RestoreBlockImage() on the >> copy of the block image stored in WAL before applying the page masking >> which would mask the LSN. A problem, unfortunately, is that we would >> need to transfer the knowledge of REGBUF_NO_CHANGE as a new BKPIMAGE_* >> flag so we would be able to track if the block rebuilt from the record >> has the *same* LSN as the copy used for the consistency check. So >> this edge consistency case would come at a cost, I am afraid, and the >> 8 bits of bimg_info are precious :/ > > I could not decide that the change has more benefit than its cost, so I did > nothing for it. It does not prevent doing an implementation that can be used for some test validation in the context of this thread. Using this idea, I have done the attached 0002. This is not something to merge into the tree, of course, just a toy to: - Validate the fix for the problem we know. - More braodly, check if there are other areas covered by the main regression test suite if there are other problems. And from what I can see, applying 0002 without 0001 causes the following test to fail as the standby chokes on a inconsistent LSN with a FATAL because the LSN of the apply page coming from the primary and the LSN saved in the page replayed don't match. Here is a command to reproduce the problem: cd src/test/recovery/ && \ PG_TEST_EXTRA=wal_consistency_checking \ PROVE_TESTS=t/027_stream_regress.pl make check And then in the logs you'd see stuff like: 2024-04-10 16:52:21.844 JST [2437] FATAL: inconsistent page LSN replayed 0/A7E5CD18 primary 0/A7E56348 2024-04-10 16:52:21.844 JST [2437] CONTEXT: WAL redo at 0/A7E59F68 for Hash/SQUEEZE_PAGE: prevblkno 28, nextblkno 4294967295, ntups 0, is_primary T; blkref #1: rel 1663/16384/25434, blk 7 FPW; blkref #2: rel 1663/16384/25434, blk 29 FPW; blkref #3: rel 1663/16384/25434, blk 28 FPW; blkref #5: rel 1663/16384/25434, blk 9 FPW; blkref #6: rel 1663/16384/25434, blk 0 FPW I don't see other areas with a similar problem, at the extent of the core regression tests, that is to say. That's my way to say that your patch looks good to me and that I'm planning to apply it to fix the issue. This shows me a more interesting issue unrelated to this thread: 027_stream_regress.pl would be stuck if the standby finds an inconsistent page under wal_consistency_checking. This needs to be fixed before I'm able to create a buildfarm animal with this setup. I'll spawn a new thread about that tomorrow. -- Michael
From 4c6597f2ce57439696aecdbda08b374857ab715d Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Wed, 10 Apr 2024 16:47:29 +0900 Subject: [PATCH v3 1/2] Fix inconsistency with replay of hash squeeze record for clean buffers Blah. --- src/backend/access/hash/hash_xlog.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c index cb1a63cfee..d7ba74579b 100644 --- a/src/backend/access/hash/hash_xlog.c +++ b/src/backend/access/hash/hash_xlog.c @@ -666,6 +666,7 @@ hash_xlog_squeeze_page(XLogReaderState *record) char *data; Size datalen; uint16 ninserted = 0; + bool mod_wbuf = false; data = begin = XLogRecGetBlockData(record, 1, &datalen); @@ -695,6 +696,17 @@ hash_xlog_squeeze_page(XLogReaderState *record) ninserted++; } + + mod_wbuf = true; + } + else + { + /* + * See _hash_freeovflpage() which has a similar assertion when + * there are no tuples. + */ + Assert(xldata->is_prim_bucket_same_wrt || + xldata->is_prev_bucket_same_wrt); } /* @@ -711,10 +723,15 @@ hash_xlog_squeeze_page(XLogReaderState *record) HashPageOpaque writeopaque = HashPageGetOpaque(writepage); writeopaque->hasho_nextblkno = xldata->nextblkno; + mod_wbuf = true; } - PageSetLSN(writepage, lsn); - MarkBufferDirty(writebuf); + /* Set LSN and mark writebuf dirty iff it is modified */ + if (mod_wbuf) + { + PageSetLSN(writepage, lsn); + MarkBufferDirty(writebuf); + } } /* replay the record for initializing overflow buffer */ -- 2.43.0
From 22d5d6bbc093a80688ff4e9d4b5865c8fde37d9c Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Wed, 10 Apr 2024 16:48:21 +0900 Subject: [PATCH v3 2/2] Implement consistency check with clean buffers for wal_consistency_checking This extends WAL records to track if a buffer registered in a record was "clean", aka it is registered without being modified. This is done by adding a new flag called BKPIMAGE_CLEAN, recorded when a block uses REGBUF_NO_CHANGE. This uses a bit in a record for nothing, so while it is useful for sanity checks, this should never *ever* be applied as-is. --- src/include/access/xlogreader.h | 3 +++ src/include/access/xlogrecord.h | 2 ++ src/backend/access/rmgrdesc/xlogdesc.c | 11 +++++++++-- src/backend/access/transam/xloginsert.c | 4 ++++ src/backend/access/transam/xlogreader.c | 1 + src/backend/access/transam/xlogrecovery.c | 15 +++++++++++++++ 6 files changed, 34 insertions(+), 2 deletions(-) diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index 2e9e5f43eb..6e42346c31 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -135,6 +135,7 @@ typedef struct /* Information on full-page image, if any */ bool has_image; /* has image, even for consistency checking */ bool apply_image; /* has image that should be restored */ + bool clean_image; /* image cleanly registered */ char *bkp_image; uint16 hole_offset; uint16 hole_length; @@ -424,6 +425,8 @@ extern bool DecodeXLogRecord(XLogReaderState *state, ((decoder)->record->blocks[block_id].has_image) #define XLogRecBlockImageApply(decoder, block_id) \ ((decoder)->record->blocks[block_id].apply_image) +#define XLogRecBlockImageClean(decoder, block_id) \ + ((decoder)->record->blocks[block_id].clean_image) #define XLogRecHasBlockData(decoder, block_id) \ ((decoder)->record->blocks[block_id].has_data) diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h index b9e5c59fae..c38de8a96e 100644 --- a/src/include/access/xlogrecord.h +++ b/src/include/access/xlogrecord.h @@ -157,6 +157,8 @@ typedef struct XLogRecordBlockImageHeader #define BKPIMAGE_HAS_HOLE 0x01 /* page image has "hole" */ #define BKPIMAGE_APPLY 0x02 /* page image should be restored * during replay */ +#define BKPIMAGE_CLEAN 0x20 /* clean buffer registered */ + /* compression methods supported */ #define BKPIMAGE_COMPRESS_PGLZ 0x04 #define BKPIMAGE_COMPRESS_LZ4 0x08 diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c index e455400716..f045dfe8a0 100644 --- a/src/backend/access/rmgrdesc/xlogdesc.c +++ b/src/backend/access/rmgrdesc/xlogdesc.c @@ -272,8 +272,10 @@ XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty, method = "unknown"; appendStringInfo(buf, - " (FPW%s); hole: offset: %u, length: %u, " + " (FPW%s%s); hole: offset: %u, length: %u, " "compression saved: %u, method: %s", + XLogRecBlockImageClean(record, block_id) ? + "" : " clean", XLogRecBlockImageApply(record, block_id) ? "" : " for WAL verification", XLogRecGetBlock(record, block_id)->hole_offset, @@ -286,7 +288,9 @@ XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty, else { appendStringInfo(buf, - " (FPW%s); hole: offset: %u, length: %u", + " (FPW%s%s); hole: offset: %u, length: %u", + XLogRecBlockImageClean(record, block_id) ? + "" : " clean", XLogRecBlockImageApply(record, block_id) ? "" : " for WAL verification", XLogRecGetBlock(record, block_id)->hole_offset, @@ -329,6 +333,9 @@ XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty, appendStringInfoString(buf, " FPW"); else appendStringInfoString(buf, " FPW for WAL verification"); + + if (XLogRecBlockImageClean(record, block_id)) + appendStringInfoString(buf, " clean"); } } } diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 9047601534..227c83c477 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -720,6 +720,10 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, if (needs_backup) bimg.bimg_info |= BKPIMAGE_APPLY; + /* If buffer is clean, register this information */ + if (regbuf->flags & REGBUF_NO_CHANGE) + bimg.bimg_info |= BKPIMAGE_CLEAN; + if (is_compressed) { /* The current compression is stored in the WAL record */ diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 37d2a57961..6737cbe9e3 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -1791,6 +1791,7 @@ DecodeXLogRecord(XLogReaderState *state, COPY_HEADER_FIELD(&blk->bimg_info, sizeof(uint8)); blk->apply_image = ((blk->bimg_info & BKPIMAGE_APPLY) != 0); + blk->clean_image = ((blk->bimg_info & BKPIMAGE_CLEAN) != 0); if (BKPIMAGE_COMPRESSED(blk->bimg_info)) { diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index b2fe2d04cc..98ecb85c0b 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -2551,6 +2551,21 @@ verifyBackupPageConsistency(XLogReaderState *record) (errcode(ERRCODE_INTERNAL_ERROR), errmsg_internal("%s", record->errormsg_buf))); + /* + * If the replayed block was registered while clean, check that its + * LSN matches with the copy coming from the primary. + */ + if (XLogRecBlockImageClean(record, block_id)) + { + XLogRecPtr primary_lsn = PageGetLSN((Page) primary_image_masked); + XLogRecPtr replay_lsn = PageGetLSN((Page) replay_image_masked); + + if (primary_lsn != replay_lsn) + elog(FATAL, "inconsistent page LSN replayed %X/%X primary %X/%X", + LSN_FORMAT_ARGS(replay_lsn), + LSN_FORMAT_ARGS(primary_lsn)); + } + /* * If masking function is defined, mask both the primary and replay * images -- 2.43.0
signature.asc
Description: PGP signature