On Wed, May 6, 2026 at 5:46 PM Melanie Plageman <[email protected]> wrote: > > I don't see anywhere where something like this is documented. Though > I'm not entirely sure what you mean. In the cases with one VM block > and one heap block, you mean we should have documented that you should > acquire both locks before making changes to either page? > > Thinking about this made me wonder if I should refactor the patches a > bit to acquire the vmbuffer lock before the critical section instead.
I've done this in attached v3. I've only included proposed changes to 18, as this ended up being a pretty invasive refactor that I'd like to get feedback on first. While working on v3, some new tests that were added (by another commit) exposed a bug in this patch. Because it's possible and not incorrect for the VM bits to already be clear in the VM while PD_ALL_VISIBLE is set on the heap page and because visibilitymap_clear() only marks the VM buffer dirty if we actually had to clear the bits, we can end up needing to clear PD_ALL_VISIBLE but not dirtying the VM buffer. This basically worked before we started registering the VM buffer in the WAL record, but now that we do that, we have to be sure we only do so if the buffer has been modified (because the WAL machinery asserts that the registered buffer is dirty). Tests passed before, so we must not have had coverage of this scenario. Unfortunately that requirement makes an already complicated patch more complicated -- especially heap_update(). Fixing that for heap_update() necessitated several new variables to represent the various states. It also creates several complicated scenarios. For example, it is possible that though both new and old pages need PD_ALL_VISIBLE cleared and each of their VM bits were on different pages but the old VM buffer is omitted because it was already clear. This was previously indistinguishable from when old and new pages VM bits are on the same VM buffer. I think this is fixed by checking if old page's VM bits are on vmbuffer_new...but wowza, it's all pretty complicated now. - Melanie
From c2ec92546047b0450e3f1adfcb03accc67ebb63a Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Mon, 4 May 2026 10:41:13 -0400 Subject: [PATCH v3 1/4] Alias WAL block references for some record types A future commit will add block references for pages of the VM to update, insert, delete, lock, and multi-insert WAL records. Before doing that, alias the heap block references so that they are easy to distinguish. Author: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/66mqpfyti3qhfttcsv6r2lbvqqd32rrmpn6i47ovrsnvguts46%40gou54xc> Backpatch through: 17 --- src/backend/access/heap/heapam.c | 43 +++++++++++--------- src/backend/access/heap/heapam_xlog.c | 56 +++++++++++++++++---------- src/include/access/heapam_xlog.h | 32 +++++++++++---- 3 files changed, 86 insertions(+), 45 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 6203e3d7f8d..3c3b65b3cbf 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2218,10 +2218,12 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, * write the whole page to the xlog, we don't need to store * xl_heap_header in the xlog. */ - XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags); - XLogRegisterBufData(0, &xlhdr, SizeOfHeapHeader); + XLogRegisterBuffer(HEAP_INSERT_BLKREF_HEAP, buffer, + REGBUF_STANDARD | bufflags); + XLogRegisterBufData(HEAP_INSERT_BLKREF_HEAP, &xlhdr, + SizeOfHeapHeader); /* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */ - XLogRegisterBufData(0, + XLogRegisterBufData(HEAP_INSERT_BLKREF_HEAP, (char *) heaptup->t_data + SizeofHeapTupleHeader, heaptup->t_len - SizeofHeapTupleHeader); @@ -2619,9 +2621,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, XLogBeginInsert(); XLogRegisterData(xlrec, tupledata - scratch.data); - XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags); + XLogRegisterBuffer(HEAP_MULTI_INSERT_BLKREF_HEAP, buffer, + REGBUF_STANDARD | bufflags); - XLogRegisterBufData(0, tupledata, totaldatalen); + XLogRegisterBufData(HEAP_MULTI_INSERT_BLKREF_HEAP, tupledata, + totaldatalen); /* filtering by origin on a row level is much more efficient */ XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); @@ -3112,7 +3116,7 @@ l1: XLogBeginInsert(); XLogRegisterData(&xlrec, SizeOfHeapDelete); - XLogRegisterBuffer(0, buffer, REGBUF_STANDARD); + XLogRegisterBuffer(HEAP_DELETE_BLKREF_HEAP, buffer, REGBUF_STANDARD); /* * Log replica identity of the deleted tuple if there is one @@ -3883,7 +3887,7 @@ l2: XLogRecPtr recptr; XLogBeginInsert(); - XLogRegisterBuffer(0, buffer, REGBUF_STANDARD); + XLogRegisterBuffer(HEAP_LOCK_BLKREF_HEAP, buffer, REGBUF_STANDARD); xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self); xlrec.xmax = xmax_lock_old_tuple; @@ -5219,7 +5223,7 @@ failed: XLogRecPtr recptr; XLogBeginInsert(); - XLogRegisterBuffer(0, *buffer, REGBUF_STANDARD); + XLogRegisterBuffer(HEAP_LOCK_BLKREF_HEAP, *buffer, REGBUF_STANDARD); xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self); xlrec.xmax = xid; @@ -5971,7 +5975,7 @@ l4: Page page = BufferGetPage(buf); XLogBeginInsert(); - XLogRegisterBuffer(0, buf, REGBUF_STANDARD); + XLogRegisterBuffer(HEAP_LOCK_BLKREF_HEAP, buf, REGBUF_STANDARD); xlrec.offnum = ItemPointerGetOffsetNumber(&mytup.t_self); xlrec.xmax = new_xmax; @@ -8972,9 +8976,9 @@ log_heap_update(Relation reln, Buffer oldbuf, if (need_tuple_data) bufflags |= REGBUF_KEEP_DATA; - XLogRegisterBuffer(0, newbuf, bufflags); + XLogRegisterBuffer(HEAP_UPDATE_BLKREF_NEW, newbuf, bufflags); if (oldbuf != newbuf) - XLogRegisterBuffer(1, oldbuf, REGBUF_STANDARD); + XLogRegisterBuffer(HEAP_UPDATE_BLKREF_OLD, oldbuf, REGBUF_STANDARD); XLogRegisterData(&xlrec, SizeOfHeapUpdate); @@ -8987,15 +8991,18 @@ log_heap_update(Relation reln, Buffer oldbuf, { prefix_suffix[0] = prefixlen; prefix_suffix[1] = suffixlen; - XLogRegisterBufData(0, &prefix_suffix, sizeof(uint16) * 2); + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, &prefix_suffix, + sizeof(uint16) * 2); } else if (prefixlen > 0) { - XLogRegisterBufData(0, &prefixlen, sizeof(uint16)); + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, &prefixlen, + sizeof(uint16)); } else { - XLogRegisterBufData(0, &suffixlen, sizeof(uint16)); + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, &suffixlen, + sizeof(uint16)); } } @@ -9009,10 +9016,10 @@ log_heap_update(Relation reln, Buffer oldbuf, * * The 'data' doesn't include the common prefix or suffix. */ - XLogRegisterBufData(0, &xlhdr, SizeOfHeapHeader); + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, &xlhdr, SizeOfHeapHeader); if (prefixlen == 0) { - XLogRegisterBufData(0, + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, (char *) newtup->t_data + SizeofHeapTupleHeader, newtup->t_len - SizeofHeapTupleHeader - suffixlen); } @@ -9025,13 +9032,13 @@ log_heap_update(Relation reln, Buffer oldbuf, /* bitmap [+ padding] [+ oid] */ if (newtup->t_data->t_hoff - SizeofHeapTupleHeader > 0) { - XLogRegisterBufData(0, + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, (char *) newtup->t_data + SizeofHeapTupleHeader, newtup->t_data->t_hoff - SizeofHeapTupleHeader); } /* data after common prefix */ - XLogRegisterBufData(0, + XLogRegisterBufData(HEAP_UPDATE_BLKREF_NEW, (char *) newtup->t_data + newtup->t_data->t_hoff + prefixlen, newtup->t_len - newtup->t_data->t_hoff - prefixlen - suffixlen); } diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c index eb4bd3d6ae3..6cc3bc991c3 100644 --- a/src/backend/access/heap/heapam_xlog.c +++ b/src/backend/access/heap/heapam_xlog.c @@ -350,7 +350,8 @@ heap_xlog_delete(XLogReaderState *record) RelFileLocator target_locator; ItemPointerData target_tid; - XLogRecGetBlockTag(record, 0, &target_locator, NULL, &blkno); + XLogRecGetBlockTag(record, HEAP_DELETE_BLKREF_HEAP, &target_locator, NULL, + &blkno); ItemPointerSetBlockNumber(&target_tid, blkno); ItemPointerSetOffsetNumber(&target_tid, xlrec->offnum); @@ -369,7 +370,8 @@ heap_xlog_delete(XLogReaderState *record) FreeFakeRelcacheEntry(reln); } - if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO) + if (XLogReadBufferForRedo(record, HEAP_DELETE_BLKREF_HEAP, + &buffer) == BLK_NEEDS_REDO) { page = BufferGetPage(buffer); @@ -434,7 +436,8 @@ heap_xlog_insert(XLogReaderState *record) ItemPointerData target_tid; XLogRedoAction action; - XLogRecGetBlockTag(record, 0, &target_locator, NULL, &blkno); + XLogRecGetBlockTag(record, HEAP_INSERT_BLKREF_HEAP, &target_locator, NULL, + &blkno); ItemPointerSetBlockNumber(&target_tid, blkno); ItemPointerSetOffsetNumber(&target_tid, xlrec->offnum); @@ -462,13 +465,14 @@ heap_xlog_insert(XLogReaderState *record) */ if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) { - buffer = XLogInitBufferForRedo(record, 0); + buffer = XLogInitBufferForRedo(record, HEAP_INSERT_BLKREF_HEAP); page = BufferGetPage(buffer); PageInit(page, BufferGetPageSize(buffer), 0); action = BLK_NEEDS_REDO; } else - action = XLogReadBufferForRedo(record, 0, &buffer); + action = XLogReadBufferForRedo(record, HEAP_INSERT_BLKREF_HEAP, + &buffer); if (action == BLK_NEEDS_REDO) { Size datalen; @@ -479,7 +483,7 @@ heap_xlog_insert(XLogReaderState *record) if (PageGetMaxOffsetNumber(page) + 1 < xlrec->offnum) elog(PANIC, "invalid max offset number"); - data = XLogRecGetBlockData(record, 0, &datalen); + data = XLogRecGetBlockData(record, HEAP_INSERT_BLKREF_HEAP, &datalen); newlen = datalen - SizeOfHeapHeader; Assert(datalen > SizeOfHeapHeader && newlen <= MaxHeapTupleSize); @@ -559,7 +563,8 @@ heap_xlog_multi_insert(XLogReaderState *record) */ xlrec = (xl_heap_multi_insert *) XLogRecGetData(record); - XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno); + XLogRecGetBlockTag(record, HEAP_MULTI_INSERT_BLKREF_HEAP, &rlocator, NULL, + &blkno); /* check that the mutually exclusive flags are not both set */ Assert(!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) && @@ -582,13 +587,14 @@ heap_xlog_multi_insert(XLogReaderState *record) if (isinit) { - buffer = XLogInitBufferForRedo(record, 0); + buffer = XLogInitBufferForRedo(record, HEAP_MULTI_INSERT_BLKREF_HEAP); page = BufferGetPage(buffer); PageInit(page, BufferGetPageSize(buffer), 0); action = BLK_NEEDS_REDO; } else - action = XLogReadBufferForRedo(record, 0, &buffer); + action = XLogReadBufferForRedo(record, HEAP_MULTI_INSERT_BLKREF_HEAP, + &buffer); if (action == BLK_NEEDS_REDO) { char *tupdata; @@ -596,7 +602,8 @@ heap_xlog_multi_insert(XLogReaderState *record) Size len; /* Tuples are stored as block data */ - tupdata = XLogRecGetBlockData(record, 0, &len); + tupdata = XLogRecGetBlockData(record, HEAP_MULTI_INSERT_BLKREF_HEAP, + &len); endptr = tupdata + len; page = (Page) BufferGetPage(buffer); @@ -713,8 +720,10 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) oldtup.t_data = NULL; oldtup.t_len = 0; - XLogRecGetBlockTag(record, 0, &rlocator, NULL, &newblk); - if (XLogRecGetBlockTagExtended(record, 1, NULL, NULL, &oldblk, NULL)) + XLogRecGetBlockTag(record, HEAP_UPDATE_BLKREF_NEW, &rlocator, NULL, + &newblk); + if (XLogRecGetBlockTagExtended(record, HEAP_UPDATE_BLKREF_OLD, NULL, NULL, + &oldblk, NULL)) { /* HOT updates are never done across pages */ Assert(!hot_update); @@ -750,7 +759,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) */ /* Deal with old tuple version */ - oldaction = XLogReadBufferForRedo(record, (oldblk == newblk) ? 0 : 1, + oldaction = XLogReadBufferForRedo(record, (oldblk == newblk) ? + HEAP_UPDATE_BLKREF_NEW : HEAP_UPDATE_BLKREF_OLD, &obuffer); if (oldaction == BLK_NEEDS_REDO) { @@ -800,13 +810,14 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) } else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) { - nbuffer = XLogInitBufferForRedo(record, 0); + nbuffer = XLogInitBufferForRedo(record, HEAP_UPDATE_BLKREF_NEW); page = (Page) BufferGetPage(nbuffer); PageInit(page, BufferGetPageSize(nbuffer), 0); newaction = BLK_NEEDS_REDO; } else - newaction = XLogReadBufferForRedo(record, 0, &nbuffer); + newaction = XLogReadBufferForRedo(record, HEAP_UPDATE_BLKREF_NEW, + &nbuffer); /* * The visibility map may need to be fixed even if the heap page is @@ -831,7 +842,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) Size datalen; Size tuplen; - recdata = XLogRecGetBlockData(record, 0, &datalen); + recdata = XLogRecGetBlockData(record, HEAP_UPDATE_BLKREF_NEW, + &datalen); recdata_end = recdata + datalen; page = BufferGetPage(nbuffer); @@ -1015,7 +1027,8 @@ heap_xlog_lock(XLogReaderState *record) BlockNumber block; Relation reln; - XLogRecGetBlockTag(record, 0, &rlocator, NULL, &block); + XLogRecGetBlockTag(record, HEAP_LOCK_BLKREF_HEAP, &rlocator, NULL, + &block); reln = CreateFakeRelcacheEntry(rlocator); visibilitymap_pin(reln, block, &vmbuffer); @@ -1025,7 +1038,8 @@ heap_xlog_lock(XLogReaderState *record) FreeFakeRelcacheEntry(reln); } - if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO) + if (XLogReadBufferForRedo(record, HEAP_LOCK_BLKREF_HEAP, + &buffer) == BLK_NEEDS_REDO) { page = (Page) BufferGetPage(buffer); @@ -1091,7 +1105,8 @@ heap_xlog_lock_updated(XLogReaderState *record) BlockNumber block; Relation reln; - XLogRecGetBlockTag(record, 0, &rlocator, NULL, &block); + XLogRecGetBlockTag(record, HEAP_LOCK_BLKREF_HEAP, &rlocator, NULL, + &block); reln = CreateFakeRelcacheEntry(rlocator); visibilitymap_pin(reln, block, &vmbuffer); @@ -1101,7 +1116,8 @@ heap_xlog_lock_updated(XLogReaderState *record) FreeFakeRelcacheEntry(reln); } - if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO) + if (XLogReadBufferForRedo(record, HEAP_LOCK_BLKREF_HEAP, + &buffer) == BLK_NEEDS_REDO) { page = BufferGetPage(buffer); diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h index 277df6b3cf0..55e3c7b0015 100644 --- a/src/include/access/heapam_xlog.h +++ b/src/include/access/heapam_xlog.h @@ -110,6 +110,9 @@ (XLH_DELETE_CONTAINS_OLD_TUPLE | XLH_DELETE_CONTAINS_OLD_KEY) /* This is what we need to know about delete */ +#define HEAP_DELETE_BLKREF_HEAP 0 +#define HEAP_DELETE_BLKREF_VM 1 + typedef struct xl_heap_delete { TransactionId xmax; /* xmax of the deleted tuple */ @@ -157,12 +160,15 @@ typedef struct xl_heap_header #define SizeOfHeapHeader (offsetof(xl_heap_header, t_hoff) + sizeof(uint8)) /* This is what we need to know about insert */ +#define HEAP_INSERT_BLKREF_HEAP 0 +#define HEAP_INSERT_BLKREF_VM 1 + typedef struct xl_heap_insert { OffsetNumber offnum; /* inserted tuple's offset */ uint8 flags; - /* xl_heap_header & TUPLE DATA in backup block 0 */ + /* xl_heap_header & TUPLE DATA in HEAP_INSERT_BLKREF_HEAP */ } xl_heap_insert; #define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8)) @@ -173,10 +179,14 @@ typedef struct xl_heap_insert * The main data of the record consists of this xl_heap_multi_insert header. * 'offsets' array is omitted if the whole page is reinitialized * (XLOG_HEAP_INIT_PAGE). - * - * In block 0's data portion, there is an xl_multi_insert_tuple struct, - * followed by the tuple data for each tuple. There is padding to align - * each xl_multi_insert_tuple struct. + */ +#define HEAP_MULTI_INSERT_BLKREF_HEAP 0 +#define HEAP_MULTI_INSERT_BLKREF_VM 1 + +/* + * In HEAP_MULTI_INSERT_BLKREF_HEAP's data portion, there is an + * xl_multi_insert_tuple struct, followed by the tuple data for each tuple. + * There is padding to align each xl_multi_insert_tuple struct. */ typedef struct xl_heap_multi_insert { @@ -201,7 +211,7 @@ typedef struct xl_multi_insert_tuple /* * This is what we need to know about update|hot_update * - * Backup blk 0: new page + * HEAP_UPDATE_BLKREF_NEW: new page * * If XLH_UPDATE_PREFIX_FROM_OLD or XLH_UPDATE_SUFFIX_FROM_OLD flags are set, * the prefix and/or suffix come first, as one or two uint16s. @@ -213,8 +223,13 @@ typedef struct xl_multi_insert_tuple * If XLH_UPDATE_CONTAINS_NEW_TUPLE flag is given, the tuple data is * included even if a full-page image was taken. * - * Backup blk 1: old page, if different. (no data, just a reference to the blk) + * HEAP_UPDATE_BLKREF_OLD: old page, if different. (no data, just a reference + * to the block) */ + +#define HEAP_UPDATE_BLKREF_NEW 0 +#define HEAP_UPDATE_BLKREF_OLD 1 + typedef struct xl_heap_update { TransactionId old_xmax; /* xmax of the old tuple */ @@ -393,6 +408,9 @@ typedef struct xlhp_prune_items #define XLH_LOCK_ALL_FROZEN_CLEARED 0x01 /* This is what we need to know about lock */ +#define HEAP_LOCK_BLKREF_HEAP 0 +#define HEAP_LOCK_BLKREF_VM 1 + typedef struct xl_heap_lock { TransactionId xmax; /* might be a MultiXactId */ -- 2.43.0
From 9358530be17b2e6d84bb1956dc09afadd817e5df Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Wed, 29 Apr 2026 12:53:49 -0400 Subject: [PATCH v3 2/4] Fix WAL logging of VM clears Previously, we failed to register the visibility map buffer when emitting WAL after clearing the VM bits for heap pages (recovery code read the VM page normally). This meant that we couldn't emit FPIs of VM pages when clearing VM bits. While this would not result in a checksum error because the visibility map is read with ZERO_ON_ERROR, the WAL summarizer only includes pages which were registered, so a restore from an incremental backup would not include the modified VM pages -- leading to data corruption. Fix this by registering the VM buffer in the WAL record when clearing VM bits. This also means the VM buffer must be locked throughout the duration of the critical section where we make changes to the VM and heap page and emit WAL. Author: Melanie Plageman <[email protected]> Author: Andres Freund <[email protected]> Discussion: https://postgr.es/m/flat/CAAKRu_bn%2Be7F4yPFBgFbnP%2BsyJRKyNK092bjD2LKvZW7O4Svag> Backpatch-through: 17 --- contrib/pg_surgery/heap_surgery.c | 39 ++- src/backend/access/heap/heapam.c | 358 ++++++++++++++++++++---- src/backend/access/heap/heapam_xlog.c | 219 ++++++++++----- src/backend/access/heap/visibilitymap.c | 23 +- src/bin/pg_walsummary/t/002_blocks.pl | 7 +- src/include/access/heapam_xlog.h | 17 +- src/include/access/visibilitymap.h | 2 + 7 files changed, 520 insertions(+), 145 deletions(-) diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c index 602aca66c60..2f89fc1159f 100644 --- a/contrib/pg_surgery/heap_surgery.c +++ b/contrib/pg_surgery/heap_surgery.c @@ -17,6 +17,7 @@ #include "access/visibilitymap.h" #include "access/xloginsert.h" #include "catalog/pg_am_d.h" +#include "catalog/pg_control.h" #include "miscadmin.h" #include "storage/bufmgr.h" #include "utils/acl.h" @@ -146,6 +147,7 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt) { Buffer buf; Buffer vmbuf = InvalidBuffer; + bool unlock_vmbuf = false; Page page; BlockNumber blkno; OffsetNumber curoff; @@ -233,11 +235,15 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt) } /* - * Before entering the critical section, pin the visibility map page - * if it appears to be necessary. + * Before entering the critical section, pin and lock the visibility + * map page if it appears to be necessary. */ if (heap_force_opt == HEAP_FORCE_KILL && PageIsAllVisible(page)) + { visibilitymap_pin(rel, blkno, &vmbuf); + LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE); + unlock_vmbuf = true; + } /* No ereport(ERROR) from here until all the changes are logged. */ START_CRIT_SECTION(); @@ -266,10 +272,11 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt) */ if (PageIsAllVisible(page)) { + if (visibilitymap_clear_locked(rel, blkno, vmbuf, + VISIBILITYMAP_VALID_BITS)) + did_modify_vm = true; + PageClearAllVisible(page); - visibilitymap_clear(rel, blkno, vmbuf, - VISIBILITYMAP_VALID_BITS); - did_modify_vm = true; } } else @@ -320,18 +327,28 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt) /* XLOG stuff */ if (RelationNeedsWAL(rel)) - log_newpage_buffer(buf, true); + { + XLogRecPtr recptr; + + XLogBeginInsert(); + XLogRegisterBuffer(0, buf, REGBUF_STANDARD | REGBUF_FORCE_IMAGE); + if (did_modify_vm) + XLogRegisterBuffer(1, vmbuf, REGBUF_FORCE_IMAGE); + recptr = XLogInsert(RM_XLOG_ID, XLOG_FPI); + if (did_modify_vm) + PageSetLSN(BufferGetPage(vmbuf), recptr); + PageSetLSN(BufferGetPage(buf), recptr); + } } - /* WAL log the VM page if it was modified. */ - if (did_modify_vm && RelationNeedsWAL(rel)) - log_newpage_buffer(vmbuf, false); - END_CRIT_SECTION(); UnlockReleaseBuffer(buf); - if (vmbuf != InvalidBuffer) + if (unlock_vmbuf) + LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK); + + if (BufferIsValid(vmbuf)) ReleaseBuffer(vmbuf); /* Update the current_start_ptr before moving to the next page. */ diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 3c3b65b3cbf..7dfa3221652 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -58,7 +58,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid, CommandId cid, int options); static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf, - Buffer newbuf, HeapTuple oldtup, + Buffer vmbuffer_old, Buffer newbuf, + Buffer vmbuffer_new, HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tuple, bool all_visible_cleared, bool new_all_visible_cleared); #ifdef USE_ASSERT_CHECKING @@ -2083,8 +2084,10 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, TransactionId xid = GetCurrentTransactionId(); HeapTuple heaptup; Buffer buffer; + Page page; Buffer vmbuffer = InvalidBuffer; bool all_visible_cleared = false; + bool vmbuffer_modified = false; /* Cheap, simplistic check that the tuple matches the rel's rowtype. */ Assert(HeapTupleHeaderGetNatts(tup->t_data) <= @@ -2108,6 +2111,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, InvalidBuffer, options, bistate, &vmbuffer, NULL, 0); + page = BufferGetPage(buffer); /* * We're about to do the actual insert -- but check for conflict first, to @@ -2126,19 +2130,28 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, */ CheckForSerializableConflictIn(relation, NULL, InvalidBlockNumber); + /* Lock the vmbuffer before the critical section */ + if (PageIsAllVisible(page)) + { + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + all_visible_cleared = true; + } + /* NO EREPORT(ERROR) from here till changes are logged */ START_CRIT_SECTION(); RelationPutHeapTuple(relation, buffer, heaptup, (options & HEAP_INSERT_SPECULATIVE) != 0); - if (PageIsAllVisible(BufferGetPage(buffer))) + if (all_visible_cleared) { - all_visible_cleared = true; - PageClearAllVisible(BufferGetPage(buffer)); - visibilitymap_clear(relation, - ItemPointerGetBlockNumber(&(heaptup->t_self)), - vmbuffer, VISIBILITYMAP_VALID_BITS); + /* It's possible the VM bits were already clear */ + if (visibilitymap_clear_locked(relation, + ItemPointerGetBlockNumber(&(heaptup->t_self)), + vmbuffer, VISIBILITYMAP_VALID_BITS)) + vmbuffer_modified = true; + + PageClearAllVisible(page); } /* @@ -2160,7 +2173,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, xl_heap_insert xlrec; xl_heap_header xlhdr; XLogRecPtr recptr; - Page page = BufferGetPage(buffer); uint8 info = XLOG_HEAP_INSERT; int bufflags = 0; @@ -2230,15 +2242,28 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, /* filtering by origin on a row level is much more efficient */ XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); + if (vmbuffer_modified) + XLogRegisterBuffer(HEAP_INSERT_BLKREF_VM, vmbuffer, 0); + recptr = XLogInsert(RM_HEAP_ID, info); PageSetLSN(page, recptr); + + if (vmbuffer_modified) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); UnlockReleaseBuffer(buffer); - if (vmbuffer != InvalidBuffer) + + /* + * We locked vmbuffer if all_visible_cleared was true regardless of + * whether or not we ended up modifying it. + */ + if (all_visible_cleared) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + if (BufferIsValid(vmbuffer)) ReleaseBuffer(vmbuffer); /* @@ -2421,6 +2446,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, Buffer buffer; bool all_visible_cleared = false; bool all_frozen_set = false; + bool vmbuffer_modified = false; int nthispage; CHECK_FOR_INTERRUPTS(); @@ -2462,6 +2488,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN)) all_frozen_set = true; + /* + * If clearing all-visible, take the VM buffer lock before entering + * the critical section where that action will be WAL-logged. Setting + * the VM all-frozen is done and WAL-logged separately. + */ + if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN)) + { + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + all_visible_cleared = true; + } + /* NO EREPORT(ERROR) from here till changes are logged */ START_CRIT_SECTION(); @@ -2502,13 +2539,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, * If we're only adding already frozen rows to a previously empty * page, mark it as all-visible. */ - if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN)) + if (all_visible_cleared) { - all_visible_cleared = true; + if (visibilitymap_clear_locked(relation, + BufferGetBlockNumber(buffer), + vmbuffer, VISIBILITYMAP_VALID_BITS)) + vmbuffer_modified = true; + PageClearAllVisible(page); - visibilitymap_clear(relation, - BufferGetBlockNumber(buffer), - vmbuffer, VISIBILITYMAP_VALID_BITS); } else if (all_frozen_set) PageSetAllVisible(page); @@ -2623,6 +2661,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, XLogRegisterData(xlrec, tupledata - scratch.data); XLogRegisterBuffer(HEAP_MULTI_INSERT_BLKREF_HEAP, buffer, REGBUF_STANDARD | bufflags); + if (vmbuffer_modified) + XLogRegisterBuffer(HEAP_MULTI_INSERT_BLKREF_VM, vmbuffer, 0); XLogRegisterBufData(HEAP_MULTI_INSERT_BLKREF_HEAP, tupledata, totaldatalen); @@ -2633,10 +2673,19 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples, recptr = XLogInsert(RM_HEAP2_ID, info); PageSetLSN(page, recptr); + if (vmbuffer_modified) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); + /* + * We locked vmbuffer if all_visible_cleared was true regardless of + * whether or not we ended up modifying it. + */ + if (all_visible_cleared) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + /* * If we've frozen everything on the page, update the visibilitymap. * We're already holding pin on the vmbuffer. @@ -2786,6 +2835,7 @@ heap_delete(Relation relation, ItemPointer tid, BlockNumber block; Buffer buffer; Buffer vmbuffer = InvalidBuffer; + bool vmbuffer_modified = false; TransactionId new_xmax; uint16 new_infomask, new_infomask2; @@ -3040,6 +3090,13 @@ l1: xid, LockTupleExclusive, true, &new_xmax, &new_infomask, &new_infomask2); + /* Lock the VM before entering the critical section */ + if (PageIsAllVisible(page)) + { + all_visible_cleared = true; + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + } + START_CRIT_SECTION(); /* @@ -3051,12 +3108,14 @@ l1: */ PageSetPrunable(page, xid); - if (PageIsAllVisible(page)) + if (all_visible_cleared) { - all_visible_cleared = true; + /* It's possible the VM bits were already clear */ + if (visibilitymap_clear_locked(relation, BufferGetBlockNumber(buffer), + vmbuffer, VISIBILITYMAP_VALID_BITS)) + vmbuffer_modified = true; + PageClearAllVisible(page); - visibilitymap_clear(relation, BufferGetBlockNumber(buffer), - vmbuffer, VISIBILITYMAP_VALID_BITS); } /* store transaction information of xact deleting the tuple */ @@ -3137,13 +3196,27 @@ l1: /* filtering by origin on a row level is much more efficient */ XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); + if (vmbuffer_modified) + XLogRegisterBuffer(HEAP_DELETE_BLKREF_VM, vmbuffer, 0); + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE); PageSetLSN(page, recptr); + + if (vmbuffer_modified) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); + /* + * Release VM lock first, since it covers many heap blocks. We locked + * vmbuffer if all_visible_cleared was true regardless of whether or not + * we ended up modifying it. + */ + if (all_visible_cleared) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); if (vmbuffer != InvalidBuffer) @@ -3261,13 +3334,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, HeapTuple heaptup; HeapTuple old_key_tuple = NULL; bool old_key_copied = false; - Page page; + Page page, + newpage; BlockNumber block; MultiXactStatus mxact_status; Buffer buffer, newbuf, vmbuffer = InvalidBuffer, vmbuffer_new = InvalidBuffer; + bool unlock_vmbuffer = false; + bool unlock_vmbuffer_new = false; bool need_toast; Size newtupsize, pagefree; @@ -3278,6 +3354,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup, bool key_intact; bool all_visible_cleared = false; bool all_visible_cleared_new = false; + bool vmbuffer_modified = false; + bool vmbuffer_new_modified = false; bool checked_lockers; bool locker_remains; bool id_has_external = false; @@ -3852,6 +3930,12 @@ l2: Assert(HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)); + if (PageIsAllVisible(page)) + { + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + unlock_vmbuffer = true; + } + START_CRIT_SECTION(); /* Clear obsolete visibility flags ... */ @@ -3874,10 +3958,12 @@ l2: * overhead would be unchanged, that doesn't seem necessarily * worthwhile. */ - if (PageIsAllVisible(page) && - visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; + if (PageIsAllVisible(page)) + { + if (visibilitymap_clear_locked(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN)) + cleared_all_frozen = true; + } MarkBufferDirty(buffer); @@ -3896,12 +3982,24 @@ l2: xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0; XLogRegisterData(&xlrec, SizeOfHeapLock); + + if (cleared_all_frozen) + XLogRegisterBuffer(HEAP_LOCK_BLKREF_VM, vmbuffer, 0); + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); PageSetLSN(page, recptr); + + if (cleared_all_frozen) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); + /* release VM lock first, since it covers many heap blocks */ + if (unlock_vmbuffer) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + unlock_vmbuffer = false; + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); /* @@ -3987,6 +4085,8 @@ l2: heaptup = newtup; } + newpage = BufferGetPage(newbuf); + /* * We're about to do the actual update -- check for conflict first, to * avoid possibly having to roll back work we've just done. @@ -4050,6 +4150,69 @@ l2: id_has_external, &old_key_copied); + all_visible_cleared = PageIsAllVisible(page); + all_visible_cleared_new = newbuf != buffer && PageIsAllVisible(newpage); + + /* + * Clear PD_ALL_VISIBLE flags and reset visibility map bits for any heap + * pages that were all-visible. If there are two heap pages, we may need + * to clear VM bits for both. + */ + if (all_visible_cleared && all_visible_cleared_new && + vmbuffer_new == vmbuffer) + { + /* + * This is the more complicated case: both the new and old heap pages + * are all-visible and both their VM bits are on the same page of the + * VM, so we register a single VM buffer as HEAP_UPDATE_BLKREF_VM_NEW + * in the WAL record. We must be careful to only lock and register one + * buffer, even though we modify it twice -- once for each heap + * block's VM bits. + */ + LockBuffer(vmbuffer_new, BUFFER_LOCK_EXCLUSIVE); + unlock_vmbuffer_new = true; + + /* We will not lock or attempt to modify old VM buffer */ + } + else + { + /* + * In all the remaining cases, we will clear at most one heap block's + * VM bits per VM page. + */ + Buffer vmbuffers[2] = { + all_visible_cleared ? vmbuffer : InvalidBuffer, + all_visible_cleared_new ? vmbuffer_new : InvalidBuffer + }; + + /* + * When both pages need different VM pages cleared, acquire the VM + * buffer locks in VM block order to avoid deadlocks between backends + * updating tuples in opposite directions across VM pages. + */ + if (all_visible_cleared && all_visible_cleared_new && + BufferGetBlockNumber(vmbuffers[0]) > BufferGetBlockNumber(vmbuffers[1])) + { + Buffer swap = vmbuffers[0]; + + vmbuffers[0] = vmbuffers[1]; + vmbuffers[1] = swap; + } + + Assert((!BufferIsValid(vmbuffers[0]) && !BufferIsValid(vmbuffers[1])) || + vmbuffers[0] != vmbuffers[1]); + + if (BufferIsValid(vmbuffers[0])) + LockBuffer(vmbuffers[0], BUFFER_LOCK_EXCLUSIVE); + if (BufferIsValid(vmbuffers[1])) + LockBuffer(vmbuffers[1], BUFFER_LOCK_EXCLUSIVE); + + if (all_visible_cleared) + unlock_vmbuffer = true; + if (all_visible_cleared_new) + unlock_vmbuffer_new = true; + } + /* NO EREPORT(ERROR) from here till changes are logged */ START_CRIT_SECTION(); @@ -4086,6 +4249,34 @@ l2: RelationPutHeapTuple(relation, newbuf, heaptup, false); /* insert new tuple */ + if (all_visible_cleared) + { + if (visibilitymap_clear_locked(relation, block, + vmbuffer, VISIBILITYMAP_VALID_BITS)) + { + /* + * If both heap block's VM bits are on the same VM buffer, + * vmbuffer_new and vmbuffer will be equal. In that case only + * vmbuffer_new is registered in the WAL record (see + * log_heap_update), so attribute any change to the old heap + * block's VM bits to vmbuffer_new as well. + */ + if (vmbuffer == vmbuffer_new) + vmbuffer_new_modified = true; + else + vmbuffer_modified = true; + } + + PageClearAllVisible(page); + } + if (all_visible_cleared_new) + { + if (visibilitymap_clear_locked(relation, BufferGetBlockNumber(newbuf), + vmbuffer_new, VISIBILITYMAP_VALID_BITS)) + vmbuffer_new_modified = true; + + PageClearAllVisible(newpage); + } /* Clear obsolete visibility flags, possibly set by ourselves above... */ oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED); @@ -4100,21 +4291,6 @@ l2: /* record address of new tuple in t_ctid of old one */ oldtup.t_data->t_ctid = heaptup->t_self; - /* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */ - if (PageIsAllVisible(BufferGetPage(buffer))) - { - all_visible_cleared = true; - PageClearAllVisible(BufferGetPage(buffer)); - visibilitymap_clear(relation, BufferGetBlockNumber(buffer), - vmbuffer, VISIBILITYMAP_VALID_BITS); - } - if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf))) - { - all_visible_cleared_new = true; - PageClearAllVisible(BufferGetPage(newbuf)); - visibilitymap_clear(relation, BufferGetBlockNumber(newbuf), - vmbuffer_new, VISIBILITYMAP_VALID_BITS); - } if (newbuf != buffer) MarkBufferDirty(newbuf); @@ -4136,19 +4312,30 @@ l2: } recptr = log_heap_update(relation, buffer, - newbuf, &oldtup, heaptup, + vmbuffer_modified ? vmbuffer : InvalidBuffer, + newbuf, + vmbuffer_new_modified ? vmbuffer_new : InvalidBuffer, + &oldtup, heaptup, old_key_tuple, all_visible_cleared, all_visible_cleared_new); if (newbuf != buffer) - { - PageSetLSN(BufferGetPage(newbuf), recptr); - } + PageSetLSN(newpage, recptr); PageSetLSN(BufferGetPage(buffer), recptr); + + if (vmbuffer_modified) + PageSetLSN(BufferGetPage(vmbuffer), recptr); + if (vmbuffer_new_modified) + PageSetLSN(BufferGetPage(vmbuffer_new), recptr); } END_CRIT_SECTION(); + if (unlock_vmbuffer) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + if (unlock_vmbuffer_new) + LockBuffer(vmbuffer_new, BUFFER_LOCK_UNLOCK); + if (newbuf != buffer) LockBuffer(newbuf, BUFFER_LOCK_UNLOCK); LockBuffer(buffer, BUFFER_LOCK_UNLOCK); @@ -4586,6 +4773,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple, ItemId lp; Page page; Buffer vmbuffer = InvalidBuffer; + bool unlock_vmbuffer = false; BlockNumber block; TransactionId xid, xmax; @@ -5166,6 +5354,13 @@ failed: GetCurrentTransactionId(), mode, false, &xid, &new_infomask, &new_infomask2); + /* Lock VM buffer before entering critical section */ + if (PageIsAllVisible(page)) + { + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + unlock_vmbuffer = true; + } + START_CRIT_SECTION(); /* @@ -5197,11 +5392,12 @@ failed: tuple->t_data->t_ctid = *tid; /* Clear only the all-frozen bit on visibility map if needed */ - if (PageIsAllVisible(page) && - visibilitymap_clear(relation, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; - + if (PageIsAllVisible(page)) + { + if (visibilitymap_clear_locked(relation, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN)) + cleared_all_frozen = true; + } MarkBufferDirty(*buffer); @@ -5232,15 +5428,25 @@ failed: xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0; XLogRegisterData(&xlrec, SizeOfHeapLock); + if (cleared_all_frozen) + XLogRegisterBuffer(HEAP_LOCK_BLKREF_VM, vmbuffer, 0); + /* we don't decode row locks atm, so no need to log the origin */ recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK); PageSetLSN(page, recptr); + + if (cleared_all_frozen) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); + /* release VM lock first, since it covers many heap blocks */ + if (unlock_vmbuffer) + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + result = TM_Ok; out_locked: @@ -5707,6 +5913,7 @@ heap_lock_updated_tuple_rec(Relation rel, TransactionId priorXmax, ItemPointerData tupid; HeapTupleData mytup; Buffer buf; + Page page; uint16 new_infomask, new_infomask2, old_infomask, @@ -5716,6 +5923,7 @@ heap_lock_updated_tuple_rec(Relation rel, TransactionId priorXmax, bool cleared_all_frozen = false; bool pinned_desired_page; Buffer vmbuffer = InvalidBuffer; + bool unlock_vmbuffer = false; BlockNumber block; ItemPointerCopy(tid, &tupid); @@ -5724,6 +5932,7 @@ heap_lock_updated_tuple_rec(Relation rel, TransactionId priorXmax, { new_infomask = 0; new_xmax = InvalidTransactionId; + cleared_all_frozen = false; block = ItemPointerGetBlockNumber(&tupid); ItemPointerCopy(&tupid, &(mytup.t_self)); @@ -5743,13 +5952,15 @@ heap_lock_updated_tuple_rec(Relation rel, TransactionId priorXmax, l4: CHECK_FOR_INTERRUPTS(); + page = BufferGetPage(buf); + /* * Before locking the buffer, pin the visibility map page if it * appears to be necessary. Since we haven't got the lock yet, * someone else might be in the middle of changing this, so we'll need * to recheck after we have the lock. */ - if (PageIsAllVisible(BufferGetPage(buf))) + if (PageIsAllVisible(page)) { visibilitymap_pin(rel, block, &vmbuffer); pinned_desired_page = true; @@ -5770,7 +5981,7 @@ l4: * this page. If this page isn't all-visible, we won't use the vm * page, but we hold onto such a pin till the end of the function. */ - if (!pinned_desired_page && PageIsAllVisible(BufferGetPage(buf))) + if (!pinned_desired_page && PageIsAllVisible(page)) { LockBuffer(buf, BUFFER_LOCK_UNLOCK); visibilitymap_pin(rel, block, &vmbuffer); @@ -5951,10 +6162,11 @@ l4: xid, mode, false, &new_xmax, &new_infomask, &new_infomask2); - if (PageIsAllVisible(BufferGetPage(buf)) && - visibilitymap_clear(rel, block, vmbuffer, - VISIBILITYMAP_ALL_FROZEN)) - cleared_all_frozen = true; + if (PageIsAllVisible(page)) + { + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + unlock_vmbuffer = true; + } START_CRIT_SECTION(); @@ -5967,12 +6179,18 @@ l4: MarkBufferDirty(buf); + if (PageIsAllVisible(page)) + { + if (visibilitymap_clear_locked(rel, block, vmbuffer, + VISIBILITYMAP_ALL_FROZEN)) + cleared_all_frozen = true; + } + /* XLOG stuff */ if (RelationNeedsWAL(rel)) { xl_heap_lock_updated xlrec; XLogRecPtr recptr; - Page page = BufferGetPage(buf); XLogBeginInsert(); XLogRegisterBuffer(HEAP_LOCK_BLKREF_HEAP, buf, REGBUF_STANDARD); @@ -5985,13 +6203,26 @@ l4: XLogRegisterData(&xlrec, SizeOfHeapLockUpdated); + if (cleared_all_frozen) + XLogRegisterBuffer(HEAP_LOCK_BLKREF_VM, vmbuffer, 0); + recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_LOCK_UPDATED); PageSetLSN(page, recptr); + + if (cleared_all_frozen) + PageSetLSN(BufferGetPage(vmbuffer), recptr); } END_CRIT_SECTION(); + /* release VM lock first, since it covers many heap blocks */ + if (unlock_vmbuffer) + { + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK); + unlock_vmbuffer = false; + } + next: /* if we find the end of update chain, we're done. */ if (mytup.t_data->t_infomask & HEAP_XMAX_INVALID || @@ -8848,8 +9079,9 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer, * have modified the buffer(s) and marked them dirty. */ static XLogRecPtr -log_heap_update(Relation reln, Buffer oldbuf, - Buffer newbuf, HeapTuple oldtup, HeapTuple newtup, +log_heap_update(Relation reln, Buffer oldbuf, Buffer vmbuffer_old, + Buffer newbuf, Buffer vmbuffer_new, + HeapTuple oldtup, HeapTuple newtup, HeapTuple old_key_tuple, bool all_visible_cleared, bool new_all_visible_cleared) { @@ -9058,6 +9290,20 @@ log_heap_update(Relation reln, Buffer oldbuf, old_key_tuple->t_len - SizeofHeapTupleHeader); } + /* + * Register VM buffers. If the old and new heap pages' VM bits are on the + * same VM page, the caller passes only vmbuffer_new (mirroring the heap + * page convention where block 0 = new is always registered). + */ + Assert((BufferIsInvalid(vmbuffer_old) && BufferIsInvalid(vmbuffer_new)) || + (vmbuffer_old != vmbuffer_new)); + + if (BufferIsValid(vmbuffer_new)) + XLogRegisterBuffer(HEAP_UPDATE_BLKREF_VM_NEW, vmbuffer_new, 0); + + if (BufferIsValid(vmbuffer_old)) + XLogRegisterBuffer(HEAP_UPDATE_BLKREF_VM_OLD, vmbuffer_old, 0); + /* filtering by origin on a row level is much more efficient */ XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN); diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c index 6cc3bc991c3..1b546268a20 100644 --- a/src/backend/access/heap/heapam_xlog.c +++ b/src/backend/access/heap/heapam_xlog.c @@ -22,6 +22,51 @@ #include "storage/freespace.h" #include "storage/standby.h" +/* + * Helper to clear visibility map bits during heap redo. + * + * This handles records that modify one heap block and its corresponding VM + * block. More complex cases, such as update records that can touch multiple + * heap or VM blocks, must handle VM replay directly. 'flags' specifies which + * visibility map bits to clear. + */ +static void +heap_xlog_vm_clear(XLogReaderState *record, + XLogRecPtr lsn, RelFileLocator target_locator, + BlockNumber heap_blkno, + int vm_block_id, uint8 flags) +{ + Relation reln = CreateFakeRelcacheEntry(target_locator); + Buffer vmbuffer = InvalidBuffer; + + /* + * If this record includes the VM block, use it for redo. Older releases + * did not register the VM buffer when clearing the VM, so keep the + * fallback path to support replay of WAL generated before that fix. + */ + if (XLogRecHasBlockRef(record, vm_block_id)) + { + if (XLogReadBufferForRedo(record, vm_block_id, + &vmbuffer) == BLK_NEEDS_REDO) + { + visibilitymap_clear_locked(reln, + heap_blkno, vmbuffer, + flags); + PageSetLSN(BufferGetPage(vmbuffer), lsn); + } + if (BufferIsValid(vmbuffer)) + UnlockReleaseBuffer(vmbuffer); + } + else + { + visibilitymap_pin(reln, heap_blkno, &vmbuffer); + visibilitymap_clear(reln, heap_blkno, vmbuffer, flags); + ReleaseBuffer(vmbuffer); + } + + FreeFakeRelcacheEntry(reln); +} + /* * Replay XLOG_HEAP2_PRUNE_* records. @@ -360,15 +405,9 @@ heap_xlog_delete(XLogReaderState *record) * already up-to-date. */ if (xlrec->flags & XLH_DELETE_ALL_VISIBLE_CLEARED) - { - Relation reln = CreateFakeRelcacheEntry(target_locator); - Buffer vmbuffer = InvalidBuffer; - - visibilitymap_pin(reln, blkno, &vmbuffer); - visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); - } + heap_xlog_vm_clear(record, lsn, target_locator, + blkno, HEAP_DELETE_BLKREF_VM, + VISIBILITYMAP_VALID_BITS); if (XLogReadBufferForRedo(record, HEAP_DELETE_BLKREF_HEAP, &buffer) == BLK_NEEDS_REDO) @@ -449,15 +488,9 @@ heap_xlog_insert(XLogReaderState *record) * already up-to-date. */ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) - { - Relation reln = CreateFakeRelcacheEntry(target_locator); - Buffer vmbuffer = InvalidBuffer; - - visibilitymap_pin(reln, blkno, &vmbuffer); - visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); - } + heap_xlog_vm_clear(record, lsn, target_locator, + blkno, HEAP_INSERT_BLKREF_VM, + VISIBILITYMAP_VALID_BITS); /* * If we inserted the first and only tuple on the page, re-initialize the @@ -571,19 +604,19 @@ heap_xlog_multi_insert(XLogReaderState *record) (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET))); /* - * The visibility map may need to be fixed even if the heap page is - * already up-to-date. + * If required, clear the VM first, to prevent all-visible temporarily + * being set for a heap page that's not all visible anymore. + * + * If it were possible for XLH_INSERT_ALL_VISIBLE_CLEARED and + * XLH_INSERT_ALL_FROZEN_SET to be present in the same record, doing the + * XLogReadBufferForRedo() before the PageSetAllVisible() below would be a + * problem, as it'd violate the rule that a heap page must never be set + * all-visible in the VM while its PD_ALL_VISIBLE is clear. */ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) - { - Relation reln = CreateFakeRelcacheEntry(rlocator); - Buffer vmbuffer = InvalidBuffer; - - visibilitymap_pin(reln, blkno, &vmbuffer); - visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); - } + heap_xlog_vm_clear(record, lsn, rlocator, + blkno, HEAP_MULTI_INSERT_BLKREF_VM, + VISIBILITYMAP_VALID_BITS); if (isinit) { @@ -698,6 +731,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) Buffer obuffer, nbuffer; Page page; + bool new_cleared, + old_cleared; OffsetNumber offnum; ItemId lp = NULL; HeapTupleData oldtup; @@ -737,14 +772,85 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) * The visibility map may need to be fixed even if the heap page is * already up-to-date. */ - if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED) + new_cleared = (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED) != 0; + old_cleared = (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED) != 0; + if (new_cleared || old_cleared) { Relation reln = CreateFakeRelcacheEntry(rlocator); - Buffer vmbuffer = InvalidBuffer; + bool has_vm_old = XLogRecHasBlockRef(record, HEAP_UPDATE_BLKREF_VM_OLD); + bool has_vm_new = XLogRecHasBlockRef(record, HEAP_UPDATE_BLKREF_VM_NEW); + + if (has_vm_new) + { + Buffer vmbuffer_new = InvalidBuffer; + + Assert(new_cleared); + if (XLogReadBufferForRedo(record, HEAP_UPDATE_BLKREF_VM_NEW, &vmbuffer_new) == + BLK_NEEDS_REDO) + { + /* + * If both the old and new heap pages were all-visible and + * their VM bits are on the same VM page, that single VM page + * is registered as HEAP_UPDATE_BLKREF_VM_NEW. Clear both heap + * blocks' VM bits from the single provided VM buffer. + * + * We must verify that oldblk's VM bits really are on this VM + * page, rather than relying on the absence of a separate + * VM_OLD block reference: VM_OLD is also omitted when oldblk + * is on a different VM page but its bit was already clear. + */ + if (old_cleared && visibilitymap_pin_ok(oldblk, vmbuffer_new)) + visibilitymap_clear_locked(reln, oldblk, vmbuffer_new, + VISIBILITYMAP_VALID_BITS); + visibilitymap_clear_locked(reln, newblk, vmbuffer_new, + VISIBILITYMAP_VALID_BITS); + PageSetLSN(BufferGetPage(vmbuffer_new), lsn); + } + if (BufferIsValid(vmbuffer_new)) + UnlockReleaseBuffer(vmbuffer_new); + } + if (has_vm_old) + { + Buffer vmbuffer_old = InvalidBuffer; + + Assert(old_cleared); + if (XLogReadBufferForRedo(record, HEAP_UPDATE_BLKREF_VM_OLD, &vmbuffer_old) == + BLK_NEEDS_REDO) + { + visibilitymap_clear_locked(reln, oldblk, vmbuffer_old, + VISIBILITYMAP_VALID_BITS); + PageSetLSN(BufferGetPage(vmbuffer_old), lsn); + } + if (BufferIsValid(vmbuffer_old)) + UnlockReleaseBuffer(vmbuffer_old); + } + if (!has_vm_old && !has_vm_new) + { + /* + * Backwards compatibility path. Previously, the VM buffers were + * not registered in the WAL record. We need this path to replay + * WAL generated by a not-yet-patched primary during upgrade. + */ + if (old_cleared) + { + Buffer vmbuffer = InvalidBuffer; + + visibilitymap_pin(reln, oldblk, &vmbuffer); + visibilitymap_clear(reln, oldblk, vmbuffer, + VISIBILITYMAP_VALID_BITS); + ReleaseBuffer(vmbuffer); + } + if (new_cleared) + { + Buffer vmbuffer = InvalidBuffer; + + visibilitymap_pin(reln, newblk, &vmbuffer); + visibilitymap_clear(reln, newblk, vmbuffer, + VISIBILITYMAP_VALID_BITS); + ReleaseBuffer(vmbuffer); + } + } - visibilitymap_pin(reln, oldblk, &vmbuffer); - visibilitymap_clear(reln, oldblk, vmbuffer, VISIBILITYMAP_VALID_BITS); - ReleaseBuffer(vmbuffer); FreeFakeRelcacheEntry(reln); } @@ -793,7 +899,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) /* Mark the page as a candidate for pruning */ PageSetPrunable(page, XLogRecGetXid(record)); - if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED) + if (old_cleared) PageClearAllVisible(page); PageSetLSN(page, lsn); @@ -819,21 +925,6 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) newaction = XLogReadBufferForRedo(record, HEAP_UPDATE_BLKREF_NEW, &nbuffer); - /* - * The visibility map may need to be fixed even if the heap page is - * already up-to-date. - */ - if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED) - { - Relation reln = CreateFakeRelcacheEntry(rlocator); - Buffer vmbuffer = InvalidBuffer; - - visibilitymap_pin(reln, newblk, &vmbuffer); - visibilitymap_clear(reln, newblk, vmbuffer, VISIBILITYMAP_VALID_BITS); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); - } - /* Deal with new tuple */ if (newaction == BLK_NEEDS_REDO) { @@ -930,7 +1021,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update) if (offnum == InvalidOffsetNumber) elog(PANIC, "failed to add tuple"); - if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED) + if (new_cleared) PageClearAllVisible(page); freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */ @@ -1022,20 +1113,15 @@ heap_xlog_lock(XLogReaderState *record) */ if (xlrec->flags & XLH_LOCK_ALL_FROZEN_CLEARED) { - RelFileLocator rlocator; - Buffer vmbuffer = InvalidBuffer; BlockNumber block; - Relation reln; + RelFileLocator rlocator; XLogRecGetBlockTag(record, HEAP_LOCK_BLKREF_HEAP, &rlocator, NULL, &block); - reln = CreateFakeRelcacheEntry(rlocator); - - visibilitymap_pin(reln, block, &vmbuffer); - visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); + heap_xlog_vm_clear(record, lsn, rlocator, + block, HEAP_LOCK_BLKREF_VM, + VISIBILITYMAP_ALL_FROZEN); } if (XLogReadBufferForRedo(record, HEAP_LOCK_BLKREF_HEAP, @@ -1100,20 +1186,15 @@ heap_xlog_lock_updated(XLogReaderState *record) */ if (xlrec->flags & XLH_LOCK_ALL_FROZEN_CLEARED) { - RelFileLocator rlocator; - Buffer vmbuffer = InvalidBuffer; BlockNumber block; - Relation reln; + RelFileLocator rlocator; XLogRecGetBlockTag(record, HEAP_LOCK_BLKREF_HEAP, &rlocator, NULL, &block); - reln = CreateFakeRelcacheEntry(rlocator); - - visibilitymap_pin(reln, block, &vmbuffer); - visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN); - ReleaseBuffer(vmbuffer); - FreeFakeRelcacheEntry(reln); + heap_xlog_vm_clear(record, lsn, rlocator, + block, HEAP_LOCK_BLKREF_VM, + VISIBILITYMAP_ALL_FROZEN); } if (XLogReadBufferForRedo(record, HEAP_LOCK_BLKREF_HEAP, diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c index 1874a3fda37..f02778ba58f 100644 --- a/src/backend/access/heap/visibilitymap.c +++ b/src/backend/access/heap/visibilitymap.c @@ -135,9 +135,27 @@ static Buffer vm_extend(Relation rel, BlockNumber vm_nblocks); * You must pass a buffer containing the correct map page to this function. * Call visibilitymap_pin first to pin the right one. This function doesn't do * any I/O. Returns true if any bits have been cleared and false otherwise. + * + * This is retained for backwards compatability, as it is usually necessary to + * register the VM buffer in the WAL record and this necessitates holding the + * lock for longer than it is held here. */ bool visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags) +{ + bool cleared = false; + + LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE); + + cleared = visibilitymap_clear_locked(rel, heapBlk, vmbuf, flags); + + LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK); + + return cleared; +} + +bool +visibilitymap_clear_locked(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags) { BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk); int mapByte = HEAPBLK_TO_MAPBYTE(heapBlk); @@ -157,7 +175,8 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags if (!BufferIsValid(vmbuf) || BufferGetBlockNumber(vmbuf) != mapBlock) elog(ERROR, "wrong buffer passed to visibilitymap_clear"); - LockBuffer(vmbuf, BUFFER_LOCK_EXCLUSIVE); + Assert(BufferIsExclusiveLocked(vmbuf)); + map = PageGetContents(BufferGetPage(vmbuf)); if (map[mapByte] & mask) @@ -168,8 +187,6 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags cleared = true; } - LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK); - return cleared; } diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl index 0f98c7df82e..fcb701e1b98 100644 --- a/src/bin/pg_walsummary/t/002_blocks.pl +++ b/src/bin/pg_walsummary/t/002_blocks.pl @@ -93,13 +93,14 @@ my $filename = sprintf "%s/pg_wal/summaries/%08s%08s%08s%08s%08s.summary", split(m@/@, $end_lsn); ok(-f $filename, "WAL summary file exists"); -# Run pg_walsummary on it. We expect exactly two blocks to be modified, -# block 0 and one other. +# Run pg_walsummary on it. We expect exactly three blocks to be modified, +# block 0 (old tuple), another block (new tuple), and the block for the VM. my ($stdout, $stderr) = run_command([ 'pg_walsummary', '-i', $filename ]); note($stdout); @lines = split(/\n/, $stdout); like($stdout, qr/FORK main: block 0$/m, "stdout shows block 0 modified"); +like($stdout, qr/FORK vm: block 0$/m, "stdout shows VM block 0 modified"); is($stderr, '', 'stderr is empty'); -is(0 + @lines, 2, "UPDATE modified 2 blocks"); +is(0 + @lines, 3, "UPDATE modified 3 blocks"); done_testing(); diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h index 55e3c7b0015..63062424dad 100644 --- a/src/include/access/heapam_xlog.h +++ b/src/include/access/heapam_xlog.h @@ -225,10 +225,21 @@ typedef struct xl_multi_insert_tuple * * HEAP_UPDATE_BLKREF_OLD: old page, if different. (no data, just a reference * to the block) + * + * HEAP_UPDATE_BLKREF_VM_NEW: VM page covering the new heap page. Registered + * when XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED is set and the new heap page's VM + * bit was actually cleared. Also covers the old heap page's VM bits when both + * heap pages map to the same VM page. + * + * HEAP_UPDATE_BLKREF_VM_OLD: VM page covering the old heap page. Only + * registered when XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED is set, the old heap + * page's VM bits are on a different VM page from the new heap page's, and the + * old heap page's VM bit was actually cleared. */ - -#define HEAP_UPDATE_BLKREF_NEW 0 -#define HEAP_UPDATE_BLKREF_OLD 1 +#define HEAP_UPDATE_BLKREF_NEW 0 +#define HEAP_UPDATE_BLKREF_OLD 1 +#define HEAP_UPDATE_BLKREF_VM_NEW 2 +#define HEAP_UPDATE_BLKREF_VM_OLD 3 typedef struct xl_heap_update { diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h index ea889bf9ec7..1bc59c9ac33 100644 --- a/src/include/access/visibilitymap.h +++ b/src/include/access/visibilitymap.h @@ -26,6 +26,8 @@ #define VM_ALL_FROZEN(r, b, v) \ ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0) +extern bool visibilitymap_clear_locked(Relation rel, BlockNumber heapBlk, + Buffer vmbuf, uint8 flags); extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer vmbuf, uint8 flags); extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk, -- 2.43.0
From cb8ffb20e629eb3d5c4234a02863abaefa0f21f7 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Thu, 30 Apr 2026 16:06:06 -0400 Subject: [PATCH v3 3/4] Put pg_combinebackup debug test output in temp file Having --debug as the default option to pg_combinebackup in init_from_backup() makes the logs extremely verbose. It is used for debugging failures, so put the output in a temp file and only keep it around if tests fail. Author: Melanie Plageman <[email protected]> Disussion: https://postgr.es/m/y73coty2smkxgvelh4e32kqrt5pyn2pb7cz7x4mrz6yaxitqd5%40nzhkeazqf4bn Backpatch-through: 17 --- src/test/perl/PostgreSQL/Test/Cluster.pm | 83 +++++++++++++++++++++++- 1 file changed, 81 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 2e31c23a6ac..3388e45b726 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -125,6 +125,8 @@ our $min_compat = 12; # list of file reservations made by get_free_port my @port_reservation_files; +my @combinebackup_debug_logs; +my %combinebackup_debug_logs_printed; # We want to choose a server port above the range that servers typically use # on Unix systems and below the range those systems typically use for ephemeral @@ -898,7 +900,9 @@ pathnames that should be used for the target cluster. To restore from an incremental backup, pass the parameter combine_with_prior as a reference to an array of prior backup names with which this backup -is to be combined using pg_combinebackup. +is to be combined using pg_combinebackup. pg_combinebackup is run with +--debug, but the output is captured in a temporary file and only copied to the +test log if pg_combinebackup fails or if a later test fails. Streaming replication can be enabled on this node by passing the keyword parameter has_streaming => 1. This is disabled by default. @@ -964,7 +968,7 @@ sub init_from_backup } push @combineargs, @prior_backup_path, $backup_path, '--output' => $data_path; - PostgreSQL::Test::Utils::system_or_bail(@combineargs); + _run_pg_combinebackup_with_debug_log(@combineargs); } elsif (defined $params{tar_program}) { @@ -1079,6 +1083,81 @@ port = $port return; } +sub _run_pg_combinebackup_with_debug_log +{ + my (@cmd) = @_; + my $debug_log = PostgreSQL::Test::Utils::tempdir('pg_combinebackup') . + '/debug.log'; + + push @combinebackup_debug_logs, $debug_log; + + print("# Running: " . join(" ", @cmd) . "\n"); + # If this pg_combinebackup invocation itself fails, print its captured + # output immediately. Successful invocations may still be printed later if + # some subsequent TAP assertion in this test script fails. + if (!IPC::Run::run(\@cmd, '&>' => $debug_log)) + { + _print_pg_combinebackup_debug_log($debug_log); + + if ($? == -1) + { + BAIL_OUT( + sprintf( + "failed to execute command \"%s\": $!", join(" ", @cmd))); + } + elsif ($? & 127) + { + BAIL_OUT( + sprintf( + "command \"%s\" died with signal %d", + join(" ", @cmd), $? & 127)); + } + else + { + BAIL_OUT( + sprintf( + "command \"%s\" exited with value %d", + join(" ", @cmd), $? >> 8)); + } + } +} + +sub _print_pg_combinebackup_debug_log +{ + my ($debug_log) = @_; + + return unless -e $debug_log; + return if $combinebackup_debug_logs_printed{$debug_log}++; + + diag("pg_combinebackup debug output from $debug_log:\n" . + PostgreSQL::Test::Utils::slurp_file($debug_log)); +} + +# Keep pg_combinebackup debug logs until test exit. If pg_combinebackup itself +# fails, its log is printed at the failure site above. If pg_combinebackup +# succeeds but a later TAP assertion fails, print the retained logs here so the +# regress output includes the combine step that produced the restored state. +# Remove the logs after a clean run. +END +{ + # take care not to change the script's exit value + my $exit_code = $?; + + if ($exit_code == 0 && PostgreSQL::Test::Utils::all_tests_passing()) + { + unlink @combinebackup_debug_logs; + } + else + { + foreach my $debug_log (@combinebackup_debug_logs) + { + _print_pg_combinebackup_debug_log($debug_log); + } + } + + $? = $exit_code; +} + =pod =item $node->rotate_logfile() -- 2.43.0
From dd9734f313a4c1410e07a14f0861f53740cec095 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <[email protected]> Date: Thu, 23 Apr 2026 11:13:23 -0400 Subject: [PATCH v3 4/4] Test that VM clear registers VM buffers WAL summarizer only includes registered buffers, so you could end up with data corruption in incremental backups if VM clears were meant to be covered by the incremental backup. Author: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/oqcsevg35xjan2327x5kdfth6q4fgeqboxfo3v3imeyih2uiny%406sez5dzxl6nt Backpatch-through: 17 --- src/bin/pg_combinebackup/meson.build | 1 + .../pg_combinebackup/t/012_vm_consistency.pl | 205 ++++++++++++++++++ 2 files changed, 206 insertions(+) create mode 100644 src/bin/pg_combinebackup/t/012_vm_consistency.pl diff --git a/src/bin/pg_combinebackup/meson.build b/src/bin/pg_combinebackup/meson.build index bbc4c5735ba..757b78e2fe8 100644 --- a/src/bin/pg_combinebackup/meson.build +++ b/src/bin/pg_combinebackup/meson.build @@ -39,6 +39,7 @@ tests += { 't/009_no_full_file.pl', 't/010_hardlink.pl', 't/011_ib_truncation.pl', + 't/012_vm_consistency.pl', ], } } diff --git a/src/bin/pg_combinebackup/t/012_vm_consistency.pl b/src/bin/pg_combinebackup/t/012_vm_consistency.pl new file mode 100644 index 00000000000..7b9c577d19a --- /dev/null +++ b/src/bin/pg_combinebackup/t/012_vm_consistency.pl @@ -0,0 +1,205 @@ +# Copyright (c) 2021-2026, PostgreSQL Global Development Group +# +# Test that heap operations clearing visibility map bits (INSERT, UPDATE, +# DELETE, SELECT FOR UPDATE, COPY) correctly register visibility map buffers, +# since incremental backups rely on the WAL summarizer, which only tracks +# registered buffers. + +use strict; +use warnings FATAL => 'all'; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +my $tempdir = PostgreSQL::Test::Utils::tempdir_short(); +my $mode = $ENV{PG_TEST_PG_COMBINEBACKUP_MODE} || '--copy'; + +# Set up primary with WAL summarization enabled. +my $primary = PostgreSQL::Test::Cluster->new('primary'); +$primary->init(has_archiving => 1, allows_streaming => 1); +$primary->append_conf('postgresql.conf', <<EOF); +summarize_wal = on +autovacuum = off +EOF +$primary->start; + +$primary->safe_psql('postgres', q{CREATE EXTENSION pg_visibility}); + +my @tests = ( + { + label => 'INSERT', + table => 'vm_insert_test', + setup => q{CREATE TABLE vm_insert_test (id int); + INSERT INTO vm_insert_test DEFAULT VALUES;}, + modify => q{INSERT INTO vm_insert_test VALUES (1)}, + visible_op => '<', + frozen_op => '<', + }, + { + label => 'DELETE', + table => 'vm_delete_test', + setup => q{CREATE TABLE vm_delete_test (id int); + INSERT INTO vm_delete_test VALUES (1), (2);}, + modify => q{DELETE FROM vm_delete_test WHERE id = 1}, + visible_op => '<', + frozen_op => '<', + }, + { + label => 'UPDATE', + table => 'vm_update_test', + # Include both same-page and cross-page updates. + setup => q{CREATE TABLE vm_update_test (id INT, val TEXT); + INSERT INTO vm_update_test VALUES (1, 'same page'), (2, 'cross page'); + INSERT INTO vm_update_test SELECT i, repeat('a', 200) + FROM generate_series(3, 70) i;}, + modify => q{UPDATE vm_update_test SET id = 0 WHERE id = 1; + UPDATE vm_update_test SET val = repeat('b', 8000) WHERE id = 2;}, + visible_op => '<', + frozen_op => '<', + }, + { + label => 'LOCK', + table => 'vm_lock_test', + setup => q{CREATE TABLE vm_lock_test (id int); + INSERT INTO vm_lock_test VALUES (1), (2);}, + modify => q{SELECT * FROM vm_lock_test WHERE id = 1 FOR UPDATE}, + visible_op => '==', + frozen_op => '<', + }, + { + label => 'COPY', + table => 'vm_copy_test', + setup => q{CREATE TABLE vm_copy_test (id int); + INSERT INTO vm_copy_test DEFAULT VALUES;}, + modify => q{COPY vm_copy_test FROM PROGRAM 'echo 42'}, + visible_op => '<', + frozen_op => '<', + }, +); + +sub get_vm_summary +{ + my ($node, $table) = @_; + my $result = $node->safe_psql('postgres', + "SELECT all_visible, all_frozen FROM pg_visibility_map_summary('$table')"); + my @vals = split(/\|/, $result); + return @vals; +} + +# Confirm VACUUM (FREEZE) set VM bits before testing whether later heap +# modifications clear those bits and are captured by incremental backup. +sub check_vacuumed_vm +{ + my ($node, $test) = @_; + my ($all_visible, $all_frozen) = get_vm_summary($node, $test->{table}); + + cmp_ok($all_visible, '>', 0, + "$test->{label} test: pages are all-visible after vacuum"); + cmp_ok($all_frozen, '>', 0, + "$test->{label} test: pages are all-frozen after vacuum"); + + return ($all_visible, $all_frozen); +} + +# Check the VM bit counts after a heap modification against the post-vacuum +# baseline. Most operations clear both bits; tuple locking clears all-frozen +# without clearing all-visible. +sub check_modified_vm +{ + my ($node, $test) = @_; + my ($post_visible, $post_frozen) = get_vm_summary($node, $test->{table}); + + cmp_ok($post_visible, $test->{visible_op}, $test->{pre_visible}, + "$test->{label} test: all-visible state after modification"); + cmp_ok($post_frozen, $test->{frozen_op}, $test->{pre_frozen}, + "$test->{label} test: all-frozen state after modification"); + + return ($post_visible, $post_frozen); +} + +# Verify the combined backup restored VM state exactly as it exists on the +# primary, and ask pg_visibility to check that visible tuples are consistent. +sub validate_restored_vm +{ + my ($restored, $test) = @_; + + my ($primary_visible, $primary_frozen) = + get_vm_summary($primary, $test->{table}); + my ($restored_visible, $restored_frozen) = + get_vm_summary($restored, $test->{table}); + + is($restored_visible, $primary_visible, + "$test->{label} test: restored all_visible count matches primary"); + is($restored_frozen, $primary_frozen, + "$test->{label} test: restored all_frozen count matches primary"); + + my $corrupt_tids = $restored->safe_psql('postgres', + "SELECT count(*) FROM pg_check_visible('$test->{table}')"); + is($corrupt_tids, '0', + "$test->{label} test: no VM corruption detected by pg_check_visible"); +} + +# Create and populate the tables, then vacuum freeze them to set the VM bits +foreach my $test (@tests) +{ + $primary->safe_psql('postgres', $test->{setup}); + $primary->safe_psql('postgres', "VACUUM (FREEZE) $test->{table}"); + ($test->{pre_visible}, $test->{pre_frozen}) = + check_vacuumed_vm($primary, $test); +} + +# Take a full backup +my $full_name = 'full'; +my $full_path = $primary->backup_dir . "/$full_name"; +$primary->command_ok( + [ + 'pg_basebackup', '--no-sync', + '--pgdata' => $full_path, + '--checkpoint' => 'fast', + ], + 'full backup'); + +# Modify the tables and check that the VM bits are as expected for that test +# after the specified modification. +foreach my $test (@tests) +{ + $primary->safe_psql('postgres', $test->{modify}); + ($test->{post_visible}, $test->{post_frozen}) = + check_modified_vm($primary, $test); +} + +# Take an incremental backup. This will have the changes made in the +# modification step. +my $incr_name = 'incr'; +my $incr_path = $primary->backup_dir . "/$incr_name"; +$primary->command_ok( + [ + 'pg_basebackup', '--no-sync', + '--pgdata' => $incr_path, + '--checkpoint' => 'fast', + '--incremental' => $full_path . '/backup_manifest', + ], + 'incremental backup'); + +# Start a server from a combined backup composed of the incremental and full +# backup. +my $restored = PostgreSQL::Test::Cluster->new('restored'); +$restored->init_from_backup($primary, $incr_name, + combine_with_prior => [$full_name], + combine_mode => $mode); +$restored->append_conf('postgresql.conf', <<EOF); +autovacuum = off +EOF +$restored->start; +$restored->safe_psql('postgres', q{CREATE EXTENSION IF NOT EXISTS pg_visibility}); + +# Confirm that the restored server's visibility map matches the original server +foreach my $test (@tests) +{ + validate_restored_vm($restored, $test); +} + +$restored->stop; +$primary->stop; + +done_testing(); -- 2.43.0
