On 2016-07-16 10:45:26 -0700, Andres Freund wrote:
> On July 16, 2016 8:49:06 AM PDT, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Amit Kapila <amit.kapil...@gmail.com> writes:
> > > On Sat, Jul 16, 2016 at 7:02 AM, Andres Freund <and...@anarazel.de> wrote:
> > > > I think we have two choices how to deal with that: First, we can add
> > > > a new flags variable to xl_heap_lock similar to
> > > > xl_heap_insert/update/... and bump page magic,
> > >
> > > +1 for going in this way. This will keep us consistent with how we
> > > clear the visibility info in other places like heap_xlog_update().
> >
> > Yeah. We've already forced a catversion bump for beta3, and I'm about
> > to go fix PG_CONTROL_VERSION as well, so there's basically no downside
> > to doing an xlog version bump as well. At least, not if you can get it
> > in before Monday.
>
> OK, cool. Will do it later today.
Took till today. Attached is a rather heavily revised version of
Sawada-san's patch. Most notably: the recovery routines now take care to
reset the vm in all cases, visibilitymap_get_status is no longer called
from inside a critical section, and heap_lock_updated_tuple_rec() also
resets the vm (although I'm not entirely sure that case can be hit in
practice).

I'm doing some more testing, and Robert said he could take a quick look
at the patch. If somebody else... Will push sometime after dinner.

Regards,

Andres
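P.S. For anyone who wants the short version before reading the hunks: the
pattern the patch applies in each tuple-locking path boils down to the
following. This is a paraphrased sketch of the code below, not a compilable
excerpt; declarations, the multixact handling and error paths are elided:

	/* Pin the vm page *before* taking the buffer lock, outside any
	 * critical section, in case the all-frozen bit needs clearing. */
	buffer = ReadBuffer(relation, block);
	if (PageIsAllVisible(BufferGetPage(buffer)))
		visibilitymap_pin(relation, block, &vmbuffer);
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	/* ... compute the new xmax / infomask for the lock ... */

	/* Clear only the all-frozen bit. The return value says whether
	 * anything was actually cleared, so no visibilitymap_get_status()
	 * call is needed inside the critical section. */
	if (PageIsAllVisible(BufferGetPage(buffer)) &&
		visibilitymap_clear(relation, block, vmbuffer,
							VISIBILITYMAP_ALL_FROZEN))
		cleared_all_frozen = true;

	/* ... and carry that fact into the WAL record, so redo can repeat
	 * the clearing on the standby. */
	xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;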
From 26f6eff8cef9b436e328a7364d6e4954b702208b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sun, 17 Jul 2016 19:30:38 -0700
Subject: [PATCH] Clear all-frozen visibilitymap status when locking tuples.

Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to
avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so
correctly relies on not adding xids to the heap without also unsetting
the visibilitymap flag. The tuple-locking code has not done so.

To avoid pessimizing heap_lock_tuple, allow selectively resetting the
all-frozen bit with visibilitymap_clear(). To avoid having to use
visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical
section, have visibilitymap_clear() return whether any bits have been
reset.

The flags fields added to xl_heap_lock and xl_heap_lock_updated require
bumping the WAL magic. Since there's already been a catversion bump
since the last beta, that's not an issue.

Author: Masahiko Sawada, heavily revised by Andres Freund
Discussion: CAEepm=3fwabwryvw9swhylty4sxvf0xblvxqowuodincx9m...@mail.gmail.com
Backpatch: -
---
 src/backend/access/heap/heapam.c        | 126 +++++++++++++++++++++++++++++---
 src/backend/access/heap/visibilitymap.c |  18 +++--
 src/backend/access/rmgrdesc/heapdesc.c  |   6 +-
 src/backend/commands/vacuumlazy.c       |   6 +-
 src/include/access/heapam_xlog.h        |   9 ++-
 src/include/access/visibilitymap.h      |   4 +-
 src/include/access/xlog_internal.h      |   2 +-
 7 files changed, 145 insertions(+), 26 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2815d91..1216f3f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2423,7 +2423,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
-							vmbuffer);
+							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
@@ -2737,7 +2737,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			PageClearAllVisible(page);
 			visibilitymap_clear(relation,
 								BufferGetBlockNumber(buffer),
-								vmbuffer);
+								vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
 
 		/*
@@ -3239,7 +3239,7 @@ l1:
 		all_visible_cleared = true;
 		PageClearAllVisible(page);
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
-							vmbuffer);
+							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/* store transaction information of xact deleting the tuple */
@@ -3925,6 +3925,7 @@ l2:
 		TransactionId xmax_lock_old_tuple;
 		uint16		infomask_lock_old_tuple,
 					infomask2_lock_old_tuple;
+		bool		cleared_all_frozen = false;
 
 		/*
 		 * To prevent concurrent sessions from updating the tuple, we have to
@@ -3968,6 +3969,17 @@ l2:
 		/* temporarily make it look not-updated, but locked */
 		oldtup.t_data->t_ctid = oldtup.t_self;
 
+		/*
+		 * Clear all-frozen bit on visibility map if needed. We could
+		 * immediately reset ALL_VISIBLE, but given that the WAL logging
+		 * overhead would be unchanged, that doesn't seem necessarily
+		 * worthwhile.
+		 */
+		if (PageIsAllVisible(BufferGetPage(buffer)) &&
+			visibilitymap_clear(relation, block, vmbuffer,
+								VISIBILITYMAP_ALL_FROZEN))
+			cleared_all_frozen = true;
+
 		MarkBufferDirty(buffer);
 
 		if (RelationNeedsWAL(relation))
@@ -3982,6 +3994,8 @@ l2:
 			xlrec.locking_xid = xmax_lock_old_tuple;
 			xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
 												  oldtup.t_data->t_infomask2);
+			xlrec.flags =
+				cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
 			XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
 			recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
 			PageSetLSN(page, recptr);
@@ -4159,20 +4173,20 @@ l2:
 	/* record address of new tuple in t_ctid of old one */
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
-	/* clear PD_ALL_VISIBLE flags */
+	/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
-							vmbuffer);
+							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
 	{
 		all_visible_cleared_new = true;
 		PageClearAllVisible(BufferGetPage(newbuf));
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
-							vmbuffer_new);
+							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
 
 	if (newbuf != buffer)
@@ -4556,6 +4570,8 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	ItemPointer tid = &(tuple->t_self);
 	ItemId		lp;
 	Page		page;
+	Buffer		vmbuffer = InvalidBuffer;
+	BlockNumber block;
 	TransactionId xid,
 				xmax;
 	uint16		old_infomask,
@@ -4563,8 +4579,18 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 				new_infomask2;
 	bool		first_time = true;
 	bool		have_tuple_lock = false;
+	bool		cleared_all_frozen = false;
 
 	*buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+	block = ItemPointerGetBlockNumber(tid);
+
+	/*
+	 * Before locking the buffer, pin the visibility map page if it may be
+	 * necessary.
+	 */
+	if (PageIsAllVisible(BufferGetPage(*buffer)))
+		visibilitymap_pin(relation, block, &vmbuffer);
+
 	LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 
 	page = BufferGetPage(*buffer);
@@ -5094,6 +5120,13 @@ failed:
 	if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
 		tuple->t_data->t_ctid = *tid;
 
+	/* Clear only the all-frozen bit on visibility map if needed */
+	if (PageIsAllVisible(page) &&
+		visibilitymap_clear(relation, block, vmbuffer,
+							VISIBILITYMAP_ALL_FROZEN))
+		cleared_all_frozen = true;
+
+
 	MarkBufferDirty(*buffer);
 
 	/*
@@ -5120,6 +5153,7 @@ failed:
 		xlrec.locking_xid = xid;
 		xlrec.infobits_set = compute_infobits(new_infomask,
 											  tuple->t_data->t_infomask2);
+		xlrec.flags = cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
 		XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
 
 		/* we don't decode row locks atm, so no need to log the origin */
@@ -5132,6 +5166,8 @@ failed:
 	END_CRIT_SECTION();
 
 	LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
+	if (BufferIsValid(vmbuffer))
+		ReleaseBuffer(vmbuffer);
 
 	/*
 	 * Don't update the visibility map here. Locking a tuple doesn't change
@@ -5587,6 +5623,9 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 	TransactionId xmax,
 				new_xmax;
 	TransactionId priorXmax = InvalidTransactionId;
+	bool		cleared_all_frozen = false;
+	Buffer		vmbuffer;
+	BlockNumber block;
 
 	ItemPointerCopy(tid, &tupid);
 
@@ -5594,6 +5633,7 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 	{
 		new_infomask = 0;
 		new_xmax = InvalidTransactionId;
+		block = ItemPointerGetBlockNumber(&tupid);
 		ItemPointerCopy(&tupid, &(mytup.t_self));
 
 		if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL))
@@ -5610,6 +5650,16 @@ heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
 
 l4:
 		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Before locking the buffer, pin the visibility map page if it may be
+		 * necessary.
+		 */
+		if (PageIsAllVisible(BufferGetPage(buf)))
+			visibilitymap_pin(rel, block, &vmbuffer);
+		else
+			vmbuffer = InvalidBuffer;
+
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		/*
@@ -5749,6 +5799,11 @@ l4:
 											xid, mode, false,
 											&new_xmax, &new_infomask,
 											&new_infomask2);
+		if (PageIsAllVisible(BufferGetPage(buf)) &&
+			visibilitymap_clear(rel, block, vmbuffer,
+								VISIBILITYMAP_ALL_FROZEN))
+			cleared_all_frozen = true;
+
 		START_CRIT_SECTION();
 
 		/* ... and set them */
@@ -5773,6 +5828,8 @@ l4:
 			xlrec.offnum = ItemPointerGetOffsetNumber(&mytup.t_self);
 			xlrec.xmax = new_xmax;
 			xlrec.infobits_set = compute_infobits(new_infomask, new_infomask2);
+			xlrec.flags =
+				cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
 
 			XLogRegisterData((char *) &xlrec, SizeOfHeapLockUpdated);
 
@@ -5789,6 +5846,9 @@ l4:
 			HeapTupleHeaderIsOnlyLocked(mytup.t_data))
 		{
 			UnlockReleaseBuffer(buf);
+			if (vmbuffer != InvalidBuffer)
+				ReleaseBuffer(vmbuffer);
+
 			return HeapTupleMayBeUpdated;
 		}
 
@@ -5796,6 +5856,8 @@ l4:
 		priorXmax = HeapTupleHeaderGetUpdateXid(mytup.t_data);
 		ItemPointerCopy(&(mytup.t_data->t_ctid), &tupid);
 		UnlockReleaseBuffer(buf);
+		if (vmbuffer != InvalidBuffer)
+			ReleaseBuffer(vmbuffer);
 	}
 }
 
@@ -8107,7 +8169,7 @@ heap_xlog_delete(XLogReaderState *record)
 		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_clear(reln, blkno, vmbuffer);
+		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -8185,7 +8247,7 @@ heap_xlog_insert(XLogReaderState *record)
 		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_clear(reln, blkno, vmbuffer);
+		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -8305,7 +8367,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, blkno, &vmbuffer);
-		visibilitymap_clear(reln, blkno, vmbuffer);
+		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -8460,7 +8522,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, oldblk, &vmbuffer);
-		visibilitymap_clear(reln, oldblk, vmbuffer);
+		visibilitymap_clear(reln, oldblk, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -8544,7 +8606,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		Buffer		vmbuffer = InvalidBuffer;
 
 		visibilitymap_pin(reln, newblk, &vmbuffer);
-		visibilitymap_clear(reln, newblk, vmbuffer);
+		visibilitymap_clear(reln, newblk, vmbuffer, VISIBILITYMAP_VALID_BITS);
 		ReleaseBuffer(vmbuffer);
 		FreeFakeRelcacheEntry(reln);
 	}
@@ -8724,6 +8786,27 @@ heap_xlog_lock(XLogReaderState *record)
 	ItemId		lp = NULL;
 	HeapTupleHeader htup;
 
+	/*
+	 * The visibility map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLH_LOCK_ALL_FROZEN_CLEARED)
+	{
+		RelFileNode rnode;
+		Buffer		vmbuffer = InvalidBuffer;
+		BlockNumber block;
+		Relation	reln;
+
+		XLogRecGetBlockTag(record, 0, &rnode, NULL, &block);
+		reln = CreateFakeRelcacheEntry(rnode);
+
+		visibilitymap_pin(reln, block, &vmbuffer);
+		visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN);
+
+		ReleaseBuffer(vmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
 		page = (Page) BufferGetPage(buffer);
@@ -8776,6 +8859,27 @@ heap_xlog_lock_updated(XLogReaderState *record)
 
 	xlrec = (xl_heap_lock_updated *) XLogRecGetData(record);
 
+	/*
+	 * The visibility map may need to be fixed even if the heap page is
+	 * already up-to-date.
+	 */
+	if (xlrec->flags & XLH_LOCK_ALL_FROZEN_CLEARED)
+	{
+		RelFileNode rnode;
+		Buffer		vmbuffer = InvalidBuffer;
+		BlockNumber block;
+		Relation	reln;
+
+		XLogRecGetBlockTag(record, 0, &rnode, NULL, &block);
+		reln = CreateFakeRelcacheEntry(rnode);
+
+		visibilitymap_pin(reln, block, &vmbuffer);
+		visibilitymap_clear(reln, block, vmbuffer, VISIBILITYMAP_ALL_FROZEN);
+
+		ReleaseBuffer(vmbuffer);
+		FreeFakeRelcacheEntry(reln);
+	}
+
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
 		page = BufferGetPage(buffer);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index b472d31..b60d8e4 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -11,7 +11,7 @@
  * src/backend/access/heap/visibilitymap.c
  *
  * INTERFACE ROUTINES
- *		visibilitymap_clear  - clear a bit in the visibility map
+ *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
  *		visibilitymap_set	 - set a bit in a previously pinned page
@@ -159,20 +159,23 @@ static void vm_extend(Relation rel, BlockNumber nvmblocks);
 
 
 /*
- *	visibilitymap_clear - clear all bits for one page in visibility map
+ *	visibilitymap_clear - clear bit(s) for one page in visibility map
 *
 * You must pass a buffer containing the correct map page to this function.
 * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
+ * any I/O. Returns whether any bits have been cleared.
 */
-void
-visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
+bool
+visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf, uint8 flags)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	int			mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
 	int			mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	uint8		mask = VISIBILITYMAP_VALID_BITS << mapOffset;
+	uint8		mask = flags << mapOffset;
 	char	   *map;
+	bool		cleared = false;
+
+	Assert(flags & VISIBILITYMAP_VALID_BITS);
 
 #ifdef TRACE_VISIBILITYMAP
 	elog(DEBUG1, "vm_clear %s %d", RelationGetRelationName(rel), heapBlk);
@@ -189,9 +192,12 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
 		map[mapByte] &= ~mask;
 		MarkBufferDirty(buf);
+		cleared = true;
 	}
 
 	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+	return cleared;
 }
 
 /*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 2b31ea4..7c763b6 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -85,7 +85,8 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u ", xlrec->offnum, xlrec->locking_xid);
+		appendStringInfo(buf, "off %u: xid %u: flags %u ",
+						 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
 	else if (info == XLOG_HEAP_INPLACE)
@@ -138,7 +139,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u ", xlrec->offnum, xlrec->xmax);
+		appendStringInfo(buf, "off %u: xmax %u: flags %u ",
+						 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
 	else if (info == XLOG_HEAP2_NEW_CID)
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 32b6fdd..4075f4d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1179,7 +1179,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		{
 			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
 				 relname, blkno);
-			visibilitymap_clear(onerel, blkno, vmbuffer);
+			visibilitymap_clear(onerel, blkno, vmbuffer,
+								VISIBILITYMAP_VALID_BITS);
 		}
 
 		/*
@@ -1201,7 +1202,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 				 relname, blkno);
 			PageClearAllVisible(page);
 			MarkBufferDirty(buf);
-			visibilitymap_clear(onerel, blkno, vmbuffer);
+			visibilitymap_clear(onerel, blkno, vmbuffer,
+								VISIBILITYMAP_VALID_BITS);
 		}
 
 		/*
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index a822d0b..06a8242 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -243,15 +243,19 @@ typedef struct xl_heap_cleanup_info
 #define XLHL_XMAX_KEYSHR_LOCK		0x08
 #define XLHL_KEYS_UPDATED			0x10
 
+/* flag bits for xl_heap_lock / xl_heap_lock_updated's flag field */
+#define XLH_LOCK_ALL_FROZEN_CLEARED		0x01
+
 /* This is what we need to know about lock */
 typedef struct xl_heap_lock
 {
 	TransactionId locking_xid;	/* might be a MultiXactId not xid */
 	OffsetNumber offnum;		/* locked tuple's offset on page */
 	int8		infobits_set;	/* infomask and infomask2 bits to set */
+	uint8		flags;			/* XLH_LOCK_* flag bits */
 } xl_heap_lock;
 
-#define SizeOfHeapLock	(offsetof(xl_heap_lock, infobits_set) + sizeof(int8))
+#define SizeOfHeapLock	(offsetof(xl_heap_lock, flags) + sizeof(int8))
 
 /* This is what we need to know about locking an updated version of a row */
 typedef struct xl_heap_lock_updated
@@ -259,9 +263,10 @@ typedef struct xl_heap_lock_updated
 	TransactionId xmax;
 	OffsetNumber offnum;
 	uint8		infobits_set;
+	uint8		flags;
 } xl_heap_lock_updated;
 
-#define SizeOfHeapLockUpdated	(offsetof(xl_heap_lock_updated, infobits_set) + sizeof(uint8))
+#define SizeOfHeapLockUpdated	(offsetof(xl_heap_lock_updated, flags) + sizeof(uint8))
 
 /* This is what we need to know about confirmation of speculative insertion */
 typedef struct xl_heap_confirm
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fca99ca..00bbd4c 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -34,8 +34,8 @@
 #define VM_ALL_FROZEN(r, b, v) \
 	((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)
 
-extern void visibilitymap_clear(Relation rel, BlockNumber heapBlk,
-					Buffer vmbuf);
+extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
+					Buffer vmbuf, uint8 flags);
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 				  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 2627519..0a595cc 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -31,7 +31,7 @@
 /*
  * Each page of XLOG file has a header like this:
  */
-#define XLOG_PAGE_MAGIC 0xD092	/* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD093	/* can be used as WAL version indicator */
 
 typedef struct XLogPageHeaderData
 {
-- 
2.8.1
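P.P.S. Any out-of-tree code calling visibilitymap_clear() will need a
trivial adjustment for the signature change, along these lines (hypothetical
call site shown, receiving the same treatment as the in-core callers above):

	/* before this patch: unconditionally clears both bits, returns void */
	visibilitymap_clear(rel, blkno, vmbuffer);

	/* after: callers pass the bit(s) to clear, and learn whether any bit
	 * was actually set and got cleared -- usable without a separate
	 * visibilitymap_get_status() call inside a critical section */
	if (visibilitymap_clear(rel, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS))
	{
		/* e.g. arrange for XLH_LOCK_ALL_FROZEN_CLEARED to go into WAL */
	}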