On 2013-12-03 15:46:09 -0500, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
> > I'd rather have a readily-verifiable fix that changes WAL format than a
> > tricky fix that avoids doing so.  So, modulo not having seen the change, +1.
>
> Yeah, same here.
I am afraid it won't be *that* simple. We still need code to look into
multis, check whether all their members are ok wrt. cutoff_xid, and replace
them either by the contained xid or by a new multi containing only the
still-living members. Ugly.

There's currently also the issue that heap_freeze_tuple() modifies the
tuple in place without a critical section. We're executing
HeapTupleSatisfiesVacuum() before we get to WAL logging, and that has
plenty of rope to hang itself on. So that doesn't really seem ok to me.

Attached is a pre-pre-alpha patch for this. To fix the issue with the
missing critical section it splits freezing into
heap_prepare_freeze_tuple() and heap_execute_freeze_tuple(). The former
doesn't touch the tuple and is executed on the primary; the latter actually
performs the modifications and is executed both during normal processing
and during recovery. A sketch of the resulting calling pattern follows
below.

Needs a fair bit of work:
* Should move parts of the multixact processing into multixact.c;
  specifically, it shouldn't require CreateMultiXactId() to be exported.
* It relies on forward-declaring a struct in heapam.h that's actually
  defined in heapam_xlog.h - that's pretty ugly.
* Needs any form of testing beyond make check/isolationcheck run across SR.
* Lots of the surrounding comments need to be added/reworked.
* Has a simpler version of Alvaro's patch to HTSV in there.
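To make the intended calling pattern a bit more concrete, here is a minimal
sketch of how a caller ends up using the split API, modelled on the
vacuumlazy.c hunk in the attached patch. The function name
freeze_page_sketch() and its offnums/noffs inputs are made up for
illustration; the calls are the ones the patch introduces:

#include "postgres.h"

#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/multixact.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

/*
 * Sketch only: freeze all eligible tuples among offnums[0..noffs-1] on one
 * heap page, using the prepare/execute split.  Caller holds an exclusive
 * lock on buf.
 */
static void
freeze_page_sketch(Relation rel, Buffer buf,
				   OffsetNumber *offnums, int noffs,
				   TransactionId freeze_limit, MultiXactId multi_cutoff)
{
	Page		page = BufferGetPage(buf);
	xl_heap_freeze_tuple frozen[MaxOffsetNumber];
	int			nfrozen = 0;
	int			i;

	/* Phase 1: decide what to do; the page is not modified here */
	for (i = 0; i < noffs; i++)
	{
		ItemId		itemid = PageGetItemId(page, offnums[i]);
		HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

		if (heap_prepare_freeze_tuple(htup, freeze_limit, multi_cutoff,
									  &frozen[nfrozen]))
			frozen[nfrozen++].off = offnums[i];
	}

	if (nfrozen == 0)
		return;

	/* Phase 2: apply and WAL log the prepared plans, atomically */
	START_CRIT_SECTION();

	MarkBufferDirty(buf);

	for (i = 0; i < nfrozen; i++)
	{
		ItemId		itemid = PageGetItemId(page, frozen[i].off);
		HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

		heap_execute_freeze_tuple(htup, &frozen[i]);
	}

	if (RelationNeedsWAL(rel))
		PageSetLSN(page, log_heap_freeze(rel, buf, freeze_limit,
										 frozen, nfrozen));

	END_CRIT_SECTION();
}

Note that the code inside the critical section only applies precomputed
freeze plans - HeapTupleSatisfiesVacuum(), GetMultiXactIdMembers() et al.
all run before it - and recovery replays exactly the same plans via
heap_execute_freeze_tuple() from heap_xlog_freeze_page().

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services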
>From 330d128665fcf8633e60a42e8e4a497e2975dac0 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 4 Dec 2013 00:11:20 +0100
Subject: [PATCH] WIP: new freezing format

---
 src/backend/access/heap/heapam.c       | 476 +++++++++++++++++++++++++--------
 src/backend/access/rmgrdesc/heapdesc.c |   9 +
 src/backend/access/transam/multixact.c |   3 +-
 src/backend/commands/vacuumlazy.c      |  28 +-
 src/backend/utils/time/tqual.c         |   6 +-
 src/include/access/heapam.h            |   7 +-
 src/include/access/heapam_xlog.h       |  32 ++-
 src/include/access/multixact.h         |   1 +
 8 files changed, 443 insertions(+), 119 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c13f87c..b80fa5b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5242,14 +5242,17 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 	CacheInvalidateHeapTuple(relation, tuple, NULL);
 }
 
-
 /*
- * heap_freeze_tuple
+ * heap_prepare_freeze_tuple
  *
  * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
- * are older than the specified cutoff XID.  If so, replace them with
- * FrozenTransactionId or InvalidTransactionId as appropriate, and return
- * TRUE.  Return FALSE if nothing was changed.
+ * are older than the specified cutoff XID.  If so, return enough state to
+ * later execute and WAL log replacing them with FrozenTransactionId or
+ * InvalidTransactionId as appropriate, and return TRUE.  Return FALSE if
+ * nothing was changed.
+ *
+ * The 'off' field of the freeze state has to be set by the caller, not here,
+ * if required.
  *
  * It is assumed that the caller has checked the tuple with
  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
@@ -5258,54 +5261,44 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
  * XID older than it could neither be running nor seen as running by any
  * open transaction.  This ensures that the replacement will not change
- * anyone's idea of the tuple state.  Also, since we assume the tuple is
- * not HEAPTUPLE_DEAD, the fact that an XID is not still running allows us
- * to assume that it is either committed good or aborted, as appropriate;
- * so we need no external state checks to decide what to do.  (This is good
- * because this function is applied during WAL recovery, when we don't have
- * access to any such state, and can't depend on the hint bits to be set.)
- * There is an exception we make which is to assume GetMultiXactIdMembers can
- * be called during recovery.
- *
+ * anyone's idea of the tuple state.
  * Similarly, cutoff_multi must be less than or equal to the smallest
  * MultiXactId used by any transaction currently open.
  *
  * If the tuple is in a shared buffer, caller must hold an exclusive lock on
  * that buffer.
  *
- * Note: it might seem we could make the changes without exclusive lock, since
- * TransactionId read/write is assumed atomic anyway.  However there is a race
- * condition: someone who just fetched an old XID that we overwrite here could
- * conceivably not finish checking the XID against pg_clog before we finish
- * the VACUUM and perhaps truncate off the part of pg_clog he needs.  Getting
- * exclusive lock ensures no other backend is in process of checking the
- * tuple status.  Also, getting exclusive lock makes it safe to adjust the
- * infomask bits.
- *
- * NB: Cannot rely on hint bits here, they might not be set after a crash or
- * on a standby.
+ * NB: It is not enough that hint bits indicate something is committed/invalid
+ * - they might not be set on a standby/after crash recovery. So we really
+ * need to remove old xids.
  */
 bool
-heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
-				  MultiXactId cutoff_multi)
+heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
+						  TransactionId cutoff_multi, xl_heap_freeze_tuple *frz)
 {
 	bool		changed = false;
 	bool		freeze_xmax = false;
 	TransactionId xid;
 
+	frz->freeze_xmin = false;
+	frz->invalid_xvac = false;
+	frz->freeze_xvac = false;
+	frz->t_infomask2 = tuple->t_infomask2;
+	frz->t_infomask = tuple->t_infomask;
+	frz->xmax = HeapTupleHeaderGetRawXmax(tuple);
+
 	/* Process xmin */
 	xid = HeapTupleHeaderGetXmin(tuple);
 	if (TransactionIdIsNormal(xid) &&
 		TransactionIdPrecedes(xid, cutoff_xid))
 	{
-		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
-
+		frz->freeze_xmin = true;
 		/*
 		 * Might as well fix the hint bits too; usually XMIN_COMMITTED will
 		 * already be set here, but there's a small chance not.
 		 */
-		Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
-		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+		frz->t_infomask |= HEAP_XMIN_COMMITTED;
 		changed = true;
 	}
 
@@ -5332,81 +5325,139 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			/*
 			 * This old multi cannot possibly be running.  If it was a locker
 			 * only, it can be removed without much further thought; but if it
-			 * contained an update, we need to preserve it.
+			 * contained an update, we might need to preserve it.
 			 */
 			if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+			{
 				freeze_xmax = true;
+			}
 			else
 			{
-				TransactionId update_xid;
+				/* replace multi by update xid */
+				frz->xmax = HeapTupleGetUpdateXid(tuple);
+				frz->t_infomask &= ~HEAP_XMAX_BITS;
+				frz->t_infomask &= ~HEAP_XMAX_IS_MULTI;
 
-				update_xid = HeapTupleGetUpdateXid(tuple);
+				/* wasn't only a lock, xid needs to be valid */
+				Assert(TransactionIdIsValid(frz->xmax));
 
 				/*
-				 * The multixact has an update hidden within.  Get rid of it.
-				 *
-				 * If the update_xid is below the cutoff_xid, it necessarily
-				 * must be an aborted transaction.  In a primary server, such
-				 * an Xmax would have gotten marked invalid by
-				 * HeapTupleSatisfiesVacuum, but in a replica that is not
-				 * called before we are, so deal with it in the same way.
-				 *
-				 * If not below the cutoff_xid, then the tuple would have been
-				 * pruned by vacuum, if the update committed long enough ago,
-				 * and we wouldn't be freezing it; so it's either recently
-				 * committed, or in-progress.  Deal with this by setting the
-				 * Xmax to the update Xid directly and remove the IS_MULTI
-				 * bit.  (We know there cannot be running lockers in this
-				 * multi, because it's below the cutoff_multi value.)
+				 * If the xid is older than the cutoff, it has to have
+				 * aborted, otherwise it would have gotten pruned away.
				 */
-
-				if (TransactionIdPrecedes(update_xid, cutoff_xid))
+				if (TransactionIdPrecedes(frz->xmax, cutoff_xid))
 				{
-					Assert(InRecovery || TransactionIdDidAbort(update_xid));
+					Assert(!TransactionIdDidCommit(frz->xmax));
 					freeze_xmax = true;
 				}
 				else
 				{
-					Assert(InRecovery || !TransactionIdIsInProgress(update_xid));
-					tuple->t_infomask &= ~HEAP_XMAX_BITS;
-					HeapTupleHeaderSetXmax(tuple, update_xid);
-					changed = true;
+					/* preserve xmax */
 				}
+				changed = true;
 			}
 		}
-		else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+		else if (MultiXactIdIsRunning(xid))
 		{
-			/* newer than the cutoff, so don't touch it */
-			;
+			/* cannot be below cutoff */
 		}
 		else
 		{
-			TransactionId update_xid;
+			TransactionId update_xid = InvalidTransactionId;
+			MultiXactMember *members = NULL;
+			MultiXactMember *newmembers = NULL;
+			int			nmembers;
+			int			nnewmembers = 0;
+			bool		has_live_members = false;
+			bool		mxact_needs_freeze = false;
+			int			i;
 
 			/*
-			 * This is a multixact which is not marked LOCK_ONLY, but which
-			 * is newer than the cutoff_multi.  If the update_xid is below the
-			 * cutoff_xid point, then we can just freeze the Xmax in the
-			 * tuple, removing it altogether.  This seems simple, but there
-			 * are several underlying assumptions:
-			 *
-			 * 1. A tuple marked by an multixact containing a very old
-			 * committed update Xid would have been pruned away by vacuum; we
-			 * wouldn't be freezing this tuple at all.
-			 *
-			 * 2. There cannot possibly be any live locking members remaining
-			 * in the multixact.  This is because if they were alive, the
-			 * update's Xid would had been considered, via the lockers'
-			 * snapshot's Xmin, as part the cutoff_xid.
-			 *
-			 * 3. We don't create new MultiXacts via MultiXactIdExpand() that
-			 * include a very old aborted update Xid: in that function we only
-			 * include update Xids corresponding to transactions that are
-			 * committed or in-progress.
+			 * For MultiXacts that are not below the cutoff, we need to check
+			 * whether any of the members are too old.
 			 */
-			update_xid = HeapTupleGetUpdateXid(tuple);
-			if (TransactionIdPrecedes(update_xid, cutoff_xid))
+			nmembers = GetMultiXactIdMembers(xid, &members, false);
+
+			if (nmembers <= 0)
+			{
+				/* pg_upgrade'd multi, just freeze away */
+				freeze_xmax = true;
+			}
+			else
+			{
+				newmembers = (MultiXactMember *)
+					palloc(sizeof(MultiXactMember) * (nmembers + 1));
+
+				for (i = 0; i < nmembers; i++)
+				{
+					bool		keep = false;
+					bool		isupdate = ISUPDATE_from_mxstatus(members[i].status);
+
+					if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
+					{
+						/*
+						 * A potential updater could not have committed; the
+						 * tuple would have gotten vacuumed away already.
+						 */
+						Assert(!isupdate || TransactionIdDidAbort(members[i].xid));
+						mxact_needs_freeze = true;
+					}
+					else if (TransactionIdIsInProgress(members[i].xid))
+					{
+						keep = true;
+						if (isupdate)
+							update_xid = members[i].xid;
+					}
+					else if (TransactionIdDidCommit(members[i].xid) && isupdate)
+					{
+						/*
+						 * Only updates need to be preserved when they have
+						 * committed, locks aren't interesting anymore.
+						 */
+						keep = true;
+						update_xid = members[i].xid;
+					}
+
+					if (keep)
+					{
+						newmembers[nnewmembers++] = members[i];
+						has_live_members = true;
+					}
+				}
+			}
+
+			if (!mxact_needs_freeze)
+			{
+				/* nothing to do */;
+			}
+			else if (has_live_members &&
+					 TransactionIdIsValid(update_xid) &&
+					 nnewmembers == 1)
+			{
+				/* only the updater is still alive, replace multixact by xid */
+				frz->xmax = update_xid;
+				frz->t_infomask &= ~HEAP_XMAX_BITS;
+				frz->t_infomask |= HEAP_XMAX_COMMITTED;
+				changed = true;
+				elog(LOG, "replace multi2");
+				/* do not clear HEAP_HOT_UPDATED, HEAP_KEYS_UPDATED just yet */
+			}
+			else if (has_live_members)
+			{
+				frz->xmax = CreateMultiXactId(nnewmembers, newmembers);
+				changed = true;
+				elog(LOG, "recreating multi");
+			}
+			else
+			{
 				freeze_xmax = true;
+			}
+
+			/* cleanup memory we might have allocated */
+			if (nmembers > 0)
+				pfree(members);
+			if (newmembers != NULL)
+				pfree(newmembers);
 		}
 	}
 	else if (TransactionIdIsNormal(xid) &&
@@ -5417,20 +5468,21 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 
 	if (freeze_xmax)
 	{
-		HeapTupleHeaderSetXmax(tuple, InvalidTransactionId);
+		frz->xmax = InvalidTransactionId;
 
 		/*
 		 * The tuple might be marked either XMAX_INVALID or XMAX_COMMITTED +
 		 * LOCKED.  Normalize to INVALID just to be sure no one gets confused.
 		 * Also get rid of the HEAP_KEYS_UPDATED bit.
 		 */
-		tuple->t_infomask &= ~HEAP_XMAX_BITS;
-		tuple->t_infomask |= HEAP_XMAX_INVALID;
-		HeapTupleHeaderClearHotUpdated(tuple);
-		tuple->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+		frz->t_infomask &= ~HEAP_XMAX_BITS;
+		frz->t_infomask |= HEAP_XMAX_INVALID;
+		frz->t_infomask2 &= ~HEAP_HOT_UPDATED;
+		frz->t_infomask2 &= ~HEAP_KEYS_UPDATED;
 		changed = true;
 	}
 
+
 	/*
 	 * Old-style VACUUM FULL is gone, but we have to keep this code as long as
 	 * we support having MOVED_OFF/MOVED_IN tuples in the database.
@@ -5447,16 +5499,16 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		 * xvac transaction succeeded.
 		 */
 		if (tuple->t_infomask & HEAP_MOVED_OFF)
-			HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
+			frz->invalid_xvac = true;
 		else
-			HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
+			frz->freeze_xvac = true;
 
 		/*
 		 * Might as well fix the hint bits too; usually XMIN_COMMITTED
 		 * will already be set here, but there's a small chance not.
 		 */
 		Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
-		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+		frz->t_infomask |= HEAP_XMIN_COMMITTED;
 		changed = true;
 	}
 }
@@ -5464,6 +5516,59 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	return changed;
 }
 
+
+/*
+ * heap_freeze_tuple - freeze tuple in place without WAL logging.
+ *
+ * Useful for callers like CLUSTER that perform their own WAL logging.
+ */
+bool
+heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
+				  TransactionId cutoff_multi)
+{
+	xl_heap_freeze_tuple frz;
+	bool		do_freeze;
+
+	do_freeze = heap_prepare_freeze_tuple(tuple, cutoff_xid, cutoff_multi, &frz);
+	if (do_freeze)
+		heap_execute_freeze_tuple(tuple, &frz);
+	return do_freeze;
+}
+
+/*
+ * heap_execute_freeze_tuple
+ *
+ * Execute the prepared freezing of a tuple.
+ *
+ * Note: it might seem we could make the changes without exclusive lock, since
+ * TransactionId read/write is assumed atomic anyway.  However there is a race
+ * condition: someone who just fetched an old XID that we overwrite here could
+ * conceivably not finish checking the XID against pg_clog before we finish
+ * the VACUUM and perhaps truncate off the part of pg_clog he needs.  Getting
+ * exclusive lock ensures no other backend is in process of checking the
+ * tuple status.  Also, getting exclusive lock makes it safe to adjust the
+ * infomask bits.
+ *
+ * NB: All code in here must be safe to execute during crash recovery!
+ */
+void
+heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
+{
+	if (frz->freeze_xmin)
+		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
+
+	HeapTupleHeaderSetXmax(tuple, frz->xmax);
+
+	if (frz->freeze_xvac)
+		HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
+
+	if (frz->invalid_xvac)
+		HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
+
+	tuple->t_infomask = frz->t_infomask;
+	tuple->t_infomask2 = frz->t_infomask2;
+}
+
 /*
  * For a given MultiXactId, return the hint bits that should be set in the
  * tuple's infomask.
@@ -5767,16 +5872,26 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		}
 		else if (MultiXactIdPrecedes(multi, cutoff_multi))
 			return true;
-		else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
-		{
-			/* only-locker multis don't need internal examination */
-			;
-		}
 		else
 		{
-			if (TransactionIdPrecedes(HeapTupleGetUpdateXid(tuple),
-									  cutoff_xid))
-				return true;
+			MultiXactMember *members;
+			int			nmembers;
+			int			i;
+
+			/* need to check whether any member of the mxact is too old */
+
+			nmembers = GetMultiXactIdMembers(multi, &members, false);
+
+			for (i = 0; i < nmembers; i++)
+			{
+				if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
+				{
+					pfree(members);
+					return true;
+				}
+			}
+			if (nmembers > 0)
+				pfree(members);
 		}
 	}
 	else
@@ -6031,22 +6146,22 @@ log_heap_clean(Relation reln, Buffer buffer,
  */
 XLogRecPtr
 log_heap_freeze(Relation reln, Buffer buffer,
-				TransactionId cutoff_xid, MultiXactId cutoff_multi,
-				OffsetNumber *offsets, int offcnt)
+				TransactionId cutoff_xid,
+				xl_heap_freeze_tuple *tuples, int ntuples)
 {
-	xl_heap_freeze xlrec;
+	xl_heap_freeze_page xlrec;
 	XLogRecPtr	recptr;
 	XLogRecData rdata[2];
 
 	/* Caller should not call me on a non-WAL-logged relation */
 	Assert(RelationNeedsWAL(reln));
 	/* nor when there are no tuples to freeze */
-	Assert(offcnt > 0);
+	Assert(ntuples > 0);
 
 	xlrec.node = reln->rd_node;
 	xlrec.block = BufferGetBlockNumber(buffer);
 	xlrec.cutoff_xid = cutoff_xid;
-	xlrec.cutoff_multi = cutoff_multi;
+	xlrec.ntuples = ntuples;
 
 	rdata[0].data = (char *) &xlrec;
 	rdata[0].len = SizeOfHeapFreeze;
@@ -6058,13 +6173,13 @@ log_heap_freeze(Relation reln, Buffer buffer,
 	 * it is.  When XLogInsert stores the whole buffer, the offsets array need
 	 * not be stored too.
 	 */
-	rdata[1].data = (char *) offsets;
-	rdata[1].len = offcnt * sizeof(OffsetNumber);
+	rdata[1].data = (char *) tuples;
+	rdata[1].len = ntuples * SizeOfHeapFreezeTuple;
 	rdata[1].buffer = buffer;
 	rdata[1].buffer_std = true;
 	rdata[1].next = NULL;
 
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE, rdata);
+	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE, rdata);
 
 	return recptr;
 }
@@ -6406,6 +6521,99 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
 		XLogRecordPageWithFreeSpace(xlrec->node, xlrec->block, freespace);
 }
 
+/*
+ * Freeze a single tuple for XLOG_HEAP2_FREEZE
+ *
+ * NB: Records of this type aren't generated anymore, since bugs around
+ * multixacts couldn't be fixed without a more robust type of freezing.  This
+ * is kept around to be able to perform PITR.
+ */
+static bool
+heap_xlog_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
+					   MultiXactId cutoff_multi)
+{
+	bool		changed = false;
+	TransactionId xid;
+
+	xid = HeapTupleHeaderGetXmin(tuple);
+	if (TransactionIdIsNormal(xid) &&
+		TransactionIdPrecedes(xid, cutoff_xid))
+	{
+		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
+
+		/*
+		 * Might as well fix the hint bits too; usually XMIN_COMMITTED will
+		 * already be set here, but there's a small chance not.
+		 */
+		Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
+		tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+		changed = true;
+	}
+
+	/*
+	 * Note that this code handles IS_MULTI Xmax values, too, but only to mark
+	 * the tuple as not updated if the multixact is below the cutoff Multixact
+	 * given; it doesn't remove dead members of a very old multixact.
+	 */
+	xid = HeapTupleHeaderGetRawXmax(tuple);
+	if ((tuple->t_infomask & HEAP_XMAX_IS_MULTI) ?
+		(MultiXactIdIsValid(xid) &&
+		 MultiXactIdPrecedes(xid, cutoff_multi)) :
+		(TransactionIdIsNormal(xid) &&
+		 TransactionIdPrecedes(xid, cutoff_xid)))
+	{
+		HeapTupleHeaderSetXmax(tuple, InvalidTransactionId);
+
+		/*
+		 * The tuple might be marked either XMAX_INVALID or XMAX_COMMITTED +
+		 * LOCKED.  Normalize to INVALID just to be sure no one gets confused.
+		 * Also get rid of the HEAP_KEYS_UPDATED bit.
+		 */
+		tuple->t_infomask &= ~HEAP_XMAX_BITS;
+		tuple->t_infomask |= HEAP_XMAX_INVALID;
+		HeapTupleHeaderClearHotUpdated(tuple);
+		tuple->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+		changed = true;
+	}
+
+	/*
+	 * Old-style VACUUM FULL is gone, but we have to keep this code as long as
+	 * we support having MOVED_OFF/MOVED_IN tuples in the database.
+	 */
+	if (tuple->t_infomask & HEAP_MOVED)
+	{
+		xid = HeapTupleHeaderGetXvac(tuple);
+		if (TransactionIdIsNormal(xid) &&
+			TransactionIdPrecedes(xid, cutoff_xid))
+		{
+			/*
+			 * If a MOVED_OFF tuple is not dead, the xvac transaction must
+			 * have failed; whereas a non-dead MOVED_IN tuple must mean the
+			 * xvac transaction succeeded.
+			 */
+			if (tuple->t_infomask & HEAP_MOVED_OFF)
+				HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
+			else
+				HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
+
+			/*
+			 * Might as well fix the hint bits too; usually XMIN_COMMITTED
+			 * will already be set here, but there's a small chance not.
+			 */
+			Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
+			tuple->t_infomask |= HEAP_XMIN_COMMITTED;
+			changed = true;
+		}
+	}
+
+	return changed;
+}
+
+/*
+ * NB: Records of this type aren't generated anymore, since bugs around
+ * multixacts couldn't be fixed without a more robust type of freezing.  This
+ * is kept around to be able to perform PITR.
+ */
 static void
 heap_xlog_freeze(XLogRecPtr lsn, XLogRecord *record)
 {
@@ -6454,7 +6662,7 @@ heap_xlog_freeze(XLogRecPtr lsn, XLogRecord *record)
 			ItemId		lp = PageGetItemId(page, *offsets);
 			HeapTupleHeader tuple = (HeapTupleHeader) PageGetItem(page, lp);
 
-			(void) heap_freeze_tuple(tuple, cutoff_xid, cutoff_multi);
+			(void) heap_xlog_freeze_tuple(tuple, cutoff_xid, cutoff_multi);
 			offsets++;
 		}
 	}
@@ -6578,6 +6786,59 @@ heap_xlog_visible(XLogRecPtr lsn, XLogRecord *record)
 	}
 }
 
+/*
+ * Replay XLOG_HEAP2_FREEZE_PAGE records
+ */
+static void
+heap_xlog_freeze_page(XLogRecPtr lsn, XLogRecord *record)
+{
+	xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) XLogRecGetData(record);
+	TransactionId cutoff_xid = xlrec->cutoff_xid;
+	Buffer		buffer;
+	Page		page;
+	int			ntup;
+
+	/*
+	 * In Hot Standby mode, ensure that there are no queries running which
+	 * still consider the frozen xids as running.
+	 */
+	if (InHotStandby)
+		ResolveRecoveryConflictWithSnapshot(cutoff_xid, xlrec->node);
+
+	/* If we have a full-page image, restore it and we're done */
+	if (record->xl_info & XLR_BKP_BLOCK(0))
+	{
+		(void) RestoreBackupBlock(lsn, record, 0, false, false);
+		return;
+	}
+
+	buffer = XLogReadBuffer(xlrec->node, xlrec->block, false);
+	if (!BufferIsValid(buffer))
+		return;
+
+	page = (Page) BufferGetPage(buffer);
+
+	if (lsn <= PageGetLSN(page))
+	{
+		UnlockReleaseBuffer(buffer);
+		return;
+	}
+
+	/* now execute freeze plan for each frozen tuple */
+	for (ntup = 0; ntup < xlrec->ntuples; ntup++)
+	{
+		xl_heap_freeze_tuple *xlrec_tp = &xlrec->tuples[ntup];
+		/* offsets are one-based */
+		ItemId		lp = PageGetItemId(page, xlrec_tp->off);
+		HeapTupleHeader tuple = (HeapTupleHeader) PageGetItem(page, lp);
+
+		heap_execute_freeze_tuple(tuple, xlrec_tp);
+	}
+
+	PageSetLSN(page, lsn);
+	MarkBufferDirty(buffer);
+	UnlockReleaseBuffer(buffer);
+}
+
 static void
 heap_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
 {
@@ -7433,6 +7694,9 @@ heap2_redo(XLogRecPtr lsn, XLogRecord *record)
 		case XLOG_HEAP2_CLEAN:
 			heap_xlog_clean(lsn, record);
 			break;
+		case XLOG_HEAP2_FREEZE_PAGE:
+			heap_xlog_freeze_page(lsn, record);
+			break;
 		case XLOG_HEAP2_CLEANUP_INFO:
 			heap_xlog_cleanup_info(lsn, record);
 			break;
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index e14c053..1b244b1 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -149,6 +149,15 @@ heap2_desc(StringInfo buf, uint8 xl_info, char *rec)
 						 xlrec->node.relNode, xlrec->block,
 						 xlrec->latestRemovedXid);
 	}
+	else if (info == XLOG_HEAP2_FREEZE_PAGE)
+	{
+		xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
+
+		appendStringInfo(buf, "freeze_page: rel %u/%u/%u; blk %u; cutoff xid %u ntuples %u",
+						 xlrec->node.spcNode, xlrec->node.dbNode,
+						 xlrec->node.relNode, xlrec->block,
+						 xlrec->cutoff_xid, xlrec->ntuples);
+	}
 	else if (info == XLOG_HEAP2_CLEANUP_INFO)
 	{
 		xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) rec;
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 2081470..2a1bf6f 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -286,7 +286,6 @@ static MemoryContext MXactContext = NULL;
 
 /* internal MultiXactId management */
 static void MultiXactIdSetOldestVisible(void);
-static MultiXactId CreateMultiXactId(int nmembers, MultiXactMember *members);
 static void RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 				   int nmembers, MultiXactMember *members);
 static MultiXactId GetNewMultiXactId(int nmembers, MultiXactOffset *offset);
@@ -672,7 +671,7 @@ ReadNextMultiXactId(void)
  *
  * NB: the passed members[] array will be sorted in-place.
  */
-static MultiXactId
+MultiXactId
 CreateMultiXactId(int nmembers, MultiXactMember *members)
 {
 	MultiXactId multi;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fe2d9e7..538f3b8 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -500,13 +500,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		bool		tupgone,
 					hastup;
 		int			prev_dead_count;
-		OffsetNumber frozen[MaxOffsetNumber];
+		xl_heap_freeze_tuple frozen[MaxOffsetNumber];	/* FIXME: stack ok? */
 		int			nfrozen;
 		Size		freespace;
 		bool		all_visible_according_to_vm;
 		bool		all_visible;
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
+		int			i;
 
 		if (blkno == next_not_all_visible_block)
 		{
@@ -894,9 +895,9 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 				 * Each non-removable tuple must be checked to see if it needs
 				 * freezing.  Note we already have exclusive buffer lock.
 				 */
-				if (heap_freeze_tuple(tuple.t_data, FreezeLimit,
-									  MultiXactCutoff))
-					frozen[nfrozen++] = offnum;
+				if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+											  MultiXactCutoff, &frozen[nfrozen]))
+					frozen[nfrozen++].off = offnum;
 			}
 		}						/* scan along page */
 
@@ -907,15 +908,32 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		 */
 		if (nfrozen > 0)
 		{
+			START_CRIT_SECTION();
+
 			MarkBufferDirty(buf);
+
+			/* execute collected freezes */
+			for (i = 0; i < nfrozen; i++)
+			{
+				ItemId		itemid;
+				HeapTupleHeader htup;
+
+				itemid = PageGetItemId(page, frozen[i].off);
+				htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+				heap_execute_freeze_tuple(htup, &frozen[i]);
+			}
+
+			/* Now WAL-log freezing if necessary */
 			if (RelationNeedsWAL(onerel))
 			{
 				XLogRecPtr	recptr;
 
 				recptr = log_heap_freeze(onerel, buf, FreezeLimit,
-										 MultiXactCutoff, frozen, nfrozen);
+										 frozen, nfrozen);
 				PageSetLSN(page, recptr);
 			}
+
+			END_CRIT_SECTION();
 		}
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 1ebc5ff..67d5fec 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -598,11 +598,13 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 
 			/* no member, even just a locker, alive anymore */
 			if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple)))
+			{
 				SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
 							InvalidTransactionId);
+				return HeapTupleMayBeUpdated;
+			}
 
-			/* it must have aborted or crashed */
-			return HeapTupleMayBeUpdated;
+			return HeapTupleBeingUpdated;
 		}
 
 		if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0d40398..e5864bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -148,8 +148,13 @@ extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 			bool follow_update,
 			Buffer *buffer, HeapUpdateFailureData *hufd);
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
+struct xl_heap_freeze_tuple;
 extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
-				  TransactionId cutoff_multi);
+				  TransactionId cutoff_multi);
+extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
+				  TransactionId cutoff_multi, struct xl_heap_freeze_tuple *frz);
+extern void heap_execute_freeze_tuple(HeapTupleHeader tuple, struct xl_heap_freeze_tuple *xlrec_tp);
 extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 						MultiXactId cutoff_multi, Buffer buf);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 4381778..d2baa5b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -50,7 +50,7 @@
  */
 #define XLOG_HEAP2_FREEZE		0x00
 #define XLOG_HEAP2_CLEAN		0x10
-/* 0x20 is free, was XLOG_HEAP2_CLEAN_MOVE */
+#define XLOG_HEAP2_FREEZE_PAGE	0x20
 #define XLOG_HEAP2_CLEANUP_INFO 0x30
 #define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
@@ -251,6 +251,33 @@ typedef struct xl_heap_freeze
 
 #define SizeOfHeapFreeze (offsetof(xl_heap_freeze, cutoff_multi) + sizeof(MultiXactId))
 
+/* This is what we need to know about tuple freezing during vacuum */
+typedef struct xl_heap_freeze_tuple
+{
+	OffsetNumber off;
+	bool		freeze_xmin;
+	bool		invalid_xvac;
+	bool		freeze_xvac;
+	TransactionId xmax;
+	uint16		t_infomask2;
+	uint16		t_infomask;
+} xl_heap_freeze_tuple;
+
+#define SizeOfHeapFreezeTuple sizeof(xl_heap_freeze_tuple)
+
+/* This is what we need to know about a page's worth of tuple freezes */
+typedef struct xl_heap_freeze_page
+{
+	RelFileNode node;
+	BlockNumber block;
+	TransactionId cutoff_xid;
+	uint16		ntuples;
+	xl_heap_freeze_tuple tuples[1];
+} xl_heap_freeze_page;
+
+#define MinSizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, tuples))
+
 
 /* This is what we need to know about setting a visibility map bit */
 typedef struct xl_heap_visible
 {
@@ -277,8 +304,7 @@ extern XLogRecPtr log_heap_clean(Relation reln, Buffer buffer,
 				OffsetNumber *nowunused, int nunused,
 				TransactionId latestRemovedXid);
 extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
-				TransactionId cutoff_xid, MultiXactId cutoff_multi,
-				OffsetNumber *offsets, int offcnt);
+				TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples, int ntuples);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
 				  Buffer vm_buffer, TransactionId cutoff_xid);
 extern XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum,
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 6085ea3..cad91e2 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -79,6 +79,7 @@ typedef struct xl_multixact_create
 
 extern MultiXactId MultiXactIdCreate(TransactionId xid1, MultiXactStatus status1,
 				  TransactionId xid2, MultiXactStatus status2);
+extern MultiXactId CreateMultiXactId(int nmembers, MultiXactMember *members);
 extern MultiXactId MultiXactIdExpand(MultiXactId multi, TransactionId xid,
 				  MultiXactStatus status);
 extern MultiXactId ReadNextMultiXactId(void);
-- 
1.8.5.rc2.dirty