On Sat, Jan 29, 2022 at 8:42 PM Peter Geoghegan <p...@bowt.ie> wrote: > Attached is v7, a revision that overhauls the algorithm that decides > what to freeze. I'm now calling it block-driven freezing in the commit > message. Also included is a new patch, that makes VACUUM record zero > free space in the FSM for an all-visible page, unless the total amount > of free space happens to be greater than one half of BLCKSZ.
I pushed the earlier refactoring and instrumentation patches today.

Attached is v8. No real changes -- just a rebased version. It will be easier to benchmark and test the block-driven freezing stuff now, since the master/baseline case will now output instrumentation showing how relfrozenxid has been advanced (if at all). Whether (and to what extent) each VACUUM operation advances relfrozenxid can now be directly compared, just by monitoring the log_autovacuum_min_duration output for a given table over time.

--
Peter Geoghegan
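Concretely, the relevant log_autovacuum_min_duration output looks something like this (the numbers are invented for illustration; the "newly frozen" field on the pages line is only present with 0002 applied, while the relfrozenxid/relminmxid lines are already produced by the instrumentation on master):

pages: 0 removed, 10000 remain, 2143 scanned (21.43% of total), 2140 newly frozen (21.40% of total)
new relfrozenxid: 52337, which is 9423 xids ahead of previous value
new relminmxid: 181, which is 14 mxids ahead of previous value

Watching how far those last two lines move with each successive VACUUM of the same table is what makes the patched and unpatched cases directly comparable.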
From 41136d2a8af434a095ce3e6dfdfbe4b48b9ec338 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <pg@bowt.ie> Date: Sun, 23 Jan 2022 21:10:38 -0800 Subject: [PATCH v8 3/3] Add all-visible FSM heuristic. When recording the free space of an all-visible page, record zero free space when the page has less than half BLCKSZ worth of free space, according to the traditional definition. Otherwise record free space as usual. Making all-visible pages resistant to change like this can be thought of as a form of hysteresis. The page is given an opportunity to "settle" and permanently stay in the same state when the tuples on the page will never be updated or deleted. But when they are updated or deleted, the page can once again be used to store any tuple. In many workloads, most pages tend to settle permanently over time, perhaps only after a second or third attempt. --- src/backend/access/heap/vacuumlazy.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index ea4b75189..95049ed25 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1231,6 +1231,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers) */ freespace = PageGetHeapFreeSpace(page); + /* + * An all-visible page should not have its free space + * available from FSM unless it's more than half empty + */ + if (PageIsAllVisible(page) && freespace < BLCKSZ / 2) + freespace = 0; + UnlockReleaseBuffer(buf); RecordPageWithFreeSpace(vacrel->rel, blkno, freespace); continue; @@ -1368,6 +1375,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers) { Size freespace = PageGetHeapFreeSpace(page); + /* + * An all-visible page should not have its free space available + * from FSM unless it's more than half empty + */ + if (PageIsAllVisible(page) && freespace < BLCKSZ / 2) + freespace = 0; + UnlockReleaseBuffer(buf); RecordPageWithFreeSpace(vacrel->rel, blkno, freespace); } @@ -2537,6 +2551,13 @@ lazy_vacuum_heap_rel(LVRelState *vacrel) page = BufferGetPage(buf); freespace = PageGetHeapFreeSpace(page); + /* + * An all-visible page should not have its free space available from + * FSM unless it's more than half empty + */ + if (PageIsAllVisible(page) && freespace < BLCKSZ / 2) + freespace = 0; + UnlockReleaseBuffer(buf); RecordPageWithFreeSpace(vacrel->rel, tblk, freespace); vacuumed_pages++; -- 2.30.2
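To make the 0003 heuristic easy to eyeball in isolation, here is a minimal standalone sketch of the same test (not the patched vacuumlazy.c code; it assumes the default 8KB BLCKSZ, which puts the cutoff at 4096 bytes):

#include <stdio.h>
#include <stdbool.h>

#define BLCKSZ 8192				/* assumes the default block size */

/*
 * Free space reported to the FSM under the 0003 heuristic: an all-visible
 * page reports zero free space unless it is more than half empty, giving
 * the page a chance to "settle".
 */
static size_t
reported_freespace(bool all_visible, size_t freespace)
{
	if (all_visible && freespace < BLCKSZ / 2)
		return 0;
	return freespace;
}

int
main(void)
{
	printf("%zu\n", reported_freespace(true, 3000));	/* 0 -- page left to settle */
	printf("%zu\n", reported_freespace(true, 5000));	/* 5000 -- more than half empty */
	printf("%zu\n", reported_freespace(false, 3000));	/* 3000 -- not all-visible */
	return 0;
}

The upshot is that a mostly-full all-visible page simply stops being offered by the FSM until it becomes more than half empty again, at which point it is treated exactly as before.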
From 4838bd1f11b748d2082caedfe4da506b8fe3f67a Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <pg@bowt.ie> Date: Mon, 13 Dec 2021 15:00:49 -0800 Subject: [PATCH v8 2/3] Make block-level characteristics drive freezing. Teach VACUUM to freeze all of the tuples on a page whenever it notices that it would otherwise mark the page all-visible, without also marking it all-frozen. VACUUM won't freeze _any_ tuples on the page unless _all_ tuples (that remain after pruning) are all-visible. It may occasionally be necessary to freeze the page due to the presence of a particularly old XID, from before VACUUM's FreezeLimit cutoff. But the FreezeLimit mechanism will seldom have any impact on which pages are frozen anymore -- it is just a backstop now. Freezing can now informally be thought of as something that takes place at the level of an entire page, or not at all -- differences in XIDs among tuples on the same page are not interesting, barring extreme cases. Freezing a page is now practically synonymous with setting the page to all-visible in the visibility map, at least to users. The main upside of the new approach to freezing is that it makes the overhead of vacuuming much more predictable over time. We avoid the need for large balloon payments, since the system no longer accumulates "freezing debt" that can only be paid off by anti-wraparound vacuuming. This seems to have been particularly troublesome with append-only tables, especially in the common case where XIDs from pages that are marked all-visible for the first time are still fairly young (in particular, not as old as indicated by VACUUM's vacuum_freeze_min_age freezing cutoff). Before now, nothing stopped these pages from being set to all-visible (without also being set to all-frozen) the first time they were reached by VACUUM, which meant that they just couldn't be frozen until the next anti-wraparound VACUUM -- at which point the XIDs from the unfrozen tuples might be much older than vacuum_freeze_min_age. In summary, the old vacuum_freeze_min_age-based FreezeLimit cutoff could not _reliably_ limit freezing debt unless the GUC was set to 0. There is a virtuous cycle enabled by the new approach to freezing: freezing more tuples earlier during non-aggressive VACUUMs allows us to advance relfrozenxid eagerly, which buys time. This creates every opportunity for the workload to naturally generate enough dead tuples (or newly inserted tuples) to make the autovacuum launcher launch a non-aggressive autovacuum. The overall effect is that most individual tables no longer require _any_ anti-wraparound vacuum operations. This effect also owes much to the enhancement added by commit ?????, which loosened the coupling between freezing and advancing relfrozenxid, allowing VACUUM to precisely determine a new relfrozenxid. It's still possible (and sometimes even likely) that VACUUM won't be able to freeze a tuple with a somewhat older XID due only to a cleanup lock not being immediately available. It's even possible that some VACUUM operations will fail to advance relfrozenxid by very many XIDs as a consequence. But the impact over time should be negligible. The next VACUUM operation for the table will effectively get a new opportunity to freeze (or perhaps remove) the same tuple that was originally missed. Once that happens, relfrozenxid will completely catch up. 
(Actually, one could reasonably argue that we never really "fell behind" in the first place -- the amount of freezing needed to significantly advance relfrozenxid won't have measurably increased at any point. A once-off drop in the extent to which VACUUM can advance relfrozenxid is almost certainly harmless noise.) Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com --- src/backend/access/heap/vacuumlazy.c | 84 ++++++++++++++++++++++++---- 1 file changed, 72 insertions(+), 12 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index d481a300b..ea4b75189 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -169,6 +169,7 @@ typedef struct LVRelState /* VACUUM operation's cutoff for pruning */ TransactionId OldestXmin; + MultiXactId OldestMxact; /* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */ TransactionId FreezeLimit; MultiXactId MultiXactCutoff; @@ -200,6 +201,7 @@ typedef struct LVRelState BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */ BlockNumber frozenskipped_pages; /* # frozen pages skipped via VM */ BlockNumber removed_pages; /* # pages removed by relation truncation */ + BlockNumber newly_frozen_pages; /* # pages with tuples frozen by us */ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ @@ -474,6 +476,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, /* Set cutoffs for entire VACUUM */ vacrel->OldestXmin = OldestXmin; + vacrel->OldestMxact = OldestMxact; vacrel->FreezeLimit = FreezeLimit; vacrel->MultiXactCutoff = MultiXactCutoff; @@ -654,12 +657,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->relnamespace, vacrel->relname, vacrel->num_index_scans); - appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"), + appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u newly frozen (%.2f%% of total)\n"), vacrel->removed_pages, vacrel->rel_pages, vacrel->scanned_pages, orig_rel_pages == 0 ? 0 : - 100.0 * vacrel->scanned_pages / orig_rel_pages); + 100.0 * vacrel->scanned_pages / orig_rel_pages, + vacrel->newly_frozen_pages, + orig_rel_pages == 0 ? 0 : + 100.0 * vacrel->newly_frozen_pages / orig_rel_pages); appendStringInfo(&buf, _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"), (long long) vacrel->tuples_deleted, @@ -827,6 +833,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers) vacrel->scanned_pages = 0; vacrel->frozenskipped_pages = 0; vacrel->removed_pages = 0; + vacrel->newly_frozen_pages = 0; vacrel->lpdead_item_pages = 0; vacrel->missed_dead_pages = 0; vacrel->nonempty_pages = 0; @@ -1027,7 +1034,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers) /* * SKIP_PAGES_THRESHOLD (threshold for skipping) was not * crossed, or this is the last page. Scan the page, even - * though it's all-visible (and possibly even all-frozen). + * though it's all-visible (and likely all-frozen, too). 
*/ all_visible_according_to_vm = true; } @@ -1589,7 +1596,7 @@ lazy_scan_prune(LVRelState *vacrel, ItemId itemid; HeapTupleData tuple; HTSV_Result res; - int tuples_deleted, + int tuples_deleted = 0, lpdead_items, recently_dead_tuples, num_tuples, @@ -1600,6 +1607,9 @@ lazy_scan_prune(LVRelState *vacrel, xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage]; TransactionId NewRelfrozenxid; MultiXactId NewRelminmxid; + TransactionId FreezeLimit = vacrel->FreezeLimit; + MultiXactId MultiXactCutoff = vacrel->MultiXactCutoff; + bool freezeblk = false; Assert(BufferGetBlockNumber(buf) == blkno); @@ -1610,7 +1620,6 @@ retry: /* Initialize (or reset) page-level counters */ NewRelfrozenxid = vacrel->NewRelfrozenxid; NewRelminmxid = vacrel->NewRelminmxid; - tuples_deleted = 0; lpdead_items = 0; recently_dead_tuples = 0; num_tuples = 0; @@ -1625,9 +1634,9 @@ retry: * lpdead_items's final value can be thought of as the number of tuples * that were deleted from indexes. */ - tuples_deleted = heap_page_prune(rel, buf, vistest, - InvalidTransactionId, 0, &nnewlpdead, - &vacrel->offnum); + tuples_deleted += heap_page_prune(rel, buf, vistest, + InvalidTransactionId, 0, &nnewlpdead, + &vacrel->offnum); /* * Now scan the page to collect LP_DEAD items and check for tuples @@ -1678,11 +1687,16 @@ retry: * vacrel->nonempty_pages value) is inherently race-prone. It must be * treated as advisory/unreliable, so we might as well be slightly * optimistic. + * + * We delay setting all_visible to false due to seeing an LP_DEAD + * item. We need to test "is the page all_visible if we just consider + * remaining tuples with tuple storage?" below, when considering if we + * should freeze the tuples on the page. (all_visible will be set to + * false for caller once we've decided on what to freeze.) */ if (ItemIdIsDead(itemid)) { deadoffsets[lpdead_items++] = offnum; - prunestate->all_visible = false; prunestate->has_lpdead_items = true; continue; } @@ -1816,8 +1830,8 @@ retry: if (heap_prepare_freeze_tuple(tuple.t_data, vacrel->relfrozenxid, vacrel->relminmxid, - vacrel->FreezeLimit, - vacrel->MultiXactCutoff, + FreezeLimit, + MultiXactCutoff, &frozen[nfrozen], &tuple_totally_frozen, &NewRelfrozenxid, @@ -1837,6 +1851,50 @@ retry: vacrel->offnum = InvalidOffsetNumber; + /* + * Freeze the whole page using OldestXmin (not FreezeLimit) as our cutoff + * if the page is now eligible to be marked all_visible (barring any + * LP_DEAD items) when the page is not already eligible to be marked + * all_frozen. We generally expect to freeze all of a block's tuples + * together and at once, or none at all. FreezeLimit is just a backstop + * mechanism that makes sure that we don't overlook one or two older + * tuples. + * + * For example, it's just about possible that successive VACUUM operations + * will never quite manage to use the main block-level logic to freeze one + * old tuple from a page where all other tuples are continually updated. + * We should not be in any hurry to freeze such a tuple. Even still, it's + * better if we take care of it before an anti-wraparound VACUUM becomes + * necessary -- that would mean that we'd have to wait for a cleanup lock + * during the aggressive VACUUM, which has risks of its own. + * + * FIXME This code structure has been used for prototyping and testing the + * algorithm, details of which have settled. Code itself to be rewritten, + * though. It is backwards right now -- should be _starting_ with + * OldestXmin (not FreezeLimit), since that's what happens at the + * conceptual level. 
+ * + * TODO Make vacuum_freeze_min_age GUC/reloption default -1, which will be + * interpreted as "whatever autovacuum_freeze_max_age/2 is". Idea is to + * make FreezeLimit into a true backstop, and to do our best to avoid + * waiting for a cleanup lock (always prefer to punt to the next VACUUM, + * since we can advance relfrozenxid to the oldest XID on the page inside + * lazy_scan_noprune). + */ + if (!freezeblk && + ((nfrozen > 0 && nfrozen < num_tuples) || + (prunestate->all_visible && !prunestate->all_frozen))) + { + freezeblk = true; + FreezeLimit = vacrel->OldestXmin; + MultiXactCutoff = vacrel->OldestMxact; + goto retry; + } + + /* Time to define all_visible in a way that accounts for LP_DEAD items */ + if (lpdead_items > 0) + prunestate->all_visible = false; + /* * We have now divided every item on the page into either an LP_DEAD item * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple @@ -1854,6 +1912,8 @@ retry: { Assert(prunestate->hastup); + vacrel->newly_frozen_pages++; + /* * At least one tuple with storage needs to be frozen -- execute that * now. @@ -1882,7 +1942,7 @@ retry: { XLogRecPtr recptr; - recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit, + recptr = log_heap_freeze(vacrel->rel, buf, FreezeLimit, frozen, nfrozen); PageSetLSN(page, recptr); } -- 2.30.2
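The heart of 0002 is the retry trigger in lazy_scan_prune. Stripped of the LVRelState plumbing, the condition amounts to the following sketch (simplified and standalone; the names mirror the patch, and the page states exercised in main() are made up):

#include <stdio.h>
#include <stdbool.h>

/*
 * 0002's decision in a nutshell: after scanning the page with FreezeLimit /
 * MultiXactCutoff as the freeze cutoffs, reprocess the whole page with
 * OldestXmin / OldestMxact as the cutoffs when either only some of the
 * page's tuples were chosen for freezing, or the page is about to be set
 * all-visible without also being all-frozen.
 */
static bool
freeze_whole_page(int nfrozen, int num_tuples,
				  bool all_visible, bool all_frozen)
{
	return (nfrozen > 0 && nfrozen < num_tuples) ||
		(all_visible && !all_frozen);
}

int
main(void)
{
	printf("%d\n", freeze_whole_page(0, 10, true, false));		/* 1: freeze page via OldestXmin */
	printf("%d\n", freeze_whole_page(3, 10, false, false));	/* 1: don't leave stragglers behind */
	printf("%d\n", freeze_whole_page(0, 10, false, false));	/* 0: nothing forces freezing here */
	printf("%d\n", freeze_whole_page(10, 10, true, true));		/* 0: already freezing everything */
	return 0;
}

When the condition fires (at most once, thanks to the freezeblk flag), FreezeLimit and MultiXactCutoff are swapped out for OldestXmin and OldestMxact and the page is reprocessed from the retry label -- which is presumably also why 0002 makes tuples_deleted accumulate with += rather than being reset on every retry.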
From 6c8cb32e074e7de2414b067fcf4011acb4cca121 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <pg@bowt.ie> Date: Mon, 22 Nov 2021 10:02:30 -0800 Subject: [PATCH v8 1/3] Loosen coupling between relfrozenxid and tuple freezing. The pg_class.relfrozenxid invariant for heap relations is as follows: relfrozenxid must be less than or equal to the oldest extant XID in the table, and must never wraparound (it must be advanced by VACUUM before wraparound, or in extreme cases the system must be forced to stop allocating new XIDs). Before now, VACUUM always set relfrozenxid to whatever value it happened to use when determining which tuples to freeze (the VACUUM operation's FreezeLimit cutoff). But there was no inherent reason why the oldest extant XID in the table should be anywhere near as old as that. Furthermore, even if it really was almost as old as FreezeLimit, that tells us much more about the mechanism that VACUUM used to determine which tuples to freeze than anything else. Depending on the details of the table and workload, it may have been possible to safely advance relfrozenxid by many more XIDs, at a relatively small cost in freezing (possibly no extra cost at all) -- but VACUUM rigidly coupled freezing with advancing relfrozenxid, missing all this. Teach VACUUM to track the newest possible safe final relfrozenxid dynamically (and to track a new value for relminmxid). In the extreme though common case where all tuples are already frozen, or became frozen (or were removed by pruning), the final relfrozenxid value will be exactly equal to the OldestXmin value used by the same VACUUM operation. A later patch will overhaul the strategy that VACUUM uses for freezing so that relfrozenxid will tend to get set to a value that's relatively close to OldestXmin in almost all cases. Final relfrozenxid values still follow the same rules as before. They must still be >= FreezeLimit in an aggressive VACUUM. Non-aggressive VACUUMs can set relfrozenxid to any value that's greater than the preexisting relfrozenxid, which could be either much earlier or much later than FreezeLimit. Much depends on workload characteristics. In practice there is significant natural variation that we can take advantage of. Credit for the general idea of using the oldest extant XID to set pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund. 
Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com --- src/include/access/heapam.h | 4 +- src/include/access/heapam_xlog.h | 4 +- src/include/commands/vacuum.h | 1 + src/backend/access/heap/heapam.c | 186 ++++++++++++++++++++------- src/backend/access/heap/vacuumlazy.c | 85 ++++++++---- src/backend/commands/cluster.c | 5 +- src/backend/commands/vacuum.c | 34 +++-- 7 files changed, 238 insertions(+), 81 deletions(-) diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index 0ad87730e..d35402f9f 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -168,7 +168,9 @@ extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId relfrozenxid, TransactionId relminmxid, TransactionId cutoff_xid, TransactionId cutoff_multi); extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid, - MultiXactId cutoff_multi, Buffer buf); + MultiXactId cutoff_multi, + TransactionId *NewRelfrozenxid, + MultiXactId *NewRelminmxid, Buffer buf); extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple); extern void simple_heap_insert(Relation relation, HeapTuple tup); diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h index 5c47fdcec..ae55c90f7 100644 --- a/src/include/access/heapam_xlog.h +++ b/src/include/access/heapam_xlog.h @@ -410,7 +410,9 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid, TransactionId cutoff_multi, xl_heap_freeze_tuple *frz, - bool *totally_frozen); + bool *totally_frozen, + TransactionId *NewRelfrozenxid, + MultiXactId *NewRelminmxid); extern void heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *xlrec_tp); extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer, diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h index d64f6268f..ead88edda 100644 --- a/src/include/commands/vacuum.h +++ b/src/include/commands/vacuum.h @@ -291,6 +291,7 @@ extern bool vacuum_set_xid_limits(Relation rel, int multixact_freeze_min_age, int multixact_freeze_table_age, TransactionId *oldestXmin, + MultiXactId *oldestMxact, TransactionId *freezeLimit, MultiXactId *multiXactCutoff); extern bool vacuum_xid_failsafe_check(TransactionId relfrozenxid, diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 98230aac4..d85a817ff 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6087,12 +6087,24 @@ heap_inplace_update(Relation relation, HeapTuple tuple) * FRM_RETURN_IS_MULTI * The return value is a new MultiXactId to set as new Xmax. * (caller must obtain proper infomask bits using GetMultiXactIdHintBits) + * + * "NewRelfrozenxid" is an output value; it's used to maintain target new + * relfrozenxid for the relation. It can be ignored unless "flags" contains + * either FRM_NOOP or FRM_RETURN_IS_MULTI, because we only handle multiXacts + * here. This follows the general convention: only track XIDs that will still + * be in the table after the ongoing VACUUM finishes. Note that it's up to + * caller to maintain this when the Xid return value is itself an Xid. + * + * Note that we cannot depend on xmin to maintain NewRelfrozenxid. We need to + * push maintenance of NewRelfrozenxid down this far, since in general xmin + * might have been frozen by an earlier VACUUM operation, in which case our + * caller will not have factored-in xmin when maintaining NewRelfrozenxid. 
*/ static TransactionId FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, TransactionId relfrozenxid, TransactionId relminmxid, TransactionId cutoff_xid, MultiXactId cutoff_multi, - uint16 *flags) + uint16 *flags, TransactionId *NewRelfrozenxid) { TransactionId xid = InvalidTransactionId; int i; @@ -6104,6 +6116,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, bool has_lockers; TransactionId update_xid; bool update_committed; + TransactionId tempNewRelfrozenxid; *flags = 0; @@ -6198,13 +6211,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, /* is there anything older than the cutoff? */ need_replace = false; + tempNewRelfrozenxid = *NewRelfrozenxid; for (i = 0; i < nmembers; i++) { if (TransactionIdPrecedes(members[i].xid, cutoff_xid)) - { need_replace = true; - break; - } + if (TransactionIdPrecedes(members[i].xid, tempNewRelfrozenxid)) + tempNewRelfrozenxid = members[i].xid; } /* @@ -6213,6 +6226,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, */ if (!need_replace) { + *NewRelfrozenxid = tempNewRelfrozenxid; *flags |= FRM_NOOP; pfree(members); return InvalidTransactionId; @@ -6222,6 +6236,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * If the multi needs to be updated, figure out which members do we need * to keep. */ + tempNewRelfrozenxid = *NewRelfrozenxid; nnewmembers = 0; newmembers = palloc(sizeof(MultiXactMember) * nmembers); has_lockers = false; @@ -6303,7 +6318,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * list.) */ if (TransactionIdIsValid(update_xid)) + { newmembers[nnewmembers++] = members[i]; + if (TransactionIdPrecedes(members[i].xid, tempNewRelfrozenxid)) + tempNewRelfrozenxid = members[i].xid; + } } else { @@ -6313,6 +6332,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, { /* running locker cannot possibly be older than the cutoff */ Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid)); + Assert(!TransactionIdPrecedes(members[i].xid, *NewRelfrozenxid)); newmembers[nnewmembers++] = members[i]; has_lockers = true; } @@ -6341,6 +6361,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, if (update_committed) *flags |= FRM_MARK_COMMITTED; xid = update_xid; + /* Caller manages NewRelfrozenxid directly when we return an XID */ } else { @@ -6350,6 +6371,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, */ xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers); *flags |= FRM_RETURN_IS_MULTI; + *NewRelfrozenxid = tempNewRelfrozenxid; } pfree(newmembers); @@ -6368,6 +6390,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask, * will be totally frozen after these operations are performed and false if * more freezing will eventually be required. * + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current + * target relfrozenxid and relminmxid for the relation. Assumption is that + * caller will actually go on to freeze as indicated by our *frz output, so + * any (xmin, xmax, xvac) XIDs that we indicate need to be frozen won't need + * to be counted here. Values are valid lower bounds at the point that the + * ongoing VACUUM finishes. + * * Caller is responsible for setting the offset field, if appropriate. 
* * It is assumed that the caller has checked the tuple with @@ -6392,7 +6421,9 @@ bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId relfrozenxid, TransactionId relminmxid, TransactionId cutoff_xid, TransactionId cutoff_multi, - xl_heap_freeze_tuple *frz, bool *totally_frozen_p) + xl_heap_freeze_tuple *frz, bool *totally_frozen_p, + TransactionId *NewRelfrozenxid, + MultiXactId *NewRelminmxid) { bool changed = false; bool xmax_already_frozen = false; @@ -6436,6 +6467,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, frz->t_infomask |= HEAP_XMIN_FROZEN; changed = true; } + else if (TransactionIdPrecedes(xid, *NewRelfrozenxid)) + { + /* won't be frozen, but older than current NewRelfrozenxid */ + *NewRelfrozenxid = xid; + } } /* @@ -6453,10 +6489,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, { TransactionId newxmax; uint16 flags; + TransactionId temp = *NewRelfrozenxid; newxmax = FreezeMultiXactId(xid, tuple->t_infomask, relfrozenxid, relminmxid, - cutoff_xid, cutoff_multi, &flags); + cutoff_xid, cutoff_multi, &flags, &temp); freeze_xmax = (flags & FRM_INVALIDATE_XMAX); @@ -6474,6 +6511,24 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, if (flags & FRM_MARK_COMMITTED) frz->t_infomask |= HEAP_XMAX_COMMITTED; changed = true; + + if (TransactionIdPrecedes(newxmax, *NewRelfrozenxid)) + { + /* New xmax is an XID older than new NewRelfrozenxid */ + *NewRelfrozenxid = newxmax; + } + } + else if (flags & FRM_NOOP) + { + /* + * Changing nothing, so might have to ratchet back NewRelminmxid, + * NewRelfrozenxid, or both together + */ + if (MultiXactIdIsValid(xid) && + MultiXactIdPrecedes(xid, *NewRelminmxid)) + *NewRelminmxid = xid; + if (TransactionIdPrecedes(temp, *NewRelfrozenxid)) + *NewRelfrozenxid = temp; } else if (flags & FRM_RETURN_IS_MULTI) { @@ -6495,6 +6550,13 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, frz->xmax = newxmax; changed = true; + + /* + * New multixact might have remaining XID older than + * NewRelfrozenxid + */ + if (TransactionIdPrecedes(temp, *NewRelfrozenxid)) + *NewRelfrozenxid = temp; } } else if (TransactionIdIsNormal(xid)) @@ -6522,7 +6584,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, freeze_xmax = true; } else + { freeze_xmax = false; + if (TransactionIdPrecedes(xid, *NewRelfrozenxid)) + { + /* won't be frozen, but older than current NewRelfrozenxid */ + *NewRelfrozenxid = xid; + } + } } else if ((tuple->t_infomask & HEAP_XMAX_INVALID) || !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple))) @@ -6569,6 +6638,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, * was removed in PostgreSQL 9.0. Note that if we were to respect * cutoff_xid here, we'd need to make surely to clear totally_frozen * when we skipped freezing on that basis. + * + * Since we always freeze here, NewRelfrozenxid doesn't need to be + * maintained. 
*/ if (TransactionIdIsNormal(xid)) { @@ -6646,11 +6718,14 @@ heap_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple frz; bool do_freeze; bool tuple_totally_frozen; + TransactionId NewRelfrozenxid = FirstNormalTransactionId; + MultiXactId NewRelminmxid = FirstMultiXactId; do_freeze = heap_prepare_freeze_tuple(tuple, relfrozenxid, relminmxid, cutoff_xid, cutoff_multi, - &frz, &tuple_totally_frozen); + &frz, &tuple_totally_frozen, + &NewRelfrozenxid, &NewRelminmxid); /* * Note that because this is not a WAL-logged operation, we don't need to @@ -7080,6 +7155,15 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple) * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac) * are older than the specified cutoff XID or MultiXactId. If so, return true. * + * Also maintains *NewRelfrozenxid and *NewRelminmxid, which are the current + * target relfrozenxid and relminmxid for the relation. Assumption is that + * caller will never freeze any of the XIDs from the tuple, even when we say + * that they should. If caller opts to go with our recommendation to freeze, + * then it must account for the fact that it shouldn't trust how we've set + * NewRelfrozenxid/NewRelminmxid. (In practice aggressive VACUUMs always take + * our recommendation because they must, and non-aggressive VACUUMs always opt + * to not freeze, preferring to ratchet back NewRelfrozenxid instead). + * * It doesn't matter whether the tuple is alive or dead, we are checking * to see if a tuple needs to be removed or frozen to avoid wraparound. * @@ -7088,74 +7172,86 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple) */ bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid, - MultiXactId cutoff_multi, Buffer buf) + MultiXactId cutoff_multi, + TransactionId *NewRelfrozenxid, + MultiXactId *NewRelminmxid, Buffer buf) { TransactionId xid; + bool needs_freeze = false; xid = HeapTupleHeaderGetXmin(tuple); - if (TransactionIdIsNormal(xid) && - TransactionIdPrecedes(xid, cutoff_xid)) - return true; + if (TransactionIdIsNormal(xid)) + { + if (TransactionIdPrecedes(xid, *NewRelfrozenxid)) + *NewRelfrozenxid = xid; + if (TransactionIdPrecedes(xid, cutoff_xid)) + needs_freeze = true; + } /* * The considerations for multixacts are complicated; look at * heap_prepare_freeze_tuple for justifications. This routine had better * be in sync with that one! + * + * (Actually, we maintain NewRelminmxid differently here, because we + * assume that XIDs that should be frozen according to cutoff_xid won't + * be, whereas heap_prepare_freeze_tuple makes the opposite assumption.) 
*/ if (tuple->t_infomask & HEAP_XMAX_IS_MULTI) { MultiXactId multi; + MultiXactMember *members; + int nmembers; multi = HeapTupleHeaderGetRawXmax(tuple); - if (!MultiXactIdIsValid(multi)) - { - /* no xmax set, ignore */ - ; - } - else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask)) + if (MultiXactIdIsValid(multi) && + MultiXactIdPrecedes(multi, *NewRelminmxid)) + *NewRelminmxid = multi; + + if (HEAP_LOCKED_UPGRADED(tuple->t_infomask)) return true; else if (MultiXactIdPrecedes(multi, cutoff_multi)) - return true; - else + needs_freeze = true; + + /* need to check whether any member of the mxact is too old */ + nmembers = GetMultiXactIdMembers(multi, &members, false, + HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask)); + + for (int i = 0; i < nmembers; i++) { - MultiXactMember *members; - int nmembers; - int i; - - /* need to check whether any member of the mxact is too old */ - - nmembers = GetMultiXactIdMembers(multi, &members, false, - HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask)); - - for (i = 0; i < nmembers; i++) - { - if (TransactionIdPrecedes(members[i].xid, cutoff_xid)) - { - pfree(members); - return true; - } - } - if (nmembers > 0) - pfree(members); + if (TransactionIdPrecedes(members[i].xid, cutoff_xid)) + needs_freeze = true; + if (TransactionIdPrecedes(members[i].xid, *NewRelfrozenxid)) + *NewRelfrozenxid = xid; } + if (nmembers > 0) + pfree(members); } else { xid = HeapTupleHeaderGetRawXmax(tuple); - if (TransactionIdIsNormal(xid) && - TransactionIdPrecedes(xid, cutoff_xid)) - return true; + if (TransactionIdIsNormal(xid)) + { + if (TransactionIdPrecedes(xid, *NewRelfrozenxid)) + *NewRelfrozenxid = xid; + if (TransactionIdPrecedes(xid, cutoff_xid)) + needs_freeze = true; + } } if (tuple->t_infomask & HEAP_MOVED) { xid = HeapTupleHeaderGetXvac(tuple); - if (TransactionIdIsNormal(xid) && - TransactionIdPrecedes(xid, cutoff_xid)) - return true; + if (TransactionIdIsNormal(xid)) + { + if (TransactionIdPrecedes(xid, *NewRelfrozenxid)) + *NewRelfrozenxid = xid; + if (TransactionIdPrecedes(xid, cutoff_xid)) + needs_freeze = true; + } } - return false; + return needs_freeze; } /* diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index d57055674..d481a300b 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -172,8 +172,10 @@ typedef struct LVRelState /* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */ TransactionId FreezeLimit; MultiXactId MultiXactCutoff; - /* Are FreezeLimit/MultiXactCutoff still valid? 
*/ - bool freeze_cutoffs_valid; + + /* Track new pg_class.relfrozenxid/pg_class.relminmxid values */ + TransactionId NewRelfrozenxid; + MultiXactId NewRelminmxid; /* Error reporting state */ char *relnamespace; @@ -330,6 +332,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, PgStat_Counter startreadtime = 0; PgStat_Counter startwritetime = 0; TransactionId OldestXmin; + MultiXactId OldestMxact; TransactionId FreezeLimit; MultiXactId MultiXactCutoff; @@ -365,8 +368,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, params->freeze_table_age, params->multixact_freeze_min_age, params->multixact_freeze_table_age, - &OldestXmin, &FreezeLimit, - &MultiXactCutoff); + &OldestXmin, &OldestMxact, + &FreezeLimit, &MultiXactCutoff); skipwithvm = true; if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) @@ -473,8 +476,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, vacrel->OldestXmin = OldestXmin; vacrel->FreezeLimit = FreezeLimit; vacrel->MultiXactCutoff = MultiXactCutoff; - /* Track if cutoffs became invalid (possible in !aggressive case only) */ - vacrel->freeze_cutoffs_valid = true; + + /* Initialize values used to advance relfrozenxid/relminmxid at the end */ + vacrel->NewRelfrozenxid = OldestXmin; + vacrel->NewRelminmxid = OldestMxact; /* * Call lazy_scan_heap to perform all required heap pruning, index @@ -527,16 +532,18 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, * Aggressive VACUUM must reliably advance relfrozenxid (and relminmxid). * We are able to advance relfrozenxid in a non-aggressive VACUUM too, * provided we didn't skip any all-visible (not all-frozen) pages using - * the visibility map, and assuming that we didn't fail to get a cleanup - * lock that made it unsafe with respect to FreezeLimit (or perhaps our - * MultiXactCutoff) established for VACUUM operation. + * the visibility map. A non-aggressive VACUUM might only be able to + * advance relfrozenxid to an XID from before FreezeLimit (or a relminmxid + * from before MultiXactCutoff) when it wasn't possible to freeze some + * tuples due to our inability to acquire a cleanup lock, but the effect + * is usually insignificant -- NewRelfrozenxid value still has a decent + * chance of being much more recent that the existing relfrozenxid. * * NB: We must use orig_rel_pages, not vacrel->rel_pages, since we want * the rel_pages used by lazy_scan_heap, which won't match when we * happened to truncate the relation afterwards. */ - if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages || - !vacrel->freeze_cutoffs_valid) + if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages) { /* Cannot advance relfrozenxid/relminmxid */ Assert(!aggressive); @@ -548,11 +555,23 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, } else { + /* + * Aggressive case is strictly required to advance relfrozenxid, at + * least up to FreezeLimit (same applies with relminmxid and its + * cutoff, MultiXactCutoff). Assert that we got this right now. 
+ */ Assert(vacrel->scanned_pages + vacrel->frozenskipped_pages == orig_rel_pages); + Assert(!aggressive || + TransactionIdPrecedesOrEquals(FreezeLimit, + vacrel->NewRelfrozenxid)); + Assert(!aggressive || + MultiXactIdPrecedesOrEquals(MultiXactCutoff, + vacrel->NewRelminmxid)); + vac_update_relstats(rel, new_rel_pages, new_live_tuples, new_rel_allvisible, vacrel->nindexes > 0, - FreezeLimit, MultiXactCutoff, + vacrel->NewRelfrozenxid, vacrel->NewRelminmxid, &frozenxid_updated, &minmulti_updated, false); } @@ -657,17 +676,17 @@ heap_vacuum_rel(Relation rel, VacuumParams *params, OldestXmin, diff); if (frozenxid_updated) { - diff = (int32) (FreezeLimit - vacrel->relfrozenxid); + diff = (int32) (vacrel->NewRelfrozenxid - vacrel->relfrozenxid); appendStringInfo(&buf, _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"), - FreezeLimit, diff); + vacrel->NewRelfrozenxid, diff); } if (minmulti_updated) { - diff = (int32) (MultiXactCutoff - vacrel->relminmxid); + diff = (int32) (vacrel->NewRelminmxid - vacrel->relminmxid); appendStringInfo(&buf, _("new relminmxid: %u, which is %d mxids ahead of previous value\n"), - MultiXactCutoff, diff); + vacrel->NewRelminmxid, diff); } if (orig_rel_pages > 0) { @@ -1579,6 +1598,8 @@ lazy_scan_prune(LVRelState *vacrel, int nfrozen; OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage]; + TransactionId NewRelfrozenxid; + MultiXactId NewRelminmxid; Assert(BufferGetBlockNumber(buf) == blkno); @@ -1587,6 +1608,8 @@ lazy_scan_prune(LVRelState *vacrel, retry: /* Initialize (or reset) page-level counters */ + NewRelfrozenxid = vacrel->NewRelfrozenxid; + NewRelminmxid = vacrel->NewRelminmxid; tuples_deleted = 0; lpdead_items = 0; recently_dead_tuples = 0; @@ -1796,7 +1819,9 @@ retry: vacrel->FreezeLimit, vacrel->MultiXactCutoff, &frozen[nfrozen], - &tuple_totally_frozen)) + &tuple_totally_frozen, + &NewRelfrozenxid, + &NewRelminmxid)) { /* Will execute freeze below */ frozen[nfrozen++].offset = offnum; @@ -1810,13 +1835,16 @@ retry: prunestate->all_frozen = false; } + vacrel->offnum = InvalidOffsetNumber; + /* * We have now divided every item on the page into either an LP_DEAD item * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple * that remains and needs to be considered for freezing now (LP_UNUSED and * LP_REDIRECT items also remain, but are of no further interest to us). 
*/ - vacrel->offnum = InvalidOffsetNumber; + vacrel->NewRelfrozenxid = NewRelfrozenxid; + vacrel->NewRelminmxid = NewRelminmxid; /* * Consider the need to freeze any items with tuple storage from the page @@ -1969,6 +1997,8 @@ lazy_scan_noprune(LVRelState *vacrel, missed_dead_tuples; HeapTupleHeader tupleheader; OffsetNumber deadoffsets[MaxHeapTuplesPerPage]; + TransactionId NewRelfrozenxid = vacrel->NewRelfrozenxid; + MultiXactId NewRelminmxid = vacrel->NewRelminmxid; Assert(BufferGetBlockNumber(buf) == blkno); @@ -2015,7 +2045,8 @@ lazy_scan_noprune(LVRelState *vacrel, tupleheader = (HeapTupleHeader) PageGetItem(page, itemid); if (heap_tuple_needs_freeze(tupleheader, vacrel->FreezeLimit, - vacrel->MultiXactCutoff, buf)) + vacrel->MultiXactCutoff, + &NewRelfrozenxid, &NewRelminmxid, buf)) { if (vacrel->aggressive) { @@ -2025,10 +2056,12 @@ lazy_scan_noprune(LVRelState *vacrel, } /* - * Current non-aggressive VACUUM operation definitely won't be - * able to advance relfrozenxid or relminmxid + * A non-aggressive VACUUM doesn't have to wait on a cleanup lock + * to ensure that it advances relfrozenxid to a sufficiently + * recent XID that happens to be present on this page. It can + * just accept an older New/final relfrozenxid instead. There is + * a decent chance that the problem will go away naturally. */ - vacrel->freeze_cutoffs_valid = false; } num_tuples++; @@ -2078,6 +2111,14 @@ lazy_scan_noprune(LVRelState *vacrel, vacrel->offnum = InvalidOffsetNumber; + /* + * We have committed to not freezing the tuples on this page (always + * happens with a non-aggressive VACUUM), so make sure that the target + * relfrozenxid/relminmxid values reflect the XIDs/MXIDs we encountered + */ + vacrel->NewRelfrozenxid = NewRelfrozenxid; + vacrel->NewRelminmxid = NewRelminmxid; + /* * Now save details of the LP_DEAD items from the page in vacrel (though * only when VACUUM uses two-pass strategy) diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 02a7e94bf..a7e988298 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -767,6 +767,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, TupleDesc oldTupDesc PG_USED_FOR_ASSERTS_ONLY; TupleDesc newTupDesc PG_USED_FOR_ASSERTS_ONLY; TransactionId OldestXmin; + MultiXactId oldestMxact; TransactionId FreezeXid; MultiXactId MultiXactCutoff; bool use_sort; @@ -856,8 +857,8 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, * Since we're going to rewrite the whole table anyway, there's no reason * not to be aggressive about this. */ - vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, - &OldestXmin, &FreezeXid, &MultiXactCutoff); + vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, &OldestXmin, &oldestMxact, + &FreezeXid, &MultiXactCutoff); /* * FreezeXid will become the table's new relfrozenxid, and that mustn't go diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index b6767a5ff..d71ff21b1 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -945,14 +945,26 @@ get_all_vacuum_rels(int options) * The output parameters are: * - oldestXmin is the Xid below which tuples deleted by any xact (that * committed) should be considered DEAD, not just RECENTLY_DEAD. - * - freezeLimit is the Xid below which all Xids are replaced by - * FrozenTransactionId during vacuum. - * - multiXactCutoff is the value below which all MultiXactIds are removed - * from Xmax. 
+ * - oldestMxact is the Mxid below which MultiXacts are definitely not + * seen as visible by any running transaction. + * - freezeLimit is the Xid below which all Xids are definitely replaced by + * FrozenTransactionId during aggressive vacuums. + * - multiXactCutoff is the value below which all MultiXactIds are definitely + * removed from Xmax during aggressive vacuums. * * Return value indicates if vacuumlazy.c caller should make its VACUUM * operation aggressive. An aggressive VACUUM must advance relfrozenxid up to - * FreezeLimit, and relminmxid up to multiXactCutoff. + * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a + * minimum). + * + * oldestXmin and oldestMxact can be thought of as the most recent values that + * can ever be passed to vac_update_relstats() as frozenxid and minmulti + * arguments. These exact values can be used when no newer XIDs or MultiXacts + * remain in the heap relation (e.g., with an empty table). It's typical for + * vacuumlazy.c caller to notice that older XIDs/Multixacts remain in the + * table, which will force it to use the oldest extant values when it calls + * vac_update_relstats(). Ideally these values won't be very far behind the + * "optimal" oldestXmin and oldestMxact values we provide. */ bool vacuum_set_xid_limits(Relation rel, @@ -961,6 +973,7 @@ vacuum_set_xid_limits(Relation rel, int multixact_freeze_min_age, int multixact_freeze_table_age, TransactionId *oldestXmin, + MultiXactId *oldestMxact, TransactionId *freezeLimit, MultiXactId *multiXactCutoff) { @@ -969,7 +982,6 @@ vacuum_set_xid_limits(Relation rel, int effective_multixact_freeze_max_age; TransactionId limit; TransactionId safeLimit; - MultiXactId oldestMxact; MultiXactId mxactLimit; MultiXactId safeMxactLimit; int freezetable; @@ -1065,9 +1077,11 @@ vacuum_set_xid_limits(Relation rel, effective_multixact_freeze_max_age / 2); Assert(mxid_freezemin >= 0); + /* Remember for caller */ + *oldestMxact = GetOldestMultiXactId(); + /* compute the cutoff multi, being careful to generate a valid value */ - oldestMxact = GetOldestMultiXactId(); - mxactLimit = oldestMxact - mxid_freezemin; + mxactLimit = *oldestMxact - mxid_freezemin; if (mxactLimit < FirstMultiXactId) mxactLimit = FirstMultiXactId; @@ -1082,8 +1096,8 @@ vacuum_set_xid_limits(Relation rel, (errmsg("oldest multixact is far in the past"), errhint("Close open transactions with multixacts soon to avoid wraparound problems."))); /* Use the safe limit, unless an older mxact is still running */ - if (MultiXactIdPrecedes(oldestMxact, safeMxactLimit)) - mxactLimit = oldestMxact; + if (MultiXactIdPrecedes(*oldestMxact, safeMxactLimit)) + mxactLimit = *oldestMxact; else mxactLimit = safeMxactLimit; } -- 2.30.2
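Finally, restating what 0001 does with the new NewRelfrozenxid tracking in the simplest possible terms -- a toy standalone sketch that ignores XID wraparound (which the real TransactionIdPrecedes() handles) and uses made-up XID values:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* toy stand-in for TransactionIdPrecedes(); ignores XID wraparound */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return a < b;
}

int
main(void)
{
	/* hypothetical XIDs that will still be present in the table afterwards */
	TransactionId remaining_xids[] = {51204, 50981, 52001};
	TransactionId OldestXmin = 52337;	/* VACUUM's cutoff (made up) */

	/* start optimistic: if nothing older remains, relfrozenxid == OldestXmin */
	TransactionId NewRelfrozenxid = OldestXmin;

	for (int i = 0; i < 3; i++)
	{
		if (xid_precedes(remaining_xids[i], NewRelfrozenxid))
			NewRelfrozenxid = remaining_xids[i];	/* ratchet back to oldest extant XID */
	}

	/* prints 50981; before 0001, FreezeLimit would have been used instead */
	printf("new relfrozenxid: %u\n", NewRelfrozenxid);
	return 0;
}

The same ratcheting is done for relminmxid, starting from OldestMxact, and vac_update_relstats() is then called with whatever the two values ended up as.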