On Fri, Feb 25, 2022 at 5:52 PM Peter Geoghegan <p...@bowt.ie> wrote:
> There is an important practical way in which it makes sense to treat
> 0001 as separate to 0002. It is true that 0001 is independently quite
> useful. In practical terms, I'd be quite happy to just get 0001 into
> Postgres 15, without 0002. I think that that's what you meant here, in
> concrete terms, and we can agree on that now.

Attached is v10. While this does still include the freezing patch,
it's not in scope for Postgres 15. As I've said, I still think that it
makes sense to keep the freezing work in the patch series, since it's
structurally related. So, to be clear, the first two patches in the
series are in scope for Postgres 15, but the third is not.

Highlights:

* Changes to terminology and commit messages along the lines suggested
by Andres.

* Bug fixes to heap_tuple_needs_freeze()'s MultiXact handling. My
testing strategy here still needs work.

* Expanded refactoring in the v10-0002 patch.

The v10-0002 patch (which appeared for the first time in v9) was
originally all about fixing a case where non-aggressive VACUUMs were
at a gratuitous disadvantage (relative to aggressive VACUUMs) around
advancing relfrozenxid -- very much like the lazy_scan_noprune work
from commit 44fa8488. And that is still its main purpose. But the
refactoring now seems related to Andres' idea of having non-aggressive
VACUUMs decide to scan a few extra all-visible pages in order to be
able to advance relfrozenxid.

The code that sets up skipping pages using the visibility map is made
a lot clearer by v10-0002. That patch moves a significant amount of
code from lazy_scan_heap() into a new helper routine (so it continues
the trend started by the Postgres 14 work that added
lazy_scan_prune()). Now skipping a range of pages is fundamentally
based on setting up the range up front, and then using the same saved
details about the range thereafter -- there are no longer any ad-hoc
VM_ALL_VISIBLE()/VM_ALL_FROZEN() calls for pages from a range that we
already decided to skip (so no calls to those routines from
lazy_scan_heap(), at least not until after we finish processing in
lazy_scan_prune()).

This is more or less what we were doing all along for one special
case: aggressive VACUUMs. We had to make sure to either increment
frozenskipped_pages or increment scanned_pages for every page from
rel_pages -- this issue is described by lazy_scan_heap() comments on
HEAD that begin with "Tricky, tricky." (these date back to the freeze
map work from 2016). Anyway, there is no reason not to go further with
that: we should make whole ranges the basic unit that we deal with
when skipping. It's a lot simpler to think in terms of entire ranges
(not individual pages) that are determined to be all-visible or
all-frozen up-front, without needing to recheck anything (regardless
of whether it's an aggressive VACUUM).

We no longer need to track frozenskipped_pages that way. And it's much
more obvious that the approach is safe for the more complicated cases,
in particular for aggressive VACUUMs.

This kind of approach seems necessary to make non-aggressive VACUUMs
do a little more work opportunistically, when they realize that they
can advance relfrozenxid relatively easily that way (which I believe
Andres favors as part of overhauling freezing). That becomes a lot
more natural when you have a clear and unambiguous separation between
deciding what range of blocks to skip, and then actually skipping. I
can imagine the new helper function added by v10-0002 (which I've
called lazy_scan_skip_range()) eventually being taught to do these
kinds of tricks.

In general I think that all of the details of what to skip need to be
decided up front. The loop in lazy_scan_heap() should execute skipping
based on the instructions it receives from the new helper function, in
the simplest way possible. The helper function can become more
intelligent about the costs and benefits of skipping in the future,
without that impacting lazy_scan_heap().
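
To make that concrete, here is a toy sketch of the structure I have in
mind (illustration only -- SkipRange, decide_next_skip_range() and
vm_page_all_frozen() are made-up names, not the actual interfaces from
v10-0002):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

typedef struct SkipRange
{
    BlockNumber start;              /* first block covered by this decision */
    BlockNumber next_unskippable;   /* first block that must be scanned */
} SkipRange;

static bool
vm_page_all_frozen(BlockNumber blkno)
{
    /* stand-in for reading the visibility map; arbitrary pattern */
    return (blkno % 4) != 0;
}

static SkipRange
decide_next_skip_range(BlockNumber start, BlockNumber rel_pages)
{
    /*
     * Decide everything about the next skippable range before any page in
     * it is processed.  The loop in main() never rechecks the VM for blocks
     * inside the range it was handed.
     */
    SkipRange range = {start, start};

    while (range.next_unskippable < rel_pages &&
           vm_page_all_frozen(range.next_unskippable))
        range.next_unskippable++;
    return range;
}

int
main(void)
{
    BlockNumber rel_pages = 16;
    BlockNumber scanned_pages = 0;
    BlockNumber skipped_pages = 0;
    BlockNumber blkno = 0;

    while (blkno < rel_pages)
    {
        SkipRange range = decide_next_skip_range(blkno, rel_pages);

        /* Skip the whole range as one unit -- no per-page VM calls here */
        skipped_pages += range.next_unskippable - range.start;
        blkno = range.next_unskippable;
        if (blkno >= rel_pages)
            break;

        /* This block was deemed unskippable up front, so "scan" it */
        printf("scanning block %u\n", blkno);
        scanned_pages++;
        blkno++;
    }

    /* Every page ends up counted as either scanned or skipped */
    assert(scanned_pages + skipped_pages == rel_pages);
    printf("scanned %u, skipped %u, total %u\n",
           scanned_pages, skipped_pages, rel_pages);
    return 0;
}

The only point here is that the per-page VM checks and the
scanned/skipped accounting both fall out of a single up-front
decision, which the main loop then executes mechanically.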

--
Peter Geoghegan
From 43ab00609392ed7ad31be491834bdac348e13653 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 11 Mar 2022 19:16:02 -0800
Subject: [PATCH v10 3/3] Make page-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM typically won't freeze _any_ tuples on the page
unless _all_ tuples (that remain after pruning) are all-visible.  This
makes the overhead of vacuuming much more predictable over time.  We
avoid the need for large balloon payments during aggressive VACUUMs
(typically anti-wraparound autovacuums).  Freezing is proactive, so
we're much less likely to get into "freezing debt".

The new approach to freezing also enables relfrozenxid advancement in
non-aggressive VACUUMs, which might be enough to avoid aggressive
VACUUMs altogether (with many individual tables/workloads).  While the
non-aggressive case continues to skip all-visible (but not all-frozen)
pages, that will no longer hinder relfrozenxid advancement (outside of
pg_upgrade scenarios).  We now try to avoid leaving behind all-visible
(not all-frozen) pages.  This (as well as work from commit 44fa84881f)
makes relfrozenxid advancement in non-aggressive VACUUMs commonplace.

There is also a clear disadvantage to the new approach to freezing: more
eager freezing will impose overhead on cases that don't receive any
benefit.  This is considered an acceptable trade-off.  The new algorithm
tends to avoid freezing early on pages where it makes the least sense,
since frequently modified pages are unlikely to be all-visible.

The system accumulates freezing debt in proportion to the number of
physical heap pages with unfrozen tuples, more or less.  Anything based
on XID age is likely to be a poor proxy for the eventual cost of
freezing (during the inevitable anti-wraparound autovacuum).  At a high
level, freezing is now treated as one of the costs of storing tuples in
physical heap pages -- not a cost of transactions that allocate XIDs.
Although vacuum_freeze_min_age and vacuum_multixact_freeze_min_age still
influence what we freeze, and when, they now have little influence in
many important cases.

It may still be necessary to "freeze a page" due to the presence of a
particularly old XID, from before VACUUM's FreezeLimit cutoff.
FreezeLimit can only trigger page-level freezing, though -- it cannot
change how freezing is actually executed.  All XIDs < OldestXmin and all
MXIDs < OldestMxact will now be frozen on any page that VACUUM decides
to freeze, regardless of the details behind its decision.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam_xlog.h     |   7 +-
 src/backend/access/heap/heapam.c     |  92 +++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 116 ++++++++++++++++++---------
 src/backend/commands/vacuum.c        |   8 ++
 doc/src/sgml/maintenance.sgml        |   9 +--
 5 files changed, 172 insertions(+), 60 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 2d8a7f627..2c25e72b2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -409,10 +409,15 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId relminmxid,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
+									  TransactionId limit_xid,
+									  MultiXactId limit_multi,
 									  xl_heap_freeze_tuple *frz,
 									  bool *totally_frozen,
+									  bool *force_freeze,
 									  TransactionId *relfrozenxid_out,
-									  MultiXactId *relminmxid_out);
+									  MultiXactId *relminmxid_out,
+									  TransactionId *relfrozenxid_nofreeze_out,
+									  MultiXactId *relminmxid_nofreeze_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
 									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2e859e427..3454201f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6446,14 +6446,38 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * are older than the specified cutoff XID and cutoff MultiXactId.  If so,
  * setup enough state (in the *frz output argument) to later execute and
  * WAL-log what we would need to do, and return true.  Return false if nothing
- * is to be changed.  In addition, set *totally_frozen to true if the tuple
+ * can be changed.  In addition, set *totally_frozen to true if the tuple
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Although this interface is primarily tuple-based, vacuumlazy.c caller
+ * cooperates with us to decide on whether or not to freeze whole pages,
+ * together as a single group.  We prepare for freezing at the level of each
+ * tuple, but the final decision is made for the page as a whole.  All pages
+ * that are frozen within a given VACUUM operation are frozen according to
+ * cutoff_xid and cutoff_multi.  Caller _must_ freeze the whole page when
+ * we've set *force_freeze to true!
+ *
+ * cutoff_xid must be caller's oldest xmin to ensure that any XID older than
+ * it could neither be running nor seen as running by any open transaction.
+ * This ensures that the replacement will not change anyone's idea of the
+ * tuple state.  Similarly, cutoff_multi must be the smallest MultiXactId used
+ * by any open transaction (at the time that the oldest xmin was acquired).
+ *
+ * limit_xid must be <= cutoff_xid, and limit_multi must be <= cutoff_multi.
+ * When any XID/XMID from before these secondary cutoffs is encountered, we
+ * set *force_freeze to true, making caller freeze the page (freezing-eligible
+ * XIDs/XMIDs will be frozen, at least).  Forcing freezing like this ensures
+ * that VACUUM won't allow XIDs/XMIDs to ever get too old.  This shouldn't be
+ * necessary very often.  VACUUM should prefer to freeze when it's cheap (not
+ * when it's urgent).
+ *
  * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
- * target relfrozenxid and relminmxid for the relation.  Caller should make
- * temp copies of global tracking variables before starting to process a page,
- * so that we can only scribble on copies.
+ * target relfrozenxid and relminmxid for the relation.  There are also "no
+ * freeze" variants (*relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out)
+ * that are used by caller when it decides to not freeze the page.  Caller
+ * should make temp copies of global tracking variables before starting to
+ * process a page, so that we can only scribble on copies.
  *
  * Caller is responsible for setting the offset field, if appropriate.
  *
@@ -6461,13 +6485,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
  * (else we should be removing the tuple, not freezing it).
  *
- * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
- * XID older than it could neither be running nor seen as running by any
- * open transaction.  This ensures that the replacement will not change
- * anyone's idea of the tuple state.
- * Similarly, cutoff_multi must be less than or equal to the smallest
- * MultiXactId used by any transaction currently open.
- *
  * If the tuple is in a shared buffer, caller must hold an exclusive lock on
  * that buffer.
  *
@@ -6479,11 +6496,16 @@ bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId relfrozenxid, TransactionId relminmxid,
 						  TransactionId cutoff_xid, TransactionId cutoff_multi,
-						  xl_heap_freeze_tuple *frz, bool *totally_frozen,
+						  TransactionId limit_xid, MultiXactId limit_multi,
+						  xl_heap_freeze_tuple *frz,
+						  bool *totally_frozen, bool *force_freeze,
 						  TransactionId *relfrozenxid_out,
-						  MultiXactId *relminmxid_out)
+						  MultiXactId *relminmxid_out,
+						  TransactionId *relfrozenxid_nofreeze_out,
+						  MultiXactId *relminmxid_nofreeze_out)
 {
 	bool		changed = false;
+	bool		xmin_already_frozen = false;
 	bool		xmax_already_frozen = false;
 	bool		xmin_frozen;
 	bool		freeze_xmax;
@@ -6504,7 +6526,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	 */
 	xid = HeapTupleHeaderGetXmin(tuple);
 	if (!TransactionIdIsNormal(xid))
+	{
+		xmin_already_frozen = true;
 		xmin_frozen = true;
+	}
 	else
 	{
 		if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6534,7 +6559,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	 * resolve a MultiXactId to its member Xids, in case some of them are
 	 * below the given cutoff for Xids.  In that case, those values might need
 	 * freezing, too.  Also, if a multi needs freezing, we cannot simply take
-	 * it out --- if there's a live updater Xid, it needs to be kept.
+	 * it out --- if there's a live updater Xid, it needs to be kept.  If we
+	 * need to allocate a new MultiXact for that purpose, we will force
+	 * caller to freeze the page.
 	 *
 	 * Make sure to keep heap_tuple_needs_freeze in sync with this.
 	 */
@@ -6580,6 +6607,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			Assert(TransactionIdIsValid(newxmax));
 			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
 				*relfrozenxid_out = newxmax;
+
+			/*
+			 * We have an opportunity to get rid of this MultiXact now, so
+			 * force freezing to avoid wasting it
+			 */
+			*force_freeze = true;
 		}
 		else if (flags & FRM_RETURN_IS_MULTI)
 		{
@@ -6616,6 +6649,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			Assert(TransactionIdPrecedesOrEquals(xmax_oldest_xid_out,
 												 *relfrozenxid_out));
 			*relfrozenxid_out = xmax_oldest_xid_out;
+
+			/*
+			 * We allocated a MultiXact for this, so force freezing to avoid
+			 * wasting it
+			 */
+			*force_freeze = true;
 		}
 		else if (flags & FRM_NOOP)
 		{
@@ -6734,11 +6773,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
 			frz->t_infomask |= HEAP_XMIN_COMMITTED;
 			changed = true;
+
+			/* Seems like a good idea to freeze early when this case is hit */
+			*force_freeze = true;
 		}
 	}
 
 	*totally_frozen = (xmin_frozen &&
 					   (freeze_xmax || xmax_already_frozen));
+
+	/*
+	 * Maintain alternative versions of relfrozenxid_out/relminmxid_out that
+	 * leave caller with the option of *not* freezing the page.  If caller has
+	 * already lost that option (e.g. when the page has an old XID that we
+	 * must force caller to freeze), then we don't waste time on this.
+	 */
+	if (!*force_freeze && (!xmin_already_frozen || !xmax_already_frozen))
+		*force_freeze = heap_tuple_needs_freeze(tuple,
+												limit_xid, limit_multi,
+												relfrozenxid_nofreeze_out,
+												relminmxid_nofreeze_out);
+
 	return changed;
 }
 
@@ -6790,15 +6845,22 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 {
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
+	bool		force_freeze = true;
 	bool		tuple_totally_frozen;
 	TransactionId relfrozenxid_out = cutoff_xid;
 	MultiXactId relminmxid_out = cutoff_multi;
+	TransactionId relfrozenxid_nofreeze_out = cutoff_xid;
+	MultiXactId relminmxid_nofreeze_out = cutoff_multi;
 
 	do_freeze = heap_prepare_freeze_tuple(tuple,
 										  relfrozenxid, relminmxid,
 										  cutoff_xid, cutoff_multi,
+										  cutoff_xid, cutoff_multi,
 										  &frz, &tuple_totally_frozen,
-										  &relfrozenxid_out, &relminmxid_out);
+										  &force_freeze,
+										  &relfrozenxid_out, &relminmxid_out,
+										  &relfrozenxid_nofreeze_out,
+										  &relminmxid_nofreeze_out);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3bc75d401..7e2d03ba6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -169,8 +169,9 @@ typedef struct LVRelState
 
 	/* VACUUM operation's cutoffs for freezing and pruning */
 	TransactionId OldestXmin;
+	MultiXactId OldestMxact;
 	GlobalVisState *vistest;
-	/* VACUUM operation's target cutoffs for freezing XIDs and MultiXactIds */
+	/* Limits on the age of the oldest unfrozen XID and MXID */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
 	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
@@ -199,6 +200,7 @@ typedef struct LVRelState
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
 	BlockNumber removed_pages;	/* # pages removed by relation truncation */
+	BlockNumber newly_frozen_pages; /* # pages frozen by lazy_scan_prune */
 	BlockNumber lpdead_item_pages;	/* # pages with LP_DEAD items */
 	BlockNumber missed_dead_pages;	/* # pages with missed dead tuples */
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
@@ -477,6 +479,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	/* Initialize page counters explicitly (be tidy) */
 	vacrel->scanned_pages = 0;
 	vacrel->removed_pages = 0;
+	vacrel->newly_frozen_pages = 0;
 	vacrel->lpdead_item_pages = 0;
 	vacrel->missed_dead_pages = 0;
 	vacrel->nonempty_pages = 0;
@@ -514,10 +517,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 */
 	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
 	vacrel->OldestXmin = OldestXmin;
+	vacrel->OldestMxact = OldestMxact;
 	vacrel->vistest = GlobalVisTestFor(rel);
-	/* FreezeLimit controls XID freezing (always <= OldestXmin) */
+	/* FreezeLimit limits unfrozen XID age (always <= OldestXmin) */
 	vacrel->FreezeLimit = FreezeLimit;
-	/* MultiXactCutoff controls MXID freezing (always <= OldestMxact) */
+	/* MultiXactCutoff limits unfrozen MXID age (always <= OldestMxact) */
 	vacrel->MultiXactCutoff = MultiXactCutoff;
 	/* Initialize state used to track oldest extant XID/XMID */
 	vacrel->NewRelfrozenXid = OldestXmin;
@@ -583,7 +587,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 */
 	if (vacrel->skippedallvis)
 	{
-		/* Cannot advance relfrozenxid/relminmxid */
+		/*
+		 * Skipped some all-visible pages, so definitely cannot advance
+		 * relfrozenxid.  This is generally only expected in pg_upgrade
+		 * scenarios, since VACUUM now avoids setting a page to all-visible
+		 * but not all-frozen.  However, it's also possible (though quite
+		 * unlikely) that we ended up here because somebody else cleared some
+		 * page's all-frozen flag (without clearing its all-visible flag).
+		 */
 		Assert(!aggressive);
 		frozenxid_updated = minmulti_updated = false;
 		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
@@ -685,9 +696,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 vacrel->relnamespace,
 							 vacrel->relname,
 							 vacrel->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u frozen, %u scanned (%.2f%% of total)\n"),
 							 vacrel->removed_pages,
 							 vacrel->rel_pages,
+							 vacrel->newly_frozen_pages,
 							 vacrel->scanned_pages,
 							 orig_rel_pages == 0 ? 100.0 :
 							 100.0 * vacrel->scanned_pages / orig_rel_pages);
@@ -1613,8 +1625,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	int			nnewlpdead;
 	int			nfrozen;
-	TransactionId NewRelfrozenXid;
-	MultiXactId NewRelminMxid;
+	bool		force_freeze = false;
+	TransactionId NewRelfrozenXid,
+				NoFreezeNewRelfrozenXid;
+	MultiXactId NewRelminMxid,
+				NoFreezeNewRelminMxid;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
 
@@ -1625,8 +1640,8 @@ lazy_scan_prune(LVRelState *vacrel,
 retry:
 
 	/* Initialize (or reset) page-level state */
-	NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	NewRelminMxid = vacrel->NewRelminMxid;
+	NewRelfrozenXid = NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -1679,27 +1694,23 @@ retry:
 			continue;
 		}
 
-		/*
-		 * LP_DEAD items are processed outside of the loop.
-		 *
-		 * Note that we deliberately don't set hastup=true in the case of an
-		 * LP_DEAD item here, which is not how count_nondeletable_pages() does
-		 * it -- it only considers pages empty/truncatable when they have no
-		 * items at all (except LP_UNUSED items).
-		 *
-		 * Our assumption is that any LP_DEAD items we encounter here will
-		 * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
-		 * call count_nondeletable_pages().  In any case our opinion of
-		 * whether or not a page 'hastup' (which is how our caller sets its
-		 * vacrel->nonempty_pages value) is inherently race-prone.  It must be
-		 * treated as advisory/unreliable, so we might as well be slightly
-		 * optimistic.
-		 */
 		if (ItemIdIsDead(itemid))
 		{
+			/*
+			 * Delay unsetting all_visible until after we have decided on
+			 * whether this page should be frozen.  We need to test "is this
+			 * page all_visible, assuming any LP_DEAD items are set LP_UNUSED
+			 * in final heap pass?" to reach a decision.  all_visible will be
+			 * unset before we return, as required by lazy_scan_heap caller.
+			 *
+			 * Deliberately don't set hastup for LP_DEAD items.  We make the
+			 * soft assumption that any LP_DEAD items encountered here will
+			 * become LP_UNUSED later on, before count_nondeletable_pages is
+			 * reached.  Our idea of whether the page 'hastup' is inherently
+			 * race-prone.  It must be treated as unreliable by caller anyway,
+			 * so we might as well be slightly optimistic about it.
+			 */
 			deadoffsets[lpdead_items++] = offnum;
-			prunestate->all_visible = false;
-			prunestate->has_lpdead_items = true;
 			continue;
 		}
 
@@ -1831,11 +1842,15 @@ retry:
 		if (heap_prepare_freeze_tuple(tuple.t_data,
 									  vacrel->relfrozenxid,
 									  vacrel->relminmxid,
+									  vacrel->OldestXmin,
+									  vacrel->OldestMxact,
 									  vacrel->FreezeLimit,
 									  vacrel->MultiXactCutoff,
 									  &frozen[nfrozen],
-									  &tuple_totally_frozen,
-									  &NewRelfrozenXid, &NewRelminMxid))
+									  &tuple_totally_frozen, &force_freeze,
+									  &NewRelfrozenXid, &NewRelminMxid,
+									  &NoFreezeNewRelfrozenXid,
+									  &NoFreezeNewRelminMxid))
 		{
 			/* Will execute freeze below */
 			frozen[nfrozen++].offset = offnum;
@@ -1856,9 +1871,32 @@ retry:
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
+	 *
+	 * Freeze the page when it is about to become all-visible (either just
+	 * after we return control to lazy_scan_heap, or later on, during the
+	 * final heap pass).  Also freeze when heap_prepare_freeze_tuple forces us
+	 * to freeze (this is mandatory).  Freezing is typically forced because
+	 * there is at least one XID/XMID from before FreezeLimit/MultiXactCutoff.
 	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	if (prunestate->all_visible || force_freeze)
+	{
+		/*
+		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
+		 * be affected by the XIDs/XMIDs that are just about to be frozen
+		 * anyway.
+		 */
+		vacrel->NewRelfrozenXid = NewRelfrozenXid;
+		vacrel->NewRelminMxid = NewRelminMxid;
+	}
+	else
+	{
+		/* This is comparable to lazy_scan_noprune's handling */
+		vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+		vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
+		/* Forget heap_prepare_freeze_tuple's guidance on freezing */
+		nfrozen = 0;
+	}
 
 	/*
 	 * Consider the need to freeze any items with tuple storage from the page
@@ -1866,7 +1904,7 @@ retry:
 	 */
 	if (nfrozen > 0)
 	{
-		Assert(prunestate->hastup);
+		vacrel->newly_frozen_pages++;
 
 		/*
 		 * At least one tuple with storage needs to be frozen -- execute that
@@ -1892,11 +1930,11 @@ retry:
 		}
 
 		/* Now WAL-log freezing if necessary */
-		if (RelationNeedsWAL(vacrel->rel))
+		if (RelationNeedsWAL(rel))
 		{
 			XLogRecPtr	recptr;
 
-			recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
+			recptr = log_heap_freeze(rel, buf, NewRelfrozenXid,
 									 frozen, nfrozen);
 			PageSetLSN(page, recptr);
 		}
@@ -1919,7 +1957,7 @@ retry:
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible)
+	if (prunestate->all_visible && lpdead_items == 0)
 	{
 		TransactionId cutoff;
 		bool		all_frozen;
@@ -1927,7 +1965,6 @@ retry:
 		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
 			Assert(false);
 
-		Assert(lpdead_items == 0);
 		Assert(prunestate->all_frozen == all_frozen);
 
 		/*
@@ -1949,9 +1986,6 @@ retry:
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
 
-		Assert(!prunestate->all_visible);
-		Assert(prunestate->has_lpdead_items);
-
 		vacrel->lpdead_item_pages++;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
@@ -1965,6 +1999,10 @@ retry:
 		Assert(dead_items->num_items <= dead_items->max_items);
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
 									 dead_items->num_items);
+
+		/* lazy_scan_heap caller expects LP_DEAD item to unset all_visible */
+		prunestate->has_lpdead_items = true;
+		prunestate->all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0ae3b4506..f1ea50454 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -957,6 +957,14 @@ get_all_vacuum_rels(int options)
  * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
  * minimum).
  *
+ * While non-aggressive VACUUMs are never required to advance relfrozenxid and
+ * relminmxid, they often do so in practice.  They freeze wherever possible,
+ * based on the same criteria that aggressive VACUUMs use.  FreezeLimit and
+ * multiXactCutoff still force freezing of older XIDs/XMIDs that did not get
+ * frozen based on the standard criteria, though.  (Actually, these cutoffs
+ * won't force non-aggressive VACUUMs to freeze pages that cannot be cleanup
+ * locked without waiting.)
+ *
  * oldestXmin and oldestMxact are the most recent values that can ever be
  * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
  * vacuumlazy.c caller later on.  These values should be passed when it turns
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 6a02d0fa8..4d585a265 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -565,11 +565,10 @@
     the <structfield>relfrozenxid</structfield> column of a table's
     <structname>pg_class</structname> row contains the oldest
     remaining XID at the end of the most recent <command>VACUUM</command>
-    that successfully advanced <structfield>relfrozenxid</structfield>
-    (typically the most recent aggressive VACUUM).  All rows inserted
-    by transactions with XIDs older than this cutoff XID are
-    guaranteed to have been frozen.  Similarly,
-    the <structfield>datfrozenxid</structfield> column of a database's
+    that successfully advanced <structfield>relfrozenxid</structfield>.
+    All rows inserted by transactions with XIDs older than this cutoff
+    XID are guaranteed to have been frozen.  Similarly, the
+    <structfield>datfrozenxid</structfield> column of a database's
     <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
     appearing in that database &mdash; it is just the minimum of the
     per-table <structfield>relfrozenxid</structfield> values within the database.
-- 
2.30.2

From 19edc49f9a0f7efa5b8518285dafac620b7b8e72 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 11 Mar 2022 19:16:02 -0800
Subject: [PATCH v10 1/3] Loosen coupling between relfrozenxid and freezing.

When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive: the relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.  There is no fixed
relationship between the amount of physical work performed by VACUUM to
make it safe to advance relfrozenxid (freezing and pruning), and the
actual number of XIDs that relfrozenxid can be advanced by (at least in
principle) as a result.  VACUUM might have to freeze all of the tuples
from a hundred million heap pages just to enable relfrozenxid to be
advanced by no more than one or two XIDs.  On the other hand, VACUUM
might end up doing little or no work, and yet still be capable of
advancing relfrozenxid by hundreds of millions of XIDs as a result.

VACUUM now sets relfrozenxid (and relminmxid) using the exact oldest
extant XID (and oldest extant MultiXactId) from the table, including
XIDs from the table's remaining/unfrozen MultiXacts.  This requires that
VACUUM carefully track the oldest unfrozen XID/MultiXactId as it goes.
This optimization doesn't require any changes to the definition of
relfrozenxid, nor does it require changes to the core design of
freezing.

Later work targeting PostgreSQL 16 will teach VACUUM to determine what
to freeze based on page-level characteristics (not XID/XMID based
cutoffs).  But setting relfrozenxid/relminmxid to the exact oldest
extant XID/MXID is independently useful work.  For example, it is
helpful with larger databases that consume many MultiXacts.  If we
assume that the largest tables don't ever need to allocate any
MultiXacts, then aggressive VACUUMs targeting those tables will now
advance relminmxid right up to OldestMxact.  pg_class.relminmxid becomes
a much more precise indicator of what's really going on in each table,
making autovacuums to prevent wraparound (MultiXactId wraparound) occur
less frequently.

Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM -- FreezeLimit still acts as a lower bound on the final value
that aggressive VACUUM can set relfrozenxid to.  Since standard VACUUMs
still make no guarantees about advancing relfrozenxid, they might as
well set relfrozenxid to a value from well before FreezeLimit when the
opportunity presents itself.  In general standard VACUUMs may now set
relfrozenxid to any value > the original relfrozenxid and <= OldestXmin.

Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam.h          |   7 +-
 src/include/access/heapam_xlog.h     |   4 +-
 src/include/commands/vacuum.h        |   1 +
 src/backend/access/heap/heapam.c     | 247 +++++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c | 119 +++++++++----
 src/backend/commands/cluster.c       |   5 +-
 src/backend/commands/vacuum.c        |  42 +++--
 doc/src/sgml/maintenance.sgml        |  30 +++-
 8 files changed, 344 insertions(+), 111 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b46ab7d73..6ef3c02bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -167,8 +167,11 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
 							  TransactionId relfrozenxid, TransactionId relminmxid,
 							  TransactionId cutoff_xid, TransactionId cutoff_multi);
-extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-									MultiXactId cutoff_multi);
+extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple,
+									TransactionId limit_xid,
+									MultiXactId limit_multi,
+									TransactionId *relfrozenxid_nofreeze_out,
+									MultiXactId *relminmxid_nofreeze_out);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 5c47fdcec..2d8a7f627 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -410,7 +410,9 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
 									  xl_heap_freeze_tuple *frz,
-									  bool *totally_frozen);
+									  bool *totally_frozen,
+									  TransactionId *relfrozenxid_out,
+									  MultiXactId *relminmxid_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
 									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d64f6268f..ead88edda 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -291,6 +291,7 @@ extern bool vacuum_set_xid_limits(Relation rel,
 								  int multixact_freeze_min_age,
 								  int multixact_freeze_table_age,
 								  TransactionId *oldestXmin,
+								  MultiXactId *oldestMxact,
 								  TransactionId *freezeLimit,
 								  MultiXactId *multiXactCutoff);
 extern bool vacuum_xid_failsafe_check(TransactionId relfrozenxid,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3746336a0..2e859e427 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6128,7 +6128,12 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * NB -- this might have the side-effect of creating a new MultiXactId!
  *
  * "flags" is an output value; it's used to tell caller what to do on return.
- * Possible flags are:
+ *
+ * "xmax_oldest_xid_out" is an output value; we must handle the details of
+ * tracking the oldest extant XID within MultiXacts.  This is part of how
+ * caller tracks relfrozenxid_out (the oldest extant XID) on behalf of VACUUM.
+ *
+ * Possible values that we can set in "flags":
  * FRM_NOOP
  *		don't do anything -- keep existing Xmax
  * FRM_INVALIDATE_XMAX
@@ -6140,12 +6145,21 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * FRM_RETURN_IS_MULTI
  *		The return value is a new MultiXactId to set as new Xmax.
  *		(caller must obtain proper infomask bits using GetMultiXactIdHintBits)
+ *
+ * Final *xmax_oldest_xid_out value should be ignored completely unless
+ * "flags" contains either FRM_NOOP or FRM_RETURN_IS_MULTI.  Final value is
+ * drawn from oldest extant XID that will remain in some MultiXact (old or
+ * new) after xmax is frozen (XIDs that won't remain after freezing are
+ * ignored, per convention).
+ *
+ * Note in particular that caller must deal with FRM_RETURN_IS_XID case
+ * itself, by considering returned Xid (not using *xmax_oldest_xid_out).
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 				  TransactionId relfrozenxid, TransactionId relminmxid,
 				  TransactionId cutoff_xid, MultiXactId cutoff_multi,
-				  uint16 *flags)
+				  uint16 *flags, TransactionId *xmax_oldest_xid_out)
 {
 	TransactionId xid = InvalidTransactionId;
 	int			i;
@@ -6157,6 +6171,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
+	TransactionId temp_xid_out;
 
 	*flags = 0;
 
@@ -6251,13 +6266,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 
 	/* is there anything older than the cutoff? */
 	need_replace = false;
+	temp_xid_out = *xmax_oldest_xid_out;	/* initialize temp_xid_out */
 	for (i = 0; i < nmembers; i++)
 	{
 		if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-		{
 			need_replace = true;
-			break;
-		}
+		if (TransactionIdPrecedes(members[i].xid, temp_xid_out))
+			temp_xid_out = members[i].xid;
 	}
 
 	/*
@@ -6266,6 +6281,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 */
 	if (!need_replace)
 	{
+		*xmax_oldest_xid_out = temp_xid_out;
 		*flags |= FRM_NOOP;
 		pfree(members);
 		return InvalidTransactionId;
@@ -6275,6 +6291,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * If the multi needs to be updated, figure out which members do we need
 	 * to keep.
 	 */
+	temp_xid_out = *xmax_oldest_xid_out;	/* reset temp_xid_out */
 	nnewmembers = 0;
 	newmembers = palloc(sizeof(MultiXactMember) * nmembers);
 	has_lockers = false;
@@ -6356,7 +6373,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			 * list.)
 			 */
 			if (TransactionIdIsValid(update_xid))
+			{
 				newmembers[nnewmembers++] = members[i];
+				if (TransactionIdPrecedes(members[i].xid, temp_xid_out))
+					temp_xid_out = members[i].xid;
+			}
 		}
 		else
 		{
@@ -6366,6 +6387,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			{
 				/* running locker cannot possibly be older than the cutoff */
 				Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
+				Assert(!TransactionIdPrecedes(members[i].xid, *xmax_oldest_xid_out));
 				newmembers[nnewmembers++] = members[i];
 				has_lockers = true;
 			}
@@ -6403,6 +6425,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 */
 		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
 		*flags |= FRM_RETURN_IS_MULTI;
+
+		/*
+		 * Return oldest remaining XID in new multixact if it's older than
+		 * caller's original xmax_oldest_xid_out (otherwise it's just the
+		 * original xmax_oldest_xid_out value from caller)
+		 */
+		*xmax_oldest_xid_out = temp_xid_out;
 	}
 
 	pfree(newmembers);
@@ -6421,6 +6450,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
+ * target relfrozenxid and relminmxid for the relation.  Caller should make
+ * temp copies of global tracking variables before starting to process a page,
+ * so that we can only scribble on copies.
+ *
  * Caller is responsible for setting the offset field, if appropriate.
  *
  * It is assumed that the caller has checked the tuple with
@@ -6445,7 +6479,9 @@ bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId relfrozenxid, TransactionId relminmxid,
 						  TransactionId cutoff_xid, TransactionId cutoff_multi,
-						  xl_heap_freeze_tuple *frz, bool *totally_frozen)
+						  xl_heap_freeze_tuple *frz, bool *totally_frozen,
+						  TransactionId *relfrozenxid_out,
+						  MultiXactId *relminmxid_out)
 {
 	bool		changed = false;
 	bool		xmax_already_frozen = false;
@@ -6489,6 +6525,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->t_infomask |= HEAP_XMIN_FROZEN;
 			changed = true;
 		}
+		else if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+			*relfrozenxid_out = xid;
 	}
 
 	/*
@@ -6506,16 +6544,21 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	{
 		TransactionId newxmax;
 		uint16		flags;
+		TransactionId xmax_oldest_xid_out = *relfrozenxid_out;
 
 		newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
 									relfrozenxid, relminmxid,
-									cutoff_xid, cutoff_multi, &flags);
+									cutoff_xid, cutoff_multi,
+									&flags, &xmax_oldest_xid_out);
 
 		freeze_xmax = (flags & FRM_INVALIDATE_XMAX);
 
 		if (flags & FRM_RETURN_IS_XID)
 		{
 			/*
+			 * xmax will become an updater XID (an XID from the original
+			 * MultiXact's XIDs that needs to be carried forward).
+			 *
 			 * NB -- some of these transformations are only valid because we
 			 * know the return Xid is a tuple updater (i.e. not merely a
 			 * locker.) Also note that the only reason we don't explicitly
@@ -6527,6 +6570,16 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			if (flags & FRM_MARK_COMMITTED)
 				frz->t_infomask |= HEAP_XMAX_COMMITTED;
 			changed = true;
+			Assert(freeze_xmax);
+
+			/*
+			 * Only consider newxmax Xid to track relfrozenxid_out here, since
+			 * any other XIDs from the old MultiXact won't be left behind once
+			 * xmax is actually frozen.
+			 */
+			Assert(TransactionIdIsValid(newxmax));
+			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
+				*relfrozenxid_out = newxmax;
 		}
 		else if (flags & FRM_RETURN_IS_MULTI)
 		{
@@ -6534,6 +6587,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			uint16		newbits2;
 
 			/*
+			 * xmax was an old MultiXactId which we have to replace with a new
+			 * MultiXact that carries forward a subset of the XIDs from the
+			 * original (those that we'll still need).
+			 *
 			 * We can't use GetMultiXactIdHintBits directly on the new multi
 			 * here; that routine initializes the masks to all zeroes, which
 			 * would lose other bits we need.  Doing it this way ensures all
@@ -6548,6 +6605,37 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->xmax = newxmax;
 
 			changed = true;
+			Assert(!freeze_xmax);
+
+			/*
+			 * FreezeMultiXactId sets xmax_oldest_xid_out to any XID that it
+			 * notices is older than initial relfrozenxid_out, unless the XID
+			 * won't remain after freezing
+			 */
+			Assert(!MultiXactIdPrecedes(newxmax, *relminmxid_out));
+			Assert(TransactionIdPrecedesOrEquals(xmax_oldest_xid_out,
+												 *relfrozenxid_out));
+			*relfrozenxid_out = xmax_oldest_xid_out;
+		}
+		else if (flags & FRM_NOOP)
+		{
+			/*
+			 * xmax is a MultiXactId, and nothing about it changes for now.
+			 *
+			 * Might have to ratchet back relminmxid_out, relfrozenxid_out, or
+			 * both together.  FreezeMultiXactId sets xmax_oldest_xid_out to
+			 * any XID that it notices is older than initial relfrozenxid_out,
+			 * unless the XID won't remain after freezing (or in this case
+			 * after _not_ freezing).
+			 */
+			Assert(MultiXactIdIsValid(xid));
+			Assert(!changed && !freeze_xmax);
+
+			if (MultiXactIdPrecedes(xid, *relminmxid_out))
+				*relminmxid_out = xid;
+			Assert(TransactionIdPrecedesOrEquals(xmax_oldest_xid_out,
+												 *relfrozenxid_out));
+			*relfrozenxid_out = xmax_oldest_xid_out;
 		}
 	}
 	else if (TransactionIdIsNormal(xid))
@@ -6575,7 +6663,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			freeze_xmax = true;
 		}
 		else
+		{
 			freeze_xmax = false;
+			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+				*relfrozenxid_out = xid;
+		}
 	}
 	else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
 			 !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)))
@@ -6699,11 +6791,14 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
 	bool		tuple_totally_frozen;
+	TransactionId relfrozenxid_out = cutoff_xid;
+	MultiXactId relminmxid_out = cutoff_multi;
 
 	do_freeze = heap_prepare_freeze_tuple(tuple,
 										  relfrozenxid, relminmxid,
 										  cutoff_xid, cutoff_multi,
-										  &frz, &tuple_totally_frozen);
+										  &frz, &tuple_totally_frozen,
+										  &relfrozenxid_out, &relminmxid_out);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
@@ -7133,24 +7228,57 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
  * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
  * are older than the specified cutoff XID or MultiXactId.  If so, return true.
  *
+ * See heap_prepare_freeze_tuple for information about the basic rules for the
+ * cutoffs used here.
+ *
  * It doesn't matter whether the tuple is alive or dead, we are checking
  * to see if a tuple needs to be removed or frozen to avoid wraparound.
  *
+ * The *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out arguments are
+ * input/output arguments that work just like heap_prepare_freeze_tuple's
+ * *relfrozenxid_out and *relminmxid_out input/output arguments.  However,
+ * there is one important difference: we track the oldest extant XID and XMID
+ * while making a working assumption that no freezing will actually take
+ * place.  On the other hand, heap_prepare_freeze_tuple assumes that freezing
+ * will take place (based on the specific instructions it also sets up for
+ * caller's tuple).
+ *
+ * Note, in particular, that we even assume that freezing won't go ahead for a
+ * tuple that we indicate "needs freezing" (by returning true).  Not all
+ * callers will be okay with that.  Caller should make temp copies of global
+ * tracking variables before starting to process a page, so that we only ever
+ * scribble on copies.  That way caller can just discard the temp copies if it
+ * really needs to freeze (using heap_prepare_freeze_tuple interface).  In
+ * practice aggressive VACUUM callers always do this and non-aggressive VACUUM
+ * callers always just accept an older final relfrozenxid value.
+ *
  * NB: Cannot rely on hint bits here, they might not be set after a crash or
  * on a standby.
  */
 bool
-heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-						MultiXactId cutoff_multi)
+heap_tuple_needs_freeze(HeapTupleHeader tuple,
+						TransactionId limit_xid, MultiXactId limit_multi,
+						TransactionId *relfrozenxid_nofreeze_out,
+						MultiXactId *relminmxid_nofreeze_out)
 {
 	TransactionId xid;
-
-	xid = HeapTupleHeaderGetXmin(tuple);
-	if (TransactionIdIsNormal(xid) &&
-		TransactionIdPrecedes(xid, cutoff_xid))
-		return true;
+	bool		needs_freeze = false;
 
 	/*
+	 * First deal with xmin.
+	 */
+	xid = HeapTupleHeaderGetXmin(tuple);
+	if (TransactionIdIsNormal(xid))
+	{
+		if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+			*relfrozenxid_nofreeze_out = xid;
+		if (TransactionIdPrecedes(xid, limit_xid))
+			needs_freeze = true;
+	}
+
+	/*
+	 * Now deal with xmax.
+	 *
 	 * The considerations for multixacts are complicated; look at
 	 * heap_prepare_freeze_tuple for justifications.  This routine had better
 	 * be in sync with that one!
@@ -7158,57 +7286,80 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
 		MultiXactId multi;
+		MultiXactMember *members;
+		int			nmembers;
 
 		multi = HeapTupleHeaderGetRawXmax(tuple);
 		if (!MultiXactIdIsValid(multi))
 		{
-			/* no xmax set, ignore */
-			;
+			/* no xmax set -- but xmin might still need freezing */
+			return needs_freeze;
 		}
-		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
-			return true;
-		else if (MultiXactIdPrecedes(multi, cutoff_multi))
-			return true;
-		else
+
+		/*
+		 * Might have to ratchet back relminmxid_nofreeze_out, which we assume
+		 * won't be frozen by caller (even when we return true)
+		 */
+		if (MultiXactIdPrecedes(multi, *relminmxid_nofreeze_out))
+			*relminmxid_nofreeze_out = multi;
+
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
 		{
-			MultiXactMember *members;
-			int			nmembers;
-			int			i;
-
-			/* need to check whether any member of the mxact is too old */
-
-			nmembers = GetMultiXactIdMembers(multi, &members, false,
-											 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
-			for (i = 0; i < nmembers; i++)
-			{
-				if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-				{
-					pfree(members);
-					return true;
-				}
-			}
-			if (nmembers > 0)
-				pfree(members);
+			/*
+			 * pg_upgrade'd MultiXact doesn't need to have its XID members
+			 * affect caller's relfrozenxid_nofreeze_out (just freeze it)
+			 */
+			return true;
 		}
+		else if (MultiXactIdPrecedes(multi, limit_multi))
+			needs_freeze = true;
+
+		/*
+		 * Need to check whether any member of the mxact is too old, to
+		 * determine if the MultiXact needs to be frozen now.  We access the
+		 * members even when we know that the MultiXactId isn't eligible for
+		 * freezing now -- we must still maintain relfrozenxid_nofreeze_out.
+		 */
+		nmembers = GetMultiXactIdMembers(multi, &members, false,
+										 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
+
+		for (int i = 0; i < nmembers; i++)
+		{
+			xid = members[i].xid;
+
+			if (TransactionIdPrecedes(xid, limit_xid))
+				needs_freeze = true;
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+		}
+		if (nmembers > 0)
+			pfree(members);
 	}
 	else
 	{
 		xid = HeapTupleHeaderGetRawXmax(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, limit_xid))
+				needs_freeze = true;
+		}
 	}
 
 	if (tuple->t_infomask & HEAP_MOVED)
 	{
 		xid = HeapTupleHeaderGetXvac(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, limit_xid))
+				needs_freeze = true;
+		}
 	}
 
-	return false;
+	return needs_freeze;
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 87ab7775a..9f5178e0a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,7 +144,7 @@ typedef struct LVRelState
 	Relation   *indrels;
 	int			nindexes;
 
-	/* Aggressive VACUUM (scan all unfrozen pages)? */
+	/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
 	bool		aggressive;
 	/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
 	bool		skipwithvm;
@@ -173,8 +173,9 @@ typedef struct LVRelState
 	/* VACUUM operation's target cutoffs for freezing XIDs and MultiXactIds */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
-	/* Are FreezeLimit/MultiXactCutoff still valid? */
-	bool		freeze_cutoffs_valid;
+	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 
 	/* Error reporting state */
 	char	   *relnamespace;
@@ -328,6 +329,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 	TransactionId OldestXmin;
+	MultiXactId OldestMxact;
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
 
@@ -354,17 +356,17 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * used to determine which XIDs/MultiXactIds will be frozen.
 	 *
 	 * If this is an aggressive VACUUM, then we're strictly required to freeze
-	 * any and all XIDs from before FreezeLimit, so that we will be able to
-	 * safely advance relfrozenxid up to FreezeLimit below (we must be able to
-	 * advance relminmxid up to MultiXactCutoff, too).
+	 * any and all XIDs from before FreezeLimit in order to be able to advance
+	 * relfrozenxid to a value >= FreezeLimit below.  There is an analogous
+	 * requirement around MultiXact freezing, relminmxid, and MultiXactCutoff.
 	 */
 	aggressive = vacuum_set_xid_limits(rel,
 									   params->freeze_min_age,
 									   params->freeze_table_age,
 									   params->multixact_freeze_min_age,
 									   params->multixact_freeze_table_age,
-									   &OldestXmin, &FreezeLimit,
-									   &MultiXactCutoff);
+									   &OldestXmin, &OldestMxact,
+									   &FreezeLimit, &MultiXactCutoff);
 
 	skipwithvm = true;
 	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
@@ -511,10 +513,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->vistest = GlobalVisTestFor(rel);
 	/* FreezeLimit controls XID freezing (always <= OldestXmin) */
 	vacrel->FreezeLimit = FreezeLimit;
-	/* MultiXactCutoff controls MXID freezing */
+	/* MultiXactCutoff controls MXID freezing (always <= OldestMxact) */
 	vacrel->MultiXactCutoff = MultiXactCutoff;
-	/* Track if cutoffs became invalid (possible in !aggressive case only) */
-	vacrel->freeze_cutoffs_valid = true;
+	/* Initialize state used to track oldest extant XID/XMID */
+	vacrel->NewRelfrozenXid = OldestXmin;
+	vacrel->NewRelminMxid = OldestMxact;
 
 	/*
 	 * Call lazy_scan_heap to perform all required heap pruning, index
@@ -568,12 +571,11 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * Aggressive VACUUM must reliably advance relfrozenxid (and relminmxid).
 	 * We are able to advance relfrozenxid in a non-aggressive VACUUM too,
 	 * provided we didn't skip any all-visible (not all-frozen) pages using
-	 * the visibility map, and assuming that we didn't fail to get a cleanup
-	 * lock that made it unsafe with respect to FreezeLimit (or perhaps our
-	 * MultiXactCutoff) established for VACUUM operation.
+	 * the visibility map.  A non-aggressive VACUUM might advance relfrozenxid
+	 * to an XID that is either older or newer than FreezeLimit (same applies
+	 * to relminmxid and MultiXactCutoff).
 	 */
-	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages ||
-		!vacrel->freeze_cutoffs_valid)
+	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
 	{
 		/* Cannot advance relfrozenxid/relminmxid */
 		Assert(!aggressive);
@@ -587,9 +589,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	{
 		Assert(vacrel->scanned_pages + vacrel->frozenskipped_pages ==
 			   orig_rel_pages);
+		Assert(!aggressive ||
+			   TransactionIdPrecedesOrEquals(FreezeLimit,
+											 vacrel->NewRelfrozenXid));
+		Assert(!aggressive ||
+			   MultiXactIdPrecedesOrEquals(MultiXactCutoff,
+										   vacrel->NewRelminMxid));
+
 		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
 							new_rel_allvisible, vacrel->nindexes > 0,
-							FreezeLimit, MultiXactCutoff,
+							vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
 							&frozenxid_updated, &minmulti_updated, false);
 	}
 
@@ -694,17 +703,19 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 OldestXmin, diff);
 			if (frozenxid_updated)
 			{
-				diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
+				diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
+				Assert(diff > 0);
 				appendStringInfo(&buf,
 								 _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
-								 FreezeLimit, diff);
+								 vacrel->NewRelfrozenXid, diff);
 			}
 			if (minmulti_updated)
 			{
-				diff = (int32) (MultiXactCutoff - vacrel->relminmxid);
+				diff = (int32) (vacrel->NewRelminMxid - vacrel->relminmxid);
+				Assert(diff > 0);
 				appendStringInfo(&buf,
 								 _("new relminmxid: %u, which is %d mxids ahead of previous value\n"),
-								 MultiXactCutoff, diff);
+								 vacrel->NewRelminMxid, diff);
 			}
 			if (orig_rel_pages > 0)
 			{
@@ -896,8 +907,8 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	 * find them.  But even when aggressive *is* set, it's still OK if we miss
 	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
 	 * just added to that page are necessarily >= vacrel->OldestXmin, and so
-	 * they'll have no effect on the value to which we can safely set
-	 * relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+	 * they cannot invalidate NewRelfrozenXid tracking.  A similar argument
+	 * applies for NewRelminMxid tracking and OldestMxact.
 	 */
 	next_unskippable_block = 0;
 	if (vacrel->skipwithvm)
@@ -1584,6 +1595,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	int			nnewlpdead;
 	int			nfrozen;
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
 
@@ -1593,7 +1606,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 retry:
 
-	/* Initialize (or reset) page-level counters */
+	/* Initialize (or reset) page-level state */
+	NewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -1801,7 +1816,8 @@ retry:
 									  vacrel->FreezeLimit,
 									  vacrel->MultiXactCutoff,
 									  &frozen[nfrozen],
-									  &tuple_totally_frozen))
+									  &tuple_totally_frozen,
+									  &NewRelfrozenXid, &NewRelminMxid))
 		{
 			/* Will execute freeze below */
 			frozen[nfrozen++].offset = offnum;
@@ -1815,13 +1831,16 @@ retry:
 			prunestate->all_frozen = false;
 	}
 
+	vacrel->offnum = InvalidOffsetNumber;
+
 	/*
 	 * We have now divided every item on the page into either an LP_DEAD item
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
 	 */
-	vacrel->offnum = InvalidOffsetNumber;
+	vacrel->NewRelfrozenXid = NewRelfrozenXid;
+	vacrel->NewRelminMxid = NewRelminMxid;
 
 	/*
 	 * Consider the need to freeze any items with tuple storage from the page
@@ -1972,6 +1991,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 				missed_dead_tuples;
 	HeapTupleHeader tupleheader;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+	TransactionId NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	MultiXactId NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2017,20 +2038,40 @@ lazy_scan_noprune(LVRelState *vacrel,
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_needs_freeze(tupleheader,
 									vacrel->FreezeLimit,
-									vacrel->MultiXactCutoff))
+									vacrel->MultiXactCutoff,
+									&NoFreezeNewRelfrozenXid,
+									&NoFreezeNewRelminMxid))
 		{
 			if (vacrel->aggressive)
 			{
-				/* Going to have to get cleanup lock for lazy_scan_prune */
+				/*
+				 * heap_tuple_needs_freeze determined that it isn't going to
+				 * be possible for the ongoing aggressive VACUUM operation to
+				 * advance relfrozenxid to a value >= FreezeLimit without
+				 * freezing one or more tuples with older XIDs from this page.
+				 * (Or perhaps the issue was that MultiXactCutoff could not be
+				 * respected.  Might have even been both cutoffs, together.)
+				 *
+				 * Tell caller that it must acquire a full cleanup lock.  It's
+				 * possible that caller will have to wait a while for one, but
+				 * that can't be helped -- full processing by lazy_scan_prune
+				 * is required to freeze the older XIDs (and/or freeze older
+				 * MultiXactIds).
+				 */
 				vacrel->offnum = InvalidOffsetNumber;
 				return false;
 			}
-
-			/*
-			 * Current non-aggressive VACUUM operation definitely won't be
-			 * able to advance relfrozenxid or relminmxid
-			 */
-			vacrel->freeze_cutoffs_valid = false;
+			else
+			{
+				/*
+				 * This is a non-aggressive VACUUM, which is under no strict
+				 * obligation to advance relfrozenxid at all (much less to
+				 * advance it to a value >= FreezeLimit).  Non-aggressive
+				 * VACUUM advances relfrozenxid/relminmxid on a best-effort
+				 * basis.  Accept an older final relfrozenxid/relminmxid value
+				 * rather than waiting for a cleanup lock.
+				 */
+			}
 		}
 
 		ItemPointerSet(&(tuple.t_self), blkno, offnum);
@@ -2079,6 +2120,16 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	vacrel->offnum = InvalidOffsetNumber;
 
+	/*
+	 * By here we know for sure that caller can tolerate having reduced
+	 * processing for this particular page.  Before we return to report
+	 * success, update vacrel with details of how we processed the page.
+	 * (lazy_scan_prune expects a clean slate, so we have to delay these steps
+	 * until here.)
+	 */
+	vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+	vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel (though
 	 * only when VACUUM uses two-pass strategy)
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 02a7e94bf..a7e988298 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -767,6 +767,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	TupleDesc	oldTupDesc PG_USED_FOR_ASSERTS_ONLY;
 	TupleDesc	newTupDesc PG_USED_FOR_ASSERTS_ONLY;
 	TransactionId OldestXmin;
+	MultiXactId oldestMxact;
 	TransactionId FreezeXid;
 	MultiXactId MultiXactCutoff;
 	bool		use_sort;
@@ -856,8 +857,8 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 * Since we're going to rewrite the whole table anyway, there's no reason
 	 * not to be aggressive about this.
 	 */
-	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
-						  &OldestXmin, &FreezeXid, &MultiXactCutoff);
+	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, &OldestXmin, &oldestMxact,
+						  &FreezeXid, &MultiXactCutoff);
 
 	/*
 	 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 50a4a612e..0ae3b4506 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -945,14 +945,22 @@ get_all_vacuum_rels(int options)
  * The output parameters are:
  * - oldestXmin is the Xid below which tuples deleted by any xact (that
  *   committed) should be considered DEAD, not just RECENTLY_DEAD.
- * - freezeLimit is the Xid below which all Xids are replaced by
- *	 FrozenTransactionId during vacuum.
- * - multiXactCutoff is the value below which all MultiXactIds are removed
- *   from Xmax.
+ * - oldestMxact is the MultiXactId below which MultiXacts are definitely not
+ *   seen as visible by any running transaction.
+ * - freezeLimit is the Xid below which all Xids are definitely replaced by
+ *   FrozenTransactionId during aggressive vacuums.
+ * - multiXactCutoff is the value below which all MultiXactIds are definitely
+ *   removed from Xmax during aggressive vacuums.
  *
  * Return value indicates if vacuumlazy.c caller should make its VACUUM
  * operation aggressive.  An aggressive VACUUM must advance relfrozenxid up to
- * FreezeLimit, and relminmxid up to multiXactCutoff.
+ * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
+ * minimum).
+ *
+ * oldestXmin and oldestMxact are the most recent values that can ever be
+ * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
+ * vacuumlazy.c caller later on.  These values should be passed when it turns
+ *   out that VACUUM will leave no unfrozen XIDs/MXIDs behind in the table.
  */
 bool
 vacuum_set_xid_limits(Relation rel,
@@ -961,6 +969,7 @@ vacuum_set_xid_limits(Relation rel,
 					  int multixact_freeze_min_age,
 					  int multixact_freeze_table_age,
 					  TransactionId *oldestXmin,
+					  MultiXactId *oldestMxact,
 					  TransactionId *freezeLimit,
 					  MultiXactId *multiXactCutoff)
 {
@@ -969,7 +978,6 @@ vacuum_set_xid_limits(Relation rel,
 	int			effective_multixact_freeze_max_age;
 	TransactionId limit;
 	TransactionId safeLimit;
-	MultiXactId oldestMxact;
 	MultiXactId mxactLimit;
 	MultiXactId safeMxactLimit;
 	int			freezetable;
@@ -1065,9 +1073,11 @@ vacuum_set_xid_limits(Relation rel,
 						 effective_multixact_freeze_max_age / 2);
 	Assert(mxid_freezemin >= 0);
 
+	/* Remember for caller */
+	*oldestMxact = GetOldestMultiXactId();
+
 	/* compute the cutoff multi, being careful to generate a valid value */
-	oldestMxact = GetOldestMultiXactId();
-	mxactLimit = oldestMxact - mxid_freezemin;
+	mxactLimit = *oldestMxact - mxid_freezemin;
 	if (mxactLimit < FirstMultiXactId)
 		mxactLimit = FirstMultiXactId;
 
@@ -1082,8 +1092,8 @@ vacuum_set_xid_limits(Relation rel,
 				(errmsg("oldest multixact is far in the past"),
 				 errhint("Close open transactions with multixacts soon to avoid wraparound problems.")));
 		/* Use the safe limit, unless an older mxact is still running */
-		if (MultiXactIdPrecedes(oldestMxact, safeMxactLimit))
-			mxactLimit = oldestMxact;
+		if (MultiXactIdPrecedes(*oldestMxact, safeMxactLimit))
+			mxactLimit = *oldestMxact;
 		else
 			mxactLimit = safeMxactLimit;
 	}
@@ -1390,14 +1400,10 @@ vac_update_relstats(Relation relation,
 	 * Update relfrozenxid, unless caller passed InvalidTransactionId
 	 * indicating it has no new data.
 	 *
-	 * Ordinarily, we don't let relfrozenxid go backwards: if things are
-	 * working correctly, the only way the new frozenxid could be older would
-	 * be if a previous VACUUM was done with a tighter freeze_min_age, in
-	 * which case we don't want to forget the work it already did.  However,
-	 * if the stored relfrozenxid is "in the future", then it must be corrupt
-	 * and it seems best to overwrite it with the cutoff we used this time.
-	 * This should match vac_update_datfrozenxid() concerning what we consider
-	 * to be "in the future".
+	 * Ordinarily, we don't let relfrozenxid go backwards.  However, if the
+	 * stored relfrozenxid is "in the future", then it must be corrupt, so
+	 * just overwrite it.  This should match vac_update_datfrozenxid()
+	 * concerning what we consider to be "in the future".
 	 */
 	if (frozenxid_updated)
 		*frozenxid_updated = false;
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e..6a02d0fa8 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -563,9 +563,11 @@
     statistics in the system tables <structname>pg_class</structname> and
     <structname>pg_database</structname>.  In particular,
     the <structfield>relfrozenxid</structfield> column of a table's
-    <structname>pg_class</structname> row contains the freeze cutoff XID that was used
-    by the last aggressive <command>VACUUM</command> for that table.  All rows
-    inserted by transactions with XIDs older than this cutoff XID are
+    <structname>pg_class</structname> row contains the oldest
+    remaining XID at the end of the most recent <command>VACUUM</command>
+    that successfully advanced <structfield>relfrozenxid</structfield>
+    (typically the most recent aggressive <command>VACUUM</command>).  All rows inserted
+    by transactions with XIDs older than this cutoff XID are
     guaranteed to have been frozen.  Similarly,
     the <structfield>datfrozenxid</structfield> column of a database's
     <structname>pg_database</structname> row is a lower bound on the unfrozen XIDs
@@ -588,6 +590,17 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
     cutoff XID to the current transaction's XID.
    </para>
 
+   <tip>
+    <para>
+     <literal>VACUUM VERBOSE</literal> outputs information about
+     <structfield>relfrozenxid</structfield> and/or
+     <structfield>relminmxid</structfield> when either field was
+     advanced.  The same details appear in the server log when <xref
+      linkend="guc-log-autovacuum-min-duration"/> reports on vacuuming
+     by autovacuum.
+    </para>
+   </tip>
+
    <para>
     <command>VACUUM</command> normally only scans pages that have been modified
     since the last vacuum, but <structfield>relfrozenxid</structfield> can only be
@@ -602,7 +615,11 @@ SELECT datname, age(datfrozenxid) FROM pg_database;
     set <literal>age(relfrozenxid)</literal> to a value just a little more than the
     <varname>vacuum_freeze_min_age</varname> setting
     that was used (more by the number of transactions started since the
-    <command>VACUUM</command> started).  If no <structfield>relfrozenxid</structfield>-advancing
+    <command>VACUUM</command> started).  <command>VACUUM</command>
+    will set <structfield>relfrozenxid</structfield> to the oldest XID
+    that remains in the table, so it's possible that the final value
+    will be much more recent than strictly required.
+    If no <structfield>relfrozenxid</structfield>-advancing
     <command>VACUUM</command> is issued on the table until
     <varname>autovacuum_freeze_max_age</varname> is reached, an autovacuum will soon
     be forced for the table.
@@ -689,8 +706,9 @@ HINT:  Stop the postmaster and vacuum that database in single-user mode.
     </para>
 
     <para>
-     Aggressive <command>VACUUM</command> scans, regardless of
-     what causes them, enable advancing the value for that table.
+     Aggressive <command>VACUUM</command> scans, regardless of what
+     causes them, are <emphasis>guaranteed</emphasis> to be able to
+     advance the table's <structfield>relminmxid</structfield>.
      Eventually, as all tables in all databases are scanned and their
      oldest multixact values are advanced, on-disk storage for older
      multixacts can be removed.
-- 
2.30.2
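
(Aside, for anyone reviewing 0001: the NewRelfrozenXid/NewRelminMxid fields
are just running minimums.  They start out at OldestXmin/OldestMxact, and
every XID or MultiXactId that VACUUM leaves unfrozen can only ratchet them
backwards, so the final values are simply the oldest XID/MXID remaining in
the table.  Below is a self-contained toy sketch of the XID side of that
bookkeeping -- all names are illustrative, not taken from the patch, where
the real work happens in heap_prepare_freeze_tuple/heap_tuple_needs_freeze.
The MultiXactId side works the same way for relminmxid.)

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Simplified; the real TransactionIdPrecedes() also handles wraparound rules */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Called for every tuple header that VACUUM will leave unfrozen.  The running
 * minimum can only move backwards, so once the scan finishes it is the oldest
 * XID remaining in the table -- the value used as the new relfrozenxid.
 */
static void
track_unfrozen_tuple(TransactionId xmin, TransactionId xmax,
                     TransactionId *new_relfrozenxid)
{
    if (xid_precedes(xmin, *new_relfrozenxid))
        *new_relfrozenxid = xmin;
    if (xid_precedes(xmax, *new_relfrozenxid))
        *new_relfrozenxid = xmax;
}

int
main(void)
{
    TransactionId new_relfrozenxid = 1000;  /* begins at OldestXmin */

    track_unfrozen_tuple(850, 990, &new_relfrozenxid);
    track_unfrozen_tuple(900, 970, &new_relfrozenxid);

    printf("oldest extant XID: %u\n", (unsigned) new_relfrozenxid);  /* 850 */
    return 0;
}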

From 134bd550bd7cb8c182fe3a28789705be5bf8785a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 11 Mar 2022 19:16:02 -0800
Subject: [PATCH v10 2/3] Generalize how VACUUM skips all-frozen pages.

Before now, non-aggressive VACUUMs were at a gratuitous disadvantage
(relative to aggressive VACUUMs) when it came to advancing relfrozenxid.
The underlying issue was that lazy_scan_heap conditioned its skipping
behavior on whether or not the current VACUUM was aggressive.  A
non-aggressive VACUUM could fail to increment its frozenskipped_pages
counter as a result, and so could miss out on advancing relfrozenxid for
no good reason.  Aggressive VACUUMs happened to avoid the problem, but
nothing equivalent protected the non-aggressive case.

Fix the issue by generalizing how we skip all-frozen pages: remember
whether a range of skippable pages consists only of all-frozen pages as
we're initially establishing the range of skippable pages.  If we decide
to skip the range of pages, and if the range as a whole is not an
all-frozen range, remember that fact for later (this makes it unsafe to
advance relfrozenxid).  We no longer need to recheck any pages using the
visibility map.  We no longer directly track frozenskipped_pages at all.
And we no longer need ad-hoc VM_ALL_VISIBLE()/VM_ALL_FROZEN() calls for
pages from a range of blocks that we already decided were safe to skip.

The issue is subtle.  Before now, the non-aggressive case always had to
recheck the visibility map at the point of actually skipping each page.
This created a window for some other session to concurrently unset the
same heap page's bit in the visibility map.  If the bit was unset at
exactly the wrong time, then the non-aggressive case would
conservatively conclude that the page was _never_ all-frozen on recheck.
And so frozenskipped_pages would not be incremented for the page.
lazy_scan_heap had already "committed" to skipping the page at that
point, though, which was enough to make it unsafe to advance
relfrozenxid/relminmxid later on.

It's possible that this issue hardly ever came up in practice.  It's
hard to be sure either way.  We only had to be unlucky once to lose out
on advancing relfrozenxid -- a single affected heap page was enough to
throw VACUUM off.  That seems like something to avoid on general
principle.  This is similar to an issue addressed by commit 44fa8488,
which taught vacuumlazy.c to not give up on non-aggressive relfrozenxid
advancement just because a cleanup lock wasn't immediately available on
some heap page.

Also refactor the mechanism that disables skipping using the visibility
map during VACUUM(DISABLE_PAGE_SKIPPING).  Our old approach made VACUUM
behave as if there were no pages with VM bits set.  Our new approach has
VACUUM set up a range of pages in the usual way, without actually going
through with skipping the range in the end.  This has the advantage of
making VACUUM(DISABLE_PAGE_SKIPPING) apply standard cross checks that
report on visibility map corruption via WARNINGs.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzn6bGJGfOy3zSTJicKLw99PHJeSOQBOViKjSCinaxUKDQ@mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 298 ++++++++++++++-------------
 1 file changed, 158 insertions(+), 140 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f5178e0a..3bc75d401 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -176,6 +176,8 @@ typedef struct LVRelState
 	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
 	TransactionId NewRelfrozenXid;
 	MultiXactId NewRelminMxid;
+	/* Have we skipped any all-visible (not all-frozen) pages? */
+	bool		skippedallvis;
 
 	/* Error reporting state */
 	char	   *relnamespace;
@@ -196,7 +198,6 @@ typedef struct LVRelState
 	VacDeadItems *dead_items;	/* TIDs whose index tuples we'll delete */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
-	BlockNumber frozenskipped_pages;	/* # frozen pages skipped via VM */
 	BlockNumber removed_pages;	/* # pages removed by relation truncation */
 	BlockNumber lpdead_item_pages;	/* # pages with LP_DEAD items */
 	BlockNumber missed_dead_pages;	/* # pages with missed dead tuples */
@@ -247,6 +248,10 @@ typedef struct LVSavedErrInfo
 
 /* non-export function prototypes */
 static void lazy_scan_heap(LVRelState *vacrel, int nworkers);
+static BlockNumber lazy_scan_skip_range(LVRelState *vacrel, Buffer *vmbuffer,
+										BlockNumber next_unskippable_block,
+										bool *all_visible_next_unskippable,
+										bool *all_frozen_skippable_range);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
@@ -471,7 +476,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 
 	/* Initialize page counters explicitly (be tidy) */
 	vacrel->scanned_pages = 0;
-	vacrel->frozenskipped_pages = 0;
 	vacrel->removed_pages = 0;
 	vacrel->lpdead_item_pages = 0;
 	vacrel->missed_dead_pages = 0;
@@ -518,6 +522,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	/* Initialize state used to track oldest extant XID/XMID */
 	vacrel->NewRelfrozenXid = OldestXmin;
 	vacrel->NewRelminMxid = OldestMxact;
+	/* Cannot advance relfrozenxid when we skipped all-visible pages */
+	vacrel->skippedallvis = false;
 
 	/*
 	 * Call lazy_scan_heap to perform all required heap pruning, index
@@ -575,7 +581,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * to an XID that is either older or newer than FreezeLimit (same applies
 	 * to relminmxid and MultiXactCutoff).
 	 */
-	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
+	if (vacrel->skippedallvis)
 	{
 		/* Cannot advance relfrozenxid/relminmxid */
 		Assert(!aggressive);
@@ -587,8 +593,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	}
 	else
 	{
-		Assert(vacrel->scanned_pages + vacrel->frozenskipped_pages ==
-			   orig_rel_pages);
 		Assert(!aggressive ||
 			   TransactionIdPrecedesOrEquals(FreezeLimit,
 											 vacrel->NewRelfrozenXid));
@@ -842,7 +846,9 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 				next_failsafe_block,
 				next_fsm_block_to_vacuum;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		skipping_blocks;
+	bool		skipping_range,
+				all_visible_next_unskippable,
+				all_frozen_skippable_range;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -874,167 +880,85 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
 	/*
-	 * Set things up for skipping blocks using visibility map.
-	 *
-	 * Except when vacrel->aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When vacrel->aggressive is set, we can't skip pages just because they
-	 * are all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
+	 * Set up an initial range of blocks to skip via the visibility map.
 	 *
 	 * Before entering the main loop, establish the invariant that
 	 * next_unskippable_block is the next block number >= blkno that we can't
-	 * skip based on the visibility map, either all-visible for a regular scan
-	 * or all-frozen for an aggressive scan.  We set it to rel_pages when
-	 * there's no such block.  We also set up the skipping_blocks flag
-	 * correctly at this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily >= vacrel->OldestXmin, and so
-	 * they cannot invalidate NewRelfrozenXid tracking.  A similar argument
-	 * applies for NewRelminMxid tracking and OldestMxact.
+	 * skip based on the visibility map.
 	 */
-	next_unskippable_block = 0;
-	if (vacrel->skipwithvm)
-	{
-		while (next_unskippable_block < rel_pages)
-		{
-			uint8		vmstatus;
+	next_unskippable_block = lazy_scan_skip_range(vacrel, &vmbuffer, 0,
+												  &all_visible_next_unskippable,
+												  &all_frozen_skippable_range);
 
-			vmstatus = visibilitymap_get_status(vacrel->rel,
-												next_unskippable_block,
-												&vmbuffer);
-			if (vacrel->aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
+	/*
+	 * Decide whether or not we'll actually skip the first skippable range.
+	 *
+	 * We want to skip pages that are all-visible according to the visibility
+	 * map (or all-frozen in the aggressive case), but only when we can skip
+	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+	 * sequentially, the OS should be doing readahead for us, so there's no
+	 * gain in skipping a page now and then; that's likely to disable
+	 * readahead and so be counterproductive.
+	 */
+	skipping_range = (vacrel->skipwithvm &&
+					  next_unskippable_block >= SKIP_PAGES_THRESHOLD);
 
 	for (blkno = 0; blkno < rel_pages; blkno++)
 	{
 		Buffer		buf;
 		Page		page;
-		bool		all_visible_according_to_vm = false;
+		bool		all_visible_according_to_vm;
 		LVPagePruneState prunestate;
 
-		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
-
-		update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
-								 blkno, InvalidOffsetNumber);
-
 		if (blkno == next_unskippable_block)
 		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if (vacrel->skipwithvm)
-			{
-				while (next_unskippable_block < rel_pages)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(vacrel->rel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (vacrel->aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
+			/*
+			 * We can't skip this block.  It might still be all-visible,
+			 * though: an aggressive VACUUM cannot skip a block that is
+			 * all-visible but not all-frozen.
+			 */
+			all_visible_according_to_vm = all_visible_next_unskippable;
 
 			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
+			 * Determine a range of blocks to skip after we scan and process
+			 * this block.  We pass blkno + 1 as next_unskippable_block.  The
+			 * final next_unskippable_block won't change when there are no
+			 * blocks to skip (skippable blocks are those after blkno, but
+			 * before final next_unskippable_block).
 			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
+			next_unskippable_block =
+				lazy_scan_skip_range(vacrel, &vmbuffer, blkno + 1,
+									 &all_visible_next_unskippable,
+									 &all_frozen_skippable_range);
 
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (vacrel->aggressive &&
-				VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
+			/* Decide whether or not we'll actually skip the new range */
+			skipping_range =
+				(vacrel->skipwithvm &&
+				 next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD);
 		}
 		else
 		{
-			/*
-			 * The current page can be skipped if we've seen a long enough run
-			 * of skippable blocks to justify skipping it -- provided it's not
-			 * the last page in the relation (according to rel_pages).
-			 *
-			 * We always scan the table's last page to determine whether it
-			 * has tuples or not, even if it would otherwise be skipped. This
-			 * avoids having lazy_truncate_heap() take access-exclusive lock
-			 * on the table to attempt a truncation that just fails
-			 * immediately because there are tuples on the last page.
-			 */
-			if (skipping_blocks && blkno < rel_pages - 1)
+			/* Every block in the range must be safe to skip */
+			all_visible_according_to_vm = true;
+
+			Assert(blkno < next_unskippable_block);
+			Assert(blkno < rel_pages - 1);	/* see lazy_scan_skip_range */
+			Assert(!vacrel->aggressive || all_frozen_skippable_range);
+
+			if (skipping_range)
 			{
 				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was initially all-frozen, so we have to
-				 * recheck.
+				 * If this range of blocks is not all-frozen, then we cannot
+				 * advance relfrozenxid later.  This is another reason for
+				 * SKIP_PAGES_THRESHOLD; it keeps us from giving up on
+				 * relfrozenxid advancement just to skip a handful of pages.
 				 */
-				if (vacrel->aggressive ||
-					VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-					vacrel->frozenskipped_pages++;
+				if (!all_frozen_skippable_range)
+					vacrel->skippedallvis = true;
 				continue;
 			}
 
-			/*
-			 * SKIP_PAGES_THRESHOLD (threshold for skipping) was not
-			 * crossed, or this is the last page.  Scan the page, even
-			 * though it's all-visible (and possibly even all-frozen).
-			 */
-			all_visible_according_to_vm = true;
+			/* We decided not to skip this range, so scan this page */
 		}
 
 		vacuum_delay_point();
@@ -1046,6 +970,11 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 		 */
 		vacrel->scanned_pages++;
 
+		/* Report as block scanned, update error traceback information */
+		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+		update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+								 blkno, InvalidOffsetNumber);
+
 		/*
 		 * Regularly check if wraparound failsafe should trigger.
 		 *
@@ -1425,6 +1354,95 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	Assert(!IsInParallelMode());
 }
 
+/*
+ * Set up a range of skippable blocks using visibility map.
+ *
+ * lazy_scan_heap() caller calls here every time it needs to set up a new
+ * range of blocks to skip via the visibility map.  Caller passes the block
+ * immediately after its last next_unskippable_block to set up a new range.
+ * We return a new next_unskippable_block for this range.  This is often a
+ * degenerate 0-page range (we return caller's next_unskippable_block when
+ * that happens).
+ *
+ * Sets *all_visible_next_unskippable to indicate whether the returned block
+ * can be assumed all-visible.  Also sets *all_frozen_skippable_range to
+ * indicate whether the skippable range consists entirely of all-frozen pages.
+ *
+ * When vacrel->aggressive is set, caller can't skip pages just because they
+ * are all-visible, but can still skip pages that are all-frozen, since such
+ * pages do not need freezing and do not affect the value that we can safely
+ * set for relfrozenxid or relminmxid.  *all_frozen_skippable_range is never
+ * set 'false' for aggressive callers for this reason.
+ *
+ * Note: If caller thinks that one of the pages from the range is all-visible
+ * or all-frozen when in fact the flag's just been cleared, caller might fail
+ * to vacuum the page.  It's easy to see that skipping a page in a VACUUM that
+ * ultimately cannot advance relfrozenxid or relminmxid is not a very big
+ * deal; we might leave some dead tuples lying around, but the next vacuum
+ * will find them.  But even in VACUUMs that *are* capable of advancing
+ * relfrozenxid, it's still OK if we miss a page whose all-frozen marking gets
+ * concurrently cleared.  Any new XIDs from such a page must be >= OldestXmin,
+ * and so cannot invalidate NewRelfrozenXid tracking.  A similar argument
+ * applies for NewRelminMxid tracking and OldestMxact.
+ */
+static BlockNumber
+lazy_scan_skip_range(LVRelState *vacrel, Buffer *vmbuffer,
+					 BlockNumber next_unskippable_block,
+					 bool *all_visible_next_unskippable,
+					 bool *all_frozen_skippable_range)
+{
+	BlockNumber rel_pages = vacrel->rel_pages;
+
+	*all_visible_next_unskippable = true;
+	*all_frozen_skippable_range = true;
+
+	while (next_unskippable_block < rel_pages)
+	{
+		uint8		vmstatus;
+
+		vmstatus = visibilitymap_get_status(vacrel->rel,
+											next_unskippable_block,
+											vmbuffer);
+		if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			Assert((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0);
+			*all_visible_next_unskippable = false;
+			break;
+		}
+
+		/*
+		 * We always scan the table's last page later to determine whether it
+		 * has tuples or not, even if it would otherwise be skipped.  This
+		 * avoids having lazy_truncate_heap() take access-exclusive lock on
+		 * the table to attempt a truncation that just fails immediately
+		 * because there are tuples on the last page.
+		 */
+		if (next_unskippable_block == rel_pages - 1)
+		{
+			/* Last block case: need only set all_visible_next_unskippable */
+			Assert(*all_visible_next_unskippable);
+			break;
+		}
+
+		if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+		{
+			if (vacrel->aggressive)
+				break;
+
+			/*
+			 * This block may be skipped too.  It's not all-frozen, though, so
+			 * the entire skippable range will be deemed not all-frozen.
+			 */
+			*all_frozen_skippable_range = false;
+		}
+
+		vacuum_delay_point();
+		next_unskippable_block++;
+	}
+
+	return next_unskippable_block;
+}
+
 /*
  *	lazy_scan_new_or_empty() -- lazy_scan_heap() new/empty page handling.
  *
-- 
2.30.2
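
To make the new control flow in 0002 a bit easier to follow, here is a
self-contained toy sketch of the range-based skipping scheme, run against a
fake visibility map.  Names and details are simplified and illustrative (the
real code is lazy_scan_skip_range plus the main loop in lazy_scan_heap, and
the aggressive case, which refuses to put merely all-visible pages in a
skippable range, is left out).  The point is only that the all-frozen
question is answered once, when the range is established, instead of being
re-asked page by page while skipping.

#include <stdbool.h>
#include <stdio.h>

#define REL_PAGES 16
#define SKIP_PAGES_THRESHOLD 4

/* toy visibility map: 0 = not all-visible, 1 = all-visible, 2 = all-frozen */
static const int vm[REL_PAGES] =
{2, 2, 2, 2, 2, 0, 1, 1, 1, 1, 1, 0, 2, 2, 0, 2};

/*
 * Returns the first block >= start that cannot be skipped, and reports
 * whether every skippable block in [start, result) was all-frozen.
 */
static int
skip_range(int start, bool *range_all_frozen)
{
    int     blk = start;

    *range_all_frozen = true;
    while (blk < REL_PAGES)
    {
        if (vm[blk] == 0)
            break;              /* not even all-visible: must scan it */
        if (blk == REL_PAGES - 1)
            break;              /* always scan the last page */
        if (vm[blk] == 1)
            *range_all_frozen = false;  /* all-visible but not all-frozen */
        blk++;
    }
    return blk;
}

int
main(void)
{
    bool    range_all_frozen;
    bool    skippedallvis = false;
    int     next_unskippable = skip_range(0, &range_all_frozen);
    bool    skipping = (next_unskippable >= SKIP_PAGES_THRESHOLD);

    for (int blkno = 0; blkno < REL_PAGES; blkno++)
    {
        if (blkno == next_unskippable)
        {
            /* scan this block, then set up the next skippable range */
            printf("scan block %d\n", blkno);
            next_unskippable = skip_range(blkno + 1, &range_all_frozen);
            skipping = (next_unskippable - blkno > SKIP_PAGES_THRESHOLD);
            continue;
        }
        if (skipping)
        {
            if (!range_all_frozen)
                skippedallvis = true;   /* later: cannot advance relfrozenxid */
            continue;
        }
        /* range too short to be worth skipping: scan the page anyway */
        printf("scan block %d\n", blkno);
    }
    printf("skippedallvis = %d\n", skippedallvis);
    return 0;
}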
