On Tue, 15 Jun 2021 at 03:22, Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> >  static void
> >  GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> >  {
> > +     /* assert non-decreasing nature of horizons */
>
> Thinking more about it, I don't think these are correct. See the
> following comment in procarray.c:
>
>  * Note: despite the above, it's possible for the calculated values to move
>  * backwards on repeated calls.

So the implicit assumption in heap_page_prune that
HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
we'll need to redo the condition in heap_page_prune as well.

PFA my adapted patch that fixes this new-ish issue, and does not
include the (incorrect) assertions in GlobalVisUpdateApply. I've
tested this against the reproducing case, both with and without the
fix in GetOldestNonRemovableTransactionId, and it fails fall into an
infinite loop.

I would appreciate it if someone could validate the new logic in the
HEAPTUPLE_DEAD case. Although I believe it correctly handles the case
where the vistest non-removable horizon moved backwards, a second pair
of eyes would be appreciated.


With regards,

Matthias van de Meent
From f4b8de3c53b9f2ff5c8ac2907dafd328d2ec787a Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm@gmail.com>
Date: Wed, 9 Jun 2021 17:26:34 +0200
Subject: [PATCH v2] Fix a bug in GetOldestNonRemovableTransactionId

GetOldestNonRemovableTransactionId(rel) did not return values consistent
with GlobalVisTestFor(rel).

Additionally, lazy_scan_prune had an incorrect assumption that GlobalVis*Rels
would never have an OldestXmin < vacrel->OldestXmin, which is incorrect.
This assumption is now fixed, and the changes have been documented.
---
 src/backend/access/heap/vacuumlazy.c | 29 +++++++++++++++++++++++++++-
 src/backend/storage/ipc/procarray.c  | 21 +++++++++++++-------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b600e951a..f4320d5a34 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1675,6 +1675,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
  * that any items that make it into the dead_tuples array are simple LP_DEAD
  * line pointers, and that every remaining item with tuple storage is
  * considered as a candidate for freezing.
+ * 
+ * Note: It is possible that vistest's window moves back from the
+ * vacrel->OldestXmin (see ComputeXidHorizons). To prevent an infinite
+ * loop where we bounce between HeapTupleSatisfiesVacuum and 
+ * heap_prune_satisfies_vacuum who disagree on the [almost]deadness of
+ * a tuple, we only retry when we know HTSV agrees with HPSV.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1798,7 +1804,28 @@ retry:
 		res = HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf);
 
 		if (unlikely(res == HEAPTUPLE_DEAD))
-			goto retry;
+		{
+			/* If the horizon of the visibility test has not moved backwards, retry */
+			if (likely(TransactionIdPrecedesOrEquals(vacrel->OldestXmin,
+													 GlobalVisTestNonRemovableHorizon(vistest))))
+				goto retry;
+
+			/*
+			 * If the vistest horizon moved backward in relation to 
+			 * OldestXmin, we cannot retry (lest we create an infinite loop).
+			 * 
+			 * We also cannot feed the tuple into heap_prepare_freeze_tuple,
+			 * as that function is not built to handle DEAD tuples. To
+			 * correctly handle the rest of the data on the page we update
+			 * statistics and continue the loop.
+			 */
+			num_tuples++;
+			new_dead_tuples++;
+			prunestate->all_visible = false;
+			prunestate->hastup = true;
+			prunestate->all_frozen = false;
+			continue;
+		}
 
 		/*
 		 * The criteria for counting a tuple as live in this block need to
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc..deb6edd168 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1944,18 +1944,25 @@ TransactionId
 GetOldestNonRemovableTransactionId(Relation rel)
 {
 	ComputeXidHorizonsResult horizons;
+	TransactionId	result;
 
 	ComputeXidHorizons(&horizons);
 
-	/* select horizon appropriate for relation */
-	if (rel == NULL || rel->rd_rel->relisshared)
-		return horizons.shared_oldest_nonremovable;
-	else if (RelationIsAccessibleInLogicalDecoding(rel))
-		return horizons.catalog_oldest_nonremovable;
+	/* select horizon appropriate for relation. Mirrored from GlobalVisTestFor */
+	if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
+		result = horizons.shared_oldest_nonremovable;
+	else if (IsCatalogRelation(rel) || RelationIsAccessibleInLogicalDecoding(rel))
+		result = horizons.catalog_oldest_nonremovable;
 	else if (RELATION_IS_LOCAL(rel))
-		return horizons.temp_oldest_nonremovable;
+		result = horizons.temp_oldest_nonremovable;
 	else
-		return horizons.data_oldest_nonremovable;
+		result = horizons.data_oldest_nonremovable;
+
+	/* Sanity check */
+	Assert(TransactionIdEquals(
+		XidFromFullTransactionId(GlobalVisTestFor(rel)->maybe_needed),
+							result));
+	return result;
 }
 
 /*
-- 
2.20.1

Reply via email to