From 379ec8259cc3b2f79e397d83370cad44aee5a70c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 5 Aug 2022 16:40:08 -0700
Subject: [PATCH v1] Avoid reltuples distortion in very small tables.

Consistently avoid trusting a sample of only one page at the point that
VACUUM determines a new reltuples for the target table (though not when
the table is <= 1 page in size, since then it's not merely a sample).

This is follow-up work to commit 74388a1a, which added heuristics that
prevented reltuples from becoming distorted by successive VACUUM
operations that only scan one page.  The earlier commit failed to
account for remaining cases where VACUUM scans only one page of a table
that is small enough that a single page is greater than 2% of its total
size, yet big enough that VACUUM will skip most of its pages using the
visibility map (vacuumlazy.c won't skip any pages when rel_pages is below
SKIP_PAGES_THRESHOLD).
---
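Reviewer note: the order of the early exits in the patched function can be
sketched as a standalone predicate. This is only an illustration under
stated assumptions, not the code in vacuum.c: keeps_old_reltuples() is a
hypothetical helper, and the BlockNumber typedef is a local stand-in for
the PostgreSQL type. It returns true whenever the sketch would return
old_rel_tuples unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's BlockNumber (assumption for this sketch) */
typedef uint32_t BlockNumber;

/*
 * Hypothetical helper mirroring the early exits of the patched
 * vac_estimate_reltuples().  Returns true when the existing reltuples
 * value would be kept as-is.
 */
bool
keeps_old_reltuples(BlockNumber old_rel_pages,
					BlockNumber total_pages,
					BlockNumber scanned_pages)
{
	/* Every page was scanned: the count is exact, not a sample */
	if (scanned_pages >= total_pages)
		return false;

	/* Same size as pg_class says, and fewer than 2% of pages scanned */
	if (old_rel_pages == total_pages &&
		scanned_pages < (double) total_pages * 0.02)
		return true;

	/* New in this patch: never extrapolate from a sample of <= 1 page */
	if (scanned_pages <= 1)
		return true;

	/* Otherwise the caller goes on to blend old and new tuple density */
	return false;
}
```

With a 40-page table and one scanned page, the 2% test does not fire
(1 is not less than 0.8), so only the new scanned_pages <= 1 test keeps
the old value.  A one-page table that was fully scanned still takes the
first exit and reports scanned_tuples.
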
 src/backend/commands/vacuum.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8df25f59d..dbdfe8bd2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1234,31 +1234,25 @@ vac_estimate_reltuples(Relation relation,
 	if (scanned_pages >= total_pages)
 		return scanned_tuples;
 
-	/*
-	 * If scanned_pages is zero but total_pages isn't, keep the existing value
-	 * of reltuples.  (Note: we might be returning -1 in this case.)
-	 */
-	if (scanned_pages == 0)
-		return old_rel_tuples;
-
 	/*
 	 * When successive VACUUM commands scan the same few pages again and
 	 * again, without anything from the table really changing, there is a risk
 	 * that our beliefs about tuple density will gradually become distorted.
-	 * It's particularly important to avoid becoming confused in this way due
-	 * to vacuumlazy.c implementation details.  For example, the tendency for
-	 * our caller to always scan the last heap page should not ever cause us
-	 * to believe that every page in the table must be just like the last
-	 * page.
+	 * This might be caused by vacuumlazy.c implementation details, such as
+	 * its tendency to always scan the last heap page.  Handle that here.
 	 *
-	 * We apply a heuristic to avoid these problems: if the relation is
-	 * exactly the same size as it was at the end of the last VACUUM, and only
-	 * a few of its pages (less than a quasi-arbitrary threshold of 2%) were
-	 * scanned by this VACUUM, assume that reltuples has not changed at all.
+	 * If the relation is _exactly_ the same size according to the existing
+	 * pg_class entry, and only a few of its pages (less than 2%) were
+	 * scanned, keep the existing value of reltuples.  Also keep the existing
+	 * value when only a subset of rel's pages <= a single page were scanned.
+	 *
+	 * (Note: we might be returning -1 here.)
 	 */
 	if (old_rel_pages == total_pages &&
 		scanned_pages < (double) total_pages * 0.02)
 		return old_rel_tuples;
+	if (scanned_pages <= 1)
+		return old_rel_tuples;
 
 	/*
 	 * If old density is unknown, we can't do much except scale up
-- 
2.32.0

