Bug ERROR: missing chunk number 0 for toast value.
The bug #18351 was previously reported in  
https://www.postgresql.org/message-id/flat/18351-f6e06364b3a2e669%40postgresql.org
  but not resolved.
I have made reproducing easier, figured out the cause of the bug, and developed
a prototype patch, though it has known issues I'd like feedback on.
 
Reproduction
============
Tested on PostgreSQL 17.10
  1) Terminal 1:
     psql -d postgres
  2) Terminal 1:
     DROP TABLE IF EXISTS tbl;
     CREATE TABLE tbl (i int, t text);
     CREATE INDEX ON tbl (i);
     ALTER TABLE tbl ALTER COLUMN t SET STORAGE EXTERNAL;
     INSERT INTO tbl(i, t) VALUES (1, repeat('1234567890', 250));
  3) Terminal 2:
     psql -d postgres
  4) Terminal 2:
     BEGIN;
     SELECT txid_current();
  5) Terminal 3:
     createdb d1
  6) Terminal 3:
     psql -d d1
  7) Terminal 3:
     BEGIN;
     SELECT txid_current();
  8) Terminal 1:
     DELETE FROM tbl WHERE i = 1;
  9) Attach gdb to the backend from terminal 1
  10) Set breakpoint at vacuum_rel(toast_relid, NULL, &toast_vacuum_params,
      bstrategy); (line 2300 in src/backend/commands/vacuum.c)
  11) Terminal 1:
      VACUUM(VERBOSE) tbl;
  12) Wait for the breakpoint to be hit
  13) Terminal 2:
      COMMIT; (or just \q)
  14) Detach the process in gdb
  15) Terminal 1:
      CREATE INDEX ON tbl(t);
 
This should produce: ERROR: missing chunk number 0 for toast value …
 
The bug stems from different horizons in VACUUM table and its TOAST.
Two key mechanisms are involved:
1) Horizon computation (ComputeXidHorizons, called by
   GetOldestNonRemovableTransactionId): iterates over processes in the
   procarray, but skips those with PROC_IN_VACUUM set, and only considers
   processes in the same database selecting the minimum for the
   data_oldest_nonremovable. This value also feeds into
   GlobalVisState->definitely_needed (which can only grow).
2) Snapshot computation (GetSnapshotData): also skips PROC_IN_VACUUM
   processes, but does NOT filter by database — transactions in all databases
   contribute to the snapshot's xmin.
 
During the main table's VACUUM, a transaction in the same database holds the
data horizon up, so the tuple survives (it is RECENTLY_DEAD) — but by the
time we vacuum the TOAST table, that transaction has committed. The TOAST
tuples get removed, because with no other active processes in this database
OldestXmin become max computed (that is >xmax).
Later, CREATE INDEX on the main table computes its own OldestXmin. Our process
is no longer in VACUUM, so its xmin (carried over from the snapshot - minimum
txid obtained from backend in another database) is now considered. This xmin
is less than the dead tuple's xmax, so HeapTupleSatisfiesVacuum classifies
it as RECENTLY_DEAD rather than DEAD. CREATE INDEX tries to fetch the TOAST
value — but it's already gone.

Prototype patch
===============
The core idea: when vacuuming a TOAST table, reuse the OldestXmin that was
computed for the parent table, rather than computing a fresh one that
may have advanced past it.
 
The prototype patch adds two fields to VacuumParams:
  - cached_parent_oldest_xmin: stores the OldestXmin from the parent table
  - cached_parent_cutoffs_valid: indicates the cached value is usable
 
In heap_vacuum_rel(), if we're vacuuming a TOAST table and the parent's
OldestXmin is available, we use it instead of calling
GetOldestNonRemovableTransactionId() again. This prevents the TOAST vacuum
from removing tuples (based on OldestXmin and definitely_needed) that the
main table's vacuum considered still visible.
 
Known issues
============
Make check fails. One of the problems is cutoff for removing and freezing
tuples is far in the past. This causes assertion failures and incorrect
freezing behavior.
 
Alternative approach
====================
An alternative would be to add a definitely_needed check alongside
OldestXmin in create index, so that a tuple classified as RECENTLY_DEAD
by OldestXmin would be reclassified as DEAD if definitely_needed says it's safe
to remove. However, this adds an extra visibility check during CREATE INDEX,
which could cause a performance regression.
I'd appreciate comments on whether the "cache parent OldestXmin for TOAST
vacuum" approach worth pursuing, despite the freezing complications?
Or is there a cleaner way?
 
Best regards,
Ekaterina Testova, Postgres Professional
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index cb28830..26115ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -429,6 +429,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->live_tuples = 0;
 	vacrel->recently_dead_tuples = 0;
 	vacrel->missed_dead_tuples = 0;
+	vacrel->cutoffs.OldestXmin = InvalidTransactionId;
 
 	/*
 	 * Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -446,7 +447,22 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * want to teach lazy_scan_prune to recompute vistest from time to time,
 	 * to increase the number of dead tuples it can prune away.)
 	 */
-	vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
+	/* If this is a TOAST and we have a horizon from the parent, use it */
+	if (params->toast_parent != InvalidOid && params->cached_parent_cutoffs_valid)
+	{
+		vacrel->cutoffs.OldestXmin = params->cached_parent_oldest_xmin;
+		vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
+	}
+	else
+	{
+		vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
+		if (params->toast_parent == InvalidOid)
+		{
+			params->cached_parent_oldest_xmin = vacrel->cutoffs.OldestXmin;
+			params->cached_parent_cutoffs_valid = true;
+		}
+	}
+
 	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
 	vacrel->vistest = GlobalVisTestFor(rel);
 	/* Initialize state used to track oldest extant XID/MXID */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index c42c8a6..0b8442f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -628,6 +628,7 @@ vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy,
 				 * to avoid affecting other relations.
 				 */
 				memcpy(&params_copy, params, sizeof(VacuumParams));
+				params_copy.cached_parent_cutoffs_valid = false;
 
 				if (!vacuum_rel(vrel->oid, vrel->relation, &params_copy, bstrategy))
 					continue;
@@ -1117,7 +1118,8 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
 	 * that only one vacuum process can be working on a particular table at
 	 * any time, and that each vacuum is always an independent transaction.
 	 */
-	cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
+	if (!TransactionIdIsValid(cutoffs->OldestXmin))
+		cutoffs->OldestXmin = GetOldestNonRemovableTransactionId(rel);
 
 	Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
 
@@ -2297,6 +2299,12 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 		toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
 		toast_vacuum_params.toast_parent = relid;
 
+		if (params->cached_parent_cutoffs_valid)
+		{
+			toast_vacuum_params.cached_parent_oldest_xmin = params->cached_parent_oldest_xmin;
+			toast_vacuum_params.cached_parent_cutoffs_valid = true;
+		}
+
 		vacuum_rel(toast_relid, NULL, &toast_vacuum_params, bstrategy);
 	}
 
@@ -2309,7 +2317,6 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 	return true;
 }
 
-
 /*
  * Open all the vacuumable indexes of the given relation, obtaining the
  * specified kind of lock on each.  Return an array of Relation pointers for
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a8..d5ae2bf 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -237,6 +237,9 @@ typedef struct VacuumParams
 	 * disabled.
 	 */
 	int			nworkers;
+
+	bool		cached_parent_cutoffs_valid; /* Flag for cache cutoffs to TOAST */
+	TransactionId	cached_parent_oldest_xmin;
 } VacuumParams;
 
 /*

Reply via email to