Re: Pluggable Storage - Andres's take

Andres Freund Wed, 15 May 2019 11:55:53 -0700

Hi,

On 2019-04-25 15:43:15 -0700, Andres Freund wrote:
> Hm. I think some of those changes would be a bit bigger than I initially
> though. Attached is a more minimal fix that'd route
> RelationGetNumberOfBlocksForFork() through tableam if necessary.  I
> think it's definitely the right answer for 1), probably the pragmatic
> answer to 2), but certainly not for 3).


> I've for now made the AM return the size in bytes, and then convert that
> into blocks in RelationGetNumberOfBlocksForFork(). Most postgres callers
> are going to continue to want it internally as pages (otherwise there's
> going to be way too much churn, without a benefit I can see). So I
> thinkt that's OK.
> 
> There's also a somewhat weird bit of returning the total relation size
> for InvalidForkNumber - it's pretty likely that other AMs wouldn't use
> postgres' current forks, but have equivalent concepts. And without that
> there'd be no way to get that size.  I'm not sure I like this, input
> welcome. But it seems good to offer the ability to get the entire size
> somehow.

I'm still reasonably happy with this.  I'll polish it a bit and push.


> > 3) nodeTidscan, skipping over too large tids
> >    I think this should just be moved into the AMs, there's no need to
> >    have this in nodeTidscan.c
> 
> I think here it's *not* actually correct at all to use the relation
> size. It's currently doing:
> 
>       /*
>        * We silently discard any TIDs that are out of range at the time of 
> scan
>        * start.  (Since we hold at least AccessShareLock on the table, it 
> won't
>        * be possible for someone to truncate away the blocks we intend to
>        * visit.)
>        */
>       nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);
> 
> which is fine (except for a certain abstraction leakage) for an AM like
> heap or zheap, but I suspect strongly that that's not ok for Ashwin &
> Heikki's approach where tid isn't tied to physical representation.
> 
> 
> The obvious answer would be to just move that check into the
> table_fetch_row_version implementation (currently just calling
> heap_fetch()) - but that doesn't seem OK from a performance POV, because
> we'd then determine the relation size once for each tid, rather than
> once per tidscan.  And it'd also check in cases where we know the tid is
> supposed to be valid (e.g. fetching trigger tuples and such).
> 
> The proper fix seems to be to introduce a new scan variant
> (e.g. table_beginscan_tid()), and then have table_fetch_row_version take
> a scan as a parameter.  But it seems we'd have to introduce that as a
> separate tableam callback, because we'd not want to incur the overhead
> of creating an additional scan / RelationGetNumberOfBlocks() checks for
> triggers et al.

Attached is a prototype of a variation of this. I added a
table_tuple_tid_valid(TableScanDesc sscan, ItemPointer tid)
callback / wrapper. Currently it just takes a "plain" scan, but we could
add a separate table_beginscan variant too.

For heap that just means we can just use HeapScanDesc's rs_nblock to
filter out invalid tids, and we only need to call
RelationGetNumberOfBlocks() once, rather than every
table_tuple_tid_valid(0 / table_get_latest_tid() call. Which is a good
improvement for nodeTidscan's table_get_latest_tid() call (for WHERE
CURRENT OF) - which previously computed the relation size once per
tuple.

Needs a bit of polishing, but I think this is the right direction?
Unless somebody protests, I'm going to push something along those lines
quite soon.

Greetings,

Andres Freund

>From 5c8edd1803619120c9ac856e7353943281f2d407 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Tue, 14 May 2019 23:50:41 -0700
Subject: [PATCH v2 1/2] tableam: Don't assume that an AM uses md.c style
 storage.

---
 src/backend/access/heap/heapam_handler.c | 27 +++++++++++++++
 src/backend/access/table/tableamapi.c    |  3 ++
 src/backend/storage/buffer/bufmgr.c      | 43 ++++++++++++++++++++++--
 src/include/access/tableam.h             | 40 ++++++++++++++++++++++
 4 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 00505ec3f4d..96211c673e1 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1975,6 +1975,31 @@ heapam_scan_get_blocks_done(HeapScanDesc hscan)
 }
 
 
+/* ------------------------------------------------------------------------
+ * Miscellaneous callbacks for the heap AM
+ * ------------------------------------------------------------------------
+ */
+
+static uint64
+heapam_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	uint64		nblocks = 0;
+
+	/* Open it at the smgr level if not already done */
+	RelationOpenSmgr(rel);
+
+	/* InvalidForkNumber indicates the size of all forks */
+	if (forkNumber == InvalidForkNumber)
+	{
+		for (int i = 0; i < MAX_FORKNUM; i++)
+			nblocks += smgrnblocks(rel->rd_smgr, i);
+	}
+	else
+		nblocks = smgrnblocks(rel->rd_smgr, forkNumber);
+
+	return nblocks * BLCKSZ;
+}
+
 
 /* ------------------------------------------------------------------------
  * Planner related callbacks for the heap AM
@@ -2556,6 +2581,8 @@ static const TableAmRoutine heapam_methods = {
 	.index_build_range_scan = heapam_index_build_range_scan,
 	.index_validate_scan = heapam_index_validate_scan,
 
+	.relation_size = heapam_relation_size,
+
 	.relation_estimate_size = heapam_estimate_rel_size,
 
 	.scan_bitmap_next_block = heapam_scan_bitmap_next_block,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 0053dc95cab..32877e7674f 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -86,6 +86,9 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_analyze_next_tuple != NULL);
 	Assert(routine->index_build_range_scan != NULL);
 	Assert(routine->index_validate_scan != NULL);
+
+	Assert(routine->relation_size != NULL);
+
 	Assert(routine->relation_estimate_size != NULL);
 
 	/* optional, but one callback implies presence of hte other */
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 887023fc8a5..fae290b4dbd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -33,6 +33,7 @@
 #include <sys/file.h>
 #include <unistd.h>
 
+#include "access/tableam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -2789,14 +2790,50 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 /*
  * RelationGetNumberOfBlocksInFork
  *		Determines the current number of pages in the specified relation fork.
+ *
+ * Note that the accuracy of the result will depend on the details of the
+ * relation's storage. For builtin AMs it'll be accurate, but for external AMs
+ * it might not be.
  */
 BlockNumber
 RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
 {
-	/* Open it at the smgr level if not already done */
-	RelationOpenSmgr(relation);
+	switch (relation->rd_rel->relkind)
+	{
+		case RELKIND_SEQUENCE:
+		case RELKIND_INDEX:
+		case RELKIND_PARTITIONED_INDEX:
+			/* Open it at the smgr level if not already done */
+			RelationOpenSmgr(relation);
 
-	return smgrnblocks(relation->rd_smgr, forkNum);
+			return smgrnblocks(relation->rd_smgr, forkNum);
+
+		case RELKIND_RELATION:
+		case RELKIND_TOASTVALUE:
+		case RELKIND_MATVIEW:
+			{
+				/*
+				 * Not every table AM uses BLCKSZ wide fixed size
+				 * blocks. Therefore tableam returns the size in bytes - but
+				 * for the purpose of this routine, we want the number of
+				 * blocks. Therefore divide, rounding up.
+				 */
+				uint64 szbytes;
+
+				szbytes = table_relation_size(relation, forkNum);
+
+				return (szbytes + (BLCKSZ - 1)) / BLCKSZ;
+			}
+		case RELKIND_VIEW:
+		case RELKIND_COMPOSITE_TYPE:
+		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
+		default:
+			Assert(false);
+			break;
+	}
+
+	return 0; /* satisfy compiler */
 }
 
 /*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ebfa0d51855..e2062d808ef 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -540,6 +540,22 @@ typedef struct TableAmRoutine
 										struct ValidateIndexState *state);
 
 
+	/* ------------------------------------------------------------------------
+	 * Miscellaneous functions.
+	 * ------------------------------------------------------------------------
+	 */
+
+	/*
+	 * See table_relation_size().
+	 *
+	 * Note that currently a few callers use the MAIN_FORKNUM size to vet the
+	 * validity of tids (e.g. nodeTidscans.c), and others use it to figure out
+	 * the range of potentially interesting blocks (brin, analyze). The
+	 * abstraction around this will need to be improved in the near future.
+	 */
+	uint64		(*relation_size) (Relation rel, ForkNumber forkNumber);
+
+
 	/* ------------------------------------------------------------------------
 	 * Planner related functions.
 	 * ------------------------------------------------------------------------
@@ -550,6 +566,10 @@ typedef struct TableAmRoutine
 	 *
 	 * While block oriented, it shouldn't be too hard for an AM that doesn't
 	 * doesn't internally use blocks to convert into a usable representation.
+	 *
+	 * This differs from the relation_size callback by returning size
+	 * estimates (both relation size and tuple count) for planning purposes,
+	 * rather than returning a currently correct estimate.
 	 */
 	void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
 										   BlockNumber *pages, double *tuples,
@@ -1503,6 +1523,26 @@ table_index_validate_scan(Relation heap_rel,
 }
 
 
+/* ----------------------------------------------------------------------------
+ * Miscellaneous functionality
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the current size of `rel` in bytes. If `forkNumber` is
+ * InvalidForkNumber return the relation's overall size, otherwise the size
+ * for the indicated fork.
+ *
+ * Note that the overall size might not be the equivalent of the sum of sizes
+ * for the individual forks for some AMs, e.g. because the AMs storage does
+ * not neatly map onto the builtin types of forks.
+ */
+static inline uint64
+table_relation_size(Relation rel, ForkNumber forkNumber)
+{
+	return rel->rd_tableam->relation_size(rel, forkNumber);
+}
+
 /* ----------------------------------------------------------------------------
  * Planner related functionality
  * ----------------------------------------------------------------------------
-- 
2.21.0.dirty

>From 1518a2adff7d6fdd7f2aefd6fbe6b66d179a7f7d Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Wed, 15 May 2019 11:52:01 -0700
Subject: [PATCH v2 2/2] tableam: Avoid relying on relation size to determine
 validity of tids.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam.c         | 26 +++-----
 src/backend/access/heap/heapam_handler.c | 11 +++-
 src/backend/access/table/tableam.c       | 27 +++++++++
 src/backend/executor/nodeTidscan.c       | 77 +++++++++++++++---------
 src/backend/utils/adt/tid.c              | 10 ++-
 src/include/access/heapam.h              |  3 +-
 src/include/access/tableam.h             | 37 ++++++++----
 7 files changed, 129 insertions(+), 62 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ec9853603fd..d8d4f3b1f5a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1654,8 +1654,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 /*
  *	heap_get_latest_tid -  get the latest tid of a specified tuple
  *
- * Actually, this gets the latest version that is visible according to
- * the passed snapshot.  You can pass SnapshotDirty to get the very latest,
+ * Actually, this gets the latest version that is visible according to the
+ * scan's snapshot.  Create a scan using SnapshotDirty to get the very latest,
  * possibly uncommitted version.
  *
  * *tid is both an input and an output parameter: it is updated to
@@ -1663,28 +1663,20 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
  * if no version of the row passes the snapshot test.
  */
 void
-heap_get_latest_tid(Relation relation,
-					Snapshot snapshot,
+heap_get_latest_tid(TableScanDesc sscan,
 					ItemPointer tid)
 {
-	BlockNumber blk;
+	Relation relation = sscan->rs_rd;
+	Snapshot snapshot = sscan->rs_snapshot;
 	ItemPointerData ctid;
 	TransactionId priorXmax;
 
-	/* this is to avoid Assert failures on bad input */
-	if (!ItemPointerIsValid(tid))
-		return;
-
 	/*
-	 * Since this can be called with user-supplied TID, don't trust the input
-	 * too much.  (RelationGetNumberOfBlocks is an expensive check, so we
-	 * don't check t_ctid links again this way.  Note that it would not do to
-	 * call it just once and save the result, either.)
+	 * table_get_latest_tid verified that the passed in tid is valid.  Assume
+	 * that t_ctid links are valid however - there shouldn't be invalid ones
+	 * in the table.
 	 */
-	blk = ItemPointerGetBlockNumber(tid);
-	if (blk >= RelationGetNumberOfBlocks(relation))
-		elog(ERROR, "block number %u is out of range for relation \"%s\"",
-			 blk, RelationGetRelationName(relation));
+	Assert(ItemPointerIsValid(tid));
 
 	/*
 	 * Loop to chase down t_ctid links.  At top of loop, ctid is the tuple we
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 96211c673e1..d7e1f5eb724 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -73,7 +73,6 @@ heapam_slot_callbacks(Relation relation)
 	return &TTSOpsBufferHeapTuple;
 }
 
-
 /* ------------------------------------------------------------------------
  * Index Scan Callbacks for heap AM
  * ------------------------------------------------------------------------
@@ -204,6 +203,15 @@ heapam_fetch_row_version(Relation relation,
 	return false;
 }
 
+static bool
+heapam_tuple_tid_valid(TableScanDesc scan, ItemPointer tid)
+{
+	HeapScanDesc hscan = (HeapScanDesc) scan;
+
+	return ItemPointerIsValid(tid) &&
+		ItemPointerGetBlockNumber(tid) < hscan->rs_nblocks;
+}
+
 static bool
 heapam_tuple_satisfies_snapshot(Relation rel, TupleTableSlot *slot,
 								Snapshot snapshot)
@@ -2568,6 +2576,7 @@ static const TableAmRoutine heapam_methods = {
 
 	.tuple_fetch_row_version = heapam_fetch_row_version,
 	.tuple_get_latest_tid = heap_get_latest_tid,
+	.tuple_tid_valid = heapam_tuple_tid_valid,
 	.tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot,
 	.compute_xid_horizon_for_tuples = heap_compute_xid_horizon_for_tuples,
 
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index baba1ea699b..6e46befdfd9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -213,6 +213,33 @@ table_index_fetch_tuple_check(Relation rel,
 }
 
 
+/* ------------------------------------------------------------------------
+ * Functions for non-modifying operations on individual tuples
+ * ------------------------------------------------------------------------
+ */
+
+void
+table_get_latest_tid(TableScanDesc scan, ItemPointer tid)
+{
+	Relation rel = scan->rs_rd;
+	const TableAmRoutine *tableam = rel->rd_tableam;
+
+	/*
+	 * Since this can be called with user-supplied TID, don't trust the input
+	 * too much.
+	 */
+	if (!tableam->tuple_tid_valid(scan, tid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("tid (%u, %u) is not valid for relation for relation \"%s\"",
+						ItemPointerGetBlockNumberNoCheck(tid),
+						ItemPointerGetOffsetNumberNoCheck(tid),
+						RelationGetRelationName(rel))));
+
+	return tableam->tuple_get_latest_tid(scan, tid);
+}
+
+
 /* ----------------------------------------------------------------------------
  * Functions to make modifications a bit simpler.
  * ----------------------------------------------------------------------------
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 156be56a57d..63802a53419 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -129,20 +129,12 @@ static void
 TidListEval(TidScanState *tidstate)
 {
 	ExprContext *econtext = tidstate->ss.ps.ps_ExprContext;
-	BlockNumber nblocks;
+	TableScanDesc scan = tidstate->ss.ss_currentScanDesc;
 	ItemPointerData *tidList;
 	int			numAllocTids;
 	int			numTids;
 	ListCell   *l;
 
-	/*
-	 * We silently discard any TIDs that are out of range at the time of scan
-	 * start.  (Since we hold at least AccessShareLock on the table, it won't
-	 * be possible for someone to truncate away the blocks we intend to
-	 * visit.)
-	 */
-	nblocks = RelationGetNumberOfBlocks(tidstate->ss.ss_currentRelation);
-
 	/*
 	 * We initialize the array with enough slots for the case that all quals
 	 * are simple OpExprs or CurrentOfExprs.  If there are any
@@ -165,19 +157,26 @@ TidListEval(TidScanState *tidstate)
 				DatumGetPointer(ExecEvalExprSwitchContext(tidexpr->exprstate,
 														  econtext,
 														  &isNull));
-			if (!isNull &&
-				ItemPointerIsValid(itemptr) &&
-				ItemPointerGetBlockNumber(itemptr) < nblocks)
+			if (isNull)
+				continue;
+
+			/*
+			 * We silently discard any TIDs that are out of range at the time
+			 * of scan start.  (Since we hold at least AccessShareLock on the
+			 * table, it won't be possible for someone to truncate away the
+			 * blocks we intend to visit.)
+			 */
+			if (!table_tuple_tid_valid(scan, itemptr))
+				continue;
+
+			if (numTids >= numAllocTids)
 			{
-				if (numTids >= numAllocTids)
-				{
-					numAllocTids *= 2;
-					tidList = (ItemPointerData *)
-						repalloc(tidList,
-								 numAllocTids * sizeof(ItemPointerData));
-				}
-				tidList[numTids++] = *itemptr;
+				numAllocTids *= 2;
+				tidList = (ItemPointerData *)
+					repalloc(tidList,
+							 numAllocTids * sizeof(ItemPointerData));
 			}
+			tidList[numTids++] = *itemptr;
 		}
 		else if (tidexpr->exprstate && tidexpr->isarray)
 		{
@@ -206,13 +205,15 @@ TidListEval(TidScanState *tidstate)
 			}
 			for (i = 0; i < ndatums; i++)
 			{
-				if (!ipnulls[i])
-				{
-					itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
-					if (ItemPointerIsValid(itemptr) &&
-						ItemPointerGetBlockNumber(itemptr) < nblocks)
-						tidList[numTids++] = *itemptr;
-				}
+				if (ipnulls[i])
+					continue;
+
+				itemptr = (ItemPointer) DatumGetPointer(ipdatums[i]);
+
+				if (!table_tuple_tid_valid(scan, itemptr))
+					continue;
+
+				tidList[numTids++] = *itemptr;
 			}
 			pfree(ipdatums);
 			pfree(ipnulls);
@@ -306,6 +307,7 @@ TidNext(TidScanState *node)
 	EState	   *estate;
 	ScanDirection direction;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 	Relation	heapRelation;
 	TupleTableSlot *slot;
 	ItemPointerData *tidList;
@@ -325,8 +327,17 @@ TidNext(TidScanState *node)
 	 * First time through, compute the list of TIDs to be visited
 	 */
 	if (node->tss_TidList == NULL)
-		TidListEval(node);
+	{
+		Assert(node->ss.ss_currentScanDesc == NULL);
 
+		node->ss.ss_currentScanDesc =
+			table_beginscan(node->ss.ss_currentRelation,
+							estate->es_snapshot,
+							0, NULL);
+		TidListEval(node);
+	}
+
+	scan = node->ss.ss_currentScanDesc;
 	tidList = node->tss_TidList;
 	numTids = node->tss_NumTids;
 
@@ -365,7 +376,7 @@ TidNext(TidScanState *node)
 		 * current according to our snapshot.
 		 */
 		if (node->tss_isCurrentOf)
-			table_get_latest_tid(heapRelation, snapshot, &tid);
+			table_get_latest_tid(scan, &tid);
 
 		if (table_fetch_row_version(heapRelation, &tid, snapshot, slot))
 			return slot;
@@ -436,8 +447,13 @@ ExecTidScan(PlanState *pstate)
 void
 ExecReScanTidScan(TidScanState *node)
 {
+	if (node->ss.ss_currentScanDesc)
+		table_endscan(node->ss.ss_currentScanDesc);
+
 	if (node->tss_TidList)
 		pfree(node->tss_TidList);
+
+	node->ss.ss_currentScanDesc = NULL;
 	node->tss_TidList = NULL;
 	node->tss_NumTids = 0;
 	node->tss_TidPtr = -1;
@@ -455,6 +471,9 @@ ExecReScanTidScan(TidScanState *node)
 void
 ExecEndTidScan(TidScanState *node)
 {
+	if (node->ss.ss_currentScanDesc)
+		table_endscan(node->ss.ss_currentScanDesc);
+
 	/*
 	 * Free the exprcontext
 	 */
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 6ab26d8ea8b..1aab30b6aab 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -358,6 +358,7 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	Relation	rel;
 	AclResult	aclresult;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 
 	result = (ItemPointer) palloc(sizeof(ItemPointerData));
 	if (!reloid)
@@ -380,7 +381,9 @@ currtid_byreloid(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	table_get_latest_tid(rel, snapshot, result);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
+	table_get_latest_tid(scan, result);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
@@ -398,6 +401,7 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	Relation	rel;
 	AclResult	aclresult;
 	Snapshot	snapshot;
+	TableScanDesc scan;
 
 	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
 	rel = table_openrv(relrv, AccessShareLock);
@@ -415,7 +419,9 @@ currtid_byrelname(PG_FUNCTION_ARGS)
 	ItemPointerCopy(tid, result);
 
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	table_get_latest_tid(rel, snapshot, result);
+	scan = table_beginscan(rel, snapshot, 0, NULL);
+	table_get_latest_tid(scan, result);
+	table_endscan(scan);
 	UnregisterSnapshot(snapshot);
 
 	table_close(rel, AccessShareLock);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 77e5e603b03..6b8c7020c8c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -134,8 +134,7 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 					   Buffer buffer, Snapshot snapshot, HeapTuple heapTuple,
 					   bool *all_dead, bool first_call);
 
-extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
-					ItemPointer tid);
+extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 extern void setLastTid(const ItemPointer tid);
 
 extern BulkInsertState GetBulkInsertState(void);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2062d808ef..e3e99a7dcdb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -308,12 +308,17 @@ typedef struct TableAmRoutine
 											Snapshot snapshot,
 											TupleTableSlot *slot);
 
+	/*
+	 * Is tid valid for a scan of this relation.
+	 */
+	bool		(*tuple_tid_valid) (TableScanDesc scan,
+									ItemPointer tid);
+
 	/*
 	 * Return the latest version of the tuple at `tid`, by updating `tid` to
 	 * point at the newest version.
 	 */
-	void		(*tuple_get_latest_tid) (Relation rel,
-										 Snapshot snapshot,
+	void		(*tuple_get_latest_tid) (TableScanDesc scan,
 										 ItemPointer tid);
 
 	/*
@@ -548,10 +553,10 @@ typedef struct TableAmRoutine
 	/*
 	 * See table_relation_size().
 	 *
-	 * Note that currently a few callers use the MAIN_FORKNUM size to vet the
-	 * validity of tids (e.g. nodeTidscans.c), and others use it to figure out
-	 * the range of potentially interesting blocks (brin, analyze). The
-	 * abstraction around this will need to be improved in the near future.
+	 * Note that currently a few callers use the MAIN_FORKNUM size to figure
+	 * out the range of potentially interesting blocks (brin, analyze). It's
+	 * probably that we'll need to revise the interface for those at some
+	 * point.
 	 */
 	uint64		(*relation_size) (Relation rel, ForkNumber forkNumber);
 
@@ -985,15 +990,25 @@ table_fetch_row_version(Relation rel,
 	return rel->rd_tableam->tuple_fetch_row_version(rel, tid, snapshot, slot);
 }
 
+/*
+ * Verify that `tid` is a potentially valid tuple identifier. That doesn't
+ * mean that the pointed to row needs to exist or be visible, but that
+ * attempting to fetch the row (e.g. with table_get_latest_tid() or
+ * table_fetch_row_version()) should not error out if called with that tid.
+ *
+ * `scan` needs to have been started via table_beginscan().
+ */
+static inline bool
+table_tuple_tid_valid(TableScanDesc scan, ItemPointer tid)
+{
+	return scan->rs_rd->rd_tableam->tuple_tid_valid(scan, tid);
+}
+
 /*
  * Return the latest version of the tuple at `tid`, by updating `tid` to
  * point at the newest version.
  */
-static inline void
-table_get_latest_tid(Relation rel, Snapshot snapshot, ItemPointer tid)
-{
-	rel->rd_tableam->tuple_get_latest_tid(rel, snapshot, tid);
-}
+extern void table_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
 /*
  * Return true iff tuple in slot satisfies the snapshot.
-- 
2.21.0.dirty

Re: Pluggable Storage - Andres's take

Reply via email to