Hi,

On 2025-07-16 17:47:53 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 5:41 PM Andres Freund <and...@anarazel.de> wrote:
> > I don't mean the index tids, but how the read stream is fed block numbers.
> > In the "complex" patch that's done by index_scan_stream_read_next(). And the
> > block number it returns is simply
> >
> >       return ItemPointerGetBlockNumber(tid);
> >
> > without the table AM having any way of influencing that. Which means that if
> > your table AM does not use the block number of the tid 1:1 as the real block
> > number, the fetched block will be completely bogus.
> 
> How is that handled when such a table AM uses the existing amgettuple
> interface?

There's no problem today - the indexams never use the tids to look up blocks
themselves. They're always passed to the tableam to do so (via
table_index_fetch_tuple() etc). I.e. the translation from TIDs to specific
blocks & buffers happens entirely inside the tableam, therefore the tableam
can choose to not use a 1:1 mapping or even to not use any buffers at all.
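
To make the difference concrete, here is a toy sketch. It is not PostgreSQL
code: ToyTid, toy_tid_block(), toy_am_resolve_block() and the "reserved
blocks" layout are all made up for illustration. It assumes a hypothetical
table AM that keeps a few private metadata blocks at the start of the
relation, so the block field of a TID is offset from the physical block.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins, NOT the real PostgreSQL definitions. */
typedef uint32_t BlockNumber;

typedef struct ToyTid
{
    BlockNumber block;   /* "block" part of the TID, as the index stores it */
    uint16_t    offset;  /* item offset within that block, starting at 1 */
} ToyTid;

/* Hypothetical AM layout: the first 8 physical blocks hold AM metadata. */
#define TOY_RESERVED_BLOCKS 8

/*
 * What generic index-side code can do: read the block field straight out
 * of the TID, with no knowledge of the table AM's layout.
 */
static BlockNumber
toy_tid_block(ToyTid tid)
{
    return tid.block;
}

/*
 * What the hypothetical table AM does internally: translate the TID into
 * the physical block it actually needs to read.
 */
static BlockNumber
toy_am_resolve_block(ToyTid tid)
{
    assert(tid.offset != 0);    /* offset 0 is not a valid item here */
    return TOY_RESERVED_BLOCKS + tid.block;
}
```

For such an AM the two functions disagree for every TID, which is harmless
as long as only the tableam ever does the translation.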


> I think that it shouldn't be hard to implement an opt-out
> of prefetching for such table AMs, so at least you won't fetch random
> garbage.

I don't think that's the right answer here. ISTM the layering in both patches
just isn't quite correct right now. The read stream shouldn't be "filled" with
table buffers by index code, it needs to be filled by tableam specific code.


> Right now, the amgetbatch interface is oriented around returning TIDs.
> Obviously it works that way because that's what heapam expects, and
> what amgettuple (which I'd like to replace with amgetbatch) does.

ISTM the right answer would be to allow the tableam to get the batches,
without indexam feeding the read stream.  That, perhaps not so coincidentally,
is also what's needed for batching heap page locking and HOT search.

I think this means that it has to be the tableam that creates the read stream
and that does the work that's currently done in index_scan_stream_read_next(),
i.e. the translation from TID to whatever resources are required by the
tableam. Which presumably would include the tableam calling
index_batch_getnext().
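
Roughly the shape I have in mind, as a self-contained toy rather than
anything resembling the actual patch. Every name below is hypothetical:
toy_index_batch_getnext() only stands in for the idea of
index_batch_getnext() and does not reflect its real signature, and the
callback omits the arguments a real read stream callback receives. The
point is only who owns which step: the index hands out batches of TIDs, and
the tableam's callback pulls batches and does the TID -> block translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define TOY_INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)
#define TOY_BATCH_SIZE 4

typedef struct ToyTid
{
    BlockNumber block;
    uint16_t    offset;
} ToyTid;

/* Toy index scan: an array of TIDs handed out in batches. */
typedef struct ToyIndexScan
{
    const ToyTid *tids;
    size_t        ntids;
    size_t        next;     /* next TID not yet returned in a batch */
} ToyIndexScan;

typedef struct ToyBatch
{
    ToyTid items[TOY_BATCH_SIZE];
    size_t nitems;
} ToyBatch;

/* Index side: fill the batch with TIDs; returns 0 once exhausted. */
static int
toy_index_batch_getnext(ToyIndexScan *iscan, ToyBatch *batch)
{
    batch->nitems = 0;
    while (batch->nitems < TOY_BATCH_SIZE && iscan->next < iscan->ntids)
        batch->items[batch->nitems++] = iscan->tids[iscan->next++];
    return batch->nitems > 0;
}

/* Table AM side: scan state owned by the tableam, not by index code. */
typedef struct ToyTableScan
{
    ToyIndexScan *iscan;
    ToyBatch      batch;            /* current batch of TIDs */
    size_t        pos;              /* next unread item in the batch */
    BlockNumber   reserved_blocks;  /* AM-private mapping parameter */
} ToyTableScan;

/*
 * Table AM's block-number callback: fetch a new batch when the current one
 * is used up, then translate the next TID using AM-private knowledge.
 */
static BlockNumber
toy_am_stream_read_next(ToyTableScan *tscan)
{
    assert(tscan != NULL && tscan->iscan != NULL);

    if (tscan->pos >= tscan->batch.nitems)
    {
        if (!toy_index_batch_getnext(tscan->iscan, &tscan->batch))
            return TOY_INVALID_BLOCK;   /* index scan is finished */
        tscan->pos = 0;
    }
    return tscan->reserved_blocks + tscan->batch.items[tscan->pos++].block;
}
```

Because the batch is visible to the tableam as a whole, this is also the
place where it could group TIDs that land on the same page before locking
it, although the toy above does not attempt that.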

Greetings,

Andres Freund

