Hi,

On 2025-07-18 17:44:26 -0400, Peter Geoghegan wrote:
> On Fri, Jul 18, 2025 at 4:52 PM Andres Freund <and...@anarazel.de> wrote:
> > I don't agree with that. For efficiency reasons alone table AMs should
> > get a whole batch of TIDs at once. If you have an ordered indexscan that
> > returns TIDs that are correlated with the table, we waste a *tremendous*
> > amount of cycles right now.
>
> I agree, I think. But the terminology in this area can be confusing, so
> let's make sure that we all understand each other:
>
> I think that the table AM probably needs to have its own definition of a
> batch (or some other distinct phrase/concept) -- it's not necessarily the
> same group of TIDs that are associated with a batch on the index AM side.

I assume that, for heap, it'll always be a narrower definition than for the
indexam, basically dealing with all the TIDs that fit within one heap page
at once?
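Something very roughly like the below, purely to illustrate what I mean --
none of these helpers exist, and heapam_process_block() is just a stand-in
for whatever heapam would actually do with one block's worth of TIDs:

#include "postgres.h"

#include "storage/itemptr.h"

/* hypothetical per-block entry point into heapam */
extern void heapam_process_block(BlockNumber blkno,
                                 const OffsetNumber *offsets, int noffsets);

/*
 * Group a (heap-correlated) run of TIDs handed back by the index scan into
 * per-heap-page chunks, so that all TIDs falling on the same heap block are
 * handled together, with a single buffer lookup.
 */
static void
group_tids_by_block(const ItemPointerData *tids, int ntids)
{
    int         i = 0;

    while (i < ntids)
    {
        BlockNumber blkno = ItemPointerGetBlockNumber(&tids[i]);
        OffsetNumber offsets[MaxOffsetNumber];
        int         noffsets = 0;

        /* consume the full run of TIDs that fall on this heap block */
        while (i < ntids && ItemPointerGetBlockNumber(&tids[i]) == blkno)
        {
            offsets[noffsets++] = ItemPointerGetOffsetNumber(&tids[i]);
            i++;
        }

        heapam_process_block(blkno, offsets, noffsets);
    }
}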
> (Within an index AM, there is a 1:1 correspondence between batches and
> leaf pages, and batches need to hold on to a leaf page buffer pin for a
> time. None of this should really matter to the table AM.)

To some degree the table AM will need to care about the index-level batching:
we have to be careful about how many pages we keep pinned overall, which is
something that both the table AM and the index AM have some influence over.

> At a high level, the table AM (and/or its read stream) asks for so many
> heap blocks/TIDs. Occasionally, index AM implementation details (i.e. the
> fact that many index leaf pages have to be read to get very few TIDs) will
> result in that request not being honored. The interface that the table AM
> uses must therefore occasionally answer "I'm sorry, I can only reasonably
> give you so many TIDs at this time". When that happens, the table AM has to
> make do. That can be very temporary, or it can happen again and again,
> depending on implementation details known only to the index AM side (though
> typically it'll never happen even once).

I think that requirement will make things more complicated. Why do we need to
have it?

> Does that sound roughly right to you? Obviously these details are still
> somewhat hand-wavy -- I'm not fully sure of what the interface should look
> like, by any means. But the important points are:
>
> * The table AM drives the whole process.

Check.

> * The table AM knows essentially nothing about leaf pages/index AM batches
> -- it just has some general idea that sometimes it cannot have its request
> honored, in which case it must make do.

Not entirely convinced by this one.

> * Some other layer represents the index AM -- though that layer actually
> lives outside of index AMs (this is the code that the "complex" patch
> currently puts in indexam.c). This other layer manages resources (primarily
> leaf page buffer pins) on behalf of each index AM. It also determines
> whether or not index AM implementation details make it impractical to give
> the table AM exactly what it asked for (this might actually require a small
> amount of cooperation from index AM code, based on simple generic measures
> like leaf pages read).

I don't really have an opinion about this one.

> * This other index AM layer does still know that it isn't cool to drop leaf
> page buffer pins before we're done reading the corresponding heap TIDs, due
> to heapam implementation details around making concurrent heap TID
> recycling safe.

I'm not sure why this needs to live in the generic code, rather than in the
specific index AM?

> I'm not really sure how the table AM lets the new index AM layer know
> "okay, done with all those TIDs now" in a way that is both correct (in
> terms of avoiding unsafe concurrent TID recycling) and also gives the table
> AM the freedom to do its own kind of batch access at the level of heap
> pages.

I'd assume that the table AM has to call some indexam function to release
index batches whenever it doesn't need the reference anymore? And the
index-batch release can then unpin?

Greetings,

Andres Freund
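PS: To make that last point a bit more concrete, the calling sequence I have
in mind from the table AM side would be something like the below. This is
purely hypothetical -- IndexBatch and the index_batch_*() functions are just
stand-ins for whatever the indexam.c layer would end up exporting:

#include "postgres.h"

#include "access/genam.h"

typedef struct IndexBatch IndexBatch;   /* opaque, owned by indexam.c */

/* hypothetical indexam.c entry points */
extern IndexBatch *index_batch_next(IndexScanDesc scan);
extern void index_batch_release(IndexBatch *batch);

static void
table_am_consume_batches(IndexScanDesc scan)
{
    IndexBatch *batch;

    while ((batch = index_batch_next(scan)) != NULL)
    {
        /*
         * Fetch heap tuples for the batch's TIDs, grouped per heap page in
         * whatever way the table AM prefers.  While this reference is held,
         * the index AM layer keeps the corresponding leaf page(s) pinned,
         * so concurrent heap TID recycling can't hurt us.
         */

        /*
         * Tell the index AM layer we're done with these TIDs -- it is now
         * free to drop the leaf page pin (or to keep it, if it still needs
         * it for other reasons).
         */
        index_batch_release(batch);
    }
}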