Hi,

On 2025-07-18 17:44:26 -0400, Peter Geoghegan wrote:
> On Fri, Jul 18, 2025 at 4:52 PM Andres Freund <and...@anarazel.de> wrote:
> > I don't agree with that. For efficiency reasons alone table AMs should
> > get a whole batch of TIDs at once. If you have an ordered indexscan that
> > returns TIDs that are correlated with the table, we waste a *tremendous*
> > amount of cycles right now.
>
> I agree, I think. But the terminology in this area can be confusing, so
> let's make sure that we all understand each other:
>
> I think that the table AM probably needs to have its own definition of a
> batch (or some other distinct phrase/concept) -- it's not necessarily the
> same group of TIDs that are associated with a batch on the index AM side.

I assume that, for heap, it'll always be a narrower definition than for the
indexam, basically dealing with all the TIDs that fit within one heap page
at once?
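Something very roughly like the below, purely to illustrate what I mean --
none of these helpers exist, and heapam_process_block() is just a stand-in
for whatever heapam would actually do with one block's worth of TIDs:

#include "postgres.h"

#include "storage/itemptr.h"

/* hypothetical per-block entry point into heapam */
extern void heapam_process_block(BlockNumber blkno,
                                 const OffsetNumber *offsets, int noffsets);

/*
 * Group a (heap-correlated) run of TIDs handed back by the index scan into
 * per-heap-page chunks, so that all TIDs falling on the same heap block are
 * handled together, with a single buffer lookup.
 */
static void
group_tids_by_block(const ItemPointerData *tids, int ntids)
{
    int         i = 0;

    while (i < ntids)
    {
        BlockNumber blkno = ItemPointerGetBlockNumber(&tids[i]);
        OffsetNumber offsets[MaxOffsetNumber];
        int         noffsets = 0;

        /* consume the full run of TIDs that fall on this heap block */
        while (i < ntids && ItemPointerGetBlockNumber(&tids[i]) == blkno)
        {
            offsets[noffsets++] = ItemPointerGetOffsetNumber(&tids[i]);
            i++;
        }

        heapam_process_block(blkno, offsets, noffsets);
    }
}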
> (Within an index AM, there is a 1:1 correspondence between batches and
> leaf pages, and batches need to hold on to a leaf page buffer pin for a
> time. None of this should really matter to the table AM.)

To some degree the table AM will need to care about the index-level batching:
we have to be careful about how many pages we keep pinned overall, which is
something that both the table AM and the index AM have some influence over.

> At a high level, the table AM (and/or its read stream) asks for so many
> heap blocks/TIDs. Occasionally, index AM implementation details (i.e. the
> fact that many index leaf pages have to be read to get very few TIDs) will
> result in that request not being honored. The interface that the table AM
> uses must therefore occasionally answer "I'm sorry, I can only reasonably
> give you so many TIDs at this time". When that happens, the table AM has to
> make do. That can be very temporary, or it can happen again and again,
> depending on implementation details known only to the index AM side (though
> typically it'll never happen even once).

I think that requirement will make things more complicated. Why do we need to
have it?

> Does that sound roughly right to you? Obviously these details are still
> somewhat hand-wavy -- I'm not fully sure of what the interface should look
> like, by any means. But the important points are:
>
> * The table AM drives the whole process.

Check.

> * The table AM knows essentially nothing about leaf pages/index AM batches
> -- it just has some general idea that sometimes it cannot have its request
> honored, in which case it must make do.

Not entirely convinced by this one.

> * Some other layer represents the index AM -- though that layer actually
> lives outside of index AMs (this is the code that the "complex" patch
> currently puts in indexam.c). This other layer manages resources (primarily
> leaf page buffer pins) on behalf of each index AM. It also determines
> whether or not index AM implementation details make it impractical to give
> the table AM exactly what it asked for (this might actually require a small
> amount of cooperation from index AM code, based on simple generic measures
> like leaf pages read).

I don't really have an opinion about this one.

> * This other index AM layer does still know that it isn't cool to drop leaf
> page buffer pins before we're done reading the corresponding heap TIDs, due
> to heapam implementation details around making concurrent heap TID
> recycling safe.

I'm not sure why this needs to live in the generic code, rather than in the
specific index AM?

> I'm not really sure how the table AM lets the new index AM layer know
> "okay, done with all those TIDs now" in a way that is both correct (in
> terms of avoiding unsafe concurrent TID recycling) and also gives the table
> AM the freedom to do its own kind of batch access at the level of heap
> pages.

I'd assume that the table AM has to call some indexam function to release
index batches whenever it doesn't need the reference anymore? And the
index-batch release can then unpin?

Greetings,

Andres Freund
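PS: To make that last point a bit more concrete, the calling sequence I have
in mind from the table AM side would be something like the below. This is
purely hypothetical -- IndexBatch and the index_batch_*() functions are just
stand-ins for whatever the indexam.c layer would end up exporting:

#include "postgres.h"

#include "access/genam.h"

typedef struct IndexBatch IndexBatch;   /* opaque, owned by indexam.c */

/* hypothetical indexam.c entry points */
extern IndexBatch *index_batch_next(IndexScanDesc scan);
extern void index_batch_release(IndexBatch *batch);

static void
table_am_consume_batches(IndexScanDesc scan)
{
    IndexBatch *batch;

    while ((batch = index_batch_next(scan)) != NULL)
    {
        /*
         * Fetch heap tuples for the batch's TIDs, grouped per heap page in
         * whatever way the table AM prefers.  While this reference is held,
         * the index AM layer keeps the corresponding leaf page(s) pinned,
         * so concurrent heap TID recycling can't hurt us.
         */

        /*
         * Tell the index AM layer we're done with these TIDs -- it is now
         * free to drop the leaf page pin (or to keep it, if it still needs
         * it for other reasons).
         */
        index_batch_release(batch);
    }
}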