On Fri, Jul 18, 2025 at 4:52 PM Andres Freund <and...@anarazel.de> wrote:
> I don't agree with that. For efficiency reasons alone table AMs should get a
> whole batch of TIDs at once. If you have an ordered indexscan that returns
> TIDs that are correlated with the table, we waste *tremendous* amount of
> cycles right now.

I agree, I think. But the terminology in this area can be confusing, so
let's make sure that we all understand each other:

I think that the table AM probably needs to have its own definition of a
batch (or some other distinct phrase/concept) -- it's not necessarily
the same group of TIDs that are associated with a batch on the index AM
side. (Within an index AM, there is a 1:1 correspondence between batches
and leaf pages, and batches need to hold on to a leaf page buffer pin
for a time. None of this should really matter to the table AM.)

At a high level, the table AM (and/or its read stream) asks for so many
heap blocks/TIDs. Occasionally, index AM implementation details (i.e.
the fact that many index leaf pages have to be read to get very few
TIDs) will result in that request not being honored. The interface that
the table AM uses must therefore occasionally answer "I'm sorry, I can
only reasonably give you so many TIDs at this time". When that happens,
the table AM has to make do. That can be very temporary, or it can
happen again and again, depending on implementation details known only
to the index AM side (though typically it'll never happen even once).

Does that sound roughly right to you? Obviously these details are still
somewhat hand-wavy -- I'm not fully sure of what the interface should
look like, by any means. But the important points are:

* The table AM drives the whole process.

* The table AM knows essentially nothing about leaf pages/index AM
  batches -- it just has some general idea that sometimes it cannot have
  its request honored, in which case it must make do.

* Some other layer represents the index AM -- though that layer actually
  lives outside of index AMs (this is the code that the "complex" patch
  currently puts in indexam.c). This other layer manages resources
  (primarily leaf page buffer pins) on behalf of each index AM.
  It also determines whether or not index AM implementation details make
  it impractical to give the table AM exactly what it asked for (this
  might actually require a small amount of cooperation from index AM
  code, based on simple generic measures like leaf pages read).

* This other index AM layer does still know that it isn't cool to drop
  leaf page buffer pins before we're done reading the corresponding heap
  TIDs, due to heapam implementation details around making concurrent
  heap TID recycling safe.

I'm not really sure how the table AM lets the new index AM layer know
"okay, done with all those TIDs now" in a way that is both correct (in
terms of avoiding unsafe concurrent TID recycling) and also gives the
table AM the freedom to do its own kind of batch access at the level of
heap pages.

We don't necessarily have to figure all that out in the first committed
version, though.

--
Peter Geoghegan
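
To make the "ask for N TIDs, sometimes get fewer" contract described
above concrete, here is a minimal, self-contained sketch in C. Nothing
in it is PostgreSQL code or a proposed API: SketchTid, SketchLeafPage,
SketchIndexLayer, sketch_get_tids() and the max_pages_per_request budget
are all hypothetical names invented for illustration. It models only the
"leaf pages read" measure used to cut a request short, and it ignores
buffer pins and the "done with those TIDs" signal entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap TID (not PostgreSQL's ItemPointerData). */
typedef struct SketchTid
{
    unsigned       block;
    unsigned short offset;
} SketchTid;

/* Hypothetical index leaf page: just an array of TIDs. */
typedef struct SketchLeafPage
{
    const SketchTid *tids;
    size_t           ntids;
} SketchLeafPage;

/* Hypothetical layer that sits between the table AM and the index AM. */
typedef struct SketchIndexLayer
{
    const SketchLeafPage *pages;
    size_t npages;
    size_t cur_page;              /* next leaf page to consume */
    size_t cur_item;              /* next TID within cur_page */
    size_t max_pages_per_request; /* assumed budget: leaf pages per request */
} SketchIndexLayer;

/*
 * The table AM asks for up to 'want' TIDs. The return value is how many
 * were actually supplied, which can be fewer than 'want' (even zero) when
 * the leaf page budget runs out first. *exhausted is set to 1 only once
 * the index side has nothing left to return.
 */
size_t
sketch_get_tids(SketchIndexLayer *layer, SketchTid *out, size_t want,
                int *exhausted)
{
    size_t got = 0;
    size_t pages_read = 0;

    assert(want > 0);

    while (got < want && layer->cur_page < layer->npages)
    {
        const SketchLeafPage *page = &layer->pages[layer->cur_page];

        /* Starting a fresh leaf page counts against the budget. */
        if (layer->cur_item == 0)
        {
            if (pages_read >= layer->max_pages_per_request)
                break;          /* "I can only give you so many TIDs" */
            pages_read++;
        }

        /* Copy TIDs from this page until the request is satisfied. */
        while (got < want && layer->cur_item < page->ntids)
            out[got++] = page->tids[layer->cur_item++];

        /* Move on once the page is fully consumed. */
        if (layer->cur_item >= page->ntids)
        {
            layer->cur_page++;
            layer->cur_item = 0;
        }
    }

    *exhausted = (layer->cur_page >= layer->npages);
    return got;
}
```

A caller playing the table AM role would loop on sketch_get_tids(),
treating a short return with *exhausted == 0 as "make do for now, ask
again later" rather than as end of scan.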