On Wed, Feb 14, 2024 at 11:40 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
> I wasn't quite sure how we could use
> index_compute_xid_horizon_for_tuples() for inspiration -- per Peter's
> suggestion. But, I'd like to understand.
The point I was trying to make with that example was: a highly generic mechanism can sometimes work across disparate index AMs (that all at least support plain index scans) when it just so happens that these AMs don't actually differ in a way that could possibly matter to that mechanism. While it's true that (say) nbtree and hash are very different at a high level, it's nevertheless also true that the way things work at the level of individual index pages is much more similar than different. With index deletion, we know that the differences between each supported index AM either don't matter at all (which is what obviates the need for index_compute_xid_horizon_for_tuples() to be directly aware of which index AM the page it is passed comes from), or matter only in small, incidental ways (e.g., nbtree stores posting lists in its tuples, despite using IndexTuple structs).

With prefetching, it seems reasonable to suppose that an index-AM-specific approach would end up needing very little truly custom code. This is pretty strongly suggested by the fact that the rules around buffer pins (as an interlock against concurrent TID recycling by VACUUM) are standardized by the index AM API itself. Those rules might be slightly more natural with nbtree, but that's kinda beside the point. While the basic organizing principle for where each index tuple goes can vary enormously, it doesn't necessarily matter at all -- in the end, you're really just reading each index page (that has TIDs to read) exactly once per scan, in some fixed order, with interlaced inline heap accesses (that go fetch heap tuples for each individual TID read from each index page).

In general I don't accept that we need to do things outside the index AM, because software architecture encapsulation something something. I suspect that we'll need to share some limited information across different layers of abstraction, because that's just fundamentally what's required by the constraints we're operating under.
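To make the "each index page once, in a fixed order, with interlaced heap accesses" point concrete, here is a minimal C sketch of a prefetch loop that only ever sees the TIDs read from one index page, never which index AM produced them. Everything in it (Tid, ScanLog, prefetch_heap_block, fetch_heap_tuple, scan_index_page) is hypothetical and is not PostgreSQL's actual API; the prefetch calls just record block numbers. The lookahead deliberately stops at the end of the current page's TIDs, on the assumption that the scan holds a pin on that index page only.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_CAP 64

/* Hypothetical heap TID: heap block number plus line pointer offset. */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} Tid;

/* Hypothetical record of what the sketch did, so a caller can inspect it. */
typedef struct
{
    uint32_t prefetched[LOG_CAP];
    int      nprefetched;
    uint32_t fetched[LOG_CAP];
    int      nfetched;
} ScanLog;

/* Hypothetical stand-in for a real prefetch request: only records the block. */
static void
prefetch_heap_block(ScanLog *log, uint32_t block)
{
    log->prefetched[log->nprefetched++] = block;
}

/* Hypothetical stand-in for the inline heap access for one TID. */
static void
fetch_heap_tuple(ScanLog *log, Tid tid)
{
    log->fetched[log->nfetched++] = tid.block;
}

/*
 * Process the TIDs read from one index page, in the order the index AM
 * returned them. Prefetches run up to `distance` TIDs ahead of the heap
 * fetches, but never past the end of this page's TIDs. Consecutive TIDs
 * that point into the same heap block trigger only one prefetch.
 */
void
scan_index_page(const Tid *tids, int ntids, int distance, ScanLog *log)
{
    int     next_prefetch = 0;     /* next TID to consider for prefetching */
    int64_t last_prefetched = -1;  /* last block prefetched, -1 if none */

    assert(distance >= 0);
    assert(ntids <= LOG_CAP);

    for (int i = 0; i < ntids; i++)
    {
        /* Top up the prefetch window so it stays `distance` TIDs ahead of i. */
        while (next_prefetch < ntids && next_prefetch <= i + distance)
        {
            uint32_t blk = tids[next_prefetch].block;

            if ((int64_t) blk != last_prefetched)
            {
                prefetch_heap_block(log, blk);
                last_prefetched = blk;
            }
            next_prefetch++;
        }

        /* Interlaced heap access for the current TID. */
        fetch_heap_tuple(log, tids[i]);
    }
}
```

For a page whose TIDs point at heap blocks 7, 7, 2, 9, 9, 4 with a distance of 2, this issues prefetches for blocks 7, 2, 9, 4 and then fetches heap tuples in the original TID order. Nothing in the loop depends on how the index AM decided where each tuple goes; only deduplicating consecutive repeats is a simplification, and a real design might track recently prefetched blocks differently.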
Can't really prove it, though.

--
Peter Geoghegan