Hi,

On 2025-07-16 15:39:58 -0400, Andres Freund wrote:
> Looking at the actual patches now.

I just did an initial, not particularly in-depth look. A few comments and
questions below.

For either patch, I think it's high time we split the index/table buffer
stats in index scans. It's really annoying to not be able to see whether IO
time was spent inside the index itself or in the table. What we're discussing
here obviously can never avoid stalls due to fetching index pages, but so far
neither patch is able to fully utilize the hardware when bound on heap
fetches - and that's harder to know without those stats.

The BufferMatches() that both patches add seems to check more than needed?
It's not like the old buffer could have changed what relation it is for while
pinned. Seems like it'd be better to just keep track of what the prior block
was and not go into bufmgr.c at all.
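Just to illustrate the idea (a rough sketch, not code from either patch;
the type and function names are made up):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Remember the block number the currently pinned buffer was read for and
 * compare against that, instead of asking bufmgr.c whether the pinned
 * buffer still matches.  The relation/fork can't change for a pinned
 * buffer, so a local block number comparison should suffice.
 */
typedef struct PriorBlockState
{
	Buffer		cur_buf;	/* currently pinned buffer, or InvalidBuffer */
	BlockNumber	cur_block;	/* block cur_buf was read for */
} PriorBlockState;

static Buffer
fetch_heap_block(Relation rel, PriorBlockState *state, BlockNumber blkno)
{
	/* same block as last time - reuse the existing pin, no bufmgr.c call */
	if (BufferIsValid(state->cur_buf) && state->cur_block == blkno)
		return state->cur_buf;

	if (BufferIsValid(state->cur_buf))
		ReleaseBuffer(state->cur_buf);

	state->cur_buf = ReadBuffer(rel, blkno);
	state->cur_block = blkno;

	return state->cur_buf;
}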
WRT the complex patch:

Maybe I'm missing something, but the current interface doesn't seem to work
for AMs that don't have a 1:1 mapping between the block number portion of the
tid and the actual block number?

Currently the API wouldn't easily allow the table AM to do batched TID
lookups - if you have a query that looks at a lot of table tuples in the same
buffer consecutively, we spend a lot of time locking/unlocking said buffer.
We also spend a lot of time dispatching from nodeIndexscan.c to tableam in
such queries. I'm not suggesting increasing the scope to handle that, but it
might be worth keeping in mind - I think the potential gains here are really
substantial. Even just not having to lock/unlock the heap block for every
tuple in the page would be a huge win; a quick and incorrect hack suggests
it's something like 25% faster. A batched heap_hot_search_buffer() could be a
larger improvement, as it's often bound by memory latency and per-call
overhead. (There's a rough sketch of what I mean in the PS at the end of this
mail.)

I see some slowdown for well-cached queries with the patch; I've not dug into
why.

WRT the simple patch:

It seems to have the same issue, in that it assumes TID block numbers
correspond to actual disk locations?

Greetings,

Andres Freund
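PS: To make the batching point a bit more concrete, here's a rough sketch
(hypothetical function, not from either patch) of resolving a run of TIDs
that all fall into the same heap block under a single pin and content lock,
instead of locking/unlocking per tuple:

#include "postgres.h"
#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapshot.h"

/*
 * Resolve a batch of TIDs that all point into the same heap block, taking
 * the buffer content lock once for the whole batch.  Purely illustrative -
 * error handling etc. omitted.
 */
static int
heap_fetch_tid_batch(Relation rel, Snapshot snapshot,
					 ItemPointer tids, int ntids,
					 HeapTupleData *tuples)
{
	BlockNumber	blkno = ItemPointerGetBlockNumber(&tids[0]);
	Buffer		buf = ReadBuffer(rel, blkno);
	int			nfound = 0;

	LockBuffer(buf, BUFFER_LOCK_SHARE);

	for (int i = 0; i < ntids; i++)
	{
		bool		all_dead;

		/* the caller guarantees the whole batch falls into blkno */
		Assert(ItemPointerGetBlockNumber(&tids[i]) == blkno);

		if (heap_hot_search_buffer(&tids[i], rel, buf, snapshot,
								   &tuples[nfound], &all_dead, true))
			nfound++;
	}

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
	ReleaseBuffer(buf);

	return nfound;
}

A real batched heap_hot_search_buffer() could go further and interleave the
per-TID line pointer / HOT chain lookups, which is where I'd expect the
memory-latency-bound part to improve.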