On Tue, Jul 22, 2025 at 5:11 PM Andres Freund <and...@anarazel.de> wrote:
> On 2025-07-18 23:25:38 -0400, Peter Geoghegan wrote:
> > > To some degree the table AM will need to care about the index level batching -
> > > we have to be careful about how many pages we keep pinned overall. Which is
> > > something that both the table and the index AM have some influence over.
> >
> > Can't they operate independently?
>
> I'm somewhat doubtful. Read stream is careful to limit how many things it
> pins, lest we get errors about having too many buffers pinned. Somehow the
> number of pins held within the index needs to be limited too, and how much
> that needs to be limited depends on how many buffers are pinned in the read
> stream :/
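
To make the concern quoted above concrete, here is a minimal sketch of one
way a scan could share a single pin budget between the read stream and the
index batches. It is purely illustrative and not taken from any patch:
PinBudget and the pin_budget_* functions are hypothetical names, and
PostgreSQL has no such API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-scan pin budget (illustration only; none of these
 * names exist in PostgreSQL).  Both consumers draw from one cap, so the
 * number of pins the index side may hold shrinks as the read stream
 * pins more buffers.
 */
typedef struct PinBudget
{
    int max_pins;     /* overall cap on pins held by this scan */
    int stream_pins;  /* pins currently held by the read stream */
    int index_pins;   /* leaf page pins currently held by index batches */
} PinBudget;

static void
pin_budget_init(PinBudget *b, int max_pins)
{
    b->max_pins = max_pins;
    b->stream_pins = 0;
    b->index_pins = 0;
}

/* Returns false when taking one more pin would exceed the shared cap. */
static bool
pin_budget_has_room(const PinBudget *b)
{
    return b->stream_pins + b->index_pins < b->max_pins;
}

static bool
pin_budget_reserve_stream(PinBudget *b)
{
    if (!pin_budget_has_room(b))
        return false;
    b->stream_pins++;
    return true;
}

static bool
pin_budget_reserve_index(PinBudget *b)
{
    if (!pin_budget_has_room(b))
        return false;
    b->index_pins++;
    return true;
}

static void
pin_budget_release_stream(PinBudget *b)
{
    assert(b->stream_pins > 0);
    b->stream_pins--;
}

static void
pin_budget_release_index(PinBudget *b)
{
    assert(b->index_pins > 0);
    b->index_pins--;
}
```

With a cap of 3 and two pins already held by the stream, the index side
could take one leaf page pin and would be refused a second until the
stream released one. Whether a single shared counter is the right shape
for this is exactly the open question.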

That makes sense.

Currently, the complex patch holds on to leaf page buffer pins until
btfreebatch is called for the relevant batch -- no matter what. This
is actually a short term workaround. I removed
_bt_drop_lock_and_maybe_pin from nbtree (the thing added by commit
2ed5b87f), without adding back an equivalent function that can work
across all index AMs. That shouldn't be hard.

Once I do that, then plain index scans with MVCC snapshots should
never actually have to hold on to buffer pins. I'm not sure if that
makes the underlying resource management problem any easier to address
-- but at least we won't *actually* hold on to any extra leaf page
buffer pins most of the time (once I make this fix).

> > What if there's no matches across many leaf pages?
>
> We don't need to keep leaf nodes without matches pinned in that case, so I
> don't think there's really an issue?

That might be true, but if we're reading leaf pages then we're not
returning tuples to the scan -- even when, in principle, we could
return at least a few more right away. That's the kind of trade-off
I'm concerned about here.

-- 
Peter Geoghegan