On 7/16/25 16:29, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 10:20 AM Tomas Vondra <to...@vondra.me> wrote:
>> The read stream can only return blocks generated by the "next" callback.
>> When we return the block for the last item on a leaf page, we can only
>> return "InvalidBlockNumber" which means "no more blocks in the stream".
>> And once we advance to the next leaf, we say "hey, there's more blocks".
>> Which is what read_stream_reset() does.
>>
>> It's a bit like what rescan does.
>
> That sounds weird.
>
What sounds weird? That the read_stream works like a stream of blocks, or
that it can't do "pause" and we use "reset" as a workaround?

>> In an ideal world we'd have a function that'd "pause" the stream,
>> without resetting the distance etc. But we don't have that, and the
>> reset thing was suggested to me as a workaround.
>
> Does the "complex" patch require a similar workaround? Why or why not?
>

I think it'll need to do something like that in some cases, when we need
to limit the number of leaf pages kept in memory to something sane.

(a) index-only scans, with most of the tuples all-visible (we don't
prefetch all-visible pages, so finding the next "prefetchable" block may
force reading a lot of leaf pages)

(b) scans on correlated indexes: we skip duplicate block numbers, so
again, we may need to read a lot of leaf pages to find enough
prefetchable blocks to reach the "distance" (measured in queued blocks)

(c) indexes with "fat" index tuples (but that's less of an issue, because
with one tuple per leaf page we still have a clear idea how many leaf
pages we'll need to read)

regards

-- 
Tomas Vondra