Re: index prefetching

Tomas Vondra Wed, 16 Jul 2025 10:42:58 -0700

On 7/16/25 16:45, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 10:37 AM Tomas Vondra <to...@vondra.me> wrote:
>> What sounds weird? That the read_stream works like a stream of blocks,
>> or that it can't do "pause" and we use "reset" as a workaround?
> 
> The fact that prefetch distance is in any way affected by a temporary
> inability to return more blocks. Just starting from scratch seems
> particularly bad.
> 
> Doesn't that mean that it's simply impossible for us to remember
> ramping up the distance on an earlier leaf page? There is nothing
> about leaf page boundaries that should be meaningful to the read
> stream/our heap accesses.
> 
> I get that index characteristics could be the limiting factor,
> especially in a world where we're not yet eagerly reading leaf pages.
> But that in no way justifies just forgetting about prefetch distance
> like this.
>


True. I think it's simply a matter of "no one really needed that yet",
so the read stream has no way to stop returning blocks temporarily
without also resetting the distance. I suspect Thomas might have a WIP
patch for that somewhere ...
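
To make the pause/reset difference concrete, here is a toy model. It is
only a sketch, not the read_stream.c code: ToyStream, toy_pause() and
toy_resume() are made-up names (no such "pause" API exists today), and
the doubling ramp-up is an assumption for illustration.

```c
#include <assert.h>

/* Toy model only; not the real read stream data structure. */
typedef struct ToyStream
{
    int distance;       /* current look-ahead distance, in blocks */
    int max_distance;   /* cap on the look-ahead distance */
    int paused;         /* hypothetical flag, for illustration */
} ToyStream;

static void
toy_init(ToyStream *s, int max_distance)
{
    assert(max_distance >= 1);
    s->distance = 1;
    s->max_distance = max_distance;
    s->paused = 0;
}

/* Assumed ramp-up rule: double the distance, up to the cap. */
static void
toy_ramp_up(ToyStream *s)
{
    s->distance *= 2;
    if (s->distance > s->max_distance)
        s->distance = s->max_distance;
}

/* The workaround: the stream starts from scratch, distance included. */
static void
toy_reset(ToyStream *s)
{
    s->distance = 1;
    s->paused = 0;
}

/* The hypothetical alternative: stop producing blocks, keep the distance. */
static void
toy_pause(ToyStream *s)
{
    s->paused = 1;
}

static void
toy_resume(ToyStream *s)
{
    s->paused = 0;
}
```

With the reset, whatever distance was built up on the earlier leaf page
is gone; with a pause/resume pair it would carry over to the next one.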

>>>> In an ideal world we'd have a function that'd "pause" the stream,
>>>> without resetting the distance etc. But we don't have that, and the
>>>> reset thing was suggested to me as a workaround.
>>>
>>> Does the "complex" patch require a similar workaround? Why or why not?
>>>
>>
>> I think it'll need to do something like that in some cases, when we need
>> to limit the number of leaf pages kept in memory to something sane.
> 
> That's the only reason? The memory usage for batches?
> 
> That doesn't seem like a big deal. It's something to keep an eye on,
> but I see no reason why it'd be particularly difficult.
> 
> Doesn't this argue for the "complex" patch's approach?
> 

Memory pressure is the "implementation" reason, because the indexam.c
layer has a fixed-length array of batches, so it can't load more than
INDEX_SCAN_MAX_BATCHES of them. That could be reworked to allow loading
an arbitrary number of batches, of course.
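
Roughly, the limit behaves like the sketch below. This is a simplified
illustration, not the patch code: ToyBatch, ToyBatchQueue and the
toy_* functions are made-up names, and TOY_MAX_BATCHES (with an
arbitrary value of 4) merely stands in for INDEX_SCAN_MAX_BATCHES.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BATCHES 4       /* arbitrary value, for illustration only */

/* One batch corresponds to one loaded leaf page. */
typedef struct ToyBatch
{
    int leaf_blkno;
} ToyBatch;

/* Fixed-length array used as a ring buffer of loaded batches. */
typedef struct ToyBatchQueue
{
    ToyBatch batches[TOY_MAX_BATCHES];
    int first;                  /* counter of the oldest loaded batch */
    int next;                   /* counter one past the newest loaded batch */
} ToyBatchQueue;

static int
toy_loaded(const ToyBatchQueue *q)
{
    return q->next - q->first;
}

/*
 * Try to load one more leaf page. Returns false when the array is full,
 * which is the point where look-ahead has to stop until the scan is done
 * with the oldest batch.
 */
static bool
toy_load_batch(ToyBatchQueue *q, int leaf_blkno)
{
    if (toy_loaded(q) == TOY_MAX_BATCHES)
        return false;
    q->batches[q->next % TOY_MAX_BATCHES].leaf_blkno = leaf_blkno;
    q->next++;
    return true;
}

/* The scan finished the oldest batch, so its slot can be reused. */
static void
toy_free_oldest(ToyBatchQueue *q)
{
    assert(toy_loaded(q) > 0);
    q->first++;
}
```

Once toy_load_batch() returns false, the prefetcher has nothing more to
hand to the stream until toy_free_oldest() is called.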

But I think we don't really want to do that, because what would be the
benefit? If you need to load many leaf pages to find the next thing to
prefetch, is the prefetching really improving anything?

How would we even know there actually is a prefetchable item? We could
load the whole index only to find everything is all-visible. And then
what if the query has LIMIT 10?

So that's the other thing this probably needs to consider - some concept
of how much effort to invest in finding the next prefetchable block.
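
One possible shape for such a limit, purely as a sketch: cap the number
of leaf pages the look-ahead may inspect before giving up. Everything
here is hypothetical (ToyLeafPage, toy_find_prefetchable() and the
max_pages budget are invented for illustration), and counting leaf
pages is just one way to measure "effort".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy leaf page: one flag per index item. */
typedef struct ToyLeafPage
{
    const bool *all_visible;    /* true if the item's heap page is all-visible */
    size_t nitems;
} ToyLeafPage;

/*
 * Look for the next item that would need a heap read, inspecting at most
 * max_pages leaf pages. Returns true and sets the output positions if one
 * is found within the budget, false otherwise.
 */
static bool
toy_find_prefetchable(const ToyLeafPage *pages, size_t npages,
                      size_t max_pages,
                      size_t *page_out, size_t *item_out)
{
    size_t limit = (npages < max_pages) ? npages : max_pages;

    assert(pages != NULL || npages == 0);

    for (size_t p = 0; p < limit; p++)
    {
        for (size_t i = 0; i < pages[p].nitems; i++)
        {
            if (!pages[p].all_visible[i])
            {
                *page_out = p;
                *item_out = i;
                return true;
            }
        }
    }

    /* Budget exhausted (or no more pages): stop looking ahead. */
    return false;
}
```

With a small budget, a LIMIT 10 query over an all-visible range would
stop after a few leaf pages instead of walking the whole index.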

regards

-- 
Tomas Vondra
