On 29/12/2025 1:53 AM, Tomas Vondra wrote:
> It seems this is due to sending an extra SET (for the new GUC) in the
> pgbench script, which is recognized only on the v5+threshold build.
>
> That's a thinko on my side, I should have realized the extra command
> might affect this. It doesn't really affect the behavior, because 10 is
> the default value for read_stream_threshold. I've fixed the script, will
> check fresh results tomorrow.
>
> Still, I think most of what I said about heuristics when to initialize
> the read stream, and the risk/benefit tradeoff, still applies.

I did a lot of experiments this morning but could not find any noticeable difference in any configuration where the whole working set fits in shared buffers. And frankly, after more thinking I do not see a good reason that could explain such a difference. Merely initializing the read stream should not add much overhead: it does not seem to be an expensive operation.

What actually matters is async I/O. Without a read stream, Postgres reads heap pages synchronously: the backend just calls pread. With a read stream, AIO is used. By default the "worker" AIO mode is used, which means the backend sends a request to one of the I/O workers and waits for its completion. The worker receives the request, performs the I/O and notifies the backend. Such interprocess communication adds significant overhead, and this is why, if we initialize the read stream from the very beginning, we get about ~4x worse performance with LIMIT 1.

Please correct me if I am wrong (or if this is Mac specific), but the slowdown is not caused by any overhead of the read stream itself, but by AIO. I have not run such an experiment, but it seems to me that if we made the read stream perform synchronous calls, there would be almost no difference in performance.
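
One straightforward way to check this hypothesis (on a build with AIO) would be to set `io_method = sync` in postgresql.conf (it is a postmaster-level GUC, so a restart is needed) and rerun the LIMIT 1 benchmark: if the slowdown disappears, the cost is in the worker handoff, not in the read stream machinery itself.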

When all data is cached in shared buffers, we do not perform any I/O at all. That means it does not matter whether and when we initialize the read stream: we can do it after processing 10 items (the current default) or immediately, and it should not affect performance. And this is what I have tested: performance does not actually depend on `read_stream_threshold` (as long as the data fits in shared buffers). At least the difference is within a few percent, and it may be just random fluctuation. There is certainly no 25% degradation.
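
To make sure we are talking about the same thing, here is a minimal sketch of the item-count heuristic as I understand it from this thread; the field names and the callback are my assumptions, not the actual patch code:

```
/*
 * Sketch only (hypothetical names): lazily create the read stream
 * after the scan has already delivered read_stream_threshold items.
 */
if (hscan->rs_read_stream == NULL &&
    ++hscan->rs_items_fetched > read_stream_threshold)
{
    hscan->rs_read_stream =
        read_stream_begin_relation(READ_STREAM_DEFAULT,
                                   NULL,    /* no buffer access strategy */
                                   hscan->xs_base.rel,
                                   MAIN_FORKNUM,
                                   index_fetch_next_block,  /* hypothetical callback */
                                   hscan,
                                   0);      /* no per-buffer data */
}
```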


That definitely does not mean it is impossible to find a scenario where this approach of enabling prefetch after processing N items performs worse than master or v5; we just need to choose the cache hit rate appropriately. But the same is true, IMHO, for v5 itself: it is possible to find a workload where it shows the same degradation compared with master.


A more precise heuristic should, IMHO, take into account the actual number of disk reads performed. Please notice that I do not want to predict the number of disk reads, i.e. check whether the prefetch candidates are present in shared buffers; that really would add significant overhead. I think it is better to use the number of reads already performed as the threshold, as in the sketch below.
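
For example (all names here are hypothetical, just to show the shape of the check):

```
/*
 * Sketch only: ReadBufferReportIO is a hypothetical ReadBuffer variant
 * that tells the caller whether it actually had to perform a read.
 */
bool        performed_read;

hscan->xs_cbuf = ReadBufferReportIO(hscan->xs_base.rel,
                                    ItemPointerGetBlockNumber(tid),
                                    &performed_read);
if (performed_read)
    hscan->reads_performed++;

/* Start prefetching only after enough actual cache misses. */
if (hscan->rs_read_stream == NULL &&
    hscan->reads_performed > read_stream_threshold)
{
    /* initialize the read stream here, as in the sketch above */
}
```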

Unfortunately, it looks like it is not possible to accumulate such information without changing other Postgres code. For example, if `ReadBuffer` could somehow inform the caller that it actually performed a read, then the number of reads could easily be counted in `heapam_index_fetch_tuple`:

```
static pg_attribute_always_inline Buffer
ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
                  ForkNumber forkNum,
                  BlockNumber blockNum, ReadBufferMode mode,
                  BufferAccessStrategy strategy,
                  bool *fast_path)
{
    ...
    if (StartReadBuffer(&operation,
                        &buffer,
                        blockNum,
                        flags))
    {
        /* an actual read was needed; wait for it to complete */
        WaitReadBuffers(&operation);
        *fast_path = false;
    }
    else
        *fast_path = true;
    return buffer;
}
```

This can certainly be achieved without changing the ReadBuffer* family, just by calling StartReadBuffer directly
from `heapam_index_fetch_tuple` instead of `ReadBuffer`.
It is not so nice because we would have to duplicate some bufmgr code, though not much: the check for a local relation and
the filling of the `ReadBuffersOperation` structure. Still, it would be better to avoid that.
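
For illustration, the direct call could look roughly like this, based on my reading of bufmgr.h; the field names and setup are assumptions, and local relations would need extra handling:

```
/*
 * Sketch only: call StartReadBuffer directly from
 * heapam_index_fetch_tuple, so its return value tells us whether a
 * real read had to be performed.  This duplicates some bufmgr setup.
 */
ReadBuffersOperation operation;
Buffer      buffer;
bool        need_wait;

operation.rel = hscan->xs_base.rel;
operation.smgr = RelationGetSmgr(hscan->xs_base.rel);
operation.persistence = hscan->xs_base.rel->rd_rel->relpersistence;
operation.forknum = MAIN_FORKNUM;
operation.strategy = NULL;

need_wait = StartReadBuffer(&operation, &buffer,
                            ItemPointerGetBlockNumber(tid),
                            0);     /* no special flags */
if (need_wait)
{
    WaitReadBuffers(&operation);
    hscan->reads_performed++;   /* an actual read happened */
}
hscan->xs_cbuf = buffer;
```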




