On 29/12/2025 1:53 AM, Tomas Vondra wrote:
> It seems this is due to sending an extra SET (for the new GUC) in the
> pgbench script, which is recognized only on the v5+threshold build.
>
> That's a thinko on my side, I should have realized the extra command
> might affect this. It doesn't really affect the behavior, because 10 is
> the default value for read_stream_threshold. I've fixed the script, will
> check fresh results tomorrow.
>
> Still, I think most of what I said about heuristics when to initialize
> the read stream, and the risk/benefit tradeoff, still applies.

I did a lot of experiments this morning but could not find any noticeable difference in any configuration where the whole working set fits in shared buffers. And frankly, after more thinking I do not see a good reason that could explain such a difference. Merely initializing the read stream should not add much overhead: it does not seem to be an expensive operation.

What actually matters is async I/O. Without a read stream, Postgres reads heap pages synchronously: the backend just calls pread. With a read stream, AIO is used. By default the "worker" AIO mode is used, which means the backend sends a request to one of the I/O workers and waits for its completion. The worker receives the request, performs the I/O and notifies the backend. Such interprocess communication adds significant overhead, and this is why, if we initialize the read stream from the very beginning, we get about ~4x worse performance with LIMIT 1.

Please correct me if I am wrong (or if this is Mac specific), but the slowdown is not caused by any overhead of the read stream itself, but by AIO. I have not run such an experiment, but it seems to me that if we made the read stream perform synchronous calls, there would be almost no difference in performance.
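
One straightforward way to check this hypothesis (on a build with AIO) would be to set `io_method = sync` in postgresql.conf (it is a postmaster-level GUC, so a restart is needed) and rerun the LIMIT 1 benchmark: if the slowdown disappears, the cost is in the worker handoff, not in the read stream machinery itself.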

When all data is cached in shared buffers, we do not perform any I/O at all. That means it does not matter whether and when we initialize the read stream: we can do it after processing 10 items (the current default) or immediately, and it should not affect performance. And this is what I have tested: performance does not actually depend on `read_stream_threshold` (as long as the data fits in shared buffers). At least the difference is within a few percent, and it may be just random fluctuation. There is certainly no 25% degradation.
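
To make sure we are talking about the same thing, here is a minimal sketch of the item-count heuristic as I understand it from this thread; the field names and the callback are my assumptions, not the actual patch code:

```
/*
 * Sketch only (hypothetical names): lazily create the read stream
 * after the scan has already delivered read_stream_threshold items.
 */
if (hscan->rs_read_stream == NULL &&
    ++hscan->rs_items_fetched > read_stream_threshold)
{
    hscan->rs_read_stream =
        read_stream_begin_relation(READ_STREAM_DEFAULT,
                                   NULL,    /* no buffer access strategy */
                                   hscan->xs_base.rel,
                                   MAIN_FORKNUM,
                                   index_fetch_next_block,  /* hypothetical callback */
                                   hscan,
                                   0);      /* no per-buffer data */
}
```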


That definitely does not mean it is impossible to find a scenario where this approach of enabling prefetch after processing N items performs worse than master or v5; we just need to choose the cache hit rate appropriately. But the same is true, IMHO, for v5 itself: it is possible to find a workload where it shows the same degradation compared with master.


A more precise heuristic should, IMHO, take into account the actual number of disk reads performed. Please notice that I do not want to predict the number of disk reads, i.e. check whether the prefetch candidates are present in shared buffers; that really would add significant overhead. I think it is better to use the number of reads already performed as the threshold, as in the sketch below.
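
For example (all names here are hypothetical, just to show the shape of the check):

```
/*
 * Sketch only: ReadBufferReportIO is a hypothetical ReadBuffer variant
 * that tells the caller whether it actually had to perform a read.
 */
bool        performed_read;

hscan->xs_cbuf = ReadBufferReportIO(hscan->xs_base.rel,
                                    ItemPointerGetBlockNumber(tid),
                                    &performed_read);
if (performed_read)
    hscan->reads_performed++;

/* Start prefetching only after enough actual cache misses. */
if (hscan->rs_read_stream == NULL &&
    hscan->reads_performed > read_stream_threshold)
{
    /* initialize the read stream here, as in the sketch above */
}
```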

Unfortunately, it looks like it is not possible to accumulate such information without changing other Postgres code. For example, if `ReadBuffer` could somehow inform the caller that it actually performed a read, then the number of reads could easily be counted in `heapam_index_fetch_tuple`:

```
static pg_attribute_always_inline Buffer
ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
                  ForkNumber forkNum,
                  BlockNumber blockNum, ReadBufferMode mode,
                  BufferAccessStrategy strategy,
                  bool *fast_path)
{
    ...
    if (StartReadBuffer(&operation,
                        &buffer,
                        blockNum,
                        flags))
    {
        /* an actual read was needed; wait for it to complete */
        WaitReadBuffers(&operation);
        *fast_path = false;
    }
    else
        *fast_path = true;
    return buffer;
}
```

This can certainly be achieved without changing the ReadBuffer* family, just by calling StartReadBuffer directly
from `heapam_index_fetch_tuple` instead of `ReadBuffer`.
It is not so nice because we would have to duplicate some bufmgr code, though not much: the check for a local relation and
the filling of the `ReadBuffersOperation` structure. Still, it would be better to avoid that.
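
For illustration, the direct call could look roughly like this, based on my reading of bufmgr.h; the field names and setup are assumptions, and local relations would need extra handling:

```
/*
 * Sketch only: call StartReadBuffer directly from
 * heapam_index_fetch_tuple, so its return value tells us whether a
 * real read had to be performed.  This duplicates some bufmgr setup.
 */
ReadBuffersOperation operation;
Buffer      buffer;
bool        need_wait;

operation.rel = hscan->xs_base.rel;
operation.smgr = RelationGetSmgr(hscan->xs_base.rel);
operation.persistence = hscan->xs_base.rel->rd_rel->relpersistence;
operation.forknum = MAIN_FORKNUM;
operation.strategy = NULL;

need_wait = StartReadBuffer(&operation, &buffer,
                            ItemPointerGetBlockNumber(tid),
                            0);     /* no special flags */
if (need_wait)
{
    WaitReadBuffers(&operation);
    hscan->reads_performed++;   /* an actual read happened */
}
hscan->xs_cbuf = buffer;
```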




