On Thu, Feb 26, 2026 at 11:18 PM Andres Freund <[email protected]> wrote:
> Note how the increase in scanned heap pages actually *decreases* the overall
> time rather substantially.
>
> It's quite visible, both in iostat, and a query like
> SELECT pid, target_desc, off, length FROM pg_aios \watch 0.5
>
> that for the first query has basically no IO concurrency, the second has very
> intermittent IO concurrency and the third one has nice IO concurrency.
>
>
> If I disable the yield logic, the fillfactor=90 case is good:

I can recreate your results, including the part where you found that
the problem would go away once yields were completely disabled.

I can certainly understand why you're suspicious of the yielding
mechanism. I wonder if I gave undue weight to the merge join query I
showed you [1] (and one or two others like it). Declaring that the
underlying merge join/yielding issue is not worth the complexity
required to yield would certainly be convenient. Yielding *isn't*
helpful for the vast majority of individual queries, so I'm certainly
tempted. But I can't help but feel nervous about the large disparity
in the number of *index* pages read by that particular query, once the
yielding mechanism is disabled.

Just in case there's any doubt: I'm flexible about whether a yielding
mechanism is needed and how it should work. Ideally we can come up
with a design that gives us the best of all possible worlds -- but
everything is on the table. It's not that I'm attached to the idea of
yielding; I'm just nervous about one or two funny-looking cases [1].

With that being said, it seems as if yielding isn't the only factor in
play here. I also notice that even master exhibits roughly the same
performance disparity (also while using direct I/O, though with
shared_buffers set to 16GB rather than your 2GB):

=================================
EXPLAIN OUTPUT (best run, master)
=================================
--- Fillfactor 90 ---
Index Scan using pgbench_accounts_ff90_pkey on pgbench_accounts_ff90
Index Searches: 1
Buffers: shared hit=27325 read=181819
I/O Timings: shared read=16822.256
Planning Time: 0.035 ms
Execution Time: 18048.198 ms
--- Fillfactor 50 ---
Index Scan using pgbench_accounts_ff50_pkey on pgbench_accounts_ff50
Index Searches: 1
Buffers: shared hit=27325 read=333334
I/O Timings: shared read=30685.965
Planning Time: 0.028 ms
Execution Time: 32005.962 ms
--- Fillfactor 25 ---
Index Scan using pgbench_accounts_ff25_pkey on pgbench_accounts_ff25
Index Searches: 1
Buffers: shared hit=27325 read=666667
I/O Timings: shared read=10278.124
Planning Time: 0.034 ms
Execution Time: 11796.573 ms

While fillfactor 90 is fastest, fillfactor 25 is almost 3x faster than
fillfactor 50, despite performing about twice as many reads. I have to
imagine this relates to my Samsung 980 Pro SSD performing its own
read-ahead, in a way that works inconsistently across workloads.

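As a quick sanity check on that theory: dividing the bytes read (from
the "shared read" buffer counts above, assuming the default 8KB block
size) by the shared read I/O timings gives the effective read
bandwidth for each master run:

```python
# Effective read bandwidth implied by the master EXPLAIN figures above.
# Assumes the default 8KB block size (BLCKSZ).
BLCKSZ = 8192  # bytes per page

runs = {  # fillfactor: (shared buffers read, "I/O Timings: shared read" in ms)
    90: (181819, 16822.256),
    50: (333334, 30685.965),
    25: (666667, 10278.124),
}

for ff, (pages, io_ms) in runs.items():
    mb_per_s = pages * BLCKSZ / 1e6 / (io_ms / 1e3)
    print(f"fillfactor {ff}: {mb_per_s:6.1f} MB/s")
```

That works out to roughly 89 MB/s for both the fillfactor 90 and 50
runs, but roughly 530 MB/s for fillfactor 25 -- which at least seems
consistent with device-level read-ahead only kicking in for the last
case.
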
Note again that this effect with master only appears when
shared_buffers is set to 16GB. With your 2GB shared_buffers setting,
master takes 17930.381 ms for FF 90, 31822.473 ms for FF 50, and
61094.676 ms for FF 25 (which is at least consistent-ish in the way
that one would expect).

For context, here is how the patch compares to master with
shared_buffers=16GB (here master uses the same query plans as those
shown above) once the patch/Pfetch's yielding is disabled:
FF   Heap Pages   Master    Pfetch ON   ON/Master
----------------------------------------------------
90   181819       18048.2   1465.0      0.081x
50   333334       32006.0   1682.2      0.053x
25   666667       11796.6   1928.4      0.163x

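(To be explicit about how the last column is derived, ON/Master is
just the Pfetch ON execution time divided by master's, in ms:

```python
# ON/Master ratios recomputed from the execution times in the table above.
times = {  # fillfactor: (master ms, pfetch_on ms)
    90: (18048.2, 1465.0),
    50: (32006.0, 1682.2),
    25: (11796.6, 1928.4),
}
for ff, (master, pfetch_on) in times.items():
    print(f"FF {ff}: {pfetch_on / master:.3f}x")  # 0.081x / 0.053x / 0.163x
```

So the patch is between ~6x and ~19x faster than master here.)
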
I also noticed that the patch isn't at all sensitive to whether
shared_buffers is set to 2GB or 16GB -- not once yielding is disabled
like this. Obviously that insensitivity is desirable, which argues for
removing yielding.

For context, with the standard v11 patch (with yielding enabled), the
choice of 2GB vs 16GB shared_buffers matters to quite a surprising
degree:

====================================
Patch + yielding, 2GB shared_buffers
====================================
FF   Heap Pages   Pfetch ON
------------------------------
90   181819       4276.5
50   333334       1523.3
25   666667       6805.7

=====================================
Patch + yielding, 16GB shared_buffers
=====================================
FF   Heap Pages   Pfetch ON
------------------------------
90   181819       4384.6
50   333334       1683.2
25   666667       2002.0

Notice that with 16GB shared_buffers, the perverse effect from
yielding is even more pronounced!

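To put a number on "pronounced": comparing the two tables row by row,
only the fillfactor 25 case is really sensitive to shared_buffers once
yielding is involved:

```python
# Per-fillfactor slowdown from using 2GB rather than 16GB shared_buffers,
# with the patch's yielding enabled (times in ms, from the tables above).
times = {  # fillfactor: (2GB ms, 16GB ms)
    90: (4276.5, 4384.6),
    50: (1523.3, 1683.2),
    25: (6805.7, 2002.0),
}
for ff, (small_sb, large_sb) in times.items():
    print(f"FF {ff}: {small_sb / large_sb:.2f}x")
```

The fillfactor 25 case is about 3.4x slower with 2GB shared_buffers,
while the other two cases barely move.
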
I'm not sure how relevant this latter point about "shared_buffers
sensitivity with yielding" really is. Nor am I sure if the effect with
master (and the possible role of device-level readahead) is all that
significant. I'm pointing all of this out in the hope that you can
offer an explanation that'll help me to improve my own intuitions
about this stuff.

[1]
https://postgr.es/m/CAH2-Wzk-89uCvdJ1Q6NsM6LvDvUEt6Qy66T6A60J=d_vowx...@mail.gmail.com
--
Peter Geoghegan