On Mon, Feb 16, 2026 at 11:48 AM Andres Freund <[email protected]> wrote:
> Yes, It's hurting quite substantially. For well correlated index scans it
> prevents readahead from becoming aggressive enough even on a local low latency
> SSD. Which means it'll not even be even remotely aggressive enough on a
> networked block store.

I agree that the current heuristics (which were invented recently) are
too conservative. I overfit the heuristics to my current set of
adversarial queries, as a stopgap measure.

> Note that there is pretty much *no* readhead, because the yields happen more
> frequently than a io_combine_limit sized IO can be formed.

ISTM that we need the yields to better cooperate with whatever's
happening on the read stream side.

> With the yielding logic disabled:

> The comment seems to say it's about avoiding to look very into the future when
> using index only scans that just need a few heap lookups. Certainly an
> important goal.

The main motivation for yielding is to deal with things like merge
joins fed by at least one plain index scan, and plain scans for an
"ORDER BY .... LIMIT N" query. The LIMIT N is also addressed by the
"Use ExecSetTupleBound hint during index scans" patch, albeit
imperfectly. This is also important with index-only scans, though only
when heap fetches are actually required in the first place (most
index-only scans will never create a read stream in the first place).

I attach an example of where disabling the yield mechanism hurts
instead of helping, to give you a sense of the problems in this area.
Notice that the inner side of the merge join ("Index Scan using
prefetch_customers_pkey on prefetch_customers c") requires the same
number of buffer hits + buffer reads as master with yielding enabled
-- that allows the patch to shave off 7% of master's execution time.
Whereas without yielding, there's quite a lot more buffer hits (for
index page reads) + buffer misses (for heap page reads) -- which hurts
us. Without yielding, the patch takes about 13% longer to execute the
query.

Just to be clear, I'm not arguing that this is the right trade-off.
And I understand that it's just not feasible to completely prevent
such plan shapes from reading more data than strictly necessary -- a
certain amount of that seems like a "cost of doing business". But it
seems important that the added costs of speculatively reading later
batches/index leaf pages never exceeds some fixed threshold.

The non-yielding EXPLAIN ANALYZE has an inner-index scan that does
~60% more work than master/than the yielding variant of the patch.
It's not so much that the amount of extra work we'll perform is
excessive (though it is); what really concerns me is that the amount
of extra work is *indeterminate*. There's likely to be other queries
that are much slower still, at least in the absence of some kind of
yielding mechanism. And so ISTM that we need a better yield mechanism
-- one that is better attuned to the need to maintain an adequate
prefetch distance on the read stream side.

> One thing that does confuse me about the yielding logic is that it seems to
> actually put a cap on ever looking more than two batches ahead (with a bit of
> fuzziness)? Why support more batches then (INDEX_SCAN_MAX_BATCHES == 128)?
> Isn't two batches too low for anything with some correlation on even remotely
> higher latency storage?

FWIW that's not actually true; we *can* still use more than 2 batches.
My instrumentation confirms this. Though it seems to top out at about
4 batches total with the current yield logic/with my test suite
queries.

-- 
Peter Geoghegan
A15: regressed anti-join, index-only
  master                         (min):       9.041 ms (avg=9.134, max=9.262)
  patch + yield    (prefetch=on) (min):       8.429 ms (avg=8.520, max=8.636) 
[0.932x vs master]
  patch + no yield (prefetch=on) (min):      10.225 ms (avg=10.323, max=10.394) 
[1.131x vs master]

  Query:
    SELECT o.customer_id, o.order_date
    FROM prefetch_orders o
    WHERE o.order_date BETWEEN '2023-02-09' AND '2023-02-23'
    AND o.customer_id BETWEEN 36307 AND 37126
    AND NOT EXISTS (
    SELECT 1 FROM prefetch_customers c
    WHERE c.customer_id = o.customer_id
    AND c.region_id = 4
    )
    ORDER BY o.order_date

  master EXPLAIN ANALYZE:
    Sort (actual rows=7882.00 loops=1)
      Sort Key: o.order_date
      Sort Method: quicksort  Memory: 377kB
      Buffers: shared hit=11732 read=237
      I/O Timings: shared read=1.145
      ->  Merge Anti Join (actual rows=7882.00 loops=1)
            Merge Cond: (o.customer_id = c.customer_id)
            Buffers: shared hit=11732 read=237
            I/O Timings: shared read=1.145
            ->  Index Only Scan using prefetch_orders_cust_date_idx on 
prefetch_orders o (actual rows=8445.00 loops=1)
                  Index Cond: ((customer_id >= 36307) AND (customer_id <= 
37126) AND (order_date >= '2023-02-09'::date) AND (order_date <= 
'2023-02-23'::date))
                  Heap Fetches: 0
                  Index Searches: 772
                  Buffers: shared hit=11629
            ->  Index Scan using prefetch_customers_pkey on prefetch_customers 
c (actual rows=1858.00 loops=1)
                  Filter: (region_id = 4)
                  Rows Removed by Filter: 35285
                  Index Searches: 1
                  Buffers: shared hit=103 read=237
                  I/O Timings: shared read=1.145
    Planning:
      Buffers: shared hit=16
    Planning Time: 0.155 ms
    Serialization: output=193kB  format=text
    Execution Time: 9.041 ms

  patch + yield (prefetch=on) EXPLAIN ANALYZE:
    Sort (actual rows=7882.00 loops=1)
      Sort Key: o.order_date
      Sort Method: quicksort  Memory: 377kB
      Buffers: shared hit=11732 read=237
      I/O Timings: shared read=0.825
      ->  Merge Anti Join (actual rows=7882.00 loops=1)
            Merge Cond: (o.customer_id = c.customer_id)
            Buffers: shared hit=11732 read=237
            I/O Timings: shared read=0.825
            ->  Index Only Scan using prefetch_orders_cust_date_idx on 
prefetch_orders o (actual rows=8445.00 loops=1)
                  Index Cond: ((customer_id >= 36307) AND (customer_id <= 
37126) AND (order_date >= '2023-02-09'::date) AND (order_date <= 
'2023-02-23'::date))
                  Heap Fetches: 0
                  Index Searches: 772
                  Buffers: shared hit=11629
            ->  Index Scan using prefetch_customers_pkey on prefetch_customers 
c (actual rows=1858.00 loops=1)
                  Filter: (region_id = 4)
                  Rows Removed by Filter: 35285
                  Index Searches: 1
                  Buffers: shared hit=103 read=237
                  I/O Timings: shared read=0.825
    Planning:
      Buffers: shared hit=16
    Planning Time: 0.166 ms
    Serialization: output=193kB  format=text
    Execution Time: 8.429 ms

  patch + no yield (prefetch=on) EXPLAIN ANALYZE:
    Sort (actual rows=7882.00 loops=1)
      Sort Key: o.order_date
      Sort Method: quicksort  Memory: 377kB
      Buffers: shared hit=11795 read=385
      I/O Timings: shared read=1.517
      ->  Merge Anti Join (actual rows=7882.00 loops=1)
            Merge Cond: (o.customer_id = c.customer_id)
            Buffers: shared hit=11795 read=385
            I/O Timings: shared read=1.517
            ->  Index Only Scan using prefetch_orders_cust_date_idx on 
prefetch_orders o (actual rows=8445.00 loops=1)
                  Index Cond: ((customer_id >= 36307) AND (customer_id <= 
37126) AND (order_date >= '2023-02-09'::date) AND (order_date <= 
'2023-02-23'::date))
                  Heap Fetches: 0
                  Index Searches: 772
                  Buffers: shared hit=11629
            ->  Index Scan using prefetch_customers_pkey on prefetch_customers 
c (actual rows=1858.00 loops=1)
                  Filter: (region_id = 4)
                  Rows Removed by Filter: 35285
                  Index Searches: 1
                  Buffers: shared hit=166 read=385
                  I/O Timings: shared read=1.517
    Planning:
      Buffers: shared hit=16
    Planning Time: 0.149 ms
    Serialization: output=193kB  format=text
    Execution Time: 10.225 ms

Reply via email to