On 18/12/2025 4:40 PM, Tomas Vondra wrote:

On 12/18/25 14:57, Konstantin Knizhnik wrote:
On 17/12/2025 9:54 PM, Tomas Vondra wrote:
On 12/17/25 20:30, Andres Freund wrote:
Hi,

On 2025-12-17 13:49:43 -0500, Peter Geoghegan wrote:
On Wed, Dec 17, 2025 at 12:19 PM Konstantin Knizhnik
<[email protected]> wrote:
Moreover, with `enable_indexscan_prefetch=off` the results are the same.
It's quite unlikely that the current heuristics that trigger
prefetching would have ever allowed any prefetching, for queries such
as these.

The exact rule right now is that we don't even begin prefetching until
we've already read at least one index leaf page, and have to read
another one. So it's impossible to use prefetching with a LIMIT of 1,
with queries such as these. It's highly unlikely that you'd see any
benefits from prefetching even with LIMIT 100 (usually we wouldn't
even begin prefetching).
Note that due to the tuple size and fillfactor in Konstantin's
workload, there
will be one tuple per page... That should allow for some prefetching.

Yes, but that's in the heap. The mechanism Peter described is about leaf
pages in the index, and the index has the usual fillfactor. So there'll
be many index entries per leaf.

I slightly changed my benchmark setup:

create table t (pk integer, sk integer, payload text default repeat('x',
1000)) with (fillfactor=10);
insert into t values (generate_series(1,10000000),random()*10000000);
create index on t(sk);

select.sql:

\set sk random(1, 10000000)
select * from t where sk >= :sk order by sk limit N;

You are right. There is almost no effect of prefetch for limit=100, but
a ~2x improvement for limit=1000:

eio\limit        1     100   1000
       10    11102     142     28
        0    11419     137     14

master:
limit            1     100   1000
             11480     130     13

One of the motivations for my experiments was to check that there is no
performance degradation because of batching.
And it is nice that there is no performance penalty here.
Still, it is not quite clear to me why there is no positive effect
for LIMIT 100.
The technical reason is that batch_getnext() does this:

   /* Delay initializing stream until reading from scan's second batch */
   if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
       enable_indexscan_prefetch)
       scan->xs_heapfetch->rs =
           read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
                                      ....);

which means we create the read_stream (which is what enables the
prefetching) only when creating the second batch. And with LIMIT 100 we
likely read just a single leaf page (=batch) most of the time, which
means no read_stream and thus no prefetching.

You can try disabling this "priorbatch" condition, so that the
read_stream gets created right away.
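For instance (just a hypothetical sketch against the snippet quoted above,
not a tested patch), dropping the "priorbatch" check would look something
like:

    -    if (priorbatch && !scan->xs_heapfetch->rs && !batchqueue->disabled &&
    -        enable_indexscan_prefetch)
    +    if (!scan->xs_heapfetch->rs && !batchqueue->disabled &&
    +        enable_indexscan_prefetch)

so the read_stream is created for the very first batch.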


It has the expected effect: performance of LIMIT 100 increases from 142
TPS to 315 TPS (so also ~2x). At the same time, performance of LIMIT 1
drops from 11419 to 3499 TPS, i.e. ~4x. For LIMIT 10 the results are
1388 TPS with prefetch disabled and 1116 TPS with it enabled. So it
looks like the threshold for enabling prefetch should be based not on
the number of batches, but on the expected number of heap reads, and it
is closer to 100 or even 10 than to 1000. And for slower disks (or
remote storage), the effect of prefetch should be much bigger.

True, but only if the data is not already in memory / shared buffers.
IIRC this "priorbatch" logic mitigates regressions for cached workloads,
because the read_stream initialization is expensive enough to hurt small
queries when no I/O is needed.

I see.
But maybe we should compare the table size with shared_buffers or
effective_cache_size?
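Just as an illustration of that comparison (an ad-hoc query, assuming the
benchmark table t from above), something like:

    -- Compare the table's size against the configured cache sizes.
    -- pg_size_bytes() parses settings such as '128MB' or '4GB' into bytes.
    select pg_table_size('t') as table_bytes,
           pg_size_bytes(current_setting('shared_buffers')) as shared_buffers_bytes,
           pg_size_bytes(current_setting('effective_cache_size')) as effective_cache_bytes;

Of course, in the planner this would have to rely on the relation size
estimate rather than an actual query.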
