Hi,
On 2026-03-02 18:45:03 +0000, Alexandre Felipe wrote:
> > Unfortunately that has too big a performance penalty for fully cached
> > workloads :(. Doing buffer mapping lookups ahead of the current point
> > is not free.
>
> Could you elaborate?
Looking ahead requires pinning those buffers (or you need to do more buffer
lookups, which also aren't free). The way things work right now, the
infrastructure for maintaining buffer pins gets more expensive the more
buffers you pin. That increased overhead doesn't matter much if you actually
avoid IO, or if you have a fair bit of work to do for each buffer (like
looking at every tuple), but if you are fully cached and only look at a
single tuple, it's painful.
CREATE TABLE uncorrelated(id int8 primary key, balance float not null);
INSERT INTO uncorrelated(id, balance)
  SELECT id, 0 FROM (SELECT generate_series(1, 1000000) AS id)
  ORDER BY random();
VACUUM (FREEZE, ANALYZE) uncorrelated;

CREATE TABLE correlated(id int8 primary key, balance float not null);
INSERT INTO correlated(id, balance)
  SELECT id, 0 FROM (SELECT generate_series(1, 1000000) AS id);
VACUUM (FREEZE, ANALYZE) correlated;
SELECT pg_prewarm('uncorrelated');
SET max_parallel_workers_per_gather = 0;
SET enable_bitmapscan = 0;
SET enable_seqscan = 0;
SELECT sum(balance) FROM uncorrelated WHERE id > 1000 and id < 500000;
SELECT pg_prewarm('correlated');
SET max_parallel_workers_per_gather = 0;
SET enable_bitmapscan = 0;
SET enable_seqscan = 0;
SELECT sum(balance) FROM correlated WHERE id > 1000 and id < 500000;
Index prefetching patch:
uncorrelated: 228.936 ms
correlated: 71.684 ms
+ minimum distance 2:
uncorrelated: 244.361 ms
correlated: 71.848 ms
+ minimum distance 4:
uncorrelated: 246.164 ms
correlated: 71.912 ms
+ minimum distance 8:
uncorrelated: 268.686 ms
correlated: 71.880 ms
+ minimum distance 16:
uncorrelated: 310.320 ms
correlated: 71.890 ms
+ minimum distance 32:
uncorrelated: 330.714 ms
correlated: 74.178 ms
There starts to be some overhead from the increased distance even in the
correlated case, but it's small and sets in much later.

We can probably reduce the overhead of the buffer tracking, but likely not
enough for it to stop being a factor entirely.
Possible improvements to refcount tracking:
- increase REFCOUNT_ARRAY_ENTRIES - there's a very significant cliff at 8
  right now, and with a vectorized lookup it might not hurt too much to go
  to 16 or so
- To make the cliff at REFCOUNT_ARRAY_ENTRIES smaller, replace dynahash
  with simplehash. That should reduce the perf penalty a good bit.
Unfortunately it's not just the refcount tracking; resource owner
(resowner) management also gets more expensive.
Greetings,
Andres Freund