On Sat, Dec 6, 2025 at 3:07 AM Victor Yegorov <[email protected]> wrote:
> I like this change and I agree that it's both handy and gives an easy 
> performance boost.

There's a number of things that I find counterintuitive about the
performance impact of this patch:

* It doesn't make very much difference (under a 1% improvement) on
similar index-only scans, with queries such as "SELECT count(*) FROM
pgbench_accounts WHERE aid between :aid AND :endrange". One might
expect this case to be improved at least as much as the plain index
scan case, since the relative cost of _bt_readpage is higher (since
it's much cheaper to access the visibility map than to access the
heap).

* The patch makes an even larger difference when I make pgbench use a
larger range of values in the "between". For example, if I increase
the number of values/tuples returned from 2,000 (which is what I used
initially/reported on in the first email) to 15,000, I find that the
patch increases TPS by as much as 5.5%.

* These queries make maximum use of the _bt_set_startikey optimization
-- most individual leaf pages don't need to evaluate any scan key
(after an initial page-level check within _bt_set_startikey). So the
patch really helps in exactly those cases where we don't truly need to
access the scan direction at all -- the loop inside _bt_check_compare
always has 0 iterations with these queries, which means that scan
direction doesn't actually ever need to be considered at that point.

My best guess is that the benefits I see come from eliminating a
dependent load. Without the second patch applied, I see this
disassembly for _bt_checkkeys:

mov    rax,QWORD PTR [rdi+0x38]   ; Load scan->opaque
mov    r15d,DWORD PTR [rax+0x70]  ; Load so->dir

A version with the second patch applied still loads a pointer passed
by the _bt_checkkeys caller (_bt_readpage), but doesn't have to chase
another pointer to get to it. Maybe this significantly ameliorates
execution port pressure in the cases where I see a speedup?

> Patch applies and compiles cleanly. I can barely see a performance boost on 
> my end (VM on a busy host), round 1%, but I still consider this change 
> beneficial.

It seems to have no downsides, and some upside. I wouldn't be
surprised if the results I'm seeing are dependent on
microarchitectural details. I myself use a Zen 3 chip (a Ryzen 9
5950X).

-- 
Peter Geoghegan


Reply via email to