On Sat, Dec 6, 2025 at 3:07 AM Victor Yegorov <[email protected]> wrote: > I like this change and I agree that it's both handy and gives an easy > performance boost.
There's a number of things that I find counterintuitive about the performance impact of this patch: * It doesn't make very much difference (under a 1% improvement) on similar index-only scans, with queries such as "SELECT count(*) FROM pgbench_accounts WHERE aid between :aid AND :endrange". One might expect this case to be improved at least as much as the plain index scan case, since the relative cost of _bt_readpage is higher (since it's much cheaper to access the visibility map than to access the heap). * The patch makes an even larger difference when I make pgbench use a larger range of values in the "between". For example, if I increase the number of values/tuples returned from 2,000 (which is what I used initially/reported on in the first email) to 15,000, I find that the patch increases TPS by as much as 5.5%. * These queries make maximum use of the _bt_set_startikey optimization -- most individual leaf pages don't need to evaluate any scan key (after an initial page-level check within _bt_set_startikey). So the patch really helps in exactly those cases where we don't truly need to access the scan direction at all -- the loop inside _bt_check_compare always has 0 iterations with these queries, which means that scan direction doesn't actually ever need to be considered at that point. My best guess is that the benefits I see come from eliminating a dependent load. Without the second patch applied, I see this disassembly for _bt_checkkeys: mov rax,QWORD PTR [rdi+0x38] ; Load scan->opaque mov r15d,DWORD PTR [rax+0x70] ; Load so->dir A version with the second patch applied still loads a pointer passed by the _bt_checkkeys caller (_bt_readpage), but doesn't have to chase another pointer to get to it. Maybe this significantly ameliorates execution port pressure in the cases where I see a speedup? > Patch applies and compiles cleanly. I can barely see a performance boost on > my end (VM on a busy host), round 1%, but I still consider this change > beneficial. It seems to have no downsides, and some upside. I wouldn't be surprised if the results I'm seeing are dependent on microarchitectural details. I myself use a Zen 3 chip (a Ryzen 9 5950X). -- Peter Geoghegan
