On Sat, Dec 6, 2025 at 3:04 PM Peter Geoghegan <[email protected]> wrote:
> My best guess is that the benefits I see come from eliminating a
> dependent load. Without the second patch applied, I see this
> disassembly for _bt_checkkeys:
>
> mov rax,QWORD PTR [rdi+0x38]  ; Load scan->opaque
> mov r15d,DWORD PTR [rax+0x70] ; Load so->dir
>
> A version with the second patch applied still loads a pointer passed
> by the _bt_checkkeys caller (_bt_readpage), but doesn't have to chase
> another pointer to get to it. Maybe this significantly ameliorates
> execution port pressure in the cases where I see a speedup?
I found a way to further speed up the queries that the second patch already helped with, following profiling with perf: if _bt_readpage takes a local copy of scan->ignore_killed_tuples when first called, and then uses that local copy within its per-tuple loop (instead of using scan->ignore_killed_tuples directly), it gives me an additional 1% speedup over what I reported earlier today. In other words, the range/BETWEEN pgbench variant I summarized earlier today goes from being about 4.5% faster than master to being about 5.5% faster than master.

Testing has also shown that the ignore_killed_tuples enhancement doesn't significantly change the picture with other types of queries (such as the default pgbench SELECT).

In short, this ignore_killed_tuples change makes the second patch from v1 more effective, seemingly by further ameliorating the same bottleneck. Apparently accessing scan->ignore_killed_tuples created another load-use hazard in the same tight inner loop (the per-tuple _bt_readpage loop). That matters with these queries, where we don't need to do very much work per tuple (_bt_readpage's pstate.startikey optimization is as effective as possible here) and quite a few tuples (2,000) need to be returned by each test query run.

Since this ignore_killed_tuples change is also very simple, and also seems like an easy win, I think that it can be committed as part of the second patch, without needing to wait for too much more performance validation.

Attached are 2 text files showing pgbench output/summary info, generated by my test script (both are from runs that took place within the last 2 hours). One of these result sets just confirms what I reported earlier on, with an unmodified v1 patchset. The other file shows detailed results for the v1 patchset with the ignore_killed_tuples change also applied, for the same pgbench config/workload. This second file gives full details to back up my "~5.5% faster than master" claim.
The pgbench script used for this is as follows:

\set aid random_exponential(1, 100000 * :scale, 3.0)
\set endrange :aid + 2000
SELECT abalance FROM pgbench_accounts WHERE aid between :aid AND :endrange;

I'm deliberately not attaching a new v2 for this ignore_killed_tuples change right now. The first patch is a few hundred KBs, and I don't want this email to get held up in moderation. Instead, I've attached just the ignore_killed_tuples change as its own patch (with a .txt extension, to avoid confusing CFTester).

--
Peter Geoghegan
From 19ec9bd82d6cd6369c3413e10234c17fd9973df4 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <[email protected]>
Date: Sat, 6 Dec 2025 16:28:19 -0500
Subject: [PATCH 3/3] Use ignore_killed_tuples local variable

---
 src/backend/access/nbtree/nbtreadpage.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/nbtree/nbtreadpage.c b/src/backend/access/nbtree/nbtreadpage.c
index 540d172cc..7f9c66b8b 100644
--- a/src/backend/access/nbtree/nbtreadpage.c
+++ b/src/backend/access/nbtree/nbtreadpage.c
@@ -141,7 +141,8 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 	OffsetNumber minoff;
 	OffsetNumber maxoff;
 	BTReadPageState pstate;
-	bool		arrayKeys;
+	bool		arrayKeys,
+				ignore_killed_tuples = scan->ignore_killed_tuples;
 	int			itemIndex,
 				indnatts;
@@ -246,7 +247,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 		 * If the scan specifies not to return killed tuples, then we
 		 * treat a killed tuple as not passing the qual
 		 */
-		if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
+		if (ignore_killed_tuples && ItemIdIsDead(iid))
 		{
 			offnum = OffsetNumberNext(offnum);
 			continue;
 		}
@@ -404,7 +405,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum,
 		 * uselessly advancing to the page to the left.  This is similar
 		 * to the high key optimization used by forward scans.
 		 */
-		if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
+		if (ignore_killed_tuples && ItemIdIsDead(iid))
 		{
 			if (offnum > minoff)
 			{
--
2.51.0
v1-plus-ignore_killed_tuples-change.out
v1-only.out
