On Mon, Jan 26, 2026 at 12:03 PM David Rowley <[email protected]> wrote: > On Sat, 24 Jan 2026 at 19:21, Amit Langote <[email protected]> wrote: > > On Sat, Jan 24, 2026 at 5:16 AM Andres Freund <[email protected]> wrote: > > > We can use a pg_assume() in table_scan_getnextslot() to make the > > > compiler > > > understand. > > > > Something like this? > > > > result = sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, > > slot); > > pg_assume(result == !TupIsNull(slot)); > > return result; > > > > I assume this relies on table_scan_getnextslot() being inlined into > > ExecScanExtended()? > > I looked at the objdump of this, and it does seem to get rid of the extra > check. > > I did: > > cd src/backend/executor > gcc -Wall -Wmissing-prototypes -Wpointer-arith > -Wdeclaration-after-statement -Werror=vla -Wendif-labels > -Wmissing-format-attribute -Wimplicit-fallthrough=3 > -Wcast-function-type -Wshadow=compatible-local -Wformat-security > -Wmissing-variable-declarations -fno-strict-aliasing -fwrapv > -fexcess-precision=standard -Wno-format-truncation > -Wno-stringop-truncation -O2 -I../../../src/include -D_GNU_SOURCE -g > -c nodeSeqscan.c > objdump -d -M intel -S nodeSeqscan.o > nodeSeqscan_pg_assume.s > > Looking at ExecSeqScanWithQual, I see: > > master: > return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot); > 22c: 48 8b 07 mov rax,QWORD PTR [rdi] > 22f: 4c 89 f2 mov rdx,r14 > 232: 44 89 fe mov esi,r15d > 235: 48 8b 80 40 01 00 00 mov rax,QWORD PTR [rax+0x140] > 23c: ff 50 28 call QWORD PTR [rax+0x28] > if (table_scan_getnextslot(scandesc, direction, slot)) > 23f: 84 c0 test al,al <-- *** this test and the > subsequent jump equal are removed > 241: 74 6d je 2b0 <ExecSeqScanWithQual+0x100> > if (TupIsNull(slot)) > 243: 41 f6 46 04 02 test BYTE PTR [r14+0x4],0x2 > 248: 75 69 jne 2b3 <ExecSeqScanWithQual+0x103> > 24a: 48 8b 45 28 mov rax,QWORD PTR [rbp+0x28] > > And with your pg_assume added: > result = sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot); > 22c: 48 8b 07 mov rax,QWORD PTR [rdi] > 22f: 4c 89 f2 mov rdx,r14 > 232: 44 89 fe mov esi,r15d > 235: 48 8b 80 40 01 00 00 mov rax,QWORD PTR [rax+0x140] > 23c: ff 50 28 call QWORD PTR [rax+0x28] > pg_assume(result == !TupIsNull(slot)); > 23f: 41 f6 46 04 02 test BYTE PTR [r14+0x4],0x2 > 244: 75 62 jne 2a8 <ExecSeqScanWithQual+0xf8> > 246: 48 8b 45 28 mov rax,QWORD PTR [rbp+0x28] > > I didn't test the performance.
Thanks for sharing that. I tried my patch over your committed SeqNext inlining patch and ran the following benchmark but didn't notice in material difference: CREATE TABLE t (a int); INSERT INTO t SELECT generate_series(1, 1000000); ANALYZE t; SET max_parallel_workers_per_gather = 0; SELECT * FROM t WHERE a = -1; Perhaps not too surprising given it's just eliminating a couple of instructions per row that the branch predictor probably handles well anyway? Still seems worth having for code hygiene if nothing else. Same result (no diff in perf) when I apply it over your patch to move the scandesc == NULL check. -- Thanks, Amit Langote
