Re: unnecessary executor overheads around seqscans

Amit Langote Sun, 25 Jan 2026 23:48:06 -0800

On Mon, Jan 26, 2026 at 12:03 PM David Rowley <[email protected]> wrote:
> On Sat, 24 Jan 2026 at 19:21, Amit Langote <[email protected]> wrote:
> > On Sat, Jan 24, 2026 at 5:16 AM Andres Freund <[email protected]> wrote:
> > >   We can use a pg_assume() in table_scan_getnextslot() to make the 
> > > compiler
> > >   understand.
> >
> > Something like this?
> >
> >     result = sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, 
> > slot);
> >     pg_assume(result == !TupIsNull(slot));
> >     return result;
> >
> > I assume this relies on table_scan_getnextslot() being inlined into
> > ExecScanExtended()?
>
> I looked at the objdump of this, and it does seem to get rid of the extra 
> check.
>
> I did:
>
> cd src/backend/executor
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3
> -Wcast-function-type -Wshadow=compatible-local -Wformat-security
> -Wmissing-variable-declarations -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -Wno-format-truncation
> -Wno-stringop-truncation -O2 -I../../../src/include -D_GNU_SOURCE -g
> -c nodeSeqscan.c
> objdump -d -M intel -S nodeSeqscan.o > nodeSeqscan_pg_assume.s
>
> Looking at ExecSeqScanWithQual, I see:
>
> master:
> return sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
>  22c: 48 8b 07              mov    rax,QWORD PTR [rdi]
>  22f: 4c 89 f2              mov    rdx,r14
>  232: 44 89 fe              mov    esi,r15d
>  235: 48 8b 80 40 01 00 00 mov    rax,QWORD PTR [rax+0x140]
>  23c: ff 50 28              call   QWORD PTR [rax+0x28]
> if (table_scan_getnextslot(scandesc, direction, slot))
>  23f: 84 c0                test   al,al <-- *** this test and the
> subsequent jump equal are removed
>  241: 74 6d                je     2b0 <ExecSeqScanWithQual+0x100>
> if (TupIsNull(slot))
>  243: 41 f6 46 04 02        test   BYTE PTR [r14+0x4],0x2
>  248: 75 69                jne    2b3 <ExecSeqScanWithQual+0x103>
>  24a: 48 8b 45 28          mov    rax,QWORD PTR [rbp+0x28]
>
> And with your pg_assume added:
> result = sscan->rs_rd->rd_tableam->scan_getnextslot(sscan, direction, slot);
>  22c: 48 8b 07              mov    rax,QWORD PTR [rdi]
>  22f: 4c 89 f2              mov    rdx,r14
>  232: 44 89 fe              mov    esi,r15d
>  235: 48 8b 80 40 01 00 00 mov    rax,QWORD PTR [rax+0x140]
>  23c: ff 50 28              call   QWORD PTR [rax+0x28]
> pg_assume(result == !TupIsNull(slot));
>  23f: 41 f6 46 04 02        test   BYTE PTR [r14+0x4],0x2
>  244: 75 62                jne    2a8 <ExecSeqScanWithQual+0xf8>
>  246: 48 8b 45 28          mov    rax,QWORD PTR [rbp+0x28]
>
> I didn't test the performance.


Thanks for sharing that.

I tried my patch over your committed SeqNext inlining patch and ran
the following benchmark but didn't notice in material difference:

CREATE TABLE t (a int);
INSERT INTO t SELECT generate_series(1, 1000000);
ANALYZE t;
SET max_parallel_workers_per_gather = 0;
SELECT * FROM t WHERE a = -1;

Perhaps not too surprising given it's just eliminating a couple of
instructions per row that the branch predictor probably handles well
anyway? Still seems worth having for code hygiene if nothing else.

Same result (no diff in perf) when I apply it over your patch to move
the scandesc == NULL check.

-- 
Thanks, Amit Langote

Re: unnecessary executor overheads around seqscans

Reply via email to