Re: More speedups for tuple deformation

David Rowley Tue, 24 Feb 2026 17:00:21 -0800

Thanks for looking.

On Wed, 25 Feb 2026 at 03:39, Andres Freund <[email protected]> wrote:
> ISTM we should just merge 0004. In my testing it's a very clear win, without,
> afaict, any downsides.


I'd like to get them in, in sequence, as I believe 0004 buys back some
extra overhead, such as the Min()s in slot_deform_heap_tuple(). If I
were to commit 0004 first and then wait a while, the remaining patches
might look more like they introduce a small regression.

> > I'm not getting great results from benchmarking the 0005 patch. I
> > verified that gcc does access the array without calculating the
> > element address from scratch each time and calculates it once, then
> > increments the pointer by sizeof(CompactAttribute). See the attached
> > .csv for the results on the 3 machines I tested on.
>
> FWIW, where I had seen that be rather beneficial is the TupleDescCompactAttr()
> at the start of the various loops, where the compiler has little choice but to
> compute the address of the tupdesc->compact_attrs[firstNeededCol].  That
> matters only when only deforming a small number of columns, of course.

Oh, OK. I wasn't aware that LEA's scaling factor can only be 1, 2, 4,
or 8. With the 8-byte struct, the compiler should be able to do the
shift and add as one operation, whereas the 16-byte struct would
require a separate shift and add.

Looking at the generated code, with 0004, I see:

    1c79: 48 c1 e2 04          shl    rdx,0x4
    1c7d: 48 8d 4c 15 20        lea    rcx,[rbp+rdx*1+0x20]

whereas with 0005 I see:

    1c6b: 4a 8d 1c dd 00 00 00 lea    rbx,[r11*8+0x0]

Is that what you meant?
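The loop-start point can be sketched as follows, assuming a hypothetical 8-byte AttrMeta struct and a hypothetical sum_lengths_from() function standing in for a deforming loop (neither is PostgreSQL code). The only index-times-size computation happens when forming attrs + first; each iteration then just advances the pointer by sizeof(AttrMeta), which is the form gcc was reported above to generate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte per-attribute metadata; NOT PostgreSQL's CompactAttribute. */
typedef struct AttrMeta
{
    int32_t offset;
    int16_t len;
    uint8_t align;
    uint8_t flags;
} AttrMeta;

/*
 * Hypothetical stand-in for a deforming loop: visit attributes
 * [first, natts) and sum their lengths.  Forming attrs + first needs one
 * index * sizeof(AttrMeta) computation at loop entry (the cost that
 * matters when only a few columns are deformed); after that, each
 * iteration only advances the pointer by sizeof(AttrMeta).
 */
static int64_t
sum_lengths_from(const AttrMeta *attrs, size_t first, size_t natts)
{
    const AttrMeta *att;
    const AttrMeta *end = attrs + natts;
    int64_t total = 0;

    assert(first <= natts);
    for (att = attrs + first; att < end; att++)
        total += att->len;
    return total;
}
```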

> > I've also resequenced the patches to make the deform_bench test module
> > part of the 0001 patch. This makes it easier to test the performance
> > of master.
>
> What are your thoughts about merging the deform_bench tooling?  I wonder if we
> should have src/test/modules/benchmark_tools or such, so we can add a few more
> micro-benchmarky tools over time?

I'd like to see us give these tools a proper home. It helps lower the
bar for anyone else who'd like to experiment at some future date, and
also allows people to more easily test for performance regressions if
they're forced to change related code. I also have a tool that
benchmarks the MemoryContext code, which I keep in a local repo and
dig out from time to time. Given that, deform_bench probably wouldn't
be the only extension in there if we did make a directory for these.

On the other hand, it does add some maintenance overhead, but IMO,
helping to ensure that key routines stay optimal is a worthy enough
cause to justify it.

David