Hi,
On 2026-01-20 13:11:55 +1300, David Rowley wrote:
> I've attached the v4 patch, which also fixes the LLVM compiler warning
> that I introduced.
I wonder if it's possible to split the patch - it's big enough to be
nontrivial to review... Perhaps the finalization could be introduced
separately from the patch actually making use of it?
I wonder if we should somehow change the API of tupledesc creation, to make
old code that doesn't have a TupleDescFinalize() call fail to compile, instead
of just warning...
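For instance (purely a sketch, all names and the overall shape invented here),
creation could hand back a distinct builder type that only TupleDescFinalize()
turns into a usable TupleDesc, so a call site that never finalizes gets a hard
type error:

/* hypothetical API shape, not what the v4 patch does */
typedef struct TupleDescBuilder
{
	TupleDesc	desc;		/* not directly usable as a TupleDesc */
} TupleDescBuilder;

extern TupleDescBuilder CreateTemplateTupleDesc(int natts);
extern void TupleDescInitEntry(TupleDescBuilder *builder,
							   AttrNumber attributeNumber,
							   const char *attributeName,
							   Oid oidtypeid, int32 typmod, int attdim);
extern TupleDesc TupleDescFinalize(TupleDescBuilder *builder);

/*
 * Old code assigning the CreateTemplateTupleDesc() result straight to a
 * TupleDesc variable then no longer compiles, forcing a TupleDescFinalize()
 * call to be added at each site.
 */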
> diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
> index dcba3fb5473..2fdf5a341f6 100644
> --- a/contrib/pg_buffercache/pg_buffercache_pages.c
> +++ b/contrib/pg_buffercache/pg_buffercache_pages.c
> @@ -174,6 +174,7 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
> TupleDescInitEntry(tupledesc, (AttrNumber) 9, "pinning_backends",
> INT4OID, -1, 0);
>
> + TupleDescFinalize(tupledesc);
> fctx->tupdesc = BlessTupleDesc(tupledesc);
>
I think it'd be worth adding an assertion to BlessTupleDesc() that
TupleDescFinalize() has been called; that'll lead to easier-to-understand
backtraces in a lot of cases. Particularly if you consider cases where
BlessTupleDesc() ends up putting the tupdesc into shared memory, where a
forgotten finalization could then trigger an assertion failure in a parallel
worker or such.
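I.e. roughly this (sketch; the field name is just taken from the hunk quoted
further down, so adjust to whatever the patch actually ends up exposing):

TupleDesc
BlessTupleDesc(TupleDesc tupdesc)
{
	/*
	 * Catch a forgotten TupleDescFinalize() here, before the descriptor
	 * gets registered and potentially used by a parallel worker.
	 */
	Assert(tupdesc->firstNonCachedOffAttr >= 0);

	if (tupdesc->tdtypeid == RECORDOID &&
		tupdesc->tdtypmod < 0)
		assign_record_type_typmod(tupdesc);

	return tupdesc;				/* just for notational convenience */
}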
> /*
> * slot_deform_heap_tuple
> * Given a TupleTableSlot, extract data from the slot's physical tuple
> @@ -1122,78 +1010,167 @@ static pg_attribute_always_inline void
> slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
> int natts)
> {
> + CompactAttribute *cattr;
> + TupleDesc tupleDesc = slot->tts_tupleDescriptor;
> bool hasnulls = HeapTupleHasNulls(tuple);
> + HeapTupleHeader tup = tuple->t_data;
> + bits8 *bp; /* ptr to null bitmap in tuple */
> int attnum;
> + int firstNonCacheOffsetAttr;
> +
> +#ifdef OPTIMIZE_BYVAL
> + int firstByRefAttr;
> +#endif
> + int firstNullAttr;
> + Datum *values;
> + bool *isnull;
> + char *tp; /* ptr to tuple data */
> uint32 off; /* offset in tuple data */
> - bool slow; /* can we use/set attcacheoff? */
> +
> + /* Did someone forget to call TupleDescFinalize()? */
> + Assert(tupleDesc->firstNonCachedOffAttr >= 0);
>
> /* We can only fetch as many attributes as the tuple has. */
> - natts = Min(HeapTupleHeaderGetNatts(tuple->t_data), natts);
> + natts = Min(HeapTupleHeaderGetNatts(tup), natts);
> + attnum = slot->tts_nvalid;
> + firstNonCacheOffsetAttr = Min(tupleDesc->firstNonCachedOffAttr, natts);
> +
> + if (hasnulls)
> + {
> + bp = tup->t_bits;
> + firstNullAttr = first_null_attr(bp, natts);
> + firstNonCacheOffsetAttr = Min(firstNonCacheOffsetAttr, firstNullAttr);
> + }
> + else
> + {
> + bp = NULL;
> + firstNullAttr = natts;
> + }
> +
> +#ifdef OPTIMIZE_BYVAL
> + firstByRefAttr = Min(firstNonCacheOffsetAttr, tupleDesc->firstByRefAttr);
> +#endif
> + values = slot->tts_values;
> + isnull = slot->tts_isnull;
> + tp = (char *) tup + tup->t_hoff;
> +
> +#ifdef OPTIMIZE_BYVAL
>
> /*
> - * Check whether the first call for this tuple, and initialize or restore
> - * loop state.
> + * Many tuples have leading byval attributes, try and process as many of
> + * those as possible with a special loop that can't handle byref types.
> */
> - attnum = slot->tts_nvalid;
> - if (attnum == 0)
> + if (attnum < firstByRefAttr)
> + {
> + /* Use do/while as we already know we need to loop at least once. */
> + do
> + {
> + cattr = TupleDescCompactAttr(tupleDesc, attnum);
> +
> + Assert(cattr->attcacheoff >= 0);
> +
> + /*
> + * Hard code byval == true to allow the compiler to remove the
> + * byval check when inlining fetch_att().
> + */
Maybe add an assert for cattr->attbyval? Just to avoid a bad debugging
experience if somebody tries to extend this logic to
e.g. non-null-fixed-width-byref columns?
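I.e. something like this, right next to the existing attcacheoff assertion in
the byval-only loop (sketch):

			cattr = TupleDescCompactAttr(tupleDesc, attnum);

			Assert(cattr->attcacheoff >= 0);
			/* this loop hard-codes byval == true, so make that explicit */
			Assert(cattr->attbyval);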
I also wonder if we could have assert-only crosschecking of the "real" offsets
against the cached ones?
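Very roughly, and assuming the cached-offset loops were made to also maintain
"off" under assertions (whether that's cheap enough to be worth it even in
assert builds is a separate question):

#ifdef USE_ASSERT_CHECKING
			/*
			 * Crosscheck: the incrementally computed offset, aligned the way
			 * the slow path would align it, must match the cached offset.
			 * Only valid while no earlier attribute was NULL or
			 * variable-width.
			 */
			off = TYPEALIGN(cattr->attalignby, off);
			Assert((int32) off == cattr->attcacheoff);
			off += cattr->attlen;
#endif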
Greetings,
Andres Freund