> On Jan 19, 2026, at 06:13, David Rowley <[email protected]> wrote:
> 
> On Fri, 2 Jan 2026 at 18:58, David Rowley <[email protected]> wrote:
>> Please find attached an updated set of patches. A rebase was needed,
>> plus 0003 had a problem with an Assert not handling the bitmap being a
>> NULL pointer.
> 
> Another rebase and updates to some newly created missing calls to
> TupleDescFinalize().
> 
> I've also attached another round of benchmarks after dipping into some
> Azure machines to cover my lack of any Intel benchmark results. I
> think these are somewhat noisy as I opted for low core-count instances
> which will have L3 shared with workloads running for other people.
> This is most evident in Xeon_E5-2673 with gcc where the patched run
> was nearly twice as fast as unpatched for test 2 on 20 extra columns.
> If you look at the raw results from that, you can see the times are
> quite unstable between the 3 runs of each test, which makes me believe
> that the machine was busy with other work when that test ran on
> master. The AMD3990x and M2 machines are all sitting next to me and
> were otherwise idle, so they should be much more stable.
> 
> Quite a few machines have a small regression for the 0 extra column
> tests. There is a small amount of extra work being done in the
> deforming function to check if the attnum < the first attribute
> without an attcacheoff. This mostly only affects the tests that don't
> do any deforming with a cached attcacheoff, e.g due to NULLs or
> varlena types. The only way I've thought about to possibly reduce that
> is to invent a new TupleTableSlotOps and pick the one that applies
> when creating the TupleTableSlot. This doesn't appeal to me very much
> as it requires modifying many callsites. But I do wonder if we should
> try to come up with something here as technically we could use this to
> eliminate alignment padding out of some MinimalTuples in some cases
> where these were not directly derived from pre-formed HeapTuples. That
> could allow a more compact tuple representation for sorting and
> hashing, allowing us to do more with less memory in some cases.
> 
> The benchmark results also indicated that there wasn't much advantage
> to the 0002+0003 patches, so I've removed those from the set. That
> reduces some complexity around the benchmarks. I did still keep the
> OPTIMIZE_BYVAL loop as separate results. It's not quite clear what's
> best there as machines seem to vary on which they prefer.
> 
> Benchmark results attached in the bz2 file both in spreadsheet form
> and the raw results pg_dumped.
> 
> David
> <v3-0001-Precalculate-CompactAttribute-s-attcacheoff.patch><deform_results2.tar.bz2>

Hi David,

I reviewed the patch and traced some basic workflows. But I haven’t done a load 
test to compare performance differences with and without this patch, I will do 
that if I get some bandwidth later. Here comes some review comments:

1 - tupmacs.h
```
+       /* Create a mask with all bits beyond natts's bit set to off */
+       mask = 0xFF & ((((uint8) 1) << (natts & 7)) - 1);
+       byte = (~bits[lastByte]) & mask;
```

When I read the code, I got an impression bits[lastByte] might overflow when 
natts % 8 == 0, so I traced the code, then I realized that, this function is 
only called when a row has null values, so that, when reaching here, natts % 8 
!= 0, otherwise it should return earlier within the for loop.

So, to avoid future reader’s same confusion, can we add a brief comment to 
explain that no overflow should happen here.

2 - After this patch, nocachegetattr() and nocache_index_getattr() strictly 
rely on tupleDesc->firstNonCachedOffAttr to work:
```
        if (tupleDesc->firstNonCachedOffAttr >= 0)
        {
                startAttr = Min(tupleDesc->firstNonCachedOffAttr - 1, 
firstnullattr);
                off = TupleDescCompactAttr(tupleDesc, startAttr)->attcacheoff;
        }
        else
        {
                startAttr = 0;
                off = 0;
        }
```

And tupleDesc->firstNonCachedOffAttr is only set by TupleDescFinalize(). So, 
assuming some code misses to call TupleDescFinalize(), looking at how TupleDesc 
is created, for example CreateTemplateTupleDesc():
```
        desc = (TupleDesc) palloc(offsetof(struct TupleDescData, compact_attrs) 
+
                                                          natts * 
sizeof(CompactAttribute) +
                                                          natts * 
sizeof(FormData_pg_attribute));

        /*
         * Initialize other fields of the tupdesc.
         */
        desc->natts = natts;
        desc->constr = NULL;
        desc->tdtypeid = RECORDOID;
        desc->tdtypmod = -1;
        desc->tdrefcount = -1;          /* assume not reference-counted */

        return desc;
```

It’s palloc and not palloc0, so desc->firstNonCachedOffAttr will initially hold 
a random value. As long as TupleDescFinalize() is missed, then that’s a bug.

From this perspective, I think we can set firstNonCachedOffAttr to -2 when in 
CreateTemplateTupleDesc() as well as other functions that create a TupleDesc. 
Then in nocachegetattr() and nocache_index_getattr(), we can 
Assert(desc->firstNonCachedOffAttr > -2).

3
```
+               firstNonCachedOffAttr = i + 1;
```

In TupleDescFinalize(), given firstNonCachedOffAttr = i + 1, 
firstNonCachedOffAttr will never be 0.

But in nocachegetattr(), it checks firstNonCachedOffAttr >= 0:
```
        if (tupleDesc->firstNonCachedOffAttr >= 0)
        {
                startAttr = Min(tupleDesc->firstNonCachedOffAttr - 1, 
firstnullattr);
                off = TupleDescCompactAttr(tupleDesc, startAttr)->attcacheoff;
        }
```

This is kinda inconsistent, and may potentially lead to some confusion to code 
readers.

From the meaning of the variable name “firstNonCachedOffAttr”, when there is no 
cached attribute, firstNonCachedOffAttr feels better to be 0 rather than-1. 
From this perspective, TupleDescFinalize() can initialize 
desc->firstNonCachedOffAttr to 0. And for my comment 2, we can use -1 instead 
of -2, so that -1 indicates TupleDescFinalize() is not called, 0 means no 
cached attribute, >0 means some cached attributes. 

4
```
+                * possibily could cache the first attlen == -2 attr.  
Worthwhile?
```

Typo: possibily -> possibly

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Reply via email to