Re: tuple radix sort

John Naylor Thu, 12 Feb 2026 18:32:57 -0800

On Wed, Feb 11, 2026 at 5:25 PM zengman <[email protected]> wrote:
> I'm wondering if we should replace `state->memtuples` with `data` in the 
> `sort_byvalue_datum()` function here.
> ```
>         if (nulls_first)
>         {
>                 null_start = state->memtuples;

Sounds good.

On Wed, Feb 11, 2026 at 6:47 PM cca5507 <[email protected]> wrote:
> +1. Now data == state->memtuples, how about remove the parameter "data" and 
> "n" and just
> get them from "Tuplesortstate"?

The parameter list currently looks like a sort function. I'm not sure
why you think that's objectionable?

> Some comments for v7:
>
> 1)
>
> ```
> /*
>  * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
>  */
> static inline uint8
> current_byte(Datum key, int level)
> {
>         int                     shift = (SIZEOF_DATUM - 1 - level) * 
> BITS_PER_BYTE;
>
>         return (key >> shift) & 0xFF;
> }
> ```
>
> Maybe "0 for MSB, 7 for LSB"? If level == 0, this function will return the 
> Most Significant Byte.

Will fix, thanks.

> 2) radix_sort_tuple()
>
> ```
>                 size_t          end_offset = partitions[*rp].next_offset;
>                 SortTuple  *partition_end = begin + end_offset;
>                 ptrdiff_t       num_elements = end_offset - start_offset;
> ```
>
> Why the type of "num_elements" is "ptrdiff_t"? Maybe just "size_t"?

That's from upstream. It's pretty rare in our codebase for in-core
code, so I'll go ahead and change it.

> 3) tuplesort_sort_memtuples()
>
> ```
>                 /*
>                  * Do we have the leading column's value or abbreviation in 
> datum1?
>                  */
>                 if (state->base.haveDatum1 && state->base.sortKeys)
>                 {
>                         SortSupportData ssup = state->base.sortKeys[0];
> ```
>
> I think we should avoid the copy of SortSupportData. We can just use a 
> pointer.

Will do, for consistency.

> 4)
>
> Many places just consider "Datum" as integer, do we need to add a 
> DatumGetUInt**() for them?

It already does that for the case where it specifically needs uint32.
For the general case, I don't think so. We can treat a Datum as an
unsigned integer and don't care about the actual size or what it
contains. There is one exception that I can see: 0002 has a call to
pg_leftmost_one_pos64(), so a macro for that would be more appropriate
than an implicit cast.

Now I'm wondering if 0002 should normalize the datum as it determines
the common prefix. That should only matter for int32 where the input
has a mix of positive and negative, but it could be worth the
additional calculation. I'll hold off on committing 0002 for a bit
longer, but 0001 (plus the changes that I've agreed to) is still on
schedule.

Also, looking at commit 6aebedc38, it's preferred to use
"sizeof(Datum)" rather than the vestigial SIZEOF_DATUM, so I'll change
that in 0001/2.

--
John Naylor
Amazon Web Services

Re: tuple radix sort

Reply via email to