On Thu, Jan 22, 2026 at 3:23 AM Nathan Bossart <[email protected]> wrote:
>
> Committed, thanks for reviewing.

Sure. Now that that's in place, I wanted to brainstorm more
refactoring/rationalization ideas that seem on-topic for the thread
but have less clear payoff:

1) Nowadays, the only global call sites of the word-sized functions
are in select_best_grantor() and in the bitmapset code. The latter calls the
word-sized functions in a loop (could be just one word). It may be
more efficient to calculate the size in bytes and call pg_popcount().
Then we could get rid of all the pointer indirection for the
word-sized functions.

2) The x86 byte buffer variants expend a lot of effort to detect
whether the buffer is aligned on both 64- and 32-bit platforms, with
an optimized path for each. At least 64-bit doesn't care about
alignment, and 32-bit doesn't warrant anything fancier than pure C.
Meanwhile, the aarch64 equivalent doesn't seem to take any care over
alignment. (I think Nathan mentioned he didn't see a difference during
testing, but I wonder how universal that is.)

3) There is repeated code for the <8 bytes case and for the tail of
the "optimized" functions. I'm also not sure why the small case is
inlined everywhere.
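
To make idea 1) concrete, here is a rough standalone sketch, not actual
PostgreSQL code. The names bitmapword, popcount_word() and
popcount_bytes() are hypothetical stand-ins for the bitmapset word type,
the word-sized popcount, and a pg_popcount()-style byte-buffer routine;
the 64-bit word size is also an assumption. It only shows the shape of
the change: one call per word versus computing the size in bytes once
and making a single buffer call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bitmapset word type (assumed 64-bit). */
typedef uint64_t bitmapword;

/* Stand-in for the word-sized popcount: portable bit-clearing loop. */
static int
popcount_word(bitmapword w)
{
    int         n = 0;

    while (w != 0)
    {
        w &= w - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Stand-in for a pg_popcount()-style routine over a byte buffer. */
static uint64_t
popcount_bytes(const char *buf, size_t bytes)
{
    uint64_t    n = 0;

    for (size_t i = 0; i < bytes; i++)
        n += (uint64_t) popcount_word((unsigned char) buf[i]);
    return n;
}

/* Per-word shape: one word-sized call for each bitmap word. */
static uint64_t
count_members_per_word(const bitmapword *words, int nwords)
{
    uint64_t    result = 0;

    for (int i = 0; i < nwords; i++)
        result += (uint64_t) popcount_word(words[i]);
    return result;
}

/* Byte-buffer shape: compute the length in bytes, then make one call. */
static uint64_t
count_members_bytes(const bitmapword *words, int nwords)
{
    return popcount_bytes((const char *) words,
                          (size_t) nwords * sizeof(bitmapword));
}
```

Both functions return the same count for any input, since popcount does
not depend on byte order. Whether the single call is actually faster for
the common one-word bitmapset would need measuring.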

--
John Naylor
Amazon Web Services

