On Thu, Jan 22, 2026 at 3:23 AM Nathan Bossart <[email protected]> wrote:
>
> Committed, thanks for reviewing.

Sure. Now that that's in place, I wanted to brainstorm more
refactoring/rationalization ideas that seem on-topic for the thread
but have less clear payoff:

1) Nowadays, the only global call sites of the word-sized functions
are in select_best_grantor() and in bitmapsets. The latter calls the
word-sized functions in a loop (which could be over just one word). It
may be more efficient to calculate the size in bytes and call
pg_popcount(). Then we could get rid of all the pointer indirection
for the word-sized functions.

2) The x86 byte buffer variants expend a lot of effort to detect
whether the buffer is aligned on both 64- and 32-bit platforms, with
an optimized path for each. At least 64-bit doesn't care about
alignment, and 32-bit doesn't warrant anything fancier than pure C.
Meanwhile, the aarch64 equivalent doesn't seem to take any care over
alignment. (I think Nathan mentioned he didn't see a difference during
testing, but I wonder how universal that is.)

3) There is repeated code for the <8 bytes case, and for the tail of
the "optimized" functions. I'm also not sure why the small case is
inlined everywhere.

--
John Naylor
Amazon Web Services
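
To make idea 1) a bit more concrete, here is a rough, untested-in-tree
sketch of the two shapes being compared: counting members one word at a
time versus computing the size in bytes and making a single byte-buffer
call. Everything named sketch_* or Sketch* is a hypothetical stand-in
invented for illustration, not PostgreSQL code; in particular
sketch_popcount_bytes() only plays the role that pg_popcount() would
play, and the struct is a simplified assumption about how a bitmapset
stores its words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bitmapset: a word count plus 64-bit words. */
typedef struct SketchBitmapset
{
	int			nwords;		/* number of valid entries in words[] */
	uint64_t	words[4];	/* fixed size only to keep the sketch simple */
} SketchBitmapset;

/* Stand-in for a word-sized popcount function. */
static uint64_t
sketch_popcount_word(uint64_t w)
{
	uint64_t	count = 0;

	/* Clear the lowest set bit until none remain. */
	while (w != 0)
	{
		w &= w - 1;
		count++;
	}
	return count;
}

/* Stand-in for a byte-buffer popcount such as pg_popcount(). */
static uint64_t
sketch_popcount_bytes(const char *buf, size_t bytes)
{
	uint64_t	count = 0;

	while (bytes-- > 0)
	{
		unsigned char b = (unsigned char) *buf++;

		while (b != 0)
		{
			b &= (unsigned char) (b - 1);
			count++;
		}
	}
	return count;
}

/* Current shape: one word-sized call per word, in a loop. */
static uint64_t
sketch_num_members_per_word(const SketchBitmapset *a)
{
	uint64_t	result = 0;

	for (int i = 0; i < a->nwords; i++)
		result += sketch_popcount_word(a->words[i]);
	return result;
}

/* Proposed shape: compute the size in bytes and make a single call. */
static uint64_t
sketch_num_members_bytes(const SketchBitmapset *a)
{
	return sketch_popcount_bytes((const char *) a->words,
								 (size_t) a->nwords * sizeof(uint64_t));
}
```

Both functions should return the same count for any input; whether the
single-call form is actually faster for the common one-word case is the
open question, and would need measuring.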
