On Fri, Dec 5, 2025 at 5:40 PM Nathan Bossart <[email protected]> wrote:
> I don't think the proposed improvements are relevant for either of the
> machines you used for your benchmarks.  For x86, we've optimized our
> popcount code to use SSE4.2 or AVX-512, and for AArch64, we've optimized it
> to use Neon or SVE.  And for other systems, we still try to use
> __builtin_popcount() and friends in the fallback paths, which IIUC are
> available on both gcc and clang (and maybe elsewhere).  IMHO we need to run
> the benchmarks on a compiler/architecture combination where it would
> actually be used in practice.

Yes, I saw that the code is on a rather obscure path, but those machines
were my only options for quick benchmarks. I reasoned that the code path
still exists, so eliminating the branching there would most probably be
beneficial anyway. But you are right, we need to test it on the target
architectures/compilers. I'll try to do that.
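
To make the discussion concrete, here is a minimal sketch of the two kinds
of path being compared. It is not the patch itself: the function names are
hypothetical, and the branch-free variant is assumed to be a plain SWAR bit
count, which may differ from what is actually proposed. The builtin wrapper
only illustrates the "use __builtin_popcount() and friends where available"
idea from the quoted text.

```c
#include <stdint.h>

/*
 * Hypothetical name: portable, branch-free 64-bit popcount (SWAR).
 * Sums bits in 2-bit, 4-bit, then 8-bit groups, and finally adds the
 * eight byte counts together with a single multiply.
 */
static inline int
popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
    return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/*
 * Hypothetical name: prefer the compiler builtin when it exists, and
 * fall back to the SWAR version otherwise.  On gcc and clang the SWAR
 * code is therefore never reached, which is why a benchmark on those
 * compilers says little about it.
 */
static inline int
popcount64_fallback(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_swar(x);
#endif
}
```

A meaningful benchmark would need a build where the #else branch is the one
that gets compiled.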

---
Cheers,
Andy
