Hi David, Thanks for looking at it!
> I would like to test if I can reproduce your results. Could you share > your test program? Here you go: https://gist.github.com/dAdAbird/1480ff15764f5a6301174806d8512a3a > You also don't specify an optimization level. That means the default > level -O0 is used. Does it make it a difference if you run e.g. with -O3? Yes, good point. I got carried away with experiments and totally forgot that.. I've re-run tests with -O3 and it seems like the Summing of Adjacent Bits might make sense for uint64: X86 ``` % g++ pop.cc -std=c++11 -isystem benchmark/include -L../benchmark/build/src -lbenchmark -lpthread -O3 -o mybenchmark && ./mybenchmark 2025-12-05T16:22:43+02:00 Running ./mybenchmark Run on (12 X 3200 MHz CPU s) CPU Caches: L1 Data 32 KiB L1 Instruction 32 KiB L2 Unified 256 KiB (x6) L3 Unified 12288 KiB Load Average: 2.67, 4.72, 2.90 -------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------- Popcnt64_BranchlessLookup 5.92 ns 5.90 ns 113871130 Popcnt64_AdjacentBits 5.30 ns 5.27 ns 115084258 Popcnt64_PG 8.34 ns 8.32 ns 80622869 Popcnt32_BranchlessLookup 3.62 ns 3.61 ns 197816110 Popcnt32_AdjacentBits 4.56 ns 4.55 ns 154932383 Popcnt32_PG 5.32 ns 5.31 ns 121845083 ``` M1 ``` $ g++ popcnt.cc -std=c++11 -isystem benchmark/include -L../benchmark/build/src -lbenchmark -lpthread -O3 -o mybenchmark && ./mybenchmark 2025-12-05T14:29:18+00:00 Running ./mybenchmark Run on (10 X 48 MHz CPU s) CPU Caches: L1 Data 128 KiB (x10) L1 Instruction 192 KiB (x10) L2 Unified 12288 KiB (x10) Load Average: 0.01, 0.01, 0.00 -------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------- Popcnt64_BranchlessLookup 2.13 ns 2.13 ns 342101937 Popcnt64_AdjacentBits 1.89 ns 1.89 ns 370316571 Popcnt64_PG 3.83 ns 3.83 ns 190990088 Popcnt32_BranchlessLookup 1.19 ns 1.19 ns 591797267 Popcnt32_AdjacentBits 1.73 ns 1.73 ns 409381266 Popcnt32_PG 2.01 ns 2.01 ns 344969625 ``` --- Cheers, Andy
