On Tue, 19 Mar 2024 at 06:30, Nathan Bossart <nathandboss...@gmail.com> wrote: > Here is a more fleshed-out version of what I believe David is proposing. > On my machine, the gains aren't quite as impressive (~8.8s to ~5.2s for the > test_popcount benchmark). I assume this is because this patch turns > pg_popcount() into a function pointer, which is what the AVX512 patches do, > too. I left out the 32-bit section from pg_popcount_fast(), but I'll admit > that I'm not yet 100% sure that we can assume we're on a 64-bit system > there.
I looked at your latest patch and tried out the performance on a Zen4 running windows and a Zen2 running on Linux. As follows: AMD 3990x: master: postgres=# select drive_popcount(10000000, 1024); Time: 11904.078 ms (00:11.904) Time: 11907.176 ms (00:11.907) Time: 11927.983 ms (00:11.928) patched: postgres=# select drive_popcount(10000000, 1024); Time: 3641.271 ms (00:03.641) Time: 3610.934 ms (00:03.611) Time: 3663.423 ms (00:03.663) AMD 7945HX Windows master: postgres=# select drive_popcount(10000000, 1024); Time: 9832.845 ms (00:09.833) Time: 9844.460 ms (00:09.844) Time: 9858.608 ms (00:09.859) patched: postgres=# select drive_popcount(10000000, 1024); Time: 3427.942 ms (00:03.428) Time: 3364.262 ms (00:03.364) Time: 3413.407 ms (00:03.413) The only thing I'd question in the patch is in pg_popcount_fast(). It looks like you've opted to not do the 32-bit processing on 32-bit machines. I think that's likely still worth coding in a similar way to how pg_popcount_slow() works. i.e. use "#if SIZEOF_VOID_P >= 8". Probably one day we'll remove that code, but it seems strange to have pg_popcount_slow() do it and not pg_popcount_fast(). > IMHO this work is arguably a prerequisite for the AVX512 work, as turning > pg_popcount() into a function pointer will likely regress performance for > folks on systems without AVX512 otherwise. I think so too. David