On Tue, Aug 3, 2021 at 10:43 PM John Naylor
<john.nay...@enterprisedb.com> wrote:
> (Side note, but sort of related to #1 above: non-x86 platforms have to 
> indirect through a function pointer even though they have no fast 
> implementation to make it worth their while. It would be better for them if 
> the "slow" implementation was called static inline or at least a direct 
> function call, but that's a separate thread.)

+1

I haven't looked into whether we could benefit from it in real use
cases, but it seems like it'd also be nice if pg_popcount() were a
candidate for auto-vectorisation and inlining.  For example, NEON has
vector popcount, and for Intel/AMD there is a shuffle-based AVX2 trick
that at least Clang produces automatically[1].  We're obstructing that
by doing function dispatch at the individual word level, and by using
inline assembler instead of builtins.
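
To illustrate the shape I have in mind, here is a minimal sketch, not a
patch.  The name popcount_words() is hypothetical and is not the existing
pg_popcount(); it assumes a GCC- or Clang-compatible compiler that
provides __builtin_popcountll().  The point is only that the whole loop
is visible to the compiler, with no per-word function pointer call and
no inline assembler, so the optimiser is free to inline it and, where
the target and flags allow, vectorise it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * count the set bits in an array of 64-bit words.
 *
 * The loop body is a plain compiler builtin rather than an indirect call or
 * inline assembler, so the compiler can see the whole loop.
 */
static inline uint64_t
popcount_words(const uint64_t *words, size_t nwords)
{
    uint64_t    count = 0;

    for (size_t i = 0; i < nwords; i++)
        count += (uint64_t) __builtin_popcountll(words[i]);

    return count;
}
```

Whether this actually turns into vector code depends on the compiler,
optimisation level and target flags, so it would need checking in the
generated assembly and measuring before drawing any conclusions.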

[1] https://arxiv.org/abs/1611.07612

