bufferiszero: Add simd acceleration for aarch64

Richard Henderson Thu, 15 Feb 2024 13:12:35 -0800

On 2/15/24 08:46, Alexander Monakov wrote:

> Right, so we can pick the cheapest reduction method, and if I'm reading
> Neoverse-N1 SOG right, SHRN is marginally cheaper than ADDV (latency 2
> instead of 3), and it should be generally preferable on other cores, no?


Fair.

> For that matter, cannot UQXTN (unsigned saturating extract narrow) be
> used in place of CMEQ+ADDV here?


Interesting.  I hadn't thought about using saturation to preserve non-zeroness 
like that.

Using 1 4-cycle insn instead of 2 2-cycle insns is interesting as well.  I
suppose, since it's at the end of the dependency chain, the fact that it is
restricted to the V1 pipe matters not at all.

r~
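
A minimal sketch of the saturation idea discussed above, not the code from
the patch.  It assumes a 128-bit vector viewed as two 64-bit lanes; the
helper names (sat_narrow_u64, trunc_narrow_u64, any_nonzero_sat,
any_nonzero_trunc) are hypothetical.  On aarch64 it uses the vqmovn_u64
intrinsic, which corresponds to UQXTN; elsewhere it falls back to a scalar
model of the same per-lane behaviour.  The truncating variant is included
only to show why a plain narrow would not be enough.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Scalar model of one UQXTN lane: narrow 64 -> 32 bits, saturating. */
static inline uint32_t sat_narrow_u64(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* Plain truncating narrow, for contrast: drops the high 32 bits. */
static inline uint32_t trunc_narrow_u64(uint64_t x)
{
    return (uint32_t)x;
}

/*
 * Hypothetical helper: true if either 64-bit lane is non-zero.
 * A saturating narrow maps 0 to 0 and any non-zero lane to a
 * non-zero 32-bit value, so the 64-bit result can be tested directly.
 */
static inline bool any_nonzero_sat(uint64_t lo, uint64_t hi)
{
#if defined(__aarch64__)
    uint64_t lanes[2] = { lo, hi };
    uint64x2_t v = vld1q_u64(lanes);
    uint32x2_t n = vqmovn_u64(v);              /* UQXTN Vd.2S, Vn.2D */
    return vget_lane_u64(vreinterpret_u64_u32(n), 0) != 0;
#else
    uint64_t packed = ((uint64_t)sat_narrow_u64(hi) << 32)
                    | sat_narrow_u64(lo);
    return packed != 0;
#endif
}

/* Same reduction with truncation: misses lanes whose low half is zero. */
static inline bool any_nonzero_trunc(uint64_t lo, uint64_t hi)
{
    uint64_t packed = ((uint64_t)trunc_narrow_u64(hi) << 32)
                    | trunc_narrow_u64(lo);
    return packed != 0;
}
```

For example, any_nonzero_sat(1ULL << 32, 0) is true, while
any_nonzero_trunc(1ULL << 32, 0) is false because the only set bit is
discarded by the truncation.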

Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64
