Looks good, the code is more readable now.

> For both Neon and SVE, I do see improvements with looping over 4
> registers at a time, so IMHO it's worth doing so even if it performs the
> same as 2-register blocks on some hardware.

There was no regression on Graviton 3 when using the 4-register version, so we
can keep it.

-Chiranmoy
