On Sun, Aug 28, 2022 at 10:39:09AM +1200, Thomas Munro wrote:
> On Sun, Aug 28, 2022 at 10:12 AM Nathan Bossart
> <nathandboss...@gmail.com> wrote:
>> Yup. The problem is that AFAICT there's no equivalent to
>> _mm_movemask_epi8() on aarch64, so you end up with something like
>>
>> vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0
>>
>> But for pg_lfind32(), we really just want to know if any lane is set, which
>> only requires a call to vmaxvq_u32(). I haven't had a chance to look too
>> closely, but my guess is that this ultimately results in an extra AND
>> operation in the aarch64 path, so maybe it doesn't impact performance too
>> much. The other option would be to open-code the intrinsic function calls
>> into pg_lfind.h. I'm trying to avoid the latter, but maybe it's the right
>> thing to do for now... What do you think?
>
> Ahh, this gives me a flashback to John's UTF-8 validation thread[1]
> (the beginner NEON hackery in there was just a learning exercise,
> sadly not followed up with real patches...). He had
> _mm_movemask_epi8(v) != 0 which I first translated to
> to_bool(bitwise_and(v, vmovq_n_u8(0x80))) and he pointed out that
> vmaxvq_u8(v) > 0x7F has the right effect without the and.
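A minimal sketch of why the trick quoted above works, assuming plain 16-byte arrays as stand-ins for 128-bit vectors so the equivalence can be compiled and checked on any platform. All function names here (model_movemask_nonzero, any_high_bit_set, neon_any_high_bit_set, neon_any_lane_nonzero_u32, etc.) are hypothetical illustrations, not PostgreSQL's simd.h or pg_lfind.h API. Only vld1q_u8, vmaxvq_u8, vld1q_u32 and vmaxvq_u32 are real ACLE intrinsics, and they are compiled only on aarch64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/*
 * Scalar models (hypothetical helpers) of three ways to ask "does any of
 * the 16 byte lanes have its high bit (0x80) set?".
 */

/* Like _mm_movemask_epi8(v) != 0: gather each lane's high bit into a mask. */
static bool
model_movemask_nonzero(const uint8_t lanes[16])
{
	uint32_t	mask = 0;

	for (int i = 0; i < 16; i++)
		mask |= (uint32_t) (lanes[i] >> 7) << i;
	return mask != 0;
}

/* Like vmaxvq_u8(vandq_u8(v, broadcast(0x80))) != 0: mask, then reduce. */
static bool
model_max_of_and_nonzero(const uint8_t lanes[16])
{
	uint8_t		max = 0;

	for (int i = 0; i < 16; i++)
	{
		uint8_t		masked = lanes[i] & 0x80;

		if (masked > max)
			max = masked;
	}
	return max != 0;
}

/*
 * Like vmaxvq_u8(v) > 0x7F: an unsigned byte is >= 0x80 exactly when its
 * high bit is set, so the horizontal max alone answers the question.
 */
static bool
model_max_gt_7f(const uint8_t lanes[16])
{
	uint8_t		max = 0;

	for (int i = 0; i < 16; i++)
		if (lanes[i] > max)
			max = lanes[i];
	return max > 0x7F;
}

/* Assert that all three formulations agree, then return the shared answer. */
static bool
any_high_bit_set(const uint8_t lanes[16])
{
	bool		r = model_movemask_nonzero(lanes);

	assert(model_max_of_and_nonzero(lanes) == r);
	assert(model_max_gt_7f(lanes) == r);
	return r;
}

#if defined(__aarch64__)
/* NEON version of the high-bit test: one horizontal max, no AND. */
static inline bool
neon_any_high_bit_set(const uint8_t *p)
{
	return vmaxvq_u8(vld1q_u8(p)) > 0x7F;
}

/* pg_lfind32()-style "is any 32-bit lane nonzero?" check. */
static inline bool
neon_any_lane_nonzero_u32(const uint32_t *p)
{
	return vmaxvq_u32(vld1q_u32(p)) != 0;
}
#endif
```

The same reasoning covers the 32-bit case Nathan mentions: if any lane is nonzero the maximum is nonzero, so vmaxvq_u32() followed by a comparison with zero is enough and no mask is needed.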

I knew there had to be an easier way! I'll give this a try. Thanks.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com