Re: [POC] verifying UTF-8 using SIMD instructions

John Naylor Fri, 12 Mar 2021 07:37:28 -0800

On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar <[email protected]>
wrote:
>
> On my Arm64 VM :
>
> HEAD :
>  mixed | ascii
> -------+-------
>   1091 |   628
> (1 row)
>
> PATCHED :
>  mixed | ascii
> -------+-------
>    681 |   119


Thanks for testing! Good, the speedup is about as much as I can hope for
using plain C. In the next patch I'll go ahead and squash in the ascii fast
path, using a 16-byte stride, unless there are objections. I claim we can
live with the regression Heikki found on an old 32-bit Arm platform, since
it doesn't seem to hold for Arm in general.
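
For readers following along, here is a minimal sketch of what a plain-C ascii
fast path with a 16-byte stride could look like. It is not the code from the
patch: the names chunk_is_ascii16() and ascii_prefix_len() are hypothetical,
and the sketch only tests the high bit of each byte (it does not treat zero
bytes specially, which a real validator may need to do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the 16 bytes starting at s are all
 * ASCII, i.e. no byte has its high bit set.
 */
static bool
chunk_is_ascii16(const unsigned char *s)
{
	uint64_t	half1;
	uint64_t	half2;

	/* memcpy sidesteps alignment and strict-aliasing concerns */
	memcpy(&half1, s, sizeof(half1));
	memcpy(&half2, s + sizeof(half1), sizeof(half2));

	/* a byte >= 0x80 in either half leaves a high bit set in the OR */
	return ((half1 | half2) & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical helper: return how many leading bytes of s can be skipped
 * by the fast path, advancing in whole 16-byte strides.  The remainder
 * would be handed to the regular byte-at-a-time validator.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		done = 0;

	assert(s != NULL || len == 0);

	while (len - done >= 16 && chunk_is_ascii16(s + done))
		done += 16;

	return done;
}
```

The two 8-byte loads are ORed together so that each stride needs only one
mask-and-compare. A tail shorter than 16 bytes, or a stride containing any
non-ASCII byte, is left to the slower path.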

> I guess, if at all we use the equivalent Arm NEON intrinsics, the
> "mixed" figures will be close to the "ascii" figures, going by your
> figures on x86.

I would assume so.

> I was not thinking about auto-vectorizing the code in
> pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
> inside the individual helper functions that you wrote, such as
> _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),

If the PhD holders who came up with this algorithm thought it possible to
do it that way, I'm sure they would have. In reality, simdjson has
different files for SSE4, AVX, AVX512, NEON, and Altivec. We can
incorporate any of those as needed. That's a PG15 project, though, and I'm
not volunteering.

--
John Naylor
EDB: http://www.enterprisedb.com
