On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar <amitdkhan...@gmail.com> wrote:
>
> On my Arm64 VM :
>
> HEAD :
>  mixed | ascii
> -------+-------
>   1091 |   628
> (1 row)
>
> PATCHED :
>  mixed | ascii
> -------+-------
>    681 |   119
Thanks for testing! Good, the speedup is about as much as I can hope for using plain C. In the next patch I'll go ahead and squash in the ascii fast path, using 16-byte stride, unless there are objections. I claim we can live with the regression Heikki found on an old 32-bit Arm platform since it doesn't seem to be true of Arm in general.

> I guess, if at all we use the equivalent Arm NEON intrinsics, the
> "mixed" figures will be close to the "ascii" figures, going by your
> figures on x86.

I would assume so.

> I was not thinking about auto-vectorizing the code in
> pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
> inside the individual helper functions that you wrote, such as
> _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),

If the PhD holders who came up with this algorithm thought it possible to do it that way, I'm sure they would have. In reality, simdjson has different files for SSE4, AVX, AVX512, NEON, and Altivec. We can incorporate any of those as needed. That's a PG15 project, though, and I'm not volunteering.

--
John Naylor
EDB: http://www.enterprisedb.com
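
For readers following the thread, here is a minimal plain-C sketch of what an ASCII fast path with a 16-byte stride could look like. It is not the code from the patch: the names chunk_is_ascii16() and ascii_prefix_len() are hypothetical, and the real patch may perform additional checks. The idea is only that 16 bytes can be tested at once by OR-ing two 8-byte words and looking at the high bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): true if none of the 16 bytes
 * starting at s has its high bit set, i.e. all of them are 7-bit ASCII.
 */
static bool
chunk_is_ascii16(const unsigned char *s)
{
	uint64_t	lo;
	uint64_t	hi;

	/* memcpy avoids unaligned-access and strict-aliasing problems */
	memcpy(&lo, s, sizeof(lo));
	memcpy(&hi, s + sizeof(lo), sizeof(hi));

	/* any byte >= 0x80 leaves a high bit set in the OR of the two words */
	return ((lo | hi) & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical helper: number of leading bytes confirmed to be ASCII,
 * counted in whole 16-byte chunks. The result is always a multiple of 16.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		n = 0;

	while (len - n >= 16 && chunk_is_ascii16(s + n))
		n += 16;

	return n;
}
```

A caller would presumably hand whatever follows the returned prefix (the first chunk containing a non-ASCII byte, or a tail shorter than 16 bytes) to the regular multibyte validator.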