On Wed, Jul 21, 2021 at 11:29 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> Just for fun/experimentation, here's a quick (and probably too naive)
> translation of those helper functions to NEON, on top of the v15
> patch.

Neat! It's good to make it more architecture-agnostic, and I'm sure we
can use quite a bit of this. I don't know enough about NEON to comment
intelligently, but a quick glance through the simdjson source shows a
couple of differences that might be worth a look:

 to_bool(const pg_u8x16_t v)
 {
+#if defined(USE_NEON)
+    return vmaxvq_u32((uint32x4_t) v) != 0;

--> return vmaxvq_u8(*this) != 0;

 vzero()
 {
+#if defined(USE_NEON)
+    return vmovq_n_u8(0);

--> return vdupq_n_u8(0); // or equivalently, splat(0)

 is_highbit_set(const pg_u8x16_t v)
 {
+#if defined(USE_NEON)
+    return to_bool(bitwise_and(v, vmovq_n_u8(0x80)));

--> return vmaxvq_u8(v) > 0x7F

(Technically, their convention is:
is_ascii(v) { return vmaxvq_u8(v) < 0x80; }, but same effect)

+#if defined(USE_NEON)
+static pg_attribute_always_inline pg_u8x16_t
+vset(uint8 v0, uint8 v1, uint8 v2, uint8 v3,
+     uint8 v4, uint8 v5, uint8 v6, uint8 v7,
+     uint8 v8, uint8 v9, uint8 v10, uint8 v11,
+     uint8 v12, uint8 v13, uint8 v14, uint8 v15)
+{
+    uint8 pg_attribute_aligned(16) values[16] = {
+        v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15
+    };
+    return vld1q_u8(values);
+}

--> They have this strange beast instead:

    // Doing a load like so end ups generating worse code.
    // uint8_t array[16] = {x1, x2, x3, x4, x5, x6, x7, x8,
    //                      x9, x10,x11,x12,x13,x14,x15,x16};
    // return vld1q_u8(array);
    uint8x16_t x{};
    // incredibly, Visual Studio does not allow x[0] = x1
    x = vsetq_lane_u8(x1, x, 0);
    x = vsetq_lane_u8(x2, x, 1);
    x = vsetq_lane_u8(x3, x, 2);
...
    x = vsetq_lane_u8(x15, x, 14);
    x = vsetq_lane_u8(x16, x, 15);
    return x;

Since you aligned the array, that might not have the problem alluded to
above, and it looks nicer.

--
John Naylor
EDB: http://www.enterprisedb.com
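
For illustration only, here is a minimal standalone sketch of the two ways of asking "does this 16-byte chunk contain a byte with the high bit set" discussed above: a byte-wise horizontal maximum compared against 0x7F, and the mask-with-0x80-then-test-nonzero form. It is not code from the patch or from simdjson. The names sketch_has_highbit, sketch_has_highbit_masked, sketch_self_check and the SKETCH_USE_NEON macro are hypothetical, and the scalar fallback is an assumption added so the file also compiles on non-AArch64 machines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1	/* hypothetical macro, stands in for USE_NEON */
#endif

/* Hypothetical helper: does any of the 16 bytes at p have its high bit set? */
static bool
sketch_has_highbit(const uint8_t *p)
{
#ifdef SKETCH_USE_NEON
	/* vmaxvq_u8 reduces the vector to its largest byte (AArch64 only) */
	return vmaxvq_u8(vld1q_u8(p)) > 0x7F;
#else
	/* scalar fallback: same reduction, one byte at a time */
	uint8_t		max = 0;

	for (int i = 0; i < 16; i++)
		if (p[i] > max)
			max = p[i];
	return max > 0x7F;
#endif
}

/* Same question in the mask form: AND each byte with 0x80, test for nonzero. */
static bool
sketch_has_highbit_masked(const uint8_t *p)
{
#ifdef SKETCH_USE_NEON
	uint8x16_t	masked = vandq_u8(vld1q_u8(p), vdupq_n_u8(0x80));

	return vmaxvq_u8(masked) != 0;
#else
	uint8_t		acc = 0;

	for (int i = 0; i < 16; i++)
		acc |= (uint8_t) (p[i] & 0x80);
	return acc != 0;
#endif
}

/* Check that both forms agree on an ASCII chunk and on a non-ASCII one. */
static void
sketch_self_check(void)
{
	uint8_t		buf[16];

	memset(buf, 'a', sizeof(buf));
	assert(!sketch_has_highbit(buf));
	assert(!sketch_has_highbit_masked(buf));

	buf[9] = 0xC3;		/* lead byte of a two-byte UTF-8 sequence */
	assert(sketch_has_highbit(buf));
	assert(sketch_has_highbit_masked(buf));
}
```

Both forms should return the same answer for every input; which one generates better code on a given compiler would need to be measured rather than assumed.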