Re: [PATCH] Hex-coding optimizations using SVE on ARM.

John Naylor Sat, 18 Oct 2025 09:48:12 -0700

On Fri, Sep 26, 2025 at 1:50 AM Nathan Bossart <[email protected]> wrote:
>
> On Thu, Sep 25, 2025 at 09:16:35PM +0700, John Naylor wrote:
> > (Separately, now I'm wondering if we can do the same for
> > vector8_has_le since _mm_min_epu8 and vminvq_u8 both exist, and that
> > would allow getting rid of )
>
> I think so.  I doubt there's any performance advantage, but it could be
> nice for code cleanup.  (I'm assuming you meant to say vector8_ssub
> (renamed to vector8_ussub() in the patch) after "getting rid of.")


Yes right, sorry. And it seems good to do such cleanup first, since it
doesn't make sense to rename something that is about to be deleted.

> I think I disagree on this one.  We're not broadcasting arbitrary bytes for
> every vector element, we're broadcasting a patten of bytes that happens to
> be wider than the element size.  I would expect this to be a relatively
> common use-case.

That's probably true. I'm still worried that the hack for working
around compiler pickiness (while nice enough in it's current form)
might break at some point and require awareness of compiler versions.

Hmm, for this case, we can sidestep the maintainability questions
entirely by instead using the new interleave functions to build the
masks:

vector8_interleave_low(vector8_zero(), vector8_broadcast(0x0f))
vector8_interleave_low(vector8_broadcast(0x0f), vector8_zero())

This generates identical code as v12 on Arm and is not bad on x86.
What do you think of the attached?

While looking around again, it looks like the "msk" variable isn't a
mask like the implies to me. Not sure of a better name because I'm not
sure what it represents aside from a temp variable.

+#elif defined(USE_NEON)
+ switch (i)
+ {
+ case 4:
+ return (Vector8) vshrq_n_u32((Vector32) v1, 4);
+ case 8:
+ return (Vector8) vshrq_n_u32((Vector32) v1, 8);
+ default:
+ pg_unreachable();
+ return vector8_broadcast(0);
+ }

This is just a compiler hint, so if the input is not handled I think
it will return the wrong answer rather than alerting the developer, so
we probabaly want "Assert(false)" here.

Other than that, the pack/unpack functions could use some
documentation about which parameter is low/high.

-- 
John Naylor
Amazon Web Services

diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 87126a003b3..db734481a60 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -315,6 +315,8 @@ static inline bool
 hex_decode_simd_helper(const Vector8 src, Vector8 *dst)
 {
        Vector8         sub;
+       Vector8         mask_hi = vector8_interleave_low(vector8_zero(), 
vector8_broadcast(0x0f));
+       Vector8         mask_lo = 
vector8_interleave_low(vector8_broadcast(0x0f), vector8_zero());
        Vector8         msk;
        bool            ret;
 
@@ -332,9 +334,9 @@ hex_decode_simd_helper(const Vector8 src, Vector8 *dst)
        *dst = vector8_issub(src, sub);
        ret = !vector8_has_ge(*dst, 0x10);
 
-       msk = vector8_and(*dst, vector8_broadcast_u32(0x0f000f00));
+       msk = vector8_and(*dst, mask_hi);
        msk = vector8_shift_right(msk, 8);
-       *dst = vector8_and(*dst, vector8_broadcast_u32(0x000f000f));
+       *dst = vector8_and(*dst, mask_lo);
        *dst = vector8_shift_left(*dst, 4);
        *dst = vector8_or(*dst, msk);
        return ret;
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index 0261179e9e7..e316233b7aa 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -101,6 +101,17 @@ static inline Vector8 vector8_min(const Vector8 v1, const 
Vector8 v2);
 static inline Vector32 vector32_eq(const Vector32 v1, const Vector32 v2);
 #endif
 
+/* return a zeroed register */
+static inline Vector8
+vector8_zero()
+{
+#if defined(USE_SSE2)
+       return _mm_setzero_si128();
+#elif defined(USE_NEON)
+       return vmovq_n_u8(0);
+#endif
+}
+
 /*
  * Load a chunk of memory into the given vector.
  */
@@ -170,24 +181,6 @@ vector32_broadcast(const uint32 c)
 }
 #endif                                                 /* ! USE_NO_SIMD */
 
-/*
- * Some compilers are picky about casts to the same underlying type, and others
- * are picky about implicit conversions with vector types.  This function does
- * the same thing as vector32_broadcast(), but it returns a Vector8 and is
- * carefully crafted to avoid compiler indigestion.
- */
-#ifndef USE_NO_SIMD
-static inline Vector8
-vector8_broadcast_u32(const uint32 c)
-{
-#ifdef USE_SSE2
-       return vector32_broadcast(c);
-#elif defined(USE_NEON)
-       return (Vector8) vector32_broadcast(c);
-#endif
-}
-#endif                                                 /* ! USE_NO_SIMD */
-
 /*
  * Return true if any elements in the vector are equal to the given scalar.
  */
@@ -577,8 +570,9 @@ vector8_interleave_high(const Vector8 v1, const Vector8 v2)
 static inline Vector8
 vector8_pack_16(const Vector8 v1, const Vector8 v2)
 {
-       Vector32        mask PG_USED_FOR_ASSERTS_ONLY = 
vector32_broadcast(0xff00ff00);
+       Vector8         mask PG_USED_FOR_ASSERTS_ONLY;
 
+       mask = vector8_interleave_low(vector8_zero(), vector8_broadcast(0xff));
        Assert(!vector8_has_ge(vector8_and(v1, mask), 1));
        Assert(!vector8_has_ge(vector8_and(v2, mask), 1));
 #ifdef USE_SSE2

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

Reply via email to