arm: Implement SME2 SQCVT, UQCVT, SQCVTU

Richard Henderson Thu, 03 Jul 2025 08:55:25 -0700

On 7/3/25 04:20, Peter Maydell wrote:

On Wed, 2 Jul 2025 at 13:38, Richard Henderson
<richard.hender...@linaro.org> wrote:


Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
---
  target/arm/tcg/helper-sme.h    |  20 ++++++
  target/arm/tcg/sme_helper.c    | 116 +++++++++++++++++++++++++++++++++
  target/arm/tcg/translate-sme.c |  35 ++++++++++
  target/arm/tcg/sme.decode      |  22 +++++++
  4 files changed, 193 insertions(+)

index d69d57c4cb..906d369d37 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1561,6 +1561,64 @@ void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc)
      }
  }

+#define SQCVT2(NAME, TW, TN, HW, HN, SAT)                       \
+void HELPER(NAME)(void *vd, void *vs, uint32_t desc)            \
+{                                                               \
+    ARMVectorReg scratch;                                       \
+    size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW);    \
+    TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg);               \
+    TN *d = vd;                                                 \
+    if ((vd - vs) < 2 * sizeof(ARMVectorReg)) {                 \


Does this do the right thing if Vd is less than Vs?
Pointer differences are signed, I think, so for eg vd == 0
vs == 16 we unnecessarily use the scratch reg.
Maybe clearer to write
    (vd >= vs && vd < (vs + 2 * sizeof(..)))

(Similarly for other use of this condition later in the patch.)


I should probably split out a helper for this, there are so many instances.
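
For illustration only, such a helper might look like the sketch below. The name dest_overlaps_sources, the VecReg stand-in type and its 256-byte size are assumptions made for this example and are not taken from the patch. The test is written in the explicit form suggested above, so it does not depend on how a signed pointer difference is converted when compared against a sizeof expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ARMVectorReg; the 256-byte size is an assumption of this sketch. */
typedef struct { uint8_t b[256]; } VecReg;

/*
 * Hypothetical helper: returns true when the destination register starts
 * inside the nregs consecutive source registers beginning at vs, i.e. when
 * writing the result in place could clobber source data not yet read.
 */
static bool dest_overlaps_sources(const void *vd, const void *vs, size_t nregs)
{
    uintptr_t d = (uintptr_t)vd;
    uintptr_t s = (uintptr_t)vs;

    assert(nregs > 0);
    /* d >= s is checked first, so d - s cannot wrap around. */
    return d >= s && d - s < nregs * sizeof(VecReg);
}
```

With something like this, each SQCVT-style macro could write "if (dest_overlaps_sources(vd, vs, 2)) { d = (TN *)&scratch; }" in place of the open-coded pointer arithmetic.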

+        d = (TN *)&scratch;                                     \
+    }                                                           \
+    for (size_t i = 0; i < n; ++i) {                            \
+        d[HN(i)] = SAT(s0[HW(i)]);                              \
+        d[HN(i) + n] = SAT(s1[HW(i)]);                          \


Should this be HN(i + n) ?

They're equivalent, because n is the whole vector size, and so does not overlap the xor on 8-byte endianness.

+    for (size_t i = 0; i < n; ++i) {                            \
+        d[HN(2 * i + 0)] = SAT(s0[HW(i)]);                      \
+        d[HN(2 * i + 1)] = SAT(s1[HW(i)]);                      \


Hmm, here we do do HN(whole expr)...


None of these inputs is known to be a multiple of 8.
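
The two cases can be checked with a small standalone sketch. It assumes the big-endian-host adjustment for byte-sized elements is an xor with 7; the names h1_be, offset_commutes and interleave_commutes are made up for this example. Adding an offset that is a multiple of 8 never touches the low three bits that the xor flips, so H(i) + n and H(i + n) agree, while 2 * i + 1 does change those bits, so the xor has to be applied to the whole expression.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed big-endian-host index adjustment for byte elements: flip the low 3 bits. */
static size_t h1_be(size_t i)
{
    return i ^ 7;
}

/* Returns 1 if h1_be(i) + n == h1_be(i + n) for every i below limit. */
static int offset_commutes(size_t n, size_t limit)
{
    assert(limit > 0);
    for (size_t i = 0; i < limit; ++i) {
        if (h1_be(i) + n != h1_be(i + n)) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if h1_be(2 * i + 1) == h1_be(2 * i) + 1 for every i below limit. */
static int interleave_commutes(size_t limit)
{
    assert(limit > 0);
    for (size_t i = 0; i < limit; ++i) {
        if (h1_be(2 * i + 1) != h1_be(2 * i) + 1) {
            return 0;
        }
    }
    return 1;
}
```

In this sketch offset_commutes(16, 64) holds and interleave_commutes(64) does not, which matches the distinction drawn above between the "+ n" form and the "2 * i + 1" form.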


r~

Re: [PATCH v3 49/97] target/arm: Implement SME2 SQCVT, UQCVT, SQCVTU
