On 7/3/25 04:20, Peter Maydell wrote:
On Wed, 2 Jul 2025 at 13:38, Richard Henderson
<richard.hender...@linaro.org> wrote:
Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
---
target/arm/tcg/helper-sme.h | 20 ++++++
target/arm/tcg/sme_helper.c | 116 +++++++++++++++++++++++++++++++++
target/arm/tcg/translate-sme.c | 35 ++++++++++
target/arm/tcg/sme.decode | 22 +++++++
4 files changed, 193 insertions(+)
index d69d57c4cb..906d369d37 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1561,6 +1561,64 @@ void HELPER(sme2_fcvt_n)(void *vd, void *vs,
float_status *fpst, uint32_t desc)
}
}
+#define SQCVT2(NAME, TW, TN, HW, HN, SAT) \
+void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \
+{ \
+ ARMVectorReg scratch; \
+ size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \
+ TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \
+ TN *d = vd; \
+ if ((vd - vs) < 2 * sizeof(ARMVectorReg)) { \
Does this do the right thing if Vd is less than Vs?
Pointer differences are signed, I think, so e.g. for vd == 0,
vs == 16 we unnecessarily use the scratch reg.
Maybe clearer to write
(vd >= vs && vd < (vs + 2 * sizeof(..)))
(Similarly for the other uses of this condition later in the patch.)
I should probably split out a helper for this, there are so many instances.
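A minimal sketch of what such a helper might look like, assuming the only question is whether the destination pointer falls inside the byte range occupied by the source registers. The name dest_overlaps_source and its signature are hypothetical and not part of the patch. Comparing uintptr_t values sidesteps the question of how a signed pointer difference behaves in the comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return true if vd points
 * into the half-open byte range [vs, vs + src_bytes).
 */
static bool dest_overlaps_source(const void *vd, const void *vs,
                                 size_t src_bytes)
{
    uintptr_t d = (uintptr_t)vd;
    uintptr_t s = (uintptr_t)vs;

    /* d >= s is checked first, so d - s cannot wrap around. */
    return d >= s && d - s < src_bytes;
}
```

With something like this, the condition in the macro would read as dest_overlaps_source(vd, vs, 2 * sizeof(ARMVectorReg)), and a destination below the source would never select the scratch register.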
+ d = (TN *)&scratch; \
+ } \
+ for (size_t i = 0; i < n; ++i) { \
+ d[HN(i)] = SAT(s0[HW(i)]); \
+ d[HN(i) + n] = SAT(s1[HW(i)]); \
Should this be HN(i + n) ?
They're equivalent: n is the element count of a whole vector, so adding it does not
touch the low index bits that the xor flips for 8-byte endianness.
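To illustrate the equivalence, here is a small sketch. It assumes the host-endian adjustment is an xor of the low index bits within an 8-byte unit. H1_BE below is a stand-in for that kind of macro, written out locally for the example rather than taken from the QEMU headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in: byte index adjustment on a big-endian host. */
#define H1_BE(x) ((x) ^ 7)

/* True if H1_BE(i) + n == H1_BE(i + n) for every i below limit. */
static bool offset_commutes(size_t n, size_t limit)
{
    for (size_t i = 0; i < limit; ++i) {
        if (H1_BE(i) + n != H1_BE(i + n)) {
            return false;
        }
    }
    return true;
}
```

When n has its low three bits clear, adding it never carries into or out of the bits the xor flips, so the two forms agree. An n such as 4 breaks the property.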
+ for (size_t i = 0; i < n; ++i) { \
+ d[HN(2 * i + 0)] = SAT(s0[HW(i)]); \
+ d[HN(2 * i + 1)] = SAT(s1[HW(i)]); \
Hmm, here we do do HN(whole expr)...
None of these inputs is known to be a multiple of 8.
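A short sketch of why the interleaved case has to adjust the whole expression, again assuming an xor-style adjustment within an 8-byte unit. H1_BE and both function names are made up for this example.

```c
#include <stddef.h>

/* Assumed stand-in: byte index adjustment on a big-endian host. */
#define H1_BE(x) ((x) ^ 7)

/* Index of element k (0 or 1) of pair i, adjusting the whole expression. */
static size_t interleave_index(size_t i, size_t k)
{
    return H1_BE(2 * i + k);
}

/* The same index with the offset added outside the adjustment. */
static size_t interleave_index_outside(size_t i, size_t k)
{
    return H1_BE(2 * i) + k;
}
```

For i == 0 and k == 1 the first form gives 6, which stays inside the first 8-byte unit, while the second gives 8, which has spilled into the next unit. The offset 1 is not a multiple of 8, so it cannot be moved outside the adjustment.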
r~