arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT

Richard Henderson Thu, 03 Jul 2025 09:26:55 -0700

On 7/3/25 03:45, Peter Maydell wrote:

On Wed, 2 Jul 2025 at 13:34, Richard Henderson
<richard.hender...@linaro.org> wrote:


Signed-off-by: Richard Henderson <richard.hender...@linaro.org>

+/* Similar for 2-way dot product */
+#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
+{                                                                         \
+    intptr_t i, opr_sz = simd_oprsz(desc);                                \
+    TYPED *d = vd, *a = va;                                               \
+    TYPEN *n = vn;                                                        \
+    TYPEM *m = vm;                                                        \
+    for (i = 0; i < opr_sz / sizeof(TYPED); ++i) {                        \
+        d[i] = (a[i] +                                                    \
+                (TYPED)n[i * 2 + 0] * m[i * 2 + 0] +                      \
+                (TYPED)n[i * 2 + 1] * m[i * 2 + 1]);                      \


Don't we need some H macros here for the big-endian host case?
(For that matter, the existing 4-way dot product helpers also
look like they won't work on big-endian...)


The logic here is that all columns are treated identically.

...a0... ...a1...
.n0..n1. .n2..n3.
.m0..m1. .m2..m3.

vs

...a1... ...a0...
.n3..n2. .n1..n0.
.m3..m2. .m1..m0.

d0 = a0 + n0 * m0 + n1 * m1 -- it doesn't matter if n0 or n1 is at the
lowest or highest address, because it still gets multiplied by the
corresponding element in m, and then the two products are added to the
sum that is addressed the same way.
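
The column argument above can be checked with a small standalone sketch.
It is not code from the patch. It assumes a 32-bit accumulator with 16-bit
inputs, and it models the big-endian host layout by reversing element
order within each 64-bit chunk (index ^ 1 for 32-bit elements, index ^ 3
for 16-bit elements). The names dot2_plain, swizzle32, swizzle16 and
dot2_layout_independent are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NELEM_D 4              /* four 32-bit accumulators = two 64-bit chunks */
#define NELEM_N (NELEM_D * 2)  /* eight 16-bit inputs */

/* Plain-index 2-way dot product, same loop shape as the quoted DO_DOT. */
static void dot2_plain(int32_t *d, const int32_t *a,
                       const int16_t *n, const int16_t *m)
{
    for (size_t i = 0; i < NELEM_D; ++i) {
        d[i] = a[i]
             + (int32_t)n[i * 2 + 0] * m[i * 2 + 0]
             + (int32_t)n[i * 2 + 1] * m[i * 2 + 1];
    }
}

/*
 * Assumed model of the big-endian host layout: element order is reversed
 * inside each 64-bit chunk. Both swizzles are their own inverse.
 */
static void swizzle32(int32_t *dst, const int32_t *src)
{
    for (size_t i = 0; i < NELEM_D; ++i) {
        dst[i ^ 1] = src[i];
    }
}

static void swizzle16(int16_t *dst, const int16_t *src)
{
    for (size_t i = 0; i < NELEM_N; ++i) {
        dst[i ^ 3] = src[i];
    }
}

/* Returns 1 if the plain-index loop gives the same result in both layouts. */
int dot2_layout_independent(void)
{
    const int32_t a[NELEM_D] = { 100, 200, 300, 400 };
    const int16_t n[NELEM_N] = { 1, -2, 3, 4, -5, 6, 7, 8 };
    const int16_t m[NELEM_N] = { 9, 8, -7, 6, 5, 4, 3, -2 };
    int32_t a_be[NELEM_D], d_le[NELEM_D], d_be[NELEM_D], d_back[NELEM_D];
    int16_t n_be[NELEM_N], m_be[NELEM_N];

    /* Reference result in the unswizzled layout. */
    dot2_plain(d_le, a, n, m);

    /* Same inputs in the modelled big-endian layout, same plain loop. */
    swizzle32(a_be, a);
    swizzle16(n_be, n);
    swizzle16(m_be, m);
    dot2_plain(d_be, a_be, n_be, m_be);

    /* Undo the layout change on the result and compare. */
    swizzle32(d_back, d_be);
    return memcmp(d_le, d_back, sizeof(d_le)) == 0;
}
```

Calling dot2_layout_independent() should return 1: each 32-bit column
still sees its own pair of n and m elements, only in the opposite order,
and the sum of the two products does not depend on that order.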


The existing 4-way dot product uses the same endian-independent logic, fwiw.


r~

Re: [PATCH v3 31/97] target/arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT
