On 7/3/25 03:45, Peter Maydell wrote:
On Wed, 2 Jul 2025 at 13:34, Richard Henderson
<richard.hender...@linaro.org> wrote:

Signed-off-by: Richard Henderson <richard.hender...@linaro.org>



+/* Similar for 2-way dot product */
+#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
+{                                                                         \
+    intptr_t i, opr_sz = simd_oprsz(desc);                                \
+    TYPED *d = vd, *a = va;                                               \
+    TYPEN *n = vn;                                                        \
+    TYPEM *m = vm;                                                        \
+    for (i = 0; i < opr_sz / sizeof(TYPED); ++i) {                        \
+        d[i] = (a[i] +                                                    \
+                (TYPED)n[i * 2 + 0] * m[i * 2 + 0] +                      \
+                (TYPED)n[i * 2 + 1] * m[i * 2 + 1]);                      \

Don't we need some H macros here for the big-endian host case?
(For that matter, the existing 4-way dot product helpers also
look like they won't work on big-endian...)

The logic here is that all columns are treated identically.

...a0... ...a1...
.n0..n1. .n2..n3.
.m0..m1. .m2..m3.

vs

...a1... ...a0...
.n3..n2. .n1..n0.
.m3..m2. .m1..m0.

d0 = a0 + n0 * m0 + n1 * m1

It doesn't matter whether n0 or n1 sits at the lowest or the highest address: each is still multiplied by the corresponding element of m, and the two products are added to the accumulator that is addressed the same way.

The existing 4-way dot product uses the same endian-independent logic, fwiw.
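
As a rough standalone illustration of the argument above (not part of the patch), the sketch below runs the same loop body on two layouts of the same data. It assumes a big-endian host stores elements reversed within each 64-bit unit, so a 32-bit element index i moves to i ^ 1 and a 16-bit index j moves to j ^ 3. The names dot2, to_be32, to_be16 and layouts_agree are made up for this example, and the types are fixed to int32_t/int16_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ND 4            /* 32-bit accumulators: two 64-bit units */
#define NN (ND * 2)     /* 16-bit multiplicands */

/* Loop body from the patch, specialised to int32_t += int16_t * int16_t. */
static void dot2(int32_t *d, const int32_t *a,
                 const int16_t *n, const int16_t *m)
{
    for (int i = 0; i < ND; i++) {
        d[i] = a[i]
             + (int32_t)n[i * 2 + 0] * m[i * 2 + 0]
             + (int32_t)n[i * 2 + 1] * m[i * 2 + 1];
    }
}

/* Assumed big-endian host layout: 32-bit index i lives at i ^ 1. */
static void to_be32(int32_t *dst, const int32_t *src)
{
    for (int i = 0; i < ND; i++) {
        dst[i ^ 1] = src[i];
    }
}

/* Assumed big-endian host layout: 16-bit index j lives at j ^ 3. */
static void to_be16(int16_t *dst, const int16_t *src)
{
    for (int j = 0; j < NN; j++) {
        dst[j ^ 3] = src[j];
    }
}

/* Returns 1 if the unadorned loop gives the same result on both layouts. */
int layouts_agree(void)
{
    const int32_t a[ND] = { 1000, -2000, 3000, -4000 };
    const int16_t n[NN] = { 1, -2, 3, -4, 5, -6, 7, -8 };
    const int16_t m[NN] = { 10, 20, 30, 40, 50, 60, 70, 80 };
    int32_t ref[ND], a_be[ND], d_be[ND], back[ND];
    int16_t n_be[NN], m_be[NN];

    /* The permutations only make sense on whole 64-bit units. */
    assert(ND % 2 == 0 && NN % 4 == 0);

    dot2(ref, a, n, m);             /* first layout in the diagram */

    to_be32(a_be, a);               /* second layout in the diagram */
    to_be16(n_be, n);
    to_be16(m_be, m);
    dot2(d_be, a_be, n_be, m_be);   /* same loop, no index adjustment */

    to_be32(back, d_be);            /* i ^ 1 is its own inverse */
    return memcmp(ref, back, sizeof(ref)) == 0;
}
```

The pair of 16-bit elements feeding accumulator i ends up, swapped, at host indices 2 * (i ^ 1) and 2 * (i ^ 1) + 1, which is exactly what the loop reads for host accumulator i ^ 1. Since addition commutes, the order inside the pair is irrelevant.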


r~
