On Wed, 2 Jul 2025 at 13:34, Richard Henderson <richard.hender...@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> +/* Similar for 2-way dot product */
> +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
> +{ \
> +    intptr_t i, opr_sz = simd_oprsz(desc); \
> +    TYPED *d = vd, *a = va; \
> +    TYPEN *n = vn; \
> +    TYPEM *m = vm; \
> +    for (i = 0; i < opr_sz / sizeof(TYPED); ++i) { \
> +        d[i] = (a[i] + \
> +                (TYPED)n[i * 2 + 0] * m[i * 2 + 0] + \
> +                (TYPED)n[i * 2 + 1] * m[i * 2 + 1]); \

Don't we need some H macros here for the big-endian host case?
(For that matter, the existing 4-way dot product helpers
also look like they won't work on big-endian...)

> +    } \
> +    clear_tail(d, opr_sz, simd_maxsz(desc)); \
> +}
> +
> +#define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
> +{ \
> +    intptr_t i = 0, opr_sz = simd_oprsz(desc); \
> +    intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \
> +    intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \
> +    intptr_t index = simd_data(desc); \
> +    TYPED *d = vd, *a = va; \
> +    TYPEN *n = vn; \
> +    TYPEM *m_indexed = (TYPEM *)vm + HD(index) * 2; \
> +    do { \
> +        TYPED m0 = m_indexed[i * 2 + 0]; \
> +        TYPED m1 = m_indexed[i * 2 + 1]; \
> +        do { \
> +            d[i] = (a[i] + \
> +                    n[i * 2 + 0] * m0 + \
> +                    n[i * 2 + 1] * m1); \

Similarly here.

> +        } while (++i < segend); \
> +        segend = i + (16 / sizeof(TYPED)); \
> +    } while (i < opr_sz_n); \
> +    clear_tail(d, opr_sz, simd_maxsz(desc)); \
> +}

Otherwise
Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>

thanks
-- PMM
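
For illustration only, here is a minimal standalone sketch of what H-macro indexing could look like for the 2-way case with 16-bit inputs and 32-bit accumulators. It is not the patch's code. It assumes the usual convention that vector data is kept in host-endian 64-bit chunks, so that on a big-endian host a 16-bit element index is XORed with 3 and a 32-bit element index with 1. The Vec128 union, the dot2_s_h name, and the compiler-macro endianness test are hypothetical stand-ins (QEMU has its own HOST_BIG_ENDIAN and helper plumbing). The sketch does not settle whether the helpers above actually need the fixup; it only shows the shape of it.

```c
#include <stdint.h>

/*
 * Assumption: host-endian fixups for data stored in 64-bit chunks.
 * On a big-endian host the sub-64-bit element order within each
 * chunk is reversed, so indices are XORed; otherwise they are
 * used unchanged.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define H2(x) ((x) ^ 3)   /* 16-bit elements: 4 per 64-bit chunk */
#define H4(x) ((x) ^ 1)   /* 32-bit elements: 2 per 64-bit chunk */
#else
#define H2(x) (x)
#define H4(x) (x)
#endif

/* Hypothetical 128-bit vector, viewable at several element sizes. */
typedef union {
    uint64_t q[2];
    int32_t s[4];
    int16_t h[8];
} Vec128;

/*
 * Hypothetical 2-way signed dot product: each 32-bit lane i of d is
 * a[i] plus the sum of the two 16-bit products n[2i]*m[2i] and
 * n[2i+1]*m[2i+1], with every element access going through H2/H4.
 */
void dot2_s_h(Vec128 *d, const Vec128 *n, const Vec128 *m, const Vec128 *a)
{
    for (int i = 0; i < 4; ++i) {
        d->s[H4(i)] = a->s[H4(i)]
            + (int32_t)n->h[H2(i * 2 + 0)] * m->h[H2(i * 2 + 0)]
            + (int32_t)n->h[H2(i * 2 + 1)] * m->h[H2(i * 2 + 1)];
    }
}
```

Filling the q[] words directly gives a host-independent check: with 16-bit elements n = {1..8}, m = {1,1,2,2,3,3,4,4} and 32-bit accumulators a = {10,20,30,40}, the 32-bit results are {13, 34, 63, 100} whichever way the host orders bytes.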