On Wed, 2 Jul 2025 at 13:34, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>



> +/* Similar for 2-way dot product */
> +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
> +{                                                                         \
> +    intptr_t i, opr_sz = simd_oprsz(desc);                                \
> +    TYPED *d = vd, *a = va;                                               \
> +    TYPEN *n = vn;                                                        \
> +    TYPEM *m = vm;                                                        \
> +    for (i = 0; i < opr_sz / sizeof(TYPED); ++i) {                        \
> +        d[i] = (a[i] +                                                    \
> +                (TYPED)n[i * 2 + 0] * m[i * 2 + 0] +                      \
> +                (TYPED)n[i * 2 + 1] * m[i * 2 + 1]);                      \

Don't we need some H macros here for the big-endian host case?
(For that matter, the existing 4-way dot product helpers also
look like they won't work on big-endian...)
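
To illustrate the concern, here is a minimal standalone sketch, not the QEMU code itself. It assumes the vector register is stored as an array of host-endian 64-bit units, so that on a big-endian host the index of a 16-bit element has to be XORed with 3 (which is my understanding of what H2() does). The names host_is_big_endian(), h2_index(), read_elem16() and dot2_lane() are hypothetical and exist only for this example; a real helper would use the compile-time H macros instead of a runtime check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime stand-in for a compile-time host endianness test (illustrative only). */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Index adjustment for 16-bit elements, modelled on H2():
 * identity on little-endian hosts, x ^ 3 on big-endian hosts.
 */
static size_t h2_index(size_t i)
{
    return host_is_big_endian() ? (i ^ 3) : i;
}

/*
 * Read architectural 16-bit element i from a register that is stored
 * as an array of host-endian 64-bit units.
 */
static uint16_t read_elem16(const uint64_t *reg, size_t i)
{
    uint16_t v;

    memcpy(&v, (const uint8_t *)reg + h2_index(i) * sizeof(v), sizeof(v));
    return v;
}

/*
 * 2-way dot product for one 32-bit lane i:
 * acc + n[2i] * m[2i] + n[2i + 1] * m[2i + 1], with adjusted indexes.
 */
static int32_t dot2_lane(const uint64_t *n, const uint64_t *m,
                         int32_t acc, size_t i)
{
    int16_t n0 = (int16_t)read_elem16(n, i * 2 + 0);
    int16_t n1 = (int16_t)read_elem16(n, i * 2 + 1);
    int16_t m0 = (int16_t)read_elem16(m, i * 2 + 0);
    int16_t m1 = (int16_t)read_elem16(m, i * 2 + 1);

    return acc + (int32_t)n0 * m0 + (int32_t)n1 * m1;
}
```

Without the index adjustment, a big-endian host would pair elements 3 and 2 of each 64-bit unit for lane 0 rather than elements 0 and 1, which is the failure mode I'm worried about for plain n[i * 2 + 0] style indexing.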

> +    }                                                                     \
> +    clear_tail(d, opr_sz, simd_maxsz(desc));                              \
> +}
> +
> +#define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
> +{                                                                         \
> +    intptr_t i = 0, opr_sz = simd_oprsz(desc);                            \
> +    intptr_t opr_sz_n = opr_sz / sizeof(TYPED);                           \
> +    intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);                  \
> +    intptr_t index = simd_data(desc);                                     \
> +    TYPED *d = vd, *a = va;                                               \
> +    TYPEN *n = vn;                                                        \
> +    TYPEM *m_indexed = (TYPEM *)vm + HD(index) * 2;                       \
> +    do {                                                                  \
> +        TYPED m0 = m_indexed[i * 2 + 0];                                  \
> +        TYPED m1 = m_indexed[i * 2 + 1];                                  \
> +        do {                                                              \
> +            d[i] = (a[i] +                                                \
> +                    n[i * 2 + 0] * m0 +                                   \
> +                    n[i * 2 + 1] * m1);                                   \

Similarly here.

> +        } while (++i < segend);                                           \
> +        segend = i + (16 / sizeof(TYPED));                                \
> +    } while (i < opr_sz_n);                                               \
> +    clear_tail(d, opr_sz, simd_maxsz(desc));                              \
> +}

Otherwise
Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>

thanks
-- PMM
