On 26 June 2018 at 17:17, Richard Henderson
<richard.hender...@linaro.org> wrote:
> On 06/26/2018 08:30 AM, Peter Maydell wrote:
>> On 21 June 2018 at 02:53, Richard Henderson
>> <richard.hender...@linaro.org> wrote:
>>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>>> ---
>>>  target/arm/helper.h        |  5 ++
>>>  target/arm/translate-sve.c | 18 +++++++
>>>  target/arm/vec_helper.c    | 96 ++++++++++++++++++++++++++++++++++++++
>>>  target/arm/sve.decode      |  8 +++-
>>>  4 files changed, 126 insertions(+), 1 deletion(-)
>>>
>>
>>> +void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
>>> +{
>>> +    intptr_t i, j, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
>>> +    intptr_t index = simd_data(desc);
>>> +    uint32_t *d = vd;
>>> +    int8_t *n = vn, *m = vm;
>>> +
>>> +    for (i = 0; i < opr_sz_4; i = j) {
>>> +        int8_t m0 = m[(i + index) * 4 + 0];
>>> +        int8_t m1 = m[(i + index) * 4 + 1];
>>> +        int8_t m2 = m[(i + index) * 4 + 2];
>>> +        int8_t m3 = m[(i + index) * 4 + 3];
>>> +
>>> +        j = i;
>>> +        do {
>>> +            d[j] += n[j * 4 + 0] * m0
>>> +                  + n[j * 4 + 1] * m1
>>> +                  + n[j * 4 + 2] * m2
>>> +                  + n[j * 4 + 3] * m3;
>>> +        } while (++j < MIN(i + 4, opr_sz_4));
>>> +    }
>>> +    clear_tail(d, opr_sz, simd_maxsz(desc));
>>> +}
>>
>> Maybe I'm just half asleep this afternoon, but this is pretty
>> confusing -- nested loops where the outer loop's increment
>> uses the inner loop's index, and the inner loop's conditions
>> depend on the outer loop index...
>
> Yeah, well.
>
> There is an edge case of aa64 advsimd, reusing this same helper,
>
>     sdot v0.2s, v1.8b, v0.4b[0]
>
> where m values must be read (and held) before writing d results,
> and there are not 16/4=4 elements to process but only 2.
>
> I suppose I could special-case oprsz == 8 in order to simplify
> iteration of what is otherwise a multiple of 16.
>
> I thought iterating J from I to I+4 was easier to read than
> writing out I+J everywhere.  Perhaps not.

Mmm. I did indeed fail to notice the symmetry between the
indexes into m[] and those into n[]. The other bit that threw
me is where the outer loop on i updates using j. A comment
describing the intent might assist?

thanks
-- PMM
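
For illustration only, here is a rough sketch of what such intent
comments might look like. It is a standalone rewrite, not the patch
itself: the name sdot_idx_b_sketch and the n_words parameter (standing
in for opr_sz / 4) are hypothetical, and the QEMU-specific pieces
(HELPER(), simd_oprsz(), simd_data(), clear_tail()) are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the indexed signed dot product.
 * n_words is the number of 32-bit result elements (opr_sz / 4 in the
 * patch); index selects one 32-bit group of m within each segment.
 */
static void sdot_idx_b_sketch(uint32_t *d, const int8_t *n, const int8_t *m,
                              size_t n_words, size_t index)
{
    size_t i, j;

    /*
     * Outer loop: i is the first result element of each 16-byte segment
     * (4 results per segment).  The inner loop leaves j at the start of
     * the next segment, hence "i = j" as the increment.
     */
    for (i = 0; i < n_words; i = j) {
        /*
         * Read (and hold) the four m bytes for this segment before any
         * d element is written, because d may alias m.
         */
        int8_t m0 = m[(i + index) * 4 + 0];
        int8_t m1 = m[(i + index) * 4 + 1];
        int8_t m2 = m[(i + index) * 4 + 2];
        int8_t m3 = m[(i + index) * 4 + 3];

        /* A short vector (e.g. only 2 results) ends the segment early. */
        size_t end = i + 4 < n_words ? i + 4 : n_words;

        /* Inner loop: j walks the results of the current segment. */
        j = i;
        do {
            d[j] += n[j * 4 + 0] * m0
                  + n[j * 4 + 1] * m1
                  + n[j * 4 + 2] * m2
                  + n[j * 4 + 3] * m3;
        } while (++j < end);
    }
}
```

Calling this with n_words == 2 and d pointing at the same storage as m
exercises the edge case described above: the m bytes are captured
before d[0] is overwritten, and the inner loop stops after 2 elements
instead of 4.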