On 9/11/19 2:25 AM, liuzhiwei wrote:
> +/* vredsum.vs vd, vs2, vs1, vm # vd[0] = sum(vs1[0] , vs2[*]) */
> +void VECTOR_HELPER(vredsum_vs)(CPURISCVState *env, uint32_t vm, uint32_t rs1,
> +    uint32_t rs2, uint32_t rd)
> +{
>
> +    int width, lmul, vl, vlmax;
> +    int i, j, src2;
> +    uint64_t sum = 0;
> +
> +    lmul = vector_get_lmul(env);
> +    vector_lmul_check_reg(env, lmul, rs2, false);
> +
> +    if (vector_vtype_ill(env) || vector_overlap_vm_common(lmul, vm, rd)) {
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +        return;
> +    }
> +    if (env->vfp.vstart != 0) {
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +        return;
> +    }
> +
> +    vl = env->vfp.vl;
> +    if (vl == 0) {
> +        return;
> +    }
> +
> +    width = vector_get_width(env);
> +    vlmax = vector_get_vlmax(env);
> +
> +    for (i = 0; i < VLEN / 64; i++) {
> +        env->vfp.vreg[rd].u64[i] = 0;
> +    }
> +

There is no requirement that I see for vd != vs1 && vd != vs2.  Thus clearing
vd before the operation may clobber the inputs.

> +        if (i < vl) {
> +            switch (width) {
> +            case 8:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    sum += env->vfp.vreg[src2].u8[j];
> +                }
> +                if (i == 0) {
> +                    sum += env->vfp.vreg[rs1].u8[0];
> +                }

Hoist the rs1 case outside the loop.


r~
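
As a rough illustration of both comments, here is a standalone toy version of the 8-bit case. It is not QEMU code: `ToyVReg`, `toy_vredsum_u8`, the one-byte-per-element `mask` array and the 16-byte register size are all made up for the example, LMUL is assumed to be 1, and the caller is assumed to pass vl no larger than the register size. It reads vs1[0] once before the loop and leaves vd alone until every input has been consumed, so vd may overlap vs1 or vs2:

```c
#include <stdint.h>
#include <string.h>

#define TOY_VLEN_BYTES 16   /* hypothetical register size, not QEMU's VLEN */
#define TOY_NREGS      32

typedef struct {
    uint8_t u8[TOY_VLEN_BYTES];
} ToyVReg;

/*
 * vd[0] = vs1[0] + sum(active vs2[i]) for 8-bit elements.
 * mask[i] != 0 marks element i as active (stand-in for vector_elem_mask).
 */
static void toy_vredsum_u8(ToyVReg *vreg, const uint8_t *mask,
                           unsigned rd, unsigned rs1, unsigned rs2,
                           unsigned vl)
{
    /* Seed from vs1[0] once, instead of testing i == 0 inside the loop. */
    uint64_t sum = vreg[rs1].u8[0];

    for (unsigned i = 0; i < vl; i++) {
        if (mask[i]) {
            sum += vreg[rs2].u8[i];
        }
    }

    /* All inputs have been read, so clearing vd cannot clobber them. */
    memset(&vreg[rd], 0, sizeof(vreg[rd]));
    vreg[rd].u8[0] = (uint8_t)sum;
}
```

With this ordering, a call where rd == rs2 or rd == rs1 still sees the original source values, which is the case the up-front clearing loop in the patch would break.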