On 9/11/19 2:25 AM, liuzhiwei wrote:
> +/* vredsum.vs vd, vs2, vs1, vm # vd[0] = sum(vs1[0] , vs2[*]) */
> +void VECTOR_HELPER(vredsum_vs)(CPURISCVState *env, uint32_t vm, uint32_t rs1,
> +    uint32_t rs2, uint32_t rd)
> +{
>  
> +    int width, lmul, vl, vlmax;
> +    int i, j, src2;
> +    uint64_t sum = 0;
> +
> +    lmul = vector_get_lmul(env);
> +    vector_lmul_check_reg(env, lmul, rs2, false);
> +
> +    if (vector_vtype_ill(env) || vector_overlap_vm_common(lmul, vm, rd)) {
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +        return;
> +    }
> +    if (env->vfp.vstart != 0) {
> +        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
> +        return;
> +    }
> +
> +    vl = env->vfp.vl;
> +    if (vl == 0) {
> +        return;
> +    }
> +
> +    width = vector_get_width(env);
> +    vlmax = vector_get_vlmax(env);
> +
> +    for (i = 0; i < VLEN / 64; i++) {
> +        env->vfp.vreg[rd].u64[i] = 0;
> +    }
> +

There is no requirement that I see for vd != vs1 && vd != vs2.  Thus clearing
vd before the operation may clobber the inputs.

> +        if (i < vl) {
> +            switch (width) {
> +            case 8:
> +                if (vector_elem_mask(env, vm, width, lmul, i)) {
> +                    sum += env->vfp.vreg[src2].u8[j];
> +                }
> +                if (i == 0) {
> +                    sum += env->vfp.vreg[rs1].u8[0];
> +                }

Hoist the rs1 case outside the loop.


r~

Reply via email to