On 06/05/2018 08:02 AM, Peter Maydell wrote:
>> +        if (count & 63) {
>> +            d->p[i] = ~(-1ull << (count & 63)) & esz_mask;
>
> Is this
>     d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask;
> ?

Fixed.

>> +    tcg_gen_setcond_i64(cond, cmp, rn, rm);
>> +    tcg_gen_extrl_i64_i32(cpu_NF, cmp);
>> +    tcg_temp_free_i64(cmp);
>> +
>> +    /* VF = !NF & !CF.  */
>> +    tcg_gen_xori_i32(cpu_VF, cpu_NF, 1);
>> +    tcg_gen_andc_i32(cpu_VF, cpu_VF, cpu_CF);
>> +
>> +    /* Both NF and VF actually look at bit 31.  */
>> +    tcg_gen_neg_i32(cpu_NF, cpu_NF);
>> +    tcg_gen_neg_i32(cpu_VF, cpu_VF);
>
> Microoptimization, but I think you can save an instruction here
> using
>     /* VF = !NF & !CF == !(NF || CF); we know NF and CF are
>      * both 0 or 1, so the result of the logical NOT has
>      * VF bit 31 set or clear as required.
>      */
>     tcg_gen_or_i32(cpu_VF, cpu_NF, cpu_CF);
>     tcg_gen_not_i32(cpu_VF, cpu_VF);

No, ~({0,1} | {0,1}) -> {-1,-2}.

>> +    /* For the helper, compress the different conditions into a computation
>> +     * of how many iterations for which the condition is true.
>> +     *
>> +     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
>> +     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
>> +     * aren't that large, so any value >= predicate size is sufficient.
>> +     */
>
> The comment says that 0 <= UINT64_MAX is a special case,
> but I don't understand how the code accounts for it ?
>
>> +    tcg_gen_sub_i64(t0, op1, op0);
>> +
>> +    /* t0 = MIN(op1 - op0, vsz).  */
>> +    if (a->eq) {
>> +        /* Equality means one more iteration.  */
>> +        tcg_gen_movi_i64(t1, vsz - 1);
>> +        tcg_gen_movcond_i64(TCG_COND_LTU, t0, t0, t1, t0, t1);

By bounding the input, here, to the vector size.  This reduces the
(2**64-1)+1 case, which we can't represent, to a vsz+1 case, which
we can.  This produces the same result for this instruction.

This does point out that I should be using the new tcg_gen_umin_i64
helper instead of open-coding with movcond.


r~
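
For anyone following the arithmetic, here is a small standalone C sketch of
the two points above.  It is plain C, not QEMU code: the function names
vf_patch, vf_suggested and bounded_count are hypothetical, and bounded_count
assumes op0 <= op1 and simply follows the "t0 = MIN(op1 - op0, vsz)" comment
with a caller-supplied limit.

```c
#include <assert.h>
#include <stdint.h>

/* Patch sequence (xori, andc, neg).  nf and cf are assumed to be 0 or 1. */
static uint32_t vf_patch(uint32_t nf, uint32_t cf)
{
    uint32_t vf = nf ^ 1;   /* !NF, still 0 or 1 */
    vf &= ~cf;              /* andc: !NF & !CF, still 0 or 1 */
    return 0u - vf;         /* 0 or 0xffffffff, so bit 31 matches the flag */
}

/* Suggested sequence (or, not).  nf and cf are assumed to be 0 or 1. */
static uint32_t vf_suggested(uint32_t nf, uint32_t cf)
{
    /* A bitwise NOT of 0 or 1 gives -1 or -2; bit 31 is set either way. */
    return ~(nf | cf);
}

/*
 * Hypothetical iteration count: clamp the distance to limit first, then
 * add one for the inclusive (eq) form, so the +1 can no longer wrap.
 * Assumes op0 <= op1.
 */
static uint64_t bounded_count(uint64_t op0, uint64_t op1,
                              uint64_t limit, int eq)
{
    uint64_t n = op1 - op0;
    if (n > limit) {
        n = limit;          /* unsigned minimum */
    }
    return n + (eq ? 1 : 0);
}
```

With nf = 1 and cf = 0, vf_patch returns 0 while vf_suggested returns
0xfffffffe, whose bit 31 is set even though VF should be clear.  For the
inclusive 0 <= UINT64_MAX case, bounded_count with a limit of 256 returns
257 rather than wrapping to 0.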