On 4/9/24 06:43, Paolo Bonzini wrote:
+static void gen_ARPL(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    TCGLabel *label1 = gen_new_label();
+    TCGv rpl_adj = tcg_temp_new();
+    TCGv flags = tcg_temp_new();
+
+    gen_mov_eflags(s, flags);
+    tcg_gen_andi_tl(flags, flags, ~CC_Z);
+
+    /* Compute dest[rpl] - src[rpl], adjust if result <0.  */
+    tcg_gen_andi_tl(rpl_adj, s->T0, 3);
+    tcg_gen_andi_tl(s->T1, s->T1, 3);
+    tcg_gen_sub_tl(rpl_adj, rpl_adj, s->T1);
+
+    tcg_gen_brcondi_tl(TCG_COND_LT, rpl_adj, 0, label1);

Comment is right, but branch condition is wrong.

I think this might be better as:

    /* SRC = DST with SRC[RPL] */
    tcg_gen_deposit_tl(s->T1, s->T0, s->T1, 0, 2);
    /* Z flag set if DST < SRC */
    tcg_gen_setcond_tl(TCG_COND_LTU, tmp, s->T0, s->T1);
    /* Install Z */
    tcg_gen_deposit_tl(flags, flags, tmp, ctz(CC_Z), 1);
    /* DST with maximum RPL */
    tcg_gen_umax_tl(s->T0, s->T0, s->T1);
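
As a sanity check on the sequence suggested above, here is a minimal plain-C model; it is not QEMU code. It assumes ARPL's architectural behaviour is "if dest RPL < src RPL, set ZF and copy the source RPL into dest, otherwise clear ZF". The names arpl_ref, arpl_branchless and arpl_selfcheck are made up for this sketch, and ordinary integer operations stand in for the deposit, setcond(LTU) and umax steps.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference semantics: raise dest's RPL to src's RPL if it is lower. */
static uint16_t arpl_ref(uint16_t dst, uint16_t src, int *zf)
{
    if ((dst & 3) < (src & 3)) {
        *zf = 1;
        return (uint16_t)((dst & ~3u) | (src & 3u));
    }
    *zf = 0;
    return dst;
}

/* Plain-C stand-in for the deposit / setcond(LTU) / umax sequence. */
static uint16_t arpl_branchless(uint16_t dst, uint16_t src, int *zf)
{
    /* deposit: t = dst with its low 2 bits replaced by src's RPL */
    uint16_t t = (uint16_t)((dst & ~3u) | (src & 3u));
    /* setcond LTU: only the low 2 bits differ, so this compares the RPLs */
    *zf = dst < t;
    /* umax: keep whichever selector carries the larger RPL */
    return dst > t ? dst : t;
}

/* Compare both models for every dst and a spread of src values. */
static void arpl_selfcheck(void)
{
    for (uint32_t d = 0; d <= 0xffff; d++) {
        for (uint32_t rpl = 0; rpl < 4; rpl++) {
            for (uint32_t hi = 0; hi <= 0xfffc; hi += 0x5554) {
                int z_ref, z_new;
                uint16_t src = (uint16_t)(hi | rpl);
                uint16_t r_ref = arpl_ref((uint16_t)d, src, &z_ref);
                uint16_t r_new = arpl_branchless((uint16_t)d, src, &z_new);
                assert(r_ref == r_new && z_ref == z_new);
            }
        }
    }
}
```

The unsigned compare works because, after the deposit, the two values are identical above bit 1, so their ordering is decided by the RPL field alone.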

+    case MO_32:
+#ifdef TARGET_X86_64
+        /*
+         * This could also use the same algorithm as MO_16.  It produces fewer
+         * TCG ops and better code if flags are needed, but it requires a 64-bit
+         * multiply even if they are not (and thus the high part of the multiply
+         * is dead).
+         */

Is 64-bit multiply ever slower these days? My intuition says "slow" multiply is at least a decade out of date.

+        tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
+        tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);

Avoid s->tmp*, especially in new code.

+        tcg_gen_muls2_i32(s->tmp2_i32, s->tmp3_i32,
+                          s->tmp2_i32, s->tmp3_i32);
+        tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+
+        cc_src_rhs = tcg_temp_new();
+        tcg_gen_extu_i32_tl(cc_src_rhs, s->tmp3_i32);
+        /* Compare the high part to the sign bit of the truncated result */
+        tcg_gen_negsetcondi_i32(TCG_COND_LT, s->tmp2_i32, s->tmp2_i32, 0);

This seems like something the optimizer should handle, but doesn't.
I'd write this as

    tcg_gen_sari_i32(tmp, tmp, 31);
or
    tcg_gen_sextract_i32(tmp, tmp, 31, 1);

which I know will expand to the same thing.
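
To make the equivalence concrete, here is a small plain-C sketch; it is not QEMU code. It models the three ways of producing the sign mask (0 or -1) and then uses that mask the way the quoted comment describes, comparing the high half of a signed 32x32 product against the sign of the low half. All function names are hypothetical. It assumes `>>` on a negative signed value is an arithmetic shift and that out-of-range conversion to `int32_t` wraps, which C leaves implementation-defined but which holds on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* -(x < 0): the value a negsetcond with TCG_COND_LT against 0 yields. */
static int32_t mask_negsetcond(int32_t x)
{
    return -(int32_t)(x < 0);
}

/* Arithmetic shift right by 31 (assumes >> is arithmetic on int32_t). */
static int32_t mask_sari(int32_t x)
{
    return x >> 31;
}

/* Signed extract of the single bit at position 31. */
static int32_t mask_sextract(int32_t x)
{
    return -(int32_t)(((uint32_t)x >> 31) & 1);
}

/* Overflow iff the high half is not the sign extension of the low half. */
static int imul32_overflows(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    int32_t lo = (int32_t)(uint32_t)p;   /* truncated result */
    int32_t hi = (int32_t)(p >> 32);     /* high part */
    return hi != mask_sari(lo);
}

/* Check the three masks agree, and the overflow test matches widening. */
static void sign_mask_selfcheck(void)
{
    static const int32_t v[] = {
        INT32_MIN, INT32_MIN + 1, -65536, -2, -1,
        0, 1, 2, 46340, 46341, 65536, INT32_MAX
    };
    const int n = (int)(sizeof(v) / sizeof(v[0]));

    for (int i = 0; i < n; i++) {
        assert(mask_negsetcond(v[i]) == mask_sari(v[i]));
        assert(mask_negsetcond(v[i]) == mask_sextract(v[i]));
        for (int j = 0; j < n; j++) {
            int64_t p = (int64_t)v[i] * v[j];
            int fits = (p >= INT32_MIN && p <= INT32_MAX);
            assert(imul32_overflows(v[i], v[j]) == !fits);
        }
    }
}
```

The same reasoning carries over to the target-long case, with the shift count changed to the word size minus one.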

+    case MO_64:
+#endif
+        cc_src_rhs = tcg_temp_new();
+        tcg_gen_muls2_tl(s->T0, cc_src_rhs, s->T0, s->T1);
+        /* Compare the high part to the sign bit of the truncated result */
+        tcg_gen_negsetcondi_tl(TCG_COND_LT, s->T1, s->T0, 0);

Similarly.


r~