On 4/9/24 06:43, Paolo Bonzini wrote:
+static void gen_ARPL(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+    TCGLabel *label1 = gen_new_label();
+    TCGv rpl_adj = tcg_temp_new();
+    TCGv flags = tcg_temp_new();
+
+    gen_mov_eflags(s, flags);
+    tcg_gen_andi_tl(flags, flags, ~CC_Z);
+
+    /* Compute dest[rpl] - src[rpl], adjust if result <0.  */
+    tcg_gen_andi_tl(rpl_adj, s->T0, 3);
+    tcg_gen_andi_tl(s->T1, s->T1, 3);
+    tcg_gen_sub_tl(rpl_adj, rpl_adj, s->T1);
+
+    tcg_gen_brcondi_tl(TCG_COND_LT, rpl_adj, 0, label1);

Comment is right, but branch condition is wrong.

I think this might be better as:

    /* SRC = DST with SRC[RPL] */
    tcg_gen_deposit_tl(s->T1, s->T0, s->T1, 0, 2);
    /* Z flag set if DST < SRC */
    tcg_gen_setcond_tl(TCG_COND_LTU, tmp, s->T0, s->T1);
    /* Install Z */
    tcg_gen_deposit_tl(flags, flags, tmp, ctz(CC_Z), 1);
    /* DST with maximum RPL */
    tcg_gen_umax_tl(s->T0, s->T0, s->T1);


+    case MO_32:
+#ifdef TARGET_X86_64
+        /*
+         * This could also use the same algorithm as MO_16.  It produces fewer
+         * TCG ops and better code if flags are needed, but it requires a 
64-bit
+         * multiply even if they are not (and thus the high part of the 
multiply
+         * is dead).
+         */

Is 64-bit multiply ever slower these days?
My intuition says "slow" multiply is at least a decade out of date.

+        tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
+        tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);

Avoid s->tmp*, especially in new code.

+        tcg_gen_muls2_i32(s->tmp2_i32, s->tmp3_i32,
+                          s->tmp2_i32, s->tmp3_i32);
+        tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+
+        cc_src_rhs = tcg_temp_new();
+        tcg_gen_extu_i32_tl(cc_src_rhs, s->tmp3_i32);
+        /* Compare the high part to the sign bit of the truncated result */
+        tcg_gen_negsetcondi_i32(TCG_COND_LT, s->tmp2_i32, s->tmp2_i32, 0);

This seems like something the optimizer should handle, but doesn't.
I'd write this as

    tcg_gen_sari_i32(tmp, tmp, 31);
or
    tcg_gen_sextract_i32(tmp, tmp, 31, 1);

which I know will expand to the same thing.

+    case MO_64:
+#endif
+        cc_src_rhs = tcg_temp_new();
+        tcg_gen_muls2_tl(s->T0, cc_src_rhs, s->T0, s->T1);
+        /* Compare the high part to the sign bit of the truncated result */
+        tcg_gen_negsetcondi_tl(TCG_COND_LT, s->T1, s->T0, 0);

Similarly.


r~

Reply via email to