Re: [Qemu-devel] [PATCH 2/2] target-mips: reimplement SC instruction and use cmpxchg

2016-09-27 Thread Leon Alrae
On Wed, Sep 21, 2016 at 01:16:28PM -0700, Richard Henderson wrote:
> On 09/21/2016 01:07 AM, Leon Alrae wrote:
> >+tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l1);
> >+tcg_temp_free(addr);
> >+tcg_gen_movi_tl(t0, 0);
> >+tcg_gen_br(done);
> >+
> >+gen_set_label(l1);
> >+/* generate cmpxchg */
> >+val = tcg_temp_new();
> >+gen_load_gpr(val, rt);
> >+tcg_gen_atomic_cmpxchg_tl(t0, cpu_lladdr, cpu_llval, val,
> >+  ctx->mem_idx, tcg_mo);
> >+tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_llval);
> >+tcg_temp_free(val);
> >+
> >+gen_set_label(done);
> >+/* store the result into the register */
> >+gen_store_gpr(t0, rt);
> > tcg_temp_free(t0);
> 
> The only thing I would change is to duplicate the gen_store_gpr into
> both branches, so that we don't have to store t0 into the stack
> across the blocks.

Done in v3.

> 
> Otherwise,
> 
> Reviewed-by: Richard Henderson 

Thanks for reviewing.

Leon



Re: [Qemu-devel] [PATCH 2/2] target-mips: reimplement SC instruction and use cmpxchg

2016-09-21 Thread Richard Henderson

On 09/21/2016 01:07 AM, Leon Alrae wrote:

+tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l1);
+tcg_temp_free(addr);
+tcg_gen_movi_tl(t0, 0);
+tcg_gen_br(done);
+
+gen_set_label(l1);
+/* generate cmpxchg */
+val = tcg_temp_new();
+gen_load_gpr(val, rt);
+tcg_gen_atomic_cmpxchg_tl(t0, cpu_lladdr, cpu_llval, val,
+  ctx->mem_idx, tcg_mo);
+tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_llval);
+tcg_temp_free(val);
+
+gen_set_label(done);
+/* store the result into the register */
+gen_store_gpr(t0, rt);
 tcg_temp_free(t0);


The only thing I would change is to duplicate the gen_store_gpr into both 
branches, so that we don't have to store t0 into the stack across the blocks.


Otherwise,

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 2/2] target-mips: reimplement SC instruction and use cmpxchg

2016-09-19 Thread Leon Alrae
On Fri, Sep 16, 2016 at 09:48:51AM -0700, Richard Henderson wrote:
> On 09/15/2016 01:44 AM, Leon Alrae wrote:
> > /* Store conditional */
> >+static void gen_st_cond(DisasContext *ctx, int rt, int base, int offset,
> >+int size)
> > {
> >+TCGv addr, t0, val;
> >+TCGLabel *l1 = gen_new_label();
> >+TCGLabel *l2 = gen_new_label();
> >+TCGLabel *done = gen_new_label();
> >
> >-#ifdef CONFIG_USER_ONLY
> > t0 = tcg_temp_local_new();
> >+addr = tcg_temp_local_new();
> >+/* check the alignment of the address */
> >+gen_base_offset_addr(ctx, addr, base, offset);
> >+tcg_gen_andi_tl(t0, addr, size - 1);
> 
> You shouldn't have to test the alignment here, as the alignment
> should have been tested during the load-locked, and the (aligned)
> address will be compared.

This is to satisfy the requirement that unaligned SC generates Address
Error exception. But I agree that in practice this doesn't seem
particularly useful since LL will do that.

> 
> 
> >+/* compare the address against that of the preceeding LL */
> >+tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l2);
> >+tcg_gen_movi_tl(t0, 0);
> >+tcg_gen_br(done);
> ...
> >+#ifdef TARGET_MIPS64
> >+case 8: /* SCD */
> >+tcg_gen_atomic_cmpxchg_i64(t0, addr, cpu_llval, val,
> >+   ctx->mem_idx, MO_TEQ);
> > break;
> > #endif
> >-case OPC_SC:
> >-case R6_OPC_SC:
> >-op_st_sc(t1, t0, rt, ctx);
> >+case 4: /* SC */
> >+{
> >+TCGv_i32 val32 = tcg_temp_new_i32();
> >+TCGv_i32 llval32 = tcg_temp_new_i32();
> >+TCGv_i32 old32 = tcg_temp_new_i32();
> >+tcg_gen_trunc_tl_i32(val32, val);
> >+tcg_gen_trunc_tl_i32(llval32, cpu_llval);
> >+
> >+tcg_gen_atomic_cmpxchg_i32(old32, addr, llval32, val32,
> >+   ctx->mem_idx, MO_TESL);
> >+tcg_gen_ext_i32_tl(t0, old32);
> 
> You can use tcg_gen_atomic_cmpxchg_tl so that you do not need to do
> all of this truncation yourself.  Which means that if you replace
> the size parameter with a TCGMemOp parameter (MO_TEQ vs MO_TESL) you
> can make all this code common.

Ah, yes.

> 
> Further, local temporaries are less than ideal and should be avoided
> if possible.  Using them results in an extra store into the local
> stack frame.
> 
> We can avoid this for addr by noting that once you have compared
> addr to cpu_lladdr, you can free addr and use cpu_lladdr in the
> actual cmpxchg.

Ok. I'll correct in v2.

Thanks,
Leon



Re: [Qemu-devel] [PATCH 2/2] target-mips: reimplement SC instruction and use cmpxchg

2016-09-16 Thread Richard Henderson

On 09/15/2016 01:44 AM, Leon Alrae wrote:

 /* Store conditional */
+static void gen_st_cond(DisasContext *ctx, int rt, int base, int offset,
+int size)
 {
+TCGv addr, t0, val;
+TCGLabel *l1 = gen_new_label();
+TCGLabel *l2 = gen_new_label();
+TCGLabel *done = gen_new_label();

-#ifdef CONFIG_USER_ONLY
 t0 = tcg_temp_local_new();
+addr = tcg_temp_local_new();
+/* check the alignment of the address */
+gen_base_offset_addr(ctx, addr, base, offset);
+tcg_gen_andi_tl(t0, addr, size - 1);


You shouldn't have to test the alignment here, as the alignment should have 
been tested during the load-locked, and the (aligned) address will be compared.




+/* compare the address against that of the preceeding LL */
+tcg_gen_brcond_tl(TCG_COND_EQ, addr, cpu_lladdr, l2);
+tcg_gen_movi_tl(t0, 0);
+tcg_gen_br(done);

...

+#ifdef TARGET_MIPS64
+case 8: /* SCD */
+tcg_gen_atomic_cmpxchg_i64(t0, addr, cpu_llval, val,
+   ctx->mem_idx, MO_TEQ);
 break;
 #endif
-case OPC_SC:
-case R6_OPC_SC:
-op_st_sc(t1, t0, rt, ctx);
+case 4: /* SC */
+{
+TCGv_i32 val32 = tcg_temp_new_i32();
+TCGv_i32 llval32 = tcg_temp_new_i32();
+TCGv_i32 old32 = tcg_temp_new_i32();
+tcg_gen_trunc_tl_i32(val32, val);
+tcg_gen_trunc_tl_i32(llval32, cpu_llval);
+
+tcg_gen_atomic_cmpxchg_i32(old32, addr, llval32, val32,
+   ctx->mem_idx, MO_TESL);
+tcg_gen_ext_i32_tl(t0, old32);


You can use tcg_gen_atomic_cmpxchg_tl so that you do not need to do all of this 
truncation yourself.  Which means that if you replace the size parameter with a 
TCGMemOp parameter (MO_TEQ vs MO_TESL) you can make all this code common.


Further, local temporaries are less than ideal and should be avoided if 
possible.  Using them results in an extra store into the local stack frame.


We can avoid this for addr by noting that once you have compared addr to 
cpu_lladdr, you can free addr and use cpu_lladdr in the actual cmpxchg.



r~