On Mon, Jan 10, 2011 at 07:23:46PM -0800, Richard Henderson wrote:
> Special case deposits that are implementable with byte and word stores.
> Otherwise implement with double-word shift plus rotates.
>
> Expose tcg_scratch_alloc to the backend for allocation of scratch registers.
>
> Signed-off-by: Richard Henderson <r...@twiddle.net>

Hi,

I've tested this patch a bit and got mixed results. I tested with
patched CRIS and MicroBlaze translators. The patch works OK (it doesn't
break anything) for the use cases I had, but I saw a bit of a slowdown
with MicroBlaze (compared to not using deposit at all).

I suspect that the fast 8- and 16-bit x86 deposits are giving me a
slight speedup with CRIS. But MicroBlaze deposits one-bit fields into
bits 2 and 31. Those seem to be slower with deposit than with other
TCG sequences.

I would have guessed that, at worst, this patch would be as fast as
any equivalent TCG sequence. Am I missing something?

These are the patches I've applied:

Microblaze translator:

diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index 2207431..39ab3a5 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -160,6 +160,7 @@ static void read_carry(DisasContext *dc, TCGv d)
 
 static void write_carry(DisasContext *dc, TCGv v)
 {
+#if 0
     TCGv t0 = tcg_temp_new();
     tcg_gen_shli_tl(t0, v, 31);
     tcg_gen_sari_tl(t0, t0, 31);
@@ -168,6 +169,10 @@ static void write_carry(DisasContext *dc, TCGv v)
                     ~(MSR_C | MSR_CC));
     tcg_gen_or_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], t0);
     tcg_temp_free(t0);
+#else
+    tcg_gen_deposit_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], v, 2, 1);
+    tcg_gen_deposit_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], v, 31, 1);
+#endif
 }

CRIS translator:

commit 9f427e14b2535a067bf046fea093f28cfaa92f7f
Author: Edgar E. Iglesias <edgar.igles...@gmail.com>
Date:   Fri Jan 21 22:09:44 2011 +0100

    cris: Use deposit for ALU writeback

    Most ALU insns on CRIS have deposit semantics in the writeback
    stage. Use the new deposit tcg operation to perform the write
    back to registers.

    Move the extract of the result into cc_result to the slow path
    in evaluate_flags.

    Signed-off-by: Edgar E. Iglesias <edgar.igles...@gmail.com>

diff --git a/target-cris/translate.c b/target-cris/translate.c
index f4cc125..018ce68 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -861,11 +861,6 @@ static void cris_alu_op_exec(DisasContext *dc, int op,
         BUG();
         break;
     }
-
-    if (size == 1)
-        tcg_gen_andi_tl(dst, dst, 0xff);
-    else if (size == 2)
-        tcg_gen_andi_tl(dst, dst, 0xffff);
 }
 
 static void cris_alu(DisasContext *dc, int op,
@@ -880,6 +875,7 @@ static void cris_alu(DisasContext *dc, int op,
         tmp = tcg_temp_new();
         writeback = 0;
     } else if (size == 4) {
+        /* We write directly into the dest. */
         tmp = d;
         writeback = 0;
     } else
@@ -892,11 +888,7 @@ static void cris_alu(DisasContext *dc, int op,
     /* Writeback. */
     if (writeback) {
-        if (size == 1)
-            tcg_gen_andi_tl(d, d, ~0xff);
-        else
-            tcg_gen_andi_tl(d, d, ~0xffff);
-        tcg_gen_or_tl(d, d, tmp);
+        tcg_gen_deposit_tl(d, d, tmp, 0, size * 8);
     }
     if (!TCGV_EQUAL(tmp, d))
         tcg_temp_free(tmp);
@@ -941,6 +933,10 @@ static void gen_tst_cc (DisasContext *dc, TCGv cc, int cond)
      * When this function is done, T0 should be non-zero if the condition
      * code is true.
      */
+    if (dc->cc_size != 4) {
+        tcg_gen_andi_tl(cc_result, cc_result,
+                        (1 << (dc->cc_size * 8)) - 1);
+    }
     arith_opt = arith_cc(dc) && !dc->flags_uptodate;
     move_opt = (dc->cc_op == CC_OP_MOVE);
     switch (cond) {
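
For reference, here is a minimal sketch in plain C of what the two write_carry variants above compute, so the intended equivalence is easy to check on ordinary integers. It models only the arithmetic, not the host code TCG generates, so it says nothing about the speed difference. The helper names (deposit32, write_carry_mask, write_carry_deposit) are hypothetical and are not QEMU functions. The MSR_C and MSR_CC values are assumed from the "bits 2 and 31" remark in this mail, and the deposit semantics (replace a len-bit field at bit pos with the low bits of the source) are assumed from the patch description.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, taken from the "bits 2 and 31" remark. */
#define MSR_C  (UINT32_C(1) << 2)
#define MSR_CC (UINT32_C(1) << 31)

/* Hypothetical reference model of a 32-bit deposit: replace the len-bit
 * field of dst that starts at bit pos with the low len bits of src. */
static uint32_t deposit32(uint32_t dst, uint32_t src,
                          unsigned pos, unsigned len)
{
    uint32_t field, mask;

    /* The field must be non-empty and fit inside 32 bits. */
    assert(pos < 32 && len >= 1 && len <= 32 - pos);

    field = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    mask = field << pos;
    return (dst & ~mask) | ((src << pos) & mask);
}

/* Models the "#if 0" branch: shift, mask and OR. */
static uint32_t write_carry_mask(uint32_t msr, uint32_t v)
{
    /* shli by 31 followed by sari by 31 replicates bit 0 of v
     * across the whole word. */
    uint32_t t0 = (v & 1) ? UINT32_MAX : 0;

    t0 &= (MSR_C | MSR_CC);
    msr &= ~(MSR_C | MSR_CC);
    return msr | t0;
}

/* Models the "#else" branch: two one-bit deposits. */
static uint32_t write_carry_deposit(uint32_t msr, uint32_t v)
{
    msr = deposit32(msr, v, 2, 1);
    msr = deposit32(msr, v, 31, 1);
    return msr;
}
```

Under these assumptions both variants return the same MSR for any input, since each only looks at bit 0 of v and only changes bits 2 and 31. For example, starting from an MSR of 0 with v = 1, both yield 0x80000004.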