Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves

2015-07-18 Thread Richard Henderson

On 07/17/2015 11:33 AM, Aurelien Jarno wrote:

> For now I do wonder if we shouldn't make the size-changing extu/exts
> mandatory instead of reusing the 64-bit-only versions. This doesn't
> change the generated code, at least on x86.


I'd be surprised if it did anywhere.  I don't mind starting with them being 
required, and then figuring out a way to optimize.



r~



Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves

2015-07-18 Thread Aurelien Jarno
On 2015-07-18 08:24, Richard Henderson wrote:
> On 07/17/2015 11:33 AM, Aurelien Jarno wrote:
> > For now I do wonder if we shouldn't make the size-changing extu/exts
> > mandatory instead of reusing the 64-bit-only versions. This doesn't
> > change the generated code, at least on x86.
> 
> I'd be surprised if it did anywhere.  I don't mind starting with them being
> required, and then figuring out a way to optimize.

I have a patch series ready for that; if you want, I can post it as an RFC.

That said, looking more deeply into the problem you found, I guess we can
solve it easily by using the same convention as the real CPU for
storing 32-bit constants in the TCG optimizer.

This roughly means the following code for the 32-bit ops:

    /* 32-bit ops generate 32-bit results.  */
    if (!(def->flags & TCG_OPF_64BIT)) {
        if (!TCG_TARGET_HAS_ext_i32_i64) {
            /* registers are maintained sign-extended */
            mask = (int32_t)mask;
            affected = (int32_t)mask;
        } else if (!TCG_TARGET_HAS_extu_i32_i64) {
            /* registers are maintained zero-extended */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        } else {
            /* high bits will be computed by ext/extu_i32_i64 */
            mask = (uint32_t)mask;
            affected = (uint32_t)mask;
        }
    }

And that would be fine for the patch series I am preparing, as long as I
can predict the high part instead of considering it garbage.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves

2015-07-17 Thread Richard Henderson

On 07/15/2015 12:03 PM, Aurelien Jarno wrote:

> Now that we have real size-changing ops, we no longer need to mark the
> high bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size-changing
> op is replaced by a move.
> 
> This patch is basically a revert of 24666baf, including the changes that
> have been made since then.
> 
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Richard Henderson <r...@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>


What we're missing here is whether the omitted size-changing op is extu or
exts.  The mask should be extended to match.  Which means keeping most of
this code.



r~



Re: [Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves

2015-07-17 Thread Aurelien Jarno
On 2015-07-17 07:38, Richard Henderson wrote:
> On 07/15/2015 12:03 PM, Aurelien Jarno wrote:
> > Now that we have real size-changing ops, we no longer need to mark the
> > high bits of the destination as garbage. The goal of the optimizer is to
> > predict the value of the temps (and not of the registers) and do
> > simplifications when possible. The problem there is therefore not the
> > fact that those bits are not counted as garbage, but that a size-changing
> > op is replaced by a move.
> >
> > This patch is basically a revert of 24666baf, including the changes that
> > have been made since then.
> >
> > Cc: Paolo Bonzini <pbonz...@redhat.com>
> > Cc: Richard Henderson <r...@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
> 
> What we're missing here is whether the omitted size-changing op is extu or
> exts.  The mask should be extended to match.  Which means keeping most of
> this code.

I am afraid you're correct. Unfortunately, one of my goals is to remove
this part of the optimizer, as I need that for a patch series I am
preparing. I have also tried to check the temp type directly from the
optimizer (it is accessible), but it has some performance impact.
Propagating the extu/exts as real opcodes means carrying the information
about size changes up to the optimizer and the register allocator,
without having to recreate it from other available information.

For now I do wonder if we shouldn't make the size-changing extu/exts
mandatory instead of reusing the 64-bit-only versions. This doesn't
change the generated code, at least on x86.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] [PATCH RFC 8/9] tcg/optimize: do not simplify size changing moves

2015-07-15 Thread Aurelien Jarno
Now that we have real size-changing ops, we no longer need to mark the
high bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and do
simplifications when possible. The problem there is therefore not the
fact that those bits are not counted as garbage, but that a size-changing
op is replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.

Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Signed-off-by: Aurelien Jarno <aurel...@aurel32.net>
---
 tcg/optimize.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 18b7bc3..d1a0b6d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -197,19 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                              TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
 
     reset_temp(dst);
     temps[dst].state = TCG_TEMP_CONST;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -229,17 +223,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
 
     TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
 
     op->opc = new_op;
 
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = temps[src].mask;
 
     assert(temps[src].state != TCG_TEMP_CONST);
 
@@ -590,7 +578,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -945,17 +933,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
-- 
2.1.4