Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Uros Bizjak
On Mon, Feb 6, 2012 at 10:30 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Hmm.  Well, the only thing that's going to work for x86 is the double-compare
 elimination portion.

 If we want to use this pass for x86, then for 4.8 we should also fix the
 discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

 and the combine canonical

  [(set-cc)
   (operate)]

 (Because of the simplicity of the substitution in compare-elim, I prefer
 the former as the canonical canonical.)

 You are probably referring to following testcase:

 --cut here--
 int test (int a, int b)
 {
  int lt = a + b  0;
  int eq = a + b == 0;
  if (lt)
    return 1;
  return eq;
 }
 --cut here--

 where combine creates:

 Trying 8 - 9:
 Successfully matched this instruction:
 (parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (plus:SI (reg/v:SI 63 [ a ])
                    (reg/v:SI 64 [ b ]))
                (const_int 0 [0])))
        (set (reg:SI 60 [ D.1710 ])
            (plus:SI (reg/v:SI 63 [ a ])
                (reg/v:SI 64 [ b ])))
    ])

Attached patch teaches combine to swap operands of a double set
pattern and retries recognition. Also added are minimum
target-dependant changes to handle the testcase above.

Unfortunately, compare elimination was not able to remove redundant
compare, although the testcase is carefully crafted to require only
sign flag to be valid. Following enters compare-elim pass:

(insn 9 8 10 2 (parallel [
(set (reg:SI 5 di [orig:60 D.1710 ] [60])
(plus:SI (reg/v:SI 5 di [orig:63 a ] [63])
(reg/v:SI 4 si [orig:64 b ] [64])))
(set (reg:CCZ 17 flags)
(compare:CCZ (plus:SI (reg/v:SI 5 di [orig:63 a ] [63])
(reg/v:SI 4 si [orig:64 b ] [64]))
(const_int 0 [0])))
]) cmp.c:4 261 {*addsi_2}
 (nil))

(note 10 9 33 2 NOTE_INSN_DELETED)

(insn 33 10 34 2 (set (reg:QI 1 dx [65])
(eq:QI (reg:CCZ 17 flags)
(const_int 0 [0]))) cmp.c:4 595 {*setcc_qi}
 (nil))

(insn 34 33 30 2 (set (reg:SI 1 dx [65])
(zero_extend:SI (reg:QI 1 dx [65]))) cmp.c:4 123
{*zero_extendqisi2_movzbl}
 (nil))

(insn 30 34 29 2 (set (reg/v:SI 0 ax [orig:59 eq ] [59])
(const_int 1 [0x1])) cmp.c:6 64 {*movsi_internal}
 (expr_list:REG_EQUAL (const_int 1 [0x1])
(nil)))

(insn 29 30 31 2 (set (reg:CCGOC 17 flags)
(compare:CCGOC (reg:SI 5 di [orig:60 D.1710 ] [60])
(const_int 0 [0]))) cmp.c:6 2 {*cmpsi_ccno_1}
 (nil))

(insn 31 29 25 2 (set (reg/v:SI 0 ax [orig:59 eq ] [59])
(if_then_else:SI (ge (reg:CCGOC 17 flags)
(const_int 0 [0]))
(reg:SI 1 dx [65])
(reg/v:SI 0 ax [orig:59 eq ] [59]))) cmp.c:6 903 {*movsicc_noc}
 (nil))

The resulting code still includes redundant test that sets sign flag:

test:
addl%esi, %edi
movl$1, %eax
sete%dl
testl   %edi, %edi
movzbl  %dl, %edx
cmovns  %edx, %eax
ret

(BTW: I think that the change to combine.c would be nice to have, to
find more other combine opportunities. I will propose the patch
separately.)

Uros.
Index: combine.c
===
--- combine.c   (revision 183953)
+++ combine.c   (working copy)
@@ -10687,6 +10687,30 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn
   print_rtl_single (dump_file, pat);
 }
 
+  /* If PAT is a PARALLEL with two SETs, swap the SETs and try again.  */
+  if (insn_code_number  0
+   GET_CODE (pat) == PARALLEL
+   XVECLEN (pat, 0) == 2
+   GET_CODE (XVECEXP (pat, 0, 0)) == SET
+   GET_CODE (XVECEXP (pat, 0, 1)) == SET)
+{
+  rtx set0 = XVECEXP (pat, 0, 0);
+  rtx set1 = XVECEXP (pat, 0, 1);
+
+  SUBST (XVECEXP (pat, 0, 0), set1);
+  SUBST (XVECEXP (pat, 0, 1), set0);
+
+  insn_code_number = recog (pat, insn, num_clobbers_to_add);
+  if (dump_file  (dump_flags  TDF_DETAILS))
+   {
+ if (insn_code_number  0)
+   fputs (Failed to match this instruction:\n, dump_file);
+ else
+   fputs (Successfully matched this instruction:\n, dump_file);
+ print_rtl_single (dump_file, pat);
+   }
+}
+
   /* If it isn't, there is the possibility that we previously had an insn
  that clobbered some register as a side effect, but the combined
  insn doesn't need to do that.  So try once more without the clobbers
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 183953)
+++ config/i386/i386.md (working copy)
@@ -5808,14 +5808,14 @@
(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2])
 
 (define_insn *addmode_2
-  [(set (reg FLAGS_REG)
+  [(set (match_operand:SWI 0 nonimmediate_operand =r,rm)
+   (plus:SWI
+ (match_operand:SWI 1 nonimmediate_operand %0,0)
+ 

Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Richard Guenther
On Tue, Feb 7, 2012 at 11:00 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Feb 6, 2012 at 10:30 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Hmm.  Well, the only thing that's going to work for x86 is the 
 double-compare
 elimination portion.

 If we want to use this pass for x86, then for 4.8 we should also fix the
 discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

 and the combine canonical

  [(set-cc)
   (operate)]

 (Because of the simplicity of the substitution in compare-elim, I prefer
 the former as the canonical canonical.)

 You are probably referring to following testcase:

 --cut here--
 int test (int a, int b)
 {
  int lt = a + b  0;
  int eq = a + b == 0;
  if (lt)
    return 1;
  return eq;
 }
 --cut here--

 where combine creates:

 Trying 8 - 9:
 Successfully matched this instruction:
 (parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (plus:SI (reg/v:SI 63 [ a ])
                    (reg/v:SI 64 [ b ]))
                (const_int 0 [0])))
        (set (reg:SI 60 [ D.1710 ])
            (plus:SI (reg/v:SI 63 [ a ])
                (reg/v:SI 64 [ b ])))
    ])

 Attached patch teaches combine to swap operands of a double set
 pattern and retries recognition. Also added are minimum
 target-dependant changes to handle the testcase above.

 Unfortunately, compare elimination was not able to remove redundant
 compare, although the testcase is carefully crafted to require only
 sign flag to be valid. Following enters compare-elim pass:

 (insn 9 8 10 2 (parallel [
            (set (reg:SI 5 di [orig:60 D.1710 ] [60])
                (plus:SI (reg/v:SI 5 di [orig:63 a ] [63])
                    (reg/v:SI 4 si [orig:64 b ] [64])))
            (set (reg:CCZ 17 flags)
                (compare:CCZ (plus:SI (reg/v:SI 5 di [orig:63 a ] [63])
                        (reg/v:SI 4 si [orig:64 b ] [64]))
                    (const_int 0 [0])))
        ]) cmp.c:4 261 {*addsi_2}
     (nil))

 (note 10 9 33 2 NOTE_INSN_DELETED)

 (insn 33 10 34 2 (set (reg:QI 1 dx [65])
        (eq:QI (reg:CCZ 17 flags)
            (const_int 0 [0]))) cmp.c:4 595 {*setcc_qi}
     (nil))

 (insn 34 33 30 2 (set (reg:SI 1 dx [65])
        (zero_extend:SI (reg:QI 1 dx [65]))) cmp.c:4 123
 {*zero_extendqisi2_movzbl}
     (nil))

 (insn 30 34 29 2 (set (reg/v:SI 0 ax [orig:59 eq ] [59])
        (const_int 1 [0x1])) cmp.c:6 64 {*movsi_internal}
     (expr_list:REG_EQUAL (const_int 1 [0x1])
        (nil)))

 (insn 29 30 31 2 (set (reg:CCGOC 17 flags)
        (compare:CCGOC (reg:SI 5 di [orig:60 D.1710 ] [60])
            (const_int 0 [0]))) cmp.c:6 2 {*cmpsi_ccno_1}
     (nil))

 (insn 31 29 25 2 (set (reg/v:SI 0 ax [orig:59 eq ] [59])
        (if_then_else:SI (ge (reg:CCGOC 17 flags)
                (const_int 0 [0]))
            (reg:SI 1 dx [65])
            (reg/v:SI 0 ax [orig:59 eq ] [59]))) cmp.c:6 903 {*movsicc_noc}
     (nil))

 The resulting code still includes redundant test that sets sign flag:

 test:
        addl    %esi, %edi
        movl    $1, %eax
        sete    %dl
     testl   %edi, %edi
        movzbl  %dl, %edx
        cmovns  %edx, %eax
        ret

 (BTW: I think that the change to combine.c would be nice to have, to
 find more other combine opportunities. I will propose the patch
 separately.)

Shouldn't there be a canonical order for parallels throughout the whole
compiler?  Maybe just enforced by gen_rtx_PARALLEL / RTL checking?
At least as far as I understand execution order of insns inside a PARALLEL
is undefined.

Richard.

 Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Uros Bizjak
On Tue, Feb 7, 2012 at 11:04 AM, Richard Guenther
richard.guent...@gmail.com wrote:

 (BTW: I think that the change to combine.c would be nice to have, to
 find more other combine opportunities. I will propose the patch
 separately.)

 Shouldn't there be a canonical order for parallels throughout the whole
 compiler?  Maybe just enforced by gen_rtx_PARALLEL / RTL checking?
 At least as far as I understand execution order of insns inside a PARALLEL
 is undefined.

All operations inside parallel happen at the same time. And there is
no canonical order enforced, as sadly shown by the discrepancy between
combine and compare elimination passes.

Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Richard Guenther
On Tue, Feb 7, 2012 at 11:46 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Feb 7, 2012 at 11:04 AM, Richard Guenther
 richard.guent...@gmail.com wrote:

 (BTW: I think that the change to combine.c would be nice to have, to
 find more other combine opportunities. I will propose the patch
 separately.)

 Shouldn't there be a canonical order for parallels throughout the whole
 compiler?  Maybe just enforced by gen_rtx_PARALLEL / RTL checking?
 At least as far as I understand execution order of insns inside a PARALLEL
 is undefined.

 All operations inside parallel happen at the same time. And there is
 no canonical order enforced, as sadly shown by the discrepancy between
 combine and compare elimination passes.

Sure - all what I say is that the fix should be to enforce such canonical
order instead of dealing with both.

Richard.

 Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Uros Bizjak
On Tue, Feb 7, 2012 at 11:48 AM, Richard Guenther
richard.guent...@gmail.com wrote:

 (BTW: I think that the change to combine.c would be nice to have, to
 find more other combine opportunities. I will propose the patch
 separately.)

 Shouldn't there be a canonical order for parallels throughout the whole
 compiler?  Maybe just enforced by gen_rtx_PARALLEL / RTL checking?
 At least as far as I understand execution order of insns inside a PARALLEL
 is undefined.

 All operations inside parallel happen at the same time. And there is
 no canonical order enforced, as sadly shown by the discrepancy between
 combine and compare elimination passes.

 Sure - all what I say is that the fix should be to enforce such canonical
 order instead of dealing with both.

rth proposed to adopt new scheme to change combine.c. However, I don't
think this is a good idea, since it would mean fixing many existing
in-tree and out-of-tree targets. OTOH, I thought that swapping
operands in combine would also benefit other parts of the compiler,
namely load/store multiple patterns, maybe swap insns, and similar.

It was also fairly easy to teach combine to handle both approaches. ;)

Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-07 Thread Uros Bizjak
On Tue, Feb 7, 2012 at 11:00 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Hmm.  Well, the only thing that's going to work for x86 is the 
 double-compare
 elimination portion.

 If we want to use this pass for x86, then for 4.8 we should also fix the
 discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

 and the combine canonical

  [(set-cc)
   (operate)]

 (Because of the simplicity of the substitution in compare-elim, I prefer
 the former as the canonical canonical.)

 You are probably referring to following testcase:

 --cut here--
 int test (int a, int b)
 {
  int lt = a + b  0;
  int eq = a + b == 0;
  if (lt)
    return 1;
  return eq;
 }
 --cut here--

 where combine creates:

 Trying 8 - 9:
 Successfully matched this instruction:
 (parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (plus:SI (reg/v:SI 63 [ a ])
                    (reg/v:SI 64 [ b ]))
                (const_int 0 [0])))
        (set (reg:SI 60 [ D.1710 ])
            (plus:SI (reg/v:SI 63 [ a ])
                (reg/v:SI 64 [ b ])))
    ])

 Attached patch teaches combine to swap operands of a double set
 pattern and retries recognition. Also added are minimum
 target-dependant changes to handle the testcase above.

Please ignore this idea. I am preparing target-only patchset that
moves x86 entirely to post-reload flags handling (similar to rx and
mn10300 targets). Not a 4.7 material in any way.

Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-06 Thread Richard Henderson
On 02/05/2012 07:27 AM, Uros Bizjak wrote:
 Hello!
 
 Attached patch enables post-reload compare optimization pass for x86 targets.

Hmm.  Well, the only thing that's going to work for x86 is the double-compare
elimination portion.

If we want to use this pass for x86, then for 4.8 we should also fix the
discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

and the combine canonical

  [(set-cc)
   (operate)]

(Because of the simplicity of the substitution in compare-elim, I prefer
the former as the canonical canonical.)

And, really, we ought to come up with some trick to eliminate some of the
redundancy in patterns in the md file too.


r~


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-06 Thread Uros Bizjak
On Mon, Feb 6, 2012 at 9:59 PM, Richard Henderson r...@redhat.com wrote:
 On 02/05/2012 07:27 AM, Uros Bizjak wrote:
 Hello!

 Attached patch enables post-reload compare optimization pass for x86 targets.

 Hmm.  Well, the only thing that's going to work for x86 is the double-compare
 elimination portion.

 If we want to use this pass for x86, then for 4.8 we should also fix the
 discrepancy between the compare-elim canonical

  [(operate)
   (set-cc)]

 and the combine canonical

  [(set-cc)
   (operate)]

 (Because of the simplicity of the substitution in compare-elim, I prefer
 the former as the canonical canonical.)

You are probably referring to following testcase:

--cut here--
int test (int a, int b)
{
  int lt = a + b  0;
  int eq = a + b == 0;
  if (lt)
return 1;
  return eq;
}
--cut here--

where combine creates:

Trying 8 - 9:
Successfully matched this instruction:
(parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (plus:SI (reg/v:SI 63 [ a ])
(reg/v:SI 64 [ b ]))
(const_int 0 [0])))
(set (reg:SI 60 [ D.1710 ])
(plus:SI (reg/v:SI 63 [ a ])
(reg/v:SI 64 [ b ])))
])

Uros.


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-06 Thread Mike Stump
On Feb 6, 2012, at 12:59 PM, Richard Henderson wrote:
 And, really, we ought to come up with some trick to eliminate some of the
 redundancy in patterns in the md file too.

:-)  That'd be awesome...


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-05 Thread Steven Bosscher
On Sun, Feb 5, 2012 at 4:27 PM, Uros Bizjak ubiz...@gmail.com wrote:
 2012-02-05  Uros Bizjak  ubiz...@gmail.com

        PR target/28685
        * config/i386/i386.c (TARGET_FLAGS_REGNUM): New.


Hmm, how is this (apparently new in 2011) TARGET_FLAGS_REGNUM
different from the older targetm.fixed_condition_code_regs?

Ciao!
Steven


Re: [PATCH 4.8, i386]: Enable post-reload compare optimization pass (PR28685)

2012-02-05 Thread Uros Bizjak
On Sun, Feb 5, 2012 at 6:27 PM, Steven Bosscher stevenb@gmail.com wrote:

        PR target/28685
        * config/i386/i386.c (TARGET_FLAGS_REGNUM): New.


 Hmm, how is this (apparently new in 2011) TARGET_FLAGS_REGNUM
 different from the older targetm.fixed_condition_code_regs?

This is how backend enables the pass, please see
gate_compare_elim_after_reload function in gcc/compare-elim.c.

FWIW, the value could be anything != INVALID_REGNUM.

Uros.