RE: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Greta Yorsh
I'm attaching a new version of the patch. Fixed all comments and retested.
No regression on qemu --with-cpu cortex-a9.

Thank you,

Greta

gcc/ChangeLog

2012-02-29  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml (write_ldm_commutative_peephole):
Improved
conditions of peepholes generating LDM followed by commutative
operator.
* config/arm/ldmstm.md: Regenerated.


 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
 Sent: 29 February 2012 00:41
 To: Greta Yorsh
 Cc: gcc-patches@gcc.gnu.org; ram...@gcc.gnu.org; p...@codesourcery.com;
 ni...@redhat.com
 Subject: Re: [PATCH,ARM] Improve peepholes for LDM with commutative
 operators
 
 [Sorry about the duplicate mail. My mailer seems to have eaten up the
 original reply I sent. ]
 
 
 On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
  Is it OK for GCC 4.7 Stage 4 ?
 
 This is stage4 - I'd like to hear what the RM's think. Technically
 it's fixing a regression and is low risk to me.
 
 In any case there are a couple of changes that I'd like done
 as explained below.
 
 
  Thank you,
 
  Greta
 
  gcc/ChangeLog
 
  2012-02-28  Greta Yorsh  greta.yo...@arm.com
 
  * config/arm/arm-ldmstm.ml: Improved conditions of peepholes
 that
  generate
  LDM followed by a commutative operator.
  * config/arm/ldmstm.md: Regenerated.
 
 Can you mention which 2 peepholes are changed in some way.
 
  diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-
 ldmstm.ml
  index 221edd2..5f5a5e0 100644
  --- a/gcc/config/arm/arm-ldmstm.ml
  +++ b/gcc/config/arm/arm-ldmstm.ml
  @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
   Printf.printf %s  (match_operand:SI %d
 \s_register_operand\ \\)]))\n indent (nregs * 2 + 3);
   Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
 end;
  -  Printf.printf   \(((operands[%d] == operands[0]  operands[%d]
 == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
  -  Printf.printf  || (operands[%d] == operands[0] 
 operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
  -  Printf.printf  peep2_reg_dead_p (%d, operands[0]) 
 peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1);
  +  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2
 + 3);
  +  Printf.printf  || (rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2
 + 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) ||
 rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) ||
 rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
 begin
   if thumb then
 Printf.printf   [(set (match_dup %d) (match_op_dup %d
 [(match_dup %d) (match_dup %d)]))]\n
  diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
  index 5db4a32..5db3d57 100644
  --- a/gcc/config/arm/ldmstm.md
  +++ b/gcc/config/arm/ldmstm.md
  @@ -1160,9 +1160,10 @@
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))
 (clobber (reg:CC CC_REGNUM))])]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
  + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p
 (operands[0], operands[4]))
  + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p
 (operands[1], operands[4])))
 
 Line  80 characters -
 
 
 [(parallel
   [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup
 7)]))
(clobber (reg:CC CC_REGNUM))])]
  @@ -1180,9 +1181,10 @@
   (match_operator:SI 5 commutative_binary_operator
[(match_operand:SI 6 s_register_operand )
 (match_operand:SI 7 s_register_operand )]))]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
 
 Again line  80 characters.
 
 Instead of rtx_equal_p, check that the REGNOs are equal.
 That will be cheaper: we know these are register_operands.
 
 For bonus points you use peep2_regno_dead_p with REGNO (operands[n])
 instead of peep2_reg_dead_p. If we are accessing REGNO might as well
 reuse it :).
 
 regards
 Ramana
diff --git a/gcc/config/arm/arm-ldmstm.ml

Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Ramana Radhakrishnan
On 29 February 2012 14:20, Greta Yorsh greta.yo...@arm.com wrote:
 I'm attaching a new version of the patch. Fixed all comments and retested.
 No regression on qemu --with-cpu cortex-a9.


OK by me but please give 24 hours for an RM to comment / object.

cheers
Ramana


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Michael Hope
On Thu, Mar 1, 2012 at 3:20 AM, Greta Yorsh greta.yo...@arm.com wrote:
 I'm attaching a new version of the patch. Fixed all comments and retested.
 No regression on qemu --with-cpu cortex-a9.

I assume that on the Cortex-A9 this generates a LDM instead of an
expensive LDRD.  For reference, a tight load loop takes 2.5 s on the
current version, 3.1 s with a LDRD, and 1.9 s with a LDM.

-- Michael


[PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Greta Yorsh
This patch improves existing peephole optimizations that merge individual
LDRs into LDM, in the case that the order of registers in LDR instructions
is not ascending, but the loaded values can be reordered because their uses
commute. 

There are two changes:
* use rtx__equal_p to compare operands (instead of plain ==)
* identify more cases of dead registers in the pattern.

For example, the following sequence
LDR r1, [r2]
LDR r0, [r2, #4]
ADD r0, r0, r1
can be transformed into
LDRD r0, r1, [r2]
ADD r0, r0, r1
when r1 is dead after ADD. Such optimization opportunities are missed by the
existing peephole conditions, because r0 is not dead after ADD. This patch
enables such transformations.

No regression on qemu for --target=arm-none-eabi --with-cpu=cortex-a15 and
a9.

This patch was originally submitted as part of a sequence of patches
improving LDRD/STRD/LDM/STM generation:
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00920.html
but it is independent and it fixes a failures in one of the regression
tests:
PASS: gcc.target/arm/pr40457-1.c scan-assembler ldm

Is it OK for GCC 4.7 Stage 4 ?

Thank you,

Greta

gcc/ChangeLog

2012-02-28  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
generate
LDM followed by a commutative operator.
* config/arm/ldmstm.md: Regenerated.
diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml
index 221edd2..5f5a5e0 100644
--- a/gcc/config/arm/arm-ldmstm.ml
+++ b/gcc/config/arm/arm-ldmstm.ml
@@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
 Printf.printf %s  (match_operand:SI %d \s_register_operand\ 
\\)]))\n indent (nregs * 2 + 3);
 Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
   end;
-  Printf.printf   \(((operands[%d] == operands[0]  operands[%d] == 
operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
-  Printf.printf  || (operands[%d] == operands[0]  operands[%d] == 
operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
-  Printf.printf  peep2_reg_dead_p (%d, operands[0])  peep2_reg_dead_p 
(%d, operands[1]))\\n (nregs + 1) (nregs + 1);
+  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0])  rtx_equal_p 
(operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3);
+  Printf.printf  || (rtx_equal_p (operands[%d], operands[0])  
rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2);
+  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p 
(operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
+  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p 
(operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
   begin
 if thumb then
   Printf.printf   [(set (match_dup %d) (match_op_dup %d [(match_dup %d) 
(match_dup %d)]))]\n
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 5db4a32..5db3d57 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -1160,9 +1160,10 @@
 [(match_operand:SI 6 s_register_operand )
  (match_operand:SI 7 s_register_operand )]))
   (clobber (reg:CC CC_REGNUM))])]
-  (((operands[6] == operands[0]  operands[7] == operands[1])
- || (operands[7] == operands[0]  operands[6] == operands[1]))
- peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, operands[1]))
+  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
operands[1]))
+ || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
operands[1])))
+ (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], 
operands[4]))
+ (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], 
operands[4])))
   [(parallel
 [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
  (clobber (reg:CC CC_REGNUM))])]
@@ -1180,9 +1181,10 @@
 (match_operator:SI 5 commutative_binary_operator
  [(match_operand:SI 6 s_register_operand )
   (match_operand:SI 7 s_register_operand )]))]
-  (((operands[6] == operands[0]  operands[7] == operands[1])
- || (operands[7] == operands[0]  operands[6] == operands[1]))
- peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, operands[1]))
+  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
operands[1]))
+ || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
operands[1])))
+ (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], 
operands[4]))
+ (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], 
operands[4])))
   [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))]
 {
   if (!gen_ldm_seq (operands, 2, true))


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Ramana Radhakrishnan
On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
 Is it OK for GCC 4.7 Stage 4 ?

Technically this is a regression in 4.7 compared to 4.6, 
so I'd like to get this in. 

However given the stage we are and given that it's not a
correctness issue, I would defer to the RMs. 

In any case there are a couple of changes that I'd like done 
as explained below.

[(parallel
  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
   (clobber (reg:CC CC_REGNUM))])]
 @@ -1180,9 +1181,10 @@
  (match_operator:SI 5 commutative_binary_operator
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))

Likewise.

regards
Ramana


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Ramana Radhakrishnan
[Sorry about the duplicate mail. My mailer seems to have eaten up the
original reply I sent. ]


On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
 Is it OK for GCC 4.7 Stage 4 ?

This is stage4 - I'd like to hear what the RM's think. Technically
it's fixing a regression and is low risk to me.

In any case there are a couple of changes that I'd like done
as explained below.


 Thank you,

 Greta

 gcc/ChangeLog

 2012-02-28  Greta Yorsh  greta.yo...@arm.com

 * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
 generate
 LDM followed by a commutative operator.
 * config/arm/ldmstm.md: Regenerated.

Can you mention which 2 peepholes are changed in some way.

 diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml
 index 221edd2..5f5a5e0 100644
 --- a/gcc/config/arm/arm-ldmstm.ml
 +++ b/gcc/config/arm/arm-ldmstm.ml
 @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
  Printf.printf %s  (match_operand:SI %d \s_register_operand\ 
 \\)]))\n indent (nregs * 2 + 3);
  Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
end;
 -  Printf.printf   \(((operands[%d] == operands[0]  operands[%d] == 
 operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
 -  Printf.printf  || (operands[%d] == operands[0]  operands[%d] == 
 operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
 -  Printf.printf  peep2_reg_dead_p (%d, operands[0])  
 peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1);
 +  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0])  
 rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3);
 +  Printf.printf  || (rtx_equal_p (operands[%d], operands[0])  
 rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2);
 +  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p 
 (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
 +  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p 
 (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
begin
  if thumb then
Printf.printf   [(set (match_dup %d) (match_op_dup %d [(match_dup %d) 
 (match_dup %d)]))]\n
 diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
 index 5db4a32..5db3d57 100644
 --- a/gcc/config/arm/ldmstm.md
 +++ b/gcc/config/arm/ldmstm.md
 @@ -1160,9 +1160,10 @@
  [(match_operand:SI 6 s_register_operand )
   (match_operand:SI 7 s_register_operand )]))
(clobber (reg:CC CC_REGNUM))])]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))
 + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], 
 operands[4]))
 + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], 
 operands[4])))

Line  80 characters -


[(parallel
  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
   (clobber (reg:CC CC_REGNUM))])]
 @@ -1180,9 +1181,10 @@
  (match_operator:SI 5 commutative_binary_operator
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))

Again line  80 characters.

Instead of rtx_equal_p, check that the REGNOs are equal.
That will be cheaper: we know these are register_operands.

For bonus points you use peep2_regno_dead_p with REGNO (operands[n])
instead of peep2_reg_dead_p. If we are accessing REGNO might as well
reuse it :).

regards
Ramana