RE: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Greta Yorsh
I'm attaching a new version of the patch. Fixed all comments and retested.
No regression on qemu --with-cpu cortex-a9.

Thank you,

Greta

gcc/ChangeLog

2012-02-29  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm-ldmstm.ml (write_ldm_commutative_peephole):
Improved
conditions of peepholes generating LDM followed by commutative
operator.
* config/arm/ldmstm.md: Regenerated.


 -Original Message-
 From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org]
 Sent: 29 February 2012 00:41
 To: Greta Yorsh
 Cc: gcc-patches@gcc.gnu.org; ram...@gcc.gnu.org; p...@codesourcery.com;
 ni...@redhat.com
 Subject: Re: [PATCH,ARM] Improve peepholes for LDM with commutative
 operators
 
 [Sorry about the duplicate mail. My mailer seems to have eaten up the
 original reply I sent. ]
 
 
 On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
  Is it OK for GCC 4.7 Stage 4 ?
 
 This is stage4 - I'd like to hear what the RM's think. Technically
 it's fixing a regression and is low risk to me.
 
 In any case there are a couple of changes that I'd like done
 as explained below.
 
 
  Thank you,
 
  Greta
 
  gcc/ChangeLog
 
  2012-02-28  Greta Yorsh  greta.yo...@arm.com
 
  * config/arm/arm-ldmstm.ml: Improved conditions of peepholes
 that
  generate
  LDM followed by a commutative operator.
  * config/arm/ldmstm.md: Regenerated.
 
 Can you mention which 2 peepholes are changed in some way.
 
  diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-
 ldmstm.ml
  index 221edd2..5f5a5e0 100644
  --- a/gcc/config/arm/arm-ldmstm.ml
  +++ b/gcc/config/arm/arm-ldmstm.ml
  @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
   Printf.printf %s  (match_operand:SI %d
 \s_register_operand\ \\)]))\n indent (nregs * 2 + 3);
   Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
 end;
  -  Printf.printf   \(((operands[%d] == operands[0]  operands[%d]
 == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
  -  Printf.printf  || (operands[%d] == operands[0] 
 operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
  -  Printf.printf  peep2_reg_dead_p (%d, operands[0]) 
 peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1);
  +  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2
 + 3);
  +  Printf.printf  || (rtx_equal_p (operands[%d], operands[0]) 
 rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2
 + 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) ||
 rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
  +  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) ||
 rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
 begin
   if thumb then
 Printf.printf   [(set (match_dup %d) (match_op_dup %d
 [(match_dup %d) (match_dup %d)]))]\n
  diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
  index 5db4a32..5db3d57 100644
  --- a/gcc/config/arm/ldmstm.md
  +++ b/gcc/config/arm/ldmstm.md
  @@ -1160,9 +1160,10 @@
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))
 (clobber (reg:CC CC_REGNUM))])]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
  + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p
 (operands[0], operands[4]))
  + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p
 (operands[1], operands[4])))
 
 Line  80 characters -
 
 
 [(parallel
   [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup
 7)]))
(clobber (reg:CC CC_REGNUM))])]
  @@ -1180,9 +1181,10 @@
   (match_operator:SI 5 commutative_binary_operator
[(match_operand:SI 6 s_register_operand )
 (match_operand:SI 7 s_register_operand )]))]
  -  (((operands[6] == operands[0]  operands[7] == operands[1])
  - || (operands[7] == operands[0]  operands[6] == operands[1]))
  - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3,
 operands[1]))
  +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p
 (operands[7], operands[1]))
  + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p
 (operands[6], operands[1])))
 
 Again line  80 characters.
 
 Instead of rtx_equal_p, check that the REGNOs are equal.
 That will be cheaper: we know these are register_operands.
 
 For bonus points you use peep2_regno_dead_p with REGNO (operands[n])
 instead of peep2_reg_dead_p. If we are accessing REGNO might as well
 reuse it :).
 
 regards
 Ramana
diff --git a/gcc/config/arm/arm-ldmstm.ml

Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Ramana Radhakrishnan
On 29 February 2012 14:20, Greta Yorsh greta.yo...@arm.com wrote:
 I'm attaching a new version of the patch. Fixed all comments and retested.
 No regression on qemu --with-cpu cortex-a9.


OK by me but please give 24 hours for an RM to comment / object.

cheers
Ramana


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-29 Thread Michael Hope
On Thu, Mar 1, 2012 at 3:20 AM, Greta Yorsh greta.yo...@arm.com wrote:
 I'm attaching a new version of the patch. Fixed all comments and retested.
 No regression on qemu --with-cpu cortex-a9.

I assume that on the Cortex-A9 this generates a LDM instead of an
expensive LDRD.  For reference, a tight load loop takes 2.5 s on the
current version, 3.1 s with a LDRD, and 1.9 s with a LDM.

-- Michael


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Ramana Radhakrishnan
On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
 Is it OK for GCC 4.7 Stage 4 ?

Technically this is a regression in 4.7 compared to 4.6, 
so I'd like to get this in. 

However given the stage we are and given that it's not a
correctness issue, I would defer to the RMs. 

In any case there are a couple of changes that I'd like done 
as explained below.

[(parallel
  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
   (clobber (reg:CC CC_REGNUM))])]
 @@ -1180,9 +1181,10 @@
  (match_operator:SI 5 commutative_binary_operator
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))

Likewise.

regards
Ramana


Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators

2012-02-28 Thread Ramana Radhakrishnan
[Sorry about the duplicate mail. My mailer seems to have eaten up the
original reply I sent. ]


On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote:
 Is it OK for GCC 4.7 Stage 4 ?

This is stage4 - I'd like to hear what the RM's think. Technically
it's fixing a regression and is low risk to me.

In any case there are a couple of changes that I'd like done
as explained below.


 Thank you,

 Greta

 gcc/ChangeLog

 2012-02-28  Greta Yorsh  greta.yo...@arm.com

 * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that
 generate
 LDM followed by a commutative operator.
 * config/arm/ldmstm.md: Regenerated.

Can you mention which 2 peepholes are changed in some way.

 diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml
 index 221edd2..5f5a5e0 100644
 --- a/gcc/config/arm/arm-ldmstm.ml
 +++ b/gcc/config/arm/arm-ldmstm.ml
 @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb =
  Printf.printf %s  (match_operand:SI %d \s_register_operand\ 
 \\)]))\n indent (nregs * 2 + 3);
  Printf.printf %s   (clobber (reg:CC CC_REGNUM))])]\n indent
end;
 -  Printf.printf   \(((operands[%d] == operands[0]  operands[%d] == 
 operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3);
 -  Printf.printf  || (operands[%d] == operands[0]  operands[%d] == 
 operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2);
 -  Printf.printf  peep2_reg_dead_p (%d, operands[0])  
 peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1);
 +  Printf.printf   \(((rtx_equal_p (operands[%d], operands[0])  
 rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3);
 +  Printf.printf  || (rtx_equal_p (operands[%d], operands[0])  
 rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2);
 +  Printf.printf  (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p 
 (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2);
 +  Printf.printf  (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p 
 (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2);
begin
  if thumb then
Printf.printf   [(set (match_dup %d) (match_op_dup %d [(match_dup %d) 
 (match_dup %d)]))]\n
 diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
 index 5db4a32..5db3d57 100644
 --- a/gcc/config/arm/ldmstm.md
 +++ b/gcc/config/arm/ldmstm.md
 @@ -1160,9 +1160,10 @@
  [(match_operand:SI 6 s_register_operand )
   (match_operand:SI 7 s_register_operand )]))
(clobber (reg:CC CC_REGNUM))])]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))
 + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], 
 operands[4]))
 + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], 
 operands[4])))

Line  80 characters -


[(parallel
  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
   (clobber (reg:CC CC_REGNUM))])]
 @@ -1180,9 +1181,10 @@
  (match_operator:SI 5 commutative_binary_operator
   [(match_operand:SI 6 s_register_operand )
(match_operand:SI 7 s_register_operand )]))]
 -  (((operands[6] == operands[0]  operands[7] == operands[1])
 - || (operands[7] == operands[0]  operands[6] == operands[1]))
 - peep2_reg_dead_p (3, operands[0])  peep2_reg_dead_p (3, 
 operands[1]))
 +  (((rtx_equal_p (operands[6], operands[0])  rtx_equal_p (operands[7], 
 operands[1]))
 + || (rtx_equal_p (operands[7], operands[0])  rtx_equal_p (operands[6], 
 operands[1])))

Again line  80 characters.

Instead of rtx_equal_p, check that the REGNOs are equal.
That will be cheaper: we know these are register_operands.

For bonus points you use peep2_regno_dead_p with REGNO (operands[n])
instead of peep2_reg_dead_p. If we are accessing REGNO might as well
reuse it :).

regards
Ramana