RE: [PATCH,ARM] Improve peepholes for LDM with commutative operators
I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. Thank you, Greta gcc/ChangeLog 2012-02-29 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml (write_ldm_commutative_peephole): Improved conditions of peepholes generating LDM followed by commutative operator. * config/arm/ldmstm.md: Regenerated. -Original Message- From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] Sent: 29 February 2012 00:41 To: Greta Yorsh Cc: gcc-patches@gcc.gnu.org; ram...@gcc.gnu.org; p...@codesourcery.com; ni...@redhat.com Subject: Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators [Sorry about the duplicate mail. My mailer seems to have eaten up the original reply I sent. ] On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? This is stage4 - I'd like to hear what the RM's think. Technically it's fixing a regression and is low risk to me. In any case there are a couple of changes that I'd like done as explained below. Thank you, Greta gcc/ChangeLog 2012-02-28 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that generate LDM followed by a commutative operator. * config/arm/ldmstm.md: Regenerated. Can you mention which 2 peepholes are changed in some way. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm- ldmstm.ml index 221edd2..5f5a5e0 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb = Printf.printf %s (match_operand:SI %d \s_register_operand\ \\)]))\n indent (nregs * 2 + 3); Printf.printf %s (clobber (reg:CC CC_REGNUM))])]\n indent end; - Printf.printf \(((operands[%d] == operands[0] operands[%d] == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3); - Printf.printf || (operands[%d] == operands[0] operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2); - Printf.printf peep2_reg_dead_p (%d, operands[0]) peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1); + Printf.printf \(((rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3); + Printf.printf || (rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2); + Printf.printf (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2); + Printf.printf (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2); begin if thumb then Printf.printf [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index 5db4a32..5db3d57 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -1160,9 +1160,10 @@ [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )])) (clobber (reg:CC CC_REGNUM))])] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) Line 80 characters - [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Again line 80 characters. Instead of rtx_equal_p, check that the REGNOs are equal. That will be cheaper: we know these are register_operands. For bonus points you use peep2_regno_dead_p with REGNO (operands[n]) instead of peep2_reg_dead_p. If we are accessing REGNO might as well reuse it :). regards Ramana diff --git a/gcc/config/arm/arm-ldmstm.ml
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On 29 February 2012 14:20, Greta Yorsh greta.yo...@arm.com wrote: I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. OK by me but please give 24 hours for an RM to comment / object. cheers Ramana
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On Thu, Mar 1, 2012 at 3:20 AM, Greta Yorsh greta.yo...@arm.com wrote: I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. I assume that on the Cortex-A9 this generates a LDM instead of an expensive LDRD. For reference, a tight load loop takes 2.5 s on the current version, 3.1 s with a LDRD, and 1.9 s with a LDM. -- Michael
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? Technically this is a regression in 4.7 compared to 4.6, so I'd like to get this in. However given the stage we are and given that it's not a correctness issue, I would defer to the RMs. In any case there are a couple of changes that I'd like done as explained below. [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Likewise. regards Ramana
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
[Sorry about the duplicate mail. My mailer seems to have eaten up the original reply I sent. ] On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? This is stage4 - I'd like to hear what the RM's think. Technically it's fixing a regression and is low risk to me. In any case there are a couple of changes that I'd like done as explained below. Thank you, Greta gcc/ChangeLog 2012-02-28 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that generate LDM followed by a commutative operator. * config/arm/ldmstm.md: Regenerated. Can you mention which 2 peepholes are changed in some way. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml index 221edd2..5f5a5e0 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb = Printf.printf %s (match_operand:SI %d \s_register_operand\ \\)]))\n indent (nregs * 2 + 3); Printf.printf %s (clobber (reg:CC CC_REGNUM))])]\n indent end; - Printf.printf \(((operands[%d] == operands[0] operands[%d] == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3); - Printf.printf || (operands[%d] == operands[0] operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2); - Printf.printf peep2_reg_dead_p (%d, operands[0]) peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1); + Printf.printf \(((rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3); + Printf.printf || (rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2); + Printf.printf (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2); + Printf.printf (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2); begin if thumb then Printf.printf [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index 5db4a32..5db3d57 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -1160,9 +1160,10 @@ [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )])) (clobber (reg:CC CC_REGNUM))])] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) Line 80 characters - [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Again line 80 characters. Instead of rtx_equal_p, check that the REGNOs are equal. That will be cheaper: we know these are register_operands. For bonus points you use peep2_regno_dead_p with REGNO (operands[n]) instead of peep2_reg_dead_p. If we are accessing REGNO might as well reuse it :). regards Ramana