RE: [PATCH,ARM] Improve peepholes for LDM with commutative operators
I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. Thank you, Greta gcc/ChangeLog 2012-02-29 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml (write_ldm_commutative_peephole): Improved conditions of peepholes generating LDM followed by commutative operator. * config/arm/ldmstm.md: Regenerated. -Original Message- From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] Sent: 29 February 2012 00:41 To: Greta Yorsh Cc: gcc-patches@gcc.gnu.org; ram...@gcc.gnu.org; p...@codesourcery.com; ni...@redhat.com Subject: Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators [Sorry about the duplicate mail. My mailer seems to have eaten up the original reply I sent. ] On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? This is stage4 - I'd like to hear what the RM's think. Technically it's fixing a regression and is low risk to me. In any case there are a couple of changes that I'd like done as explained below. Thank you, Greta gcc/ChangeLog 2012-02-28 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that generate LDM followed by a commutative operator. * config/arm/ldmstm.md: Regenerated. Can you mention which 2 peepholes are changed in some way. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm- ldmstm.ml index 221edd2..5f5a5e0 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb = Printf.printf %s (match_operand:SI %d \s_register_operand\ \\)]))\n indent (nregs * 2 + 3); Printf.printf %s (clobber (reg:CC CC_REGNUM))])]\n indent end; - Printf.printf \(((operands[%d] == operands[0] operands[%d] == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3); - Printf.printf || (operands[%d] == operands[0] operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2); - Printf.printf peep2_reg_dead_p (%d, operands[0]) peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1); + Printf.printf \(((rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3); + Printf.printf || (rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2); + Printf.printf (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2); + Printf.printf (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2); begin if thumb then Printf.printf [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index 5db4a32..5db3d57 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -1160,9 +1160,10 @@ [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )])) (clobber (reg:CC CC_REGNUM))])] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) Line 80 characters - [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Again line 80 characters. Instead of rtx_equal_p, check that the REGNOs are equal. That will be cheaper: we know these are register_operands. For bonus points you use peep2_regno_dead_p with REGNO (operands[n]) instead of peep2_reg_dead_p. If we are accessing REGNO might as well reuse it :). regards Ramana diff --git a/gcc/config/arm/arm-ldmstm.ml
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On 29 February 2012 14:20, Greta Yorsh greta.yo...@arm.com wrote: I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. OK by me but please give 24 hours for an RM to comment / object. cheers Ramana
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On Thu, Mar 1, 2012 at 3:20 AM, Greta Yorsh greta.yo...@arm.com wrote: I'm attaching a new version of the patch. Fixed all comments and retested. No regression on qemu --with-cpu cortex-a9. I assume that on the Cortex-A9 this generates a LDM instead of an expensive LDRD. For reference, a tight load loop takes 2.5 s on the current version, 3.1 s with a LDRD, and 1.9 s with a LDM. -- Michael
[PATCH,ARM] Improve peepholes for LDM with commutative operators
This patch improves existing peephole optimizations that merge individual LDRs into LDM, in the case that the order of registers in LDR instructions is not ascending, but the loaded values can be reordered because their uses commute. There are two changes: * use rtx__equal_p to compare operands (instead of plain ==) * identify more cases of dead registers in the pattern. For example, the following sequence LDR r1, [r2] LDR r0, [r2, #4] ADD r0, r0, r1 can be transformed into LDRD r0, r1, [r2] ADD r0, r0, r1 when r1 is dead after ADD. Such optimization opportunities are missed by the existing peephole conditions, because r0 is not dead after ADD. This patch enables such transformations. No regression on qemu for --target=arm-none-eabi --with-cpu=cortex-a15 and a9. This patch was originally submitted as part of a sequence of patches improving LDRD/STRD/LDM/STM generation: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00920.html but it is independent and it fixes a failures in one of the regression tests: PASS: gcc.target/arm/pr40457-1.c scan-assembler ldm Is it OK for GCC 4.7 Stage 4 ? Thank you, Greta gcc/ChangeLog 2012-02-28 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that generate LDM followed by a commutative operator. * config/arm/ldmstm.md: Regenerated. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml index 221edd2..5f5a5e0 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb = Printf.printf %s (match_operand:SI %d \s_register_operand\ \\)]))\n indent (nregs * 2 + 3); Printf.printf %s (clobber (reg:CC CC_REGNUM))])]\n indent end; - Printf.printf \(((operands[%d] == operands[0] operands[%d] == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3); - Printf.printf || (operands[%d] == operands[0] operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2); - Printf.printf peep2_reg_dead_p (%d, operands[0]) peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1); + Printf.printf \(((rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3); + Printf.printf || (rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2); + Printf.printf (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2); + Printf.printf (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2); begin if thumb then Printf.printf [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index 5db4a32..5db3d57 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -1160,9 +1160,10 @@ [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )])) (clobber (reg:CC CC_REGNUM))])] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))] { if (!gen_ldm_seq (operands, 2, true))
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? Technically this is a regression in 4.7 compared to 4.6, so I'd like to get this in. However given the stage we are and given that it's not a correctness issue, I would defer to the RMs. In any case there are a couple of changes that I'd like done as explained below. [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Likewise. regards Ramana
Re: [PATCH,ARM] Improve peepholes for LDM with commutative operators
[Sorry about the duplicate mail. My mailer seems to have eaten up the original reply I sent. ] On Tue, Feb 28, 2012 at 05:09:05PM -, Greta Yorsh wrote: Is it OK for GCC 4.7 Stage 4 ? This is stage4 - I'd like to hear what the RM's think. Technically it's fixing a regression and is low risk to me. In any case there are a couple of changes that I'd like done as explained below. Thank you, Greta gcc/ChangeLog 2012-02-28 Greta Yorsh greta.yo...@arm.com * config/arm/arm-ldmstm.ml: Improved conditions of peepholes that generate LDM followed by a commutative operator. * config/arm/ldmstm.md: Regenerated. Can you mention which 2 peepholes are changed in some way. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml index 221edd2..5f5a5e0 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -216,9 +216,10 @@ let write_ldm_commutative_peephole thumb = Printf.printf %s (match_operand:SI %d \s_register_operand\ \\)]))\n indent (nregs * 2 + 3); Printf.printf %s (clobber (reg:CC CC_REGNUM))])]\n indent end; - Printf.printf \(((operands[%d] == operands[0] operands[%d] == operands[1])\n (nregs * 2 + 2) (nregs * 2 + 3); - Printf.printf || (operands[%d] == operands[0] operands[%d] == operands[1]))\n (nregs * 2 + 3) (nregs * 2 + 2); - Printf.printf peep2_reg_dead_p (%d, operands[0]) peep2_reg_dead_p (%d, operands[1]))\\n (nregs + 1) (nregs + 1); + Printf.printf \(((rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1]))\n (nregs * 2 + 2) (nregs * 2 + 3); + Printf.printf || (rtx_equal_p (operands[%d], operands[0]) rtx_equal_p (operands[%d], operands[1])))\n (nregs * 2 + 3) (nregs * 2 + 2); + Printf.printf (peep2_reg_dead_p (%d, operands[0]) || rtx_equal_p (operands[0], operands[%d]))\n (nregs + 1) (nregs * 2); + Printf.printf (peep2_reg_dead_p (%d, operands[1]) || rtx_equal_p (operands[1], operands[%d])))\\n (nregs + 1) (nregs * 2); begin if thumb then Printf.printf [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index 5db4a32..5db3d57 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -1160,9 +1160,10 @@ [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )])) (clobber (reg:CC CC_REGNUM))])] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) + (peep2_reg_dead_p (3, operands[0]) || rtx_equal_p (operands[0], operands[4])) + (peep2_reg_dead_p (3, operands[1]) || rtx_equal_p (operands[1], operands[4]))) Line 80 characters - [(parallel [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)])) (clobber (reg:CC CC_REGNUM))])] @@ -1180,9 +1181,10 @@ (match_operator:SI 5 commutative_binary_operator [(match_operand:SI 6 s_register_operand ) (match_operand:SI 7 s_register_operand )]))] - (((operands[6] == operands[0] operands[7] == operands[1]) - || (operands[7] == operands[0] operands[6] == operands[1])) - peep2_reg_dead_p (3, operands[0]) peep2_reg_dead_p (3, operands[1])) + (((rtx_equal_p (operands[6], operands[0]) rtx_equal_p (operands[7], operands[1])) + || (rtx_equal_p (operands[7], operands[0]) rtx_equal_p (operands[6], operands[1]))) Again line 80 characters. Instead of rtx_equal_p, check that the REGNOs are equal. That will be cheaper: we know these are register_operands. For bonus points you use peep2_regno_dead_p with REGNO (operands[n]) instead of peep2_reg_dead_p. If we are accessing REGNO might as well reuse it :). regards Ramana