Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-12-10 Thread Yangfei (Felix)
  --- gcc/config/aarch64/aarch64.c(revision 217394)
  +++ gcc/config/aarch64/aarch64.c(working copy)
  @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p
 (unsigned i
#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
  aarch64_use_by_pieces_infrastructure_p
 
  +#undef TARGET_CAN_USE_DOLOOP_P
  +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
  +
struct gcc_target targetm = TARGET_INITIALIZER;
 
#include gt-aarch64.h
 
 
 Hi Felix,
 
 This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240
 when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out.
 
 Please could you have a look?
 
 Thanks,
 Tejas.

OK, I have reproduced with -fPIC option. Will take a look.  



Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-12-09 Thread Tejas Belagod

On 18/11/14 11:28, Yangfei (Felix) wrote:

On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:

+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+  if (optimize  0  flag_modulo_sched)


You'd be better off moving this condition into the expansion predicate (which is
currently ).

This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.


r~



Yeah, that's a good idea.  See my updated patch :-)


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.c (doloop_end): New pattern.
+   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
+
  2014-11-11  Andrew Pinski  apin...@cavium.com

Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,43 @@
[(set_attr type mrs)])


+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  optimize  0  flag_modulo_sched
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+
+  if (GET_MODE (operands[0]) != DImode)
+FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+})
+
  ;; AdvSIMD Stuff
  (include aarch64-simd.md)

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
aarch64_use_by_pieces_infrastructure_p

+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
  struct gcc_target targetm = TARGET_INITIALIZER;

  #include gt-aarch64.h



Hi Felix,

This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 
when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out.


Please could you have a look?

Thanks,
Tejas.



Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Marcus Shawcroft
On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote:

 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64.md (doloop_end): New pattern.
 +

This looks like a straight copy of the ARM code, but without the
TARGET_CAN_USE_DOLOOP_P definition.  If the reason for including this
code is for the benefit of module-sched then should the hook be
defined to limit the use of this pattern to inner most loops only?

Cheers
/Marcus


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Yangfei (Felix)
Yes, major code is borrowed from ARM port with the expected mode of loop pseudo 
changed to DImode. 
The function loop_canon_p called in sms_schedule can make sure that SMS is only 
performed for innermost loops. 
But I think it's a good idea to implement the TARGET_CAN_USE_DOLOOP_P hook 
here. 
See my updated patch below.  How about this one? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.c (doloop_end): New pattern.
+   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
+
 2014-11-11  Andrew Pinski  apin...@cavium.com
 
Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,47 @@
   [(set_attr type mrs)])
 
 
+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  
+{
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+  if (optimize  0  flag_modulo_sched)
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  if (GET_MODE (operands[0]) != DImode)
+   FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+}
+  else
+FAIL;
+})
+
 ;; AdvSIMD Stuff
 (include aarch64-simd.md)
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include gt-aarch64.h


 
 On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote:
 
  +2014-11-13  Felix Yang  felix.y...@huawei.com
  +
  + * config/aarch64/aarch64.md (doloop_end): New pattern.
  +
 
 This looks like a straight copy of the ARM code, but without the
 TARGET_CAN_USE_DOLOOP_P definition.  If the reason for including this code
 is for the benefit of module-sched then should the hook be defined to limit 
 the
 use of this pattern to inner most loops only?
 
 Cheers
 /Marcus


aarch64-doloop-v4.diff
Description: aarch64-doloop-v4.diff


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Richard Henderson
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:
 +(define_expand doloop_end
 +  [(use (match_operand 0  ))  ; loop pseudo
 +   (use (match_operand 1  ))] ; label
 +  
 +{
 +  /* Currently SMS relies on the do-loop pattern to recognize loops
 + where (1) the control part consists of all insns defining and/or
 + using a certain 'count' register and (2) the loop count can be
 + adjusted by modifying this register prior to the loop.
 + ??? The possible introduction of a new block to initialize the
 + new IV can potentially affect branch optimizations.  */
 +  if (optimize  0  flag_modulo_sched)

You'd be better off moving this condition into the expansion predicate (which
is currently ).

This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.


r~


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Yangfei (Felix)
 On 11/18/2014 11:48 AM, Yangfei (Felix) wrote:
  +(define_expand doloop_end
  +  [(use (match_operand 0  ))  ; loop pseudo
  +   (use (match_operand 1  ))] ; label
  +  
  +{
  +  /* Currently SMS relies on the do-loop pattern to recognize loops
  + where (1) the control part consists of all insns defining and/or
  + using a certain 'count' register and (2) the loop count can be
  + adjusted by modifying this register prior to the loop.
  + ??? The possible introduction of a new block to initialize the
  + new IV can potentially affect branch optimizations.  */
  +  if (optimize  0  flag_modulo_sched)
 
 You'd be better off moving this condition into the expansion predicate (which 
 is
 currently ).
 
 This short-circuits a lot of unnecessary work.  See  pass_rtl_doloop::gate.
 
 
 r~


Yeah, that's a good idea.  See my updated patch :-)


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217394)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2014-11-13  Felix Yang  felix.y...@huawei.com
+
+   * config/aarch64/aarch64.c (doloop_end): New pattern.
+   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
+
 2014-11-11  Andrew Pinski  apin...@cavium.com
 
Bug target/61997
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   (revision 217394)
+++ gcc/config/aarch64/aarch64.md   (working copy)
@@ -4087,6 +4087,43 @@
   [(set_attr type mrs)])
 
 
+;; Define the subtract-one-and-jump insns so loop.c
+;; knows what to generate.
+(define_expand doloop_end
+  [(use (match_operand 0  ))  ; loop pseudo
+   (use (match_operand 1  ))] ; label
+  optimize  0  flag_modulo_sched
+{
+  rtx s0;
+  rtx bcomp;
+  rtx loc_ref;
+  rtx cc_reg;
+  rtx insn;
+  rtx cmp;
+
+  /* Currently SMS relies on the do-loop pattern to recognize loops
+ where (1) the control part consists of all insns defining and/or
+ using a certain 'count' register and (2) the loop count can be
+ adjusted by modifying this register prior to the loop.
+ ??? The possible introduction of a new block to initialize the
+ new IV can potentially affect branch optimizations.  */
+
+  if (GET_MODE (operands[0]) != DImode)
+FAIL;
+
+  s0 = operands [0];
+  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
+
+  cmp = XVECEXP (PATTERN (insn), 0, 0);
+  cc_reg = SET_DEST (cmp);
+  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
+  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
+  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+  gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+loc_ref, pc_rtx)));
+  DONE;
+})
+
 ;; AdvSIMD Stuff
 (include aarch64-simd.md)
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c(revision 217394)
+++ gcc/config/aarch64/aarch64.c(working copy)
@@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_CAN_USE_DOLOOP_P
+#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include gt-aarch64.h



aarch64-doloop-v5.diff
Description: aarch64-doloop-v5.diff


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Richard Henderson
On 11/18/2014 12:28 PM, Yangfei (Felix) wrote:
 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64.c (doloop_end): New pattern.
 + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.

Looks good to me.  I'll leave it for aarch64 maintainers for final approval.


r~


Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-18 Thread Marcus Shawcroft
On 18 November 2014 11:28, Yangfei (Felix) felix.y...@huawei.com wrote:

 Yeah, that's a good idea.  See my updated patch :-)


 Index: gcc/ChangeLog
 ===
 --- gcc/ChangeLog   (revision 217394)
 +++ gcc/ChangeLog   (working copy)
 @@ -1,3 +1,8 @@
 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 +   * config/aarch64/aarch64.c (doloop_end): New pattern.
 +   * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement.
 +


OK with me /Marcus


[PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched

2014-11-17 Thread Yangfei (Felix)
PING?  Is it OK for me to apply this patch?  Thanks.


 
  On 11/12/2014 11:01 AM, Yangfei (Felix) wrote:
   +(define_expand doloop_end
   +  [(use (match_operand 0  ))  ; loop pseudo
   +   (use (match_operand 1  ))] ; label
   +  
   +  
   +{
 
  Drop the  surrounding the { }.
 
 
  r~
 
 
 Hello,
 I updated the patch with the redundant  removed.  Is it OK for trunk
 now?
 I hope this patch can catch up with stage 1 of GCC-5.0.  Thanks.
 
 
 Index: gcc/ChangeLog
 =
 ==
 --- gcc/ChangeLog (revision 217394)
 +++ gcc/ChangeLog (working copy)
 @@ -1,3 +1,7 @@
 +2014-11-13  Felix Yang  felix.y...@huawei.com
 +
 + * config/aarch64/aarch64.md (doloop_end): New pattern.
 +
  2014-11-11  Andrew Pinski  apin...@cavium.com
 
   Bug target/61997
 Index: gcc/config/aarch64/aarch64.md
 =
 ==
 --- gcc/config/aarch64/aarch64.md (revision 217394)
 +++ gcc/config/aarch64/aarch64.md (working copy)
 @@ -4087,6 +4087,47 @@
[(set_attr type mrs)])
 
 
 +;; Define the subtract-one-and-jump insns so loop.c ;; knows what to
 +generate.
 +(define_expand doloop_end
 +  [(use (match_operand 0  ))  ; loop pseudo
 +   (use (match_operand 1  ))] ; label
 +  
 +{
 +  /* Currently SMS relies on the do-loop pattern to recognize loops
 + where (1) the control part consists of all insns defining and/or
 + using a certain 'count' register and (2) the loop count can be
 + adjusted by modifying this register prior to the loop.
 + ??? The possible introduction of a new block to initialize the
 + new IV can potentially affect branch optimizations.  */
 +  if (optimize  0  flag_modulo_sched)
 +{
 +  rtx s0;
 +  rtx bcomp;
 +  rtx loc_ref;
 +  rtx cc_reg;
 +  rtx insn;
 +  rtx cmp;
 +
 +  if (GET_MODE (operands[0]) != DImode)
 + FAIL;
 +
 +  s0 = operands [0];
 +  insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1)));
 +
 +  cmp = XVECEXP (PATTERN (insn), 0, 0);
 +  cc_reg = SET_DEST (cmp);
 +  bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx);
 +  loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]);
 +  emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
 +gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
 +  loc_ref, pc_rtx)));
 +  DONE;
 +}
 +  else
 +FAIL;
 +})
 +
  ;; AdvSIMD Stuff
  (include aarch64-simd.md)