Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
--- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h Hi Felix, This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out. Please could you have a look? Thanks, Tejas. OK, I have reproduced with -fPIC option. Will take a look.
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 18/11/14 11:28, Yangfei (Felix) wrote: On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) You'd be better off moving this condition into the expansion predicate (which is currently ). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. r~ Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,43 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + optimize 0 flag_modulo_sched +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + + if (GET_MODE (operands[0]) != DImode) +FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h Hi Felix, This patch causes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64240 when sms-3 is tested with -fPIC. It runs fine when I reverse this patch out. Please could you have a look? Thanks, Tejas.
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + This looks like a straight copy of the ARM code, but without the TARGET_CAN_USE_DOLOOP_P definition. If the reason for including this code is for the benefit of module-sched then should the hook be defined to limit the use of this pattern to inner most loops only? Cheers /Marcus
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
Yes, major code is borrowed from ARM port with the expected mode of loop pseudo changed to DImode. The function loop_canon_p called in sms_schedule can make sure that SMS is only performed for innermost loops. But I think it's a good idea to implement the TARGET_CAN_USE_DOLOOP_P hook here. See my updated patch below. How about this one? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,47 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + if (GET_MODE (operands[0]) != DImode) + FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +} + else +FAIL; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h On 17 November 2014 07:59, Yangfei (Felix) felix.y...@huawei.com wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + This looks like a straight copy of the ARM code, but without the TARGET_CAN_USE_DOLOOP_P definition. If the reason for including this code is for the benefit of module-sched then should the hook be defined to limit the use of this pattern to inner most loops only? Cheers /Marcus aarch64-doloop-v4.diff Description: aarch64-doloop-v4.diff
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) You'd be better off moving this condition into the expansion predicate (which is currently ). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. r~
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 11:48 AM, Yangfei (Felix) wrote: +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) You'd be better off moving this condition into the expansion predicate (which is currently ). This short-circuits a lot of unnecessary work. See pass_rtl_doloop::gate. r~ Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,43 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c +;; knows what to generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + optimize 0 flag_modulo_sched +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + + if (GET_MODE (operands[0]) != DImode) +FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, + gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, +loc_ref, pc_rtx))); + DONE; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md) Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c(revision 217394) +++ gcc/config/aarch64/aarch64.c(working copy) @@ -10224,6 +10224,9 @@ aarch64_use_by_pieces_infrastructure_p (unsigned i #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_CAN_USE_DOLOOP_P +#define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-aarch64.h aarch64-doloop-v5.diff Description: aarch64-doloop-v5.diff
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 11/18/2014 12:28 PM, Yangfei (Felix) wrote: +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. Looks good to me. I'll leave it for aarch64 maintainers for final approval. r~
Re: [PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
On 18 November 2014 11:28, Yangfei (Felix) felix.y...@huawei.com wrote: Yeah, that's a good idea. See my updated patch :-) Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.c (doloop_end): New pattern. + * config/aarch64/aarch64.md (TARGET_CAN_USE_DOLOOP_P): Implement. + OK with me /Marcus
[PING ^ 3][PATCH, AArch64] Add doloop_end pattern for -fmodulo-sched
PING? Is it OK for me to apply this patch? Thanks. On 11/12/2014 11:01 AM, Yangfei (Felix) wrote: +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + + +{ Drop the surrounding the { }. r~ Hello, I updated the patch with the redundant removed. Is it OK for trunk now? I hope this patch can catch up with stage 1 of GCC-5.0. Thanks. Index: gcc/ChangeLog = == --- gcc/ChangeLog (revision 217394) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2014-11-13 Felix Yang felix.y...@huawei.com + + * config/aarch64/aarch64.md (doloop_end): New pattern. + 2014-11-11 Andrew Pinski apin...@cavium.com Bug target/61997 Index: gcc/config/aarch64/aarch64.md = == --- gcc/config/aarch64/aarch64.md (revision 217394) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -4087,6 +4087,47 @@ [(set_attr type mrs)]) +;; Define the subtract-one-and-jump insns so loop.c ;; knows what to +generate. +(define_expand doloop_end + [(use (match_operand 0 )) ; loop pseudo + (use (match_operand 1 ))] ; label + +{ + /* Currently SMS relies on the do-loop pattern to recognize loops + where (1) the control part consists of all insns defining and/or + using a certain 'count' register and (2) the loop count can be + adjusted by modifying this register prior to the loop. + ??? The possible introduction of a new block to initialize the + new IV can potentially affect branch optimizations. */ + if (optimize 0 flag_modulo_sched) +{ + rtx s0; + rtx bcomp; + rtx loc_ref; + rtx cc_reg; + rtx insn; + rtx cmp; + + if (GET_MODE (operands[0]) != DImode) + FAIL; + + s0 = operands [0]; + insn = emit_insn (gen_adddi3_compare0 (s0, s0, GEN_INT (-1))); + + cmp = XVECEXP (PATTERN (insn), 0, 0); + cc_reg = SET_DEST (cmp); + bcomp = gen_rtx_NE (VOIDmode, cc_reg, const0_rtx); + loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands [1]); + emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, +gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp, + loc_ref, pc_rtx))); + DONE; +} + else +FAIL; +}) + ;; AdvSIMD Stuff (include aarch64-simd.md)