Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On Thu, Dec 4, 2014 at 9:19 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
On 02/12/14 22:58, Ramana Radhakrishnan wrote:
On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
Hi all,

This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

+  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+    {
+      /* We are trying to fuse
+         movw imm / movt imm
+         instructions as a group that gets scheduled together. */
+

A comment here about the insn structure would be useful.

Done. It's similar to the aarch64 adrp+add case.

It does make it easier to read, thanks.

2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.

+      set_dest = SET_DEST (curr_set);
+      if (GET_CODE (set_dest) == ZERO_EXTRACT)
+        {
+          if (CONST_INT_P (SET_SRC (curr_set))
+              && CONST_INT_P (SET_SRC (prev_set))
+              && REG_P (XEXP (set_dest, 0))
+              && REG_P (SET_DEST (prev_set))
+              && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+            return true;
+        }
+      else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+               && REG_P (SET_DEST (curr_set))
+               && REG_P (SET_DEST (prev_set))
+               && GET_CODE (SET_SRC (prev_set)) == HIGH
+               && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+        {
+          return true;
+        }

Can we add a fast path exit to be

  if (GET_MODE (set_dest) != SImode)
    return false;

Done, but if/when we extend the function to handle more fusion cases it will need to be refactored, since we will want to just bail out of this MOVW+MOVT case rather than the whole function.

Sure - I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above.

Otherwise OK.

Here's the updated patch. I've tested on arm-none-eabi and made sure that the fusion still happens on the benchmarks I looked at.

Ok?

Ok - thanks, sorry about the slow response - been on vacation and still catching up.

regards
Ramana

Thanks,
Kyrill

Ramana

+    }
+  return false;

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.
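To make concrete why the two instructions are worth keeping together: MOVW writes a 16-bit immediate into the low half of a register and clears the top half, while MOVT overwrites only the top half and keeps the low half, so the MOVT always consumes the register the MOVW just produced. The C sketch below is a hypothetical model of those register semantics only (model_movw, model_movt and build_const are names made up for illustration, not GCC code). It also suggests why the MOVT side appears in RTL as a SET of a ZERO_EXTRACT of the register, which is what the first branch of the patch matches.

```c
#include <stdint.h>

/* Hypothetical model, not GCC code: MOVW writes a 16-bit immediate
   to the low half of the register and zeroes the high half.  */
static uint32_t model_movw (uint16_t imm)
{
  return (uint32_t) imm;
}

/* Hypothetical model: MOVT writes a 16-bit immediate to bits [31:16]
   and keeps bits [15:0], i.e. it only sets a 16-bit field of the
   register.  This field update is presumably why the patch matches the
   MOVT side as a SET whose destination is a ZERO_EXTRACT.  */
static uint32_t model_movt (uint32_t reg, uint16_t imm)
{
  return (reg & 0x0000FFFFu) | ((uint32_t) imm << 16);
}

/* Materialise a full 32-bit constant with a dependent movw/movt pair:
   the MOVT reads back the register the MOVW just wrote.  */
static uint32_t build_const (uint32_t value)
{
  uint32_t reg = model_movw ((uint16_t) (value & 0xFFFFu));
  return model_movt (reg, (uint16_t) (value >> 16));
}
```

For example, build_const (0x12345678u) models movw #0x5678 followed by movt #0x1234 and yields 0x12345678. The lo_sum case in the patch is the symbolic analogue: a HIGH set followed by a LO_SUM set of the same register.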
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping.

Thanks,
Kyrill

On 18/12/14 15:55, Kyrill Tkachov wrote:
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping.

Thanks,
Kyrill

On 11/12/14 15:06, Kyrill Tkachov wrote:
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html

Thanks,
Kyrill

On 04/12/14 09:19, Kyrill Tkachov wrote:
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On 02/12/14 22:58, Ramana Radhakrishnan wrote:
On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
Hi all,

This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

+  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+    {
+      /* We are trying to fuse
+         movw imm / movt imm
+         instructions as a group that gets scheduled together. */
+

A comment here about the insn structure would be useful.

Done. It's similar to the aarch64 adrp+add case.

It does make it easier to read, thanks.

2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.

+      set_dest = SET_DEST (curr_set);
+      if (GET_CODE (set_dest) == ZERO_EXTRACT)
+        {
+          if (CONST_INT_P (SET_SRC (curr_set))
+              && CONST_INT_P (SET_SRC (prev_set))
+              && REG_P (XEXP (set_dest, 0))
+              && REG_P (SET_DEST (prev_set))
+              && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+            return true;
+        }
+      else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+               && REG_P (SET_DEST (curr_set))
+               && REG_P (SET_DEST (prev_set))
+               && GET_CODE (SET_SRC (prev_set)) == HIGH
+               && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+        {
+          return true;
+        }

Can we add a fast path exit to be

  if (GET_MODE (set_dest) != SImode)
    return false;

Done, but if/when we extend the function to handle more fusion cases it will need to be refactored, since we will want to just bail out of this MOVW+MOVT case rather than the whole function.

I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above.

Otherwise OK.

Here's the updated patch. I've tested on arm-none-eabi and made sure that the fusion still happens on the benchmarks I looked at.

Ok?

Thanks,
Kyrill

Ramana

+    }
+  return false;

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 20cfa9f..19925e9 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -289,6 +289,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset. */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions. */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64494e8..6f847d6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -251,6 +251,7 @@ static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
+static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
 static void arm_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
@@ -291,6 +292,8 @@ static int arm_cortex_m_branch_cost (bool, bool);
 
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 					     const unsigned char *sel);
+static bool
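The predicate logic itself can be modelled outside GCC. The sketch below replaces the RTL with a hypothetical, heavily simplified struct (toy_set and its enums are invented for illustration and do not exist in GCC) and mirrors the two branches quoted in the thread: a constant MOVW followed by a ZERO_EXTRACT MOVT, and a HIGH followed by a LO_SUM, both writing the same register. As a deliberate variation, the SImode check sits inside the MOVW+MOVT helper so it bails out of that case only, which is the refactoring Kyrill anticipates once more fusion cases are added; in the revised patch it is an early exit from the whole function, which behaves the same while MOVW+MOVT is the only case.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the RTL the hook
   inspects.  None of these types exist in GCC.  */
enum toy_mode { TOY_SIMODE, TOY_DIMODE };
enum toy_dest_kind { TOY_DEST_REG, TOY_DEST_ZERO_EXTRACT };
enum toy_src_kind { TOY_SRC_CONST_INT, TOY_SRC_HIGH, TOY_SRC_LO_SUM, TOY_SRC_OTHER };

struct toy_set
{
  enum toy_mode mode;            /* Mode of the SET destination.  */
  enum toy_dest_kind dest_kind;  /* Plain register or ZERO_EXTRACT of one.  */
  int dest_regno;                /* Register written (inner reg for ZERO_EXTRACT).  */
  enum toy_src_kind src_kind;    /* Shape of the SET source.  */
};

/* The MOVW+MOVT case only.  The SImode check rejects this case,
   not the whole predicate.  */
static bool
toy_movw_movt_pair_p (const struct toy_set *prev, const struct toy_set *curr)
{
  if (curr->mode != TOY_SIMODE)
    return false;

  /* movw rN, #imm ; movt rN, #imm, roughly
     (set (reg N) (const_int)) ; (set (zero_extract (reg N) ...) (const_int)).  */
  if (curr->dest_kind == TOY_DEST_ZERO_EXTRACT)
    return curr->src_kind == TOY_SRC_CONST_INT
           && prev->src_kind == TOY_SRC_CONST_INT
           && prev->dest_kind == TOY_DEST_REG
           && curr->dest_regno == prev->dest_regno;

  /* movw/movt of a symbol, roughly
     (set (reg N) (high sym)) ; (set (reg N) (lo_sum (reg N) sym)).  */
  return curr->src_kind == TOY_SRC_LO_SUM
         && curr->dest_kind == TOY_DEST_REG
         && prev->dest_kind == TOY_DEST_REG
         && prev->src_kind == TOY_SRC_HIGH
         && curr->dest_regno == prev->dest_regno;
}

/* Top level: each fusion case gets its own check.  Only MOVW+MOVT
   exists so far; further cases would be tried here in turn.  */
static bool
toy_macro_fusion_pair_p (const struct toy_set *prev, const struct toy_set *curr)
{
  if (toy_movw_movt_pair_p (prev, curr))
    return true;
  return false;
}
```

With this shape, a future case for some other mode would not be rejected by the MOVW+MOVT mode restriction, at the cost of repeating cheap checks per case.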
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
Hi all,

This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

+  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+    {
+      /* We are trying to fuse
+         movw imm / movt imm
+         instructions as a group that gets scheduled together. */
+

A comment here about the insn structure would be useful.

+      set_dest = SET_DEST (curr_set);
+      if (GET_CODE (set_dest) == ZERO_EXTRACT)
+        {
+          if (CONST_INT_P (SET_SRC (curr_set))
+              && CONST_INT_P (SET_SRC (prev_set))
+              && REG_P (XEXP (set_dest, 0))
+              && REG_P (SET_DEST (prev_set))
+              && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+            return true;
+        }
+      else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+               && REG_P (SET_DEST (curr_set))
+               && REG_P (SET_DEST (prev_set))
+               && GET_CODE (SET_SRC (prev_set)) == HIGH
+               && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+        {
+          return true;
+        }

Can we add a fast path exit to be

  if (GET_MODE (set_dest) != SImode)
    return false;

I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above.

Otherwise OK.

Ramana

+    }
+  return false;

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.
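The new fuseable_ops field is a per-tuning bitmask: ARM_FUSE_NOTHING (0) means no fusion, and each fusable pair gets its own bit, starting with ARM_FUSE_MOVW_MOVT (1 << 0). The C sketch below shows how such a mask can gate the two hooks. toy_tune_params and the toy_* functions are hypothetical stand-ins, and the body of toy_macro_fusion_p is an assumption about what arm_macro_fusion_p does (report that fusion is wanted whenever the tuning enables at least one pair), since its body is not quoted in this thread.

```c
#include <stdbool.h>

/* Values as defined in the patch.  */
#define ARM_FUSE_NOTHING (0)
#define ARM_FUSE_MOVW_MOVT (1 << 0)

/* Hypothetical cut-down stand-in for struct tune_params:
   only the new field is modelled.  */
struct toy_tune_params
{
  unsigned int fuseable_ops;   /* Bitfield encoding the fuseable pairs.  */
};

/* Assumed behaviour of the TARGET_SCHED_MACRO_FUSION_P hook: ask the
   scheduler to look for fusion pairs only if some pair is enabled.  */
static bool
toy_macro_fusion_p (const struct toy_tune_params *tune)
{
  return tune->fuseable_ops != ARM_FUSE_NOTHING;
}

/* Per-case gate, as in the pair hook's
   "current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT" test.  */
static bool
toy_fusion_enabled_p (const struct toy_tune_params *tune, unsigned int pair)
{
  return (tune->fuseable_ops & pair) != 0;
}
```

Adding another pair later would then only need the next free bit, such as (1 << 1), plus setting it in the relevant tune_params entries; tunings that stay at ARM_FUSE_NOTHING skip the pair check entirely.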
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping.

Thanks,
Kyrill

On 11/11/14 11:55, Kyrill Tkachov wrote:
[PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Hi all,

This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
	* config/arm/arm.c (arm_macro_fusion_p): New function.
	(arm_macro_fusion_pair_p): Likewise.
	(TARGET_SCHED_MACRO_FUSION_P): Define.
	(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
	(ARM_FUSE_NOTHING): Likewise.
	(ARM_FUSE_MOVW_MOVT): Likewise.
	(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
	arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
	arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
	arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
	arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
	arm_cortex_a5_tune): Specify fuseable_ops value.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a37aa80..98e3cf0 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -281,6 +281,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset. */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions. */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3f2ddd4..40df4c0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -258,6 +258,7 @@ static tree arm_build_builtin_va_list (void);
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static bool arm_macro_fusion_p (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -296,6 +297,7 @@ static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
 
+static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 					     const unsigned char *sel);
 
@@ -404,6 +406,12 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes
 
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p
+
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p
+
 #undef TARGET_SET_DEFAULT_TYPE_ATTRIBUTES
 #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES arm_set_default_type_attributes
 
@@ -1710,6 +1718,9 @@ const struct cpu_cost_table v7m_extra_costs =
   }
 };
 
+#define ARM_FUSE_NOTHING (0)
+#define ARM_FUSE_MOVW_MOVT (1 << 0)
+
 const struct tune_params arm_slowmul_tune =
 {
   arm_slowmul_rtx_costs,
@@ -1726,7 +1737,8 @@ const struct tune_params arm_slowmul_tune =
   false, /* Prefer Neon for 64-bits bitops. */
   false, false, /* Prefer 32-bit encodings. */
   false, /* Prefer Neon for stringops. */
-  8 /* Maximum insns to inline memset. */
+  8, /* Maximum insns to inline memset. */
+  ARM_FUSE_NOTHING /* Fuseable pairs of instructions. */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -1745,7 +1757,8 @@ const struct tune_params arm_fastmul_tune =
   false, /* Prefer Neon for 64-bits bitops. */
   false, false, /* Prefer 32-bit encodings. */
   false, /* Prefer Neon for stringops. */
-  8 /* Maximum insns to inline memset. */
+  8, /* Maximum insns to inline memset. */
+  ARM_FUSE_NOTHING /* Fuseable pairs of instructions. */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -1767,7 +1780,8 @@ const struct tune_params arm_strongarm_tune =
   false, /* Prefer Neon for 64-bits bitops. */
   false, false, /* Prefer 32-bit encodings. */
   false, /* Prefer Neon for stringops. */
-  8 /* Maximum insns to inline memset. */
+  8, /* Maximum insns to inline memset. */
+  ARM_FUSE_NOTHING /* Fuseable pairs of instructions. */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -1786,7 +1800,8 @@ const struct tune_params arm_xscale_tune =
   false, /* Prefer Neon for 64-bits bitops. */
   false, false, /* Prefer 32-bit encodings. */
   false, /* Prefer Neon for stringops. */