Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2015-01-12 Thread Ramana Radhakrishnan
On Thu, Dec 4, 2014 at 9:19 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 On 02/12/14 22:58, Ramana Radhakrishnan wrote:

 On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 Hi all,

 This is the arm implementation of the macro fusion hook.
 It tries to fuse movw+movt operations together. It also tries to take lo_sum
 RTXs into account since those generate movt instructions as well.

 Bootstrapped and tested on arm-none-linux-gnueabihf.

 Ok for trunk?



   if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
 +{
 +  /* We are trying to fuse
 + movw imm / movt imm
 + instructions as a group that gets scheduled together.  */
 +

 A comment here about the insn structure would be useful.


 Done. It's similar to the aarch64 adrp+add case. It does make it easier to
 read, thanks.

 2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
   * config/arm/arm.c (arm_macro_fusion_p): New function.
   (arm_macro_fusion_pair_p): Likewise.
   (TARGET_SCHED_MACRO_FUSION_P): Define.
   (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
   (ARM_FUSE_NOTHING): Likewise.
   (ARM_FUSE_MOVW_MOVT): Likewise.
   (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
   arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
   arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
   arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
   arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
   arm_cortex_a5_tune): Specify fuseable_ops value.


 +  set_dest = SET_DEST (curr_set);
 +  if (GET_CODE (set_dest) == ZERO_EXTRACT)
 +    {
 +      if (CONST_INT_P (SET_SRC (curr_set))
 +          && CONST_INT_P (SET_SRC (prev_set))
 +          && REG_P (XEXP (set_dest, 0))
 +          && REG_P (SET_DEST (prev_set))
 +          && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
 +        return true;
 +    }
 +  else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
 +           && REG_P (SET_DEST (curr_set))
 +           && REG_P (SET_DEST (prev_set))
 +           && GET_CODE (SET_SRC (prev_set)) == HIGH
 +           && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
 +    {
 +      return true;
 +    }

 Can we add a fast path exit to be

 if (GET_MODE (set_dest) != SImode)
   return false;


 Done, but if/when we extend the function to handle more fusion cases it will
 need to be refactored, since we will want to just bail out of this MOVW+MOVT
 case rather than the whole function.

Sure -



 I did think whether we wanted to use reg_overlap_mentioned_p as that
 may simplify the logic a bit but that's  overkill here as we still
 want to restrict it to the cases above.

 Otherwise OK.


 Here's the updated patch. I've tested on arm-none-eabi and made sure that the
 fusion still happens on the benchmarks I looked at.
 Ok?

Ok - thanks, sorry about the slow response - been on vacation and
still catching up.

regards
Ramana


 Thanks,
 Kyrill



 Ramana




 +}
 +  return false;
 Thanks,
 Kyrill

 2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
  * config/arm/arm.c (arm_macro_fusion_p): New function.
  (arm_macro_fusion_pair_p): Likewise.
  (TARGET_SCHED_MACRO_FUSION_P): Define.
  (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
  (ARM_FUSE_NOTHING): Likewise.
  (ARM_FUSE_MOVW_MOVT): Likewise.
  (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
  arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
  arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
  arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
  arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
  arm_cortex_a5_tune): Specify fuseable_ops value.


Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2015-01-09 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-18 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html

Thanks,
Kyrill

Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-04 Thread Kyrill Tkachov


On 02/12/14 22:58, Ramana Radhakrishnan wrote:

On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This is the arm implementation of the macro fusion hook.
It tries to fuse movw+movt operations together. It also tries to take lo_sum
RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?




  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+{
+  /* We are trying to fuse
+ movw imm / movt imm
+ instructions as a group that gets scheduled together.  */
+

A comment here about the insn structure would be useful.


Done. It's similar to the aarch64 adrp+add case. It does make it easier 
to read, thanks.


2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
  * config/arm/arm.c (arm_macro_fusion_p): New function.
  (arm_macro_fusion_pair_p): Likewise.
  (TARGET_SCHED_MACRO_FUSION_P): Define.
  (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
  (ARM_FUSE_NOTHING): Likewise.
  (ARM_FUSE_MOVW_MOVT): Likewise.
  (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
  arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
  arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
  arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
  arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
  arm_cortex_a5_tune): Specify fuseable_ops value.




+  set_dest = SET_DEST (curr_set);
+  if (GET_CODE (set_dest) == ZERO_EXTRACT)
+    {
+      if (CONST_INT_P (SET_SRC (curr_set))
+          && CONST_INT_P (SET_SRC (prev_set))
+          && REG_P (XEXP (set_dest, 0))
+          && REG_P (SET_DEST (prev_set))
+          && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+        return true;
+    }
+  else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+           && REG_P (SET_DEST (curr_set))
+           && REG_P (SET_DEST (prev_set))
+           && GET_CODE (SET_SRC (prev_set)) == HIGH
+           && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+    {
+      return true;
+    }

Can we add a fast path exit to be

if (GET_MODE (set_dest) != SImode)
   return false;


Done, but if/when we extend the function to handle more fusion cases it
will need to be refactored, since we will want to just bail out of this
MOVW+MOVT case rather than the whole function.
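
The refactoring concern can be illustrated with a small standalone sketch. This is not GCC code: it assumes a second fusion kind, ARM_FUSE_HYPOTHETICAL, which is not part of this patch, and it replaces the RTL checks with plain booleans. Only ARM_FUSE_NOTHING, ARM_FUSE_MOVW_MOVT and the fuseable_ops field name come from the patch; every other name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARM_FUSE_NOTHING	(0)
#define ARM_FUSE_MOVW_MOVT	(1 << 0)
/* Hypothetical second fusion kind; NOT part of the patch.  */
#define ARM_FUSE_HYPOTHETICAL	(1 << 1)

/* Simplified stand-in for the tuning structure.  */
struct model_tune_params
{
  unsigned int fuseable_ops;
};

/* Simplified stand-in for a (prev, curr) instruction pair.  Each flag
   summarises a check that the real hook would perform on RTL.  */
struct model_pair
{
  bool dest_is_simode;
  bool matches_movw_movt;
  bool matches_hypothetical;
};

static bool
model_fusion_pair_p (const struct model_tune_params *tune,
		     const struct model_pair *pair)
{
  assert (tune != NULL && pair != NULL);

  if (tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
    {
      /* The SImode test only guards this case.  A failure here falls
	 through to the next case instead of leaving the function.  */
      if (pair->dest_is_simode && pair->matches_movw_movt)
	return true;
    }

  if (tune->fuseable_ops & ARM_FUSE_HYPOTHETICAL)
    {
      if (pair->matches_hypothetical)
	return true;
    }

  return false;
}
```

With a plain "return false" on the mode check at the top of the function, a non-SImode pair that matches the hypothetical second case would never be reached, which is the situation the refactoring would avoid.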




I did think whether we wanted to use reg_overlap_mentioned_p as that
may simplify the logic a bit but that's  overkill here as we still
want to restrict it to the cases above.

Otherwise OK.


Here's the updated patch. I've tested on arm-none-eabi and made sure
that the fusion still happens on the benchmarks I looked at.
Ok?

Thanks,
Kyrill



Ramana





+}
+  return false;
Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
 * config/arm/arm.c (arm_macro_fusion_p): New function.
 (arm_macro_fusion_pair_p): Likewise.
 (TARGET_SCHED_MACRO_FUSION_P): Define.
 (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
 (ARM_FUSE_NOTHING): Likewise.
 (ARM_FUSE_MOVW_MOVT): Likewise.
 (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
 arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
 arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
 arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
 arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
 arm_cortex_a5_tune): Specify fuseable_ops value.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 20cfa9f..19925e9 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -289,6 +289,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset.  */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions.  */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64494e8..6f847d6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -251,6 +251,7 @@ static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
+static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
 static void arm_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
@@ -291,6 +292,8 @@ static int arm_cortex_m_branch_cost (bool, bool);
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 	 const unsigned char *sel);
 
+static bool 

Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-02 Thread Ramana Radhakrishnan
On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 Hi all,

 This is the arm implementation of the macro fusion hook.
 It tries to fuse movw+movt operations together. It also tries to take lo_sum
 RTXs into account since those generate movt instructions as well.

 Bootstrapped and tested on arm-none-linux-gnueabihf.

 Ok for trunk?



  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+{
+  /* We are trying to fuse
+ movw imm / movt imm
+ instructions as a group that gets scheduled together.  */
+

A comment here about the insn structure would be useful.

+  set_dest = SET_DEST (curr_set);
+  if (GET_CODE (set_dest) == ZERO_EXTRACT)
+    {
+      if (CONST_INT_P (SET_SRC (curr_set))
+          && CONST_INT_P (SET_SRC (prev_set))
+          && REG_P (XEXP (set_dest, 0))
+          && REG_P (SET_DEST (prev_set))
+          && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+        return true;
+    }
+  else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+           && REG_P (SET_DEST (curr_set))
+           && REG_P (SET_DEST (prev_set))
+           && GET_CODE (SET_SRC (prev_set)) == HIGH
+           && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+    {
+      return true;
+    }

Can we add a fast path exit to be

if (GET_MODE (set_dest) != SImode)
  return false;

I did think whether we wanted to use reg_overlap_mentioned_p as that
may simplify the logic a bit but that's  overkill here as we still
want to restrict it to the cases above.
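
To make the two accepted shapes easier to see, here is a minimal standalone model of the check with the suggested fast-path exit folded in. It is only a sketch: the enums and struct model_set are hypothetical stand-ins for the RTL accessors used in the patch (SET_DEST, SET_SRC, GET_CODE, REGNO, GET_MODE), and none of these names exist in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for RTL.  */
enum dest_kind { DEST_REG, DEST_ZERO_EXTRACT_OF_REG, DEST_OTHER };
enum src_kind { SRC_CONST_INT, SRC_HIGH, SRC_LO_SUM, SRC_OTHER };
enum mode_kind { MODE_SI, MODE_OTHER };

struct model_set
{
  enum dest_kind dest;
  enum mode_kind dest_mode;
  unsigned int dest_regno;	/* Register written or extracted from.  */
  enum src_kind src;
};

/* Return true if PREV followed by CURR has one of the two shapes the
   patch treats as a movw/movt pair.  */
static bool
model_movw_movt_pair_p (const struct model_set *prev,
			const struct model_set *curr)
{
  assert (prev != NULL && curr != NULL);

  /* Fast-path exit: only SImode destinations are of interest.  */
  if (curr->dest_mode != MODE_SI)
    return false;

  /* Both shapes need PREV to set a plain register and CURR to refer
     to that same register.  */
  if (prev->dest != DEST_REG || prev->dest_regno != curr->dest_regno)
    return false;

  /* Shape 1: constant move, then constant written through a
     zero_extract of the same register.  */
  if (curr->dest == DEST_ZERO_EXTRACT_OF_REG)
    return curr->src == SRC_CONST_INT && prev->src == SRC_CONST_INT;

  /* Shape 2: high, then lo_sum into the same register.  */
  return (curr->dest == DEST_REG
	  && curr->src == SRC_LO_SUM
	  && prev->src == SRC_HIGH);
}
```

The model keeps the explicit register-number comparison rather than a general overlap test, so anything outside the two listed shapes is rejected.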

Otherwise OK.

Ramana




+}
+  return false;


 Thanks,
 Kyrill

 2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
 * config/arm/arm.c (arm_macro_fusion_p): New function.
 (arm_macro_fusion_pair_p): Likewise.
 (TARGET_SCHED_MACRO_FUSION_P): Define.
 (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
 (ARM_FUSE_NOTHING): Likewise.
 (ARM_FUSE_MOVW_MOVT): Likewise.
 (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
 arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
 arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
 arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
 arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
 arm_cortex_a5_tune): Specify fuseable_ops value.


Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-27 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill





[PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-11-11 Thread Kyrill Tkachov

Hi all,

This is the arm implementation of the macro fusion hook.
It tries to fuse movw+movt operations together. It also tries to take 
lo_sum RTXs into account since those generate movt instructions as well.
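
For readers less familiar with the pair being fused, the following is a minimal standalone C sketch, not GCC code, of what a movw followed by a movt on the same register computes. The function names (model_movw, model_movt, model_load_const) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MOVW: write a 16-bit immediate to the low half of the
   register and clear the top half.  */
static uint32_t
model_movw (uint16_t imm16)
{
  return (uint32_t) imm16;
}

/* Model of MOVT: write a 16-bit immediate to the top half of the
   register, leaving the low half unchanged.  */
static uint32_t
model_movt (uint32_t reg, uint16_t imm16)
{
  return (reg & 0xffffu) | ((uint32_t) imm16 << 16);
}

/* Build a full 32-bit constant the way a movw/movt pair would.  */
static uint32_t
model_load_const (uint32_t value)
{
  uint32_t reg = model_movw ((uint16_t) (value & 0xffffu));
  reg = model_movt (reg, (uint16_t) (value >> 16));
  assert (reg == value);
  return reg;
}
```

The second step reads and rewrites the register produced by the first, which is why the hook only considers pairs whose destination register matches.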


Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
* config/arm/arm.c (arm_macro_fusion_p): New function.
(arm_macro_fusion_pair_p): Likewise.
(TARGET_SCHED_MACRO_FUSION_P): Define.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
(ARM_FUSE_NOTHING): Likewise.
(ARM_FUSE_MOVW_MOVT): Likewise.
(arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
arm_cortex_a5_tune): Specify fuseable_ops value.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a37aa80..98e3cf0 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -281,6 +281,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset.  */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions.  */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3f2ddd4..40df4c0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -258,6 +258,7 @@ static tree arm_build_builtin_va_list (void);
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static bool arm_macro_fusion_p (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -296,6 +297,7 @@ static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
 
+static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*);
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 	 const unsigned char *sel);
 
@@ -404,6 +406,12 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes
 
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p
+
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p
+
 #undef  TARGET_SET_DEFAULT_TYPE_ATTRIBUTES
 #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES arm_set_default_type_attributes
 
@@ -1710,6 +1718,9 @@ const struct cpu_cost_table v7m_extra_costs =
   }
 };
 
+#define ARM_FUSE_NOTHING	(0)
+#define ARM_FUSE_MOVW_MOVT	(1 << 0)
+
 const struct tune_params arm_slowmul_tune =
 {
   arm_slowmul_rtx_costs,
@@ -1726,7 +1737,8 @@ const struct tune_params arm_slowmul_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -1745,7 +1757,8 @@ const struct tune_params arm_fastmul_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -1767,7 +1780,8 @@ const struct tune_params arm_strongarm_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */
-  8		/* Maximum insns to inline memset.  */
+  8,		/* Maximum insns to inline memset.  */
+  ARM_FUSE_NOTHING/* Fuseable pairs of instructions.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -1786,7 +1800,8 @@ const struct tune_params arm_xscale_tune =
   false,/* Prefer Neon for 64-bits bitops.  */
   false, false, /* Prefer 32-bit encodings.  */
   false,	/* Prefer Neon for stringops.  */