Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-06-14 Thread James Greenhalgh
On Thu, Apr 27, 2017 at 05:07:26AM +, Hurugalawadi, Naveen wrote:
> Hi Wilco,
> 
> >> You should only return true if there is a match, not if there is
> >> not a match.
> 
> Done.
> 
> Bootstrapped and Regression tested on AArch64 and X86_64.
> Please review the patch and let us know if its okay?

OK.

Thanks,
James

> 
> Thanks,
> Naveen
>    




Re: [PING 3] [PATCH] [AArch64] Implement ALU_BRANCH fusion

2017-06-14 Thread Hurugalawadi, Naveen
Hi Wilco,

>> That looks good to me now.

Thanks for the review and your okay for the patch.

Please consider this as a personal reminder to review the patch
at following link and let me know if its okay to commit?

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html

Thanks,
Naveen

Re: [PING 2] [PATCH] [AArch64] Implement ALU_BRANCH fusion

2017-05-26 Thread Wilco Dijkstra
Hurugalawadi, Naveen  wrote:
>
> Please consider this as a personal reminder to review the patch
> at following link and let me know your comments on the same.  
>
> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html

Looks good to me.

Wilco
   

Re: [PING 2] [PATCH] [AArch64] Implement ALU_BRANCH fusion

2017-05-26 Thread Hurugalawadi, Naveen
Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html

Thanks,
Naveen
   

Re: [PING] [PATCH] [AArch64] Implement ALU_BRANCH fusion

2017-05-10 Thread Hurugalawadi, Naveen
Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html

Thanks,
Naveen
   

Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-26 Thread Hurugalawadi, Naveen
Hi Wilco,

>> You should only return true if there is a match, not if there is
>> not a match.

Done.

Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?

Thanks,
Naveen
   diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e58e9d..d3b66f2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -14031,6 +14032,49 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* We're trying to match:
+	  prev (alu_insn) == (set (r0) plus ((r0) (r1/imm)))
+	  curr (cbz) ==  (set (pc) (if_then_else (eq/ne) (r0)
+			 (const_int 0))
+		 (label_ref ("SYM"))
+		 (pc))  */
+  if (SET_DEST (curr_set) == (pc_rtx)
+	  && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
+	  && REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0))
+	  && REG_P (SET_DEST (prev_set))
+	  && REGNO (SET_DEST (prev_set))
+	 == REGNO (XEXP (XEXP (SET_SRC (curr_set), 0), 0)))
+	{
+	  /* Fuse ALU operations followed by conditional branch instruction.  */
+	  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+	}
+}
+
   return false;
 }
 


Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-26 Thread Wilco Dijkstra
Hi Naveen,

This version has the same issue of claiming that all instructions should
be fused except for the cases that can be fused. You should only return
true if there is a match, not if there is not a match.

Cheers,
Wilco
   

Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-26 Thread Hurugalawadi, Naveen
Hi Wilco,

>> Same comment for this part, we want to return true if we match:

Thanks for pointing out about the confusion.

>> Note writing these complex conditions using positive logic makes them much
>> more readable - if you have to negate use !(X && Y && Z) rather than
>> !X || !Y || !Z.

Modified the code as required.

Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?

Thanks,
Naveen diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e385c4..1a63ad0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -13982,6 +13992,49 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* We're trying to match:
+	  prev (alu_insn) == (set (r0) plus ((r0) (r1/imm)))
+	  curr (cbz) ==  (set (pc) (if_then_else (eq/ne) (r0)
+			 (const_int 0))
+		 (label_ref ("SYM"))
+		 (pc))  */
+  if (! (SET_DEST (curr_set) == (pc_rtx)
+	 && GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
+	 && REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0))
+	 && REG_P (SET_DEST (prev_set))
+	 && REGNO (SET_DEST (prev_set))
+		== REGNO (XEXP (XEXP (SET_SRC (curr_set), 0), 0
+	return true;
+
+  /* Fuse ALU operations followed by conditional branch instruction.  */
+  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+}
+
   return false;
 }
 



Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-25 Thread Wilco Dijkstra
Hi Naveen,

> https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01369.html

Same comment for this part, we want to return true if we match:

+  if (SET_DEST (curr_set) != (pc_rtx)
+ || GET_CODE (SET_SRC (curr_set)) != IF_THEN_ELSE
+ || ! REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0))
+ || ! REG_P (SET_DEST (prev_set))
+ || REGNO (SET_DEST (prev_set))
+!= REGNO (XEXP (XEXP (SET_SRC (curr_set), 0), 0)))
+   return false;

Note writing these complex conditions using positive logic makes them much
more readable - if you have to negate use !(X && Y && Z) rather than !X || !Y 
|| !Z.

Wilco



    

Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-25 Thread Hurugalawadi, Naveen
Hi,  

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.  

https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01369.html

Thanks,
Naveen





Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-26 Thread Hurugalawadi, Naveen
Hi,

Thanks for the review and suggestions.

> I think the patch isn't quite complete yet. You will also need changes in
> generic code. Currently sched_macro_fuse_insns() does:

Modified the sched_macro_fuse_insns() as required.

> Basically the idea is to push the check for CC usage into target macros

Done. Pushed the check into target macros.

The modifications were generic and and quite different from ALU+BRANCH
fusion; a separate patch is posted with the above 2 modifications at:-
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01368.html

> Also in aarch64.c's macro fusion you need check that the branch
>> instruction uses the same register 

Added to check that same registers are used in ALU and Branch instruction.

Bootstrapped and Regression tested on AArch64.

Please review the patch and let us know if its okay?

Thanks,
Naveendiff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4f769a4..31bc5f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -13981,6 +13982,50 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* We're trying to match:
+	  prev (alu_insn) == (set (r0) plus ((r0) (r1/imm)))
+	  curr (cbz) ==  (set (pc) (if_then_else (eq/ne) (r0)
+			 (const_int 0))
+		 (label_ref ("SYM"))
+		 (pc))  */
+
+  if (SET_DEST (curr_set) != (pc_rtx)
+	  || GET_CODE (SET_SRC (curr_set)) != IF_THEN_ELSE
+	  || ! REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0))
+	  || ! REG_P (SET_DEST (prev_set))
+	  || REGNO (SET_DEST (prev_set))
+	 != REGNO (XEXP (XEXP (SET_SRC (curr_set), 0), 0)))
+	return false;
+
+  /* Fuse ALU operations followed by conditional branch instruction.  */
+  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+}
+
   return false;
 }
 


Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-20 Thread Andrew Pinski
On Wed, Mar 15, 2017 at 8:20 AM, Wilco Dijkstra  wrote:
> Hi,
>
> I think the patch isn't quite complete yet. You will also need changes in
> generic code. Currently sched_macro_fuse_insns() does:
>
>   if (any_condjump_p (insn))
> {
>   unsigned int condreg1, condreg2;
>   rtx cc_reg_1;
>   targetm.fixed_condition_code_regs (, );
>   cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
>   prev = prev_nonnote_nondebug_insn (insn);
>   if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
>   || !prev
>   || !modified_in_p (cc_reg_1, prev))
> return;
> }
>
> Ie. it explicitly looks for a flag-setting ALU instruction whose condition is
> used by a conditional branch, so none of the cases in your patch can match.
>
> Note this code also excludes all CBZ type branches as fusion candidates,
> is that intended too?

It is not intended that way; I did not even notice it after the
previous changes to make the macro_fusion more generic.  I wonder how
this code ever worked for the folks before we started to touch it :).

Naveen,
  Basically the idea is to push the check for CC usage into the target
macros (macro_fusion_pair_p in i386.c and aarch64.c are the only usage
of compare/branch fusion) instead of keeping it in the general code.

Also in aarch64.c's macro fusion you need check that the branch
instruction uses the same register as the other instruction sets like
the other code in this area.

Thanks,
Andrew Pinski



>
> Wilco


Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-15 Thread Wilco Dijkstra
Hi,

I think the patch isn't quite complete yet. You will also need changes in
generic code. Currently sched_macro_fuse_insns() does:

  if (any_condjump_p (insn))
{
  unsigned int condreg1, condreg2;
  rtx cc_reg_1;
  targetm.fixed_condition_code_regs (, );
  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
  prev = prev_nonnote_nondebug_insn (insn);
  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
  || !prev
  || !modified_in_p (cc_reg_1, prev))
return;
}

Ie. it explicitly looks for a flag-setting ALU instruction whose condition is
used by a conditional branch, so none of the cases in your patch can match.

Note this code also excludes all CBZ type branches as fusion candidates,
is that intended too?

Wilco


Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-15 Thread Hurugalawadi, Naveen
Hi Kyrill,

>> I suggest you reword the whole comment and not talk about particular CPUs
>> but rather about the kinds of instructions you want to fuse

Modified as per the comments. Had modified the earlier version of patch
which had the vulcan reservation before James comments.

Please find attached the modified patch with comments incorporated.

Thanks,
Naveen


diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a069427..3af0b1a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -13981,6 +13982,35 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* Fuse ALU operations followed by conditional branch instruction.  */
+  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+}
+
   return false;
 }
 


Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-15 Thread Kyrill Tkachov

Hi Naveen,

On 15/03/17 05:32, Hurugalawadi, Naveen wrote:

Hi James,


My reason for asking is that the instruction fusion implemented in LLVM
( lib/Target/AArch64/AArch64MacroFusion.cpp::shouldScheduleAdjacent )

Sorry. There seems to be some confusion in the branch instructions.
The branch should be conditional for ALU_BRANCH fusion.

Please find attached the modified patch that fuses ALU instructions and
conditional branches.

Bootstrapped and Regression tested on aarch64-thunder-linux.
Please review the patch and let us know if its okay?

Thanks,
Naveen
 


+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* These types correspond to the reservation "vulcan_alu_basic" for
+Broadcom Vulcan: these are ALU operations that produce a single uop
+during instruction decoding.  */

The comment here still looks wrong. There is no vulcan_alu_basic reservation in 
any of the scheduling models.
I suggest you reword the whole comment and not talk about particular CPUs, but 
rather about the kinds of instructions
you want to fuse. If a reader wants to know which CPUs enable this fusion they 
should be looking at the CPU tuning structures
rather than reading the comments here.

Kyrill




Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-14 Thread Hurugalawadi, Naveen
Hi James,

>> My reason for asking is that the instruction fusion implemented in LLVM
>> ( lib/Target/AArch64/AArch64MacroFusion.cpp::shouldScheduleAdjacent )

Sorry. There seems to be some confusion in the branch instructions.
The branch should be conditional for ALU_BRANCH fusion.

Please find attached the modified patch that fuses ALU instructions and
conditional branches.

Bootstrapped and Regression tested on aarch64-thunder-linux.
Please review the patch and let us know if its okay?

Thanks,
Naveen
diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a069427..f76a2ff 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -13981,6 +13982,37 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_condjump_p (curr))
+{
+  /* These types correspond to the reservation "vulcan_alu_basic" for
+	 Broadcom Vulcan: these are ALU operations that produce a single uop
+	 during instruction decoding.  */
+  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+}
+
   return false;
 }
 


Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-09 Thread James Greenhalgh
On Thu, Mar 09, 2017 at 06:22:33AM +, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> Thanks for the review and your comments.
> 
> >> I'd need more detail on what types of instruction pairs you
> >> are trying to fuse. 
> 
> The documentation mentions it as follows:-
> Single uop ALU instruction may fuse with adjacent branch instruction in the
> same bundle
> 
> >> This comment looks incorrect - there is no vulcan_alu_basic reservation
> 
> Modified as per comment.
> 
> Please let us know if the description is sufficient?

My reason for asking is that the instruction fusion implemented in LLVM
( lib/Target/AArch64/AArch64MacroFusion.cpp::shouldScheduleAdjacent ) is
between ALU instructions and conditional branches, while this patch fuses
ALU instructions and unconditional branches. I'm trying to understand why
there is a discrepancy, and consequently whether this patch is correct.

Your clarification helps, but it would be useful to know which sort of
branches you are actually targeting and to fix the disagreement between
this patch and LLVM.

Thanks,
James



Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-08 Thread Hurugalawadi, Naveen
Hi James,

Thanks for the review and your comments.

>> I'd need more detail on what types of instruction pairs you
>> are trying to fuse. 

The documentation mentions it as follows:-
Single uop ALU instruction may fuse with adjacent branch instruction in the 
same bundle

>> This comment looks incorrect - there is no vulcan_alu_basic reservation

Modified as per comment.

Please let us know if the description is sufficient?

Thanks,
Naveen

Re: [PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-08 Thread James Greenhalgh
On Mon, Mar 06, 2017 at 05:10:10AM +, Hurugalawadi, Naveen wrote:
> Hi,
> 
> Please find attached the patch that implements alu_branch fusion
> for AArch64.
> The patch doesn't change spec but improve other benchmarks.
> 
> Bootstrapped and Regression tested on aarch64-thunder-linux.
> Please review the patch and let us know if its okay for Stage-1?

This description is insufficient for me to review this patch - in
particular I'd need more detail on what types of instruction pairs you
are trying to fuse. From inspection you will be trying to fuse any
ALU operation with an unconditional direct branch. Is that what you
intend?

i.e. you are looking to fuse instruction sequences like:

  add   x0, x1, #5
  b .L3

  csel  x0, x1, x1, gt
  b .L4

Have I understood that right?

> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
> +  && any_uncondjump_p (curr))
> +{
> +  /* These types correspond to the reservation "vulcan_alu_basic" for
> +  Broadcom Vulcan: these are ALU operations that produce a single uop
> +  during instruction decoding.  */

This comment looks incorrect - there is no vulcan_alu_basic reservation
in trunk GCC.

Thanks,
James


[PATCH][AArch64] Implement ALU_BRANCH fusion

2017-03-05 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that implements alu_branch fusion
for AArch64.
The patch doesn't change spec but improve other benchmarks.

Bootstrapped and Regression tested on aarch64-thunder-linux.
Please review the patch and let us know if its okay for Stage-1?

Thanks,
Naveen

2017-03-06  Julian Brown  
Naveen H.S  

* config/aarch64/aarch64-fusion-pairs.def: Add ALU_BRANCH entry.
* config/aarch64/aarch64.c (AARCH64_FUSE_ALU_BRANCH): New fusion type.
(thunderx2t99_tunings): Set AARCH64_FUSE_ALU_BRANCH flag.
(aarch_macro_fusion_pair_p): Add support for AARCH64_FUSE_ALU_BRANCH.
diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def
index f0e6dbc..300cd00 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -34,5 +34,6 @@ AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
 AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
+AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa25d43..62f5461 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -792,7 +792,8 @@ static const struct tune_params thunderx2t99_tunings =
   _approx_modes,
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
+  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
   16,	/* function_align.  */
   8,	/* jump_align.  */
   16,	/* loop_align.  */
@@ -14063,6 +14064,37 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 return true;
 }
 
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && any_uncondjump_p (curr))
+{
+  /* These types correspond to the reservation "vulcan_alu_basic" for
+	 Broadcom Vulcan: these are ALU operations that produce a single uop
+	 during instruction decoding.  */
+  switch (get_attr_type (prev))
+	{
+	case TYPE_ALU_IMM:
+	case TYPE_ALU_SREG:
+	case TYPE_ADC_REG:
+	case TYPE_ADC_IMM:
+	case TYPE_ADCS_REG:
+	case TYPE_ADCS_IMM:
+	case TYPE_LOGIC_REG:
+	case TYPE_LOGIC_IMM:
+	case TYPE_CSEL:
+	case TYPE_ADR:
+	case TYPE_MOV_IMM:
+	case TYPE_SHIFT_REG:
+	case TYPE_SHIFT_IMM:
+	case TYPE_BFM:
+	case TYPE_RBIT:
+	case TYPE_REV:
+	case TYPE_EXTEND:
+	  return true;
+
+	default:;
+	}
+}
+
   return false;
 }