Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Marcus Shawcroft
On 10 October 2014 11:53, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
 Hi all,


 Some early revisions of the Cortex-A53 have an erratum (835769) whereby
 it is possible for a 64-bit multiply-accumulate instruction in
 AArch64 state to generate an incorrect result.  The details are quite
 complex and hard to determine statically, since branches in the code
 may exist in some circumstances, but all cases end with a memory
 (load, store, or prefetch) instruction followed immediately by the
 multiply-accumulate operation.

 The safest work-around for this issue is to make the compiler avoid
 emitting multiply-accumulate instructions immediately after memory
 instructions, and the simplest way to do this is to insert a NOP. A
 linker patching technique can also be used, by moving potentially
 affected multiply-accumulate instructions into a patch region and
 replacing the original instruction with a branch to the patch.

 This patch achieves the compiler part by using the final prescan pass.
 The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
 so that conditional branches work properly.

 The fix is disabled by default and can be turned on by the
 -mfix-cortex-a53-835769 compiler option.

 I'm attaching a trunk and a 4.9 version of the patch.
 The 4.9 version is different only in that rtx_insn* is replaced by rtx.

 Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
 built various large benchmarks with it.

 Ok?


OK /Marcus


Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Kyrill Tkachov


On 10/10/14 15:18, Richard Earnshaw wrote:

On 10/10/14 11:53, Kyrill Tkachov wrote:

[...]

Thanks,
Kyrill

2014-10-10  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
	    Ramana Radhakrishnan  <ramana.radhakrish...@arm.com>

   * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
   (ADJUST_INSN_LENGTH): Define.
   * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
   * config/aarch64/aarch64.c (is_mem_p): New function.
   (is_memory_op): Likewise.
   (aarch64_prev_real_insn): Likewise.
   (is_madd_op): Likewise.
   (dep_between_memop_and_next): Likewise.
   (aarch64_madd_needs_nop): Likewise.
   (aarch64_final_prescan_insn): Likewise.
   * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
   and -mno-fix-cortex-a53-835769 options.


a53-workaround-new.patch


commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
Author: Kyrill Tkachov <kyrylo.tkac...@arm.com>
Date:   Wed Oct 8 12:48:34 2014 +

 [AArch64] Add final prescan workaround for Cortex-A53 erratum.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index b5f53d2..c57a467 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
  
  extern void aarch64_split_combinev16qi (rtx operands[3]);

  extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool aarch64_madd_needs_nop (rtx_insn *);
+extern void aarch64_final_prescan_insn (rtx_insn *);
  extern bool
  aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5144c35..76a2480 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
return NULL;
  }
  
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+  rtx pattern = PATTERN (mem_insn);
+  return for_each_rtx (pattern, is_mem_p, NULL);
+}
+
+/* Find the first rtx_insn before insn that will generate an assembly
+   instruction.  */
+
+static rtx_insn *
+aarch64_prev_real_insn (rtx_insn *insn)
+{
+  if (!insn)
+return NULL;
+
+  do
+{
+  insn = prev_real_insn (insn);
+}
+  while (insn && recog_memoized (insn) < 0);
+
+  return insn;
+}
+
+static bool
+is_madd_op (enum attr_type t1)
+{
+  unsigned int i;
+  /* A number of these may be AArch32 only.  */
+  enum attr_type mlatypes[] = {
+TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
+TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
+    TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS, TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
+  };
+
+  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
+{
+  if (t1 == mlatypes[i])
+   return true;
+}
+
+  return false;
+}
+
+/* Check if there is a register dependency between a load and the insn
+   for which we hold recog_data.  */
+
+static bool
+dep_between_memop_and_curr (rtx memop)
+{
+  rtx load_reg;
+  int opno;
+
+  if (!memop)
+return false;
+
+  if (!REG_P (SET_DEST (memop)))
+return false;
+
+  load_reg = SET_DEST (memop);
+  for (opno = 0; opno < recog_data.n_operands; opno++)
+    {
+      rtx operand = recog_data.operand[opno];
+      if (REG_P (operand)
+	  && reg_overlap_mentioned_p (load_reg, operand))
+	return true;
+
+    }
+ 

Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Richard Earnshaw
On 10/10/14 15:22, Kyrill Tkachov wrote:
 
 On 10/10/14 15:18, Richard Earnshaw wrote:
 On 10/10/14 11:53, Kyrill Tkachov wrote:
 [...]

Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Andrew Pinski
On Fri, Oct 10, 2014 at 3:53 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
 [...]

 Ok?

A few comments:
This:
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+   rtx pattern = PATTERN (mem_insn);
+   return for_each_rtx (pattern, is_mem_p, NULL);
+}

Should be using the new FOR_EACH_SUBRTX instead.  This should simplify
the code something like:

static bool
is_memory_op (rtx_insn *mem_insn)
{
  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, PATTERN (mem_insn), ALL)
if (MEM_P (*iter))
  return true;
  return false;
}

Also this should be a has_ function rather than an is_ function.

I think you should have ADJUST_INSN_LENGTH as a function call rather
than inline it there, or at least wrap it in do { } while (0), which
I think is a documented part of the coding style.

Also you added no comment before each function.  The coding style says
each function needs a comment describing what the function does.

Thanks,
Andrew Pinski

