Re: [backport gcc-10][AArch64] ACLE bf16 convert

2020-12-11 Thread Dennis Zhang via Gcc-patches
> 
> From: Kyrylo Tkachov 
> Sent: Friday, December 11, 2020 11:23 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 convert
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 10 December 2020 14:27
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > ; Richard Sandiford
> > 
> > Subject: [backport gcc-10][AArch64] ACLE bf16 convert
> >
> > Hi all,
> >
> > This patch backports the commit
> > f7d6961126a7f06c8089d8a58bd21be43bc16806.
> > The original is approved at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/557859.html
> > The only change is to remove FPCR-reading flags for builtin definition since
> > it's not supported in gcc-10.
> > Regtested and bootstrapped for aarch64-none-linux-gnu.
> >
> > Is it OK to backport?
> 
> Ok.
> Thanks,
> Kyrill

Thanks Kyrill!
The patch is committed as 702e45ee471422dee86d32fc84f617d341d33175.

Bests
Dennis


Re: [backport gcc-10][AArch64] ACLE bf16 get

2020-12-11 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Friday, December 11, 2020 11:58 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 get
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 10 December 2020 14:35
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > ; Richard Sandiford
> > 
> > Subject: [backport gcc-10][AArch64] ACLE bf16 get
> >
> > Hi all,
> >
> > This patch backports the commit
> > 3553c658533e430b232997bdfd97faf6606fb102.
> > The original is approved at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/557871.html
> > There is a change to remove FPCR-reading flag for builtin declaration since
> > it's not supported in gcc-10.
> >
> > Another change is to remove a test (bf16_get-be.c) that fails compiling on
> > aarch64-none-linux-gnu in the original patch.
> > This is reported at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/558195.html
> > The failure happens for several bf16 big-endian tests so the bug would be
> > fixed in a separate patch.
> > And the test should be added after the bug is fixed.
> >
> > Is it OK to backport?
> 
> But do the tests added here work for big-endian?
> Ok if they do.
> Thanks,
> Kyrill

Thanks for asking. The added test (bf16_get.c) works for both 
aarch64-none-linux-gnu and aarch64_be-none-linux-gnu.
The patch is committed as c25f7eac6555d67523f0520c7e93bbc398d0da84.

Cheers
Dennis


Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi Christophe,

> From: Christophe Lyon 
> Sent: Monday, November 9, 2020 1:38 PM
> To: Dennis Zhang
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Ramana 
> Radhakrishnan
> Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi,
>
> I have just noticed that the new test has:
> /* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
> /* { dg-additional-options "-O3" } */
> That is, the first line has a typo (space between dg and -additional-options),
> so the test is effectively compiled with -O3, and without
> -funsafe-math-optimizations
>
> Since I can see it passing, it looks like -funsafe-math-optimizations
> is not needed, can you clarify?
>
> Thanks

Thank you for the report. The '-funsafe-math-optimizations' option is not 
needed.
The typo is fixed by commit b46dd03fe94e2428cbcdbfc4d081d89ed604803a.

Bests
Dennis


[committed][Patch]arm: Fix typo in testcase mve-vsub_1.c

2020-12-10 Thread Dennis Zhang via Gcc-patches
This patch fixes a typo reported at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558478.html

gcc/testsuite/
* gcc.target/arm/simd/mve-vsub_1.c: Fix typo.
Remove needless dg-additional-options.

Cheers,
Dennis
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
index cb3ef3a14e0..842e5c6a30b 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
 /* { dg-additional-options "-O3" } */
 
 #include 


[backport gcc-10][AArch64] ACLE bf16 get

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch backports the commit 3553c658533e430b232997bdfd97faf6606fb102.
The original is approved at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557871.html
There is a change to remove the FPCR-reading flag from the builtin declaration since 
it's not supported in gcc-10.

Another change is to remove a test (bf16_get-be.c) from the original patch that 
fails to compile on aarch64-none-linux-gnu.
This is reported at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558195.html
The failure affects several bf16 big-endian tests, so the bug will be fixed 
in a separate patch and the test added back once it is fixed.

Is it OK to backport?

Cheers
Dennis
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..05726db1f6137f9ab29fcdd51f804199e24bbfcf 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -718,6 +718,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf.  */
+  VAR1 (UNOP, vget_lo_half, 0, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..43ac3cd40fe8379567b7a60772f360d37818e8e9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..0fd78a6fd076f788d2618c492a026246e61e438c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,20 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_low_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_lo_halfv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_high_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_hi_halfv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
new file mode 100644
index ..2193753ffbb6246aa16eb5033559b21266a556a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+
+#include 
+
+/*
+**test_vget_low_bf16:
+** ret
+*/
+bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a)
+{
+  return vget_low_bf16 (a);
+}
+
+/*
+**test_vget_high_bf16:
+** dup	d0, v0.d\[1\]
+** ret
+*/
+bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a)
+{
+  return vget_high_bf16 (a);
+}


[backport gcc-10][AArch64] ACLE bf16 convert

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch backports the commit f7d6961126a7f06c8089d8a58bd21be43bc16806.
The original is approved at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557859.html
The only change is to remove the FPCR-reading flags from the builtin definitions since 
they are not supported in gcc-10.
Regtested and bootstrapped for aarch64-none-linux-gnu.

Is it OK to backport?

Cheers
Dennis
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..7192f3954d311d89064707cfcb735efad4377c12 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -728,3 +728,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, v8bf)
   VAR1 (UNOP, bfcvt, 0, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, v8bf)
+  VAR1 (UNOP, bfcvt, 0, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..2e8aa668b107f039e4958b6998da180a6d11b881 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_imm")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc014300c489209c11abf41b1c47b7fbe..881615498d3d52662d7ebb3ab1e8d52d5a40cab8 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..69cccd3278642814f3961c5bf52be5639f5ef3f3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,27 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b1820d578bdf1619834f29b919f5c3f32..47af7c494d9b9d1f4b63e802efc293348a40e270 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
@@ -46,3 +46,43 @@ bfloat16_t test_bfcvt (float32_t a)
 {
   return vcvth_bf16_f32 (a);
 }
+
+/*
+**test_vcvt_f32_bf16:
+** shll	v0.4s, v0.4h, #16
+** ret
+*/
+float32x4_t test_vcvt_f32_bf16 (bfloat16x4_t a)
+{
+  return vcvt_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_low_f32_bf16:
+** shll	v0.4s, v0.4h, #16
+** ret
+*/
+float32x4_t test_vcvtq_low_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_low_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_high_f32_bf16:
+** shll2	v0.4s, v0.8h, #16
+** ret
+*/
+float32x4_t test_vcvtq_high_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_high_f32_bf16 (a);
+}
+
+/*
+**test_vcvtah_f32_bf16:
+** shl	d0, d0, 

Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-11-03 Thread Dennis Zhang via Gcc-patches

On 11/3/20 2:05 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
 VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
 VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
   
+  /* Implemented by aarch64_vget_halfv8bf.  */

+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)


This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)


+
 /* Implemented by aarch64_simd_mmlav16qi.  */
 VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
 VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
 [(set_attr "type" "neon_dot")]
   )
   
+;; vget_low/high_bf16

+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);


I think this needs to be:

aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.



I've updated the expander to use aarch64_simd_vect_par_cnst_half,
and the expander is split into two for getting the low and high halves
separately.
It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu
targets with new tests that include the -mbig-endian option.


+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
   ;; bfmmla
   (define_insn "aarch64_bfmmlaqv4sf"
 [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 (ior (match_test "op == constm1_rtx")
  (match_test "op == const1_rtx"))
   
+(define_predicate "aarch64_zero_or_1"

+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))


zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.



This predicate is removed since the immediate operand is no longer needed in
the new expanders.

Thanks for the reviews.
Is it OK for trunk now?


Looks good.  OK for trunk and branches, thanks.

Richard



Thanks for approval, Richard!
This patch is committed at 3553c658533e430b232997bdfd97faf6606fb102

Bests
Dennis


Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-11-03 Thread Dennis Zhang via Gcc-patches



On 11/2/20 7:05 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

Hi Richard,

On 10/29/20 5:48 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
 VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
 VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
 VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)


New intrinsics should use something more specific than “ALL”.
Since these functions are pure non-trapping integer operations,
I think they should use “AUTO_FP” instead.  (On reflection,
we should probably change the name.)


+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+   (unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+   UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]


I think this should be neon_shift_imm instead.

OK with those changes, thanks.

Richard



I've fixed the flag and the insn attribute.
I will commit it if there are no further issues.


LGTM, thanks.

Richard


Thanks Richard!
This patch is committed as f7d6961126a7f06c8089d8a58bd21be43bc16806.

Bests
Dennis


Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-11-03 Thread Dennis Zhang via Gcc-patches

Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
  
+  /* Implemented by aarch64_vget_halfv8bf.  */

+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)


This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)


+
/* Implemented by aarch64_simd_mmlav16qi.  */
VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
[(set_attr "type" "neon_dot")]
  )
  
+;; vget_low/high_bf16

+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);


I think this needs to be:

   aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.



I've updated the expander to use aarch64_simd_vect_par_cnst_half,
and the expander is split into two for getting the low and high halves 
separately.
It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu 
targets with new tests that include the -mbig-endian option.



+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
  ;; bfmmla
  (define_insn "aarch64_bfmmlaqv4sf"
[(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 (ior (match_test "op == constm1_rtx")
  (match_test "op == const1_rtx"))
  
+(define_predicate "aarch64_zero_or_1"

+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))


zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.



This predicate is removed since the immediate operand is no longer needed in 
the new expanders.


Thanks for the reviews.
Is it OK for trunk now?

Cheers
Dennis


diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index eb8e6f7b3d8..f26a96042bc 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -722,6 +722,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf.  */
+  VAR1 (UNOP, vget_lo_half, 0, AUTO_FP, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, AUTO_FP, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..af29a2f26f5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx 

Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-11-02 Thread Dennis Zhang via Gcc-patches

Hi Richard,

On 10/29/20 5:48 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)


New intrinsics should use something more specific than “ALL”.
Since these functions are pure non-trapping integer operations,
I think they should use “AUTO_FP” instead.  (On reflection,
we should probably change the name.)


+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+   (unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+   UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]


I think this should be neon_shift_imm instead.

OK with those changes, thanks.

Richard



I've fixed the flag and the insn attribute.
I will commit it if there are no further issues.
Thanks for the review.

Regards
Dennis
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index eb8e6f7b3d8..f494b535a30 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, FP, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, FP, v8bf)
   VAR1 (UNOP, bfcvt, 0, FP, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, AUTO_FP, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, AUTO_FP, v8bf)
+  VAR1 (UNOP, bfcvt, 0, AUTO_FP, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..030a086d31c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_imm")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc01..881615498d3 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba2..69cccd32786 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,27 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 __extension__ extern __inli

[PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-10-29 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch implements the ACLE intrinsics vget_low_bf16 and vget_high_bf16 to 
extract the lower or higher half of a bfloat16x8 vector.
vget_high_bf16 is implemented with a 'dup' instruction. vget_low_bf16 could be 
done with a 'dup' or 'mov', but is mostly optimized out since the result is simply 
the lower half of the source vector register.
The test for vget_low_bf16 only checks that the intrinsic compiles; no instruction 
is matched since none is generated in the test case.
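
For illustration, a caller-side sketch (the function below is my own example, not 
part of the patch, and assumes a compiler and target with the bf16 extension 
enabled):

#include <arm_neon.h>

/* Split a 128-bit bfloat16 vector into its two 64-bit halves.  */
void split_bf16 (bfloat16x8_t v, bfloat16x4_t *lo, bfloat16x4_t *hi)
{
  *lo = vget_low_bf16 (v);   /* lanes 0-3: usually no instruction is needed.  */
  *hi = vget_high_bf16 (v);  /* lanes 4-7: a 'dup dN, vM.d[1]'.  */
}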

Arm ACLE document at 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-29  Dennis Zhang  

* config/aarch64/aarch64-simd-builtins.def (vget_half): New entry.
* config/aarch64/aarch64-simd.md (aarch64_vget_halfv8bf): New entry.
* config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic.
(vget_high_bf16): Likewise.
* config/aarch64/predicates.md (aarch64_zero_or_1): New predicate
for zero or one immediate to indicate the lower or higher half.

gcc/testsuite/ChangeLog

2020-10-29  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
(test_vget_low_bf16, test_vget_high_bf16): New tests.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_halfv8bf.  */
+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 50f8b23bc17..c6ac0b8dd17 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35530,6 +35530,20 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_low_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_halfv8bf (__a, 0);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_high_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_halfv8bf (__a, 1);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 		 (ior (match_test "op == constm1_rtx")
 		  (match_test "op == const1_rtx"))
 
+(define_predicate "aarch64_zero_or_1"
+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))
+
 (define_predicate "aarch64_reg_or_orr_imm"
(ior (match_operand 0 "register_operand")
 	(and (match_code "const_vector")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
index c42c7acbbe9..35f4cb864f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
@@ -83,3 +83,14 @@ bfloat16_t test_vduph_laneq_bf16 (bfloat16x8_t a)
   return vduph_laneq_bf16 (a, 7);
 }
 /* { dg-final { scan-assembler-times "dup\\th\[0-9\]+, v\[0-9\]+\.h\\\[7\\\]" 2 } } */
+
+bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a)
+{
+  return vget_low_bf16 (a);
+}
+
+bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a)
+{
+  return vget_high_bf16 (a);
+}
+/* { dg-final { scan-assembler-times "dup\\td\[0-9\]+, v\[0-9\]+\.d\\\[1\\\]" 1 } } */


[PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-10-29 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables intrinsics to convert BFloat16 scalar and vector operands to 
Float32 modes.
The intrinsics are implemented by shifting each BFloat16 element 16 bits to the left 
using shl/shll/shll2 instructions.
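
As a plain-C illustration of why a 16-bit left shift is sufficient (this sketch is 
mine, not the intrinsic implementation): a bfloat16 value is exactly the top 16 bits 
of the corresponding IEEE binary32 encoding, so padding it with 16 zero bits yields 
the same value as a float32.

#include <stdint.h>
#include <string.h>

/* Scalar model of the vcvtah_f32_bf16 semantics: the bfloat16 bits become
   the high half of the binary32 encoding; the low 16 mantissa bits are 0.  */
static float bf16_bits_to_f32 (uint16_t bf16_bits)
{
  uint32_t u = (uint32_t) bf16_bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}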

Intrinsics are documented at 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
ISA is documented at https://developer.arm.com/docs/ddi0596/latest

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-29  Dennis Zhang  

* config/aarch64/aarch64-simd-builtins.def (vbfcvt): New entry.
(vbfcvt_high, bfcvt): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_vbfcvt): New entry.
(aarch64_vbfcvt_highv8bf, aarch64_bfcvtsf): Likewise.
* config/aarch64/arm_bf16.h (vcvtah_f32_bf16): New intrinsic.
* config/aarch64/arm_neon.h (vcvt_f32_bf16): Likewise.
(vcvtq_low_f32_bf16, vcvtq_high_f32_bf16): Likewise.

gcc/testsuite/ChangeLog

2020-10-29  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
(test_vcvt_f32_bf16, test_vcvtq_low_f32_bf16): New tests.
(test_vcvtq_high_f32_bf16, test_vcvth_f32_bf16): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
   VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..5ae79d67981 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc01..881615498d3 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 85c0d62ca12..9c0386ed7b1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35716,6 +35716,27 @@ vcvtq_high_bf16_f32 (bfloat16x8_t __inactive, float32x4_t __a)
   return __builtin_aarch64_bfcvtn2v8bf (__inactive, __a);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 #pragma GCC pop_options
 
 /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b182..47af7c494d9 100644
--- a/gcc/

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-23 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Thursday, October 22, 2020 9:40 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:47
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
> >
> > Hi all,
> >
> > On 8/17/20 6:41 PM, Dennis Zhang wrote:
> > >
> > > Hi all,
> > >
> > > This patch enables MVE vsub instructions for auto-vectorization.
> > > It adds RTL templates for MVE vsub instructions using 'minus' instead of
> > > unspec expression to make the instructions recognizable for vectorization.
> > > MVE target is added in sub3 optab. The sub3 optab is
> > > modified to use a mode iterator that selects available modes for various
> > > targets correspondingly.
> > > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> > > support vectorization.
> > >
> > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> > > generate wrong instruction numbers because of unexpected icf
> > optimization.
> > > This bug is exposed by the MVE vector modes enabled in this patch,
> > > therefore it is corrected in this patch to avoid test failures.
> > >
> > > MVE instructions are documented here:
> > > https://developer.arm.com/architectures/instruction-sets/simd-
> > isas/helium/helium-intrinsics
> > >
> > > The patch is regtested for arm-none-eabi and bootstrapped for
> > > arm-none-linux-gnueabihf.
> > >
> > > Is it OK for trunk please?
> > >
> > > Thanks
> > > Dennis
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  
> > >
> > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector
> > modes.
> > > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> > > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP):
> > Likewise.
> > > (TARGET_NEON_MVE_HFP): Likewise.
> > > * config/arm/iterators.md (VSEL): New mode iterator to select modes
> > > for corresponding targets.
> > > * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
> > > using expression 'minus'.
> > > (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
> > > * config/arm/neon.md (sub3): Removed here. Integrated in the
> > > sub3 in vec-common.md
> > > * config/arm/vec-common.md (sub3): Enable MVE target. Use
> > VSEL
> > > to select available modes. Exclude TARGET_NEON_FP16INST from
> > > TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> > > originally in neon.md.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  
> > >
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> > > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> > > * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> > > * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> > >
> >
> > This patch is updated based on Richard Sandiford's patch adding new
> > vector mode macros:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
> > The old version of this patch is at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > And a less related part in the old version is separated into another
> > patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > September/554100.html
> >
> > T

Ping: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-21 Thread Dennis Zhang via Gcc-patches
Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555646.html
Thanks


From: Dennis Zhang 
Sent: Tuesday, October 6, 2020 5:46 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov; nd; Richard Earnshaw; Ramana Radhakrishnan
Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
> Is it OK for trunk please?
>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub3): Removed here. Integrated in the
>   sub3 in vec-common.md
>   * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
>

This patch is updated based on Richard Sandiford's patch adding new
vector mode macros:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for the MVE vsub instructions that use a 'minus' expression
instead of an unspec so that the instructions are recognizable for
auto-vectorization.
The sub3 in mve.md is modified to use the new mode macros, which make the
expander available when the corresponding modes are supported, so that the
various targets can share this expander for vectorization. The redundant
sub3 insns in neon.md are then removed.
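
For example, with these changes a simple element-wise loop like the sketch below 
(my own example, not from the patch; compiled with something like
-O3 -march=armv8.1-m.main+mve -mfloat-abi=hard) can now be vectorized into MVE vsub
instructions:

/* Element-wise subtraction that the vectorizer can map to MVE vsub.  */
void sub_i32 (int *restrict a, int *restrict b, int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] - b[i];
}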

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub3_neon): Use the new mode macros
ARM_HAVE__ARITH.
(sub3, sub3_fp16): Removed.
    (neon_vsub): Use gen_sub3 instead of gen_sub3_fp16.
* config/arm/vec-common.md (sub3): Use the new mode macros
ARM_HAVE__ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.



Re: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-21 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Wednesday, October 14, 2020 10:15 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
>
> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
> >
> > Hi all,
> >
> > This patch enables MVE vmin/vmax instructions for auto-vectorization.
> > MVE target is included in expander smin3, umin3,
> > smax3
> > and umax3 for vectorization.
> > Related insns for vmin/vmax in mve.md are modified to use smin, umin,
> > smax and umax expressions instead of unspec to support the expanders.
> >
> > Regression tested on arm-none-eabi and bootstrapped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
>
> Ok.
> Thanks,
> Kyrill
>

Thanks for your approval.
This patch has been committed to trunk at 
76835dca95ab9f3f106a0db1e6152ad0740b38b3

Cheers
Dennis

Re: [PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-21 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Wednesday, October 14, 2020 10:14 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmul
> 
> Hi Dennis,
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:55
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmul
> >
> > Hi all,
> >
> > This patch enables MVE vmul instructions for auto-vectorization.
> > It includes MVE in expander mul3 to enable vectorization for MVE 
> > and modifies related vmul insns to support the expander by using 'mult'
> > instead of unspec.
> > The mul3 for vectorization in vec-common.md uses mode iterator
> > VDQWH instead of VALLW to cover all supported modes.
> > The macros ARM_HAVE__ARITH are used to select supported
> > modes for 
> > different targets. The redundant mul3 in neon.md is removed.
> >
> > Regression tested on arm-none-eabi and bootstrapped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
> 
> Ok, thank you for your patience.
> Kyrill
> 

Thanks for your approval.
It's committed to trunk at 0f41b5e02fa47db2080b77e4e1f7cd3305457c05

Cheers
Dennis


Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-12 Thread Dennis Zhang via Gcc-patches

Hi Christophe,

On 12/10/2020 12:40, Christophe Lyon wrote:

Hi,


On Thu, 8 Oct 2020 at 16:22, Christophe Lyon  wrote:


On Thu, 8 Oct 2020 at 16:08, Dennis Zhang  wrote:


Hi Christophe,

On 08/10/2020 14:14, Christophe Lyon wrote:

Hi,


On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
 wrote:


On 9/16/20 4:00 PM, Dennis Zhang wrote:

Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE
SIMD code.

This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
There are two functions in each of the tests. The two functions contain
the pattern of identical code so that they are folded in icf pass.
Because of icf, the instruction count only checks one function which is 8.
However when the SIMD modes are enabled, the estimation of the code size
becomes smaller so that inlining is applied after icf, then the
instruction count becomes 16 which causes failure of the tests.
Because the icf is not the expected pattern to be tested but causes
above issues, -fno-ipa-icf is applied to the tests to avoid unstable
instruction count.

This patch is separated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
because this part is not strongly connected to the aim of that one so
that causing confusion.

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.



Since toolchain builds work again after Jakub's divmod fix, I'm now
facing another build error likely caused by this patch:
In file included from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
   from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
./insn-modes.h:196:71: error: temporary of non-literal type
'scalar_int_mode' in a constant expression
   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
 ^
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
note: in expansion of macro 'QImode'
 case QImode:

and similarly for the other cases.

Does the build work for you?

Thanks,

Christophe



Thanks for the report. Sorry to see the error.
I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I
didn't get this error.
Could you please help to show the configuration you use for your build?
I will test and fix at once.



It fails on all of them for me. Does it work for you with current
master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)



So... I guess you are using a host with GCC more recent than 4.8.5? :-)
When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds,
and after manually building with the same environment in the compute
farm I use for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the
build failure.
It's a matter of replacing
case QImode:
with
case E_QImode:

Is the attached patch OK? Or do we instead want to revisit the minimum
gcc version required to build gcc?

Thanks,

Christophe



I've tested your patch and it works with my other patches depending on 
this one. So I agree this patch is OK. Thanks for the fix.


Bests
Dennis


Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-08 Thread Dennis Zhang via Gcc-patches

Hi Christophe,

On 08/10/2020 14:14, Christophe Lyon wrote:

Hi,


On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
 wrote:


On 9/16/20 4:00 PM, Dennis Zhang wrote:

Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE
SIMD code.

This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
There are two functions in each of the tests. The two functions contain
the pattern of identical code so that they are folded in icf pass.
Because of icf, the instruction count only checks one function which is 8.
However when the SIMD modes are enabled, the estimation of the code size
becomes smaller so that inlining is applied after icf, then the
instruction count becomes 16 which causes failure of the tests.
Because the icf is not the expected pattern to be tested but causes
above issues, -fno-ipa-icf is applied to the tests to avoid unstable
instruction count.

This patch is separated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
because this part is not strongly connected to the aim of that one so
that causing confusion.

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.



Since toolchain builds work again after Jakub's divmod fix, I'm now
facing another build error likely caused by this patch:
In file included from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
  from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
./insn-modes.h:196:71: error: temporary of non-literal type
'scalar_int_mode' in a constant expression
  #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
^
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
note: in expansion of macro 'QImode'
case QImode:

and similarly for the other cases.

Does the build work for you?

Thanks,

Christophe



Thanks for the report. Sorry to see the error.
I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I 
didn't get this error.

Could you please help to show the configuration you use for your build?
I will test and fix at once.

Thanks
Dennis


[PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmin/vmax instructions for auto-vectorization.
MVE target is included in expander smin3, umin3, smax3 
and umax3 for vectorization.
Related insns for vmin/vmax in mve.md are modified to use smin, umin, 
smax and umax expressions instead of unspec to support the expanders.
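
As an illustration (a hypothetical example in the spirit of the new test,
not copied from it), a loop like the following is expected to vectorize to
vmin.s32 at -O3 once the expanders accept MVE modes:

#include <stdint.h>

void test_vmin_s32 (int32_t * dest, int32_t * a, int32_t * b) {
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] < b[i] ? a[i] : b[i];
  }
}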

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmaxq_): Replace with ...
(mve_vmaxq_s, mve_vmaxq_u): ... these new insns to
use smax/umax instead of VMAXQ.
(mve_vminq_): Replace with ...
(mve_vminq_s, mve_vminq_u): ... these new insns to
use smin/umin instead of VMINQ.
(mve_vmaxnmq_f): Use smax instead of VMAXNMQ_F.
(mve_vminnmq_f): Use smin instead of VMINNMQ_F.
* config/arm/vec-common.md (smin3): Use the new mode macros
ARM_HAVE__ARITH.
(umin3, smax3, umax3): Likewise.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vminmax_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..0d9f932e983 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1977,15 +1977,25 @@
 ;;
 ;; [vmaxq_u, vmaxq_s])
 ;;
-(define_insn "mve_vmaxq_"
+(define_insn "mve_vmaxq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMAXQ))
+	(smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmax.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vmaxq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
+  "vmax.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2037,15 +2047,25 @@
 ;;
 ;; [vminq_s, vminq_u])
 ;;
-(define_insn "mve_vminq_"
+(define_insn "mve_vminq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMINQ))
+	(smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
+  "vmin.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vminq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmin.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3030,9 +3050,8 @@
 (define_insn "mve_vmaxnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMAXNMQ_F))
+	(smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmaxnm.f%#	%q0, %q1, %q2"
@@ -3090,9 +3109,8 @@
 (define_insn "mve_vminnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMINNMQ_F))
+	(smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vminnm.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..6a330cc82f6 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -114,39 +114,29 @@
   [(set (match_operand

[PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmul instructions for auto-vectorization.
It includes MVE in expander mul3 to enable vectorization for MVE 
and modifies related vmul insns to support the expander by using 'mult' 
instead of unspec.
The mul3 for vectorization in vec-common.md uses mode iterator 
VDQWH instead of VALLW to cover all supported modes.
The macros ARM_HAVE__ARITH are used to select supported modes for 
different targets. The redundant mul3 in neon.md is removed.

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmulq): New entry for vmul instruction
using expression 'mult'.
(mve_vmulq_f): Use mult instead of VMULQ_F.
* config/arm/neon.md (mul3): Removed.
* config/arm/vec-common.md (mul3): Use the new mode macros
ARM_HAVE__ARITH. Use mode iterator VDQWH instead of VALLW.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vmul_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
 (const_string "neon_mul_")))]
 )
 
-(define_insn "mul3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (mult:VH
-(match_operand:VH 1 "s_register_operand" "w")
-(match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%0, %1, %2"
- [(set_attr "type" "neon_mul_")]
-)
-
 (define_insn "neon_vmulf"
  [(set
(match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-(mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		(match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE__ARITH"
+)
 
 (define_expand "smin3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int1

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub3): Removed here. Integrated in the
>   sub3 in vec-common.md
>   * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> 

This patch is updated based on Richard Sandiford's patch adding new 
vector mode macros: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another 
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec 
expression to make the instructions recognizable for auto-vectorization.
The sub3 in mve.md is modified to use new mode macros which make 
the expander available when certain modes are supported. Then various 
targets can share this expander for vectorization. The redundant 
sub3 insns in neon.md are then removed.
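
As an illustration (a hypothetical example along the lines of the new test,
not copied from it), a loop like the following is expected to vectorize to
a vsub.i32 instruction at -O3 when MVE is enabled:

#include <stdint.h>

void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] - b[i];
  }
}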

Regression tested on arm-none-eabi and bootstrapped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub3_neon): Use the new mode macros
ARM_HAVE__ARITH.
(sub3, sub3_fp16): Removed.
    (neon_vsub): Use gen_sub3 instead of gen_sub3_fp16.
* config/arm/vec-common.md (sub3): Use the new mode macros
ARM_HAVE__ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/

Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Dennis Zhang via Gcc-patches
On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
> 
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
> 
> This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests are for checking the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in icf pass.
> Because of icf, the instruction count only checks one function which is 8.
> However when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf, then the
> instruction count becomes 16 which causes failure of the tests.
> Because the icf is not the expected pattern to be tested but causes
> above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> instruction count.
> 
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one so
> that causing confusion.
> 
> Regtested and bootstraped.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html


[PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-09-16 Thread Dennis Zhang
Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by 
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when 
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE 
SIMD code.

This patch also fixes bugs in MVE vreinterpretq_*.c tests which are 
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
Each test contains two functions with identical code, so they are folded 
by the ICF pass.
Because of ICF, the instruction count check covers only one function, which 
gives a count of 8.
However, when the SIMD modes are enabled, the code-size estimate becomes 
smaller, so inlining is applied after ICF and the instruction count becomes 
16, which makes the tests fail.
Because ICF is not the pattern the tests are meant to exercise but causes 
the issues above, -fno-ipa-icf is applied to the tests to avoid an unstable 
instruction count.

This patch is separated from 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html 
because this part is not strongly connected to the aim of that one and 
could otherwise cause confusion.

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.

gcc/testsuite/ChangeLog:

2020-09-15  Dennis Zhang  

* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd78141519e..c50d5aca6a9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28964,6 +28964,30 @@ arm_preferred_simd_mode (scalar_mode mode)
   default:;
   }
 
+  if (TARGET_HAVE_MVE)
+switch (mode)
+  {
+  case QImode:
+	return V16QImode;
+  case HImode:
+	return V8HImode;
+  case SImode:
+	return V4SImode;
+
+  default:;
+  }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+switch (mode)
+  {
+  case HFmode:
+	return V8HFmode;
+  case SFmode:
+	return V4SFmode;
+
+  default:;
+  }
+
   return word_mode;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
index f59f69734ed..2398d894861 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f16 (r7, vreinterpretq_f16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
index dac47c7e924..5a58dc6eb4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f32 (r7, vreinterpretq_f32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
index edc2f2f3bc6..9ab05e95420 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-o

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-09-07 Thread Dennis Zhang
Hi Ramana,

On 8/21/20 10:33 PM, Ramana Radhakrishnan wrote:
> On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang  wrote:
>>
>>
>> Hi all,
>>
>> This patch enables MVE vsub instructions for auto-vectorization.
>> It adds RTL templates for MVE vsub instructions using 'minus' instead of
>> unspec expression to make the instructions recognizable for vectorization.
>> MVE target is added in sub3 optab. The sub3 optab is
>> modified to use a mode iterator that selects available modes for various
>> targets correspondingly.
>> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
>> support vectorization.
>>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>> MVE instructions are documented here:
>> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>>
> 
> Hi Dennis,
> 
> Thanks for this patch . However a quick read suggests  at first glance
> that it could do with some refactoring or indeed further breaking
> down.
> 
> 1. The refactor for TARGET_NEON_IWWMMXT and friends which I don't get
> the motivation for obviously on a quick read. I'll try and read that
> again. Please document why these complex TARGET_ macros exist and how
> they are expected to be used in the machine description and what they
> are indicated to do.

Thanks for the questions.
The macros are used in the iterators as conditions to enable modes 
separately for different targets. The macros are defined to keep the 
iterators short.
As for why the iterators use conditions, the aim is to handle the 
different modes in a single expander. Otherwise the expander would have 
to be repeated several times for the different sets of modes supported 
by different targets.

> 2. It seems odd that we would have
>   "&& ((mode != V2SFmode && mode != V4SFmode)
> +|| flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
> apply this to TARGET_MVE_FLOAT in the sub3 expander. The point
> is that if it isn't safe to vectorize a subtract for Neon, why is it
> safe to do the same for MVE ? This was done in 2010 by Julian to fix
> PR target/43703 - isn't this applicable on MVE as well ?

I agree with this after investigation. I've added 
flag_unsafe_math_optimizations for the MVE_FLOAT target.

> 3. I'm also going to quibble a bit about the use of VSEL as the name
> of an iterator as that conflates it with the instruction vsel and it's
> not obvious what's going on here.

I have changed the name to VNIM_COND, which stands for NEON, IWMMXT and MVE 
according to conditions.
I've added comments to document the aim of the iterator.
Please let me know if you think it needs further fixes.

> 
> 
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
> 
> I'm a bit confused as to why this got exposed because of the new MVE
> vector modes exposed by this patch.

The aim of the tests is only to check that the reinterpret intrinsics work 
correctly.
However, the two functions in each test contain an ICF optimization pattern, 
so the second function is folded because its code is identical to the first. 
The ICF pattern is not intended, but to make the tests pass the author only 
checked the instruction count for the first function.
With my patch, which enables MVE vector modes in arm_preferred_simd_mode, 
the estimated code size is smaller, so in the inlining optimization after 
ICF the code is inlined from the first function back into the second one. 
The instruction count then changes.
Because ICF is not the pattern the tests are meant to exercise but causes 
the issues mentioned above, -fno-ipa-icf is used to avoid an unstable 
instruction count in these tests.
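
As a minimal sketch of the test shape (hypothetical and simplified, not the
actual vreinterpretq test), foo and foo1 below have identical bodies, so ICF
folds one into the other unless -fno-ipa-icf is given:

#include "arm_mve.h"

float16x8_t value1;
float16x8_t value2;

float16x8_t foo (void) { return vaddq_f16 (value1, value2); }
float16x8_t foo1 (void) { return vaddq_f16 (value1, value2); }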

> 
>> The patch is regtested for arm-none-eabi and bootstrapped for
>> arm-none-linux-gnueabihf.
>>
> Bootstrapped and regression tested for arm-none-linux-gnueabihf with a
> --with-fpu=neon in the configuration ?

Yes, for arm-none-linux-gnueabihf bootstrap there is --with-fpu=neon.
Should I test it without this configuration?

The new patch is attached.
I updated the comments for the iterator and the macros.

Many thanks!
Dennis

gcc/ChangeLog:

2020-08-27  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
(TARGET_NEON_IWMMXT_M

[PATCH][Arm] Auto-vectorization for MVE: vsub

2020-08-17 Thread Dennis Zhang

Hi all,

This patch enables MVE vsub instructions for auto-vectorization.
It adds RTL templates for MVE vsub instructions using 'minus' instead of 
unspec expression to make the instructions recognizable for vectorization.
MVE target is added in sub3 optab. The sub3 optab is 
modified to use a mode iterator that selects available modes for various 
targets correspondingly.
MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to 
support vectorization.

This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests 
generate wrong instruction numbers because of unexpected icf optimization.
This bug is exposed by the MVE vector modes enabled in this patch, 
therefore it is corrected in this patch to avoid test failures.

MVE instructions are documented here: 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics

The patch is regtested for arm-none-eabi and bootstrapped for 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-08-10  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
(TARGET_NEON_MVE_HFP): Likewise.
* config/arm/iterators.md (VSEL): New mode iterator to select modes
for corresponding targets.
* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (sub3): Removed here. Integrated in the
sub3 in vec-common.md
* config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
to select available modes. Exclude TARGET_NEON_FP16INST from
TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
originally in neon.md.

gcc/testsuite/ChangeLog:

2020-08-10  Dennis Zhang  

* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..eb8c9599357 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -334,6 +334,14 @@ emission of floating point pcs attributes.  */
 		isa_bit_mve_float) \
 			   && !TARGET_GENERAL_REGS_ONLY)
 
+#define TARGET_NEON_IWMMXT	(TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_IWMMXT_MVE	(TARGET_NEON || TARGET_REALLY_IWMMXT \
+ || TARGET_HAVE_MVE)
+#define TARGET_NEON_IWMMXT_MVE_FP ((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+   || TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_MVE_HFP	((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+ || TARGET_NEON_FP16INST)
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6b7ca829f1c..dcbcbbeced0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28913,6 +28913,30 @@ arm_preferred_simd_mode (scalar_mode mode)
   default:;
   }
 
+  if (TARGET_HAVE_MVE)
+switch (mode)
+  {
+  case QImode:
+	return V16QImode;
+  case HImode:
+	return V8HImode;
+  case SImode:
+	return V4SImode;
+
+  default:;
+  }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+switch (mode)
+  {
+  case HFmode:
+	return V8HFmode;
+  case SFmode:
+	return V4SFmode;
+
+  default:;
+  }
+
   return word_mode;
 }
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..52c3a8a4355 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -80,6 +80,19 @@
 ;; Integer and float modes supported by Neon and IWMMXT but not MVE.
 (define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
 
+;; Select modes for NEON, IWMMXT and MVE.
+(define_mode_iterator VSEL [(V16QI "TARGET_NEON_IWMMXT_MVE")
+			

Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers

2020-04-08 Thread Dennis Zhang
Hi kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Tuesday, April 7, 2020 3:07 PM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: 
> instructions using FPU/MVE S/D registers
>
> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 07 April 2020 13:31
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Ramana Radhakrishnan ; Kyrylo
> > Tkachov 
> > Subject: Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics:
> > instructions using FPU/MVE S/D registers
> >
> > Hi all,
> >
> > This patch is updated to support DImode for vfp target as required by CDE.
> > Changelog is updated as following.
> >
> > Is this ready for commit please?
>
> This is ok.
> Has the first patch been updated and committed yet?
> Thanks,
> Kyrill
>

This patch has been committed as 07b9bfd02b88cad2f6b3f50ad610dd75cb989ed3.

Many thanks
Dennis

> >
> > Cheers
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-04-07  Dennis Zhang  
> > Matthew Malcomson 
> >
> > * config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
> > (CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
> > (CX_TERNARY_QUALIFIERS): Likewise.
> > (ARM_BUILTIN_CDE_PATTERN_START): Likewise.
> > (ARM_BUILTIN_CDE_PATTERN_END): Likewise.
> > (arm_init_acle_builtins): Initialize CDE builtins.
> > (arm_expand_acle_builtin): Check CDE constant operands.
> > * config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the
> > range
> > of CDE constant operand.
> > * config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
> > TARGET_VFP_BASE.
> > (ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3):
> > Likewise.
> > * config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
> > (__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
> > (__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
> > (__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
> > (__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
> > * config/arm/arm_cde_builtins.def: New file.
> > * config/arm/iterators.md (V_reg): New attribute of SI.
> > * config/arm/predicates.md (const_int_coproc_operand): New.
> > (const_int_vcde1_operand, const_int_vcde2_operand): New.
> > (const_int_vcde3_operand): New.
> > * config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
> > * config/arm/vfp.md (arm_vcx1): New entry.
> > (arm_vcx1a, arm_vcx2, arm_vcx2a): Likewise.
> > (arm_vcx3, arm_vcx3a): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-04-07  Dennis Zhang  
> >
> > * gcc.target/arm/acle/cde_v_1.c: New test.
> > * gcc.target/arm/acle/cde_v_1_err.c: New test.
> > * gcc.target/arm/acle/cde_v_1_mve.c: New test.
> >
> > > Hi all,
> > >
> > > This patch is updated as attached.
> > > It's rebased to the top. Is it ready for commit please?
> > >
> > > Cheers
> > > Dennis
> > >
> > > > Hi all,
> > > >
> > > > This patch is part of a series that adds support for the ARMv8.m Custom
> > Datapath Extension (CDE).
> > > > It enables the ACLE intrinsics calling VCX1, VCX2, and VCX3
> > instructions who work with FPU/MVE 32-bit/64-bit registers.
> > > >
> > > > This patch depends on the CDE feature patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> > > > It also depends on the MVE framework patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> > > > ISA has been announced at
> > https://developer.arm.com/architectures/instruction-sets/custom-
> > instructions
> > > >
> > > > Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
> > > >
> > > > Is it OK for commit please?
> > > >
> > > > Cheers
> > > > Dennis

Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-04-08 Thread Dennis Zhang
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Wednesday, April 8, 2020 1:34 PM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension 
> (CDE): enable the feature
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 08 April 2020 12:34
> > To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Ramana Radhakrishnan 
> > Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > (CDE): enable the feature
> >
> > Hi Kyrylo
> >
> > > Hi Dennis,
> > >
> > > > -Original Message-
> > > > From: Dennis Zhang 
> > > > Sent: 19 March 2020 14:03
> > > > To: Kyrylo Tkachov ; gcc-
> > patc...@gcc.gnu.org
> > > > Cc: nd ; Richard Earnshaw
> > ;
> > > > Ramana Radhakrishnan 
> > > > Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > Extension
> > > > (CDE): enable the feature
> > > >
> > > > Hi Kyrylo,
> > > >
> > > > >
> > > > >From: Kyrylo Tkachov 
> > > > >Sent: Wednesday, March 18, 2020 9:04 AM
> > > > >To: Dennis Zhang; gcc-patches@gcc.gnu.org
> > > > >Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> > > > >Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > > > >Extension (CDE): enable the feature
> > > > >
> > > > >Hi Dennis,
> > > > >
> > > > >> -Original Message-
> > > > >> From: Dennis Zhang 
> > > > >> Sent: 12 March 2020 12:06
> > > > >> To: gcc-patches@gcc.gnu.org
> > > > >> Cc: nd ; Richard Earnshaw
> > ;
> > > > >> Ramana Radhakrishnan ; Kyrylo
> > > > Tkachov
> > > > >> 
> > > > >> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > Extension
> > > > >> (CDE): enable the feature
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >> This patch is part of a series that adds support for the ARMv8.m
> > > > >> Custom Datapath Extension.
> > > > >> This patch defines the options cdecp0-cdecp7 for CLI to enable the
> > > > >> CDE on corresponding coprocessor 0-7.
> > > > >> It also adds new check-effective for CDE feature.
> > > > >>
> > > > >> ISA has been announced at
> > > > >> https://developer.arm.com/architectures/instruction-sets/custom-
> > > > >> instructions
> > > > >>
> > > > >> Regtested and bootstrapped.
> > > > >>
> > > > >> Is it OK to commit please?
> > > > >
> > > > >Can you please rebase this patch on top of the recent MVE commits?
> > > > >It currently doesn't apply cleanly to trunk.
> > > > >Thanks,
> > > > >Kyrill
> > > >
> > > > The rebase patches is as attached.
> > > > Is it OK to commit?
> > >
> > > Ok, with a few fixes...
> > >
> > > diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c
> > b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> > > new file mode 100644
> > > index 000..97643a08405
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> > > @@ -0,0 +1,98 @@
> > > +/* Test for CDE #prama target macros.  */
> > > +/* { dg-do compile } */
> > >
> > > Typo in "pragma" in the comment.
> > >
> > >
> > > +# A series of routines are created to 1) check if a given architecture is
> > > +# effective (check_effective_target_*_ok) and then 2) give the
> > corresponding
> > > +# flags that enable the architecture (add_options_for_*).
> > > +# The series includes:
> > > +#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
> > > +#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
> > > +#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
> > > +# Usage:
> > > +#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
> > > +#   /* { dg-add-options arm_v8m_main_cde } */
> > > +# The tests are valid for Arm.
>

Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-04-08 Thread Dennis Zhang
Hi Kyrylo

> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 19 March 2020 14:03
> > To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Ramana Radhakrishnan 
> > Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > (CDE): enable the feature
> >
> > Hi Kyrylo,
> >
> > >
> > >From: Kyrylo Tkachov 
> > >Sent: Wednesday, March 18, 2020 9:04 AM
> > >To: Dennis Zhang; gcc-patches@gcc.gnu.org
> > >Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> > >Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > >Extension (CDE): enable the feature
> > >
> > >Hi Dennis,
> > >
> > >> -Original Message-
> > >> From: Dennis Zhang 
> > >> Sent: 12 March 2020 12:06
> > >> To: gcc-patches@gcc.gnu.org
> > >> Cc: nd ; Richard Earnshaw ;
> > >> Ramana Radhakrishnan ; Kyrylo
> > Tkachov
> > >> 
> > >> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > >> (CDE): enable the feature
> > >>
> > >> Hi all,
> > >>
> > >> This patch is part of a series that adds support for the ARMv8.m
> > >> Custom Datapath Extension.
> > >> This patch defines the options cdecp0-cdecp7 for CLI to enable the
> > >> CDE on corresponding coprocessor 0-7.
> > >> It also adds new check-effective for CDE feature.
> > >>
> > >> ISA has been announced at
> > >> https://developer.arm.com/architectures/instruction-sets/custom-
> > >> instructions
> > >>
> > >> Regtested and bootstrapped.
> > >>
> > >> Is it OK to commit please?
> > >
> > >Can you please rebase this patch on top of the recent MVE commits?
> > >It currently doesn't apply cleanly to trunk.
> > >Thanks,
> > >Kyrill
> >
> > The rebase patches is as attached.
> > Is it OK to commit?
>
> Ok, with a few fixes...
>
> diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c 
> b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> new file mode 100644
> index 000..97643a08405
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> @@ -0,0 +1,98 @@
> +/* Test for CDE #prama target macros.  */
> +/* { dg-do compile } */
>
> Typo in "pragma" in the comment.
>
>
> +# A series of routines are created to 1) check if a given architecture is
> +# effective (check_effective_target_*_ok) and then 2) give the corresponding
> +# flags that enable the architecture (add_options_for_*).
> +# The series includes:
> +#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
> +#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
> +#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
> +# Usage:
> +#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
> +#   /* { dg-add-options arm_v8m_main_cde } */
> +# The tests are valid for Arm.
> +
> +foreach { armfunc armflag armdef } {
>
>   New effective target checks need to be documented in doc/invoke.texi
>

Thanks a lot for the review.
The document has been updated and the changelog, too.
Is it ready to commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-04-08  Dennis Zhang  

* config.gcc: Add arm_cde.h.
* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
    __ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
* config/arm/arm.c (arm_option_reconfigure_globals): Configure
arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
* config/arm/arm.h (TARGET_CDE): New macro.
* config/arm/arm_cde.h: New file.
* doc/invoke.texi: Document CDE options +cdecp[0-7].
* doc/sourcebuild.texi (arm_v8m_main_cde_ok): Document new target
supports option.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

gcc/testsuite/ChangeLog:

2020-04-08  Dennis Zhang  

* gcc.target/arm/pragma_cde.c: New test.
* lib/target-supports.exp (arm_v8m_main_cde_ok): New target support
option.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

arm-m-cde-cli-20200408.patch
Description: arm-m-cde-cli-20200408.patch


Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers

2020-04-07 Thread Dennis Zhang
Hi all,

This patch is updated to support DImode for the VFP target as required by CDE.
The changelog is updated as follows.

Is this ready for commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-04-07  Dennis Zhang  
Matthew Malcomson 

* config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
(CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
(CX_TERNARY_QUALIFIERS): Likewise.
(ARM_BUILTIN_CDE_PATTERN_START): Likewise.
(ARM_BUILTIN_CDE_PATTERN_END): Likewise.
(arm_init_acle_builtins): Initialize CDE builtins.
(arm_expand_acle_builtin): Check CDE constant operands.
* config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
of CDE constant operand.
* config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
TARGET_VFP_BASE.
(ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
* config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
(__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
(__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
(__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
(__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
* config/arm/arm_cde_builtins.def: New file.
* config/arm/iterators.md (V_reg): New attribute of SI.
* config/arm/predicates.md (const_int_coproc_operand): New.
(const_int_vcde1_operand, const_int_vcde2_operand): New.
(const_int_vcde3_operand): New.
* config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
* config/arm/vfp.md (arm_vcx1): New entry.
(arm_vcx1a, arm_vcx2, arm_vcx2a): Likewise.
(arm_vcx3, arm_vcx3a): Likewise.

gcc/testsuite/ChangeLog:

2020-04-07  Dennis Zhang  

* gcc.target/arm/acle/cde_v_1.c: New test.
* gcc.target/arm/acle/cde_v_1_err.c: New test.
* gcc.target/arm/acle/cde_v_1_mve.c: New test.

> Hi all,
>
> This patch is updated as attached.
> It's rebased to the top. Is it ready for commit please?
>
> Cheers
> Dennis
>
> > Hi all,
> >
> > This patch is part of a series that adds support for the ARMv8.m Custom 
> > Datapath Extension (CDE).
> > It enables the ACLE intrinsics calling VCX1, VCX2, and VCX3 
> > instructions who work with FPU/MVE 32-bit/64-bit registers.
> >
> > This patch depends on the CDE feature patch: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> > It also depends on the MVE framework patch: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> > ISA has been announced at 
> > https://developer.arm.com/architectures/instruction-sets/custom-instructions
> >
> > Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
> >
> > Is it OK for commit please?
> >
> > Cheers
> > Dennis
> >

arm-m-cde-vcxsidi-final-20200407-rb12663.patch
Description: arm-m-cde-vcxsidi-final-20200407-rb12663.patch


Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers

2020-03-20 Thread Dennis Zhang
Hi all,

This patch is updated as attached.
It's rebased to the top. Is it ready for commit please?

Cheers
Dennis

> Hi all,
>
> This patch is part of a series that adds support for the ARMv8.m Custom 
> Datapath Extension (CDE).
> It enables the ACLE intrinsics calling VCX1, VCX2, and VCX3 
> instructions who work with FPU/MVE 32-bit/64-bit registers.
>
> This patch depends on the CDE feature patch: 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> It also depends on the MVE framework patch: 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> ISA has been announced at 
> https://developer.arm.com/architectures/instruction-sets/custom-instructions
>
> Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
>
> Is it OK for commit please?
>
> Cheers
> Dennis
>
> gcc/ChangeLog:
>
> 2020-03-12  Dennis Zhang  
>  Matthew Malcomson 
>
> * config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
> (CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
> (CX_TERNARY_QUALIFIERS): Likewise.
> (ARM_BUILTIN_CDE_PATTERN_START): Likewise.
> (ARM_BUILTIN_CDE_PATTERN_END): Likewise.
> (arm_init_acle_builtins): Initialize CDE builtins.
> (arm_expand_acle_builtin): Check CDE constant operands.
> * config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
> of CDE constant operand.
> (ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
> * config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
> (__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
> (__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
> (__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
> (__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
> * config/arm/arm_cde_builtins.def: New file.
> * config/arm/iterators.md (V_reg): New attribute of SI.
> * config/arm/predicates.md (const_int_coproc_operand): New.
> (const_int_vcde1_operand, const_int_vcde2_operand): New.
> (const_int_vcde3_operand): New.
> * config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
> * config/arm/vfp.md (arm_vcx1): New entry.
>     (arm_vcx1a, arm_vcx2, arm_vcx2a): Likewise.
> (arm_vcx3, arm_vcx3a): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-03-12  Dennis Zhang  
>
> * gcc.target/arm/acle/cde_v_1.c: New test.
> * gcc.target/arm/acle/cde_v_1_err.c: New test.
> * gcc.target/arm/acle/cde_v_1_mve.c: New test.
>

arm-m-cde-vcxsidi-final-20200319.patch
Description: arm-m-cde-vcxsidi-final-20200319.patch


Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-03-19 Thread Dennis Zhang
Hi Kyrylo,

>
>From: Kyrylo Tkachov 
>Sent: Wednesday, March 18, 2020 9:04 AM
>To: Dennis Zhang; gcc-patches@gcc.gnu.org
>Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
>Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension 
>(CDE): enable the feature
>
>Hi Dennis,
>
>> -Original Message-
>> From: Dennis Zhang 
>> Sent: 12 March 2020 12:06
>> To: gcc-patches@gcc.gnu.org
>> Cc: nd ; Richard Earnshaw ;
>> Ramana Radhakrishnan ; Kyrylo Tkachov
>> 
>> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
>> (CDE): enable the feature
>>
>> Hi all,
>>
>> This patch is part of a series that adds support for the ARMv8.m
>> Custom Datapath Extension.
>> This patch defines the options cdecp0-cdecp7 for CLI to enable the CDE
>> on corresponding coprocessor 0-7.
>> It also adds new check-effective for CDE feature.
>>
>> ISA has been announced at
>> https://developer.arm.com/architectures/instruction-sets/custom-
>> instructions
>>
>> Regtested and bootstrapped.
>>
>> Is it OK to commit please?
>
>Can you please rebase this patch on top of the recent MVE commits?
>It currently doesn't apply cleanly to trunk.
>Thanks,
>Kyrill

The rebase patches is as attached.
Is it OK to commit?

Thanks
Dennis


arm-m-cde-cli-20200318.patch
Description: arm-m-cde-cli-20200318.patch


[PATCH][Arm][3/4] Implement scalar Custom Datapath Extension intrinsics

2020-03-17 Thread Dennis Zhang
Hi all,

This patch introduces the scalar CDE (Custom Datapath Extension) intrinsics for 
the arm backend.

There is nothing beyond the standard in this patch. We simply build upon what 
has been done by Dennis for the vector intrinsics 
(https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542008.html which this 
patch depends on).

We do add `+cdecp6` to the default arguments for `target-supports.exp`; this 
allows using coprocessor 6 in tests.
This patch uses an alternate coprocessor to ease assembler scanning by looking 
for a use of coprocessor 6.

We also ensure that any DImode registers are put in an even-odd register pair 
when compiling for a target with CDE -- this avoids faulty code generation for 
-Os when producing the cx*d instructions.
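
As an illustration (a hypothetical use, assuming the ACLE-style signature
__arm_cx1a (coproc, acc, imm) and building with a CDE-enabled -march such as
armv8-m.main+cdecp6), one of the new scalar intrinsics could be called like
this:

#include <stdint.h>
#include "arm_cde.h"

uint32_t foo (uint32_t x) {
  /* Accumulating CX1 form on coprocessor 6 with immediate 3.  */
  return __arm_cx1a (6, x, 3);
}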

Testing done:
Bootstrapped and regtested for arm-none-linux-gnueabihf.

This patch is done by Matthew. Is it OK for commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-03  Matthew Malcomson  

* config/arm/arm.c (arm_hard_regno_mode_ok): DImode registers forced 
into
even-odd register pairs for TARGET_CDE.
* config/arm/arm.h (ARM_CCDE_CONST_1): New.
(ARM_CCDE_CONST_2): New.
(ARM_CCDE_CONST_3): New.
* config/arm/arm.md (arm_cx1si, arm_cx1di arm_cx1asi, arm_cx1adi 
arm_cx2si,
arm_cx2di arm_cx2asi, arm_cx2adi arm_cx3si, arm_cx3di arm_cx3asi,
arm_cx3adi): New patterns.
* config/arm/arm_cde.h (__arm_cx1, __arm_cx1a, __arm_cx2, __arm_cx2a,
__arm_cx3, __arm_cx3a, __arm_cx1d, __arm_cx1da, __arm_cx2d, __arm_cx2da,
__arm_cx3d, __arm_cx3da): New ACLE function macros.
* config/arm/arm_cde_builtins.def (cx1, cx1a, cx2, cx2a, cx3, cx3a): 
Define
intrinsics.
* config/arm/iterators.md (cde_suffix, cde_dest): New mode attributes.
* config/arm/predicates.md (const_int_ccde1_operand,
const_int_ccde2_operand, const_int_ccde3_operand): New.
* config/arm/unspecs.md (UNSPEC_CDE, UNSPEC_CDEA): New.

gcc/testsuite/ChangeLog:

2020-03-03  Matthew Malcomson  

* gcc.target/arm/acle/cde-errors.c: New test.
* gcc.target/arm/acle/cde.c: New test.
* lib/target-supports.exp: Update CDE flags to enable coprocessor 6.diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index b9c55cd6f39dc4806501543ff6157ef6ba787b4a..0d31d98a670b346c9488fba292a15e45285cb1fa 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -576,6 +576,9 @@ extern int arm_arch_cde;
 extern int arm_arch_cde_coproc;
 extern const int arm_arch_cde_coproc_bits[];
 #define ARM_CDE_CONST_COPROC	7
+#define ARM_CCDE_CONST_1	((1 << 13) - 1)
+#define ARM_CCDE_CONST_2	((1 << 9 ) - 1)
+#define ARM_CCDE_CONST_3	((1 << 6 ) - 1)
 #define ARM_VCDE_CONST_1	((1 << 11) - 1)
 #define ARM_VCDE_CONST_2	((1 << 6 ) - 1)
 #define ARM_VCDE_CONST_3	((1 << 3 ) - 1)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 55a4ebf5147a93ecd679c2d48c93e1441fc8c6d8..7b9e311c992a75ac7208cbf8301eb608d777a7c4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25066,10 +25066,11 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   if (ARM_NUM_REGS (mode) > 4)
 	return false;
 
-  if (TARGET_THUMB2 && !TARGET_HAVE_MVE)
+  if (TARGET_THUMB2 && !(TARGET_HAVE_MVE || TARGET_CDE))
 	return true;
 
-  return !(TARGET_LDRD && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0);
+  return !((TARGET_LDRD || TARGET_CDE)
+	   && GET_MODE_SIZE (mode) > 4 && (regno & 1) != 0);
 }
 
   if (regno == FRAME_POINTER_REGNUM
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5387f972f5a864a153873f21b9423d28446daefc..9dd446fd2e97f7a608785080fef109f167961f13 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4407,6 +4407,70 @@
(set_attr "shift" "3")
(set_attr "type" "logic_shift_reg")])
 
+;; Custom Datapath Extension insns.
+(define_insn "arm_cx1"
+   [(set (match_operand:SIDI 0 "s_register_operand" "=r")
+	 (unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+	   (match_operand:SI 2 "const_int_ccde1_operand" "i")]
+	UNSPEC_CDE))]
+   "TARGET_CDE"
+   "cx1\\tp%c1, , %2"
+)
+
+(define_insn "arm_cx1a"
+   [(set (match_operand:SIDI 0 "s_register_operand" "=r")
+	 (unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		   (match_operand:SIDI 2 "s_register_operand" "0")
+	   (match_operand:SI 3 "const_int_ccde1_operand" "i")]
+	UNSPEC_CDEA))]
+   "TARGET_CDE"
+   "cx1a\\tp%c1, , %3"
+)
+
+(define_insn "arm_cx2"
+   [(set (match_operand:SIDI 0 "s_register_operand" "=r")
+	 (unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:SI 3 "const_int_ccde2_operand" "i")]
+	UNSPEC_CDE))]
+   "TARGET_CDE"
+   "cx2\\tp%c1, , %2, %3"
+)
+
+(define_insn "arm_cx2a"
+   [(set (match_operand:SIDI 0 "s_register_operand" "=r")
+	 (unspec:SIDI [(match_operand:SI 1 

[PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers

2020-03-13 Thread Dennis Zhang
Hi all,

This patch is part of a series that adds support for the ARMv8.m Custom 
Datapath Extension (CDE).
It enables the ACLE intrinsics that call the VCX1, VCX2, and VCX3 
instructions, which work with FPU/MVE 32-bit/64-bit registers.
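
As an illustration (a hypothetical use, assuming the ACLE-style signature
__arm_vcx1_u32 (coproc, imm) with both arguments being compile-time
constants), one of the intrinsics could be used like this:

#include <stdint.h>
#include "arm_cde.h"

uint32_t bar (void) {
  /* VCX1 on coprocessor 0 with immediate 11, result in an S register.  */
  return __arm_vcx1_u32 (0, 11);
}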

This patch depends on the CDE feature patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
It also depends on the MVE framework patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
ISA has been announced at 
https://developer.arm.com/architectures/instruction-sets/custom-instructions

Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.

Is it OK for commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-12  Dennis Zhang  
 Matthew Malcomson 

* config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
(CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
(CX_TERNARY_QUALIFIERS): Likewise.
(ARM_BUILTIN_CDE_PATTERN_START): Likewise.
(ARM_BUILTIN_CDE_PATTERN_END): Likewise.
(arm_init_acle_builtins): Initialize CDE builtins.
(arm_expand_acle_builtin): Check CDE constant operands.
* config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
of CDE constant operand.
(ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
* config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
(__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
(__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
(__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
(__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
* config/arm/arm_cde_builtins.def: New file.
* config/arm/iterators.md (V_reg): New attribute of SI.
* config/arm/predicates.md (const_int_coproc_operand): New.
(const_int_vcde1_operand, const_int_vcde2_operand): New.
(const_int_vcde3_operand): New.
* config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
* config/arm/vfp.md (arm_vcx1): New entry.
(arm_vcx1a, arm_vcx2, arm_vcx2a): Likewise.
(arm_vcx3, arm_vcx3a): Likewise.

gcc/testsuite/ChangeLog:

2020-03-12  Dennis Zhang  

* gcc.target/arm/acle/cde_v_1.c: New test.
* gcc.target/arm/acle/cde_v_1_err.c: New test.
* gcc.target/arm/acle/cde_v_1_mve.c: New test.diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 4d31405cf6e09e3a61faa3e8142940bbdb23c60a..89142a276b071b069cddabb5170ad0d4ca213d20 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -305,6 +305,35 @@ arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define MRRC_QUALIFIERS \
   (arm_mrrc_qualifiers)
 
+/* T (immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_unsigned_immediate };
+#define CX_IMM_QUALIFIERS (arm_cx_imm_qualifiers)
+
+/* T (immediate, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_unary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_none,
+  qualifier_unsigned_immediate };
+#define CX_UNARY_QUALIFIERS (arm_cx_unary_qualifiers)
+
+/* T (immediate, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_binary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+  qualifier_none, qualifier_none,
+  qualifier_unsigned_immediate };
+#define CX_BINARY_QUALIFIERS (arm_cx_binary_qualifiers)
+
+/* T (immediate, T, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_ternary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+  qualifier_none, qualifier_none, qualifier_none,
+  qualifier_unsigned_immediate };
+#define CX_TERNARY_QUALIFIERS (arm_cx_ternary_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -438,7 +467,23 @@ static arm_builtin_datum acle_builtin_data[] =
 };
 
 #undef VAR1
+/* IMM_MAX sets the maximum valid value of the CDE immediate operand.
+   ECF_FLAG sets the flag used for set_call_expr_flags.  */
+#define VAR1(T, N, A, IMM_MAX, ECF_FLAG) \
+  {{#N #A, UP (A), CODE_FOR_arm_##N##A, 0, T##_QUALIFIERS}, IMM_MAX, ECF_FLAG},
+
+typedef struct {
+  arm_builtin_datum base;
+  unsigned int imm_max;
+  int ecf_flag;
+} arm_builtin_cde_datum;
+
+static arm_builtin_cde_datum cde_builtin_data[] =
+{
+#include "arm_cde_builtins.def"
+};
 
+#undef VAR1
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
 
@@ -732,6 +777,14 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X, ... ) \
+  ARM_BUILTIN_##N##X,
+
+  ARM_BUILTIN_CDE_BASE,
+
+#include "arm_cde_builtins.

[PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-03-12 Thread Dennis Zhang
Hi all,

This patch is part of a series that adds support for the ARMv8.m Custom 
Datapath Extension.
This patch defines the command-line options cdecp0-cdecp7 to enable CDE on
the corresponding coprocessors 0-7.
It also adds new effective-target checks for the CDE feature.

ISA has been announced at 
https://developer.arm.com/architectures/instruction-sets/custom-instructions
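
As a rough sketch of how user code might test the new predefines (assuming a
compiler with this patch and a command line along the lines of
-march=armv8.1-m.main+mve+cdecp0):

/* __ARM_FEATURE_CDE_COPROC is a bitmask: bit N is set when the +cdecpN
   option was given for coprocessor N.  */
#if defined (__ARM_FEATURE_CDE) && (__ARM_FEATURE_CDE_COPROC & 0x1)
#  define HAVE_CDE_CP0 1
#else
#  define HAVE_CDE_CP0 0
#endif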

Regtested and bootstrapped.

Is it OK to commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-11  Dennis Zhang  

* config.gcc: Add arm_cde.h.
* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
__ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
* config/arm/arm.c (arm_option_reconfigure_globals): Configure
arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
* config/arm/arm.h (TARGET_CDE): New macro.
* config/arm/arm_cde.h: New file.
* doc/invoke.texi: Document cdecp[0-7] options.

gcc/testsuite/ChangeLog:

2020-03-11  Dennis Zhang  

* gcc.target/arm/pragma_cde.c: New test.
* lib/target-supports.exp (arm_v8m_main_cde): New check effective.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2df4b36d190..43967b7d1ff 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -346,7 +346,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 38edaff17a2..77753015b34 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -227,6 +227,12 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
 }
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CDE", TARGET_CDE);
+  cpp_undef (pfile, "__ARM_FEATURE_CDE_COPROC");
+  if (TARGET_CDE)
+builtin_define_with_int_value ("__ARM_FEATURE_CDE_COPROC",
+   arm_arch_cde_coproc);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
 		  TARGET_BF16_FP);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 96f584da325..5a7498e18db 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -207,6 +207,16 @@ define feature i8mm
 # Brain half-precision floating-point extension. Optional from v8.2-A.
 define feature bf16
 
+# Arm Custom Datapath Extension (CDE).
+define feature cdecp0
+define feature cdecp1
+define feature cdecp2
+define feature cdecp3
+define feature cdecp4
+define feature cdecp5
+define feature cdecp6
+define feature cdecp7
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -670,6 +680,14 @@ begin arch armv8-m.main
  option fp.dp add FPv5 FP_DBL
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8-m.main
 
 begin arch armv8-r
@@ -701,6 +719,14 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add mve armv7em
  option mve.fp add mve FPv5 fp16 mve_float armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e562..9f1e1ec5c88 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1021,6 +1021,13 @@ int arm_arch_i8mm = 0;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 int arm_arch_bf16 = 0;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+int arm_arch_cde = 0;
+int arm_arch_cde_coproc = 0;
+const int arm_arch_cde_coproc_bits[] = {
+  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
+};
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3740,6 +3747,21 @@ arm_option_reconfigure_globals (void)
   arm_fp16_format = ARM_FP16_FORMAT_IEEE;
 }
 
+  arm_arch_cde = 0;
+  arm_arch_cde_coproc = 0;
+  int cde_bits[] = {isa_bit_cdecp0, isa_bit_cdecp1, isa_bit_cdecp2,
+		isa_bit_cdecp3, isa_bit_c

Re: [Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-03-03 Thread Dennis Zhang

Hi Kyrill

On 03/03/2020 09:39, Kyrill Tkachov wrote:

Hi Dennis,

On 3/2/20 5:41 PM, Dennis Zhang wrote:

Hi all,

On 17/01/2020 16:46, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on Arm BFMode patch
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch implements intrinsics to convert between bfloat16 and 
float32

> formats.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested.
>
> Is it OK for trunk please?



Ok.

Thanks,

Kyrill


Thanks for the approval.
It's pushed as 8e6d0dba166324f4b257329bd4b4ddc2b4522359.

Cheers
Dennis





>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
>  * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
>  (vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
>  (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New 
entries.

>  (vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
>  * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
>  (V_bf_low, V_bf_cvt_m): New mode attributes.
>  * config/arm/neon.md (neon_vbfcvtv4sf): New.
>  (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
>  (neon_vbfcvt, neon_vbfcvt_highv8bf): New.
>  (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
>  * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_cvt_1.c: New test.
>
>

The tests are updated in this patch to check the generated assembly.
Rebased on top of trunk.

Is it OK to commit please?

Cheers
Dennis


[Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-03-02 Thread Dennis Zhang

Hi all,

On 17/01/2020 16:46, Dennis Zhang wrote:

Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on Arm BFMode patch 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html


This patch implements intrinsics to convert between bfloat16 and float32 
formats.

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2020-01-17  Dennis Zhang  

 * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
 * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
 (vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
 (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
 * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New entries.
 (vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
 * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
 (V_bf_low, V_bf_cvt_m): New mode attributes.
 * config/arm/neon.md (neon_vbfcvtv4sf): New.
 (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
 (neon_vbfcvt, neon_vbfcvt_highv8bf): New.
 (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
 * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.

gcc/testsuite/ChangeLog:

2020-01-17  Dennis Zhang  

 * gcc.target/arm/simd/bf16_cvt_1.c: New test.




The tests are updated in this patch to check the generated assembly.
Rebased on top of trunk.

Is it OK to commit please?

Cheers
Dennis
diff --git a/gcc/config/arm/arm_bf16.h b/gcc/config/arm/arm_bf16.h
index decf23f3834..1aa593192c0 100644
--- a/gcc/config/arm/arm_bf16.h
+++ b/gcc/config/arm/arm_bf16.h
@@ -34,6 +34,20 @@ extern "C" {
 typedef __bf16 bfloat16_t;
 typedef float float32_t;
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_neon_vbfcvtbf (__a);
+}
+
+__extension__ extern __inline bfloat16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvth_bf16_f32 (float32_t __a)
+{
+  return __builtin_neon_vbfcvtsf (__a);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 81c407f5152..a66961d0c51 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -19379,6 +19379,55 @@ vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
 
 #pragma GCC pop_options
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_neon_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_neon_vbfcvt_highv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_bf16_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sfv4bf (__a);
+}
+
+__extension__ extern __inline bfloat16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_bf16_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sfv8bf (__a);
+}
+
+/* The 'inactive' operand is not converted but it provides the
+   low 64 bits to assemble the final 128-bit result.  */
+__extension__ extern __inline bfloat16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_bf16_f32 (bfloat16x8_t inactive, float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sf_highv8bf (inactive, __a);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 4b4d1c808d8..48c06c43a17 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -385,3 +385,9 @@ VAR1 (USTERNOP, usmmla, v16qi)
 VAR2 (TERNOP, vbfdot, v2sf, v4sf)
 VAR2 (MAC_LANE_PAIR, vbfdot_lanev4bf, v2sf, v4sf)
 VAR2 (MAC_LANE_PAIR, vbfdot_lanev8bf, v2sf, v4sf)
+
+VAR2 (UNOP, vbfcvt, sf, bf)
+VAR2 (UNOP, vbfcvt, v4bf, v8bf)
+VAR1 (UNOP, vbfcvt_high, v8bf)
+VAR2 (UNOP, vbfcvtv4sf, v4bf, v8bf)
+VAR1 (BINOP, vbfcvtv4sf_high, v8bf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index ab30c371583..5f4e3d12358 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -229,6 +229,10 @@
 ;; Modes for polynomial or float values.
 (define_mode_iterator VPF [V8QI V16QI V2SF V4SF])
 
+;; Modes for BF16 convert instructions.
+(define_mode_iterator VBFCVT [V4BF V8BF])
+(define_mode_itera

Re: [Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-02-25 Thread Dennis Zhang

Hi Kyrill,

On 25/02/2020 17:22, Kyrill Tkachov wrote:

Hi Dennis,

On 2/25/20 5:18 PM, Dennis Zhang wrote:

Hi Kyrill,

On 25/02/2020 12:18, Kyrill Tkachov wrote:

Hi Dennis,

On 2/25/20 11:54 AM, Dennis Zhang wrote:

Hi all,

On 07/01/2020 12:12, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the patch enabling Arm BFmode
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch adds intrinsics for brain half-precision float-point dot
> product.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested for arm-none-linux-gnueabi-armv8-a.
>
> Is it OK for trunk please?
>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
>  (vbfdot_lane_f32, vbfdotq_laneq_f32): New.
>  (vbfdot_laneq_f32, vbfdotq_lane_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfdot): New.
>  (vbfdot_lanev4bf, vbfdot_lanev8bf): New.
>  * config/arm/iterators.md (VSF2BF): New mode attribute.
>  * config/arm/neon.md (neon_vbfdot): New.
>  (neon_vbfdot_lanev4bf): New.
>  (neon_vbfdot_lanev8bf): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_dot_1.c: New test.
>  * gcc.target/arm/simd/bf16_dot_2.c: New test.
>

This patch updates tests in bf16_dot_1.c to make proper assembly check.
Is it OK for trunk, please?

Cheers
Dennis


Looks ok but...


new file mode 100644
index 000..c533f9d0b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_dot_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+float32x2_t
+test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (r, a, b, 2); /* { 
dg-error {out of range 0 - 1} } */

+}
+
+float32x4_t
+test_vbfdotq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (r, a, b, 2); /* { 
dg-error {out of range 0 - 1} } */

+}
+
+float32x2_t
+test_vbfdot_laneq_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (r, a, b, 4); /* { 
dg-error {out of range 0 - 3} } */

+}
+
+float32x4_t
+test_vbfdotq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (r, a, b, 4); /* { 
dg-error {out of range 0 - 3} } */

+}

These  tests shouldn't be calling the __builtin* directly, they are 
just an implementation detail.

What we want to test is the intrinsic itself.
Thanks,
Kyrill



Many thanks for the review.
The issue is fixed in the updated patch.
Is it ready please?



Ok.

Thanks,

Kyrill




Thanks for the approval!
The patch is pushed as eb7ba6c36b8a17c79936abe26245e4bc66bb8859.

Cheers
Dennis


Re: [Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-02-25 Thread Dennis Zhang

Hi Kyrill,

On 25/02/2020 12:18, Kyrill Tkachov wrote:

Hi Dennis,

On 2/25/20 11:54 AM, Dennis Zhang wrote:

Hi all,

On 07/01/2020 12:12, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the patch enabling Arm BFmode
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch adds intrinsics for brain half-precision float-point dot
> product.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested for arm-none-linux-gnueabi-armv8-a.
>
> Is it OK for trunk please?
>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
>  (vbfdot_lane_f32, vbfdotq_laneq_f32): New.
>  (vbfdot_laneq_f32, vbfdotq_lane_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfdot): New.
>  (vbfdot_lanev4bf, vbfdot_lanev8bf): New.
>  * config/arm/iterators.md (VSF2BF): New mode attribute.
>  * config/arm/neon.md (neon_vbfdot): New.
>  (neon_vbfdot_lanev4bf): New.
>  (neon_vbfdot_lanev8bf): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-03  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_dot_1.c: New test.
>  * gcc.target/arm/simd/bf16_dot_2.c: New test.
>

This patch updates tests in bf16_dot_1.c to make proper assembly check.
Is it OK for trunk, please?

Cheers
Dennis


Looks ok but...


new file mode 100644
index 000..c533f9d0b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_dot_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include "arm_neon.h"
+
+float32x2_t
+test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (r, a, b, 2); /* { dg-error 
{out of range 0 - 1} } */

+}
+
+float32x4_t
+test_vbfdotq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (r, a, b, 2); /* { dg-error 
{out of range 0 - 1} } */

+}
+
+float32x2_t
+test_vbfdot_laneq_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (r, a, b, 4); /* { dg-error 
{out of range 0 - 3} } */

+}
+
+float32x4_t
+test_vbfdotq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (r, a, b, 4); /* { dg-error 
{out of range 0 - 3} } */

+}

These  tests shouldn't be calling the __builtin* directly, they are just 
an implementation detail.

What we want to test is the intrinsic itself.
Thanks,
Kyrill



Many thanks for the review.
The issue is fixed in the updated patch.
Is it ready please?
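
For reference, the reworked tests exercise the user-facing intrinsics rather
than the __builtin_* functions; a representative case looks roughly like the
sketch below (hypothetical shape only; the attached bf16_dot_2.c is the
authoritative version):

/* { dg-do compile } */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */

#include "arm_neon.h"

float32x2_t
test_vbfdot_lane_f32 (float32x2_t r, bfloat16x4_t a, bfloat16x4_t b)
{
  return vbfdot_lane_f32 (r, a, b, 2); /* { dg-error {out of range 0 - 1} } */
}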

Cheers
Dennis

gcc/ChangeLog:

2020-02-25  Dennis Zhang  

* config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
(vbfdot_lane_f32, vbfdotq_laneq_f32): New.
(vbfdot_laneq_f32, vbfdotq_lane_f32): New.
* config/arm/arm_neon_builtins.def (vbfdot): New entry.
(vbfdot_lanev4bf, vbfdot_lanev8bf): Likewise.
* config/arm/iterators.md (VSF2BF): New attribute.
* config/arm/neon.md (neon_vbfdot): New entry.
    (neon_vbfdot_lanev4bf): Likewise.
(neon_vbfdot_lanev8bf): Likewise.

gcc/testsuite/ChangeLog:

2020-02-25  Dennis Zhang  

* gcc.target/arm/simd/bf16_dot_1.c: New test.
* gcc.target/arm/simd/bf16_dot_2.c: New test.
* gcc.target/arm/simd/bf16_dot_3.c: New test.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e81681aa415..d2ebee40538 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18819,6 +18819,58 @@ vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 
 #pragma GCC pop_options
 
+/* AdvSIMD Brain half-precision float-point (Bfloat16) intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b)
+{
+  return __builtin_neon_vbfdotv2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfdotv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_lane_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_

[Ping][PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-02-25 Thread Dennis Zhang

Hi all,

On 07/01/2020 12:12, Dennis Zhang wrote:

Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the patch enabling Arm BFmode 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html


This patch adds intrinsics for the brain half-precision floating-point
(Bfloat16) dot product.

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested for arm-none-linux-gnueabi-armv8-a.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2020-01-03  Dennis Zhang  

 * config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
 (vbfdot_lane_f32, vbfdotq_laneq_f32): New.
 (vbfdot_laneq_f32, vbfdotq_lane_f32): New.
 * config/arm/arm_neon_builtins.def (vbfdot): New.
 (vbfdot_lanev4bf, vbfdot_lanev8bf): New.
 * config/arm/iterators.md (VSF2BF): New mode attribute.
 * config/arm/neon.md (neon_vbfdot): New.
 (neon_vbfdot_lanev4bf): New.
 (neon_vbfdot_lanev8bf): New.

gcc/testsuite/ChangeLog:

2020-01-03  Dennis Zhang  

 * gcc.target/arm/simd/bf16_dot_1.c: New test.
 * gcc.target/arm/simd/bf16_dot_2.c: New test.



This patch updates the tests in bf16_dot_1.c to add proper assembly checks.
Is it OK for trunk, please?

Cheers
Dennis
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e81681aa415..d2ebee40538 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18819,6 +18819,58 @@ vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 
 #pragma GCC pop_options
 
+/* AdvSIMD Brain half-precision float-point (Bfloat16) intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b)
+{
+  return __builtin_neon_vbfdotv2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfdotv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_lane_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		   const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_laneq_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index f4a97fd764c..4a6f4cfc44e 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -381,3 +381,7 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
 VAR1 (TERNOP, smmla, v16qi)
 VAR1 (UTERNOP, ummla, v16qi)
 VAR1 (USTERNOP, usmmla, v16qi)
+
+VAR2 (TERNOP, vbfdot, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev4bf, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev8bf, v2sf, v4sf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 136c45274ae..b435a05d219 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -835,6 +835,8 @@
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 
+(define_mode_attr VSF2BF [(V2SF "V4BF") (V4SF "V8BF")])
+
 ;;
 ;; Code attributes
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 039cd90c3da..80e94de4b84 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -6596,3 +6596,51 @@ if (BYTES_BIG_ENDIAN)
   "vmmla.\t%q0, %q2, %q3"
   [(set_attr "type" "neon_mla_s_q")]
 )
+
+(define_insn "neon_vbfdot"
+  [(set (match_operand:VCVTF 0 "register_operand" "=w")
+	(plus:VCVTF (match_operand:

Re: [Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-21 Thread Dennis Zhang

Hi Kyrill,

On 21/02/2020 11:47, Kyrill Tkachov wrote:

Hi Dennis,

On 2/11/20 12:03 PM, Dennis Zhang wrote:

Hi all,

On 16/12/2019 13:45, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the Arm Armv8.6-A CLI patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
> It also depends on the Armv8.6-A effective target checking patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
> It also depends on the ARMv8.6-A I8MM dot product patch for using the
> same builtin qualifier
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.
>
> This patch adds intrinsics for matrix multiply-accumulate operations
> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8.2-a.
>
> Is it OK for trunk please?
>


This is ok.

Thanks,

Kyrill



Thanks a lot for the approval.
The patch has been pushed as 436016f45694c7236e2e9f9db2adb0b4d9bf6b94.

Bests
Dennis


[Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-11 Thread Dennis Zhang

Hi all,

On 16/12/2019 13:45, Dennis Zhang wrote:

Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the Arm Armv8.6-A CLI patch, 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
It also depends on the Armv8.6-A effective target checking patch, 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
It also depends on the ARMv8.6-A I8MM dot product patch for using the 
same builtin qualifier 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.


This patch adds intrinsics for matrix multiply-accumulate operations 
including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regtested for arm-none-linux-gnueabi-armv8.2-a.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2019-12-10  Dennis Zhang  

 * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New.
 * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
 * config/arm/iterators.md (MATMUL): New.
 (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
 (mmla_sfx): New.
 * config/arm/neon.md (neon_mmlav16qi): New.
 * config/arm/unspecs.md (UNSPEC_MATMUL_S): New.
 (UNSPEC_MATMUL_U, UNSPEC_MATMUL_US): New.

gcc/testsuite/ChangeLog:

2019-12-10  Dennis Zhang  

 * gcc.target/arm/simd/vmmla_1.c: New test.


This patch has been updated according to the feedback on related AArch64 
version at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01591.html


Regtested. OK to commit please?

Many thanks
Dennis

gcc/ChangeLog:

2020-02-11  Dennis Zhang  

* config/arm/arm-builtins.c (USTERNOP_QUALIFIERS): New macro.
* config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New.
* config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
* config/arm/iterators.md (MATMUL): New iterator.
(sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
(mmla_sfx): New attribute.
* config/arm/neon.md (neon_mmlav16qi): New.
* config/arm/unspecs.md (UNSPEC_MATMUL_S, UNSPEC_MATMUL_U): New.
(UNSPEC_MATMUL_US): New.

gcc/testsuite/ChangeLog:

2020-02-11  Dennis Zhang  

* gcc.target/arm/simd/vmmla_1.c: New test.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7f279cca668..60c65c1772f 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -122,6 +122,11 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
 
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009..7461c90e3fe 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18745,6 +18745,34 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_neon_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cb..d304cdb33cc 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,7 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR1 (TERNOP, smmla, v16qi)
+VAR1 (UTERNOP, ummla, v16qi)
+VAR1 (USTERNOP, usmmla, v16qi)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 33e29509f00..141ad96d6db 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -485,6 +485,8 @@
 (define_int_iterator VCADD [UNSP

Re: [PATCH][AArch64] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-07 Thread Dennis Zhang

Hi all,

On 27/01/2020 13:01, Richard Sandiford wrote:

Dennis Zhang  writes:

[...]
gcc/ChangeLog:

2020-01-23  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SSUS): New macro.
* config/aarch64/aarch64-simd-builtins.def (simd_smmla): New.
(simd_ummla, simd_usmmla): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_mmlav16qi): New.
* config/aarch64/arm_neon.h (vmmlaq_s32, vmmlaq_u32): New.
(vusmmlaq_s32): New.
* config/aarch64/iterators.md (unspec): Add UNSPEC_SMATMUL,
UNSPEC_UMATMUL, and UNSPEC_USMATMUL.
(sur): Likewise.
(MATMUL): New iterator.

gcc/testsuite/ChangeLog:

2020-01-23  Dennis Zhang  

* gcc.target/aarch64/simd/vmmla.c: New test.


OK, thanks.

One note below...


diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index f0e0461b7f0..033a6d4e92f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -176,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned,
qualifier_unsigned, qualifier_immediate };
  #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
  
  
  static enum aarch64_type_qualifiers

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 57fc5933b43..885c2540514 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,8 @@
BUILTIN_VSFDF (UNOP, frint32x, 0)
BUILTIN_VSFDF (UNOP, frint64z, 0)
BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_simd_mmlav16qi.  */
+  VAR1 (TERNOP, simd_smmla, 0, v16qi)
+  VAR1 (TERNOPU, simd_ummla, 0, v16qi)
+  VAR1 (TERNOP_SSUS, simd_usmmla, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 2989096b170..b7659068b7d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7025,3 +7025,15 @@
"xtn\t%0., %1."
[(set_attr "type" "neon_shift_imm_narrow_q")]
  )
+
+;; 8-bit integer matrix multiply-accumulate
+(define_insn "aarch64_simd_mmlav16qi"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+   (plus:V4SI
+(unspec:V4SI [(match_operand:V16QI 2 "register_operand" "w")
+  (match_operand:V16QI 3 "register_operand" "w")] MATMUL)
+(match_operand:V4SI 1 "register_operand" "0")))]
+  "TARGET_I8MM"
+  "mmla\\t%0.4s, %2.16b, %3.16b"
+  [(set_attr "type" "neon_mla_s_q")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index eaba156e26c..918000d98dc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34609,6 +34609,36 @@ vrnd64xq_f64 (float64x2_t __a)
  
  #pragma GCC pop_options
  
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */

+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+/* Matrix Multiply-Accumulate.  */
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_aarch64_simd_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
  #include "arm_bf16.h"
  
  #undef __aarch64_vget_lane_any

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b9843b83c5f..57aca36f646 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -581,6 +581,9 @@
  UNSPEC_FMLSL  ; Used in aarch64-simd.md.
  UNSPEC_FMLAL2 ; Used in aarch64-simd.md.
  UNSPEC_FMLSL2 ; Used in aarch64-simd.md.
+UNSPEC_SMATMUL ; Used in aarch64-simd.md.
+UNSPEC_UMATMUL ; Used in aarch64-simd.md.
+UNSPEC_USMATMUL; Used in aarch64-simd.md.
  UNSPEC_ADR; Used in aarch64-sve.md.
  UNSPEC_SEL; Used in aarch64-sve.md.
  UNSPEC_BRKA   ; Used in aa

Re: [PATCH][AArch64] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-01-27 Thread Dennis Zhang
Hi Richard,

On 23/01/2020 15:28, Richard Sandiford wrote:
> Dennis Zhang  writes:
>> Hi all,
>> On 16/12/2019 13:53, Dennis Zhang wrote:
>>> Hi all,
>>>
>>> This patch is part of a series adding support for Armv8.6-A features.
>>> It depends on the Armv8.6-A effective target checking patch,
>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
>>>
>>> This patch adds intrinsics for matrix multiply-accumulate operations
>>> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
>>>
>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>
>>> Regtested & bootstrapped for aarch64-none-linux-gnu.
>>>
>>> Is it OK for trunk please?
>>>
>>
>> This patch is rebased to the trunk top.
>> There is no dependence on any other patches now.
>> Regtested again.
>>
>> Is it OK for trunk please?
>>
>> Cheers
>> Dennis
>>
>> gcc/ChangeLog:
>>
>> 2020-01-23  Dennis Zhang  
>>
>>  * config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SSUS): New macro.
>>  * config/aarch64/aarch64-simd-builtins.def (simd_smmla): New.
>>  (simd_ummla, simd_usmmla): New.
>>  * config/aarch64/aarch64-simd.md (aarch64_simd_mmlav16qi): New.
>>  * config/aarch64/arm_neon.h (vmmlaq_s32, vmmlaq_u32): New.
>>  (vusmmlaq_s32): New.
>>  * config/aarch64/iterators.md (unspec): Add UNSPEC_SMATMUL,
>>  UNSPEC_UMATMUL, and UNSPEC_USMATMUL.
>>  (sur): Likewise.
>>  (MATMUL): New iterator.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-01-23  Dennis Zhang  
>>
>>  * gcc.target/aarch64/advsimd-intrinsics/vmmla.c: New test.
>>
>> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
>> b/gcc/config/aarch64/aarch64-builtins.c
>> index f0e0461b7f0..033a6d4e92f 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -176,6 +176,10 @@ 
>> aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>> = { qualifier_unsigned, qualifier_unsigned,
>> qualifier_unsigned, qualifier_immediate };
>>   #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
>> +static enum aarch64_type_qualifiers
>> +aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>> +  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
>> +#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
>>   
>>   
>>   static enum aarch64_type_qualifiers
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
>> b/gcc/config/aarch64/aarch64-simd-builtins.def
>> index 57fc5933b43..06025b110cc 100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>> @@ -682,3 +682,8 @@
>> BUILTIN_VSFDF (UNOP, frint32x, 0)
>> BUILTIN_VSFDF (UNOP, frint64z, 0)
>> BUILTIN_VSFDF (UNOP, frint64x, 0)
>> +
>> +  /* Implemented by aarch64_simd_mmlav16qi.  */
>> +  VAR1 (TERNOP, simd_smmla, 0, v16qi)
>> +  VAR1 (TERNOPU, simd_ummla, 0, v16qi)
>> +  VAR1 (TERNOP_SSUS, simd_usmmla, 0, v16qi)
>> \ No newline at end of file
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 2989096b170..409ec28d293 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7025,3 +7025,15 @@
>> "xtn\t%0., %1."
>> [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +;; 8-bit integer matrix multiply-accumulate
>> +(define_insn "aarch64_simd_mmlav16qi"
>> +  [(set (match_operand:V4SI 0 "register_operand" "=w")
>> +(plus:V4SI (match_operand:V4SI 1 "register_operand" "0")
>> +   (unspec:V4SI [(match_operand:V16QI 2 "register_operand" "w")
>> + (match_operand:V16QI 3 "register_operand" "w")]
>> +MATMUL)))]
>> +  "TARGET_I8MM"
>> +  "mmla\\t%0.4s, %2.16b, %3.16b"
>> +  [(set_attr "type" "neon_mla_s_q")]
>> +)
>> \ No newline at end of file
> 
> (Would be good to add the newline)
> 
> The canonical rtl order for commutative operations like plus is
> to put the most complicated expression first (roughly speaking --
> the rules are a bit

Re: [PATCH][AArch64] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-01-23 Thread Dennis Zhang
Hi all,

On 16/12/2019 13:53, Dennis Zhang wrote:
> Hi all,
> 
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the Armv8.6-A effective target checking patch, 
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
> 
> This patch adds intrinsics for matrix multiply-accumulate operations 
> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
> 
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> 
> Regtested & bootstrapped for aarch64-none-linux-gnu.
> 
> Is it OK for trunk please?
> 

This patch is rebased on top of trunk.
There is no dependence on any other patches now.
Regtested again.

Is it OK for trunk please?

Cheers
Dennis

gcc/ChangeLog:

2020-01-23  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SSUS): New macro.
* config/aarch64/aarch64-simd-builtins.def (simd_smmla): New.
(simd_ummla, simd_usmmla): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_mmlav16qi): New.
* config/aarch64/arm_neon.h (vmmlaq_s32, vmmlaq_u32): New.
(vusmmlaq_s32): New.
* config/aarch64/iterators.md (unspec): Add UNSPEC_SMATMUL,
UNSPEC_UMATMUL, and UNSPEC_USMATMUL.
(sur): Likewise.
(MATMUL): New iterator.

gcc/testsuite/ChangeLog:

2020-01-23  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/vmmla.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index f0e0461b7f0..033a6d4e92f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -176,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned,
   qualifier_unsigned, qualifier_immediate };
 #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
 
 
 static enum aarch64_type_qualifiers
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 57fc5933b43..06025b110cc 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,8 @@
   BUILTIN_VSFDF (UNOP, frint32x, 0)
   BUILTIN_VSFDF (UNOP, frint64z, 0)
   BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_simd_mmlav16qi.  */
+  VAR1 (TERNOP, simd_smmla, 0, v16qi)
+  VAR1 (TERNOPU, simd_ummla, 0, v16qi)
+  VAR1 (TERNOP_SSUS, simd_usmmla, 0, v16qi)
\ No newline at end of file
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2989096b170..409ec28d293 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7025,3 +7025,15 @@
   "xtn\t%0., %1."
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
+
+;; 8-bit integer matrix multiply-accumulate
+(define_insn "aarch64_simd_mmlav16qi"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+	(plus:V4SI (match_operand:V4SI 1 "register_operand" "0")
+		   (unspec:V4SI [(match_operand:V16QI 2 "register_operand" "w")
+ (match_operand:V16QI 3 "register_operand" "w")]
+		MATMUL)))]
+  "TARGET_I8MM"
+  "mmla\\t%0.4s, %2.16b, %3.16b"
+  [(set_attr "type" "neon_mla_s_q")]
+)
\ No newline at end of file
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index eaba156e26c..918000d98dc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34609,6 +34609,36 @@ vrnd64xq_f64 (float64x2_t __a)
 
 #pragma GCC pop_options
 
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+/* Matrix Multiply-Accumulate.  */
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_aarch64_simd_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
 #include "arm_bf16.h&

[PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-01-17 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on Arm BFMode patch 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

This patch implements intrinsics to convert between bfloat16 and float32 
formats.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest
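
For reference, a small usage sketch of the new conversion intrinsics
(illustrative only; assumes a configuration along the lines of
-march=armv8.2-a+bf16 with a hard-float ABI).  It widens both halves of a
bfloat16 vector to float32, scales them, and narrows back:

#include <arm_neon.h>

bfloat16x8_t
scale_bf16 (bfloat16x8_t x, float32x4_t s)
{
  float32x4_t lo = vcvtq_low_f32_bf16 (x);
  float32x4_t hi = vcvtq_high_f32_bf16 (x);
  lo = vmulq_f32 (lo, s);
  hi = vmulq_f32 (hi, s);
  /* The low half is converted first; the 'inactive' operand of the
     high-half conversion then supplies those low 64 bits.  */
  bfloat16x8_t r = vcvtq_low_bf16_f32 (lo);
  return vcvtq_high_bf16_f32 (r, hi);
}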

Regression tested.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2020-01-17  Dennis Zhang  

* config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
* config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
(vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
(vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
* config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New entries.
(vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
* config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
(V_bf_low, V_bf_cvt_m): New mode attributes.
* config/arm/neon.md (neon_vbfcvtv4sf): New.
(neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
(neon_vbfcvt, neon_vbfcvt_highv8bf): New.
(neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
* config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.

gcc/testsuite/ChangeLog:

2020-01-17  Dennis Zhang  

* gcc.target/arm/simd/bf16_cvt_1.c: New test.


diff --git a/gcc/config/arm/arm_bf16.h b/gcc/config/arm/arm_bf16.h
index decf23f38346c033f9d7502ce82e11ce81b9bc3a..1aa593192c091850e3ffbe4433d18c0ff543173a 100644
--- a/gcc/config/arm/arm_bf16.h
+++ b/gcc/config/arm/arm_bf16.h
@@ -34,6 +34,20 @@ extern "C" {
 typedef __bf16 bfloat16_t;
 typedef float float32_t;
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_neon_vbfcvtbf (__a);
+}
+
+__extension__ extern __inline bfloat16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvth_bf16_f32 (float32_t __a)
+{
+  return __builtin_neon_vbfcvtsf (__a);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..60ac68702c4f1ef0408c2d0663ebd89bfc6610a2 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18745,6 +18745,55 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 #endif
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_neon_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_neon_vbfcvt_highv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_bf16_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sfv4bf (__a);
+}
+
+__extension__ extern __inline bfloat16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_bf16_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sfv8bf (__a);
+}
+
+/* The 'inactive' operand is not converted but it provides the
+   low 64 bits to assemble the final 128-bit result.  */
+__extension__ extern __inline bfloat16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_bf16_f32 (bfloat16x8_t inactive, float32x4_t __a)
+{
+  return __builtin_neon_vbfcvtv4sf_highv8bf (inactive, __a);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cbb5d16b9211f5bc96db376ddf21afc..bc750895f994bff6799232ef2e63e27c9349e27d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,9 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR2 (UNOP, vbfcvt, sf, bf)
+VAR2 (UNOP, vbfcvt, v4bf, v8bf)
+VAR1 (UNOP, vbfcvt_high, v8bf)
+VAR2 (UNOP, vbfcvtv4sf, v4bf, v8bf)
+VAR1 (BINOP, vbfcvtv4sf_high, v8bf)
\ No newline at end of file
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 33e29509f00a89fa23d0546687c0e4643f0b32d2..003de33bcddcec1c0d9682f775acdedf69c09ea8 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -229,6 +229,10 @@
 ;; Modes for polyno

[PATCH][Arm] ACLE intrinsics for AdvSIMD bfloat16 dot product

2020-01-07 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the patch enabling Arm BFmode 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

This patch adds intrinsics for the brain half-precision floating-point (Bfloat16) dot product.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest
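
As an illustration, a minimal usage sketch (arbitrary lane index; assumes
something like -march=armv8.2-a+bf16 on the command line):

#include <arm_neon.h>

float32x4_t
bfdot_example (float32x4_t acc, bfloat16x8_t a, bfloat16x8_t b)
{
  /* Accumulate a 2-way bfloat16 dot product into each float32 lane.  */
  acc = vbfdotq_f32 (acc, a, b);
  /* Same, but reusing the bfloat16 pair selected by lane 0 of b.  */
  return vbfdotq_laneq_f32 (acc, a, b, 0);
}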

Regression tested for arm-none-linux-gnueabi-armv8-a.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2020-01-03  Dennis Zhang  

* config/arm/arm_neon.h (vbfdot_f32, vbfdotq_f32): New
(vbfdot_lane_f32, vbfdotq_laneq_f32): New.
(vbfdot_laneq_f32, vbfdotq_lane_f32): New.
* config/arm/arm_neon_builtins.def (vbfdot): New.
(vbfdot_lanev4bf, vbfdot_lanev8bf): New.
* config/arm/iterators.md (VSF2BF): New mode attribute.
* config/arm/neon.md (neon_vbfdot): New.
(neon_vbfdot_lanev4bf): New.
(neon_vbfdot_lanev8bf): New.

gcc/testsuite/ChangeLog:

2020-01-03  Dennis Zhang  

* gcc.target/arm/simd/bf16_dot_1.c: New test.
* gcc.target/arm/simd/bf16_dot_2.c: New test.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 7433559f00020a4f7878dff22ddc2b9d40bb2e06..1d9e7d40ccdd86e9ece300b9e08c78bcffe915a6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18745,6 +18745,59 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Brain half-precision float-point (Bfloat16) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b)
+{
+  return __builtin_neon_vbfdotv2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfdotv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_lane_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		   const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_laneq_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev8bfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vbfdot_lanev4bfv4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index bcccf93f7fa2750e9006e5856efecbec0fb331b9..367fd21f5546c6b5a49d79df2822537cbb98e1f7 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,7 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR2 (TERNOP, vbfdot, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev4bf, v2sf, v4sf)
+VAR2 (MAC_LANE_PAIR, vbfdot_lanev8bf, v2sf, v4sf)
\ No newline at end of file
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 439021fa0733ac31706287c4f98d62b080afc3a1..eb001131dc5cb7bed2afe428664d7c863595c60c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -834,6 +834,8 @@
 (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
 (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
 
+(define_mode_attr VSF2BF [(V2SF "V4BF") (V4SF "V8BF")])
+
 ;;
 ;; Code attributes
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7ebd7464d4d42eac6a525b5f1b39eae08c9086..248c5f622421d7e8197adb23d7f28588840ff772 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -6556,3 +6556,51 @@ if (BYTES_BIG_ENDIAN)
  "vabd. %0, %1, %2"

Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2020-01-02 Thread Dennis Zhang
Hi Kyrill,

On 20/12/2019 15:30, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 12/12/19 5:30 PM, Dennis Zhang wrote:
>> Hi all,
>>
>> On 22/11/2019 14:33, Dennis Zhang wrote:
>> > Hi all,
>> >
>> > This patch is part of a series adding support for Armv8.6-A features.
>> > It enables options including -march=armv8.6-a, +i8mm and +bf16.
>> > The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
>> > Documents are at https://developer.arm.com/docs/ddi0596/latest
>> >
>> > Regtested for arm-none-linux-gnueabi-armv8-a.
>> >
>>
>> This is an update to rebase the patch to the top.
>> Some issues are fixed according to the recent CLI patch for AArch64.
>> ChangeLog is updated as following:
>>
>> gcc/ChangeLog:
>>
>> 2019-12-12  Dennis Zhang  
>>
>>     * config/arm/arm-c.c (arm_cpu_builtins): Define
>>     __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
>>     __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
>>     __ARM_BF16_FORMAT_ALTERNATIVE when enabled.
>>     * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
>>     * config/arm/arm-tables.opt: Regenerated.
>>     * config/arm/arm.c (arm_option_reconfigure_globals): Initialize
>>     arm_arch_i8mm and arm_arch_bf16 when enabled.
>>     * config/arm/arm.h (TARGET_I8MM): New macro.
>>     (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>>     * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
>>     * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
>>     * config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
>>     (v8_6_a_simd_variants): New.
>>     (v8_*_a_simd_variants): Add i8mm and bf16.
>>     * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12  Dennis Zhang  
>>
>>     * gcc.target/arm/multilib.exp: Add combination tests for 
>> armv8.6-a.
>>
>> Is it OK for trunk?
> 
> 
> This is ok for trunk.
> 
> Please follow the steps at https://gcc.gnu.org/svnwrite.html to get 
> write permission to the repo (listing me as approver).
> 
> You can then commit it yourself :)

Thanks for the sponsorship. I have finished setting up write permission.

The patch is committed as r279839.

Cheers
Dennis




[PATCH][committed] Add myself to MAINTAINERS

2020-01-02 Thread Dennis Zhang
Hi all,

This patch is to add myself to the Write After Approval section in 
MAINTAINERS.

Committed with r279837.

Cheers
Dennis

ChangeLog:

2020-01-02  Dennis Zhang  

* MAINTAINERS (Write After Approval): Add myself.

diff --git a/MAINTAINERS b/MAINTAINERS
index e3925a82355..e2ef8ae9af1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -650,6 +650,7 @@ Kwok Cheung Yeung
 Greta Yorsh	
 David Yuste	
 Adhemerval Zanella
+Dennis Zhang	
 Yufeng Zhang	
 Qing Zhao	
 Shujing Zhao	


[PATCH][AArch64] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2019-12-16 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the Armv8.6-A effective target checking patch, 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.

This patch adds intrinsics for matrix multiply-accumulate operations 
including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
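
For illustration only (not part of the patch; the helper name is invented),
a minimal usage sketch, assuming an arm_neon.h with this patch applied and
flags such as -march=armv8.2-a+i8mm:

#include <arm_neon.h>

/* acc is a 2x2 int32 accumulator (row-major); a and b each point to a
   2x8 int8 matrix (row-major).  The intrinsic computes
   acc += a * transpose(b) and maps to a single SMMLA instruction.  */
int32x4_t
accumulate_block (int32x4_t acc, const int8_t *a, const int8_t *b)
{
  int8x16_t va = vld1q_s8 (a);
  int8x16_t vb = vld1q_s8 (b);
  return vmmlaq_s32 (acc, va, vb);
}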

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regtested & bootstrapped for aarch64-none-linux-gnu.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2019-12-13  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SSUS): New macro.
* config/aarch64/aarch64-simd-builtins.def (simd_smmla): New.
(simd_ummla, simd_usmmla): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_mmlav16qi): New.
* config/aarch64/arm_neon.h (vmmlaq_s32, vmmlaq_u32): New.
(vusmmlaq_s32): New.
* config/aarch64/iterators.md (unspec): Add UNSPEC_SMATMUL,
UNSPEC_UMATMUL, and UNSPEC_USMATMUL.
(sur): Likewise.
(MATMUL): New.

gcc/testsuite/ChangeLog:

2019-12-13  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/vmmla.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c35a1b1f029..5b048dc9402 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -173,6 +173,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned,
   qualifier_unsigned, qualifier_immediate };
 #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
 
 
 static enum aarch64_type_qualifiers
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f4ca35a5970..744f880c450 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,8 @@
   BUILTIN_VSFDF (UNOP, frint32x, 0)
   BUILTIN_VSFDF (UNOP, frint64z, 0)
   BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_simd_<sur>mmlav16qi.  */
+  VAR1 (TERNOP, simd_smmla, 0, v16qi)
+  VAR1 (TERNOPU, simd_ummla, 0, v16qi)
+  VAR1 (TERNOP_SSUS, simd_usmmla, 0, v16qi)
\ No newline at end of file
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ad4676bc167..fc0c8d21599 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7025,3 +7025,15 @@
   "xtn\t%0., %1."
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
+
+;; 8-bit integer matrix multiply-accumulate
+(define_insn "aarch64_simd_mmlav16qi"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+	(plus:V4SI (match_operand:V4SI 1 "register_operand" "0")
+		   (unspec:V4SI [(match_operand:V16QI 2 "register_operand" "w")
+ (match_operand:V16QI 3 "register_operand" "w")]
+		MATMUL)))]
+  "TARGET_I8MM"
+  "mmla\\t%0.4s, %2.16b, %3.16b"
+  [(set_attr "type" "neon_mla_s_q")]
+)
\ No newline at end of file
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8b861601a48..e6af2c2960d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34606,6 +34606,36 @@ vrnd64xq_f64 (float64x2_t __a)
 
 #pragma GCC pop_options
 
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+/* Matrix Multiply-Accumulate.  */
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_aarch64_simd_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
 #undef __aarch64_vget_lane_any
 
 #undef __aarch64_vdup_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 83a0d156e84..f2a9298fbf8 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -650,6 +650,9 @@
 UNS

[PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2019-12-16 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It depends on the Arm Armv8.6-A CLI patch, 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
It also depends on the Armv8.6-A effective target checking patch, 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
It also depends on the ARMv8.6-A I8MM dot product patch for using the 
same builtin qualifier 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.

This patch adds intrinsics for matrix multiply-accumulate operations 
including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
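
As with the AArch64 version, here is a small illustrative sketch (not part
of the patch; the helper name is invented) showing the mixed-sign variant,
which maps to VUSMMLA.  It assumes an arm_neon.h with this patch and flags
such as -march=armv8.2-a+i8mm -mfpu=auto -mfloat-abi=hard:

#include <arm_neon.h>

/* acc is a 2x2 int32 accumulator (row-major); act points to a 2x8 uint8
   matrix and wgt to a 2x8 int8 matrix (both row-major).  The intrinsic
   computes acc += act * transpose(wgt).  */
int32x4_t
accumulate_us (int32x4_t acc, const uint8_t *act, const int8_t *wgt)
{
  uint8x16_t va = vld1q_u8 (act);
  int8x16_t  vb = vld1q_s8 (wgt);
  return vusmmlaq_s32 (acc, va, vb);
}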

ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regtested for arm-none-linux-gnueabi-armv8.2-a.

Is it OK for trunk please?

Thanks,
Dennis

gcc/ChangeLog:

2019-12-10  Dennis Zhang  

* config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): New.
* config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
* config/arm/iterators.md (MATMUL): New.
(sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
(mmla_sfx): New.
* config/arm/neon.md (neon_mmlav16qi): New.
* config/arm/unspecs.md (UNSPEC_MATMUL_S): New.
(UNSPEC_MATMUL_U, UNSPEC_MATMUL_US): New.

gcc/testsuite/ChangeLog:

2019-12-10  Dennis Zhang  

* gcc.target/arm/simd/vmmla_1.c: New test.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 1f200d491d1..7beab449e4c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18741,6 +18741,34 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_neon_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_neon_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index bcccf93f7fa..bc0d06c8bc7 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -373,3 +373,7 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
 VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
+
+VAR1 (TERNOP, smmla, v16qi)
+VAR1 (UTERNOP, ummla, v16qi)
+VAR1 (USTERNOP, usmmla, v16qi)
\ No newline at end of file
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c412851843f..ece8cc2acea 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -471,6 +471,8 @@
 (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
 (define_int_iterator VCMLA [UNSPEC_VCMLA UNSPEC_VCMLA90 UNSPEC_VCMLA180 UNSPEC_VCMLA270])
 
+(define_int_iterator MATMUL [UNSPEC_MATMUL_S UNSPEC_MATMUL_U UNSPEC_MATMUL_US])
+
 ;;
 ;; Mode attributes
 ;;
@@ -883,6 +885,7 @@
   (UNSPEC_VMLSL_S_LANE "s") (UNSPEC_VMLSL_U_LANE "u")
   (UNSPEC_VMULL_S "s") (UNSPEC_VMULL_U "u") (UNSPEC_VMULL_P "p")
   (UNSPEC_VMULL_S_LANE "s") (UNSPEC_VMULL_U_LANE "u")
+  (UNSPEC_MATMUL_S "s") (UNSPEC_MATMUL_U "u") (UNSPEC_MATMUL_US "us")
   (UNSPEC_VSUBL_S "s") (UNSPEC_VSUBL_U "u")
   (UNSPEC_VSUBW_S "s") (UNSPEC_VSUBW_U "u")
   (UNSPEC_VHSUB_S "s") (UNSPEC_VHSUB_U "u")
@@ -1089,6 +1092,9 @@
 			(UNSPEC_SMUADX "smuadx") (UNSPEC_SSAT16 "ssat16")
 			(UNSPEC_USAT16 "usat16")])
 
+(define_int_attr mmla_sfx [(UNSPEC_MATMUL_S "s8") (UNSPEC_MATMUL_U "u8")
+			   (UNSPEC_MATMUL_US "s8")])
+
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
 (define_code_attr return_str [(return "") (simple_return "simple_")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a0ee28efc9..260202a8fb7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@

Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-12-12 Thread Dennis Zhang
Hi all,

On 22/11/2019 14:33, Dennis Zhang wrote:
> Hi all,
> 
> This patch is part of a series adding support for Armv8.6-A features.
> It enables options including -march=armv8.6-a, +i8mm and +bf16.
> The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
> Documents are at https://developer.arm.com/docs/ddi0596/latest
> 
> Regtested for arm-none-linux-gnueabi-armv8-a.
> 

This is an update rebasing the patch on top of current trunk.
Some issues are fixed according to the recent CLI patch for AArch64.
The ChangeLog is updated as follows:

gcc/ChangeLog:

2019-12-12  Dennis Zhang  

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
__ARM_BF16_FORMAT_ALTERNATIVE when enabled.
* config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
* config/arm/arm-tables.opt: Regenerated.
* config/arm/arm.c (arm_option_reconfigure_globals): Initialize
arm_arch_i8mm and arm_arch_bf16 when enabled.
* config/arm/arm.h (TARGET_I8MM): New macro.
(TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
* config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
* config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
* config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
(v8_6_a_simd_variants): New.
(v8_*_a_simd_variants): Add i8mm and bf16.
* doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-12-12  Dennis Zhang  

* gcc.target/arm/multilib.exp: Add combination tests for armv8.6-a.

Is it OK for trunk?

Many thanks!
Dennis
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 546b35a5cbd..9cd1c5bdcba 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -226,6 +226,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
 }
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
+		  TARGET_BF16_FP);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
+		  TARGET_BF16_SIMD);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+		  TARGET_BF16_FP || TARGET_BF16_SIMD);
 }
 
 void
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 7090775aa7e..a2f6ce00af4 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -123,6 +123,9 @@ define feature armv8_4
 # Architecture rel 8.5.
 define feature armv8_5
 
+# Architecture rel 8.6.
+define feature armv8_6
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -191,6 +194,12 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres
 
+# 8-bit Integer Matrix Multiply extension. Optional from v8.2-A.
+define feature i8mm
+
+# Brain half-precision floating-point extension. Optional from v8.2-A.
+define feature bf16
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -213,7 +222,7 @@ define fgroup ALL_CRYPTO	crypto
 # strip off 32 D-registers, but does not remove support for
 # double-precision FP.
 define fgroup ALL_SIMD_INTERNAL	fp_d32 neon ALL_CRYPTO
-define fgroup ALL_SIMD_EXTERNAL dotprod fp16fml
+define fgroup ALL_SIMD_EXTERNAL dotprod fp16fml i8mm
 define fgroup ALL_SIMD	ALL_SIMD_INTERNAL ALL_SIMD_EXTERNAL
 
 # List of all FPU bits to strip out if -mfpu is used to override the
@@ -221,7 +230,7 @@ define fgroup ALL_SIMD	ALL_SIMD_INTERNAL ALL_SIMD_EXTERNAL
 define fgroup ALL_FPU_INTERNAL	vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl ALL_SIMD_INTERNAL
 # Similarly, but including fp16 and other extensions that aren't part of
 # -mfpu support.
-define fgroup ALL_FPU_EXTERNAL fp16
+define fgroup ALL_FPU_EXTERNAL fp16 bf16
 
 # Everything related to the FPU extensions (FP or SIMD).
 define fgroup ALL_FP	ALL_FPU_EXTERNAL ALL_FPU_INTERNAL ALL_SIMD
@@ -256,6 +265,7 @@ define fgroup ARMv8_2a ARMv8_1a armv8_2
 define fgroup ARMv8_3a ARMv8_2a armv8_3
 define fgroup ARMv8_4a ARMv8_3a armv8_4
 define fgroup ARMv8_5a ARMv8_4a armv8_5 sb predres
+define fgroup ARMv8_6a ARMv8_5a armv8_6
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -563,6 +573,8 @@ begin arch armv8.2-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -580,6 +592,8 @@ begin arch armv8.3-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  opti

Re: [PATCH][AArch64] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-12-12 Thread Dennis Zhang
Hi Richard,

On 06/12/2019 10:22, Richard Sandiford wrote:
> Dennis Zhang  writes:
>> 2019-12-04  Dennis Zhang  
>>
>>  * config/aarch64/aarch64-arches.def (armv8.6-a): New.
>>  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>>  __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC and
>>  __ARM_FEATURE_BF16_SCALAR_ARITHMETIC when enabled.
>>  * config/aarch64/aarch64-option-extensions.def (i8mm, bf16): New.
>>  (fp): Disabling fp also disables i8mm and bf16.
>>  (simd): Disabling simd also disables i8mm.
>>  * config/aarch64/aarch64.h (AARCH64_FL_V8_6): New macro.
>>  (AARCH64_FL_I8MM, AARCH64_FL_BF16, AARCH64_FL_FOR_ARCH8_6): Likewise.
>>  (AARCH64_ISA_V8_6, AARCH64_ISA_I8MM, AARCH64_ISA_BF16): Likewise.
>>  (TARGET_I8MM, TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>>  * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options. Add
>>  a new table to list permissible values for ARCH.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-04  Dennis Zhang  
>>
>>  * gcc.target/aarch64/pragma_cpp_predefs_2.c: Add tests for i8mm
>>  and bf16 features.
> 
> Thanks for the update, looks great.  A couple of comments below.
> 
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index d165f31a865..1192e8f4b06 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -16050,25 +16050,22 @@ Specify the name of the target architecture and, 
>> optionally, one or
>>   more feature modifiers.  This option has the form
>>   @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
>>   
>> -The permissible values for @var{arch} are @samp{armv8-a},
>> -@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a},
>> -@samp{armv8.5-a} or @var{native}.
>> -
>> -The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler
>> -support for the ARMv8.5-A architecture extensions.
>> -
>> -The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler
>> -support for the ARMv8.4-A architecture extensions.
>> -
>> -The value @samp{armv8.3-a} implies @samp{armv8.2-a} and enables compiler
>> -support for the ARMv8.3-A architecture extensions.
>> -
>> -The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
>> -support for the ARMv8.2-A architecture extensions.
>> -
>> -The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
>> -support for the ARMv8.1-A architecture extension.  In particular, it
>> -enables the @samp{+crc}, @samp{+lse}, and @samp{+rdma} features.
>> +The table below summarizes the permissible values for @var{arch}
>> +and the features that they enable by default:
>> +
>> +@multitable @columnfractions 0.20 0.20 0.60
>> +@headitem @var{arch} value @tab Architecture @tab Includes by default
> 
> We should have an armv8-a entry here, something like:
> 
> @item @samp{armv8-a} @tab Armv8-A @tab @samp{+fp}, @samp{+simd}
> 

The armv8-a entry is added.

>> +@item @samp{armv8.1-a} @tab Armv8.1-A
>> +@tab @samp{armv8-a}, @samp{+crc}, @samp{+lse}, @samp{+rdma}
>> +@item @samp{armv8.2-a} @tab Armv8.2-A @tab @samp{armv8.1-a}
>> +@item @samp{armv8.3-a} @tab Armv8.3-A @tab @samp{armv8.2-a}
>> +@item @samp{armv8.4-a} @tab Armv8.4-A
>> +@tab @samp{armv8.3-a}, @samp{+fp16fml}, @samp{+dotprod}
>> +@item @samp{armv8.5-a} @tab Armv8.5-A
>> +@tab @samp{armv8.4-a}, @samp{+sb}, @samp{+ssbs}, @samp{+predres}
>> +@item @samp{armv8.6-a} @tab Armv8.6-A
>> +@tab @samp{armv8.5-a}, @samp{+bf16}, @samp{+i8mm}
>> +@end multitable
> 
> I should have tried a proof of concept of this before suggesting it, sorry.
> Trying the patch locally I get:
> 
> gcc.pod around line 18643: You can't have =items (as at line 18649) unless 
> the first thing after the =over is an =item
> POD document had syntax errors at /usr/bin/pod2man line 71.
> Makefile:3363: recipe for target 'doc/gcc.1' failed
> make: [doc/gcc.1] Error 1 (ignored)
> 
> (Odd that this is an ignored error, since we end up with an empty man page.)
> 
> I've posted a texi2pod.pl patch for that:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00407.html
> 
> However, even with that patch, the script needs the full table row to be
> on a single line, so I think we need to do that and live with the long lines.
> 

Each table item is now kept on a single line.

>> [...]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_2.c 
>> b/gcc/testsuite/gcc.target/aarch64/pragma_cpp_predefs_2.c
>> index 6

Re: [PATCH][AArch64] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-12-05 Thread Dennis Zhang
Hi Richard,

On 29/11/2019 13:00, Richard Sandiford wrote:
> Hi Dennis,
> 
> Sorry for the slow response.
> 
> Dennis Zhang  writes:
>> Hi all,
>>
>> This patch is part of a series adding support for Armv8.6-A features.
>> It enables options including -march=armv8.6-a, +i8mm and +bf16.
>> The +i8mm and +bf16 features are mandatory for Armv8.6-a and optional
>> for Armv8.2-a and onward.
>> Documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> Regtested for aarch64-none-linux-gnu.
>>
>> Please help to check if it's ready for trunk.
>>
>> Many thanks!
>> Dennis
>>
>> gcc/ChangeLog:
>>
>> 2019-11-26  Dennis Zhang  
>>
>>  * config/aarch64/aarch64-arches.def (armv8.6-a): New.
>>  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>>  __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC and
>>  __ARM_FEATURE_BF16_SCALAR_ARITHMETIC when enabled.
>>  * config/aarch64/aarch64-option-extensions.def (i8mm, bf16): New.
>>  * config/aarch64/aarch64.h (AARCH64_FL_V8_6): New macro.
>>  (AARCH64_FL_I8MM, AARCH64_FL_BF16, AARCH64_FL_FOR_ARCH8_6): Likewise.
>>  (AARCH64_ISA_V8_6, AARCH64_ISA_I8MM, AARCH64_ISA_BF16): Likewise.
>>  (TARGET_I8MM, TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>>  * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-26  Dennis Zhang  
>>
>>  * gcc.target/aarch64/pragma_cpp_predefs_2.c: Add tests for i8mm
>>  and bf16 features.
>>
>> diff --git a/gcc/config/aarch64/aarch64-arches.def 
>> b/gcc/config/aarch64/aarch64-arches.def
>> index d258bd49244..e464d329c1a 100644
>> --- a/gcc/config/aarch64/aarch64-arches.def
>> +++ b/gcc/config/aarch64/aarch64-arches.def
>> @@ -36,5 +36,6 @@ AARCH64_ARCH("armv8.2-a", generic,  8_2A,  
>> 8,  AARCH64_FL_FOR_ARCH8_2)
>>   AARCH64_ARCH("armv8.3-a", generic,  8_3A,  8,  
>> AARCH64_FL_FOR_ARCH8_3)
>>   AARCH64_ARCH("armv8.4-a", generic,  8_4A,  8,  
>> AARCH64_FL_FOR_ARCH8_4)
>>   AARCH64_ARCH("armv8.5-a", generic,  8_5A,  8,  
>> AARCH64_FL_FOR_ARCH8_5)
>> +AARCH64_ARCH("armv8.6-a", generic,   8_6A,  8,  
>> AARCH64_FL_FOR_ARCH8_6)
>>   
>>   #undef AARCH64_ARCH
>> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
>> index f3da07fd28a..20d1e00552b 100644
>> --- a/gcc/config/aarch64/aarch64-c.c
>> +++ b/gcc/config/aarch64/aarch64-c.c
>> @@ -165,6 +165,12 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>> aarch64_def_or_undef (TARGET_RNG, "__ARM_FEATURE_RNG", pfile);
>> aarch64_def_or_undef (TARGET_MEMTAG, "__ARM_FEATURE_MEMORY_TAGGING", 
>> pfile);
>>   
>> +  aarch64_def_or_undef (TARGET_I8MM, "__ARM_FEATURE_MATMUL_INT8", pfile);
>> +  aarch64_def_or_undef (TARGET_BF16_SIMD,
>> +"__ARM_FEATURE_BF16_VECTOR_ARITHMETIC", pfile);
>> +  aarch64_def_or_undef (TARGET_BF16_FP,
>> +"__ARM_FEATURE_BF16_SCALAR_ARITHMETIC", pfile);
>> +
>> /* Not for ACLE, but required to keep "float.h" correct if we switch
>>target between implementations that do or do not support ARMv8.2-A
>>16-bit floating-point extensions.  */
>> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
>> b/gcc/config/aarch64/aarch64-option-extensions.def
>> index d3ae1b2431b..5b7c3b8a213 100644
>> --- a/gcc/config/aarch64/aarch64-option-extensions.def
>> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
>> @@ -198,4 +198,14 @@ AARCH64_OPT_EXTENSION("sve2-bitperm", 
>> AARCH64_FL_SVE2_BITPERM, AARCH64_FL_SIMD |
>>   /* Enabling or disabling "tme" only changes "tme".  */
>>   AARCH64_OPT_EXTENSION("tme", AARCH64_FL_TME, 0, 0, false, "")
>>   
>> +/* Enabling "i8mm" also enables "simd".
>> +   Disabling "i8mm" only disables "i8mm".  */
>> +AARCH64_OPT_EXTENSION("i8mm", AARCH64_FL_I8MM, AARCH64_FL_SIMD, \
>> +  0, false, "i8mm")
> 
> We have to maintain the transitive closure of features by hand,
> so anything that enables AARCH64_FL_SIMD also needs to enable
> AARCH64_FL_FP.
> 
> We should also add i8mm to the list of things that +nosimd and +nofp
> disable.
> 
> (It would be better to do t

[PATCH][AArch64] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-11-26 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It enables options including -march=armv8.6-a, +i8mm and +bf16.
The +i8mm and +bf16 features are mandatory for Armv8.6-a and optional 
for Armv8.2-a and onward.
Documents are at https://developer.arm.com/docs/ddi0596/latest
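
As a quick illustration of the user-visible effect (not part of the patch),
code can key off the feature-test macros this patch defines, for example
when built with -march=armv8.6-a or -march=armv8.2-a+i8mm+bf16:

/* Sketch only: report which of the new features this translation unit was
   built with, using the macros defined in aarch64-c.c by this patch.  */
const char *
isa_features (void)
{
#if defined (__ARM_FEATURE_MATMUL_INT8) \
    && defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
  return "i8mm and bf16";
#elif defined (__ARM_FEATURE_MATMUL_INT8)
  return "i8mm only";
#else
  return "neither i8mm nor bf16";
#endif
}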

Regtested for aarch64-none-linux-gnu.

Please help to check if it's ready for trunk.

Many thanks!
Dennis

gcc/ChangeLog:

2019-11-26  Dennis Zhang  

* config/aarch64/aarch64-arches.def (armv8.6-a): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC and
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC when enabled.
* config/aarch64/aarch64-option-extensions.def (i8mm, bf16): New.
* config/aarch64/aarch64.h (AARCH64_FL_V8_6): New macro.
(AARCH64_FL_I8MM, AARCH64_FL_BF16, AARCH64_FL_FOR_ARCH8_6): Likewise.
(AARCH64_ISA_V8_6, AARCH64_ISA_I8MM, AARCH64_ISA_BF16): Likewise.
(TARGET_I8MM, TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
* doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-11-26  Dennis Zhang  

* gcc.target/aarch64/pragma_cpp_predefs_2.c: Add tests for i8mm
and bf16 features.
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index d258bd49244..e464d329c1a 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -36,5 +36,6 @@ AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
 AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
+AARCH64_ARCH("armv8.6-a", generic,	 8_6A,	8,  AARCH64_FL_FOR_ARCH8_6)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index f3da07fd28a..20d1e00552b 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -165,6 +165,12 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_RNG, "__ARM_FEATURE_RNG", pfile);
   aarch64_def_or_undef (TARGET_MEMTAG, "__ARM_FEATURE_MEMORY_TAGGING", pfile);
 
+  aarch64_def_or_undef (TARGET_I8MM, "__ARM_FEATURE_MATMUL_INT8", pfile);
+  aarch64_def_or_undef (TARGET_BF16_SIMD,
+			"__ARM_FEATURE_BF16_VECTOR_ARITHMETIC", pfile);
+  aarch64_def_or_undef (TARGET_BF16_FP,
+			"__ARM_FEATURE_BF16_SCALAR_ARITHMETIC", pfile);
+
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
  16-bit floating-point extensions.  */
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index d3ae1b2431b..5b7c3b8a213 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -198,4 +198,14 @@ AARCH64_OPT_EXTENSION("sve2-bitperm", AARCH64_FL_SVE2_BITPERM, AARCH64_FL_SIMD |
 /* Enabling or disabling "tme" only changes "tme".  */
 AARCH64_OPT_EXTENSION("tme", AARCH64_FL_TME, 0, 0, false, "")
 
+/* Enabling "i8mm" also enables "simd".
+   Disabling "i8mm" only disables "i8mm".  */
+AARCH64_OPT_EXTENSION("i8mm", AARCH64_FL_I8MM, AARCH64_FL_SIMD, \
+		  0, false, "i8mm")
+
+/* Enabling "bf16" also enables "simd" and "fp".
+   Disabling "bf16" only disables "bf16".  */
+AARCH64_OPT_EXTENSION("bf16", AARCH64_FL_BF16, AARCH64_FL_SIMD | AARCH64_FL_FP,
+		  0, false, "bf16")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index ee01909abb9..7de99285e8a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -202,6 +202,15 @@ extern unsigned aarch64_architecture_version;
 /* Transactional Memory Extension.  */
 #define AARCH64_FL_TME	  (1ULL << 33)  /* Has TME instructions.  */
 
+/* Armv8.6-A architecture extensions.  */
+#define AARCH64_FL_V8_6	  (1ULL << 34)
+
+/* 8-bit Integer Matrix Multiply (I8MM) extensions.  */
+#define AARCH64_FL_I8MM   (1ULL << 35)
+
+/* Brain half-precision floating-point (BFloat16) Extension.  */
+#define AARCH64_FL_BF16	  (1ULL << 36)
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
@@ -223,6 +232,9 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_5			\
   (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5	\
| AARCH64_FL_SB | AARCH64_FL_SSBS | A

[PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-11-22 Thread Dennis Zhang
Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It enables options including -march=armv8.6-a, +i8mm and +bf16.
The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
Documents are at https://developer.arm.com/docs/ddi0596/latest
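
For illustration (not part of the patch), user code can test the macros this
patch defines; on AArch32 the alternative bfloat16 format macro is also set
whenever BF16 is available, e.g. when built with
-march=armv8.2-a+bf16 -mfpu=auto -mfloat-abi=hard:

/* Sketch only: nonzero when BFloat16 vector arithmetic is usable and the
   alternative bfloat16 format is in effect, per the macros above.  */
int
bf16_simd_available (void)
{
#if defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC) \
    && defined (__ARM_BF16_FORMAT_ALTERNATIVE)
  return 1;
#else
  return 0;
#endif
}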

Regtested for arm-none-linux-gnueabi-armv8-a.

Please help to check if it's ready for trunk.

Many thanks!
Dennis

gcc/ChangeLog:

2019-11-15  Dennis Zhang  

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
__ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
__ARM_BF16_FORMAT_ALTERNATIVE when enabled.
* config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
* config/arm/arm-tables.opt: Regenerated.
* config/arm/arm.c (arm_option_reconfigure_globals): Init
arm_arch_i8mm and arm_arch_bf16 to enable features.
* config/arm/arm.h (TARGET_I8MM): New macro.
(TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
* config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
* config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
* config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
(v8_6_a_simd_variants): New.
(v8_*_a_simd_variants): Add i8mm and bf16.
* doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-11-15  Dennis Zhang  

* gcc.target/arm/multilib.exp: Add combination tests for armv8.6-a.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..b47e64c2151 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -225,6 +225,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
 }
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
+		  TARGET_BF16_FP);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
+		  TARGET_BF16_SIMD);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+		  TARGET_BF16_FP || TARGET_BF16_SIMD);
 }
 
 void
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 50379a0a10a..d373406649c 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -123,6 +123,9 @@ define feature armv8_4
 # Architecture rel 8.5.
 define feature armv8_5
 
+# Architecture rel 8.6.
+define feature armv8_6
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -191,6 +194,12 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres
 
+# 8-bit Integer Matrix Multiply extension. Optional from v8.2-A.
+define feature i8mm
+
+# Brain half-precision floating-point extension. Optional from v8.2-A.
+define feature bf16
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -213,7 +222,7 @@ define fgroup ALL_CRYPTO	crypto
 # strip off 32 D-registers, but does not remove support for
 # double-precision FP.
 define fgroup ALL_SIMD_INTERNAL	fp_d32 neon ALL_CRYPTO
-define fgroup ALL_SIMD	ALL_SIMD_INTERNAL dotprod fp16fml
+define fgroup ALL_SIMD	ALL_SIMD_INTERNAL dotprod fp16fml i8mm
 
 # List of all FPU bits to strip out if -mfpu is used to override the
 # default.  fp16 is deliberately missing from this list.
@@ -253,6 +262,7 @@ define fgroup ARMv8_2a ARMv8_1a armv8_2
 define fgroup ARMv8_3a ARMv8_2a armv8_3
 define fgroup ARMv8_4a ARMv8_3a armv8_4
 define fgroup ARMv8_5a ARMv8_4a armv8_5 sb predres
+define fgroup ARMv8_6a ARMv8_5a armv8_6
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -560,6 +570,8 @@ begin arch armv8.2-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -577,6 +589,8 @@ begin arch armv8.3-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -592,6 +606,8 @@ begin arch armv8.4-a
  option nofp remove ALL_FP
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.4-a
 
 begin arch armv8.5-a
@@ -605,8 +621,25 @@ begin arch armv8.5-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.5-a
 
+begin arch armv8.6-a
+ tune for cortex-a53
+ tune 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-19 Thread Dennis Zhang
Hi Kyrill,

On 19/11/2019 11:21, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 11/12/19 5:32 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> On 12/11/2019 15:57, Kyrill Tkachov wrote:
>>> On 11/12/19 3:50 PM, Dennis Zhang wrote:
>>>> Hi Kyrill,
>>>>
>>>> On 12/11/2019 09:40, Kyrill Tkachov wrote:
>>>>> Hi Dennis,
>>>>>
>>>>> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>>>>>> Hi Kyrill,
>>>>>>
>>>>>> I have rebased the patch on top of current truck.
>>>>>> For resolve_overloaded, I redefined my memtag overloading function to
>>>>>> fit the latest resolve_overloaded_builtin interface.
>>>>>>
>>>>>> Regression tested again and survived for aarch64-none-linux-gnu.
>>>>> Please reply inline rather than top-posting on gcc-patches.
>>>>>
>>>>>
>>>>>> Cheers
>>>>>> Dennis
>>>>>>
>>>>>> Changelog is updated as following:
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2019-11-07  Dennis Zhang  
>>>>>>
>>>>>>   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): 
>>>>>> Add
>>>>>>   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>>>>>   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>>>>>   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>>>>>   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>>>>>   (aarch64_init_memtag_builtins): New.
>>>>>>   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>>>>>   (aarch64_general_init_builtins): Call
>>>>>> aarch64_init_memtag_builtins.
>>>>>>   (aarch64_expand_builtin_memtag): New.
>>>>>>   (aarch64_general_expand_builtin): Call
>>>>>> aarch64_expand_builtin_memtag.
>>>>>>   (AARCH64_BUILTIN_SUBCODE): New macro.
>>>>>>   (aarch64_resolve_overloaded_memtag): New.
>>>>>>   (aarch64_resolve_overloaded_builtin_general): New hook. Call
>>>>>>   aarch64_resolve_overloaded_memtag to handle overloaded MTE
>>>>>> builtins.
>>>>>>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): 
>>>>>> Define
>>>>>>   __ARM_FEATURE_MEMORY_TAGGING when enabled.
>>>>>>   (aarch64_resolve_overloaded_builtin): Call
>>>>>>   aarch64_resolve_overloaded_builtin_general.
>>>>>>   * config/aarch64/aarch64-protos.h
>>>>>>   (aarch64_resolve_overloaded_builtin_general): New declaration.
>>>>>>   * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>>>>>>   (TARGET_MEMTAG): Likewise.
>>>>>>   * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>>>>>>   UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>>>>>>   (irg, gmi, subp, addg, ldg, stg): New instructions.
>>>>>>   * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New
>>>>>> macro.
>>>>>>   (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>>>>>>   (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag):
>>>>>> Likewise.
>>>>>>   * config/aarch64/predicates.md (aarch64_memtag_tag_offset): 
>>>>>> New.
>>>>>>   (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>>>>>>   * config/arm/types.md (memtag): New.
>>>>>>   * doc/invoke.texi (-memtag): Update description.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 2019-11-07  Dennis Zhang  
>>>>>>
>>>>>>   * gcc.target/aarch64/acle/memtag_1.c: New test.
>>>>>>   * gcc.target/aarch64/acle/memtag_2.c: New test.
>>>>>>   * gcc.target/aarch64/acle/memtag_3.c: New test.
>>>>>>
>>>>>>
>>>>>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>>>>>> Hi Dennis,
>>>>>>>
>>>>>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>>>>>>>

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Dennis Zhang
Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:
> 
> On 11/12/19 3:50 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> On 12/11/2019 09:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>>>> Hi Kyrill,
>>>>
>>>> I have rebased the patch on top of current truck.
>>>> For resolve_overloaded, I redefined my memtag overloading function to
>>>> fit the latest resolve_overloaded_builtin interface.
>>>>
>>>> Regression tested again and survived for aarch64-none-linux-gnu.
>>> Please reply inline rather than top-posting on gcc-patches.
>>>
>>>
>>>> Cheers
>>>> Dennis
>>>>
>>>> Changelog is updated as following:
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-11-07  Dennis Zhang  
>>>>
>>>>  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>>>>  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>>>  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>>>  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>>>  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>>>  (aarch64_init_memtag_builtins): New.
>>>>  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>>>  (aarch64_general_init_builtins): Call 
>>>> aarch64_init_memtag_builtins.
>>>>  (aarch64_expand_builtin_memtag): New.
>>>>  (aarch64_general_expand_builtin): Call 
>>>> aarch64_expand_builtin_memtag.
>>>>  (AARCH64_BUILTIN_SUBCODE): New macro.
>>>>  (aarch64_resolve_overloaded_memtag): New.
>>>>  (aarch64_resolve_overloaded_builtin_general): New hook. Call
>>>>  aarch64_resolve_overloaded_memtag to handle overloaded MTE 
>>>> builtins.
>>>>  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>>>>  __ARM_FEATURE_MEMORY_TAGGING when enabled.
>>>>  (aarch64_resolve_overloaded_builtin): Call
>>>>  aarch64_resolve_overloaded_builtin_general.
>>>>  * config/aarch64/aarch64-protos.h
>>>>  (aarch64_resolve_overloaded_builtin_general): New declaration.
>>>>  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>>>>  (TARGET_MEMTAG): Likewise.
>>>>  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>>>>  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>>>>  (irg, gmi, subp, addg, ldg, stg): New instructions.
>>>>  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New 
>>>> macro.
>>>>  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>>>>  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): 
>>>> Likewise.
>>>>  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
>>>>  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>>>>  * config/arm/types.md (memtag): New.
>>>>  * doc/invoke.texi (-memtag): Update description.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-11-07  Dennis Zhang  
>>>>
>>>>  * gcc.target/aarch64/acle/memtag_1.c: New test.
>>>>  * gcc.target/aarch64/acle/memtag_2.c: New test.
>>>>  * gcc.target/aarch64/acle/memtag_3.c: New test.
>>>>
>>>>
>>>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>>>> Hi Dennis,
>>>>>
>>>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
>>>>>> It can be used for spatial and temporal memory safety detection and
>>>>>> lightweight lock and key system.
>>>>>>
>>>>>> This patch enables new intrinsics leveraging MTE instructions to
>>>>>> implement functionalities of creating tags, setting tags, reading 
>>>>>> tags,
>>>>>> and manipulating tags.
>>>>>> The intrinsics are part of Arm ACLE extension:
>>>>>> https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics 
>>>>>>
>>>>>> The MTE ISA specification can be found at
>>>>>> https://developer.arm.com/docs/ddi0487/latest

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Dennis Zhang
Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> I have rebased the patch on top of current truck.
>> For resolve_overloaded, I redefined my memtag overloading function to
>> fit the latest resolve_overloaded_builtin interface.
>>
>> Regression tested again and survived for aarch64-none-linux-gnu.
> 
> Please reply inline rather than top-posting on gcc-patches.
> 
> 
>> Cheers
>> Dennis
>>
>> Changelog is updated as following:
>>
>> gcc/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>> * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>> AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>> AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>> AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>> AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>> (aarch64_init_memtag_builtins): New.
>> (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>> (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
>> (aarch64_expand_builtin_memtag): New.
>> (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
>> (AARCH64_BUILTIN_SUBCODE): New macro.
>> (aarch64_resolve_overloaded_memtag): New.
>> (aarch64_resolve_overloaded_builtin_general): New hook. Call
>> aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
>> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>> __ARM_FEATURE_MEMORY_TAGGING when enabled.
>> (aarch64_resolve_overloaded_builtin): Call
>> aarch64_resolve_overloaded_builtin_general.
>> * config/aarch64/aarch64-protos.h
>> (aarch64_resolve_overloaded_builtin_general): New declaration.
>> * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>> (TARGET_MEMTAG): Likewise.
>> * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>> UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>> (irg, gmi, subp, addg, ldg, stg): New instructions.
>> * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
>> (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>> (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
>> * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
>> (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>> * config/arm/types.md (memtag): New.
>> * doc/invoke.texi (-memtag): Update description.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>> * gcc.target/aarch64/acle/memtag_1.c: New test.
>> * gcc.target/aarch64/acle/memtag_2.c: New test.
>> * gcc.target/aarch64/acle/memtag_3.c: New test.
>>
>>
>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>>>> Hi,
>>>>
>>>> Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
>>>> It can be used for spatial and temporal memory safety detection and
>>>> lightweight lock and key system.
>>>>
>>>> This patch enables new intrinsics leveraging MTE instructions to
>>>> implement functionalities of creating tags, setting tags, reading tags,
>>>> and manipulating tags.
>>>> The intrinsics are part of Arm ACLE extension:
>>>> https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
>>>> The MTE ISA specification can be found at
>>>> https://developer.arm.com/docs/ddi0487/latest chapter D6.
>>>>
>>>> Bootstraped and regtested for aarch64-none-linux-gnu.
>>>>
>>>> Please help to check if it's OK for trunk.
>>>>
>>> This looks mostly ok to me but for further review this needs to be
>>> rebased on top of current trunk as there are some conflicts with the SVE
>>> ACLE changes that recently went in. Most conflicts looks trivial to
>>> resolve but one that needs more attention is the definition of the
>>> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
>>>
>>> Thanks,
>>>
>>> Kyrill
>>>
>>>> Many Thanks
>>>> Dennis
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-10-16  Dennis Zhang  
>>>>
>>>>  * config/aarch64/aarch64-builtins.c (enum 
>>>> aarch64_builtins): Add

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-07 Thread Dennis Zhang
Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to 
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Cheers
Dennis

The ChangeLog is updated as follows:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin_general): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
(aarch64_resolve_overloaded_builtin): Call
aarch64_resolve_overloaded_builtin_general.
* config/aarch64/aarch64-protos.h
(aarch64_resolve_overloaded_builtin_general): New declaration.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>> Hi,
>>
>> Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
>> It can be used for spatial and temporal memory safety detection and
>> lightweight lock and key system.
>>
>> This patch enables new intrinsics leveraging MTE instructions to
>> implement functionalities of creating tags, setting tags, reading tags,
>> and manipulating tags.
>> The intrinsics are part of Arm ACLE extension:
>> https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
>> The MTE ISA specification can be found at
>> https://developer.arm.com/docs/ddi0487/latest chapter D6.
>>
>> Bootstraped and regtested for aarch64-none-linux-gnu.
>>
>> Please help to check if it's OK for trunk.
>>
> 
> This looks mostly ok to me but for further review this needs to be 
> rebased on top of current trunk as there are some conflicts with the SVE 
> ACLE changes that recently went in. Most conflicts looks trivial to 
> resolve but one that needs more attention is the definition of the 
> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
> 
> Thanks,
> 
> Kyrill
> 
>> Many Thanks
>> Dennis
>>
>> gcc/ChangeLog:
>>
>> 2019-10-16  Dennis Zhang  
>>
>>     * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>>     AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>     AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>     AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>     AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>     (aarch64_init_memtag_builtins): New.
>>     (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>>     (aarch64_general_init_builtins): Call 
>> aarch64_init_memtag_builtins.
>>     (aarch64_expand_builtin_memtag): New.
>>     (aarch64_general_expand_builtin): Call 
>> aarch64_expand_builtin_memtag.
>>     (AARCH64_BUILTIN_SUBCODE): New macro.
>>     (aarch64_resolve_overloaded_memtag): New.
>>     (aarch64_resolve_overloaded_builtin): New hook. Call
>>     aarch64_resolve_overload

[PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-10-17 Thread Dennis Zhang
Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and as a
lightweight lock-and-key system.

This patch enables new intrinsics that leverage MTE instructions to
create, set, read, and manipulate memory tags.
The intrinsics are part of Arm ACLE extension: 
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at 
https://developer.arm.com/docs/ddi0487/latest chapter D6.
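
A minimal usage sketch (illustrative only, not part of the patch; the helper
name is invented), assuming arm_acle.h with this patch, -march=armv8.5-a+memtag,
and that the 16-byte granule addressed by p lives in MTE-enabled memory:

#include <arm_acle.h>

/* Choose a random logical tag for p (excluding no tags), store it as the
   allocation tag of p's 16-byte granule, and return the tagged pointer so
   that later checked accesses through it match.  */
void *
tag_granule (void *p)
{
  void *tagged = __arm_mte_create_random_tag (p, 0);
  __arm_mte_set_tag (tagged);
  return tagged;
}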

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.

Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
* config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
Add declaration.
* config/aarch64/aarch64.c (TARGET_RESOLVE_OVERLOADED_BUILTIN):
New hook.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-10-16  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index e02ece8672a..b77bcc42eab 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -445,6 +445,15 @@ enum aarch64_builtins
   AARCH64_TME_BUILTIN_TCOMMIT,
   AARCH64_TME_BUILTIN_TTEST,
   AARCH64_TME_BUILTIN_TCANCEL,
+  /* MEMTAG builtins.  */
+  AARCH64_MEMTAG_BUILTIN_START,
+  AARCH64_MEMTAG_BUILTIN_IRG,
+  AARCH64_MEMTAG_BUILTIN_GMI,
+  AARCH64_MEMTAG_BUILTIN_SUBP,
+  AARCH64_MEMTAG_BUILTIN_INC_TAG,
+  AARCH64_MEMTAG_BUILTIN_SET_TAG,
+  AARCH64_MEMTAG_BUILTIN_GET_TAG,
+  AARCH64_MEMTAG_BUILTIN_END,
   AARCH64_BUILTIN_MAX
 };
 
@@ -,6 +1120,52 @@ aarch64_init_tme_builtins (void)
    AARCH64_TME_BUILTIN_TCANCEL);
 }
 
+/* Initialize the memory tagging extension (MTE) builtins.  */
+struct
+{
+  tree ftype;
+  enum insn_code icode;
+} aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_END -
+			  AARCH64_MEMTAG_BUILTIN_START - 1];
+
+static void
+aarch64_init_memtag_builtins (void)
+{
+  tree fntype = NULL;
+
+#define AARCH64_INIT_MEMTAG_BUILTINS_DECL(F, N, I, T) \
+  aarch64_builtin_decls[AARCH64_MEMTAG_BUILTIN_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
+   T, AARCH64_MEMTAG_BUILTIN_##F); \
+  aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
+			  AARCH64_MEMTAG_BUILTIN_START - 1] = \
+{T, CODE_FOR_##I};
+
+  fntype = build_function_type_list (ptr_type_node, ptr_type_node,
+ uint64_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, irg, irg, fntype);
+
+  fntype = build_function_type_list (uint64_type_node, ptr_type_node,
+ uint64_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, gmi, gmi, fntype);
+
+  fntype = build_function_type_list (ptrdiff_type_node, ptr_type_node,
+ ptr_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, subp, subp, fntype);
+
+  fntype = build_function_type_list (ptr_type_node, ptr_type_node,
+ unsigned_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BU

Re: [PATCH][Arm] Add support for missing CPUs

2019-08-23 Thread Dennis Zhang
Thanks a lot Kyrill!
I will bear this in mind for future patches.

Dennis


From: Kyrill Tkachov 
Sent: Friday, August 23, 2019 9:27 AM
To: Dennis Zhang ; gcc-patches@gcc.gnu.org 

Cc: nd ; Richard Earnshaw ; Ramana 
Radhakrishnan 
Subject: Re: [PATCH][Arm] Add support for missing CPUs

Hi Dennis,

On 8/22/19 4:52 PM, Dennis Zhang wrote:
> Hi all,
>
> This patch adds '-mcpu' options for following CPUs:
> Cortex-M35P, Cortex-A77, Cortex-A76AE.
>
> Related specifications are as following:
> https://developer.arm.com/ip-products/processors/cortex-m
> https://developer.arm.com/ip-products/processors/cortex-a
>
> Bootstraped/Regtested for arm-none-linux-gnueabihf.
>
> Please help to check.
>
> Cheers
> Dennis
>
> gcc/ChangeLog:
>
> 2019-07-29  Dennis Zhang  
>
> * config/arm/arm-cpus.in: New entries for Cortex-M35P,
> Cortex-A76AE and Cortex-A77.
> * config/arm/arm-tables.opt: Regenerated.
> * config/arm/arm-tune.md: Likewise.
> * doc/invoke.texi: Document the added processors.

The patch is ok. The ChangeLog should list each new cpu entry as a
separate entity.

I've adjusted the ChangeLog to:

2019-08-23  Dennis Zhang 

 * config/arm/arm-cpus.in (cortex-m35p): New entry.
 (cortex-a76ae): Likewise.
 (cortex-a77): Likewise
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm-tune.md: Likewise.
 * doc/invoke.texi (ARM Options): Document cortex-m35p, cortex-a76ae,
 cortex-a77 CPU options.

and committed it to trunk for you with r274845.

Thanks for the patch!

Kyrill




[PATCH][Arm] Add support for missing CPUs

2019-08-22 Thread Dennis Zhang
Hi all,

This patch adds '-mcpu' options for the following CPUs:
Cortex-M35P, Cortex-A77, Cortex-A76AE.

Related specifications are as following:
https://developer.arm.com/ip-products/processors/cortex-m
https://developer.arm.com/ip-products/processors/cortex-a

Bootstrapped/Regtested for arm-none-linux-gnueabihf.

Please help to check.

Cheers
Dennis

gcc/ChangeLog:

2019-07-29  Dennis Zhang  

* config/arm/arm-cpus.in: New entries for Cortex-M35P,
Cortex-A76AE and Cortex-A77.
* config/arm/arm-tables.opt: Regenerated.
* config/arm/arm-tune.md: Likewise.
* doc/invoke.texi: Document the added processors.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 3a55f6ac6d2..f8a3b3db67a 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1331,6 +1331,28 @@ begin cpu cortex-a76
  part d0b
 end cpu cortex-a76
 
+begin cpu cortex-a76ae
+ cname cortexa76ae
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16+dotprod+simd
+ option crypto add FP_ARMv8 CRYPTO
+ costs cortex_a57
+ vendor 41
+ part d0e
+end cpu cortex-a76ae
+
+begin cpu cortex-a77
+ cname cortexa77
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16+dotprod+simd
+ option crypto add FP_ARMv8 CRYPTO
+ costs cortex_a57
+ vendor 41
+ part d0d
+end cpu cortex-a77
+
 begin cpu neoverse-n1
  cname neoversen1
  alias !ares
@@ -1379,6 +1401,15 @@ begin cpu cortex-m33
  costs v7m
 end cpu cortex-m33
 
+begin cpu cortex-m35p
+ cname cortexm35p
+ tune flags LDSCHED
+ architecture armv8-m.main+dsp+fp
+ option nofp remove ALL_FP
+ option nodsp remove armv7em
+ costs v7m
+end cpu cortex-m35p
+
 # V8 R-profile implementations.
 begin cpu cortex-r52
  cname cortexr52
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index bba54aea3d6..aeb5b3fbf62 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -234,6 +234,12 @@ Enum(processor_type) String(cortex-a75) Value( TARGET_CPU_cortexa75)
 EnumValue
 Enum(processor_type) String(cortex-a76) Value( TARGET_CPU_cortexa76)
 
+EnumValue
+Enum(processor_type) String(cortex-a76ae) Value( TARGET_CPU_cortexa76ae)
+
+EnumValue
+Enum(processor_type) String(cortex-a77) Value( TARGET_CPU_cortexa77)
+
 EnumValue
 Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
 
@@ -249,6 +255,9 @@ Enum(processor_type) String(cortex-m23) Value( TARGET_CPU_cortexm23)
 EnumValue
 Enum(processor_type) String(cortex-m33) Value( TARGET_CPU_cortexm33)
 
+EnumValue
+Enum(processor_type) String(cortex-m35p) Value( TARGET_CPU_cortexm35p)
+
 EnumValue
 Enum(processor_type) String(cortex-r52) Value( TARGET_CPU_cortexr52)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index b9dfb66ec84..6fa5bb27750 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -44,7 +44,8 @@
 	cortexa73,exynosm1,xgene1,
 	cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
 	cortexa73cortexa53,cortexa55,cortexa75,
-	cortexa76,neoversen1,cortexa75cortexa55,
-	cortexa76cortexa55,cortexm23,cortexm33,
+	cortexa76,cortexa76ae,cortexa77,
+	neoversen1,cortexa75cortexa55,cortexa76cortexa55,
+	cortexm23,cortexm33,cortexm35p,
 	cortexr52"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 29585cf15aa..42a9d7f81ed 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17510,10 +17510,12 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-a9}, @samp{cortex-a12}, @samp{cortex-a15}, @samp{cortex-a17},
 @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{cortex-r4}, @samp{cortex-r4f},
+@samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
+@samp{ares}, @samp{cortex-r4}, @samp{cortex-r4f},
 @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8}, @samp{cortex-r52},
 @samp{cortex-m0}, @samp{cortex-m0plus}, @samp{cortex-m1}, @samp{cortex-m3},
 @samp{cortex-m4}, @samp{cortex-m7}, @samp{cortex-m23}, @samp{cortex-m33},
+@samp{cortex-m35p},
 @samp{cortex-m1.small-multiply}, @samp{cortex-m0.small-multiply},
 @samp{cortex-m0plus.small-multiply}, @samp{exynos-m1}, @samp{marvell-pj4},
 @samp{neoverse-n1}, @samp{xscale}, @samp{iwmmxt}, @samp{iwmmxt2},
@@ -17577,14 +17579,14 @@ The following extension options are common to the listed CPUs:
 
 @table @samp
 @item +nodsp
-Disable the DSP instructions on @samp{cortex-m33}.
+Disable the DSP instructions on @samp{cortex-m33}, @samp{cortex-m35p}.
 
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},
 @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
 @samp{arm1020e}, @samp{arm1022e}, @samp{arm926ej-s},
 @samp{arm1026ej-s}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8},
-@samp{cortex-m4}, @samp{cortex-m7} and @samp{cort

[PATCH][AArch64] Add support for missing CPUs

2019-08-21 Thread Dennis Zhang
Hi all,

This patch adds '-mcpu' options for the following CPUs:
Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and Cortex-A34.
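
For illustration only (a hypothetical snippet, not included in this
patch), any of the new names can be passed straight to -mcpu, for
example in a dejagnu-style compile test:

/* { dg-do compile } */
/* { dg-options "-mcpu=cortex-a65ae" } */
int
ret_zero (void)
{
  return 0;
}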

Related specifications are as follows:
https://developer.arm.com/ip-products/processors/cortex-a

Bootstrapped/regtested for aarch64-none-linux-gnu.

Please help to check if it's ready.

Many thanks!
Dennis

gcc/ChangeLog:

2019-08-21  Dennis Zhang  

* config/aarch64/aarch64-cores.def (AARCH64_CORE): New entries
for Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and
Cortex-A34.
* config/aarch64/aarch64-tune.md: Regenerated.
* doc/invoke.texi: Document the new processors.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 82d91d62519..c0be109009f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -46,6 +46,7 @@
 /* ARMv8-A Architecture Processors.  */
 
 /* ARM ('A') cores. */
+AARCH64_CORE("cortex-a34",  cortexa34, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd02, -1)
 AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03, -1)
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07, -1)
@@ -100,6 +101,10 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
 AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a65",  cortexa65, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
+AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 2b1ec85ae31..a6a14b7fc77 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,neoversee1,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa65,cortexa65ae,ares,neoversen1,neoversee1,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 29585cf15aa..3aa59b9a125 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15809,7 +15809,9 @@ Specify the name of the targ

Re: [PATCH][AArch64] Remove constraint strings from define_expand constructs in the back end

2019-07-03 Thread Dennis Zhang
Hi Richard,

Thanks for the tips.

The patch is revised to keep the special exceptions needed by the
TARGET_SECONDARY_RELOAD hook: some related patterns still need their
constraints in order to work as expected when they are selected by
that hook.
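
To make that exception concrete, below is a rough sketch of how such
patterns get used (simplified, hypothetical code with made-up
CODE_FOR_* names, not the actual aarch64 implementation): the hook
returns an insn_code for a reload expander, and reload then reads the
constraint strings on that expander's operands to pick the scratch
register it needs, which is why those constraints have to stay.

static reg_class_t
example_secondary_reload (bool in_p, rtx x,
			  reg_class_t rclass ATTRIBUTE_UNUSED,
			  machine_mode mode,
			  secondary_reload_info *sri)
{
  /* Hand the awkward case to a named reload expander; reload reads
     the constraints on that expander's operands to work out which
     scratch register to allocate.  */
  if (mode == TImode && MEM_P (x))
    sri->icode = (in_p ? CODE_FOR_hypothetical_reload_in
			: CODE_FOR_hypothetical_reload_out);
  return NO_REGS;
}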

The updated patch is tested for the targets aarch64_be-linux-gnu,
aarch64_be-none-linux-gnu, aarch64-linux-gnu, and
aarch64-none-linux-gnu, with no regressions in the testsuite.

gcc/ChangeLog:

2019-07-03  Dennis Zhang  

* config/aarch64/aarch64.md: Remove redundant constraints from
define_expand but keep some patterns untouched if they are
specially selected by TARGET_SECONDARY_RELOAD hook.
* config/aarch64/aarch64-sve.md: Likewise.
* config/aarch64/atomics.md: Remove redundant constraints from
define_expand.
* config/aarch64/aarch64-simd.md: Likewise.

On 7/2/19 8:05 AM, Richard Sandiford wrote:
> James Greenhalgh  writes:
>> On Mon, Jun 24, 2019 at 04:33:40PM +0100, Dennis Zhang wrote:
>>> Hi,
>>>
>>> A number of AArch64 define_expand patterns have specified constraints
>>> for their operands. But the constraint strings are ignored at expand
>>> time and are therefore redundant/useless. We now avoid specifying
>>> constraints in new define_expands, but we should clean up the existing
>>> define_expand definitions.
>>>
>>> For example, the constraint "=w" is removed in the following case:
>>> (define_expand "sqrt<mode>2"
>>> [(set (match_operand:GPF_F16 0 "register_operand" "=w")
>>> The "" marks with an empty constraint in define_expand are removed as well.
>>>
>>> The patch is tested with the build configuration of
>>> --target=aarch64-none-linux-gnu, and it passes gcc/testsuite.
>>
>> This is OK for trunk.
> 
> My fault, sorry, but... Kyrill pointed out when the corresponding arm
> patch was posted that it removes constraints from reload expanders that
> actually need them.  This patch has the same problem and so shouldn't
> go in as-is.
> 
> I'd thought at the time that Kyrill's comment applied to both patches,
> but I see now that it really was specific to arm.
> 
> Thanks,
> Richard
> 
>>
>> Thanks,
>> James
>>
>>> gcc/ChangeLog:
>>>
>>> 2019-06-21  Dennis Zhang  
>>>
>>> * config/aarch64/aarch64-simd.md: Remove redundant constraints
>>> from define_expand.
>>> * config/aarch64/aarch64-sve.md: Likewise.
>>> * config/aarch64/aarch64.md: Likewise.
>>> * config/aarch64/atomics.md: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index df8bf1d9778..837242c7e56 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand" "")
-	(match_operand:VALL_F16 1 "general_operand" ""))]
+  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
+	(match_operand:VALL_F16 1 "general_operand"))]
   "TARGET_SIMD"
   "
   /* Force the operand into a register if it is not an
@@ -39,8 +39,8 @@
 )
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL 0 "nonimmediate_operand" "")
-(match_operand:VALL 1 "general_operand" ""))]
+  [(set (match_operand:VALL 0 "nonimmediate_operand")
+(match_operand:VALL 1 "general_operand"))]
   "TARGET_SIMD"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -652,8 +652,8 @@
   [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
 
 (define_expand "rsqrt2"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+  [(set (match_operand:VALLF 0 "register_operand")
+	(unspec:VALLF [(match_operand:VALLF 1 "register_operand")]
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
@@ -1025,9 +1025,9 @@
 )
 
 (define_expand "ashl3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
@@ -1072,9 +1072,9 @@
 )
 
 (define_expand "lshr3"
-  

[PATCH][AArch64] Remove constraint strings from define_expand constructs in the back end

2019-06-24 Thread Dennis Zhang
Hi,

A number of AArch64 define_expand patterns have specified constraints 
for their operands. But the constraint strings are ignored at expand 
time and are therefore redundant/useless. We now avoid specifying 
constraints in new define_expands, but we should clean up the existing 
define_expand definitions.

For example, the constraint "=w" is removed in the following case:
(define_expand "sqrt2"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
The "" marks with an empty constraint in define_expand are removed as well.

The patch is tested with the build configuration of 
--target=aarch64-none-linux-gnu, and it passes gcc/testsuite.

Thanks
Dennis

gcc/ChangeLog:

2019-06-21  Dennis Zhang  

* config/aarch64/aarch64-simd.md: Remove redundant constraints
from define_expand.
* config/aarch64/aarch64-sve.md: Likewise.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/atomics.md: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index df8bf1d9778..837242c7e56 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_expand "mov"
-  [(set (match_operand:VALL_F16 0 "nonimmediate_operand" "")
-	(match_operand:VALL_F16 1 "general_operand" ""))]
+  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
+	(match_operand:VALL_F16 1 "general_operand"))]
   "TARGET_SIMD"
   "
   /* Force the operand into a register if it is not an
@@ -39,8 +39,8 @@
 )
 
 (define_expand "movmisalign"
-  [(set (match_operand:VALL 0 "nonimmediate_operand" "")
-(match_operand:VALL 1 "general_operand" ""))]
+  [(set (match_operand:VALL 0 "nonimmediate_operand")
+(match_operand:VALL 1 "general_operand"))]
   "TARGET_SIMD"
 {
   /* This pattern is not permitted to fail during expansion: if both arguments
@@ -652,8 +652,8 @@
   [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
 
 (define_expand "rsqrt2"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+  [(set (match_operand:VALLF 0 "register_operand")
+	(unspec:VALLF [(match_operand:VALLF 1 "register_operand")]
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
@@ -1025,9 +1025,9 @@
 )
 
 (define_expand "ashl3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
@@ -1072,9 +1072,9 @@
 )
 
 (define_expand "lshr3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
@@ -1119,9 +1119,9 @@
 )
 
 (define_expand "ashr3"
-  [(match_operand:VDQ_I 0 "register_operand" "")
-   (match_operand:VDQ_I 1 "register_operand" "")
-   (match_operand:SI  2 "general_operand" "")]
+  [(match_operand:VDQ_I 0 "register_operand")
+   (match_operand:VDQ_I 1 "register_operand")
+   (match_operand:SI  2 "general_operand")]
  "TARGET_SIMD"
 {
   int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
@@ -1166,9 +1166,9 @@
 )
 
 (define_expand "vashl3"
- [(match_operand:VDQ_I 0 "register_operand" "")
-  (match_operand:VDQ_I 1 "register_operand" "")
-  (match_operand:VDQ_I 2 "register_operand" "")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   emit_insn (gen_aarch64_simd_reg_sshl<mode> (operands[0], operands[1],
@@ -1180,9 +1180,9 @@
 ;; Negating individual lanes most certainly offsets the
 ;; gain from vectorization.
 (define_expand "vashr3"
- [(match_operand:VDQ_BHSI 0 "register_operand" "")
-  (match_operand:VDQ_BHSI 1 "regi

[PATCH][Arm] Remove constraint strings from define_expand constructs in the back end

2019-06-24 Thread Dennis Zhang
Hi,

A number of Arm define_expand patterns have specified constraints for 
their operands. But the constraint strings are ignored at expand time 
and are therefore redundant/useless. We now avoid specifying constraints 
in new define_expands, but we should clean up the existing define_expand 
definitions.

For example, the constraint "=r" is removed in the following case:
(define_expand "reload_inhi"
     [(parallel [(match_operand:HI 0 "s_register_operand" "=r")
The "" marks with an empty constraint in define_expand are removed as well.

The patch is tested with the build configuration of 
--target=arm-linux-gnueabi and it passes gcc/testsuite.

Thanks,
Dennis

gcc/ChangeLog:

2019-06-21  Dennis Zhang  

       * config/arm/arm-fixed.md: Remove redundant constraints from
       define_expand.
       * config/arm/arm.md: Likewise.
       * config/arm/iwmmxt.md: Likewise.
       * config/arm/neon.md: Likewise.
       * config/arm/sync.md: Likewise.
       * config/arm/thumb1.md: Likewise.
       * config/arm/vec-common.md: Likewise.

diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
index 6534ed41488..fcab40d13f6 100644
--- a/gcc/config/arm/arm-fixed.md
+++ b/gcc/config/arm/arm-fixed.md
@@ -98,9 +98,9 @@
 ; Note: none of these do any rounding.
 
 (define_expand "mulqq3"
-  [(set (match_operand:QQ 0 "s_register_operand" "")
-	(mult:QQ (match_operand:QQ 1 "s_register_operand" "")
-		 (match_operand:QQ 2 "s_register_operand" "")))]
+  [(set (match_operand:QQ 0 "s_register_operand")
+	(mult:QQ (match_operand:QQ 1 "s_register_operand")
+		 (match_operand:QQ 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp1 = gen_reg_rtx (HImode);
@@ -116,9 +116,9 @@
 })
 
 (define_expand "mulhq3"
-  [(set (match_operand:HQ 0 "s_register_operand" "")
-	(mult:HQ (match_operand:HQ 1 "s_register_operand" "")
-		 (match_operand:HQ 2 "s_register_operand" "")))]
+  [(set (match_operand:HQ 0 "s_register_operand")
+	(mult:HQ (match_operand:HQ 1 "s_register_operand")
+		 (match_operand:HQ 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -134,9 +134,9 @@
 })
 
 (define_expand "mulsq3"
-  [(set (match_operand:SQ 0 "s_register_operand" "")
-	(mult:SQ (match_operand:SQ 1 "s_register_operand" "")
-		 (match_operand:SQ 2 "s_register_operand" "")))]
+  [(set (match_operand:SQ 0 "s_register_operand")
+	(mult:SQ (match_operand:SQ 1 "s_register_operand")
+		 (match_operand:SQ 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -156,9 +156,9 @@
 ;; Accumulator multiplies.
 
 (define_expand "mulsa3"
-  [(set (match_operand:SA 0 "s_register_operand" "")
-	(mult:SA (match_operand:SA 1 "s_register_operand" "")
-		 (match_operand:SA 2 "s_register_operand" "")))]
+  [(set (match_operand:SA 0 "s_register_operand")
+	(mult:SA (match_operand:SA 1 "s_register_operand")
+		 (match_operand:SA 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -175,9 +175,9 @@
 })
 
 (define_expand "mulusa3"
-  [(set (match_operand:USA 0 "s_register_operand" "")
-	(mult:USA (match_operand:USA 1 "s_register_operand" "")
-		  (match_operand:USA 2 "s_register_operand" "")))]
+  [(set (match_operand:USA 0 "s_register_operand")
+	(mult:USA (match_operand:USA 1 "s_register_operand")
+		  (match_operand:USA 2 "s_register_operand")))]
   "TARGET_32BIT"
 {
   rtx tmp1 = gen_reg_rtx (DImode);
@@ -317,9 +317,9 @@
 		  (const_int 32)))])
 
 (define_expand "mulha3"
-  [(set (match_operand:HA 0 "s_register_operand" "")
-	(mult:HA (match_operand:HA 1 "s_register_operand" "")
-		 (match_operand:HA 2 "s_register_operand" "")))]
+  [(set (match_operand:HA 0 "s_register_operand")
+	(mult:HA (match_operand:HA 1 "s_register_operand")
+		 (match_operand:HA 2 "s_register_operand")))]
   "TARGET_DSP_MULTIPLY && arm_arch_thumb2"
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -333,9 +333,9 @@
 })
 
 (define_expand "muluha3"
-  [(set (match_operand:UHA 0 "s_register_operand" "")
-	(mult:UHA (match_operand:UHA 1 "s_register_operand" "")
-		  (match_operand:UHA 2 "s_register_operand" "")))]
+  [(set (match_operand:UHA 0 "s_register_operand")