Re: [backport gcc-10][AArch64] ACLE bf16 convert

2020-12-11 Thread Dennis Zhang via Gcc-patches
> 
> From: Kyrylo Tkachov 
> Sent: Friday, December 11, 2020 11:23 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 convert
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 10 December 2020 14:27
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > ; Richard Sandiford
> > 
> > Subject: [backport gcc-10][AArch64] ACLE bf16 convert
> >
> > Hi all,
> >
> > This patch backports the commit
> > f7d6961126a7f06c8089d8a58bd21be43bc16806.
> > The original is approved at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/557859.html
> > The only change is to remove FPCR-reading flags for builtin definition since
> > it's not supported in gcc-10.
> > Regtested and bootstrapped for aarch64-none-linux-gnu.
> >
> > Is it OK to backport?
> 
> Ok.
> Thanks,
> Kyrill

Thanks Kyrill!
The patch is committed as 702e45ee471422dee86d32fc84f617d341d33175.

Bests
Dennis


Re: [backport gcc-10][AArch64] ACLE bf16 get

2020-12-11 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Friday, December 11, 2020 11:58 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 get
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 10 December 2020 14:35
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > ; Richard Sandiford
> > 
> > Subject: [backport gcc-10][AArch64] ACLE bf16 get
> >
> > Hi all,
> >
> > This patch backports the commit
> > 3553c658533e430b232997bdfd97faf6606fb102.
> > The original is approved at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/557871.html
> > There is a change to remove FPCR-reading flag for builtin declaration since
> > it's not supported in gcc-10.
> >
> > Another change is to remove a test (bf16_get-be.c) that fails compiling on
> > aarch64-none-linux-gnu in the original patch.
> > This is reported at https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > November/558195.html
> > The failure happens for several bf16 big-endian tests so the bug would be
> > fixed in a separate patch.
> > And the test should be added after the bug is fixed.
> >
> > Is it OK to backport?
> 
> But do the tests added here work for big-endian?
> Ok if they do.
> Thanks,
> Kyrill

Thanks for asking. The added test (bf16_get.c) works for both 
aarch64-none-linux-gnu and aarch64_be-none-linux-gnu.
The patch is commited as c25f7eac6555d67523f0520c7e93bbc398d0da84.

Cheers
Dennis


Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi Christophe,

> From: Christophe Lyon 
> Sent: Monday, November 9, 2020 1:38 PM
> To: Dennis Zhang
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Ramana 
> Radhakrishnan
> Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi,
>
> I have just noticed that the new test has:
> /* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
> /* { dg-additional-options "-O3" } */
> That is, the first line has a typo (space between dg and -additional-options),
> so the test is effectively compiled with -O3, and without
> -funsafe-math-optimizations
>
> Since I can see it passing, it looks like -funsafe-math-optimizations
> is not needed, can you clarify?
>
> Thanks

Thank you for the report. The '-funsafe-math-optimizations' option is not 
needed.
The typo is fixed by commit b46dd03fe94e2428cbcdbfc4d081d89ed604803a.

Bests
Dennis


[committed][Patch]arm: Fix typo in testcase mve-vsub_1.c

2020-12-10 Thread Dennis Zhang via Gcc-patches
This patch fixes a typo reported at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558478.html

gcc/testsuite/
* gcc.target/arm/simd/mve-vsub_1.c: Fix typo.
Remove needless dg-additional-options.

Cheers,
Dennisdiff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
index cb3ef3a14e0..842e5c6a30b 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
 /* { dg-additional-options "-O3" } */
 
 #include 


[backport gcc-10][AArch64] ACLE bf16 get

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch backports the commit 3553c658533e430b232997bdfd97faf6606fb102.
The original is approved at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557871.html
There is a change to remove FPCR-reading flag for builtin declaration since 
it's not supported in gcc-10.

Another change is to remove a test (bf16_get-be.c) that fails compiling on 
aarch64-none-linux-gnu in the original patch.
This is reported at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558195.html
The failure happens for several bf16 big-endian tests so the bug would be fixed 
in a separate patch.
And the test should be added after the bug is fixed.

Is it OK to backport?

Cheers
Dennisdiff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..05726db1f6137f9ab29fcdd51f804199e24bbfcf 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -718,6 +718,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf.  */
+  VAR1 (UNOP, vget_lo_half, 0, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..43ac3cd40fe8379567b7a60772f360d37818e8e9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..0fd78a6fd076f788d2618c492a026246e61e438c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,20 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_low_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_lo_halfv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_high_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_hi_halfv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
new file mode 100644
index ..2193753ffbb6246aa16eb5033559b21266a556a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+
+#include 
+
+/*
+**test_vget_low_bf16:
+** ret
+*/
+bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a)
+{
+  return vget_low_bf16 (a);
+}
+
+/*
+**test_vget_high_bf16:
+** dup	d0, v0.d\[1\]
+** ret
+*/
+bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a)
+{
+  return vget_high_bf16 (a);
+}


[backport gcc-10][AArch64] ACLE bf16 convert

2020-12-10 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch backports the commit f7d6961126a7f06c8089d8a58bd21be43bc16806.
The original is approved at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557859.html
The only change is to remove FPCR-reading flags for builtin definition since 
it's not supported in gcc-10.
Regtested and bootstrapped for aarch64-none-linux-gnu.

Is it OK to backport?

Cheers
Dennisdiff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..7192f3954d311d89064707cfcb735efad4377c12 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -728,3 +728,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, v8bf)
   VAR1 (UNOP, bfcvt, 0, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, v8bf)
+  VAR1 (UNOP, bfcvt, 0, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..2e8aa668b107f039e4958b6998da180a6d11b881 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_imm")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc014300c489209c11abf41b1c47b7fbe..881615498d3d52662d7ebb3ab1e8d52d5a40cab8 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..69cccd3278642814f3961c5bf52be5639f5ef3f3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,27 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b1820d578bdf1619834f29b919f5c3f32..47af7c494d9b9d1f4b63e802efc293348a40e270 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
@@ -46,3 +46,43 @@ bfloat16_t test_bfcvt (float32_t a)
 {
   return vcvth_bf16_f32 (a);
 }
+
+/*
+**test_vcvt_f32_bf16:
+** shll	v0.4s, v0.4h, #16
+** ret
+*/
+float32x4_t test_vcvt_f32_bf16 (bfloat16x4_t a)
+{
+  return vcvt_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_low_f32_bf16:
+** shll	v0.4s, v0.4h, #16
+** ret
+*/
+float32x4_t test_vcvtq_low_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_low_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_high_f32_bf16:
+** shll2	v0.4s, v0.8h, #16
+** ret
+*/
+float32x4_t test_vcvtq_high_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_high_f32_bf16 (a);
+}
+
+/*
+**test_vcvtah_f32_bf16:
+** shl	d0, d0, 

Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-11-03 Thread Dennis Zhang via Gcc-patches

On 11/3/20 2:05 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
 VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
 VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
   
+  /* Implemented by aarch64_vget_halfv8bf.  */

+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)


This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)


+
 /* Implemented by aarch64_simd_mmlav16qi.  */
 VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
 VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
 [(set_attr "type" "neon_dot")]
   )
   
+;; vget_low/high_bf16

+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);


I think this needs to be:

aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.



I've updated the expander using aarch64_simd_vect_par_cnst_half.
And the expander is divided into two for getting low and high half
seperately.
It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu
targets with new tests including -mbig-endian option.


+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
   ;; bfmmla
   (define_insn "aarch64_bfmmlaqv4sf"
 [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 (ior (match_test "op == constm1_rtx")
  (match_test "op == const1_rtx"))
   
+(define_predicate "aarch64_zero_or_1"

+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))


zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.



This predicate is removed since there is no need of the imm operand in
the new expanders.

Thanks for the reviews.
Is it OK for trunk now?


Looks good.  OK for trunk and branches, thanks.

Richard



Thanks for approval, Richard!
This patch is committed at 3553c658533e430b232997bdfd97faf6606fb102

Bests
Dennis


Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-11-03 Thread Dennis Zhang via Gcc-patches



On 11/2/20 7:05 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

Hi Richard,

On 10/29/20 5:48 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
 VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
 VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
 VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)


New intrinsics should use something more specific than “ALL”.
Since these functions are pure non-trapping integer operations,
I think they should use “AUTO_FP” instead.  (On reflection,
we should probably change the name.)


+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+   (unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+   UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]


I think this should be neon_shift_imm instead.

OK with those changes, thanks.

Richard



I've fixed the Flag and the insn attribute.
I will commit it if no further issues.


LGTM, thanks.

Richard


Thanks Richard!
This patch is committed as f7d6961126a7f06c8089d8a58bd21be43bc16806.

Bests
Dennis


Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-11-03 Thread Dennis Zhang via Gcc-patches

Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
  
+  /* Implemented by aarch64_vget_halfv8bf.  */

+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)


This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)


+
/* Implemented by aarch64_simd_mmlav16qi.  */
VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
[(set_attr "type" "neon_dot")]
  )
  
+;; vget_low/high_bf16

+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);


I think this needs to be:

   aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.



I've updated the expander using aarch64_simd_vect_par_cnst_half.
And the expander is divided into two for getting low and high half 
seperately.
It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu 
targets with new tests including -mbig-endian option.



+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
  ;; bfmmla
  (define_insn "aarch64_bfmmlaqv4sf"
[(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 (ior (match_test "op == constm1_rtx")
  (match_test "op == const1_rtx"))
  
+(define_predicate "aarch64_zero_or_1"

+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))


zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.



This predicate is removed since there is no need of the imm operand in 
the new expanders.


Thanks for the reviews.
Is it OK for trunk now?

Cheers
Dennis


diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index eb8e6f7b3d8..f26a96042bc 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -722,6 +722,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf.  */
+  VAR1 (UNOP, vget_lo_half, 0, AUTO_FP, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, AUTO_FP, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..af29a2f26f5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff 

Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-11-02 Thread Dennis Zhang via Gcc-patches

Hi Richard,

On 10/29/20 5:48 PM, Richard Sandiford wrote:

Dennis Zhang  writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)


New intrinsics should use something more specific than “ALL”.
Since these functions are pure non-trapping integer operations,
I think they should use “AUTO_FP” instead.  (On reflection,
we should probably change the name.)


+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+   (unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+   UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]


I think this should be neon_shift_imm instead.

OK with those changes, thanks.

Richard



I've fixed the Flag and the insn attribute.
I will commit it if no further issues.
Thanks for the review.

Regards
Dennis
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index eb8e6f7b3d8..f494b535a30 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, FP, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, FP, v8bf)
   VAR1 (UNOP, bfcvt, 0, FP, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, AUTO_FP, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, AUTO_FP, v8bf)
+  VAR1 (UNOP, bfcvt, 0, AUTO_FP, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..030a086d31c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_imm")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc01..881615498d3 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba2..69cccd32786 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,27 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b182..47af7c494d9 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
+++ 

[PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-10-29 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch implements ACLE intrinsics vget_low_bf16 and vget_high_bf16 to 
extract lower or higher half from a bfloat16x8 vector.
The vget_high_bf16 is done by 'dup' instruction. The vget_low_bf16 could be 
done by a 'dup' or 'mov', or it's mostly optimized out by just using the lower 
half of a vector register.
The test for vget_low_bf16 only checks that the interface can be compiled but 
no instruction is checked since none is generated in the test case.

Arm ACLE document at 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Denni

gcc/ChangeLog:

2020-10-29  Dennis Zhang  

* config/aarch64/aarch64-simd-builtins.def (vget_half): New entry.
* config/aarch64/aarch64-simd.md (aarch64_vget_halfv8bf): New entry.
* config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic.
(vget_high_bf16): Likewise.
* config/aarch64/predicates.md (aarch64_zero_or_1): New predicate
for zero or one immediate to indicate the lower or higher half.

gcc/testsuite/ChangeLog

2020-10-29  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
(test_vget_low_bf16, test_vget_high_bf16): New tests.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_halfv8bf.  */
+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi.  */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 50f8b23bc17..c6ac0b8dd17 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35530,6 +35530,20 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_low_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_halfv8bf (__a, 0);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_high_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_halfv8bf (__a, 1);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
 		 (ior (match_test "op == constm1_rtx")
 		  (match_test "op == const1_rtx"))
 
+(define_predicate "aarch64_zero_or_1"
+  (and (match_code "const_int")
+   (match_test "op == const0_rtx || op == const1_rtx")))
+
 (define_predicate "aarch64_reg_or_orr_imm"
(ior (match_operand 0 "register_operand")
 	(and (match_code "const_vector")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
index c42c7acbbe9..35f4cb864f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c
@@ -83,3 +83,14 @@ bfloat16_t test_vduph_laneq_bf16 (bfloat16x8_t a)
   return vduph_laneq_bf16 (a, 7);
 }
 /* { dg-final { scan-assembler-times "dup\\th\[0-9\]+, v\[0-9\]+\.h\\\[7\\\]" 2 } } */
+
+bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a)
+{
+  return vget_low_bf16 (a);
+}
+
+bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a)
+{
+  return vget_high_bf16 (a);
+}
+/* { dg-final { scan-assembler-times "dup\\td\[0-9\]+, v\[0-9\]+\.d\\\[1\\\]" 1 } } */


[PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32

2020-10-29 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables intrinsics to convert BFloat16 scalar and vector operands to 
Float32 modes.
The intrinsics are implemented by shifting each BFloat16 item 16 bits to left 
using shl/shll/shll2 instructions.

Intrinsics are documented at 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
ISA is documented at https://developer.arm.com/docs/ddi0596/latest

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-29  Dennis Zhang  

* config/aarch64/aarch64-simd-builtins.def(vbfcvt): New entry.
(vbfcvt_high, bfcvt): Likewise.
* config/aarch64/aarch64-simd.md(aarch64_vbfcvt): New entry.
(aarch64_vbfcvt_highv8bf, aarch64_bfcvtsf): Likewise.
* config/aarch64/arm_bf16.h (vcvtah_f32_bf16): New intrinsic.
* config/aarch64/arm_neon.h (vcvt_f32_bf16): Likewise.
(vcvtq_low_f32_bf16, vcvtq_high_f32_bf16): Likewise.

gcc/testsuite/ChangeLog

2020-10-29  Dennis Zhang  

* gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
(test_vcvt_f32_bf16, test_vcvtq_low_f32_bf16): New tests.
(test_vcvtq_high_f32_bf16, test_vcvth_f32_bf16): Likewise.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
   VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}.  */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..5ae79d67981 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc01..881615498d3 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 85c0d62ca12..9c0386ed7b1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35716,6 +35716,27 @@ vcvtq_high_bf16_f32 (bfloat16x8_t __inactive, float32x4_t __a)
   return __builtin_aarch64_bfcvtn2v8bf (__inactive, __a);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 #pragma GCC pop_options
 
 /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b182..47af7c494d9 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
@@ -46,3 +46,43 @@ bfloat16_t test_bfcvt (float32_t a)
 {
   return vcvth_bf16_f32 (a);
 }
+
+/*
+**test_vcvt_f32_bf16:
+** shll	v0.4s, v0.4h, #16
+** ret
+*/

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-23 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Thursday, October 22, 2020 9:40 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:47
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
> >
> > Hi all,
> >
> > On 8/17/20 6:41 PM, Dennis Zhang wrote:
> > >
> > > Hi all,
> > >
> > > This patch enables MVE vsub instructions for auto-vectorization.
> > > It adds RTL templates for MVE vsub instructions using 'minus' instead of
> > > unspec expression to make the instructions recognizable for vectorization.
> > > MVE target is added in sub3 optab. The sub3 optab is
> > > modified to use a mode iterator that selects available modes for various
> > > targets correspondingly.
> > > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> > > support vectorization.
> > >
> > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> > > generate wrong instruction numbers because of unexpected icf
> > optimization.
> > > This bug is exposed by the MVE vector modes enabled in this patch,
> > > therefore it is corrected in this patch to avoid test failures.
> > >
> > > MVE instructions are documented here:
> > > https://developer.arm.com/architectures/instruction-sets/simd-
> > isas/helium/helium-intrinsics
> > >
> > > The patch is regtested for arm-none-eabi and bootstrapped for
> > > arm-none-linux-gnueabihf.
> > >
> > > Is it OK for trunk please?
> > >
> > > Thanks
> > > Dennis
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  
> > >
> > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector
> > modes.
> > > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> > > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP):
> > Likewise.
> > > (TARGET_NEON_MVE_HFP): Likewise.
> > > * config/arm/iterators.md (VSEL): New mode iterator to select modes
> > > for corresponding targets.
> > > * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
> > > using expression 'minus'.
> > > (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
> > > * config/arm/neon.md (sub3): Removed here. Integrated in the
> > > sub3 in vec-common.md
> > > * config/arm/vec-common.md (sub3): Enable MVE target. Use
> > VSEL
> > > to select available modes. Exclude TARGET_NEON_FP16INST from
> > > TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
> > > originally in neon.md.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  
> > >
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> > > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> > > * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> > > * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> > >
> >
> > This patch is updated based on Richard Sandiford's patch adding new
> > vector mode macros:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
> > The old version of this patch is at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > And a less related part in the old version is separated into another
> > patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-
> > September/554100.html
> >
> > This patch enables MVE vsub instructions for auto-vectorization.
> > It adds insns for MVE vsub instructions using 'minus' instead of unspec
> > expression to make the instructions recognizable for auto-vectorization.
> > The sub3 in mve.md is modified to use new mode macros which
> > make
> > the expander available when certain modes are supported. Then various
> > targets can share this expander for vectorization. The redundant
> > sub3 insns in neon.md are then removed.
> >
> > Regression tested on arm-none-eabi and bootstraped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
>
> Ok.
> Thanks,
> Kyrill
>

Thanks for your approval. The patch has been committed as 
98161c248c88f873bbffba23664c540f551d89d5

Bests
Dennis

> >
> > gcc/ChangeLog:
> >
> > 2020-10-02  Dennis 

Ping: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-21 Thread Dennis Zhang via Gcc-patches
Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555646.html
Thanks


From: Dennis Zhang 
Sent: Tuesday, October 6, 2020 5:46 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov; nd; Richard Earnshaw; Ramana Radhakrishnan
Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
> Is it OK for trunk please?
>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub3): Removed here. Integrated in the
>   sub3 in vec-common.md
>   * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
>

This patch is updated based on Richard Sandiford's patch adding new
vector mode macros:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec
expression to make the instructions recognizable for auto-vectorization.
The sub3 in mve.md is modified to use new mode macros which make
the expander available when certain modes are supported. Then various
targets can share this expander for vectorization. The redundant
sub3 insns in neon.md are then removed.

Regression tested on arm-none-eabi and bootstraped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub3_neon): Use the new mode macros
ARM_HAVE__ARITH.
(sub3, sub3_fp16): Removed.
(neon_vsub): Use gen_sub3 instead of gen_sub3_fp16.
* config/arm/vec-common.md (sub3): Use the new mode macros
ARM_HAVE__ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.



Re: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-21 Thread Dennis Zhang via Gcc-patches
Hi Kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Wednesday, October 14, 2020 10:15 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
>
> Hi Dennis,
>
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
> >
> > Hi all,
> >
> > This patch enables MVE vmin/vmax instructions for auto-vectorization.
> > MVE target is included in expander smin3, umin3,
> > smax3
> > and umax3 for vectorization.
> > Related insns for vmin/vmax in mve.md are modified to use smin, umin,
> > smax and umax expressions instead of unspec to support the expanders.
> >
> > Regression tested on arm-none-eabi and bootstraped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
>
> Ok.
> Thanks,
> Kyrill
>

Thanks for your approval.
This patch has been committed to trunk at 
76835dca95ab9f3f106a0db1e6152ad0740b38b3

Cheers
Dennis

Re: [PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-21 Thread Dennis Zhang via Gcc-patches
Hi kyrylo,

> 
> From: Kyrylo Tkachov 
> Sent: Wednesday, October 14, 2020 10:14 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmul
> 
> Hi Dennis,
> 
> > -Original Message-
> > From: Dennis Zhang 
> > Sent: 06 October 2020 17:55
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; nd ;
> > Richard Earnshaw ; Ramana Radhakrishnan
> > 
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmul
> >
> > Hi all,
> >
> > This patch enables MVE vmul instructions for auto-vectorization.
> > It includes MVE in expander mul3 to enable vectorization for MVE 
> > and modifies related vmul insns to support the expander by using 'mult'
> > instead of unspec.
> > The mul3 for vectorization in vec-common.md uses mode iterator
> > VDQWH instead of VALLW to cover all supported modes.
> > The macros ARM_HAVE__ARITH are used to select supported
> > modes for 
> > different targets. The redundant mul3 in neon.md is removed.
> >
> > Regression tested on arm-none-eabi and bootstraped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
> 
> Ok, thank you for your patience.
> Kyrill
> 

Thanks for your approval.
It's committed to trunk at 0f41b5e02fa47db2080b77e4e1f7cd3305457c05

Cheers
Dennis


Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-12 Thread Dennis Zhang via Gcc-patches

Hi Christophe,

On 12/10/2020 12:40, Christophe Lyon wrote:

Hi,


On Thu, 8 Oct 2020 at 16:22, Christophe Lyon  wrote:


On Thu, 8 Oct 2020 at 16:08, Dennis Zhang  wrote:


Hi Christophe,

On 08/10/2020 14:14, Christophe Lyon wrote:

Hi,


On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
 wrote:


On 9/16/20 4:00 PM, Dennis Zhang wrote:

Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE
SIMD code.

This patch also fixes bugs in MVE vreiterpretq_*.c tests which are
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
There are two functions in each of the tests. The two functions contain
the pattern of identical code so that they are folded in icf pass.
Because of icf, the instruction count only checks one function which is 8.
However when the SIMD modes are enabled, the estimation of the code size
becomes smaller so that inlining is applied after icf, then the
instruction count becomes 16 which causes failure of the tests.
Because the icf is not the expected pattern to be tested but causes
above issues, -fno-ipa-icf is applied to the tests to avoid unstable
instruction count.

This patch is separated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
because this part is not strongly connected to the aim of that one so
that causing confusion.

Regtested and bootstraped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.



Since toolchain builds work again after Jakub's divmod fix, I'm now
facing another build error likely caused by this patch:
In file included from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
   from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
./insn-modes.h:196:71: error: temporary of non-literal type
'scalar_int_mode' in a constant expression
   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
 ^
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
note: in expansion of macro 'QImode'
 case QImode:

and similarly for the other cases.

Does the build work for you?

Thanks,

Christophe



Thanks for the report. Sorry to see the error.
I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I
didn't get this error.
Could you please help to show the configuration you use for your build?
I will test and fix at once.



It fails on all of them for me. Does it work for you with current
master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)



So... I guess you are using a host with GCC more recent than 4.8.5? :-)
When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds,
and after manually building with the same environment in the compute
farm I use for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the
build failure.
It's a matter of replacing
case QImode:
with
case E_QImode:

Is the attached patch OK? Or do we instead want to revisit the minimum
gcc version required to build gcc?

Thanks,

Christophe



I've tested your patch and it works with my other patches depending on 
this one. So I agree this patch is OK. Thanks for the fix.


Bests
Dennis


Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-08 Thread Dennis Zhang via Gcc-patches

Hi Christophe,

On 08/10/2020 14:14, Christophe Lyon wrote:

Hi,


On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
 wrote:


On 9/16/20 4:00 PM, Dennis Zhang wrote:

Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
In this patch, the integer and float MVE SIMD modes are returned by
arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
MVE or MVE_FLOAT is enabled.
Then the expanders for auto-vectorization can be used for generating MVE
SIMD code.

This patch also fixes bugs in MVE vreiterpretq_*.c tests which are
revealed by the enabled MVE SIMD modes.
The tests are for checking the MVE reinterpret intrinsics.
There are two functions in each of the tests. The two functions contain
the pattern of identical code so that they are folded in icf pass.
Because of icf, the instruction count only checks one function which is 8.
However when the SIMD modes are enabled, the estimation of the code size
becomes smaller so that inlining is applied after icf, then the
instruction count becomes 16 which causes failure of the tests.
Because the icf is not the expected pattern to be tested but causes
above issues, -fno-ipa-icf is applied to the tests to avoid unstable
instruction count.

This patch is separated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
because this part is not strongly connected to the aim of that one so
that causing confusion.

Regtested and bootstraped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  

   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.



Since toolchain builds work again after Jakub's divmod fix, I'm now
facing another build error likely caused by this patch:
In file included from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
  from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
./insn-modes.h:196:71: error: temporary of non-literal type
'scalar_int_mode' in a constant expression
  #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
^
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
note: in expansion of macro 'QImode'
case QImode:

and similarly for the other cases.

Does the build work for you?

Thanks,

Christophe



Thanks for the report. Sorry to see the error.
I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I 
didn't get this error.

Could you please help to show the configuration you use for your build?
I will test and fix at once.

Thanks
Dennis


[PATCH][Arm] Auto-vectorization for MVE: vmin/vmax

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmin/vmax instructions for auto-vectorization.
MVE target is included in expander smin3, umin3, smax3 
and umax3 for vectorization.
Related insns for vmin/vmax in mve.md are modified to use smin, umin, 
smax and umax expressions instead of unspec to support the expanders.

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmaxq_): Replace with ...
(mve_vmaxq_s, mve_vmaxq_u): ... these new insns to
use smax/umax instead of VMAXQ.
(mve_vminq_): Replace with ...
(mve_vminq_s, mve_vminq_u): ... these new insns to
use smin/umin instead of VMINQ.
(mve_vmaxnmq_f): Use smax instead of VMAXNMQ_F.
(mve_vminnmq_f): Use smin instead of VMINNMQ_F.
* config/arm/vec-common.md (smin3): Use the new mode macros
ARM_HAVE__ARITH.
(umin3, smax3, umax3): Likewise.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vminmax_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..0d9f932e983 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1977,15 +1977,25 @@
 ;;
 ;; [vmaxq_u, vmaxq_s])
 ;;
-(define_insn "mve_vmaxq_"
+(define_insn "mve_vmaxq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMAXQ))
+	(smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmax.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vmaxq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.%#\t%q0, %q1, %q2"
+  "vmax.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2037,15 +2047,25 @@
 ;;
 ;; [vminq_s, vminq_u])
 ;;
-(define_insn "mve_vminq_"
+(define_insn "mve_vminq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		   (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMINQ))
+	(smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmin.%#\t%q0, %q1, %q2"
+  "vmin.%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vminq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmin.%#\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3030,9 +3050,8 @@
 (define_insn "mve_vmaxnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMAXNMQ_F))
+	(smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmaxnm.f%#	%q0, %q1, %q2"
@@ -3090,9 +3109,8 @@
 (define_insn "mve_vminnmq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMINNMQ_F))
+	(smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vminnm.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..6a330cc82f6 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -114,39 +114,29 @@
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smin:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "umin3"
   [(set (match_operand:VINTW 0 "s_register_operand")
 	(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
 		(match_operand:VINTW 2 "s_register_operand")))]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))"
-{
-})
+   "ARM_HAVE__ARITH"
+)
 
 (define_expand "smax3"
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smax:VALLW (match_operand:VALLW 1 "s_register_operand")
 		(match_operand:VALLW 2 "s_register_operand")))]
-  

[PATCH][Arm] Auto-vectorization for MVE: vmul

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

This patch enables MVE vmul instructions for auto-vectorization.
It includes MVE in expander mul3 to enable vectorization for MVE 
and modifies related vmul insns to support the expander by using 'mult' 
instead of unspec.
The mul3 for vectorization in vec-common.md uses mode iterator 
VDQWH instead of VALLW to cover all supported modes.
The macros ARM_HAVE__ARITH are used to select supported modes for 
different targets. The redundant mul3 in neon.md is removed.

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vmulq): New entry for vmul instruction
using expression 'mult'.
(mve_vmulq_f): Use mult instead of VMULQ_F.
* config/arm/neon.md (mul3): Removed.
* config/arm/vec-common.md (mul3): Use the new mode macros
ARM_HAVE__ARITH. Use mode iterator VDQWH instead of VALLW.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vmul_1.c: New test.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		(match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		   (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		(match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#	%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
 (const_string "neon_mul_")))]
 )
 
-(define_insn "mul3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (mult:VH
-(match_operand:VH 1 "s_register_operand" "w")
-(match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%0, %1, %2"
- [(set_attr "type" "neon_mul_")]
-)
-
 (define_insn "neon_vmulf"
  [(set
(match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-(mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		(match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode)
-		|| flag_unsafe_math_optimizations))
-   || (mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		(match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE__ARITH"
+)
 
 (define_expand "smin3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+dest[i] = 

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-10-06 Thread Dennis Zhang via Gcc-patches
Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>   * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>   (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>   (TARGET_NEON_MVE_HFP): Likewise.
>   * config/arm/iterators.md (VSEL): New mode iterator to select modes
>   for corresponding targets.
>   * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
>   using expression 'minus'.
>   (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
>   * config/arm/neon.md (sub3): Removed here. Integrated in the
>   sub3 in vec-common.md
>   * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
>   to select available modes. Exclude TARGET_NEON_FP16INST from
>   TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
>   originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>   * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>   * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>   * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> 

This patch is updated based on Richard Sandiford's patch adding new 
vector mode macros: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another 
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec 
expression to make the instructions recognizable for auto-vectorization.
The sub3 in mve.md is modified to use new mode macros which make 
the expander available when certain modes are supported. Then various 
targets can share this expander for vectorization. The redundant 
sub3 insns in neon.md are then removed.

Regression tested on arm-none-eabi and bootstraped on 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  

* config/arm/mve.md (mve_vsubq): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (*sub3_neon): Use the new mode macros
ARM_HAVE__ARITH.
(sub3, sub3_fp16): Removed.
(neon_vsub): Use gen_sub3 instead of gen_sub3_fp16.
* config/arm/vec-common.md (sub3): Use the new mode macros
ARM_HAVE__ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  

* gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 

Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization

2020-10-06 Thread Dennis Zhang via Gcc-patches
On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
> 
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
> 
> This patch also fixes bugs in MVE vreiterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests are for checking the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in icf pass.
> Because of icf, the instruction count only checks one function which is 8.
> However when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf, then the
> instruction count becomes 16 which causes failure of the tests.
> Because the icf is not the expected pattern to be tested but causes
> above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> instruction count.
> 
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one so
> that causing confusion.
> 
> Regtested and bootstraped.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  
> 
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>   option -fno-ipa-icf and change the instruction count from 8 to 16.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html