Re: [backport gcc-10][AArch64] ACLE bf16 convert
> From: Kyrylo Tkachov
> Sent: Friday, December 11, 2020 11:23 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 convert
>
> > -----Original Message-----
> > From: Dennis Zhang
> > Sent: 10 December 2020 14:27
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd; Richard Earnshaw; Marcus Shawcroft; Kyrylo Tkachov;
> > Richard Sandiford
> > Subject: [backport gcc-10][AArch64] ACLE bf16 convert
> >
> > Hi all,
> >
> > This patch backports the commit f7d6961126a7f06c8089d8a58bd21be43bc16806.
> > The original is approved at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557859.html
> > The only change is to remove the FPCR-reading flags from the builtin
> > definitions since they are not supported in gcc-10.
> > Regtested and bootstrapped for aarch64-none-linux-gnu.
> >
> > Is it OK to backport?
>
> Ok.
> Thanks,
> Kyrill

Thanks Kyrill!
The patch is committed as 702e45ee471422dee86d32fc84f617d341d33175.

Bests
Dennis
Re: [backport gcc-10][AArch64] ACLE bf16 get
Hi Kyrylo,

> From: Kyrylo Tkachov
> Sent: Friday, December 11, 2020 11:58 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Marcus Shawcroft; Richard Sandiford
> Subject: RE: [backport gcc-10][AArch64] ACLE bf16 get
>
> > -----Original Message-----
> > From: Dennis Zhang
> > Sent: 10 December 2020 14:35
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd; Richard Earnshaw; Marcus Shawcroft; Kyrylo Tkachov;
> > Richard Sandiford
> > Subject: [backport gcc-10][AArch64] ACLE bf16 get
> >
> > Hi all,
> >
> > This patch backports the commit 3553c658533e430b232997bdfd97faf6606fb102.
> > The original is approved at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557871.html
> > There is a change to remove the FPCR-reading flag for the builtin
> > declaration since it is not supported in gcc-10.
> >
> > Another change is to remove a test (bf16_get-be.c) that fails to compile
> > for aarch64-none-linux-gnu in the original patch.
> > This is reported at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558195.html
> > The failure happens for several bf16 big-endian tests, so the bug will be
> > fixed in a separate patch, and the test will be added back after the bug
> > is fixed.
> >
> > Is it OK to backport?
>
> But do the tests added here work for big-endian?
> Ok if they do.
> Thanks,
> Kyrill

Thanks for asking. The added test (bf16_get.c) works for both aarch64-none-linux-gnu and aarch64_be-none-linux-gnu.
The patch is committed as c25f7eac6555d67523f0520c7e93bbc398d0da84.

Cheers
Dennis
Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
Hi Christophe,

> From: Christophe Lyon
> Sent: Monday, November 9, 2020 1:38 PM
> To: Dennis Zhang
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; nd;
> Ramana Radhakrishnan
> Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi,
>
> I have just noticed that the new test has:
> /* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
> /* { dg-additional-options "-O3" } */
> That is, the first line has a typo (space between dg and -additional-options),
> so the test is effectively compiled with -O3, and without
> -funsafe-math-optimizations
>
> Since I can see it passing, it looks like -funsafe-math-optimizations
> is not needed, can you clarify?
>
> Thanks

Thank you for the report. The '-funsafe-math-optimizations' option is not needed.
The typo is fixed by commit b46dd03fe94e2428cbcdbfc4d081d89ed604803a.

Bests
Dennis
[committed][Patch]arm: Fix typo in testcase mve-vsub_1.c
This patch fixes a typo reported at https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558478.html

gcc/testsuite/
	* gcc.target/arm/simd/mve-vsub_1.c: Fix typo.  Remove needless
	dg-additional-options.

Cheers,
Dennis

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
index cb3ef3a14e0..842e5c6a30b 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
 /* { dg-additional-options "-O3" } */
 #include
[backport gcc-10][AArch64] ACLE bf16 get
Hi all,

This patch backports the commit 3553c658533e430b232997bdfd97faf6606fb102.
The original is approved at https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557871.html
There is a change to remove the FPCR-reading flag for the builtin declaration since it is not supported in gcc-10.

Another change is to remove a test (bf16_get-be.c) that fails to compile for aarch64-none-linux-gnu in the original patch.
This is reported at https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558195.html
The failure happens for several bf16 big-endian tests, so the bug will be fixed in a separate patch, and the test will be added back after the bug is fixed.

Is it OK to backport?

Cheers
Dennis

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..05726db1f6137f9ab29fcdd51f804199e24bbfcf 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -718,6 +718,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf. */
+  VAR1 (UNOP, vget_lo_half, 0, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi. */
   VAR1 (TERNOP, simd_smmla, 0, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..43ac3cd40fe8379567b7a60772f360d37818e8e9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..0fd78a6fd076f788d2618c492a026246e61e438c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,20 @@
 vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_low_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_lo_halfv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vget_high_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vget_hi_halfv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
new file mode 100644
index ..2193753ffbb6246aa16eb5033559b21266a556a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_get.c
@@ -0,0 +1,27 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+
+#include
+
+/*
+**test_vget_low_bf16:
+**	ret
+*/
+bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a)
+{
+  return vget_low_bf16 (a);
+}
+
+/*
+**test_vget_high_bf16:
+**	dup	d0, v0.d\[1\]
+**	ret
+*/
+bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a)
+{
+  return vget_high_bf16 (a);
+}
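As a model of the intrinsic semantics only (not of the GCC implementation): treating a `bfloat16x8_t` as eight lanes with lane 0 first, the two intrinsics simply split the vector into its two 4-lane halves.

```python
# Illustrative model of the ACLE vget_low_bf16/vget_high_bf16 semantics.
# A bfloat16x8_t is modelled as a list of 8 lane values, lane 0 first
# (lane 0 lives in the least significant bits of the AArch64 register).

def vget_low_bf16(v):
    # Lanes 0..3 -> the lower 64 bits of the Q register.
    assert len(v) == 8
    return v[:4]

def vget_high_bf16(v):
    # Lanes 4..7 -> the upper 64 bits (hence the "dup d0, v0.d[1]").
    assert len(v) == 8
    return v[4:]

v = [0, 1, 2, 3, 4, 5, 6, 7]
print(vget_low_bf16(v))   # [0, 1, 2, 3]
print(vget_high_bf16(v))  # [4, 5, 6, 7]
```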
[backport gcc-10][AArch64] ACLE bf16 convert
Hi all,

This patch backports the commit f7d6961126a7f06c8089d8a58bd21be43bc16806.
The original is approved at https://gcc.gov.org/pipermail/gcc-patches/2020-November/557859.html
The only change is to remove the FPCR-reading flags from the builtin definitions since they are not supported in gcc-10.
Regtested and bootstrapped for aarch64-none-linux-gnu.

Is it OK to backport?

Cheers
Dennis

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index ba2bda26dcdd4947dc724851433451433d378724..7192f3954d311d89064707cfcb735efad4377c12 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -728,3 +728,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, v8bf)
   VAR1 (UNOP, bfcvt, 0, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}. */
+  VAR2 (UNOP, vbfcvt, 0, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, v8bf)
+  VAR1 (UNOP, bfcvt, 0, sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6ff5246f84e919402c687687a84beb8..2e8aa668b107f039e4958b6998da180a6d11b881 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7238,3 +7238,31 @@
   "bfcvt\\t%h0, %s1"
   [(set_attr "type" "f_cvt")]
 )
+
+;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes.
+(define_insn "aarch64_vbfcvt"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")]
+		      UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "shll\\t%0.4s, %1.4h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_vbfcvt_highv8bf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+	(unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")]
+		      UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "shll2\\t%0.4s, %1.8h, #16"
+  [(set_attr "type" "neon_shift_imm_long")]
+)
+
+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		    UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_imm")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 984875dcc014300c489209c11abf41b1c47b7fbe..881615498d3d52662d7ebb3ab1e8d52d5a40cab8 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -40,6 +40,13 @@
 vcvth_bf16_f32 (float32_t __a)
   return __builtin_aarch64_bfcvtbf (__a);
 }
 
+__extension__ extern __inline float32_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtah_f32_bf16 (bfloat16_t __a)
+{
+  return __builtin_aarch64_bfcvtsf (__a);
+}
+
 #pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 95bfa5ebba21b739ee3c84e3971337646f8881d4..69cccd3278642814f3961c5bf52be5639f5ef3f3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -35680,6 +35680,27 @@
 vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvt_f32_bf16 (bfloat16x4_t __a)
+{
+  return __builtin_aarch64_vbfcvtv4bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_low_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvtv8bf (__a);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvtq_high_f32_bf16 (bfloat16x8_t __a)
+{
+  return __builtin_aarch64_vbfcvt_highv8bf (__a);
+}
+
 __extension__ extern __inline bfloat16x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vcvt_bf16_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
index bbea630b1820d578bdf1619834f29b919f5c3f32..47af7c494d9b9d1f4b63e802efc293348a40e270 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c
@@ -46,3 +46,43 @@
 bfloat16_t test_bfcvt (float32_t a)
 {
   return vcvth_bf16_f32 (a);
 }
+
+/*
+**test_vcvt_f32_bf16:
+**	shll	v0.4s, v0.4h, #16
+**	ret
+*/
+float32x4_t test_vcvt_f32_bf16 (bfloat16x4_t a)
+{
+  return vcvt_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_low_f32_bf16:
+**	shll	v0.4s, v0.4h, #16
+**	ret
+*/
+float32x4_t test_vcvtq_low_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_low_f32_bf16 (a);
+}
+
+/*
+**test_vcvtq_high_f32_bf16:
+**	shll2	v0.4s, v0.8h, #16
+**	ret
+*/
+float32x4_t test_vcvtq_high_f32_bf16 (bfloat16x8_t a)
+{
+  return vcvtq_high_f32_bf16 (a);
+}
+
+/*
+**test_vcvtah_f32_bf16:
+**	shl	d0, d0,
Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
On 11/3/20 2:05 PM, Richard Sandiford wrote:

Dennis Zhang writes:

Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_halfv8bf. */
+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)

This should be AUTO_FP, since it doesn't have any side-effects.  (As before, we should probably rename the flag, but that's separate work.)

+
   /* Implemented by aarch64_simd_mmlav16qi. */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);

I think this needs to be aarch64_simd_vect_par_cnst_half instead.  The issue is that on big-endian targets, GCC assumes vector lane 0 is in the high part of the register, whereas for AArch64 it's always in the low part of the register.  So we convert from AArch64 numbering to GCC numbering when generating the rtx and then take endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate the right instruction for each function (i.e. we get them the right way round).  I guess it would be good to test that for little-endian too.

I've updated the expander using aarch64_simd_vect_par_cnst_half, and the expander is divided into two for getting the low and high halves separately.  It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu targets with new tests including the -mbig-endian option.

+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
   (ior (match_test "op == constm1_rtx")
        (match_test "op == const1_rtx"))
 
+(define_predicate "aarch64_zero_or_1"
+  (and (match_code "const_int")
+       (match_test "op == const0_rtx || op == const1_rtx")))

zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.  But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1, so let's keep it as-is.

This predicate is removed since there is no need of the immediate operand in the new expanders.

Thanks for the reviews.  Is it OK for trunk now?

Looks good.  OK for trunk and branches, thanks.

Richard

Thanks for approval, Richard!
This patch is committed at 3553c658533e430b232997bdfd97faf6606fb102.

Bests
Dennis
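Richard's point about lane numbering can be sketched in a few lines. This is a rough model under stated assumptions — the real `aarch64_simd_vect_par_cnst_half` builds an rtx PARALLEL, and its exact behaviour is assumed here, not quoted — but it shows why the selection indices for a register half must be flipped on big-endian targets.

```python
# Assumption: GCC numbers vector elements in memory order, so on
# big-endian targets element 0 sits in the most significant part of the
# register, while AArch64 lane 0 is always in the least significant part.

def gcc_lane(aarch64_lane, nelts, big_endian):
    # Map an AArch64 lane number to GCC's element numbering.
    return nelts - 1 - aarch64_lane if big_endian else aarch64_lane

def vect_par_cnst_half(nelts, high, big_endian):
    # AArch64 lanes that make up the requested half of the register,
    # translated to the GCC numbering used in the PARALLEL.
    lanes = range(nelts // 2, nelts) if high else range(nelts // 2)
    return sorted(gcc_lane(l, nelts, big_endian) for l in lanes)

# The low register half is GCC lanes 0..3 on little-endian...
print(vect_par_cnst_half(8, high=False, big_endian=False))  # [0, 1, 2, 3]
# ...but GCC lanes 4..7 on big-endian: same bits, flipped numbering.
print(vect_par_cnst_half(8, high=False, big_endian=True))   # [4, 5, 6, 7]
```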
Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32
On 11/2/20 7:05 PM, Richard Sandiford wrote:

Dennis Zhang writes:

Hi Richard,

On 10/29/20 5:48 PM, Richard Sandiford wrote:

Dennis Zhang writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 5bc596dbffc..b68c3ca7f4b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -732,3 +732,8 @@
   VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf)
   VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf)
   VAR1 (UNOP, bfcvt, 0, ALL, bf)
+
+  /* Implemented by aarch64_{v}bfcvt{_high}. */
+  VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf)
+  VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf)
+  VAR1 (UNOP, bfcvt, 0, ALL, sf)

New intrinsics should use something more specific than “ALL”.  Since these functions are pure non-trapping integer operations, I think they should use “AUTO_FP” instead.  (On reflection, we should probably change the name.)

+(define_insn "aarch64_bfcvtsf"
+  [(set (match_operand:SF 0 "register_operand" "=w")
+	(unspec:SF [(match_operand:BF 1 "register_operand" "w")]
+		    UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "shl\\t%d0, %d1, #16"
+  [(set_attr "type" "neon_shift_reg")]

I think this should be neon_shift_imm instead.

OK with those changes, thanks.

Richard

I've fixed the flag and the insn attribute.  I will commit it if there are no further issues.

LGTM, thanks.

Richard

Thanks Richard!
This patch is committed as f7d6961126a7f06c8089d8a58bd21be43bc16806.

Bests
Dennis
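The reason a plain shift implements these conversions: bfloat16 is exactly the top 16 bits of an IEEE-754 float32 (same sign bit, same 8 exponent bits, and the 7 most significant mantissa bits), so appending 16 zero bits yields an exact float32 — which is why a single `shl`/`shll`/`shll2` by #16 suffices. A small sketch in standard Python (independent of the GCC code above):

```python
import struct

def bf16_to_f32(bits16):
    """Widen a bfloat16 bit pattern to a float32 value.

    Shifting the 16-bit pattern left by 16 reconstructs the full
    float32 bit pattern, mirroring what the shl/shll instructions do.
    """
    return struct.unpack('<f', struct.pack('<I', bits16 << 16))[0]

print(bf16_to_f32(0x3F80))  # 1.0  (sign 0, exponent 127, mantissa 0)
print(bf16_to_f32(0xC0A0))  # -5.0 (sign 1, exponent 129, mantissa 1.25)
```

The conversion is exact in this direction; only the narrowing direction (float32 to bfloat16, the existing `bfcvt` patterns) has to round.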
Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
Hi Richard,

On 10/30/20 2:07 PM, Richard Sandiford wrote:

Dennis Zhang writes:

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 332a0b6b1ea..39ebb776d1d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -719,6 +719,9 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_halfv8bf. */
+  VAR1 (GETREG, vget_half, 0, ALL, v8bf)

This should be AUTO_FP, since it doesn't have any side-effects.  (As before, we should probably rename the flag, but that's separate work.)

+
   /* Implemented by aarch64_simd_mmlav16qi. */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9f0e2bd1e6f..f62c52ca327 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,19 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")
+   (match_operand:SI 2 "aarch64_zero_or_1")]
+  "TARGET_BF16_SIMD"
+{
+  int hbase = INTVAL (operands[2]);
+  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);

I think this needs to be aarch64_simd_vect_par_cnst_half instead.  The issue is that on big-endian targets, GCC assumes vector lane 0 is in the high part of the register, whereas for AArch64 it's always in the low part of the register.  So we convert from AArch64 numbering to GCC numbering when generating the rtx and then take endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate the right instruction for each function (i.e. we get them the right way round).  I guess it would be good to test that for little-endian too.

I've updated the expander using aarch64_simd_vect_par_cnst_half, and the expander is divided into two for getting the low and high halves separately.  It's tested for aarch64-none-linux-gnu and aarch64_be-none-linux-gnu targets with new tests including the -mbig-endian option.

+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955..0c8bc2b0c73 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -84,6 +84,10 @@
   (ior (match_test "op == constm1_rtx")
        (match_test "op == const1_rtx"))
 
+(define_predicate "aarch64_zero_or_1"
+  (and (match_code "const_int")
+       (match_test "op == const0_rtx || op == const1_rtx")))

zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.  But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1, so let's keep it as-is.

This predicate is removed since there is no need of the immediate operand in the new expanders.

Thanks for the reviews.  Is it OK for trunk now?

Cheers
Dennis

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index eb8e6f7b3d8..f26a96042bc 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -722,6 +722,10 @@
   VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
   VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
 
+  /* Implemented by aarch64_vget_lo/hi_halfv8bf. */
+  VAR1 (UNOP, vget_lo_half, 0, AUTO_FP, v8bf)
+  VAR1 (UNOP, vget_hi_half, 0, AUTO_FP, v8bf)
+
   /* Implemented by aarch64_simd_mmlav16qi. */
   VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
   VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba0..af29a2f26f5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7159,6 +7159,27 @@
   [(set_attr "type" "neon_dot")]
 )
 
+;; vget_low/high_bf16
+(define_expand "aarch64_vget_lo_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, false);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
+(define_expand "aarch64_vget_hi_halfv8bf"
+  [(match_operand:V4BF 0 "register_operand")
+   (match_operand:V8BF 1 "register_operand")]
+  "TARGET_BF16_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V8BFmode, 8, true);
+  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], p));
+  DONE;
+})
+
 ;; bfmmla
 (define_insn "aarch64_bfmmlaqv4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
diff
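Why the two halves cost differently at run time can be sketched by modelling the 128-bit Q register as an integer, lane 0 in the least significant bits (an illustrative model, not GCC code): the low half already occupies the D register the ABI returns in, so `vget_low_bf16` needs no instruction, while the high half must be moved down, hence the `dup d0, v0.d[1]`.

```python
MASK64 = (1 << 64) - 1

def lanes_to_reg(lanes):
    # Pack eight 16-bit bf16 lane patterns into one 128-bit value,
    # lane 0 in the least significant bits.
    r = 0
    for i, lane in enumerate(lanes):
        r |= (lane & 0xFFFF) << (16 * i)
    return r

def vget_low_bf16_reg(q):
    # No instruction needed: the low 64 bits are already "d0".
    return q & MASK64

def vget_high_bf16_reg(q):
    # Models "dup d0, v0.d[1]": copy the upper 64 bits down.
    return (q >> 64) & MASK64

q = lanes_to_reg([1, 2, 3, 4, 5, 6, 7, 8])
assert vget_low_bf16_reg(q) == lanes_to_reg([1, 2, 3, 4])
assert vget_high_bf16_reg(q) == lanes_to_reg([5, 6, 7, 8])
```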
Re: [PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32
Hi Richard, On 10/29/20 5:48 PM, Richard Sandiford wrote: Dennis Zhang writes: diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 5bc596dbffc..b68c3ca7f4b 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -732,3 +732,8 @@ VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf) VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf) VAR1 (UNOP, bfcvt, 0, ALL, bf) + + /* Implemented by aarch64_{v}bfcvt{_high}. */ + VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf) + VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf) + VAR1 (UNOP, bfcvt, 0, ALL, sf) New intrinsics should use something more specific than “ALL”. Since these functions are pure non-trapping integer operations, I think they should use “AUTO_FP” instead. (On reflection, we should probably change the name.) +(define_insn "aarch64_bfcvtsf" + [(set (match_operand:SF 0 "register_operand" "=w") + (unspec:SF [(match_operand:BF 1 "register_operand" "w")] + UNSPEC_BFCVT))] + "TARGET_BF16_FP" + "shl\\t%d0, %d1, #16" + [(set_attr "type" "neon_shift_reg")] I think this should be neon_shift_imm instead. OK with those changes, thanks. Richard I've fixed the Flag and the insn attribute. I will commit it if no further issues. Thanks for the review. Regards Dennis diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index eb8e6f7b3d8..f494b535a30 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -732,3 +732,8 @@ VAR1 (UNOP, bfcvtn_q, 0, FP, v8bf) VAR1 (BINOP, bfcvtn2, 0, FP, v8bf) VAR1 (UNOP, bfcvt, 0, FP, bf) + + /* Implemented by aarch64_{v}bfcvt{_high}. 
*/ + VAR2 (UNOP, vbfcvt, 0, AUTO_FP, v4bf, v8bf) + VAR1 (UNOP, vbfcvt_high, 0, AUTO_FP, v8bf) + VAR1 (UNOP, bfcvt, 0, AUTO_FP, sf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 381a702eba0..030a086d31c 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7238,3 +7238,31 @@ "bfcvt\\t%h0, %s1" [(set_attr "type" "f_cvt")] ) + +;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes. +(define_insn "aarch64_vbfcvt" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")] + UNSPEC_BFCVTN))] + "TARGET_BF16_SIMD" + "shll\\t%0.4s, %1.4h, #16" + [(set_attr "type" "neon_shift_imm_long")] +) + +(define_insn "aarch64_vbfcvt_highv8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")] + UNSPEC_BFCVTN2))] + "TARGET_BF16_SIMD" + "shll2\\t%0.4s, %1.8h, #16" + [(set_attr "type" "neon_shift_imm_long")] +) + +(define_insn "aarch64_bfcvtsf" + [(set (match_operand:SF 0 "register_operand" "=w") + (unspec:SF [(match_operand:BF 1 "register_operand" "w")] + UNSPEC_BFCVT))] + "TARGET_BF16_FP" + "shl\\t%d0, %d1, #16" + [(set_attr "type" "neon_shift_imm")] +) diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h index 984875dcc01..881615498d3 100644 --- a/gcc/config/aarch64/arm_bf16.h +++ b/gcc/config/aarch64/arm_bf16.h @@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a) return __builtin_aarch64_bfcvtbf (__a); } +__extension__ extern __inline float32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtah_f32_bf16 (bfloat16_t __a) +{ + return __builtin_aarch64_bfcvtsf (__a); +} + #pragma GCC pop_options #endif diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 95bfa5ebba2..69cccd32786 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -35680,6 +35680,27 @@ 
vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b, return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index); } +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvt_f32_bf16 (bfloat16x4_t __a) +{ + return __builtin_aarch64_vbfcvtv4bf (__a); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtq_low_f32_bf16 (bfloat16x8_t __a) +{ + return __builtin_aarch64_vbfcvtv8bf (__a); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtq_high_f32_bf16 (bfloat16x8_t __a) +{ + return __builtin_aarch64_vbfcvt_highv8bf (__a); +} + __extension__ extern __inline bfloat16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vcvt_bf16_f32 (float32x4_t __a) diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c index bbea630b182..47af7c494d9 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c +++
[PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector
Hi all, This patch implements ACLE intrinsics vget_low_bf16 and vget_high_bf16 to extract lower or higher half from a bfloat16x8 vector. The vget_high_bf16 is done by 'dup' instruction. The vget_low_bf16 could be done by a 'dup' or 'mov', or it's mostly optimized out by just using the lower half of a vector register. The test for vget_low_bf16 only checks that the interface can be compiled but no instruction is checked since none is generated in the test case. Arm ACLE document at https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics Regtested and bootstrapped. Is it OK for trunk please? Thanks Denni gcc/ChangeLog: 2020-10-29 Dennis Zhang * config/aarch64/aarch64-simd-builtins.def (vget_half): New entry. * config/aarch64/aarch64-simd.md (aarch64_vget_halfv8bf): New entry. * config/aarch64/arm_neon.h (vget_low_bf16): New intrinsic. (vget_high_bf16): Likewise. * config/aarch64/predicates.md (aarch64_zero_or_1): New predicate for zero or one immediate to indicate the lower or higher half. gcc/testsuite/ChangeLog 2020-10-29 Dennis Zhang * gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c (test_vget_low_bf16, test_vget_high_bf16): New tests.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 332a0b6b1ea..39ebb776d1d 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -719,6 +719,9 @@ VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf) VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf) + /* Implemented by aarch64_vget_halfv8bf. */ + VAR1 (GETREG, vget_half, 0, ALL, v8bf) + /* Implemented by aarch64_simd_mmlav16qi. 
*/ VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi) VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 9f0e2bd1e6f..f62c52ca327 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7159,6 +7159,19 @@ [(set_attr "type" "neon_dot")] ) +;; vget_low/high_bf16 +(define_expand "aarch64_vget_halfv8bf" + [(match_operand:V4BF 0 "register_operand") + (match_operand:V8BF 1 "register_operand") + (match_operand:SI 2 "aarch64_zero_or_1")] + "TARGET_BF16_SIMD" +{ + int hbase = INTVAL (operands[2]); + rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1); + emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel)); + DONE; +}) + ;; bfmmla (define_insn "aarch64_bfmmlaqv4sf" [(set (match_operand:V4SF 0 "register_operand" "=w") diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 50f8b23bc17..c6ac0b8dd17 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -35530,6 +35530,20 @@ vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b, return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index); } +__extension__ extern __inline bfloat16x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vget_low_bf16 (bfloat16x8_t __a) +{ + return __builtin_aarch64_vget_halfv8bf (__a, 0); +} + +__extension__ extern __inline bfloat16x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vget_high_bf16 (bfloat16x8_t __a) +{ + return __builtin_aarch64_vget_halfv8bf (__a, 1); +} + __extension__ extern __inline bfloat16x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vcvt_bf16_f32 (float32x4_t __a) diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 215fcec5955..0c8bc2b0c73 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -84,6 +84,10 @@ (ior (match_test 
"op == constm1_rtx") (match_test "op == const1_rtx")) +(define_predicate "aarch64_zero_or_1" + (and (match_code "const_int") + (match_test "op == const0_rtx || op == const1_rtx"))) + (define_predicate "aarch64_reg_or_orr_imm" (ior (match_operand 0 "register_operand") (and (match_code "const_vector") diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c index c42c7acbbe9..35f4cb864f2 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_dup.c @@ -83,3 +83,14 @@ bfloat16_t test_vduph_laneq_bf16 (bfloat16x8_t a) return vduph_laneq_bf16 (a, 7); } /* { dg-final { scan-assembler-times "dup\\th\[0-9\]+, v\[0-9\]+\.h\\\[7\\\]" 2 } } */ + +bfloat16x4_t test_vget_low_bf16 (bfloat16x8_t a) +{ + return vget_low_bf16 (a); +} + +bfloat16x4_t test_vget_high_bf16 (bfloat16x8_t a) +{ + return vget_high_bf16 (a); +} +/* { dg-final { scan-assembler-times "dup\\td\[0-9\]+, v\[0-9\]+\.d\\\[1\\\]" 1 } } */
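For readers unfamiliar with the lane semantics the patch implements, here is a hypothetical scalar model of what vget_low_bf16/vget_high_bf16 select: lanes 0..3 and lanes 4..7 of an eight-lane vector, respectively. The bf16x8_model/bf16x4_model types and model_* helpers below are illustrative stand-ins, not the ACLE types or the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-ins for bfloat16x8_t / bfloat16x4_t: eight or four raw
   16-bit lane values (lane 0 first, little-endian lane numbering).  */
typedef struct { uint16_t v[8]; } bf16x8_model;
typedef struct { uint16_t v[4]; } bf16x4_model;

/* vget_low_bf16: keep the lower half, lanes 0..3.  */
static bf16x4_model model_vget_low_bf16 (bf16x8_model a)
{
  bf16x4_model r;
  memcpy (r.v, &a.v[0], 4 * sizeof (uint16_t));
  return r;
}

/* vget_high_bf16: keep the upper half, lanes 4..7 (what the generated
   "dup d0, v0.d[1]" materializes).  */
static bf16x4_model model_vget_high_bf16 (bf16x8_model a)
{
  bf16x4_model r;
  memcpy (r.v, &a.v[4], 4 * sizeof (uint16_t));
  return r;
}
```

This also shows why vget_low_bf16 often needs no instruction: the low half already occupies the bottom 64 bits of the vector register.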
[PATCH][AArch64] ACLE intrinsics: convert from BFloat16 to Float32
Hi all, This patch enables intrinsics to convert BFloat16 scalar and vector operands to Float32 modes. The intrinsics are implemented by shifting each BFloat16 item 16 bits to left using shl/shll/shll2 instructions. Intrinsics are documented at https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics ISA is documented at https://developer.arm.com/docs/ddi0596/latest Regtested and bootstrapped. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-10-29 Dennis Zhang * config/aarch64/aarch64-simd-builtins.def(vbfcvt): New entry. (vbfcvt_high, bfcvt): Likewise. * config/aarch64/aarch64-simd.md(aarch64_vbfcvt): New entry. (aarch64_vbfcvt_highv8bf, aarch64_bfcvtsf): Likewise. * config/aarch64/arm_bf16.h (vcvtah_f32_bf16): New intrinsic. * config/aarch64/arm_neon.h (vcvt_f32_bf16): Likewise. (vcvtq_low_f32_bf16, vcvtq_high_f32_bf16): Likewise. gcc/testsuite/ChangeLog 2020-10-29 Dennis Zhang * gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c (test_vcvt_f32_bf16, test_vcvtq_low_f32_bf16): New tests. (test_vcvtq_high_f32_bf16, test_vcvth_f32_bf16): Likewise.diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 5bc596dbffc..b68c3ca7f4b 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -732,3 +732,8 @@ VAR1 (UNOP, bfcvtn_q, 0, ALL, v8bf) VAR1 (BINOP, bfcvtn2, 0, ALL, v8bf) VAR1 (UNOP, bfcvt, 0, ALL, bf) + + /* Implemented by aarch64_{v}bfcvt{_high}. */ + VAR2 (UNOP, vbfcvt, 0, ALL, v4bf, v8bf) + VAR1 (UNOP, vbfcvt_high, 0, ALL, v8bf) + VAR1 (UNOP, bfcvt, 0, ALL, sf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 381a702eba0..5ae79d67981 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7238,3 +7238,31 @@ "bfcvt\\t%h0, %s1" [(set_attr "type" "f_cvt")] ) + +;; Use shl/shll/shll2 to convert BF scalar/vector modes to SF modes. 
+(define_insn "aarch64_vbfcvt" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (unspec:V4SF [(match_operand:VBF 1 "register_operand" "w")] + UNSPEC_BFCVTN))] + "TARGET_BF16_SIMD" + "shll\\t%0.4s, %1.4h, #16" + [(set_attr "type" "neon_shift_imm_long")] +) + +(define_insn "aarch64_vbfcvt_highv8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (unspec:V4SF [(match_operand:V8BF 1 "register_operand" "w")] + UNSPEC_BFCVTN2))] + "TARGET_BF16_SIMD" + "shll2\\t%0.4s, %1.8h, #16" + [(set_attr "type" "neon_shift_imm_long")] +) + +(define_insn "aarch64_bfcvtsf" + [(set (match_operand:SF 0 "register_operand" "=w") + (unspec:SF [(match_operand:BF 1 "register_operand" "w")] + UNSPEC_BFCVT))] + "TARGET_BF16_FP" + "shl\\t%d0, %d1, #16" + [(set_attr "type" "neon_shift_reg")] +) diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h index 984875dcc01..881615498d3 100644 --- a/gcc/config/aarch64/arm_bf16.h +++ b/gcc/config/aarch64/arm_bf16.h @@ -40,6 +40,13 @@ vcvth_bf16_f32 (float32_t __a) return __builtin_aarch64_bfcvtbf (__a); } +__extension__ extern __inline float32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtah_f32_bf16 (bfloat16_t __a) +{ + return __builtin_aarch64_bfcvtsf (__a); +} + #pragma GCC pop_options #endif diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 85c0d62ca12..9c0386ed7b1 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -35716,6 +35716,27 @@ vcvtq_high_bf16_f32 (bfloat16x8_t __inactive, float32x4_t __a) return __builtin_aarch64_bfcvtn2v8bf (__inactive, __a); } +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvt_f32_bf16 (bfloat16x4_t __a) +{ + return __builtin_aarch64_vbfcvtv4bf (__a); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtq_low_f32_bf16 (bfloat16x8_t __a) +{ + return 
__builtin_aarch64_vbfcvtv8bf (__a); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vcvtq_high_f32_bf16 (bfloat16x8_t __a) +{ + return __builtin_aarch64_vbfcvt_highv8bf (__a); +} + #pragma GCC pop_options /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics. */ diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c index bbea630b182..47af7c494d9 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c @@ -46,3 +46,43 @@ bfloat16_t test_bfcvt (float32_t a) { return vcvth_bf16_f32 (a); } + +/* +**test_vcvt_f32_bf16: +** shll v0.4s, v0.4h, #16 +** ret +*/
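The shl/shll #16 lowering works because bfloat16 is exactly the upper 16 bits of an IEEE-754 binary32 value, so widening to float32 is a pure bit shift. A minimal portable sketch of that bit manipulation (the model_vcvtah_f32_bf16 name is illustrative, not part of the patch or ACLE):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of what "shl %d0, %d1, #16" computes per lane: place the
   16 bfloat16 bits into the top half of a 32-bit word and reinterpret
   the result as a float32.  */
static float model_vcvtah_f32_bf16 (uint16_t bf16_bits)
{
  uint32_t f32_bits = (uint32_t) bf16_bits << 16;  /* the 16-bit shift */
  float f;
  memcpy (&f, &f32_bits, sizeof f);  /* bit-level reinterpret, no UB */
  return f;
}
```

For example, 0x3F80 (bfloat16 1.0) shifts to 0x3F800000, which is binary32 1.0, so the conversion is exact for every bfloat16 value.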
Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
Hi Kyrylo, > > From: Kyrylo Tkachov > Sent: Thursday, October 22, 2020 9:40 AM > To: Dennis Zhang; gcc-patches@gcc.gnu.org > Cc: nd; Richard Earnshaw; Ramana Radhakrishnan > Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vsub > > Hi Dennis, > > > -Original Message- > > From: Dennis Zhang > > Sent: 06 October 2020 17:47 > > To: gcc-patches@gcc.gnu.org > > Cc: Kyrylo Tkachov ; nd ; > > Richard Earnshaw ; Ramana Radhakrishnan > > > > Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub > > > > Hi all, > > > > On 8/17/20 6:41 PM, Dennis Zhang wrote: > > > > > > Hi all, > > > > > > This patch enables MVE vsub instructions for auto-vectorization. > > > It adds RTL templates for MVE vsub instructions using 'minus' instead of > > > unspec expression to make the instructions recognizable for vectorization. > > > MVE target is added in sub3 optab. The sub3 optab is > > > modified to use a mode iterator that selects available modes for various > > > targets correspondingly. > > > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to > > > support vectorization. > > > > > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests > > > generate wrong instruction numbers because of unexpected icf > > optimization. > > > This bug is exposed by the MVE vector modes enabled in this patch, > > > therefore it is corrected in this patch to avoid test failures. > > > > > > MVE instructions are documented here: > > > https://developer.arm.com/architectures/instruction-sets/simd- > > isas/helium/helium-intrinsics > > > > > > The patch is regtested for arm-none-eabi and bootstrapped for > > > arm-none-linux-gnueabihf. > > > > > > Is it OK for trunk please? > > > > > > Thanks > > > Dennis > > > > > > gcc/ChangeLog: > > > > > > 2020-08-10 Dennis Zhang > > > > > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector > > modes. > > > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro. 
> > > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): > > Likewise. > > > (TARGET_NEON_MVE_HFP): Likewise. > > > * config/arm/iterators.md (VSEL): New mode iterator to select modes > > > for corresponding targets. > > > * config/arm/mve.md (mve_vsubq): New entry for vsub instruction > > > using expression 'minus'. > > > (mve_vsubq_f): Use minus instead of VSUBQ_F unspec. > > > * config/arm/neon.md (sub3): Removed here. Integrated in the > > > sub3 in vec-common.md > > > * config/arm/vec-common.md (sub3): Enable MVE target. Use > > VSEL > > > to select available modes. Exclude TARGET_NEON_FP16INST from > > > TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is > > > originally in neon.md. > > > > > > gcc/testsuite/ChangeLog: > > > > > > 2020-08-10 Dennis Zhang > > > > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional > > > option -fno-ipa-icf and change the instruction count from 8 to 16. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise. > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise. > > > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'. > > > * gcc.target/arm/mve/vect/vect_sub_0.c: New test. > > > * gcc.target/arm/mve/vect/vect_sub_1.c: New test. 
> > > > > > > This patch is updated based on Richard Sandiford's patch adding new > > vector mode macros: > > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html > > The old version of this patch is at > > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html > > And a less related part in the old version is separated into another > > patch: https://gcc.gnu.org/pipermail/gcc-patches/2020- > > September/554100.html > > > > This patch enables MVE vsub instructions for auto-vectorization. > > It adds insns for MVE vsub instructions using 'minus' instead of unspec > > expression to make the instructions recognizable for auto-vectorization. > > The sub3 in mve.md is modified to use new mode macros which > > make > > the expander available when certain modes are supported. Then various > > targets can share this expander for vectorization. The redundant > > sub3 insns in neon.md are then removed. > > > > Regression tested on arm-none-eabi and bootstraped on > > arm-none-linux-gnueabihf. > > > > Is it OK for trunk please? > > Ok. > Thanks, > Kyrill > Thanks for your approval. The patch has been committed as 98161c248c88f873bbffba23664c540f551d89d5 Bests Dennis > > > > gcc/ChangeLog: > > > > 2020-10-02 Dennis
Ping: [PATCH][Arm] Auto-vectorization for MVE: vsub
Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555646.html Thanks From: Dennis Zhang Sent: Tuesday, October 6, 2020 5:46 PM To: gcc-patches@gcc.gnu.org Cc: Kyrylo Tkachov; nd; Richard Earnshaw; Ramana Radhakrishnan Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub Hi all, On 8/17/20 6:41 PM, Dennis Zhang wrote: > > Hi all, > > This patch enables MVE vsub instructions for auto-vectorization. > It adds RTL templates for MVE vsub instructions using 'minus' instead of > unspec expression to make the instructions recognizable for vectorization. > MVE target is added in sub3 optab. The sub3 optab is > modified to use a mode iterator that selects available modes for various > targets correspondingly. > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to > support vectorization. > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests > generate wrong instruction numbers because of unexpected icf optimization. > This bug is exposed by the MVE vector modes enabled in this patch, > therefore it is corrected in this patch to avoid test failures. > > MVE instructions are documented here: > https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics > > The patch is regtested for arm-none-eabi and bootstrapped for > arm-none-linux-gnueabihf. > > Is it OK for trunk please? > > Thanks > Dennis > > gcc/ChangeLog: > > 2020-08-10 Dennis Zhang > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes. > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro. > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise. > (TARGET_NEON_MVE_HFP): Likewise. > * config/arm/iterators.md (VSEL): New mode iterator to select modes > for corresponding targets. > * config/arm/mve.md (mve_vsubq): New entry for vsub instruction > using expression 'minus'. > (mve_vsubq_f): Use minus instead of VSUBQ_F unspec. > * config/arm/neon.md (sub3): Removed here. 
Integrated in the > sub3 in vec-common.md > * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL > to select available modes. Exclude TARGET_NEON_FP16INST from > TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is > originally in neon.md. > > gcc/testsuite/ChangeLog: > > 2020-08-10 Dennis Zhang > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional > option -fno-ipa-icf and change the instruction count from 8 to 16. > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise. > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise. > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'. > * gcc.target/arm/mve/vect/vect_sub_0.c: New test. > * gcc.target/arm/mve/vect/vect_sub_1.c: New test. > This patch is updated based on Richard Sandiford's patch adding new vector mode macros: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html The old version of this patch is at https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html And a less related part in the old version is separated into another patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html This patch enables MVE vsub instructions for auto-vectorization. It adds insns for MVE vsub instructions using 'minus' instead of unspec expression to make the instructions recognizable for auto-vectorization. The sub3 in mve.md is modified to use new mode macros which make the expander available when certain modes are supported. 
Then various targets can share this expander for vectorization. The redundant sub3 insns in neon.md are then removed. Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-10-02 Dennis Zhang * config/arm/mve.md (mve_vsubq): New entry for vsub instruction using expression 'minus'. (mve_vsubq_f): Use minus instead of VSUBQ_F unspec. * config/arm/neon.md (*sub3_neon): Use the new mode macros ARM_HAVE__ARITH. (sub3, sub3_fp16): Removed. (neon_vsub): Use gen_sub3 instead of gen_sub3_fp16. * config/arm/vec-common.md (sub3): Use the new mode macros ARM_HAVE__ARITH. gcc/testsuite/ChangeLog: 2020-10-02 Dennis Zhang * gcc.target/arm/simd/mve-vsub_1.c: New test.
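As a rough sketch, this is the shape of scalar loop the patch allows the vectorizer to lower to MVE vsub (on an MVE target it would be built with the appropriate -march/-O3 options, as in the patch's vect_sub tests; written in portable C here so the subtraction semantics themselves are checkable anywhere):

```c
#include <stdint.h>

/* Four-lane 32-bit subtraction: with MVE auto-vectorization enabled,
   this whole loop can become a single vsub.i32 on q registers.  */
void test_vsub_i32 (int32_t *dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```

The same loop body with int16_t (8 lanes) or int8_t (16 lanes) maps onto vsub.i16 and vsub.i8 in the same way.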
Re: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
Hi Kyrylo, > > From: Kyrylo Tkachov > Sent: Wednesday, October 14, 2020 10:15 AM > To: Dennis Zhang; gcc-patches@gcc.gnu.org > Cc: nd; Richard Earnshaw; Ramana Radhakrishnan > Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax > > Hi Dennis, > > > -Original Message- > > From: Dennis Zhang > > Sent: 06 October 2020 17:59 > > To: gcc-patches@gcc.gnu.org > > Cc: Kyrylo Tkachov ; nd ; > > Richard Earnshaw ; Ramana Radhakrishnan > > > > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax > > > > Hi all, > > > > This patch enables MVE vmin/vmax instructions for auto-vectorization. > > MVE target is included in expander smin3, umin3, > > smax3 > > and umax3 for vectorization. > > Related insns for vmin/vmax in mve.md are modified to use smin, umin, > > smax and umax expressions instead of unspec to support the expanders. > > > > Regression tested on arm-none-eabi and bootstraped on > > arm-none-linux-gnueabihf. > > > > Is it OK for trunk please? > > Ok. > Thanks, > Kyrill > Thanks for your approval. This patch has been committed to trunk at 76835dca95ab9f3f106a0db1e6152ad0740b38b3 Cheers Dennis
Re: [PATCH][Arm] Auto-vectorization for MVE: vmul
Hi Kyrylo, > > From: Kyrylo Tkachov > Sent: Wednesday, October 14, 2020 10:14 AM > To: Dennis Zhang; gcc-patches@gcc.gnu.org > Cc: nd; Richard Earnshaw; Ramana Radhakrishnan > Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmul > > Hi Dennis, > > > -----Original Message----- > > From: Dennis Zhang > > Sent: 06 October 2020 17:55 > > To: gcc-patches@gcc.gnu.org > > Cc: Kyrylo Tkachov ; nd ; > > Richard Earnshaw ; Ramana Radhakrishnan > > > > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmul > > > > Hi all, > > > > This patch enables MVE vmul instructions for auto-vectorization. > > It includes MVE in expander mul3 to enable vectorization for MVE > > and modifies related vmul insns to support the expander by using 'mult' > > instead of unspec. > > The mul3 for vectorization in vec-common.md uses mode iterator > > VDQWH instead of VALLW to cover all supported modes. > > The macros ARM_HAVE__ARITH are used to select supported > > modes for > > different targets. The redundant mul3 in neon.md is removed. > > > > Regression tested on arm-none-eabi and bootstrapped on > > arm-none-linux-gnueabihf. > > > > Is it OK for trunk please? > > Ok, thank you for your patience. > Kyrill > Thanks for your approval. It's committed to trunk at 0f41b5e02fa47db2080b77e4e1f7cd3305457c05 Cheers Dennis
Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
Hi Christophe, On 12/10/2020 12:40, Christophe Lyon wrote: Hi, On Thu, 8 Oct 2020 at 16:22, Christophe Lyon wrote: On Thu, 8 Oct 2020 at 16:08, Dennis Zhang wrote: Hi Christophe, On 08/10/2020 14:14, Christophe Lyon wrote: Hi, On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches wrote: On 9/16/20 4:00 PM, Dennis Zhang wrote: Hi all, This patch enables SIMD modes for MVE auto-vectorization. In this patch, the integer and float MVE SIMD modes are returned by arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when MVE or MVE_FLOAT is enabled. Then the expanders for auto-vectorization can be used for generating MVE SIMD code. This patch also fixes bugs in MVE vreiterpretq_*.c tests which are revealed by the enabled MVE SIMD modes. The tests are for checking the MVE reinterpret intrinsics. There are two functions in each of the tests. The two functions contain the pattern of identical code so that they are folded in icf pass. Because of icf, the instruction count only checks one function which is 8. However when the SIMD modes are enabled, the estimation of the code size becomes smaller so that inlining is applied after icf, then the instruction count becomes 16 which causes failure of the tests. Because the icf is not the expected pattern to be tested but causes above issues, -fno-ipa-icf is applied to the tests to avoid unstable instruction count. This patch is separated from https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html because this part is not strongly connected to the aim of that one so that causing confusion. Regtested and bootstraped. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-09-15 Dennis Zhang * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes. 
Since toolchain builds work again after Jakub's divmod fix, I'm now facing another build error likely caused by this patch: In file included from /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0, from /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28: /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c: In function 'machine_mode arm_preferred_simd_mode(scalar_mode)': ./insn-modes.h:196:71: error: temporary of non-literal type 'scalar_int_mode' in a constant expression #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode)) ^ /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12: note: in expansion of macro 'QImode' case QImode: and similarly for the other cases. Does the build work for you? Thanks, Christophe Thanks for the report. Sorry to see the error. I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I didn't get this error. Could you please help to show the configuration you use for your build? I will test and fix at once. It fails on all of them for me. Does it work for you with current master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b) So... I guess you are using a host with GCC more recent than 4.8.5? :-) When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds, and after manually building with the same environment in the compute farm I use for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the build failure. It's a matter of replacing case QImode: with case E_QImode: Is the attached patch OK? Or do we instead want to revisit the minimum gcc version required to build gcc? Thanks, Christophe I've tested your patch and it works with my other patches depending on this one. So I agree this patch is OK. Thanks for the fix. Bests Dennis
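The build fix above boils down to switching on the raw E_* enumerators instead of the QImode-style wrapper macros, which expand to scalar_int_mode class temporaries that old host compilers such as GCC 4.8 reject as case labels. A minimal illustration of the surviving pattern (the enum and function here are simplified stand-ins for the real insn-modes.h machinery, not GCC's actual definitions):

```cpp
#include <cassert>

// Simplified stand-in for the E_* mode enumerators in insn-modes.h.
// In GCC sources, QImode is a macro producing a scalar_int_mode
// temporary; a plain enumerator is always a valid constant expression,
// so "case E_QImode:" builds with any supported host compiler.
enum machine_mode_enum { E_QImode, E_HImode, E_SImode };

static int lane_width_bits (machine_mode_enum m)
{
  switch (m)
    {
    case E_QImode: return 8;   // enum value, not a class temporary
    case E_HImode: return 16;
    case E_SImode: return 32;
    }
  return 0;
}
```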
Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
Hi Christophe, On 08/10/2020 14:14, Christophe Lyon wrote: Hi, On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches wrote: On 9/16/20 4:00 PM, Dennis Zhang wrote: Hi all, This patch enables SIMD modes for MVE auto-vectorization. In this patch, the integer and float MVE SIMD modes are returned by arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when MVE or MVE_FLOAT is enabled. Then the expanders for auto-vectorization can be used for generating MVE SIMD code. This patch also fixes bugs in MVE vreiterpretq_*.c tests which are revealed by the enabled MVE SIMD modes. The tests are for checking the MVE reinterpret intrinsics. There are two functions in each of the tests. The two functions contain the pattern of identical code so that they are folded in icf pass. Because of icf, the instruction count only checks one function which is 8. However when the SIMD modes are enabled, the estimation of the code size becomes smaller so that inlining is applied after icf, then the instruction count becomes 16 which causes failure of the tests. Because the icf is not the expected pattern to be tested but causes above issues, -fno-ipa-icf is applied to the tests to avoid unstable instruction count. This patch is separated from https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html because this part is not strongly connected to the aim of that one so that causing confusion. Regtested and bootstraped. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-09-15 Dennis Zhang * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes. 
Since toolchain builds work again after Jakub's divmod fix, I'm now facing another build error likely caused by this patch: In file included from /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0, from /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28: /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c: In function 'machine_mode arm_preferred_simd_mode(scalar_mode)': ./insn-modes.h:196:71: error: temporary of non-literal type 'scalar_int_mode' in a constant expression #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode)) ^ /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12: note: in expansion of macro 'QImode' case QImode: and similarly for the other cases. Does the build work for you? Thanks, Christophe Thanks for the report. Sorry to see the error. I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I didn't get this error. Could you please help to show the configuration you use for your build? I will test and fix at once. Thanks Dennis
[PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
Hi all, This patch enables MVE vmin/vmax instructions for auto-vectorization. MVE target is included in expander smin3, umin3, smax3 and umax3 for vectorization. Related insns for vmin/vmax in mve.md are modified to use smin, umin, smax and umax expressions instead of unspec to support the expanders. Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-10-02 Dennis Zhang * config/arm/mve.md (mve_vmaxq_): Replace with ... (mve_vmaxq_s, mve_vmaxq_u): ... these new insns to use smax/umax instead of VMAXQ. (mve_vminq_): Replace with ... (mve_vminq_s, mve_vminq_u): ... these new insns to use smin/umin instead of VMINQ. (mve_vmaxnmq_f): Use smax instead of VMAXNMQ_F. (mve_vminnmq_f): Use smin instead of VMINNMQ_F. * config/arm/vec-common.md (smin3): Use the new mode macros ARM_HAVE__ARITH. (umin3, smax3, umax3): Likewise. gcc/testsuite/ChangeLog: 2020-10-02 Dennis Zhang * gcc.target/arm/simd/mve-vminmax_1.c: New test. 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 3a57901bd5b..0d9f932e983 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -1977,15 +1977,25 @@ ;; ;; [vmaxq_u, vmaxq_s]) ;; -(define_insn "mve_vmaxq_" +(define_insn "mve_vmaxq_s" [ (set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w") - (match_operand:MVE_2 2 "s_register_operand" "w")] - VMAXQ)) + (smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w") + (match_operand:MVE_2 2 "s_register_operand" "w"))) + ] + "TARGET_HAVE_MVE" + "vmax.%#\t%q0, %q1, %q2" + [(set_attr "type" "mve_move") +]) + +(define_insn "mve_vmaxq_u" + [ + (set (match_operand:MVE_2 0 "s_register_operand" "=w") + (umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w") + (match_operand:MVE_2 2 "s_register_operand" "w"))) ] "TARGET_HAVE_MVE" - "vmax.%#\t%q0, %q1, %q2" + "vmax.%#\t%q0, %q1, %q2" [(set_attr "type" "mve_move") ]) @@ -2037,15 +2047,25 @@ ;; ;; [vminq_s, vminq_u]) ;; -(define_insn "mve_vminq_" +(define_insn "mve_vminq_s" [ (set (match_operand:MVE_2 0 "s_register_operand" "=w") - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w") - (match_operand:MVE_2 2 "s_register_operand" "w")] - VMINQ)) + (smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w") + (match_operand:MVE_2 2 "s_register_operand" "w"))) ] "TARGET_HAVE_MVE" - "vmin.%#\t%q0, %q1, %q2" + "vmin.%#\t%q0, %q1, %q2" + [(set_attr "type" "mve_move") +]) + +(define_insn "mve_vminq_u" + [ + (set (match_operand:MVE_2 0 "s_register_operand" "=w") + (umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w") + (match_operand:MVE_2 2 "s_register_operand" "w"))) + ] + "TARGET_HAVE_MVE" + "vmin.%#\t%q0, %q1, %q2" [(set_attr "type" "mve_move") ]) @@ -3030,9 +3050,8 @@ (define_insn "mve_vmaxnmq_f" [ (set (match_operand:MVE_0 0 "s_register_operand" "=w") - (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") - (match_operand:MVE_0 2 "s_register_operand" "w")] 
- VMAXNMQ_F)) + (smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w") + (match_operand:MVE_0 2 "s_register_operand" "w"))) ] "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" "vmaxnm.f%# %q0, %q1, %q2" @@ -3090,9 +3109,8 @@ (define_insn "mve_vminnmq_f" [ (set (match_operand:MVE_0 0 "s_register_operand" "=w") - (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") - (match_operand:MVE_0 2 "s_register_operand" "w")] - VMINNMQ_F)) + (smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w") + (match_operand:MVE_0 2 "s_register_operand" "w"))) ] "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT" "vminnm.f%# %q0, %q1, %q2" diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md index c3c86c46355..6a330cc82f6 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -114,39 +114,29 @@ [(set (match_operand:VALLW 0 "s_register_operand") (smin:VALLW (match_operand:VALLW 1 "s_register_operand") (match_operand:VALLW 2 "s_register_operand")))] - "(TARGET_NEON && ((mode != V2SFmode && mode != V4SFmode) - || flag_unsafe_math_optimizations)) - || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" -{ -}) + "ARM_HAVE__ARITH" +) (define_expand "umin3" [(set (match_operand:VINTW 0 "s_register_operand") (umin:VINTW (match_operand:VINTW 1 "s_register_operand") (match_operand:VINTW 2 "s_register_operand")))] - "TARGET_NEON - || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))" -{ -}) + "ARM_HAVE__ARITH" +) (define_expand "smax3" [(set (match_operand:VALLW 0 "s_register_operand") (smax:VALLW (match_operand:VALLW 1 "s_register_operand") (match_operand:VALLW 2 "s_register_operand")))] -
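The loops the reworked smin/umin/smax/umax expanders target look like the following (a portable C sketch mirroring the style of the patch's mve-vminmax_1.c test; the function names are illustrative). Because the insns now use smax/umin RTL expressions rather than unspecs, the vectorizer can recognize these conditional-select loops directly:

```c
#include <stdint.h>

/* Signed max per lane: vectorizes to vmax.s32 on an MVE target.  */
void test_vmax_s32 (int32_t *dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Unsigned min per lane: vectorizes to vmin.u32 on an MVE target.  */
void test_vmin_u32 (uint32_t *dest, const uint32_t *a, const uint32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] < b[i] ? a[i] : b[i];
}
```

The signed/unsigned split in the patch (mve_vmaxq_s vs. mve_vmaxq_u) matters precisely because smax and umax give different answers on the same bit patterns.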
[PATCH][Arm] Auto-vectorization for MVE: vmul
Hi all, This patch enables MVE vmul instructions for auto-vectorization. It includes MVE in expander mul3 to enable vectorization for MVE and modifies related vmul insns to support the expander by using 'mult' instead of unspec. The mul3 for vectorization in vec-common.md uses mode iterator VDQWH instead of VALLW to cover all supported modes. The macros ARM_HAVE__ARITH are used to select supported modes for different targets. The redundant mul3 in neon.md is removed. Regression tested on arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf. Is it OK for trunk please? Thanks Dennis gcc/ChangeLog: 2020-10-02 Dennis Zhang * config/arm/mve.md (mve_vmulq): New entry for vmul instruction using expression 'mult'. (mve_vmulq_f): Use mult instead of VMULQ_F. * config/arm/neon.md (mul3): Removed. * config/arm/vec-common.md (mul3): Use the new mode macros ARM_HAVE__ARITH. Use mode iterator VDQWH instead of VALLW. gcc/testsuite/ChangeLog: 2020-10-02 Dennis Zhang * gcc.target/arm/simd/mve-vmul_1.c: New test. 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#<V_sz_elem> %q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
 	  (const_string "neon_mul_<VH_elem_ch>")))]
 )
 
-(define_insn "mul<mode>3"
-  [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (mult:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set_attr "type" "neon_mul_<VH_elem_ch>")]
-)
-
 (define_insn "neon_vmulf<mode>"
   [(set (match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul<mode>3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-	(mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		    (match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (<MODE>mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		    (match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "smin<mode>3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> an unspec expression to make the instructions recognizable for
> vectorization. The MVE target is added in the sub<mode>3 optab. The
> sub<mode>3 optab is modified to use a mode iterator that selects the
> available modes for the various targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes the 'vreinterpretq_*.c' MVE intrinsic tests. The
> tests generate wrong instruction numbers because of unexpected icf
> optimization. This bug is exposed by the MVE vector modes enabled in this
> patch, therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
> Is it OK for trunk please?
>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang
>
> 	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> 	* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> 	(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> 	(TARGET_NEON_MVE_HFP): Likewise.
> 	* config/arm/iterators.md (VSEL): New mode iterator to select modes
> 	for corresponding targets.
> 	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> 	using expression 'minus'.
> 	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> 	* config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
> 	sub<mode>3 in vec-common.md.
> 	* config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
> 	to select available modes. Exclude TARGET_NEON_FP16INST from
> 	TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> 	originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang
>
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> 	option -fno-ipa-icf and change the instruction count from 8 to 16.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 	* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> 	* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> 	* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
>

This patch is updated based on Richard Sandiford's patch adding new vector
mode macros:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization. It adds
insns for MVE vsub instructions using 'minus' instead of an unspec
expression to make the instructions recognizable for auto-vectorization.
The sub<mode>3 in mve.md is modified to use the new mode macros which make
the expander available when certain modes are supported. Then various
targets can share this expander for vectorization. The redundant sub<mode>3
insns in neon.md are then removed.

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang

	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
	using expression 'minus'.
	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
	* config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH.
	(sub<mode>3, sub<mode>3_fp16): Removed.
	(neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
	* config/arm/vec-common.md (sub<mode>3): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang

	* gcc.target/arm/simd/mve-vsub_1.c: New test.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2
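[The rest of the diff is truncated in the archive. As context, here is a minimal sketch — not the committed mve-vsub_1.c test, and the function name is illustrative — of the loop shape that expressing vsub as 'minus' RTL lets the vectorizer pick up:]

```c
#include <stdint.h>

/* Element-wise subtraction across one 128-bit MVE vector of four
   32-bit lanes.  With -O3 and MVE enabled, a 'minus' insn pattern
   (rather than an unspec) lets the vectorizer emit vsub.i32 for this
   loop; on other targets it simply compiles to scalar code.  */
void
sketch_vsub_i32 (int32_t *dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```

The same C source is target-independent, which is why moving the expander into vec-common.md lets NEON, iwMMXt, and MVE share one entry point.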
Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
>
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (the TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook)
> when MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
>
> This patch also fixes bugs in the MVE vreinterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests check the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in the icf pass.
> Because of icf, the instruction count only checks one function, which is 8.
> However, when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf; then the
> instruction count becomes 16, which causes failure of the tests.
> Because icf is not the pattern to be tested but causes the above issues,
> -fno-ipa-icf is applied to the tests to avoid an unstable instruction
> count.
>
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one and
> would otherwise cause confusion.
>
> Regtested and bootstrapped.
>
> Is it OK for trunk please?
>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-09-15  Dennis Zhang
>
> 	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
>
> gcc/testsuite/ChangeLog:
>
> 2020-09-15  Dennis Zhang
>
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> 	option -fno-ipa-icf and change the instruction count from 8 to 16.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html
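[The ICF interaction described in the message can be illustrated with a small sketch — the function names and bodies here are hypothetical, not the actual vreinterpretq tests:]

```c
#include <stdint.h>

/* Two byte-identical function bodies -- the pattern the
   vreinterpretq_*.c tests rely on.  Under -O2, -fipa-icf (on by
   default) may fold one function into an alias of the other, so a
   scan-assembler-times count sees only one body (8 instructions in
   the tests); -fno-ipa-icf keeps both bodies, making the count
   stable regardless of later inlining decisions.  */
uint32_t
variant_a (uint32_t x, uint32_t y)
{
  return (x << 1) ^ y;
}

uint32_t
variant_b (uint32_t x, uint32_t y)
{
  return (x << 1) ^ y;
}
```

Whether folding happens or not, the two variants stay observationally identical, which is exactly why ICF is legal — and why a test counting instructions per function must pin the flag down.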