Re: [PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics
On Tue, Apr 18, 2023 at 3:13 PM Hu, Lin1 via Gcc-patches wrote:
>
> Hi all,
>
> The patch aims to support reduce_*_ep[i|u][8|16] series intrinsics, and has
> been tested on x86_64-pc-linux-gnu. OK for trunk?

Ok.
RE: [PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics
More details: the Intrinsics Guide adds these 128/256-bit intrinsics, as listed here:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=reduce_&ig_expand=5814

So we intend to enable these intrinsics for GCC 14.

-----Original Message-----
From: Gcc-patches On Behalf Of Hu, Lin1 via Gcc-patches
Sent: Tuesday, April 18, 2023 3:03 PM
To: gcc-patches@gcc.gnu.org
Cc: Liu, Hongtao; ubiz...@gmail.com
Subject: [PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics
[PATCH] i386: Add reduce_*_ep[i|u][8|16] series intrinsics
Hi all,

The patch aims to support reduce_*_ep[i|u][8|16] series intrinsics, and has
been tested on x86_64-pc-linux-gnu. OK for trunk?

BRs,
Lin

gcc/ChangeLog:

	* config/i386/avx2intrin.h
	(_MM_REDUCE_OPERATOR_BASIC_EPI16): New macro.
	(_MM_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
	(_MM256_REDUCE_OPERATOR_BASIC_EPI16): Ditto.
	(_MM256_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
	(_MM_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
	(_MM_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
	(_MM256_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
	(_MM256_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
	(_mm_reduce_add_epi16): New intrinsics.
	(_mm_reduce_mul_epi16): Ditto.
	(_mm_reduce_and_epi16): Ditto.
	(_mm_reduce_or_epi16): Ditto.
	(_mm_reduce_max_epi16): Ditto.
	(_mm_reduce_max_epu16): Ditto.
	(_mm_reduce_min_epi16): Ditto.
	(_mm_reduce_min_epu16): Ditto.
	(_mm256_reduce_add_epi16): Ditto.
	(_mm256_reduce_mul_epi16): Ditto.
	(_mm256_reduce_and_epi16): Ditto.
	(_mm256_reduce_or_epi16): Ditto.
	(_mm256_reduce_max_epi16): Ditto.
	(_mm256_reduce_max_epu16): Ditto.
	(_mm256_reduce_min_epi16): Ditto.
	(_mm256_reduce_min_epu16): Ditto.
	(_mm_reduce_add_epi8): Ditto.
	(_mm_reduce_mul_epi8): Ditto.
	(_mm_reduce_and_epi8): Ditto.
	(_mm_reduce_or_epi8): Ditto.
	(_mm_reduce_max_epi8): Ditto.
	(_mm_reduce_max_epu8): Ditto.
	(_mm_reduce_min_epi8): Ditto.
	(_mm_reduce_min_epu8): Ditto.
	(_mm256_reduce_add_epi8): Ditto.
	(_mm256_reduce_mul_epi8): Ditto.
	(_mm256_reduce_and_epi8): Ditto.
	(_mm256_reduce_or_epi8): Ditto.
	(_mm256_reduce_max_epi8): Ditto.
	(_mm256_reduce_max_epu8): Ditto.
	(_mm256_reduce_min_epi8): Ditto.
	(_mm256_reduce_min_epu8): Ditto.
	* config/i386/avx512vlbwintrin.h:
	(_mm_mask_reduce_add_epi16): Ditto.
	(_mm_mask_reduce_mul_epi16): Ditto.
	(_mm_mask_reduce_and_epi16): Ditto.
	(_mm_mask_reduce_or_epi16): Ditto.
	(_mm_mask_reduce_max_epi16): Ditto.
	(_mm_mask_reduce_max_epu16): Ditto.
	(_mm_mask_reduce_min_epi16): Ditto.
	(_mm_mask_reduce_min_epu16): Ditto.
	(_mm256_mask_reduce_add_epi16): Ditto.
	(_mm256_mask_reduce_mul_epi16): Ditto.
	(_mm256_mask_reduce_and_epi16): Ditto.
	(_mm256_mask_reduce_or_epi16): Ditto.
	(_mm256_mask_reduce_max_epi16): Ditto.
	(_mm256_mask_reduce_max_epu16): Ditto.
	(_mm256_mask_reduce_min_epi16): Ditto.
	(_mm256_mask_reduce_min_epu16): Ditto.
	(_mm_mask_reduce_add_epi8): Ditto.
	(_mm_mask_reduce_mul_epi8): Ditto.
	(_mm_mask_reduce_and_epi8): Ditto.
	(_mm_mask_reduce_or_epi8): Ditto.
	(_mm_mask_reduce_max_epi8): Ditto.
	(_mm_mask_reduce_max_epu8): Ditto.
	(_mm_mask_reduce_min_epi8): Ditto.
	(_mm_mask_reduce_min_epu8): Ditto.
	(_mm256_mask_reduce_add_epi8): Ditto.
	(_mm256_mask_reduce_mul_epi8): Ditto.
	(_mm256_mask_reduce_and_epi8): Ditto.
	(_mm256_mask_reduce_or_epi8): Ditto.
	(_mm256_mask_reduce_max_epi8): Ditto.
	(_mm256_mask_reduce_max_epu8): Ditto.
	(_mm256_mask_reduce_min_epi8): Ditto.
	(_mm256_mask_reduce_min_epu8): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512vlbw-reduce-op-1.c: New test.
---
 gcc/config/i386/avx2intrin.h                  | 347 ++
 gcc/config/i386/avx512vlbwintrin.h            | 256 +
 .../gcc.target/i386/avx512vlbw-reduce-op-1.c  | 206 +++
 3 files changed, 809 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vlbw-reduce-op-1.c

diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
index 1b9c8169a96..9b8c13b7233 100644
--- a/gcc/config/i386/avx2intrin.h
+++ b/gcc/config/i386/avx2intrin.h
@@ -1915,6 +1915,353 @@ _mm256_mask_i64gather_epi32 (__m128i __src, int const *__base,
 					   (int) (SCALE))
 #endif  /* __OPTIMIZE__ */
 
+#define _MM_REDUCE_OPERATOR_BASIC_EPI16(op) \
+  __v8hi __T1 = (__v8hi)__W; \
+  __v8hi __T2 = __builtin_shufflevector (__T1, __T1, 4, 5, 6, 7, 4, 5, 6, 7); \
+  __v8hi __T3 = __T1 op __T2; \
+  __v8hi __T4 = __builtin_shufflevector (__T3, __T3, 2, 3, 2, 3, 4, 5, 6, 7); \
+  __v8hi __T5 = __T3 op __T4; \
+  __v8hi __T6 = __builtin_shufflevector (__T5, __T5, 1, 1, 2, 3, 4, 5, 6, 7); \
+  __v8hi __T7 = __T5 op __T6; \
+  return __T7[0]
+
+extern __inline short
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_add_epi16 (__m128i __W)
+{
+  _MM_REDUCE_OPERATOR_BASIC_EPI16 (+);
+}
+
+extern __inline short
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_reduce_mul_epi16 (__m128i __W)
+{
+  _MM_REDUCE_OPERATOR_BASIC_EPI16 (*);
+}
+
+extern __inline short
+__attribute__ ((__gnu_inline__, __alway
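
For readers following the _MM_REDUCE_OPERATOR_BASIC_EPI16 macro quoted in this thread, here is a rough scalar model of its three shuffle-and-combine steps, specialised to addition. It is only an illustrative sketch, not code from the patch: the names shuffle8, add8 and reduce_add_epi16_model are hypothetical, plain arrays stand in for the __v8hi vector type, and it assumes the running sums fit in a short. The index tables are copied from the __builtin_shufflevector calls in the macro.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for __builtin_shufflevector (v, v, idx...):
   dst[i] takes the source lane selected by idx[i].  */
static void shuffle8(const short *src, const int *idx, short *dst)
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[idx[i]];
}

/* Lane-wise add of two 8-lane values (models "__T1 op __T2" with op = +).  */
static void add8(const short *a, const short *b, short *dst)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (short)(a[i] + b[i]);
}

/* Hypothetical model of the reduction: fold the upper half onto the lower
   half three times (8 -> 4 -> 2 -> 1 live lanes), then read lane 0.  */
short reduce_add_epi16_model(const short w[8])
{
    /* Same lane selections as the three shuffles in the macro.  */
    static const int fold4[8] = {4, 5, 6, 7, 4, 5, 6, 7};
    static const int fold2[8] = {2, 3, 2, 3, 4, 5, 6, 7};
    static const int fold1[8] = {1, 1, 2, 3, 4, 5, 6, 7};
    short t2[8], t3[8], t4[8], t5[8], t6[8], t7[8];

    assert(w != NULL);

    shuffle8(w, fold4, t2);   /* lanes 4..7 moved down to lanes 0..3 */
    add8(w, t2, t3);          /* lanes 0..3 now hold w[i] + w[i+4]   */
    shuffle8(t3, fold2, t4);  /* lanes 2..3 moved down to lanes 0..1 */
    add8(t3, t4, t5);         /* lanes 0..1 now hold sums of four    */
    shuffle8(t5, fold1, t6);  /* lane 1 moved down to lane 0         */
    add8(t5, t6, t7);         /* lane 0 now holds the full sum       */
    return t7[0];
}
```

With the input {1, 2, 3, 4, 5, 6, 7, 8} the model returns 36. Only lane 0 of the final value is meaningful; the other lanes carry partial results, which is why the macro ends with "return __T7[0]".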