RE: [PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-07 Thread Kong, Lingling via Gcc-patches
> On Sun, Nov 6, 2022 at 2:00 PM Kong, Lingling via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi
> >
> > The patch is to add flag -mprefer-remote-atomic to control whether to
> generate raoint insn for atomic operations.
> > Ok for trunk?
> 
> Please note TARGET_AVOID_MFENCE tuning flag, introduced a while ago due to
> the fact that several targets perform LOCK OR faster than MFENCE.
> 
> It was determined that MFENCE/SFENCE/LFENCE are much more complex
> instructions compared to LOCK OR, since they have to handle cases that C
> memory model never describes (some MMIO, or such). Considering that
> ordinary LOCKed operations adequately cover C memory model, and are
> probably faster than new instructions that have to cover all special cases, I
> wonder if there is really benefit to emit these insns instead of existing 
> LOCKed
> operations. These should IMO be used only via relevant builtins.
> 
> Uros.
> 

Ok, I will revert this patch in trunk. 
And wait until the optimization results of the actual hardware come out, and 
then consider to push the optimization patch.

> >
> > BRs,
> > Lingling
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.opt:Add -mprefer-remote-atomic.
> > * config/i386/sync.md (atomic_):
> > New define_expand.
> > (atomic_add): Rename to below one.
> > (atomic_add_1): To this.
> > (atomic_): Ditto.
> > (atomic__1): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/raoint-atomic-fetch.c: New test.
> > ---
> >  gcc/config/i386/i386.opt  |  4 +++
> >  gcc/config/i386/sync.md   | 29 ---
> >  .../gcc.target/i386/raoint-atomic-fetch.c | 29 +++
> >  3 files changed, 58 insertions(+), 4 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
> >
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index
> > 415c52e1bb4..abb1e5ecbdc 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1246,3 +1246,7 @@ Support PREFETCHI built-in functions and code
> generation.
> >  mraoint
> >  Target Mask(ISA2_RAOINT) Var(ix86_isa_flags2) Save  Support RAOINT built-in
> functions and code generation.
> > +
> > +mprefer-remote-atomic
> > +Target Var(flag_prefer_remote_atomic) Init(0) Prefer use remote
> > +atomic insn for atomic operations.
> > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index
> > e6543a5efb0..08e944fc9b7 100644
> > --- a/gcc/config/i386/sync.md
> > +++ b/gcc/config/i386/sync.md
> > @@ -37,7 +37,7 @@
> >UNSPECV_CMPXCHG
> >UNSPECV_XCHG
> >UNSPECV_LOCK
> > -
> > +
> >;; For CMPccXADD support
> >UNSPECV_CMPCCXADD
> >
> > @@ -791,7 +791,28 @@
> >  (define_code_iterator any_plus_logic [and ior xor plus])
> > (define_code_attr plus_logic [(and "and") (ior "or") (xor "xor") (plus
> > "add")])
> >
> > -(define_insn "rao_a"
> > +(define_expand "atomic_"
> > +  [(match_operand:SWI 0 "memory_operand")
> > +   (any_plus_logic:SWI (match_dup 0)
> > +  (match_operand:SWI 1 "nonmemory_operand"))
> > +   (match_operand:SI 2 "const_int_operand")]
> > +  ""
> > +{
> > +  if (flag_prefer_remote_atomic
> > +  && TARGET_RAOINT && operands[2] == const0_rtx
> > +  && (mode == SImode || mode == DImode))
> > +  {
> > +if (CONST_INT_P (operands[1]))
> > +  operands[1] = force_reg (mode, operands[1]);
> > +emit_insn (maybe_gen_rao_a (, mode, operands[0],
> > +operands[1]));
> > +  }
> > +  else
> > +emit_insn (gen_atomic__1 (operands[0], operands[1],
> > +   operands[2]));
> > +  DONE;
> > +})
> > +
> > +(define_insn "@rao_a"
> >[(set (match_operand:SWI48 0 "memory_operand" "+m")
> > (unspec_volatile:SWI48
> >   [(any_plus_logic:SWI48 (match_dup 0) @@ -801,7 +822,7 @@
> >"TARGET_RAOINT"
> >"a\t{%1, %0|%0, %1}")
> >
> > -(define_insn "atomic_add"
> > +(define_insn "atomic_add_1"
> >[(set (match_operand:SWI 0 "memory_operand" "+m")
> > (unspec_volatile:SWI
> >   [(plu

[PATCH] [committed] i386: Fix typo in sse-22.c pragma

2022-11-07 Thread Kong, Lingling via Gcc-patches


gcc/testsuite/ChangeLog:

* gcc.target/i386/sse-22.c: Fix typo in pragma GCC target.

Pushing as obvious.

Thanks,
Lingling
---
 gcc/testsuite/gcc.target/i386/sse-22.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c 
b/gcc/testsuite/gcc.target/i386/sse-22.c
index f5808e4513b..f600bb544b2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert,amx-fp16.raoint")
+#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert,amx-fp16,raoint")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
-- 
2.27.0



[PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi

The patch is to add flag -mprefer-remote-atomic to control whether to generate 
raoint insn for atomic operations.
Ok for trunk?

BRs,
Lingling

gcc/ChangeLog:

* config/i386/i386.opt:Add -mprefer-remote-atomic.
* config/i386/sync.md (atomic_):
New define_expand.
(atomic_add): Rename to below one.
(atomic_add_1): To this.
(atomic_): Ditto.
(atomic__1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/raoint-atomic-fetch.c: New test.
---
 gcc/config/i386/i386.opt  |  4 +++
 gcc/config/i386/sync.md   | 29 ---
 .../gcc.target/i386/raoint-atomic-fetch.c | 29 +++
 3 files changed, 58 insertions(+), 4 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 
415c52e1bb4..abb1e5ecbdc 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1246,3 +1246,7 @@ Support PREFETCHI built-in functions and code generation.
 mraoint
 Target Mask(ISA2_RAOINT) Var(ix86_isa_flags2) Save  Support RAOINT built-in 
functions and code generation.
+
+mprefer-remote-atomic
+Target Var(flag_prefer_remote_atomic) Init(0) Prefer use remote atomic 
+insn for atomic operations.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 
e6543a5efb0..08e944fc9b7 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -37,7 +37,7 @@
   UNSPECV_CMPXCHG
   UNSPECV_XCHG
   UNSPECV_LOCK
- 
+
   ;; For CMPccXADD support
   UNSPECV_CMPCCXADD
 
@@ -791,7 +791,28 @@
 (define_code_iterator any_plus_logic [and ior xor plus])  (define_code_attr 
plus_logic [(and "and") (ior "or") (xor "xor") (plus "add")])
 
-(define_insn "rao_a"
+(define_expand "atomic_"
+  [(match_operand:SWI 0 "memory_operand")
+   (any_plus_logic:SWI (match_dup 0)
+  (match_operand:SWI 1 "nonmemory_operand"))
+   (match_operand:SI 2 "const_int_operand")]
+  ""
+{
+  if (flag_prefer_remote_atomic
+  && TARGET_RAOINT && operands[2] == const0_rtx
+  && (mode == SImode || mode == DImode))
+  {
+if (CONST_INT_P (operands[1]))
+  operands[1] = force_reg (mode, operands[1]);
+emit_insn (maybe_gen_rao_a (, mode, operands[0], 
+operands[1]));
+  }
+  else
+emit_insn (gen_atomic__1 (operands[0], operands[1],
+   operands[2]));
+  DONE;
+})
+
+(define_insn "@rao_a"
   [(set (match_operand:SWI48 0 "memory_operand" "+m")
(unspec_volatile:SWI48
  [(any_plus_logic:SWI48 (match_dup 0) @@ -801,7 +822,7 @@
   "TARGET_RAOINT"
   "a\t{%1, %0|%0, %1}")
 
-(define_insn "atomic_add"
+(define_insn "atomic_add_1"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(plus:SWI (match_dup 0)
@@ -855,7 +876,7 @@
   return "lock{%;} %K2sub{}\t{%1, %0|%0, %1}";
 })
 
-(define_insn "atomic_"
+(define_insn "atomic__1"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(any_logic:SWI (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c 
b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
new file mode 100644
index 000..ac4099d888e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/raoint-atomic-fetch.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-mraoint -O2 -mprefer-remote-atomic" } */
+/* { dg-final { scan-assembler-times "aadd" 2 { target {! ia32 } } } } 
+*/
+/* { dg-final { scan-assembler-times "aand" 2 { target {! ia32 } } } } 
+*/
+/* { dg-final { scan-assembler-times "aor" 2 { target {! ia32 } } } } 
+*/
+/* { dg-final { scan-assembler-times "axor" 2 { target {! ia32 } } } } 
+*/
+/* { dg-final { scan-assembler-times "aadd" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "aand" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "aor" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "axor" 1 { target ia32 } } } */ 
+volatile int x; volatile long long y; int *a; long long *b;
+
+void extern
+rao_int_test (void)
+{
+  __atomic_add_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_and_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_or_fetch (a, x, __ATOMIC_RELAXED);
+  __atomic_xor_fetch (a, x, __ATOMIC_RELAXED); #ifdef __x86_64__
+  __atomic_add_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_and_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_or_fetch (b, y, __ATOMIC_RELAXED);
+  __atomic_xor_fetch (b, y, __ATOMIC_RELAXED); #endif }
--
2.27.0



[PATCH] Support Intel RAO-INT

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi,
The patches aimed to add Intel RAO-INT.

The information is based on newly released
Intel Architecture Instruction Set Extensions and Future Features.

The document comes following:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html.

OK for trunk?

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Detect raoint.
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_RAOINT_SET,
OPTION_MASK_ISA2_RAOINT_UNSET): New.
(ix86_handle_option): Handle -mraoint.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_RAOINT.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
raoint.
* config.gcc: Add raointintrin.h
* config/i386/cpuid.h (bit_RAOINT): New.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__RAOINT__.
* config/i386/i386-isa.def (RAOINT): Add DEF_PTA(RAOINT).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Add -mraoint.
* config/i386/sync.md (rao_a): New define insn.
* config/i386/i386.opt: Add option -mraoint.
* config/i386/x86gprintrin.h: Include raointintrin.h.
* doc/extend.texi: Document raoint.
* doc/invoke.texi: Document -mraoint.
* doc/sourcebuild.texi: Document target raoint.
* config/i386/raointintrin.h: New file.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Add -mraoint.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mraoint.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add raoint target.
* gcc.target/i386/sse-23.c: Ditto.
* lib/target-supports.exp: Add check_effective_target_raoint.
* gcc.target/i386/rao-helper.h: New test.
* gcc.target/i386/raoint-1.c: Ditto.
* gcc.target/i386/raoint-aadd-2.c: Ditto.
* gcc.target/i386/raoint-aand-2.c: Ditto.
* gcc.target/i386/raoint-aor-2.c: Ditto.
* gcc.target/i386/raoint-axor-2.c: Ditto.
* gcc.target/i386/x86gprintrin-1.c: Ditto.
* gcc.target/i386/x86gprintrin-2.c: Ditto.
* gcc.target/i386/x86gprintrin-3.c: Ditto.
* gcc.target/i386/x86gprintrin-4.c: Ditto.
* gcc.target/i386/x86gprintrin-5.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h  |   2 +
 gcc/common/config/i386/i386-common.cc |  15 +++
 gcc/common/config/i386/i386-cpuinfo.h |   1 +
 gcc/common/config/i386/i386-isas.h|   1 +
 gcc/config.gcc|   3 +-
 gcc/config/i386/cpuid.h   |   1 +
 gcc/config/i386/i386-builtin.def  |  10 ++
 gcc/config/i386/i386-c.cc |   2 +
 gcc/config/i386/i386-isa.def  |   1 +
 gcc/config/i386/i386-options.cc   |   4 +-
 gcc/config/i386/i386.opt  |   4 +
 gcc/config/i386/raointintrin.h| 101 ++
 gcc/config/i386/sync.md   |  16 +++
 gcc/config/i386/x86gprintrin.h|   2 +
 gcc/doc/extend.texi   |   5 +
 gcc/doc/invoke.texi   |  11 +-
 gcc/doc/sourcebuild.texi  |   3 +
 gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
 gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
 gcc/testsuite/gcc.target/i386/rao-helper.h|  79 ++
 gcc/testsuite/gcc.target/i386/raoint-1.c  |  31 ++
 gcc/testsuite/gcc.target/i386/raoint-aadd-2.c |  24 +  
gcc/testsuite/gcc.target/i386/raoint-aand-2.c |  25 +  
gcc/testsuite/gcc.target/i386/raoint-aor-2.c  |  25 +  
gcc/testsuite/gcc.target/i386/raoint-axor-2.c |  25 +
 gcc/testsuite/gcc.target/i386/sse-12.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|   2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|   4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|   2 +-
 .../gcc.target/i386/x86gprintrin-1.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-2.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-3.c  |   2 +-
 .../gcc.target/i386/x86gprintrin-4.c  |   4 +-
 .../gcc.target/i386/x86gprintrin-5.c  |   4 +-
 gcc/testsuite/lib/target-supports.exp |  11 ++
 37 files changed, 413 insertions(+), 21 deletions(-)  create mode 100644 
gcc/config/i386/raointintrin.h  create mode 100644 
gcc/testsuite/gcc.target/i386/rao-helper.h
 create mode 100644 gcc/testsuite/gcc.target/i386/raoint-1.c
 create mode 100644 

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-03 Thread Kong, Lingling via Gcc-patches
> > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > > index 7c6bfa6e..cd0282f1 100644
> > > --- a/htdocs/gcc-13/changes.html
> > > +++ b/htdocs/gcc-13/changes.html
> > > @@ -230,6 +230,8 @@ a work-in-progress.
> > >For both C and C++ the __bf16 type is supported on
> > >x86 systems with SSE2 and above enabled.
> > >
> > > +  Use __bf16 type for AVX512BF16 intrinsics.
> > Could you add more explanations. Like originally it's ..., now it's
> > ..., and what's the difference when users compile the same source
> > code(which contains
> > avx512bf16 intrinsics) with gcc12(and before) and GCC13.
> > > +  
> > >  
> > >
> > >  
> > > --
> > > 2.18.2
> > >
> Yes,  changed it. Thanks a lot!
> 
> Subject: [PATCH] Mention Intel __bf16 support in AVX512BF16 intrinsics.
> 
> ---
>  htdocs/gcc-13/changes.html | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index
> 7c6bfa6e..a35f4fab 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -230,6 +230,12 @@ a work-in-progress.
>For both C and C++ the __bf16 type is supported on
>x86 systems with SSE2 and above enabled.
>
> +  Use __bf16 type for AVX512BF16 intrinsics.
> + Previously we use  short to represent bf16. Now we introduced
> __bf16 to x86 psABI.
> +  So we switch intrinsics in AVX512BF16 to the new type __bf16.
> +  When users compile the same source code contains AVX512BF16
> + intrinsics with
> +  GCC13 need to support SSE2, which is different to GCC12 (and before).
> +  
>  
> 
>  
> --
> 2.18.2
> 
> BRs,
> Lingling

Sorry, modified again. New patch is as below.

htdocs/gcc-13/changes.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 
7c6bfa6e..7a5d2ab6 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -230,6 +230,11 @@ a work-in-progress.
   For both C and C++ the __bf16 type is supported on
   x86 systems with SSE2 and above enabled.
   
+  Use real __bf16 type for AVX512BF16 intrinsics. 
+ Previously  we use __bfloat16 which is typedef of short. Now we 
+ introduced real  __bf16 type to x86 psABI. Users need to 
+ adjust their  AVX512BF16-related source code when upgrading GCC12 to GCC13.
+  
 
 
 
--
2.18.2

BRs,
Lingling


RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-01 Thread Kong, Lingling via Gcc-patches
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 7c6bfa6e..cd0282f1 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -230,6 +230,8 @@ a work-in-progress.
> >For both C and C++ the __bf16 type is supported on
> >x86 systems with SSE2 and above enabled.
> >
> > +  Use __bf16 type for AVX512BF16 intrinsics.
> Could you add more explanations. Like originally it's ..., now it's ..., and 
> what's
> the difference when users compile the same source code(which contains
> avx512bf16 intrinsics) with gcc12(and before) and GCC13.
> > +  
> >  
> >
> >  
> > --
> > 2.18.2
> >
Yes,  changed it. Thanks a lot!

Subject: [PATCH] Mention Intel __bf16 support in AVX512BF16 intrinsics.

---
 htdocs/gcc-13/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 7c6bfa6e..a35f4fab 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -230,6 +230,12 @@ a work-in-progress.
   For both C and C++ the __bf16 type is supported on
   x86 systems with SSE2 and above enabled.
   
+  Use __bf16 type for AVX512BF16 intrinsics. Previously we use
+  short to represent bf16. Now we introduced __bf16 to x86 psABI.
+  So we switch intrinsics in AVX512BF16 to the new type __bf16.
+  When users compile the same source code contains AVX512BF16 intrinsics with
+  GCC13 need to support SSE2, which is different to GCC12 (and before).
+  
 

 
--
2.18.2

BRs,
Lingling


[wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Kong, Lingling via Gcc-patches
Hi

The patch is for mention Intel __bf16 support in AVX512BF16 intrinsics.
Ok for master ?

Thanks,
Lingling

---
 htdocs/gcc-13/changes.html | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 
7c6bfa6e..cd0282f1 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -230,6 +230,8 @@ a work-in-progress.
   For both C and C++ the __bf16 type is supported on
   x86 systems with SSE2 and above enabled.
   
+  Use __bf16 type for AVX512BF16 intrinsics.
+  
 
 
 
--
2.18.2



RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-28 Thread Kong, Lingling via Gcc-patches
Hi,

Because we  switch intrinsics for avx512bf16 to the new type __bf16. Now we 
could use m128/256bh for vector bf16 type instead of m128/256bf16.
And unified builtin for avx512bf16/avxneconvert.

Thanks,
Lingling

> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, October 25, 2022 1:23 PM
> To: Kong, Lingling 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang,
> Haochen 
> Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> 
> On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling 
> wrote:
> >
> > > From: Gcc-patches
> > > 
> > > On Behalf Of Hongtao Liu via Gcc-patches
> > > Sent: Monday, October 17, 2022 1:47 PM
> > > To: Jiang, Haochen 
> > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
> > >
> > > On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches
> > >  wrote:
> > > >
> > > > From: Kong Lingling 
> > > > +(define_insn "vbcstne2ps_"
> > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > +(vec_duplicate:VF1_128_256
> > > > + (unspec:SF
> > > > +  [(match_operand:HI 1 "memory_operand" "m")]
> > > > +  VBCSTNE)))]
> > > > +  "TARGET_AVXNECONVERT"
> > > > +  "vbcstne2ps\t{%1, %0|%0, %1}"
> > > > +  [(set_attr "prefix" "vex")
> > > > +  (set_attr "mode" "")])
> > > Since jakub has support bf16 software emulation, can we rewrite it
> > > with general rtl ir without unspec?
> > > Like (float_extend:SF (match_operand:BF "memory_operand" "m")
> > > > +
> > > > +(define_int_iterator VCVTNEBF16
> > > > +  [UNSPEC_VCVTNEEBF16SF
> > > > +   UNSPEC_VCVTNEOBF16SF])
> > > > +
> > > > +(define_int_attr vcvtnebf16type
> > > > +  [(UNSPEC_VCVTNEEBF16SF "ebf16")
> > > > +   (UNSPEC_VCVTNEOBF16SF "obf16")]) (define_insn
> > > > +"vcvtne2ps_"
> > > > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > > > +(unspec:VF1_128_256
> > > > +  [(match_operand: 1 "memory_operand" "m")]
> > > > + VCVTNEBF16))]
> > > > +  "TARGET_AVXNECONVERT"
> > > > +  "vcvtne2ps\t{%1, %0|%0, %1}"
> > > > +  [(set_attr "prefix" "vex")
> > > > +   (set_attr "mode" "")])
> > > Similar for this one and all those patterns below.
> >
> > That's great! Thanks for the review!
> > Now rewrite it without unspec and use float_extend for new define_insn.
> Ok.
> >
> > Thanks
> > Lingling
> >
> >
> 
> 
> --
> BR,
> Hongtao


0001-Support-Intel-AVX-NE-CONVERT.patch
Description: 0001-Support-Intel-AVX-NE-CONVERT.patch


[PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-28 Thread Kong, Lingling via Gcc-patches
Hi,

Previously we use unsigned short to represent bf16. It's not a good expression, 
and at the time the front end didn't support bf16 type.
Now we introduced __bf16 to X86 psABI. So we can switch intrinsics to the new 
type.

Ok for trunk ?

Thanks,
Lingling

gcc/ChangeLog:

* config/i386/avx512bf16intrin.h (__attribute__): Change short to bf16.
(_mm_cvtsbh_ss): Ditto.
(_mm512_cvtne2ps_pbh): Ditto.
(_mm512_mask_cvtne2ps_pbh): Ditto.
(_mm512_maskz_cvtne2ps_pbh): Ditto.
* config/i386/avx512bf16vlintrin.h (__attribute__): Ditto.
(_mm256_cvtne2ps_pbh): Ditto.
(_mm256_mask_cvtne2ps_pbh): Ditto.
(_mm256_maskz_cvtne2ps_pbh): Ditto.
(_mm_cvtne2ps_pbh): Ditto.
(_mm_mask_cvtne2ps_pbh): Ditto.
(_mm_maskz_cvtne2ps_pbh): Ditto.
(_mm_cvtness_sbh): Ditto.
* config/i386/i386-builtin-types.def (V8BF): Add new
DEF_VECTOR_TYPE for BFmode.
(V16BF): Ditto.
(V32BF): Ditto.
* config/i386/i386-builtin.def (BDESC): Fixed builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Changed
avx512bf16 ix86_builtin_func_type included HI to BF.
* config/i386/immintrin.h: Add SSE2 depend for avx512bf16.
* config/i386/sse.md (TARGET_AVX512VL): Changed HI vector to BF
vector.
(avx512f_cvtneps2bf16_v4sf): New define_expand.
(*avx512f_cvtneps2bf16_v4sf): New define_insn.
(avx512f_cvtneps2bf16_v4sf_maskz):Ditto.
(avx512f_cvtneps2bf16_v4sf_mask): Ditto.
(avx512f_cvtneps2bf16_v4sf_mask_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Add fpmath option.
* gcc.target/i386/avx512bf16-vdpbf16ps-2.c: Fixed
scan-assembler.
* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Add x/y suffix
for vcvtneps2bf16.
* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Ditto.
---
 gcc/config/i386/avx512bf16intrin.h|  12 +--
 gcc/config/i386/avx512bf16vlintrin.h  |  29 ++---
 gcc/config/i386/i386-builtin-types.def|  51 -
 gcc/config/i386/i386-builtin.def  |  54 +-
 gcc/config/i386/i386-expand.cc|  48 -
 gcc/config/i386/immintrin.h   |   2 +
 gcc/config/i386/sse.md| 101 ++
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |   2 +-
 .../gcc.target/i386/avx512bf16-vdpbf16ps-2.c  |   2 +-
 .../i386/avx512bf16vl-cvtness2sbh-1.c |   2 +-
 .../i386/avx512bf16vl-vcvtneps2bf16-1.c   |  12 +--
 11 files changed, 189 insertions(+), 126 deletions(-)

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index b6e9ddad157..ea1d0125b3f 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -35,16 +35,16 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for implementing the intrinsics.  */
-typedef short __v32bh __attribute__ ((__vector_size__ (64)));
+typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components.  */
-typedef short __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
 
 /* Convert One BF16 Data to One Single Float Data.  */
 extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsbh_ss (__bfloat16 __A)
+_mm_cvtsbh_ss (__bf16 __A)
 {
   union{ float a; unsigned int b;} __tmp;
   __tmp.b = ((unsigned int)(__A)) << 16;
@@ -57,21 +57,21 @@ extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtne2ps_pbh (__m512 __A, __m512 __B)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi(__A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf(__A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_cvtne2ps_pbh (__m512bh __A, __mmask32 __B, __m512 __C, __m512 __D)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_mask(__C, __D, __A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_mask(__C, __D, __A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_cvtne2ps_pbh (__mmask32 __A, __m512 __B, __m512 __C)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_maskz(__B, __C, __A);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_maskz(__B, __C, __A);
 }
 
 /* vcvtneps2bf16 */
diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 969335ff358..56c28f14cf6 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -35,57 +35,58 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for 

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-24 Thread Kong, Lingling via Gcc-patches
> From: Gcc-patches 
> On Behalf Of Hongtao Liu via Gcc-patches
> Sent: Monday, October 17, 2022 1:47 PM
> To: Jiang, Haochen 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT
>
> On Fri, Oct 14, 2022 at 3:58 PM Haochen Jiang via Gcc-patches
>  wrote:
> >
> > From: Kong Lingling 
> > +(define_insn "vbcstne2ps_"
> > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > +(vec_duplicate:VF1_128_256
> > + (unspec:SF
> > +  [(match_operand:HI 1 "memory_operand" "m")]
> > +  VBCSTNE)))]
> > +  "TARGET_AVXNECONVERT"
> > +  "vbcstne2ps\t{%1, %0|%0, %1}"
> > +  [(set_attr "prefix" "vex")
> > +  (set_attr "mode" "")])
> Since jakub has support bf16 software emulation, can we rewrite it
> with general rtl ir without unspec?
> Like (float_extend:SF (match_operand:BF "memory_operand" "m")
> > +
> > +(define_int_iterator VCVTNEBF16
> > +  [UNSPEC_VCVTNEEBF16SF
> > +   UNSPEC_VCVTNEOBF16SF])
> > +
> > +(define_int_attr vcvtnebf16type
> > +  [(UNSPEC_VCVTNEEBF16SF "ebf16")
> > +   (UNSPEC_VCVTNEOBF16SF "obf16")])
> > +(define_insn "vcvtne2ps_"
> > +  [(set (match_operand:VF1_128_256 0 "register_operand" "=x")
> > +(unspec:VF1_128_256
> > +  [(match_operand: 1 "memory_operand" "m")]
> > + VCVTNEBF16))]
> > +  "TARGET_AVXNECONVERT"
> > +  "vcvtne2ps\t{%1, %0|%0, %1}"
> > +  [(set_attr "prefix" "vex")
> > +   (set_attr "mode" "")])
> Similar for this one and all those patterns below.

That's great! Thanks for the review! 
Now rewrite it without unspec and use float_extend for new define_insn.

Thanks
Lingling




0001-Support-Intel-AVX-NE-CONVERT.patch
Description: 0001-Support-Intel-AVX-NE-CONVERT.patch


RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-20 Thread Kong, Lingling via Gcc-patches
Thanks a lot, pushed to trunk.

> Hi Richard,
> 
> Thanks again for your reviewing.
> 
> > Yes, use else if for the bitwise induction.  Can you also make the new
> > case conditional on 'def'
> > (the compute_overall_effect_of_inner_loop) being chrec_dont_know?  If
> > that call produced something useful it will not be of either of the two 
> > special
> forms.
> > Thus like
> >
> >   if (def != chrec_dont_know)
> > /* Already OK.  */
> > ;
> >  else if ((bitinv_def = ...)
> > ..
> >  else if (tree_fits_uhwi_p (niter)
> >  ... bitwise induction case...)
> > ...
> >
> Yes, I fixed it in new patch. Thanks.
> Ok for master ?
> 
> Thanks,
> Lingling
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, September 14, 2022 4:16 PM
> > To: Kong, Lingling 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle
> > bitop with an invariant induction.[PR105735]
> >
> > On Tue, Sep 13, 2022 at 9:54 AM Kong, Lingling
> > 
> > wrote:
> > >
> > > Hi Richard,
> > >
> > > Thanks you so much for reviewing this patch.  I really appreciate
> > > it. For these
> > review comments, I have made some changes.
> > >
> > > > That's a single-stmt match, you shouldn't use match.pd matching for 
> > > > this.
> > > > Instead just do
> > > >
> > > >   if (is_gimple_assign (stmt)
> > > >   && ((code = gimple_assign_rhs_code (stmt)), true)
> > > >   && (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code ==
> > > > BIT_XOR_EXPR))
> > >
> > > Yes, I fixed it and dropped modification for match.pd.
> > >
> > > > and pick gimple_assign_rhs{1,2} (stmt) as the operands.  The :c in
> > > > bit_op:c is redundant btw. - while the name suggests "with
> > > > invariant" you don't actually check for that.  But again, given
> > > > canonicalization rules the invariant will be rhs2 so above add
> > > >
> > > > && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
> > >
> > > For " with invariant", this needed op1 is invariant, and I used
> > `expr_invariant_in_loop_p (loop, match_op[0])` for check.
> > > And op2 just be PHI is ok. If op2 is INTEGER_CST, existing gcc can
> > > be directly
> > optimized and do not need modification.
> > >
> > > > you probably need dg-require-effective-target longlong, but is it
> > > > necessary to use long long for the testcases in the first place?
> > > > The IV seems to be unused, if it should match the variables bit
> > > > size use sizeof
> > > > (type) * 8
> > >
> > > Yes, It is not necessary to use long long for the testcases. I
> > > changed type to
> > unsigned int.
> > >
> > > > > +  inv = PHI_ARG_DEF_FROM_EDGE (header_phi, loop_preheader_edge
> > > > > + (loop));  return fold_build2 (code1, type, inv, match_op[0]);
> > > > > + }
> > > >
> > > > The } goes to the next line.
> > >
> > > Sorry, It might be something wrong with my use of gcc send-email format.
> > >
> > > > > +  tree bitinv_def;
> > > > > +  if ((bitinv_def
> > > >
> > > > please use else if here
> > >
> > > Sorry, If use the else if here, there is no corresponding above if.
> > > I'm not sure if
> > you mean change bitwise induction expression if to else if.
> >
> > Yes, use else if for the bitwise induction.  Can you also make the new
> > case conditional on 'def'
> > (the compute_overall_effect_of_inner_loop) being chrec_dont_know?  If
> > that call produced something useful it will not be of either of the two 
> > special
> forms.
> > Thus like
> >
> >   if (def != chrec_dont_know)
> > /* Already OK.  */
> > ;
> >  else if ((bitinv_def = ...)
> > ..
> >  else if (tree_fits_uhwi_p (niter)
> >  ... bitwise induction case...)
> > ...
> >
> > ?
> >
> > Otherwise looks OK now.
> >
> > Thanks,
> > Richard.
> >
> > > Do you agree with these changes?  Thanks again for taking a look.
> > >
> > > Thanks,
> > > Lingling
> > >
> > > > -Original Mess

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Kong, Lingling via Gcc-patches
Hi,
 
> >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> >  : mode == V16HFmode ? V8HFmode
> > +: mode == V16BFmode ? V8BFmode
> Can it be written as switch case?
Sure, I fixed it in new patch. Thanks again for take a look.
OK for master ?

Thanks,
Lingling

> -Original Message-
> From: Hongtao Liu 
> Sent: Thursday, September 15, 2022 11:46 AM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> 
> On Thu, Sep 15, 2022 at 11:36 AM Kong, Lingling via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi
> >
> > The patch is to fix vec_init_dup_v16bf, add correct handle for v16bf mode in
> ix86_expand_vector_init_duplicate.
> > Add testcase with sse2 without avx2.
> >
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/106887
> > * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
> > Fixed V16BF mode case.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/106887
> > * gcc.target/i386/vect-bfloat16-2c.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc|  1 +
> >  .../gcc.target/i386/vect-bfloat16-2c.c| 76 +++
> >  2 files changed, 77 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc
> > b/gcc/config/i386/i386-expand.cc index d7b49c99dc8..9451c561489 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -15111,6 +15111,7 @@ ix86_expand_vector_init_duplicate (bool
> mmx_ok, machine_mode mode,
> > {
> >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> >  : mode == V16HFmode ? V8HFmode
> > +: mode == V16BFmode ? V8BFmode
> Can it be written as switch case?
> >  : V16QImode);
> >   rtx x = gen_reg_rtx (hvmode);
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > new file mode 100644
> > index 000..bead94e46a1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > @@ -0,0 +1,76 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mf16c -msse2 -mno-avx2 -O2" } */
> > +
> > +typedef __bf16 v8bf __attribute__ ((__vector_size__ (16))); typedef
> > +__bf16 v16bf __attribute__ ((__vector_size__ (32)));
> > +
> > +#define VEC_EXTRACT(V,S,IDX)   \
> > +  S\
> > +  __attribute__((noipa))   \
> > +  vec_extract_##V##_##IDX (V v)\
> > +  {\
> > +return v[IDX]; \
> > +  }
> > +
> > +#define VEC_SET(V,S,IDX)   \
> > +  V\
> > +  __attribute__((noipa))   \
> > +  vec_set_##V##_##IDX (V v, S s)   \
> > +  {\
> > +v[IDX] = s;\
> > +return v;  \
> > +  }
> > +
> > +v8bf
> > +vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8) {
> > +return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8}; }
> > +
> > +v16bf
> > +vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
> > +  __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
> > +  __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16) {
> > +return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
> > + a9, a10, a11, a12, a13, a14, a15,
> > +a16}; }
> > +
> > +v8bf
> > +vec_init_dup_v8bf (__bf16 a1)
> > +{
> > +return __extension__ (v8bf) {a1, a1, a1, a1, a1, a1, a1, a1}; }
> > +
> > +v16bf
> > +vec_init_dup_v16bf (__bf16 a1)
> > +{
> > +return __extension__ (v16bf) {a1, a1, a1, a1, a1, a1, a1, a1,
> > + a1, a1, a1, a1, a1, a1, a1, a1}; }
> > +
> > +/* { dg-final { scan-assembler-times "vpunpcklwd"

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-15 Thread Kong, Lingling via Gcc-patches
Hi Richard,

Thanks again for your reviewing.

> Yes, use else if for the bitwise induction.  Can you also make the new case
> conditional on 'def'
> (the compute_overall_effect_of_inner_loop) being chrec_dont_know?  If that
> call produced something useful it will not be of either of the two special 
> forms.
> Thus like
> 
>   if (def != chrec_dont_know)
> /* Already OK.  */
> ;
>  else if ((bitinv_def = ...)
> ..
>  else if (tree_fits_uhwi_p (niter)
>  ... bitwise induction case...)
> ...
>
Yes, I fixed it in new patch. Thanks.
Ok for master ?

Thanks,
Lingling

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 14, 2022 4:16 PM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle bitop
> with an invariant induction.[PR105735]
> 
> On Tue, Sep 13, 2022 at 9:54 AM Kong, Lingling 
> wrote:
> >
> > Hi Richard,
> >
> > Thanks you so much for reviewing this patch.  I really appreciate it. For 
> > these
> review comments, I have made some changes.
> >
> > > That's a single-stmt match, you shouldn't use match.pd matching for this.
> > > Instead just do
> > >
> > >   if (is_gimple_assign (stmt)
> > >   && ((code = gimple_assign_rhs_code (stmt)), true)
> > >   && (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code ==
> > > BIT_XOR_EXPR))
> >
> > Yes, I fixed it and dropped modification for match.pd.
> >
> > > and pick gimple_assign_rhs{1,2} (stmt) as the operands.  The :c in
> > > bit_op:c is redundant btw. - while the name suggests "with
> > > invariant" you don't actually check for that.  But again, given
> > > canonicalization rules the invariant will be rhs2 so above add
> > >
> > > && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
> >
> > For " with invariant", this needed op1 is invariant, and I used
> `expr_invariant_in_loop_p (loop, match_op[0])` for check.
> > And op2 just be PHI is ok. If op2 is INTEGER_CST, existing gcc can be 
> > directly
> optimized and do not need modification.
> >
> > > you probably need dg-require-effective-target longlong, but is it
> > > necessary to use long long for the testcases in the first place?
> > > The IV seems to be unused, if it should match the variables bit size
> > > use sizeof
> > > (type) * 8
> >
> > Yes, It is not necessary to use long long for the testcases. I changed type 
> > to
> unsigned int.
> >
> > > > +  inv = PHI_ARG_DEF_FROM_EDGE (header_phi, loop_preheader_edge
> > > > + (loop));  return fold_build2 (code1, type, inv, match_op[0]); }
> > >
> > > The } goes to the next line.
> >
> > Sorry, It might be something wrong with my use of gcc send-email format.
> >
> > > > +  tree bitinv_def;
> > > > +  if ((bitinv_def
> > >
> > > please use else if here
> >
> > Sorry, If use the else if here, there is no corresponding above if. I'm not 
> > sure if
> you mean change bitwise induction expression if to else if.
> 
> Yes, use else if for the bitwise induction.  Can you also make the new case
> conditional on 'def'
> (the compute_overall_effect_of_inner_loop) being chrec_dont_know?  If that
> call produced something useful it will not be of either of the two special 
> forms.
> Thus like
> 
>   if (def != chrec_dont_know)
> /* Already OK.  */
> ;
>  else if ((bitinv_def = ...)
> ..
>  else if (tree_fits_uhwi_p (niter)
>      ... bitwise induction case...)
> ...
> 
> ?
> 
> Otherwise looks OK now.
> 
> Thanks,
> Richard.
> 
> > Do you agree with these changes?  Thanks again for taking a look.
> >
> > Thanks,
> > Lingling
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 23, 2022 3:27 PM
> > > To: Kong, Lingling 
> > > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle
> > > bitop with an invariant induction.[PR105735]
> > >
> > > On Thu, Aug 18, 2022 at 8:48 AM Kong, Lingling via Gcc-patches  > > patc...@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > This patch is for pr105735/pr101991. It will enable below optimization:
> > > > {
> > > > -  long unsigned int bit;
&

[PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-14 Thread Kong, Lingling via Gcc-patches
Hi

The patch is to fix vec_init_dup_v16bf, add correct handle for v16bf mode in 
ix86_expand_vector_init_duplicate.
Add testcase with sse2 without avx2.

OK for master? 

gcc/ChangeLog:

PR target/106887
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Fixed V16BF mode case.

gcc/testsuite/ChangeLog:

PR target/106887
* gcc.target/i386/vect-bfloat16-2c.c: New test.
---
 gcc/config/i386/i386-expand.cc|  1 +
 .../gcc.target/i386/vect-bfloat16-2c.c| 76 +++
 2 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc 
index d7b49c99dc8..9451c561489 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15111,6 +15111,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
{
  machine_mode hvmode = (mode == V16HImode ? V8HImode
 : mode == V16HFmode ? V8HFmode
+: mode == V16BFmode ? V8BFmode
 : V16QImode);
  rtx x = gen_reg_rtx (hvmode);
 
diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c 
b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
new file mode 100644
index 000..bead94e46a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
@@ -0,0 +1,76 @@
+/* { dg-do compile } */
+/* { dg-options "-mf16c -msse2 -mno-avx2 -O2" } */
+
+typedef __bf16 v8bf __attribute__ ((__vector_size__ (16))); typedef 
+__bf16 v16bf __attribute__ ((__vector_size__ (32)));
+
+#define VEC_EXTRACT(V,S,IDX)   \
+  S\
+  __attribute__((noipa))   \
+  vec_extract_##V##_##IDX (V v)\
+  {\
+return v[IDX]; \
+  }
+
+#define VEC_SET(V,S,IDX)   \
+  V\
+  __attribute__((noipa))   \
+  vec_set_##V##_##IDX (V v, S s)   \
+  {\
+v[IDX] = s;\
+return v;  \
+  }
+
+v8bf
+vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
+  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8) {
+return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8}; }
+
+v16bf
+vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
+  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
+  __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
+  __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16) {
+return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
+ a9, a10, a11, a12, a13, a14, a15, a16}; }
+
+v8bf
+vec_init_dup_v8bf (__bf16 a1)
+{
+return __extension__ (v8bf) {a1, a1, a1, a1, a1, a1, a1, a1}; }
+
+v16bf
+vec_init_dup_v16bf (__bf16 a1)
+{
+return __extension__ (v16bf) {a1, a1, a1, a1, a1, a1, a1, a1,
+ a1, a1, a1, a1, a1, a1, a1, a1};
+}
+
+/* { dg-final { scan-assembler-times "vpunpcklwd" 12 } } */
+/* { dg-final { scan-assembler-times "vpunpckldq" 6 } } */
+/* { dg-final { scan-assembler-times "vpunpcklqdq" 3 } } */
+
+VEC_EXTRACT (v8bf, __bf16, 0);
+VEC_EXTRACT (v8bf, __bf16, 4);
+VEC_EXTRACT (v16bf, __bf16, 0);
+VEC_EXTRACT (v16bf, __bf16, 3);
+VEC_EXTRACT (v16bf, __bf16, 8);
+VEC_EXTRACT (v16bf, __bf16, 15);
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
+/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
+/* { dg-final { scan-assembler-times "vextract" 4 } } */
+
+VEC_SET (v8bf, __bf16, 4);
+VEC_SET (v16bf, __bf16, 3);
+VEC_SET (v16bf, __bf16, 8);
+VEC_SET (v16bf, __bf16, 15);
+/* { dg-final { scan-assembler-times "vpblendw" 3 { target { ! ia32 } } 
+} } */
+
+/* { dg-final { scan-assembler-times "vpinsrw" 30 { target ia32 } } } 
+*/
+
--
2.18.2



RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-13 Thread Kong, Lingling via Gcc-patches
Hi Richard,

Thanks you so much for reviewing this patch.  I really appreciate it. For these 
review comments, I have made some changes.

> That's a single-stmt match, you shouldn't use match.pd matching for this.
> Instead just do
> 
>   if (is_gimple_assign (stmt)
>   && ((code = gimple_assign_rhs_code (stmt)), true)
>   && (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code ==
> BIT_XOR_EXPR))

Yes, I fixed it and dropped modification for match.pd.

> and pick gimple_assign_rhs{1,2} (stmt) as the operands.  The :c in bit_op:c is
> redundant btw. - while the name suggests "with invariant" you don't actually
> check for that.  But again, given canonicalization rules the invariant will 
> be rhs2
> so above add
> 
> && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST

For " with invariant", this needed op1 is invariant, and I used 
`expr_invariant_in_loop_p (loop, match_op[0])` for check.
And op2 just be PHI is ok. If op2 is INTEGER_CST, existing gcc can be directly 
optimized and do not need modification.

> you probably need dg-require-effective-target longlong, but is it necessary to
> use long long for the testcases in the first place?
> The IV seems to be unused, if it should match the variables bit size use 
> sizeof
> (type) * 8

Yes, It is not necessary to use long long for the testcases. I changed type to 
unsigned int.

> > +  inv = PHI_ARG_DEF_FROM_EDGE (header_phi, loop_preheader_edge
> > + (loop));  return fold_build2 (code1, type, inv, match_op[0]); }
> 
> The } goes to the next line.

Sorry, It might be something wrong with my use of gcc send-email format.

> > +  tree bitinv_def;
> > +  if ((bitinv_def
> 
> please use else if here

Sorry, If use the else if here, there is no corresponding above if. I'm not 
sure if you mean change bitwise induction expression if to else if.

Do you agree with these changes?  Thanks again for taking a look.

Thanks,
Lingling

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 23, 2022 3:27 PM
> To: Kong, Lingling 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Enhance final_value_replacement_loop to handle bitop
> with an invariant induction.[PR105735]
> 
> On Thu, Aug 18, 2022 at 8:48 AM Kong, Lingling via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi,
> >
> > This patch is for pr105735/pr101991. It will enable below optimization:
> > {
> > -  long unsigned int bit;
> > -
> > -   [local count: 32534376]:
> > -
> > -   [local count: 1041207449]:
> > -  # tmp_10 = PHI 
> > -  # bit_12 = PHI 
> > -  tmp_7 = bit2_6(D) & tmp_10;
> > -  bit_8 = bit_12 + 1;
> > -  if (bit_8 != 32)
> > -goto ; [96.97%]
> > -  else
> > -goto ; [3.03%]
> > -
> > -   [local count: 1009658865]:
> > -  goto ; [100.00%]
> > -
> > -   [local count: 32534376]:
> > -  # tmp_11 = PHI 
> > -  return tmp_11;
> > +  tmp_11 = tmp_4(D) & bit2_6(D);
> > +  return tmp_11;
> >
> > }
> >
> > Ok for master ?
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/105735
> > * match.pd (bitop_with_inv_p): New match.
> > * tree-scalar-evolution.cc (gimple_bitop_with_inv_p): Declare.
> > (analyze_and_compute_bitop_with_inv_effect): New function.
> > (final_value_replacement_loop): Enhanced to handle bitop
> > with inv induction.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr105735-1.c: New test.
> > * gcc.target/i386/pr105735-2.c: New test.
> > ---
> >  gcc/match.pd   |  4 +
> >  gcc/testsuite/gcc.target/i386/pr105735-1.c | 88 ++
> gcc/testsuite/gcc.target/i386/pr105735-2.c | 28 +++
> >  gcc/tree-scalar-evolution.cc   | 59 +++
> >  4 files changed, 179 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> > 562138a8034..cfe593ebb02 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -8050,6 +8050,10 @@ and,
> >   (bit_not
> >(nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2))
> > @3
> >
> > +(for bit_op (bit_and bit_ior bit_xor)  (match (bitop_with_inv_p @0
> > +@1)
> > +  (bit_op:c @0 @1)))
> > +
> 
> That's a single-stmt match, you shouldn't use match.pd matching for this.
> Instead just do
> 
>   if (is_gimple_a

RE: [PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-09-02 Thread Kong, Lingling via Gcc-patches
Hi,

I fixed it in a new patch.  And added BF vector mode in SUBST_V and 
avx512fmaskhalfmode for @vec_interleave_high.
Ok for trunk ?

> > Hi,
> >
> > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and
> ix86_expand_vector_init_duplicate.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/106742
> > * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
> > Handle V8BF mode.
> > (expand_vec_perm_broadcast_1): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr106742.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc   | 17 -
> >  gcc/testsuite/gcc.target/i386/pr106742.c | 10 ++
> >  2 files changed, 22 insertions(+), 5 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.target/i386/pr106742.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc
> > b/gcc/config/i386/i386-expand.cc index 4b216308a18..a08222fe1b6 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -15030,11 +15030,15 @@ ix86_expand_vector_init_duplicate (bool
> mmx_ok, machine_mode mode,
> >   dperm.op0 = dperm.op1 = gen_reg_rtx (mode);
> >   dperm.one_operand_p = true;
> >
> > - if (mode == V8HFmode)
> > + if (mode == V8HFmode || mode == V8BFmode)
> > {
> > - tmp1 = force_reg (HFmode, val);
> > + rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
> > + tmp1 = mode == V8HFmode ? force_reg (HFmode, val)
> > + : force_reg (BFmode, val);
> tmp1 = force_reg (GET_MODE_INNER (mode), val);
> >   tmp2 = gen_reg_rtx (mode);
> > - emit_insn (gen_vec_setv8hf_0 (tmp2, CONST0_RTX (mode), tmp1));
> > + gen_vec_set_0 = mode == V8HFmode ? gen_vec_setv8hf_0
> > +  : gen_vec_setv8bf_0;
> add @ to vec_set_0 as (define_insn "@vec_set_0" and pass
> mode to vec_set_0 as emit_insn (gen_vec_set_0 (mode, tmp2, CONST0_RTX
> (mode), tmp1));
> > + emit_insn (gen_vec_set_0 (tmp2, CONST0_RTX (mode),
> > + tmp1));
> 
> >   tmp1 = gen_lowpart (mode, tmp2);
> > }
> >   else
> > @@ -21822,17 +21826,20 @@ expand_vec_perm_broadcast_1 (struct
> expand_vec_perm_d *d)
> >return true;
> >
> >  case E_V8HFmode:
> > +case E_V8BFmode:
> >/* This can be implemented via interleave and pshufd.  */
> >if (d->testing_p)
> > return true;
> >
> >if (elt >= nelt2)
> > {
> > - gen = gen_vec_interleave_highv8hf;
> > + gen = vmode == V8HFmode ? gen_vec_interleave_highv8hf
> > + : gen_vec_interleave_highv8bf;
> Similar, add @ to define_insn and pass gen_vec_interleave.
> >   elt -= nelt2;
> > }
> >else
> > -   gen = gen_vec_interleave_lowv8hf;
> > +   gen = vmode == V8HFmode ? gen_vec_interleave_lowv8hf
> > +   : gen_vec_interleave_lowv8bf;
> >nelt2 /= 2;
> >
> >dest = gen_reg_rtx (vmode);
> > diff --git a/gcc/testsuite/gcc.target/i386/pr106742.c
> > b/gcc/testsuite/gcc.target/i386/pr106742.c
> > new file mode 100644
> > index 000..4a53cd49902
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr106742.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options " -msse2 -mno-avx2 -O1" } */ typedef __bf16 v8bf
> > +__attribute__ ((__vector_size__ (16)));
> > +
> > +v8bf
> > +vec_init_dup_v8bf (__bf16 a1)
> > +{
> > +  return __extension__ (v8bf) { a1, a1, a1, a1, a1, a1, a1, a1 }; }
> > +/* { dg-final { scan-assembler-times "punpcklwd" 1} } */
> > --
> > 2.18.2
> >
> 
> 
> --
> BR,
> Hongtao


0001-x86-Handle-V8BF-in-expand_vec_perm_broadcast_1.patch
Description: 0001-x86-Handle-V8BF-in-expand_vec_perm_broadcast_1.patch


RE: [PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-31 Thread Kong, Lingling via Gcc-patches
Hi  Richard,  could you help to have a look for the patch ?

Ok for master ?

> Hi,
> 
> The conditional mult reduction cannot be recognized with current GCC. The
> following loop cannot be vectorized.
> Now add MULT_EXPR recognition for conditional scalar reduction.
> 
> float summa(int n, float *arg1, float *arg2)
> {
> int i;
> float res1 = 1.0;
> for(i = 0; i < n; i++) {
>   if(arg2[i])
> res1 *= arg1[i];
> }
> return res1;
> }
> 
> gcc/ChangeLog:
> 
>   * tree-if-conv.cc (is_cond_scalar_reduction): Add MULT_EXPR
>   recognition.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/gen-vect-34.c: New test.
>   * gcc.dg/vect/vect-ifcvt-18.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c | 16 +
>  gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c   | 38 +
>  gcc/tree-if-conv.cc |  1 +
>  3 files changed, 55 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> new file mode 100644
> index 000..8d2d36401fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } }
> +} */
> +
> +float summul(int n, float *arg1, float *arg2)
> +{
> +int i;
> +float res1 = 1.0;
> +for(i = 0; i < n; i++) {
> +  if(arg2[i])
> +res1 *= arg1[i];
> +}
> +return res1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {
> +target { ! { avr-*-* pru-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> new file mode 100644
> index 000..c1d3c27d819
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
> @@ -0,0 +1,38 @@
> +/* { dg-require-effective-target vect_condition } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-additional-options "-Ofast -mavx" { target avx_runtime } } */
> +
> +
> +int A0[4] = {36,39,42,45};
> +int B0[4] = {42,42,0,42};
> +float A1[8] = {36,39,42,45,43,32,21,12}; float B1[8] =
> +{42,42,0,42,42,42,0,42}; double A2[16] =
> +{36,39,42,45,43,32,21,12,23,34,45,56,42,78,89,11};
> +double B2[16] = {42,42,0,42,42,42,42,42,42,42,42,42,0,42,42,42};
> +
> +int main ()
> +{
> +  int i, j;
> +  int res0 = 1;
> +  float res1 = 1.0;
> +  double res2 = 1.0;
> +
> +  for (i = 0; i < 4; i++)
> +if (B0[i])
> +  res0 *= A0[i];
> +
> +  for (i = 0; i < 8; i++)
> +if (B1[i])
> +  res1 *= A1[i];
> +
> +  for (i = 0; i < 16; i++)
> +if (B2[i])
> +  res2 *= A2[i];
> +  /* check results:  */
> +  if (res0 != 63180 || res1 != 1043228160.00
> +  ||res2 != 3296728515318523101184.00)
> +  __builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 3 loops" "vect" { target
> +i?86-*-* x86_64-*-* } } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index
> 1c8e1a45234..bac29fb5574 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1739,6 +1739,7 @@ is_cond_scalar_reduction (gimple *phi, gimple
> **reduc, tree arg_0, tree arg_1,
> 
>if (reduction_op != PLUS_EXPR
>&& reduction_op != MINUS_EXPR
> +  && reduction_op != MULT_EXPR
>&& reduction_op != BIT_IOR_EXPR
>&& reduction_op != BIT_XOR_EXPR
>&& reduction_op != BIT_AND_EXPR)
> --
> 2.18.2



[PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-08-31 Thread Kong, Lingling via Gcc-patches
Hi,

Handle E_V8BFmode in expand_vec_perm_broadcast_1 and 
ix86_expand_vector_init_duplicate.
Ok for trunk?

gcc/ChangeLog:

PR target/106742
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Handle V8BF mode.
(expand_vec_perm_broadcast_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106742.c: New test.
---
 gcc/config/i386/i386-expand.cc   | 17 -
 gcc/testsuite/gcc.target/i386/pr106742.c | 10 ++
 2 files changed, 22 insertions(+), 5 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/pr106742.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc 
index 4b216308a18..a08222fe1b6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15030,11 +15030,15 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
  dperm.op0 = dperm.op1 = gen_reg_rtx (mode);
  dperm.one_operand_p = true;
 
- if (mode == V8HFmode)
+ if (mode == V8HFmode || mode == V8BFmode)
{
- tmp1 = force_reg (HFmode, val);
+ rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
+ tmp1 = mode == V8HFmode ? force_reg (HFmode, val)
+ : force_reg (BFmode, val);
  tmp2 = gen_reg_rtx (mode);
- emit_insn (gen_vec_setv8hf_0 (tmp2, CONST0_RTX (mode), tmp1));
+ gen_vec_set_0 = mode == V8HFmode ? gen_vec_setv8hf_0
+  : gen_vec_setv8bf_0;
+ emit_insn (gen_vec_set_0 (tmp2, CONST0_RTX (mode), tmp1));
  tmp1 = gen_lowpart (mode, tmp2);
}
  else
@@ -21822,17 +21826,20 @@ expand_vec_perm_broadcast_1 (struct expand_vec_perm_d 
*d)
   return true;
 
 case E_V8HFmode:
+case E_V8BFmode:
   /* This can be implemented via interleave and pshufd.  */
   if (d->testing_p)
return true;
 
   if (elt >= nelt2)
{
- gen = gen_vec_interleave_highv8hf;
+ gen = vmode == V8HFmode ? gen_vec_interleave_highv8hf
+ : gen_vec_interleave_highv8bf;
  elt -= nelt2;
}
   else
-   gen = gen_vec_interleave_lowv8hf;
+   gen = vmode == V8HFmode ? gen_vec_interleave_lowv8hf
+   : gen_vec_interleave_lowv8bf;
   nelt2 /= 2;
 
   dest = gen_reg_rtx (vmode);
diff --git a/gcc/testsuite/gcc.target/i386/pr106742.c 
b/gcc/testsuite/gcc.target/i386/pr106742.c
new file mode 100644
index 000..4a53cd49902
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106742.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options " -msse2 -mno-avx2 -O1" } */
+typedef __bf16 v8bf __attribute__ ((__vector_size__ (16)));
+
+v8bf
+vec_init_dup_v8bf (__bf16 a1)
+{
+  return __extension__ (v8bf) { a1, a1, a1, a1, a1, a1, a1, a1 }; }
+/* { dg-final { scan-assembler-times "punpcklwd" 1} } */
--
2.18.2



[PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-25 Thread Kong, Lingling via Gcc-patches
Hi,

The conditional mult reduction cannot be recognized with current GCC. The 
following loop cannot be vectorized.
Now add MULT_EXPR recognition for conditional scalar reduction.

float summa(int n, float *arg1, float *arg2)
{  
int i; 
float res1 = 1.0;
for(i = 0; i < n; i++) {
  if(arg2[i]) 
res1 *= arg1[i];
}  
return res1;   
}

gcc/ChangeLog:

* tree-if-conv.cc (is_cond_scalar_reduction): Add MULT_EXPR
recognition.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-34.c: New test.
* gcc.dg/vect/vect-ifcvt-18.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c | 16 +
 gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c   | 38 +
 gcc/tree-if-conv.cc |  1 +
 3 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c 
b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
new file mode 100644
index 000..8d2d36401fe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-vect-details" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } 
+} */
+
+float summul(int n, float *arg1, float *arg2)
+{  
+int i; 
+float res1 = 1.0;
+for(i = 0; i < n; i++) {
+  if(arg2[i]) 
+res1 *= arg1[i];
+}  
+return res1;   
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { 
+target { ! { avr-*-* pru-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c 
b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
new file mode 100644
index 000..c1d3c27d819
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-18.c
@@ -0,0 +1,38 @@
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-Ofast -mavx" { target avx_runtime } } */
+
+
+int A0[4] = {36,39,42,45};
+int B0[4] = {42,42,0,42};
+float A1[8] = {36,39,42,45,43,32,21,12}; float B1[8] = 
+{42,42,0,42,42,42,0,42}; double A2[16] = 
+{36,39,42,45,43,32,21,12,23,34,45,56,42,78,89,11};
+double B2[16] = {42,42,0,42,42,42,42,42,42,42,42,42,0,42,42,42};
+
+int main ()
+{
+  int i, j;
+  int res0 = 1;
+  float res1 = 1.0;
+  double res2 = 1.0;
+
+  for (i = 0; i < 4; i++)
+if (B0[i])
+  res0 *= A0[i];
+
+  for (i = 0; i < 8; i++)
+if (B1[i])
+  res1 *= A1[i];
+  
+  for (i = 0; i < 16; i++)
+if (B2[i])
+  res2 *= A2[i];
+  /* check results:  */
+  if (res0 != 63180 || res1 != 1043228160.00
+  ||res2 != 3296728515318523101184.00)
+  __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 3 loops" "vect" { target 
+i?86-*-* x86_64-*-* } } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 
1c8e1a45234..bac29fb5574 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1739,6 +1739,7 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, 
tree arg_0, tree arg_1,
 
   if (reduction_op != PLUS_EXPR
   && reduction_op != MINUS_EXPR
+  && reduction_op != MULT_EXPR
   && reduction_op != BIT_IOR_EXPR
   && reduction_op != BIT_XOR_EXPR
   && reduction_op != BIT_AND_EXPR)
--
2.18.2



RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-22 Thread Kong, Lingling via Gcc-patches
Hi  Richard,  could you help to have a look for the patch ?
 
> Hi,
> 
> This patch is for pr105735/pr101991. It will enable below optimization:
> {
> -  long unsigned int bit;
> -
> -   [local count: 32534376]:
> -
> -   [local count: 1041207449]:
> -  # tmp_10 = PHI 
> -  # bit_12 = PHI 
> -  tmp_7 = bit2_6(D) & tmp_10;
> -  bit_8 = bit_12 + 1;
> -  if (bit_8 != 32)
> -goto ; [96.97%]
> -  else
> -goto ; [3.03%]
> -
> -   [local count: 1009658865]:
> -  goto ; [100.00%]
> -
> -   [local count: 32534376]:
> -  # tmp_11 = PHI 
> -  return tmp_11;
> +  tmp_11 = tmp_4(D) & bit2_6(D);
> +  return tmp_11;
> 
> }
> 
> Ok for master ?
> 
> gcc/ChangeLog:
> 
>   PR middle-end/105735
>   * match.pd (bitop_with_inv_p): New match.
>   * tree-scalar-evolution.cc (gimple_bitop_with_inv_p): Declare.
>   (analyze_and_compute_bitop_with_inv_effect): New function.
>   (final_value_replacement_loop): Enhanced to handle bitop
>   with inv induction.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr105735-1.c: New test.
>   * gcc.target/i386/pr105735-2.c: New test.
> ---
>  gcc/match.pd   |  4 +
>  gcc/testsuite/gcc.target/i386/pr105735-1.c | 88 ++
> gcc/testsuite/gcc.target/i386/pr105735-2.c | 28 +++
>  gcc/tree-scalar-evolution.cc   | 59 +++
>  4 files changed, 179 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-2.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd index 562138a8034..cfe593ebb02
> 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8050,6 +8050,10 @@ and,
>   (bit_not
>(nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) @3
> 
> +(for bit_op (bit_and bit_ior bit_xor)
> + (match (bitop_with_inv_p @0 @1)
> +  (bit_op:c @0 @1)))
> +
>  /* n - (((n > C1) ? n : C1) & -C2) ->  n & C1 for unsigned case.
> n - (((n > C1) ? n : C1) & -C2) ->  (n <= C1) ? n : (n & C1) for signed 
> case.  */
> (simplify diff --git a/gcc/testsuite/gcc.target/i386/pr105735-1.c
> b/gcc/testsuite/gcc.target/i386/pr105735-1.c
> new file mode 100644
> index 000..8d2123ed351
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105735-1.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> +/* { dg-final { scan-tree-dump-times {final value replacement} 8 "sccp"
> +} } */
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 0; bit < 64; bit++)
> +tmp &= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo1 (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 63; bit >= 0; bit -=3)
> +tmp &= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo2 (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 0; bit < 64; bit++)
> +tmp |= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo3 (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 63; bit >= 0; bit -=3)
> +tmp |= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo4 (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 0; bit < 64; bit++)
> +tmp ^= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +foo5 (unsigned long long tmp, unsigned long long bit2) {
> +  for (int bit = 0; bit < 63; bit++)
> +tmp ^= bit2;
> +  return tmp;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +f (unsigned long long tmp, long long bit, unsigned long long bit2) {
> +  unsigned long long res = tmp;
> +  for (long long i = 0; i < bit; i++)
> +res &= bit2;
> +  return res;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +f1 (unsigned long long tmp, long long bit, unsigned long long bit2) {
> +  unsigned long long res = tmp;
> +  for (long long i = 0; i < bit; i++)
> +res |= bit2;
> +  return res;
> +}
> +
> +unsigned long long
> +__attribute__((noipa))
> +f2 (unsigned long long tmp, long long bit, unsigned long long bit2) {
> +  unsigned long long res = tmp;
> +  for (long long i = 0; i < bit; i++)
> +res ^= bit2;
> +  return res;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/pr105735-2.c
> b/gcc/testsuite/gcc.target/i386/pr105735-2.c
> new file mode 100644
> index 000..79c1d300b1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr105735-2.c
> @@ -0,0 +1,28 @@
> +/* { dg-do run } */
> +/* { dg-options "-O1" } */
> +
> +#include "pr105735-1.c"
> +
> +int main()
> +{
> +  unsigned long long tmp = 0x1101101ULL;
> +  unsigned long long bit2 = 0x11100111ULL;
> +  if (foo (tmp, bit2) != 0x1100101ULL)
> +__builtin_abort ();
> +  if (foo1 (tmp, bit2) != 0x1100101ULL)
> +__builtin_abort ();
> +  if (foo2 (tmp, bit2) != 

[wwwdocs] [GCC13] Mention Intel __bf16 support.

2022-08-18 Thread Kong, Lingling via Gcc-patches
Hi

The patch is for mention Intel __bf16 support in gcc13.
Ok for master ?

Thanks,
Lingling

htdocs/gcc-13/changes.html | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 
57bd8724..7d98329c 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -122,7 +122,12 @@ a work-in-progress.
 
 
 
-
+IA-32/x86-64
+
+  For both C and C++ the __bf16 type is supported on
+  x86 systems with SSE2 and above enabled.
+  
+
 
 
 
--
2.18.2



[PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-18 Thread Kong, Lingling via Gcc-patches
Hi,

This patch is for pr105735/pr101991. It will enable below optimization:
{
-  long unsigned int bit;
-
-   [local count: 32534376]:
-
-   [local count: 1041207449]:
-  # tmp_10 = PHI 
-  # bit_12 = PHI 
-  tmp_7 = bit2_6(D) & tmp_10;
-  bit_8 = bit_12 + 1;
-  if (bit_8 != 32)
-goto ; [96.97%]
-  else
-goto ; [3.03%]
-
-   [local count: 1009658865]:
-  goto ; [100.00%]
-
-   [local count: 32534376]:
-  # tmp_11 = PHI 
-  return tmp_11;
+  tmp_11 = tmp_4(D) & bit2_6(D);
+  return tmp_11;

}

Ok for master ?

gcc/ChangeLog:

PR middle-end/105735
* match.pd (bitop_with_inv_p): New match.
* tree-scalar-evolution.cc (gimple_bitop_with_inv_p): Declare.
(analyze_and_compute_bitop_with_inv_effect): New function.
(final_value_replacement_loop): Enhanced to handle bitop
with inv induction.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105735-1.c: New test.
* gcc.target/i386/pr105735-2.c: New test.
---
 gcc/match.pd   |  4 +
 gcc/testsuite/gcc.target/i386/pr105735-1.c | 88 ++  
gcc/testsuite/gcc.target/i386/pr105735-2.c | 28 +++
 gcc/tree-scalar-evolution.cc   | 59 +++
 4 files changed, 179 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-2.c

diff --git a/gcc/match.pd b/gcc/match.pd index 562138a8034..cfe593ebb02 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8050,6 +8050,10 @@ and,
  (bit_not
   (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) @3
 
+(for bit_op (bit_and bit_ior bit_xor)
+ (match (bitop_with_inv_p @0 @1)
+  (bit_op:c @0 @1)))
+
 /* n - (((n > C1) ? n : C1) & -C2) ->  n & C1 for unsigned case.
n - (((n > C1) ? n : C1) & -C2) ->  (n <= C1) ? n : (n & C1) for signed 
case.  */  (simplify diff --git a/gcc/testsuite/gcc.target/i386/pr105735-1.c 
b/gcc/testsuite/gcc.target/i386/pr105735-1.c
new file mode 100644
index 000..8d2123ed351
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105735-1.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-sccp-details" } */
+/* { dg-final { scan-tree-dump-times {final value replacement} 8 "sccp" 
+} } */
+
+unsigned long long
+__attribute__((noipa))
+foo (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 0; bit < 64; bit++)
+tmp &= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo1 (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 63; bit >= 0; bit -=3)
+tmp &= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo2 (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 0; bit < 64; bit++)
+tmp |= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo3 (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 63; bit >= 0; bit -=3)
+tmp |= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo4 (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 0; bit < 64; bit++)
+tmp ^= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo5 (unsigned long long tmp, unsigned long long bit2) {
+  for (int bit = 0; bit < 63; bit++)
+tmp ^= bit2;
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+f (unsigned long long tmp, long long bit, unsigned long long bit2) {
+  unsigned long long res = tmp;
+  for (long long i = 0; i < bit; i++)
+res &= bit2;
+  return res;
+}
+
+unsigned long long
+__attribute__((noipa))
+f1 (unsigned long long tmp, long long bit, unsigned long long bit2) {
+  unsigned long long res = tmp;
+  for (long long i = 0; i < bit; i++)
+res |= bit2;
+  return res;
+}
+
+unsigned long long
+__attribute__((noipa))
+f2 (unsigned long long tmp, long long bit, unsigned long long bit2) {
+  unsigned long long res = tmp;
+  for (long long i = 0; i < bit; i++)
+res ^= bit2;
+  return res;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/pr105735-2.c 
b/gcc/testsuite/gcc.target/i386/pr105735-2.c
new file mode 100644
index 000..79c1d300b1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105735-2.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+#include "pr105735-1.c"
+
+int main()
+{
+  unsigned long long tmp = 0x1101101ULL;
+  unsigned long long bit2 = 0x11100111ULL;
+  if (foo (tmp, bit2) != 0x1100101ULL)
+__builtin_abort ();
+  if (foo1 (tmp, bit2) != 0x1100101ULL)
+__builtin_abort ();
+  if (foo2 (tmp, bit2) != 0x1110ULL)
+__builtin_abort ();
+  if (foo3 (tmp, bit2) != 0x1110ULL)
+__builtin_abort ();
+  if (foo4 (tmp, bit2) != 0x1101101ULL)
+__builtin_abort ();
+  if (foo5 (tmp, bit2) != 0x111010011010ULL)
+__builtin_abort ();
+  if (f (tmp, 64, bit2) != 0x1100101ULL)
+__builtin_abort ();
+  if (f1 (tmp, 64, bit2) != 0x1110ULL)
+

[PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Kong, Lingling via Gcc-patches
Hi,

The patch is support vector init/broadcast/set/extract for __bf16 type.
The __bf16 type is a storage type.

OK for master?

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector
BFmode.
(ix86_expand_vector_init_duplicate): Support vector BFmode.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_one_var): Ditto.
(ix86_expand_vector_init_concat): Ditto.
(ix86_expand_vector_init_interleave): Ditto.
(ix86_expand_vector_init_general): Ditto.
(ix86_expand_vector_init): Ditto.
(ix86_expand_vector_set_var): Ditto.
(ix86_expand_vector_set): Ditto.
(ix86_expand_vector_extract): Ditto.
* config/i386/i386.cc (classify_argument): Add BF vector modes.
(function_arg_64): Ditto.
(ix86_gimplify_va_arg): Ditto.
(ix86_get_ssemov): Ditto.
* config/i386/i386.h (VALID_AVX256_REG_MODE): Add BF vector modes.
(VALID_AVX512F_REG_MODE): Ditto.
(host_detect_local_cpu): Ditto.
(VALID_SSE2_REG_MODE): Ditto.
* config/i386/i386.md: Add BF vector modes.
(MODE_SIZE): Ditto.
(ssemodesuffix): Add bf suffix for BF vector modes.
(ssevecmode): Ditto.
* config/i386/sse.md (VMOVE): Adjust for BF vector modes.
(VI12HFBF_AVX512VL): Ditto.
(V_256_512): Ditto.
(VF_AVX512HFBF16): Ditto.
(VF_AVX512BWHFBF16): Ditto.
(VIHFBF): Ditto.
(avx512): Ditto.
(VIHFBF_256): Ditto.
(VIHFBF_AVX512BW): Ditto.
(VI2F_256_512):Ditto.
(V8_128):Ditto.
(V16_256): Ditto.
(V32_512): Ditto.
(sseinsnmode): Ditto.
(sseconstm1): Ditto.
(sseintmodesuffix): New mode_attr.
(avx512fmaskmode): Ditto.
(avx512fmaskmodelower): Ditto.
(ssedoublevecmode): Ditto.
(ssehalfvecmode): Ditto.
(ssehalfvecmodelower): Ditto.
(ssescalarmode): Add vector BFmode mapping.
(ssescalarmodelower): Ditto.
(ssexmmmode): Ditto.
(ternlogsuffix): Ditto.
(ssescalarsize): Ditto.
(sseintprefix): Ditto.
(i128): Ditto.
(xtg_mode): Ditto.
(bcstscalarsuff): Ditto.
(_blendm): New define_insn for BFmode.
(_store_mask): Ditto.
(vcond_mask_): Ditto.
(vec_set_0): New define_insn for BF vector set.
(V8BFH_128): New mode_iterator for BFmode.
(avx512fp16_mov): Ditto.
(vec_set): New define_insn for BF vector set.
(@vec_extract_hi_): Ditto.
(@vec_extract_lo_): Ditto.
(vec_set_hi_): Ditto.
(vec_set_lo_): Ditto.
(*vec_extract_0): New define_insn_and_split for BF
vector extract.
(*vec_extract): New define_insn.
(VEC_EXTRACT_MODE): Add BF vector modes.
(PINSR_MODE): Add V8BF.
(sse2p4_1): Ditto.
(pinsr_evex_isa): Ditto.
(_pinsr): Adjust to support
insert for V8BFmode.
(pbroadcast_evex_isa): Add BF vector modes.
(AVX2_VEC_DUP_MODE): Ditto.
(VEC_INIT_MODE): Ditto.
(VEC_INIT_HALF_MODE): Ditto.
(avx2_pbroadcast): Adjust to support BF vector mode
broadcast.
(avx2_pbroadcast_1): Ditto.
(_vec_dup_1): Ditto.
(_vec_dup_gpr):
Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/vect-bfloat16-1.C: New test.
* gcc.target/i386/vect-bfloat16-1.c: New test.
* gcc.target/i386/vect-bfloat16-2a.c: New test.
* gcc.target/i386/vect-bfloat16-2b.c: New test.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: New test.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: New test.
---
 gcc/config/i386/i386-expand.cc| 129 +++--
 gcc/config/i386/i386.cc   |  16 +-
 gcc/config/i386/i386.h|  12 +-
 gcc/config/i386/i386.md   |   9 +-
 gcc/config/i386/sse.md| 211 --
 .../g++.target/i386/vect-bfloat16-1.C |  13 +
 .../gcc.target/i386/vect-bfloat16-1.c |  30 ++
 .../gcc.target/i386/vect-bfloat16-2a.c| 121 
 .../gcc.target/i386/vect-bfloat16-2b.c|  22 ++
 .../i386/vect-bfloat16-typecheck_1.c  | 258 ++
 .../i386/vect-bfloat16-typecheck_2.c  | 248 +
 11 files changed, 950 insertions(+), 119 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/vect-bfloat16-1.C
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-typecheck_2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 

RE: [PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-08-03 Thread Kong, Lingling via Gcc-patches
Hi,

Old patch has some mistake in `*movbf_internal` , now disable BFmode constant 
double move in `*movbf_internal`.

Thanks,
Lingling

> -Original Message-
> From: Kong, Lingling 
> Sent: Tuesday, July 26, 2022 9:31 AM
> To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Cc: Kong, Lingling 
> Subject: [PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above
> 
> Hi,
> 
> The patch is enable __bf16 scalar type for target sse2 and above according to
> psABI(https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/35/diffs).
> The __bf16 type is a storage type like arm.
> 
> OK for master?
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-builtin-types.def (BFLOAT16): New primitive type.
>   * config/i386/i386-builtins.cc : Support __bf16 type for i386 backend.
>   (ix86_register_bf16_builtin_type): New function.
>   (ix86_bf16_type_node): New.
>   (ix86_bf16_ptr_type_node): Ditto.
>   (ix86_init_builtin_types): Add ix86_register_bf16_builtin_type function
> call.
>   * config/i386/i386-modes.def (FLOAT_MODE): Add BFmode.
>   (ADJUST_FLOAT_FORMAT): Ditto.
>   * config/i386/i386.cc (merge_classes): Handle BFmode.
>   (classify_argument): Ditto.
>   (examine_argument): Ditto.
>   (construct_container): Ditto.
>   (function_value_32): Return __bf16 by %xmm0.
>   (function_value_64): Return __bf16 by SSE register.
>   (ix86_print_operand): Handle CONST_DOUBLE BFmode.
>   (ix86_secondary_reload): Require gpr as intermediate register
>   to store __bf16 from sse register when sse4 is not available.
>   (ix86_scalar_mode_supported_p): Enable __bf16 under sse2.
>   (ix86_mangle_type): Add manlging for __bf16 type.
>   (ix86_invalid_conversion): New function for target hook.
>   (ix86_invalid_unary_op): Ditto.
>   (ix86_invalid_binary_op): Ditto.
>   (TARGET_INVALID_CONVERSION): New define for target hook.
>   (TARGET_INVALID_UNARY_OP): Ditto.
>   (TARGET_INVALID_BINARY_OP): Ditto.
>   * config/i386/i386.h (host_detect_local_cpu): Add BFmode.
>   * config/i386/i386.md (*pushhf_rex64): Change for BFmode.
>   (*push_rex64): Ditto.
>   (*pushhf): Ditto.
>   (*push): Ditto.
>   (*movhf_internal): Ditto.
>   (*mov_internal): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/bfloat_cpp_typecheck.C: New test.
>   * gcc.target/i386/bfloat16-1.c: Ditto.
>   * gcc.target/i386/sse2-bfloat16-1.c: Ditto.
>   * gcc.target/i386/sse2-bfloat16-2.c: Ditto.
>   * gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Ditto.
> ---
>  gcc/config/i386/i386-builtin-types.def|   1 +
>  gcc/config/i386/i386-builtins.cc  |  21 ++
>  gcc/config/i386/i386-modes.def|   2 +
>  gcc/config/i386/i386.cc   |  75 +-
>  gcc/config/i386/i386.h|   4 +-
>  gcc/config/i386/i386.md   |  32 +--
>  .../g++.target/i386/bfloat_cpp_typecheck.C|  10 +
>  gcc/testsuite/gcc.target/i386/bfloat16-1.c|  12 +
>  .../gcc.target/i386/sse2-bfloat16-1.c |   8 +
>  .../gcc.target/i386/sse2-bfloat16-2.c |  17 ++
>  .../i386/sse2-bfloat16-scalar-typecheck.c | 215 ++
>  11 files changed, 375 insertions(+), 22 deletions(-)  create mode 100644
> gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/bfloat16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-bfloat16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-bfloat16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-
> typecheck.c
> 
> diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-
> builtin-types.def
> index 7a2da1db0b0..63a360b0f8b 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -69,6 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16,
> short_unsigned_type_node)  DEF_PRIMITIVE_TYPE (INT64,
> long_long_integer_type_node)  DEF_PRIMITIVE_TYPE (UINT64,
> long_long_unsigned_type_node)  DEF_PRIMITIVE_TYPE (FLOAT16,
> ix86_float16_type_node)
> +DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
>  DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)  DEF_PRIMITIVE_TYPE
> (DOUBLE, double_type_node)  DEF_PRIMITIVE_TYPE (FLOAT80,
> float80_type_node) diff --git a/gcc/config/i386/i386-builtins.cc
> b/gcc/config/i386/i386-builtins.cc
> index fe7243c3837..6a04fb57e65 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -126,6 +126,9 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,  static GTY(()) tree
> ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
> 
>  tree ix86_float16_type_node = NULL_TREE;
> +tree ix86_bf16_type_node = NULL_TREE;
> +tree ix86_bf16_ptr_type_node = NULL_TREE;
> +
>  /* Retrieve an element from the above table, building some of
> the types lazily.  */
> 
> @@ -1366,6 +1369,22 @@ ix86_register_float16_builtin_type 

[PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-07-25 Thread Kong, Lingling via Gcc-patches
Hi,

The patch is enable __bf16 scalar type for target sse2 and above according to 
psABI(https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/35/diffs).
The __bf16 type is a storage type like arm.

OK for master?

gcc/ChangeLog:

* config/i386/i386-builtin-types.def (BFLOAT16): New primitive type.
* config/i386/i386-builtins.cc : Support __bf16 type for i386 backend.
(ix86_register_bf16_builtin_type): New function.
(ix86_bf16_type_node): New.
(ix86_bf16_ptr_type_node): Ditto.
(ix86_init_builtin_types): Add ix86_register_bf16_builtin_type function 
call.
* config/i386/i386-modes.def (FLOAT_MODE): Add BFmode.
(ADJUST_FLOAT_FORMAT): Ditto.
* config/i386/i386.cc (merge_classes): Handle BFmode.
(classify_argument): Ditto.
(examine_argument): Ditto.
(construct_container): Ditto.
(function_value_32): Return __bf16 by %xmm0.
(function_value_64): Return __bf16 by SSE register.
(ix86_print_operand): Handle CONST_DOUBLE BFmode.
(ix86_secondary_reload): Require gpr as intermediate register
to store __bf16 from sse register when sse4 is not available.
(ix86_scalar_mode_supported_p): Enable __bf16 under sse2.
(ix86_mangle_type): Add manlging for __bf16 type.
(ix86_invalid_conversion): New function for target hook.
(ix86_invalid_unary_op): Ditto.
(ix86_invalid_binary_op): Ditto.
(TARGET_INVALID_CONVERSION): New define for target hook.
(TARGET_INVALID_UNARY_OP): Ditto.
(TARGET_INVALID_BINARY_OP): Ditto.
* config/i386/i386.h (host_detect_local_cpu): Add BFmode.
* config/i386/i386.md (*pushhf_rex64): Change for BFmode.
(*push_rex64): Ditto.
(*pushhf): Ditto.
(*push): Ditto.
(*movhf_internal): Ditto.
(*mov_internal): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/bfloat_cpp_typecheck.C: New test.
* gcc.target/i386/bfloat16-1.c: Ditto.
* gcc.target/i386/sse2-bfloat16-1.c: Ditto.
* gcc.target/i386/sse2-bfloat16-2.c: Ditto.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Ditto.
---
 gcc/config/i386/i386-builtin-types.def|   1 +
 gcc/config/i386/i386-builtins.cc  |  21 ++
 gcc/config/i386/i386-modes.def|   2 +
 gcc/config/i386/i386.cc   |  75 +-
 gcc/config/i386/i386.h|   4 +-
 gcc/config/i386/i386.md   |  32 +--
 .../g++.target/i386/bfloat_cpp_typecheck.C|  10 +
 gcc/testsuite/gcc.target/i386/bfloat16-1.c|  12 +
 .../gcc.target/i386/sse2-bfloat16-1.c |   8 +
 .../gcc.target/i386/sse2-bfloat16-2.c |  17 ++
 .../i386/sse2-bfloat16-scalar-typecheck.c | 215 ++
 11 files changed, 375 insertions(+), 22 deletions(-)  create mode 100644 
gcc/testsuite/g++.target/i386/bfloat_cpp_typecheck.C
 create mode 100644 gcc/testsuite/gcc.target/i386/bfloat16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-bfloat16-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-bfloat16-2.c
 create mode 100644 
gcc/testsuite/gcc.target/i386/sse2-bfloat16-scalar-typecheck.c

diff --git a/gcc/config/i386/i386-builtin-types.def 
b/gcc/config/i386/i386-builtin-types.def
index 7a2da1db0b0..63a360b0f8b 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -69,6 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)  
DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)  DEF_PRIMITIVE_TYPE 
(UINT64, long_long_unsigned_type_node)  DEF_PRIMITIVE_TYPE (FLOAT16, 
ix86_float16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)  DEF_PRIMITIVE_TYPE (DOUBLE, 
double_type_node)  DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node) diff --git 
a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index fe7243c3837..6a04fb57e65 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -126,6 +126,9 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,  static GTY(()) tree 
ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
+tree ix86_bf16_type_node = NULL_TREE;
+tree ix86_bf16_ptr_type_node = NULL_TREE;
+
 /* Retrieve an element from the above table, building some of
the types lazily.  */
 
@@ -1366,6 +1369,22 @@ ix86_register_float16_builtin_type (void)
"_Float16");
 }
 
+static void
+ix86_register_bf16_builtin_type (void)
+{
+  ix86_bf16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (ix86_bf16_type_node) = 16;
+  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
+  layout_type (ix86_bf16_type_node);
+
+  if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
+{
+  lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
+   

[PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-14 Thread Kong, Lingling via Gcc-patches
Hi,

The patch is to fix _mm_[u]comixx_{ss,sd} codegen and add PF result.  These 
intrinsics have changed over time, like `_mm_comieq_ss ` old operation is 
`RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and new operation update is `RETURN ( 
a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0`.

OK for master?

gcc/ChangeLog:

PR target/106113
* config/i386/i386-builtin.def (BDESC): Fix [u]comi{ss,sd}
comparison due to intrinsics changed over time.
* config/i386/i386-expand.cc (ix86_ssecom_setcc):
Add unordered check and mode for sse comi codegen.
(ix86_expand_sse_comi): Add unordered check and check a different
CCmode.
(ix86_expand_sse_comi_round):Extract unordered check and mode part
in ix86_ssecom_setcc.

gcc/testsuite/ChangeLog:

PR target/106113
* gcc.target/i386/avx-vcomisd-pr106113-2.c: New test.
* gcc.target/i386/avx-vcomiss-pr106113-2.c: Ditto.
* gcc.target/i386/avx-vucomisd-pr106113-2.c: Ditto.
* gcc.target/i386/avx-vucomiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse-comiss-pr106113-1.c: Ditto.
* gcc.target/i386/sse-comiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse-ucomiss-pr106113-1.c: Ditto.
* gcc.target/i386/sse-ucomiss-pr106113-2.c: Ditto.
* gcc.target/i386/sse2-comisd-pr106113-1.c: Ditto.
* gcc.target/i386/sse2-comisd-pr106113-2.c: Ditto.
* gcc.target/i386/sse2-ucomisd-pr106113-1.c: Ditto.
* gcc.target/i386/sse2-ucomisd-pr106113-2.c: Ditto.
---
 gcc/config/i386/i386-builtin.def  |  32 ++--
 gcc/config/i386/i386-expand.cc| 140 +++---
 .../gcc.target/i386/avx-vcomisd-pr106113-2.c  |   8 +
 .../gcc.target/i386/avx-vcomiss-pr106113-2.c  |   8 +
 .../gcc.target/i386/avx-vucomisd-pr106113-2.c |   8 +
 .../gcc.target/i386/avx-vucomiss-pr106113-2.c |   8 +
 .../gcc.target/i386/sse-comiss-pr106113-1.c   |  19 +++
 .../gcc.target/i386/sse-comiss-pr106113-2.c   |  59 
 .../gcc.target/i386/sse-ucomiss-pr106113-1.c  |  19 +++
 .../gcc.target/i386/sse-ucomiss-pr106113-2.c  |  59 
 .../gcc.target/i386/sse2-comisd-pr106113-1.c  |  19 +++
 .../gcc.target/i386/sse2-comisd-pr106113-2.c  |  59 
 .../gcc.target/i386/sse2-ucomisd-pr106113-1.c |  19 +++
 .../gcc.target/i386/sse2-ucomisd-pr106113-2.c |  59 
 14 files changed, 450 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vcomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx-vucomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-comiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-ucomiss-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-comisd-pr106113-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-ucomisd-pr106113-2.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index fd160935e67..acb7e8ca64b 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -35,30 +35,30 @@
 IX86_BUILTIN__BDESC_##NEXT_KIND##_FIRST - 1.  */
 
 BDESC_FIRST (comi, COMI,
-   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
IX86_BUILTIN_COMIEQSS, UNEQ, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
IX86_BUILTIN_COMILTSS, UNLT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
IX86_BUILTIN_COMILESS, UNLE, 0)
+   OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comieq", 
IX86_BUILTIN_COMIEQSS, EQ, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comilt", 
IX86_BUILTIN_COMILTSS, LT, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comile", 
IX86_BUILTIN_COMILESS, LE, 0)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comigt", 
IX86_BUILTIN_COMIGTSS, GT, 0)
 BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comige", 
IX86_BUILTIN_COMIGESS, GE, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_comi, "__builtin_ia32_comineq", 
IX86_BUILTIN_COMINEQSS, LTGT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomieq", 
IX86_BUILTIN_UCOMIEQSS, UNEQ, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomilt", 
IX86_BUILTIN_UCOMILTSS, UNLT, 0)
-BDESC (OPTION_MASK_ISA_SSE, 0, CODE_FOR_sse_ucomi, "__builtin_ia32_ucomile", 
IX86_BUILTIN_UCOMILESS, UNLE, 0)
+BDESC (OPTION_MASK_ISA_SSE, 0, 

RE: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Kong, Lingling via Gcc-patches
Thanks a lot! I fixed it.

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 54d8ad41a6f..151770f59f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -478,6 +478,7 @@ Jeff Knaggs 

 Michael Koch   
 Nicolas Koenig 
 Boris Kolpackov

+Lingling Kong  
 Dave Korn  
 Julia Koval
 Matt Kraai 
-- 
2.18.2

> -Original Message-
> From: Hongyu Wang 
> Sent: Monday, June 27, 2022 4:32 PM
> To: Kong, Lingling 
> Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] MAINTAINERS: Add myself for write after approval
> 
> Sorry, should be between
> 
> Boris Kolpackov  Dave Korn
> 
> 
> Hongyu Wang  于2022年6月27日周一 16:29
> 写道:
> >
> > According to the official guide, please sort your last name in
> > alphabetical order, which means you shold put your name between
> >
> > Dave Korn  Julia Koval
> > 
> >
> > Kong, Lingling via Gcc-patches  于2022年6月27
> 日周一
> > 16:05写道:
> >
> > >
> > > Hi,
> > >
> > > I want to add myself in MAINTANINER for write after approval.
> > >
> > > OK for master?
> > >
> > > ChangeLog:
> > >
> > > * MAINTAINERS (Write After Approval): Add myself.
> > > ---
> > >  MAINTAINERS | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS index
> > > 54d8ad41a6f..49627e5d113 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -698,6 +698,7 @@ Shujing Zhao
> 
> > >  Jon Ziegler
> > >  Roman Zippel   
> > >  Josef Zlomek   
> > > +Lingling Kong  
> > >
> > > Bug database only accounts
> > >
> > > --
> > > 2.18.1
> > >


[PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Kong, Lingling via Gcc-patches
Hi,

I want to add myself in MAINTANINER for write after approval.

OK for master?

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 54d8ad41a6f..49627e5d113 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -698,6 +698,7 @@ Shujing Zhao

 Jon Ziegler
 Roman Zippel   
 Josef Zlomek   
+Lingling Kong  
 
Bug database only accounts
 
-- 
2.18.1



[PATCH] i386: Enable intrinsics that convert float and bf16 data to each other.

2021-12-21 Thread Kong, Lingling via Gcc-patches
Hi,


This patch is to enable intrinsics that convert float and bf16 data to each 
other.
Ok for master?

gcc/ChangeLog:

* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic.
(_mm512_cvtpbh_ps): Likewise.
(_mm512_maskz_cvtpbh_ps): Likewise.
(_mm512_mask_cvtpbh_ps): Likewise.
* config/i386/avx512bf16vlintrin.h (_mm_cvtness_sbh): Likewise.
(_mm_cvtpbh_ps): Likewise.
(_mm256_cvtpbh_ps): Likewise.
(_mm_maskz_cvtpbh_ps): Likewise.
(_mm256_maskz_cvtpbh_ps): Likewise.
(_mm_mask_cvtpbh_ps): Likewise.
(_mm256_mask_cvtpbh_ps): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: New test.
* gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Ditto.
* gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c: Ditto.
---
 gcc/config/i386/avx512bf16intrin.h| 36 +++
 gcc/config/i386/avx512bf16vlintrin.h  | 63 +++
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  | 15 +  
.../gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c | 20 ++
 .../i386/avx512bf16vl-cvtness2sbh-1.c | 14 +
 .../i386/avx512bf16vl-vcvtpbh2ps-1.c  | 29 +
 6 files changed, 177 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-vcvtpbh2ps-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtpbh2ps-1.c

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index 9afc6bd7d2b..6b62dc3e398 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -41,6 +41,16 @@ typedef short __v32bh __attribute__ ((__vector_size__ (64)));
vector types, and their scalar components.  */  typedef short __m512bh 
__attribute__ ((__vector_size__ (64), __may_alias__));
 
+/* Convert One BF16 Data to One Single Float Data.  */ extern __inline 
+float __attribute__ ((__gnu_inline__, __always_inline__, 
+__artificial__)) _mm_cvtsbh_ss (__bfloat16 __A) {
+  union{ float a; unsigned int b;} __tmp;
+  __tmp.b = ((unsigned int)(__A)) << 16;
+  return __tmp.a;
+}
+
 /* vcvtne2ps2bf16 */
 
 extern __inline __m512bh
@@ -110,6 +120,32 @@ _mm512_maskz_dpbf16_ps (__mmask16 __A, __m512 __B, 
__m512bh __C, __m512bh __D)
   return (__m512)__builtin_ia32_dpbf16ps_v16sf_maskz(__B, __C, __D, __A);  }
 
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_cvtpbh_ps (__m256bh __A) {
+  return (__m512)_mm512_castsi512_ps ((__m512i)_mm512_slli_epi32 (
+(__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16)); }
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_maskz_cvtpbh_ps (__mmask16 __U, __m256bh __A) {
+  return (__m512)_mm512_castsi512_ps ((__m512i) _mm512_slli_epi32 (
+(__m512i)_mm512_maskz_cvtepi16_epi32 (
+(__mmask16)__U, (__m256i)__A), 16));
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_mask_cvtpbh_ps (__m512 __S, __mmask16 __U, __m256bh __A) {
+  return (__m512)_mm512_castsi512_ps ((__m512i)(_mm512_mask_slli_epi32 (
+(__m512i)__S, (__mmask16)__U,
+(__m512i)_mm512_cvtepi16_epi32 ((__m256i)__A), 16))); }
+
 #ifdef __DISABLE_AVX512BF16__
 #undef __DISABLE_AVX512BF16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512bf16vlintrin.h 
b/gcc/config/i386/avx512bf16vlintrin.h
index 6dd396d4008..5e6a6503aa6 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -43,6 +43,7 @@ typedef short __v8bh __attribute__ ((__vector_size__ (16)));  
typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__));  
typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__));
 
+typedef unsigned short __bfloat16;
 /* vcvtne2ps2bf16 */
 
 extern __inline __m256bh
@@ -175,6 +176,68 @@ _mm_maskz_dpbf16_ps (__mmask8 __A, __m128 __B, __m128bh 
__C, __m128bh __D)
   return (__m128)__builtin_ia32_dpbf16ps_v4sf_maskz(__B, __C, __D, __A);  }
 
+extern __inline __bfloat16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_cvtness_sbh (float __A) {
+  __v4sf __V = {__A, 0, 0, 0};
+  __v8hi __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
+  (__v8hi)_mm_undefined_si128 (), (__mmask8)-1);
+  return __R[0];
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_cvtpbh_ps (__m128bh __A) {
+  return (__m128)_mm_castsi128_ps ((__m128i)_mm_slli_epi32 (
+(__m128i)_mm_cvtepi16_epi32 ((__m128i)__A), 16)); }
+
+extern __inline __m256
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm256_cvtpbh_ps (__m128bh __A) {
+ 

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
OK, This is the patch I prepare to check in.

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, November 24, 2021 4:49 PM
To: Kong, Lingling 
Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert 
_Float16 to SFmode with -mf16c [PR 102811]

On Wed, Nov 24, 2021 at 9:44 AM Kong, Lingling  wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
> Cleared before conversion, updated  movhi_internal and 
> ix86_can_change_mode_class. And fixed some commit message.
>
> OK for master?

OK, with a small adjustment to ChangeLog.

Thanks,
Uros.

> gcc/ChangeLog:
>
> PR target/102811
> * config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data 
> in XMM register
> for TARGET_SSE2.
> * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for TARGET_F16C.
> (extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only.
> (*extendhf2): Rename from extendhf2.
> (truncsfhf2): Likewise.
> (truncdfhf2): Likewise.
> (*trunc2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102811
> * gcc.target/i386/pr90773-21.c: Optimize movhi_internal,
> also allow pextrw replace vmovd + movw.

Just write:

* gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.

> * gcc.target/i386/pr90773-23.c: Ditto.
> * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
> ---
>  gcc/config/i386/i386.c|  5 +-
>  gcc/config/i386/i386.md   | 74 +--
>  .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
>  gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
>  5 files changed, 83 insertions(+), 11 deletions(-)  create mode 
> 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 
> e94efdf39fb..4b813533961 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, 
> machine_mode to,
>  disallow a change to these modes, reload will assume it's ok to
>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>  the vec_dupv4hi pattern.
> -NB: AVX512FP16 supports vmovw which can load 16bit data to sse
> -register.  */
> -  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 
> 4;
> +NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> +  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 :
> +4;
>if (GET_MODE_SIZE (from) < mov_size)
> return false;
>  }
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 6eb9de81921..6ee264f1151 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2525,6 +2525,16 @@
>  case TYPE_SSEMOV:
>return ix86_output_ssemov (insn, operands);
>
> +case TYPE_SSELOG:
> +  if (SSE_REG_P (operands[0]))
> +   return MEM_P (operands[1])
> + ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +  else
> +   return MEM_P (operands[1])
> + ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
>  case TYPE_MSKLOG:
>if (operands[1] == const0_rtx)
> return "kxorw\t%0, %0, %0";
> @@ -2540,13 +2550,17 @@
>  }
>  }
>[(set (attr "isa")
> -   (cond [(eq_attr "alternative" "9,10,11,12,13")
> - (const_string "avx512fp16")
> +   (cond [(eq_attr "alternative" "9,10,11,12")
> + (const_string "sse2")
> +  (eq_attr "alternative" "13")
> + (const_string "sse4")
>]
>(const_string "*")))
> (set (attr "type")
>   (cond [(eq_attr "alternative" "9,10,11,12,13")
> - (const_string "ssemov")
> + (if_then_else (match_test "TARGET_AVX512FP16")
> +   (const_string "ssemov")
> +   (const_string "sselog"))
> (eq_attr "alternative" "4,5,6,7")
>   (const_string "mskmov")
> (eq_attr "alternative" "8") @@ -4574,8 +4588,32 @@
>emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
>  })
>
> -(define_insn "extendhf2"
> -  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
> +(define_expand "extendhfsf2"
> +  [(set (match_operand:SF 0 "register_operand")
> +   (float_extend:SF
> + (match_operand:HF 1 "nonimmediate_operand")))]
> +  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
> +{
> +  if (!TARGET_AVX512FP16)
> +{
> +  rtx res = gen_reg_rtx (V4SFmode);
> +  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
> +
> +  

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
Hi,

vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
Cleared before conversion, updated  movhi_internal and 
ix86_can_change_mode_class. And fixed some commit message.

OK for master?

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in 
XMM register
for TARGET_SSE2.
* config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for TARGET_F16C.
(extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only.
(*extendhf2): Rename from extendhf2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/pr90773-21.c: Optimize movhi_internal,
also allow pextrw replace vmovd + movw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
---
 gcc/config/i386/i386.c|  5 +-
 gcc/config/i386/i386.md   | 74 +--
 .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
 gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
 5 files changed, 83 insertions(+), 11 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 
e94efdf39fb..4b813533961 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, 
machine_mode to,
 disallow a change to these modes, reload will assume it's ok to
 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
 the vec_dupv4hi pattern.
-NB: AVX512FP16 supports vmovw which can load 16bit data to sse
-register.  */
-  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 4;
+NB: SSE2 can load 16bit data to sse register via pinsrw.  */
+  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 
+4;
   if (GET_MODE_SIZE (from) < mov_size)
return false;
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
6eb9de81921..6ee264f1151 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2525,6 +2525,16 @@
 case TYPE_SSEMOV:
   return ix86_output_ssemov (insn, operands);
 
+case TYPE_SSELOG:
+  if (SSE_REG_P (operands[0]))
+   return MEM_P (operands[1])
+ ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+  else
+   return MEM_P (operands[1])
+ ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
 case TYPE_MSKLOG:
   if (operands[1] == const0_rtx)
return "kxorw\t%0, %0, %0";
@@ -2540,13 +2550,17 @@
 }
 }
   [(set (attr "isa")
-   (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "avx512fp16")
+   (cond [(eq_attr "alternative" "9,10,11,12")
+ (const_string "sse2")
+  (eq_attr "alternative" "13")
+ (const_string "sse4")
   ]
   (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "ssemov")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+   (const_string "ssemov")
+   (const_string "sselog"))
(eq_attr "alternative" "4,5,6,7")
  (const_string "mskmov")
(eq_attr "alternative" "8")
@@ -4574,8 +4588,32 @@
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
-(define_insn "extendhf2"
-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+(define_expand "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand")
+   (float_extend:SF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
+{
+  if (!TARGET_AVX512FP16)
+{
+  rtx res = gen_reg_rtx (V4SFmode);
+  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
+
+  ix86_expand_vector_set (false, tmp, operands[1], 0);
+  emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
+  emit_move_insn (operands[0], gen_lowpart (SFmode, res));
+  DONE;
+}
+})
+
+(define_expand "extendhfdf2"
+  [(set (match_operand:DF 0 "register_operand")
+   (float_extend:DF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
+(define_insn "*extendhf2"
+  [(set (match_operand:MODEF 0 "register_operand" "=v")
 (float_extend:MODEF
  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512FP16"
@@ -4766,7 +4804,31 @@
 
 ;; Conversion from {SF,DF}mode to HFmode.
 
-(define_insn "trunchf2"
+(define_expand "truncsfhf2"
+  [(set 

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
Hi  Uros,

> BTW: When playing with my patch, I introduced (define_insn "*vec_set_0" 
> ...) to optimize scalar load to a vector. Does ix86_expand_vector_set work OK 
> without this pattern?

Yes, ix86_expand_vector_set could work ok with (define_insn 
"_pinsr"), this insn can optimize scalar load to a 
vector.

Thanks,
Lingling

-Original Message-
From: Uros Bizjak  
Sent: Wednesday, November 24, 2021 3:57 PM
To: Kong, Lingling 
Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert 
_Float16 to SFmode with -mf16c [PR 102811]

On Wed, Nov 24, 2021 at 7:25 AM Kong, Lingling via Gcc-patches 
 wrote:
>
> Hi,
>
> vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
> -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
> And cleared before conversion, updated  movhi_internal and 
> ix86_can_change_mode_class.

Please fix the above commit message.

>
> OK for master?
>
> gcc/ChangeLog:
>
> PR target/102811
> * config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load 
> 16bit data
> to sse register via pinsrw.

Allow 16bit data in XMM register for SSE2 targets.

> * config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.

... for TARGET_F16C.

> (extendhfdf2): Split extendhf2 into separate extendhfsf2, 
> extendhfdf2.
> extendhfdf only for target_avx512fp16.

Restrict extendhfdf for TARGET_AVX512FP16 only.

> (*extendhf2):rename extendhf2.

Rename from extendhf2.

> (truncsfhf2): Likewise.
> (truncdfhf2): Likewise.
> (*trunc2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102811
> * gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
> optimize vmovd + movw to vpextrw.

Also allow pextrw.

> * gcc.target/i386/pr90773-23.c: Ditto.
> * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.

Otherwise LGTM.

BTW: When playing with my patch, I introduced (define_insn "*vec_set_0" 
...) to optimize scalar load to a vector. Does ix86_expand_vector_set work OK 
without this pattern?

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c|  5 +-
>  gcc/config/i386/i386.md   | 74 +--
>  .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
>  gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
>  5 files changed, 83 insertions(+), 11 deletions(-)  create mode 
> 100644 gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 
> e94efdf39fb..4b813533961 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, 
> machine_mode to,
>  disallow a change to these modes, reload will assume it's ok to
>  drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
>  the vec_dupv4hi pattern.
> -NB: AVX512FP16 supports vmovw which can load 16bit data to sse
> -register.  */
> -  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 
> 4;
> +NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> +  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 :
> +4;
>if (GET_MODE_SIZE (from) < mov_size)
> return false;
>  }
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
> 6eb9de81921..6ee264f1151 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2525,6 +2525,16 @@
>  case TYPE_SSEMOV:
>return ix86_output_ssemov (insn, operands);
>
> +case TYPE_SSELOG:
> +  if (SSE_REG_P (operands[0]))
> +   return MEM_P (operands[1])
> + ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +  else
> +   return MEM_P (operands[1])
> + ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +
>  case TYPE_MSKLOG:
>if (operands[1] == const0_rtx)
> return "kxorw\t%0, %0, %0";
> @@ -2540,13 +2550,17 @@
>  }
>  }
>[(set (attr "isa")
> -   (cond [(eq_attr "alternative" "9,10,11,12,13")
> - (const_string "avx512fp16")
> +   (cond [(eq_attr "alternative" "9,10,11,12")
> + (const_string "sse2")
> +  (eq_attr "alternative" 

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Kong, Lingling via Gcc-patches
Hi,

vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.
And cleared before conversion, updated  movhi_internal and 
ix86_can_change_mode_class.

OK for master?

gcc/ChangeLog:

PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): SSE2 can load 16bit 
data
to sse register via pinsrw.
* config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.
(extendhfdf2): Split extendhf2 into separate extendhfsf2, 
extendhfdf2.
extendhfdf only for target_avx512fp16.
(*extendhf2):rename extendhf2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/pr90773-21.c: Optimized movhi_internal,
optimize vmovd + movw to vpextrw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
---
 gcc/config/i386/i386.c|  5 +-
 gcc/config/i386/i386.md   | 74 +--
 .../i386/avx512vl-vcvtps2ph-pr102811.c| 11 +++
 gcc/testsuite/gcc.target/i386/pr90773-21.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-23.c|  2 +-
 5 files changed, 83 insertions(+), 11 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 
e94efdf39fb..4b813533961 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19485,9 +19485,8 @@ ix86_can_change_mode_class (machine_mode from, 
machine_mode to,
 disallow a change to these modes, reload will assume it's ok to
 drop the subreg from (subreg:SI (reg:HI 100) 0).  This affects
 the vec_dupv4hi pattern.
-NB: AVX512FP16 supports vmovw which can load 16bit data to sse
-register.  */
-  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_AVX512FP16 ? 2 : 4;
+NB: SSE2 can load 16bit data to sse register via pinsrw.  */
+  int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 
+4;
   if (GET_MODE_SIZE (from) < mov_size)
return false;
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
6eb9de81921..6ee264f1151 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2525,6 +2525,16 @@
 case TYPE_SSEMOV:
   return ix86_output_ssemov (insn, operands);
 
+case TYPE_SSELOG:
+  if (SSE_REG_P (operands[0]))
+   return MEM_P (operands[1])
+ ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+  else
+   return MEM_P (operands[1])
+ ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
 case TYPE_MSKLOG:
   if (operands[1] == const0_rtx)
return "kxorw\t%0, %0, %0";
@@ -2540,13 +2550,17 @@
 }
 }
   [(set (attr "isa")
-   (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "avx512fp16")
+   (cond [(eq_attr "alternative" "9,10,11,12")
+ (const_string "sse2")
+  (eq_attr "alternative" "13")
+ (const_string "sse4")
   ]
   (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "9,10,11,12,13")
- (const_string "ssemov")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+   (const_string "ssemov")
+   (const_string "sselog"))
(eq_attr "alternative" "4,5,6,7")
  (const_string "mskmov")
(eq_attr "alternative" "8")
@@ -4574,8 +4588,32 @@
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
-(define_insn "extendhf2"
-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
+(define_expand "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand")
+   (float_extend:SF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
+{
+  if (!TARGET_AVX512FP16)
+{
+  rtx res = gen_reg_rtx (V4SFmode);
+  rtx tmp = force_reg (V8HFmode, CONST0_RTX (V8HFmode));
+
+  ix86_expand_vector_set (false, tmp, operands[1], 0);
+  emit_insn (gen_vcvtph2ps (res, gen_lowpart (V8HImode, tmp)));
+  emit_move_insn (operands[0], gen_lowpart (SFmode, res));
+  DONE;
+}
+})
+
+(define_expand "extendhfdf2"
+  [(set (match_operand:DF 0 "register_operand")
+   (float_extend:DF
+ (match_operand:HF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16")
+
+(define_insn "*extendhf2"
+  [(set (match_operand:MODEF 0 "register_operand" "=v")
 (float_extend:MODEF
  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512FP16"
@@ -4766,7 +4804,31 @@
 
 ;; Conversion from {SF,DF}mode to HFmode.
 
-(define_insn "trunchf2"
+(define_expand "truncsfhf2"
+  

[PATCH] i386: add alias for f*mul_*ch intrinsics

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi,

This patch is to add alias for f*mul_*ch intrinsics. 

Ok for master?

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for 
_mm512_fmul_pch.
(_mm512_mask_mul_pch): Likewise.
(_mm512_maskz_mul_pch): Likewise.
(_mm512_mul_round_pch): Likewise.
(_mm512_mask_mul_round_pch): Likewise.
(_mm512_maskz_mul_round_pch): Likewise.
(_mm512_cmul_pch): Likewise.
(_mm512_mask_cmul_pch): Likewise.
(_mm512_maskz_cmul_pch): Likewise.
(_mm512_cmul_round_pch): Likewise.
(_mm512_mask_cmul_round_pch): Likewise.
(_mm512_maskz_cmul_round_pch): Likewise.
(_mm_mul_sch): Likewise.
(_mm_mask_mul_sch): Likewise.
(_mm_maskz_mul_sch): Likewise.
(_mm_mul_round_sch): Likewise.
(_mm_mask_mul_round_sch): Likewise.
(_mm_maskz_mul_round_sch): Likewise.
(_mm_cmul_sch): Likewise.
(_mm_mask_cmul_sch): Likewise.
(_mm_maskz_cmul_sch): Likewise.
(_mm_cmul_round_sch): Likewise.
(_mm_mask_cmul_round_sch): Likewise.
(_mm_maskz_cmul_round_sch): Likewise.
* config/i386/avx512fp16vlintrin.h (_mm_mul_pch): Likewise.
(_mm_mask_mul_pch): Likewise.
(_mm_maskz_mul_pch): Likewise.
(_mm256_mul_pch): Likewise.
(_mm256_mask_mul_pch): Likewise.
(_mm256_maskz_mul_pch): Likewise.
(_mm_cmul_pch): Likewise.
(_mm_mask_cmul_pch): Likewise.
(_mm_maskz_cmul_pch): Likewise.
(_mm256_cmul_pch): Likewise.
(_mm256_mask_cmul_pch): Likewise.
(_mm256_maskz_cmul_pch): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfcmulcph-1a.c: Add new test for alias.
* gcc.target/i386/avx512fp16-vfcmulcsh-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfmulcsh-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfcmulcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfmulcph-1a.c: Likewise.
---
 gcc/config/i386/avx512fp16intrin.h| 39 +++
 gcc/config/i386/avx512fp16vlintrin.h  | 17 
 .../gcc.target/i386/avx512fp16-vfcmulcph-1a.c | 19 ++---  
.../gcc.target/i386/avx512fp16-vfcmulcsh-1a.c | 19 ++---  
.../gcc.target/i386/avx512fp16-vfmulcph-1a.c  | 19 ++---  
.../gcc.target/i386/avx512fp16-vfmulcsh-1a.c  | 19 ++---
 .../i386/avx512fp16vl-vfcmulcph-1a.c  | 20 +++---
 .../i386/avx512fp16vl-vfmulcph-1a.c   | 20 +++---
 8 files changed, 136 insertions(+), 36 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h 
b/gcc/config/i386/avx512fp16intrin.h
index 44c5e24f234..fe73e693897 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -7162,6 +7162,45 @@ _mm512_set1_pch (_Float16 _Complex __A)
   return (__m512h) _mm512_set1_ps (u.b);  }
 
+// intrinsics below are alias for f*mul_*ch #define _mm512_mul_pch(A, 
+B) _mm512_fmul_pch ((A), (B))
+#define _mm512_mask_mul_pch(W, U, A, B)  \
+  _mm512_mask_fmul_pch ((W), (U), (A), (B)) #define 
+_mm512_maskz_mul_pch(U, A, B) _mm512_maskz_fmul_pch ((U), (A), (B)) 
+#define _mm512_mul_round_pch(A, B, R) _mm512_fmul_round_pch ((A), (B), (R))
+#define _mm512_mask_mul_round_pch(W, U, A, B, R) \
+  _mm512_mask_fmul_round_pch ((W), (U), (A), (B), (R))
+#define _mm512_maskz_mul_round_pch(U, A, B, R)   \
+  _mm512_maskz_fmul_round_pch ((U), (A), (B), (R))
+
+#define _mm512_cmul_pch(A, B) _mm512_fcmul_pch ((A), (B))
+#define _mm512_mask_cmul_pch(W, U, A, B) \
+  _mm512_mask_fcmul_pch ((W), (U), (A), (B)) #define 
+_mm512_maskz_cmul_pch(U, A, B) _mm512_maskz_fcmul_pch ((U), (A), (B)) 
+#define _mm512_cmul_round_pch(A, B, R) _mm512_fcmul_round_pch ((A), (B), (R))
+#define _mm512_mask_cmul_round_pch(W, U, A, B, R)\
+  _mm512_mask_fcmul_round_pch ((W), (U), (A), (B), (R))
+#define _mm512_maskz_cmul_round_pch(U, A, B, R)  \
+  _mm512_maskz_fcmul_round_pch ((U), (A), (B), (R))
+
+#define _mm_mul_sch(A, B) _mm_fmul_sch ((A), (B)) #define 
+_mm_mask_mul_sch(W, U, A, B) _mm_mask_fmul_sch ((W), (U), (A), (B)) 
+#define _mm_maskz_mul_sch(U, A, B) _mm_maskz_fmul_sch ((U), (A), (B)) 
+#define _mm_mul_round_sch(A, B, R) _mm_fmul_round_sch ((A), (B), (R))
+#define _mm_mask_mul_round_sch(W, U, A, B, R)\
+  _mm_mask_fmul_round_sch ((W), (U), (A), (B), (R))
+#define _mm_maskz_mul_round_sch(U, A, B, R)  \
+  _mm_maskz_fmul_round_sch ((U), (A), (B), (R))
+
+#define _mm_cmul_sch(A, B) _mm_fcmul_sch ((A), (B)) #define 
+_mm_mask_cmul_sch(W, U, A, B) _mm_mask_fcmul_sch ((W), (U), (A), (B)) 
+#define _mm_maskz_cmul_sch(U, A, B) _mm_maskz_fcmul_sch ((U), (A), (B)) 
+#define _mm_cmul_round_sch(A, B, R) _mm_fcmul_round_sch ((A), (B), (R))

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi,

vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with 
-mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c.

OK for master?

gcc/ChangeLog:

PR target/102811
* config/i386/i386.md (extendhfsf2): Add extenndhfsf2 for f16c.
(extendhfdf2): Split extendhf2 into separate extendhfsf2, 
extendhfdf2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.

gcc/testsuite/ChangeLog:

PR target/102811
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
---
 gcc/config/i386/i386.md   | 48 +++
 .../i386/avx512vl-vcvtps2ph-pr102811.c| 10 
 2 files changed, 49 insertions(+), 9 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
6eb9de81921..c5415475342 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4574,15 +4574,30 @@
   emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
 })
 
-(define_insn "extendhf2"
-  [(set (match_operand:MODEF 0 "nonimm_ssenomem_operand" "=v")
-(float_extend:MODEF
+(define_insn "extendhfsf2"
+  [(set (match_operand:SF 0 "register_operand" "=v")
+   (float_extend:SF
+ (match_operand:HF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
+{
+  if (TARGET_AVX512FP16)
+return "vcvtsh2ss\t{%1, %0, %0|%0, %0, %1}";
+  else
+return "vcvtph2ps\t{%1, %0|%0, %1}"; }
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "SF")])
+
+(define_insn "extendhfdf2"
+  [(set (match_operand:DF 0 "nonimm_ssenomem_operand" "=v")
+   (float_extend:DF
  (match_operand:HF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512FP16"
-  "vcvtsh2\t{%1, %0, %0|%0, %0, %1}"
+  "vcvtsh2sd\t{%1, %0, %0|%0, %0, %1}"
   [(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set_attr "mode" "DF")])
 
 
 (define_expand "extendxf2"
@@ -4766,12 +4781,27 @@
 
 ;; Conversion from {SF,DF}mode to HFmode.
 
-(define_insn "trunchf2"
+(define_insn "truncsfhf2"
+  [(set (match_operand:HF 0 "register_operand" "=v")
+   (float_truncate:HF
+ (match_operand:SF 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512FP16 || TARGET_F16C || TARGET_AVX512VL"
+  {
+if (TARGET_AVX512FP16)
+  return "vcvtss2sh\t{%1, %d0|%d0, %1}";
+else
+  return "vcvtps2ph\t{0, %1, %0|%0, %1, 0}";
+  }
+  [(set_attr "type" "ssecvt")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "HF")])
+
+(define_insn "truncdfhf2"
   [(set (match_operand:HF 0 "register_operand" "=v")
-   (float_truncate:HF
- (match_operand:MODEF 1 "nonimmediate_operand" "vm")))]
+   (float_truncate:HF
+ (match_operand:DF 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512FP16"
-  "vcvt2sh\t{%1, %d0|%d0, %1}"
+  "vcvtsd2sh\t{%1, %d0|%d0, %1}"
   [(set_attr "type" "ssecvt")
(set_attr "prefix" "evex")
(set_attr "mode" "HF")])
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
new file mode 100644
index 000..ab44a304a03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mf16c -mno-avx512fp16" } */
+/* { dg-final { scan-assembler-times "vcvtph2ps\[ \\t\]" 2 } } */
+/* { dg-final { scan-assembler-times "vcvtps2ph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "__truncsfhf2\[ \\t\]"} } */
+/* { dg-final { scan-assembler-not "__extendhfsf2\[ \\t\]"} } */
+_Float16 test (_Float16 a, _Float16 b)
+{
+  return a + b;
+}
--
2.18.1



[PATCH] i386: Optimization for mm512_set1_pch.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi,

This patch is to support fold _mm512_fmadd_pch (a, _mm512_set1_pch(*(b)), c) to 
1 instruction vfmaddcph (%rsp){1to16}, %zmm1, %zmm2.
OK for master?

gcc/ChangeLog:

* config/i386/sse.md (fma___pair):
Add new define_insn.
(fma__fmaddc_bcst): Add new define_insn_and_split.
(fma__fcmaddc_bcst): Likewise

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16vl-complex-broadcast-1.c: New test.
---
 gcc/config/i386/sse.md| 62 +++
 .../i386/avx512fp16vl-complex-broadcast-1.c   | 25 
 2 files changed, 87 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16vl-complex-broadcast-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
0a7f5b178f9..eba8e77515f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -193,7 +193,9 @@
 
   ;; For AVX512FP16 suppport
   UNSPEC_COMPLEX_FMA
+  UNSPEC_COMPLEX_FMA_PAIR
   UNSPEC_COMPLEX_FCMA
+  UNSPEC_COMPLEX_FCMA_PAIR
   UNSPEC_COMPLEX_FMUL
   UNSPEC_COMPLEX_FCMUL
   UNSPEC_COMPLEX_MASK
@@ -5913,6 +5915,9 @@
 (define_int_iterator UNSPEC_COMPLEX_F_C_MA
[UNSPEC_COMPLEX_FMA UNSPEC_COMPLEX_FCMA])
 
+(define_int_iterator UNSPEC_COMPLEX_F_C_MA_PAIR
+   [UNSPEC_COMPLEX_FMA_PAIR UNSPEC_COMPLEX_FCMA_PAIR])
+
 (define_int_iterator UNSPEC_COMPLEX_F_C_MUL
[UNSPEC_COMPLEX_FMUL UNSPEC_COMPLEX_FCMUL])
 
@@ -5922,6 +5927,10 @@
 (UNSPEC_COMPLEX_FMUL "fmulc")
 (UNSPEC_COMPLEX_FCMUL "fcmulc")])
 
+(define_int_attr complexpairopname
+   [(UNSPEC_COMPLEX_FMA_PAIR "fmaddc")
+(UNSPEC_COMPLEX_FCMA_PAIR "fcmaddc")])
+
 (define_mode_attr complexmove
   [(V32HF "avx512f_loadv16sf")
(V16HF "avx512vl_loadv8sf")
@@ -6067,6 +6076,59 @@
  [(match_dup 1) (match_dup 2) (match_dup 4)]
   UNSPEC_COMPLEX_F_C_MA))])
 
+(define_insn "fma___pair"
+ [(set (match_operand:VF1_AVX512VL 0 "register_operand" "=")
+   (unspec:VF1_AVX512VL
+[(match_operand:VF1_AVX512VL 1 "vector_operand" "%v")
+ (match_operand:VF1_AVX512VL 2 "bcst_vector_operand" "vmBr")
+ (match_operand:VF1_AVX512VL 3 "vector_operand" "0")]
+ UNSPEC_COMPLEX_F_C_MA_PAIR))]
+ "TARGET_AVX512FP16"
+ "vph\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "ssemuladd")])
+
+(define_insn_and_split "fma__fmaddc_bcst"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (unspec:VF_AVX512FP16VL
+ [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+  (subreg:VF_AVX512FP16VL
+(match_operand: 2 "bcst_vector_operand") 0)
+  (match_operand:VF_AVX512FP16VL 3 "vector_operand")]
+  UNSPEC_COMPLEX_FMA))]
+  "TARGET_AVX512FP16"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+  UNSPEC_COMPLEX_FMA_PAIR))]
+  {
+operands[0] = lowpart_subreg (mode, operands[0], mode);
+operands[1] = lowpart_subreg (mode, operands[1], mode);
+operands[3] = lowpart_subreg (mode, operands[3], 
+mode);
+  })
+
+(define_insn_and_split "fma__fcmaddc_bcst"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (unspec:VF_AVX512FP16VL
+ [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+  (subreg:VF_AVX512FP16VL
+(match_operand: 2 "bcst_vector_operand") 0)
+  (match_operand:VF_AVX512FP16VL 3 "vector_operand")]
+  UNSPEC_COMPLEX_FCMA))]
+  "TARGET_AVX512FP16"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+  UNSPEC_COMPLEX_FCMA_PAIR))]
+  {
+operands[0] = lowpart_subreg (mode, operands[0], mode);
+operands[1] = lowpart_subreg (mode, operands[1], mode);
+operands[3] = lowpart_subreg (mode, operands[3], 
+mode);
+  })
+
 (define_insn "___mask"
   [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=")
(vec_merge:VF_AVX512FP16VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-complex-broadcast-1.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-complex-broadcast-1.c
new file mode 100644
index 000..3c8e84230f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-complex-broadcast-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } }  */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } }  */
+
+#include 
+
+volatile __m512h res0, a0, c0;
+volatile __m256h res1, a1, c1;
+volatile __m128h res2, a2, c2;
+volatile _Float16 *b;
+
+void extern
+avx_test(void)
+{
+  res0 = _mm512_fmadd_pch (a0, _mm512_set1_pch(*(b + 2 * 6)), c0);
+  res0 = _mm512_fcmadd_pch (a0, _mm512_set1_pch(*(b + 2 * 6)), c0);
+
+  res1 = _mm256_fmadd_pch (a1, _mm256_set1_pch(*(b + 2 * 6)), c1);
+  res1 = _mm256_fcmadd_pch (a1, _mm256_set1_pch(*(b + 2 * 6)), c1);
+
+  res2 =  

[PATCH] i386: Support complex fma/conj_fma for _Float16.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi,

This patch is to support cmla_optab, cmul_optab, cmla_conj_optab, 
cmul_conj_optab for vector _Float16.
Ok for master?

gcc/ChangeLog:

* config/i386/sse.md (cmul3): add new define_expand.
(cmla4): Likewise

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vector-complex-float.c: New test.
---
 gcc/config/i386/sse.md| 23 +++
 .../i386/avx512fp16-vector-complex-float.c| 40 +++
 2 files changed, 63 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/i386/avx512fp16-vector-complex-float.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
0a7f5b178f9..8d3fef0a31a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5922,6 +5922,12 @@
 (UNSPEC_COMPLEX_FMUL "fmulc")
 (UNSPEC_COMPLEX_FCMUL "fcmulc")])
 
+(define_int_attr conj_op
+   [(UNSPEC_COMPLEX_FMA "")
+(UNSPEC_COMPLEX_FCMA "_conj")
+(UNSPEC_COMPLEX_FMUL "")
+(UNSPEC_COMPLEX_FCMUL "_conj")])
+
 (define_mode_attr complexmove
   [(V32HF "avx512f_loadv16sf")
(V16HF "avx512vl_loadv8sf")
@@ -6003,6 +6009,15 @@
   DONE;
 })
 
+(define_expand "cmla4"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (unspec:VF_AVX512FP16VL
+   [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+(match_operand:VF_AVX512FP16VL 2 "vector_operand")
+(match_operand:VF_AVX512FP16VL 3 "vector_operand")]
+UNSPEC_COMPLEX_F_C_MA))]
+  "TARGET_AVX512FP16")
+
 (define_insn "fma__"
   [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=")
(unspec:VF_AVX512FP16VL
@@ -6084,6 +6099,14 @@
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_expand "cmul3"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (unspec:VF_AVX512FP16VL
+ [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+  (match_operand:VF_AVX512FP16VL 2 "vector_operand")]
+  UNSPEC_COMPLEX_F_C_MUL))]
+  "TARGET_AVX512FP16")
+
 (define_insn "__"
   [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=")
  (unspec:VF_AVX512FP16VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vector-complex-float.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-complex-float.c
new file mode 100644
index 000..bcb957f0de0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vector-complex-float.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vfmadd\[123]*ph\[ \\t\]"} } */
+/* { dg-final { scan-assembler-not "vfmadd\[123]*sh\[ \\t\]"} } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]" 1 } } */
+
+#include
+#define TYPE _Float16
+#define N 16
+
+void fma0 (_Complex TYPE *a, _Complex TYPE *b,
+   _Complex TYPE *c)
+{
+  for (int i = 0; i < N; i++)
+c[i] += a[i] * b[i];
+}
+
+void fmaconj (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+  for (int i = 0; i < N; i++)
+c[i] += a[i] * ~b[i];
+}
+
+void fmul (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+  _Complex TYPE c[restrict N])
+{
+  for (int i = 0; i < N; i++)
+c[i] = a[i] * b[i];
+}
+
+void fmulconj (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+  _Complex TYPE c[restrict N])
+{
+  for (int i = 0; i < N; i++)
+c[i] = a[i] * ~b[i];
+}
--
2.18.1



[PATCH] i386: Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) and combine FADD(A, FMUL(B, C)) to FMA(B, C, A).

2021-10-21 Thread Kong, Lingling via Gcc-patches
Hi,

This patch is to support transform in fast-math something like 
_mm512_add_ph(x1, _mm512_fmadd_pch(a, b, _mm512_setzero_ph())) to  
_mm512_fmadd_pch(a, b, x1).

And support transform _mm512_add_ph(x1, _mm512_fmul_pch(a, b)) to 
_mm512_fmadd_pch(a, b, x1).
Ok for master?

gcc/ChangeLog:

* config/i386/sse.md (fma__fadd_fmul): Add new
define_insn_and_split.
(fma__fadd_fcmul):Likewise
(fma___fma_zero):Likewise

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-complex-fma.c: New test.
---
 gcc/config/i386/sse.md| 52 +++
 .../gcc.target/i386/avx512fp16-complex-fma.c  | 18 +++
 2 files changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
fbf056bf9e6..36407ca4a59 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5958,6 +5958,58 @@
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn_and_split "fma__fadd_fmul"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (plus:VF_AVX512FP16VL
+ (unspec:VF_AVX512FP16VL
+   [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+(match_operand:VF_AVX512FP16VL 2 "vector_operand")]
+UNSPEC_COMPLEX_FMUL)
+ (match_operand:VF_AVX512FP16VL 3 "vector_operand")))]
+  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
+  && ix86_pre_reload_split()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:VF_AVX512FP16VL
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+  UNSPEC_COMPLEX_FMA))])
+
+(define_insn_and_split "fma__fadd_fcmul"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (plus:VF_AVX512FP16VL
+ (unspec:VF_AVX512FP16VL
+   [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+(match_operand:VF_AVX512FP16VL 2 "vector_operand")]
+UNSPEC_COMPLEX_FCMUL)
+ (match_operand:VF_AVX512FP16VL 3 "vector_operand")))]
+  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
+  && ix86_pre_reload_split()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:VF_AVX512FP16VL
+ [(match_dup 1) (match_dup 2) (match_dup 3)]
+  UNSPEC_COMPLEX_FCMA))])
+
+(define_insn_and_split "fma___fma_zero"
+  [(set (match_operand:VF_AVX512FP16VL 0 "register_operand")
+   (plus:VF_AVX512FP16VL
+ (unspec:VF_AVX512FP16VL
+   [(match_operand:VF_AVX512FP16VL 1 "vector_operand")
+(match_operand:VF_AVX512FP16VL 2 "vector_operand")
+(match_operand:VF_AVX512FP16VL 3 "const0_operand")]
+UNSPEC_COMPLEX_F_C_MA)
+ (match_operand:VF_AVX512FP16VL 4 "vector_operand")))]
+  "TARGET_AVX512FP16 && flag_unsafe_math_optimizations
+  && ix86_pre_reload_split()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:VF_AVX512FP16VL
+ [(match_dup 1) (match_dup 2) (match_dup 4)]
+  UNSPEC_COMPLEX_F_C_MA))])
+
 (define_insn "___mask"
   [(set (match_operand:VF_AVX512FP16VL 0 "register_operand" "=")
(vec_merge:VF_AVX512FP16VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c
new file mode 100644
index 000..2dfd369e785
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-complex-fma.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2 -Ofast" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ 
+\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(
+?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-not "vaddph\[ 
+\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(
+?:\n|\[ \\t\]+#)"} } */
+/* { dg-final { scan-assembler-not "vfmulcph\[ 
+\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(
+?:\n|\[ \\t\]+#)"} } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ 
+\\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(
+?:\n|\[ \\t\]+#)" 2 } } */
+
+#include 
+volatile __m512h x1, x2, res, a, b;
+void extern
+avx512f_test (void)
+{
+  res = _mm512_add_ph (x1, _mm512_fmadd_pch (a, b, 
+_mm512_setzero_ph()));
+  res = _mm512_add_ph (x1, _mm512_fcmadd_pch (a, b, 
+_mm512_setzero_ph()));
+
+  res = _mm512_add_ph (x1, _mm512_fmul_pch (a, b));
+  res = _mm512_add_ph (x1, _mm512_fcmul_pch (a, b)); }
--
2.18.1



[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-26 Thread Kong, Lingling via Gcc-patches
Hi,

For avx512f_scattersi, mask operand only affect set src, we need to 
refine the pattern to let gcc know mask register also affect the dest.
So we put mask operand into UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

PR target/101472
* config/i386/sse.md: (scattersi): Add mask operand to
UNSPEC_VSIBADDR.
(scattersi): Likewise.
(*avx512f_scattersi): Merge mask operand to set_dest.
(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

PR target/101472
* gcc.target/i386/avx512f-pr101472.c: New test.
* gcc.target/i386/avx512vl-pr101472.c: New test.
---
 gcc/config/i386/sse.md| 20 +++--
 .../gcc.target/i386/avx512f-pr101472.c| 49 
 .../gcc.target/i386/avx512vl-pr101472.c   | 79 +++
 3 files changed, 140 insertions(+), 8 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr101472.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 
03fc2df1fb0..a3055dbd316 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24205,8 +24205,9 @@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), 
+   UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scattersi"
@@ -24214,10 +24215,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand: 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand: 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand:VI48F 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch: 1 "="))] @@ -24243,8 +24245,9 
@@
   "TARGET_AVX512F"
 {
   operands[5]
-= gen_rtx_UNSPEC (Pmode, gen_rtvec (3, operands[0], operands[2],
-   operands[4]), UNSPEC_VSIBADDR);
+= gen_rtx_UNSPEC (Pmode, gen_rtvec (4, operands[0], operands[2],
+   operands[4], operands[1]), 
+   UNSPEC_VSIBADDR);
 })
 
 (define_insn "*avx512f_scatterdi"
@@ -24252,10 +24255,11 @@
  [(unspec:P
 [(match_operand:P 0 "vsib_address_operand" "Tv")
  (match_operand: 2 "register_operand" "v")
- (match_operand:SI 4 "const1248_operand" "n")]
+ (match_operand:SI 4 "const1248_operand" "n")
+ (match_operand:QI 6 "register_operand" "1")]
 UNSPEC_VSIBADDR)])
(unspec:VI48F
- [(match_operand:QI 6 "register_operand" "1")
+ [(match_dup 6)
   (match_operand: 3 "register_operand" "v")]
  UNSPEC_SCATTER))
(clobber (match_scratch:QI 1 "="))] diff --git 
a/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c 
b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
new file mode 100644
index 000..89c6603c2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr101472.c
@@ -0,0 +1,49 @@
+/* PR target/101472 */
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpscatterqd\[ 
+\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdd\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterqq\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vpscatterdq\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqps\[ 
+\\t\]+\[^\{\n\]*ymm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdps\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterqpd\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*zmm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vscatterdpd\[ 
+\\t\]+\[^\{\n\]*zmm\[0-9\]\[^\n\]*ymm\[0-9\]\[^\n\]*{%k\[1-7\]}(?:\n|\[ 
+\\t\]+#)" 2 } } */
+
+#include 
+
+void two_scatters_epi32(void* addr, __mmask8 k1, __mmask8 k2, __m512i vindex, 
+__m256i a, __m512i b)
+{
+ 

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Kong, Lingling via Gcc-patches
Hi,

For avx512f_scattersi, mask operand only affect set src, we
need to refine the pattern to let gcc know mask register also affect the dest.
So we put mask operand into UNSPEC_VSIBADDR.

Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}.
Ok for master?

gcc/ChangeLog:

*config/i386/sse.md (scattersi): Add mask operand to
UNSPEC_VSIBADDR.
(scattersi): Likewise.
(*avx512f_scattersi): Merge mask operand
to set_dest.
(*avx512f_scatterdi): Likewise

gcc/testsuite/ChangeLog:

*gcc.target/i386/avx512f-pr101472.c: New test.
*gcc.target/i386/avx512vl-pr101472.c: Ditto.


0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch
Description: 0001-i386-Fix-wrong-optimization-for-consecutive-masked-s.patch


[PATCH] i386: Fix _mm512_fpclass_ps_mask in O0 [PR 101471]

2021-08-25 Thread Kong, Lingling via Gcc-patches
Hi,

For _mm512_fpclass_ps_mask in O0, mask should be (__mmask16)-1 instead of
(__mmask8)-1).

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for master?

gcc/ChangeLog:

* gcc/config/i386/avx512dqintrin.h : fix _mm512_fpclass_ps_mask define in O0

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-pr101471.c: add new test


0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch
Description: 0001-i386-Fix-_mm512_fpclass_ps_mask-in-O0-PR-101471.patch