Re: [PATCH v2, rs6000] Change insn condition from TARGET_64BIT to TARGET_POWERPC64 for VSX scalar extract/insert instructions

2022-09-07 Thread Paul A. Clarke via Gcc-patches
On Tue, Sep 06, 2022 at 12:19:06PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 05, 2022 at 02:36:30PM +0800, HAO CHEN GUI wrote:
> > On 2/9/2022 11:56 PM, Segher Boessenkool wrote:
> > >> -  const signed long __builtin_vsx_scalar_extract_exp (double);
> > >> +  const unsigned long long __builtin_vsx_scalar_extract_exp (double);
> > >>  VSEEDP xsxexpdp {}
> > >>
> > >> -  const signed long __builtin_vsx_scalar_extract_sig (double);
> > >> +  const unsigned long long __builtin_vsx_scalar_extract_sig (double);
> > >>  VSESDP xsxsigdp {}
> > > This also brings these legacy builtins in line with the vec_ versions,
> > > which are the preferred builtins (they are defined in the PVIPR).
> > 
> > The return types of the vec_ version built-ins differ from their definitions
> > in PVIPR. In PVIPR, they're vector unsigned int or vector unsigned long long.
> > Shall we correct them?
> > 
> >   const vd __builtin_vsx_extract_exp_dp (vd);
> > VEEDP xvxexpdp {}
> > 
> >   const vf __builtin_vsx_extract_exp_sp (vf);
> > VEESP xvxexpsp {}
> > 
> >   const vd __builtin_vsx_extract_sig_dp (vd);
> > VESDP xvxsigdp {}
> > 
> >   const vf __builtin_vsx_extract_sig_sp (vf);
> > VESSP xvxsigsp {}
> 
> Those are the vsx_ versions.  I'm not sure what you're asking.
> 
> It won't be easy at all to change types from vector integer to vector
> float, it will break all over.  A compatibility nightmare.  It is better
> if you can show the current stuff cannot ever work, it's not a problem
> to replace it in that case.

I think Hao Chen is concerned about the return types:

> >   const vd __builtin_vsx_extract_exp_dp (vd);
> > VEEDP xvxexpdp {}

Per PVIPR, this should return vector unsigned long long ("vull" not "vd").

> >   const vf __builtin_vsx_extract_exp_sp (vf);
> > VEESP xvxexpsp {}

This should return vector unsigned int ("vui" not "vf").

> >   const vd __builtin_vsx_extract_sig_dp (vd);
> > VESDP xvxsigdp {}

This should return vector unsigned long long ("vull" not "vd").

> >   const vf __builtin_vsx_extract_sig_sp (vf);
> > VESSP xvxsigsp {}

This should return vector unsigned int ("vui" not "vf").
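
To make the return-type point concrete, here is a portable sketch of what I
understand the scalar extract operations to compute for a double. The names
extract_exp and extract_sig are hypothetical helpers, not GCC built-ins, and
the implicit-bit handling reflects my reading of the ISA, so treat it as an
assumption to be checked. In both cases the result is a bit field, which is
why an unsigned integer type fits better than a floating-point one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: biased 11-bit exponent field of an IEEE 754
   binary64 value, returned as an unsigned integer.  */
uint64_t
extract_exp (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (bits >> 52) & 0x7ff;
}

/* Hypothetical helper: 52-bit fraction field, with the implicit leading
   1 bit set for normal values (assumed behaviour, see the note above).  */
uint64_t
extract_sig (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t exp = (bits >> 52) & 0x7ff;
  uint64_t frac = bits & ((UINT64_C (1) << 52) - 1);
  if (exp != 0 && exp != 0x7ff)
    frac |= UINT64_C (1) << 52;
  return frac;
}
```

The vector forms would apply the same extraction to each element, so the
element type of the result would be an unsigned integer of matching width.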

PC


[PATCH COMMITTED] Revert move of g++.dg/pr69667.C

2022-05-18 Thread Paul A. Clarke via Gcc-patches
Commit eccbd7fcee5bbfc47731e8de83c44eee2e3dcc4b moved the subject file to
g++.target/powerpc.  Unfortunately, test g++.dg/tsan/pr88018.C includes
"../pr69667.C".

Revert the move of this file.

Commit 14e678a2c4a76433fd4029568d28530c921e11ee relaxed some DejaGnu
directives in g++.dg/tsan/pr88018.C, given its more restrictive environment
within g++.target/powerpc.  Revert these changes in that file as well.

2022-05-18  Paul A. Clarke  

gcc/testsuite
PR target/105620
* g++.target/powerpc/pr69667.C: Move to ...
* g++.dg/pr69667.C: here. Also, revert recent dg directives changes.
---
Committed as trivial.

 gcc/testsuite/{g++.target/powerpc => g++.dg}/pr69667.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 rename gcc/testsuite/{g++.target/powerpc => g++.dg}/pr69667.C (97%)

diff --git a/gcc/testsuite/g++.target/powerpc/pr69667.C b/gcc/testsuite/g++.dg/pr69667.C
similarity index 97%
rename from gcc/testsuite/g++.target/powerpc/pr69667.C
rename to gcc/testsuite/g++.dg/pr69667.C
index da550cd14bd6..422116dd5995 100644
--- a/gcc/testsuite/g++.target/powerpc/pr69667.C
+++ b/gcc/testsuite/g++.dg/pr69667.C
@@ -1,4 +1,5 @@
-/* { dg-skip-if "" { *-*-darwin* } } */
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -w -std=c++14" } */
 
-- 
2.27.0



Re: [COMMITTED] rs6000: Remove a few needless 'lp64' contraints.

2022-05-13 Thread Paul A. Clarke via Gcc-patches
On Thu, May 12, 2022 at 05:30:16PM -0500, Segher Boessenkool wrote:
> On Mon, Apr 18, 2022 at 12:15:35PM -0500, Paul A. Clarke wrote:
> > A few tests need not be restricted to 'lp64', so remove the restriction.
> > 
> > A few of those need a simple change to the DejaGnu directives to suppress
> > '-mcmodel' flags for '-m32'.
> 
> Okay for trunk.  Thanks!

I noticed that removing "{ target lp64 }" just left "{ dg-do compile }", which
is superfluous, so I removed the whole line before committing.
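
As an illustration of the resulting header shape, here is a minimal sketch of
a test file after this kind of change. Only the directive layout mirrors the
patch; the function sum_to is a hypothetical placeholder and not the body of
any of the PR tests.

```c
/* Hypothetical test file; only the directive layout follows the patch.  */
/* No dg-do line: a bare "dg-do compile" would be superfluous here.  */
/* { dg-skip-if "" { *-*-darwin* } } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -Wno-return-type" } */
/* The patch suppresses -mcmodel for -m32, so add it only on lp64.  */
/* { dg-additional-options "-mcmodel=medium" { target lp64 } } */

/* Placeholder body so the file compiles on any target.  */
int
sum_to (int n)
{
  int total = 0;
  for (int i = 1; i <= n; i++)
    total += i;
  return total;
}
```

Keeping the common flags in dg-options and moving only the 64-bit-specific
flag into dg-additional-options lets the same file run for both -m32 and -m64.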

Since the actual commit was different than what was posted,
I'm sending the final, committed patch to the list, below.

PC

rs6000: Remove a few needless 'lp64' contraints.

A few tests need not be restricted to 'lp64', so remove the restriction.

A few of those need a simple change to the DejaGnu directives to suppress
'-mcmodel' flags for '-m32'.

2022-05-13  Paul A. Clarke  

gcc/testsuite
* g++.target/powerpc/pr65240-1.C: Adjust DejaGnu directives.
* g++.target/powerpc/pr65240-2.C: Likewise.
* g++.target/powerpc/pr65240-3.C: Likewise.
* g++.target/powerpc/pr65240-4.C: Likewise.
* g++.target/powerpc/pr65242.C: Likewise.
* g++.target/powerpc/pr67211.C: Likewise.
* g++.target/powerpc/pr69667.C: Likewise.
* g++.target/powerpc/pr71294.C: Likewise.
---
 gcc/testsuite/g++.target/powerpc/pr65240-1.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-2.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-3.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-4.C | 1 -
 gcc/testsuite/g++.target/powerpc/pr65242.C   | 1 -
 gcc/testsuite/g++.target/powerpc/pr67211.C   | 1 -
 gcc/testsuite/g++.target/powerpc/pr69667.C   | 1 -
 gcc/testsuite/g++.target/powerpc/pr71294.C   | 1 -
 8 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
index f735a1f7834a..1cf158c69097 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,7 +1,7 @@
-/* { dg-do compile { target lp64 } } */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
index e201e3a74d71..32d1c799b0db 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,7 +1,7 @@
-/* { dg-do compile { target lp64 } } */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mfp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-3.C b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
index 0821f68a5cf9..02567647f304 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,7 +1,7 @@
-/* { dg-do compile { target lp64 } } */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=medium" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-4.C b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
index 92d31acb20d9..3f6993aa1cde 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-4.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
@@ -1,4 +1,3 @@
-/* { dg-do compile { target lp64 } } */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-mdejagnu-cpu=power7 -O3 -ffast-math -Wno-return-type" } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr65242.C b/gcc/testsuite/g++.target/powerpc/pr65242.C
index b2984d1d6083..3f5c2eaa9099 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65242.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65242.C
@@ -1,4 +1,3 @@
-/* { dg-do compile { target lp64 } } */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3" } */
diff --git 

Re: [COMMITTED] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-05-13 Thread Paul A. Clarke via Gcc-patches
> On 12 May 2022, at 23:38, Segher Boessenkool wrote:
> On Mon, Apr 18, 2022 at 12:15:34PM -0500, Paul A. Clarke wrote:
>> -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> +/* Never tested on darwin, so skip there.  */
>> +/* { dg-skip-if "" { *-*-darwin* } } */
> 
> That is probably the reason for the skip, but it is a lousy reason, and
> not a good precedent to create.  It is much better to let the Darwin
> maintainer deal with this *if* it fails.  If you have proof it fails on
> Darwin, just say *that* (or even say *why* it fails!)

I misunderstood your earlier review comments, sorry.

This is all pre-existing (in slightly different form), so it probably shouldn't
be in this patch, anyway.  I took your original "Okay for trunk",
and committed the patch below.  If somebody has a strong desire to enable Darwin
for these tests, that can be done separately later.

I also committed patch 2/2.

On Thu, May 12, 2022 at 11:53:43PM +0100, Iain Sandoe via Gcc-patches wrote:
> For the record, if there’s a specific patch you’d like tested on Darwin,
> I am happy to try and fit it into the schedule (the machine is slow for
> modern codebases, so don’t expect immediate answers).

Thank you, Iain. Good to know. I won't trouble you for this set. The net of the
committed changes is just test movement.

PC

rs6000: Move g++.dg powerpc PR tests to g++.target

Also adjust DejaGnu directives, since explicitly requiring "powerpc*-*-*" is no
longer necessary.

2021-05-13  Paul A. Clarke  

gcc/testsuite
* g++.dg/pr65240.h: Move to g++.target/powerpc.
* g++.dg/pr93974.C: Likewise.
* g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84264.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/pr85657.C: Likewise.
---
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 13 files changed, 19 insertions(+), 19 deletions(-)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C (71%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C (71%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C (70%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (93%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (91%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

diff --git a/gcc/testsuite/g++.dg/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
similarity index 71%
rename from gcc/testsuite/g++.dg/pr65240-1.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-1.C
index ff8910df6a1a..f735a1f7834a 100644
--- a/gcc/testsuite/g++.dg/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
similarity index 71%
rename from gcc/testsuite/g++.dg/pr65240-2.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-2.C
index bdb7a62d73d2..e201e3a74d71 100644
--- a/gcc/testsuite/g++.dg/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { 

Re: [PING PATCH v3 0/2] rs6000: Move g++.dg powerpc tests to g++.target

2022-05-12 Thread Paul A. Clarke via Gcc-patches
ping

On Mon, Apr 18, 2022 at 12:15:33PM -0500, Paul A. Clarke via Gcc-patches wrote:
> v3: moved "not tested on Darwin" changes into 1/2, where they belong.
> 
> v2:
> - v1 patches 1/3 and 2/3 have been merged after reviews / approval.
> - Previous 3/3 is now 1/2, and new 2/2 is per review from Segher...
> 
> Some tests in g++.dg are target-specific for powerpc. Move those to
> g++.target/powerpc. Update the DejaGnu directives as needed, since
> the target restriction is perhaps no longer needed when residing in the
> target-specific powerpc subdirectory.
> 
> In addition (new patch 2/2), as suggested by Segher, remove 'lp64' restriction
> for a handful of tests, protecting uses of '-mcmodel' flag with
> dg-additional-options.
> 
> Tested on Linux/Power9 (BE) and Linux Power8 (LE 32 and 64), full "make check".
> 
> OK for trunk?
> 
> Paul A. Clarke (2):
>   rs6000: Move g++.dg powerpc PR tests to g++.target
>   rs6000: Remove a few needless 'lp64' contraints.
> 
>  gcc/testsuite/g++.dg/pr65240-1.C | 8 
>  gcc/testsuite/g++.dg/pr65240-2.C | 8 
>  gcc/testsuite/g++.dg/pr65240-3.C | 8 
>  gcc/testsuite/g++.target/powerpc/pr65240-1.C | 9 +
>  gcc/testsuite/g++.target/powerpc/pr65240-2.C | 9 +
>  gcc/testsuite/g++.target/powerpc/pr65240-3.C | 9 +
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 5 +++--
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 5 +++--
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 5 +++--
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 5 +++--
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 5 +++--
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
>  gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
>  16 files changed, 45 insertions(+), 37 deletions(-)
>  delete mode 100644 gcc/testsuite/g++.dg/pr65240-1.C
>  delete mode 100644 gcc/testsuite/g++.dg/pr65240-2.C
>  delete mode 100644 gcc/testsuite/g++.dg/pr65240-3.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-1.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-2.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-3.C
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (68%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (93%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (91%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (89%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
>  rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)
> 
> -- 
> 2.27.0
> 


PING [PATCH] Fix 'modff' reference in extend.texi

2022-04-22 Thread Paul A. Clarke via Gcc-patches
On Mon, Apr 11, 2022 at 11:23:48AM -0500, Paul A. Clarke via Gcc-patches wrote:
> In commit a2a919aa501e3 (2003), built-ins for modf and modff were added.
> In extend.texi, section "Other Builtins", "modf" was added to the paragraph
> "There are also built-in versions of the ISO C99 functions [...]" and
> "modf" was also added to the paragraph "The ISO C90 functions [...]".
> "modff" was not added to either paragraph.
> 
> Based on the context clues about where "modfl" and other similar function
> pairs like "powf/powl" appear, I believe the reference to "modf" in the
> first paragraph (C99) should instead be "modff".
> 
> 2022-04-11  Paul A. Clarke  
> 
> gcc
>   * doc/extend.texi (Other Builtins): Correct reference to 'modff'.
> ---
>  gcc/doc/extend.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e10b10bc1f14..05c99f4284a6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -13460,7 +13460,7 @@ There are also built-in versions of the ISO C99 functions
>  @code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl},
>  @code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf},
>  @code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl},
> -@code{modfl}, @code{modf}, @code{powf}, @code{powl}, @code{sinf},
> +@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf},
>  @code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl},
>  @code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl}
>  that are recognized in any mode since ISO C90 reserves these names for
> -- 
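
For readers less familiar with the naming pattern the patch relies on, here is
a minimal sketch of the three variants side by side. The wrapper names
frac_part_f, frac_part_d and frac_part_l are hypothetical; only modff, modf
and modfl are standard library functions. The "f" and "l" suffixed forms are
the float and long double counterparts of modf, which is why "modff" is the
natural partner of "modfl" in the C99 list.

```c
#include <math.h>

/* Hypothetical wrappers: each returns the fractional part of x and
   stores the integral part through int_part.  */

float
frac_part_f (float x, float *int_part)
{
  return modff (x, int_part);	/* float variant.  */
}

double
frac_part_d (double x, double *int_part)
{
  return modf (x, int_part);	/* double variant.  */
}

long double
frac_part_l (long double x, long double *int_part)
{
  return modfl (x, int_part);	/* long double variant.  */
}
```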


[COMMITTED] docs: Correct "This functions" to "These functions"

2022-04-22 Thread Paul A. Clarke via Gcc-patches
2022-04-22  Paul A. Clarke  

gcc
* doc/extend.texi: Correct "This" to "These".
---
Committed as trivial/obvious.

 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e10b10bc1f14..931e5ae3769f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13525,7 +13525,7 @@ exceptions handling functions @code{fegetround}, @code{feclearexcept} and
 @code{feraiseexcept}.  They may not be available for all targets, and because
 they need close interaction with libc internal values, they may not be available
 for all target libcs, but in all cases they will gracefully fallback to libc
-calls.  This built-in functions appear both with and without the
+calls.  These built-in functions appear both with and without the
 @code{__builtin_} prefix.
 
 @deftypefn {Built-in Function} void *__builtin_alloca (size_t size)
-- 
2.27.0



[PATCH v3 1/2] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-04-18 Thread Paul A. Clarke via Gcc-patches
Also adjust DejaGnu directives, since explicitly requiring "powerpc*-*-*" is no
longer necessary.

2021-04-18  Paul A. Clarke  

gcc/testsuite
* g++.dg/pr65240.h: Move to g++.target/powerpc.
* g++.dg/pr93974.C: Likewise.
* g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84264.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/pr85657.C: Likewise.
---
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 13 files changed, 27 insertions(+), 19 deletions(-)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C (67%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (65%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (92%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (89%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

diff --git a/gcc/testsuite/g++.dg/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
similarity index 68%
rename from gcc/testsuite/g++.dg/pr65240-1.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-1.C
index ff8910df6a1a..23026673e76b 100644
--- a/gcc/testsuite/g++.dg/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,5 +1,6 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
similarity index 68%
rename from gcc/testsuite/g++.dg/pr65240-2.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-2.C
index bdb7a62d73d2..ddd3b3b75f43 100644
--- a/gcc/testsuite/g++.dg/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,5 +1,6 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-3.C b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
similarity index 67%
rename from gcc/testsuite/g++.dg/pr65240-3.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-3.C
index f37db9025d12..9e826c46ae7f 100644
--- a/gcc/testsuite/g++.dg/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,5 +1,6 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-4.C b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
similarity index 65%
rename from gcc/testsuite/g++.dg/pr65240-4.C
rename to 

[PATCH v3 2/2] rs6000: Remove a few needless 'lp64' contraints.

2022-04-18 Thread Paul A. Clarke via Gcc-patches
A few tests need not be restricted to 'lp64', so remove the restriction.

A few of those need a simple change to the DejaGnu directives to suppress
'-mcmodel' flags for '-m32'.

2022-04-18  Paul A. Clarke  

gcc/testsuite
* g++.target/powerpc/pr65240-1.C: Adjust DejaGnu directives.
* g++.target/powerpc/pr65240-2.C: Likewise.
* g++.target/powerpc/pr65240-3.C: Likewise.
* g++.target/powerpc/pr65240-4.C: Likewise.
* g++.target/powerpc/pr65242.C: Likewise.
* g++.target/powerpc/pr67211.C: Likewise.
* g++.target/powerpc/pr69667.C: Likewise.
* g++.target/powerpc/pr71294.C: Likewise.
---
 gcc/testsuite/g++.target/powerpc/pr65240-1.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-2.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-3.C | 4 ++--
 gcc/testsuite/g++.target/powerpc/pr65240-4.C | 2 +-
 gcc/testsuite/g++.target/powerpc/pr65242.C   | 2 +-
 gcc/testsuite/g++.target/powerpc/pr67211.C   | 2 +-
 gcc/testsuite/g++.target/powerpc/pr69667.C   | 2 +-
 gcc/testsuite/g++.target/powerpc/pr71294.C   | 2 +-
 8 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
index 23026673e76b..40682d5fe857 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,8 +1,8 @@
-/* { dg-do compile { target lp64 } } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
index ddd3b3b75f43..4e4a1c2bb897 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,8 +1,8 @@
-/* { dg-do compile { target lp64 } } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mfp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-3.C b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
index 9e826c46ae7f..6acd278cab50 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,8 +1,8 @@
-/* { dg-do compile { target lp64 } } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=medium" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-4.C b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
index 6047f136536e..57f2c769a3f3 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-4.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr65242.C b/gcc/testsuite/g++.target/powerpc/pr65242.C
index 09f5bb35f11d..64ca67e246f8 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65242.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65242.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr67211.C b/gcc/testsuite/g++.target/powerpc/pr67211.C
index 5cd00ba98ee4..946802e44cde 100644
--- a/gcc/testsuite/g++.target/powerpc/pr67211.C
+++ b/gcc/testsuite/g++.target/powerpc/pr67211.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
 /* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr69667.C 

[PATCH v3 0/2] rs6000: Move g++.dg powerpc tests to g++.target

2022-04-18 Thread Paul A. Clarke via Gcc-patches
v3: moved "not tested on Darwin" changes into 1/2, where they belong.

v2:
- v1 patches 1/3 and 2/3 have been merged after reviews / approval.
- Previous 3/3 is now 1/2, and new 2/2 is per review from Segher...

Some tests in g++.dg are target-specific for powerpc. Move those to
g++.target/powerpc. Update the DejaGnu directives as needed, since
the target restriction is perhaps no longer needed when residing in the
target-specific powerpc subdirectory.

In addition (new patch 2/2), as suggested by Segher, remove 'lp64' restriction
for a handful of tests, protecting uses of '-mcmodel' flag with
dg-additional-options.

Tested on Linux/Power9 (BE) and Linux Power8 (LE 32 and 64), full "make check".

OK for trunk?

Paul A. Clarke (2):
  rs6000: Move g++.dg powerpc PR tests to g++.target
  rs6000: Remove a few needless 'lp64' contraints.

 gcc/testsuite/g++.dg/pr65240-1.C | 8 
 gcc/testsuite/g++.dg/pr65240-2.C | 8 
 gcc/testsuite/g++.dg/pr65240-3.C | 8 
 gcc/testsuite/g++.target/powerpc/pr65240-1.C | 9 +
 gcc/testsuite/g++.target/powerpc/pr65240-2.C | 9 +
 gcc/testsuite/g++.target/powerpc/pr65240-3.C | 9 +
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 16 files changed, 45 insertions(+), 37 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-1.C
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-2.C
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-3.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-3.C
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (93%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (91%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (89%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

-- 
2.27.0



[PATCH v2 1/2] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-04-18 Thread Paul A. Clarke via Gcc-patches
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2022-04-18  Paul A. Clarke

gcc/testsuite
* g++.dg/pr65240.h: Move to g++.target/powerpc.
* g++.dg/pr93974.C: Likewise.
* g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84264.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/pr85657.C: Likewise.
---
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 13 files changed, 19 insertions(+), 19 deletions(-)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C (71%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C (71%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C (70%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (93%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (91%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

diff --git a/gcc/testsuite/g++.dg/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
similarity index 71%
rename from gcc/testsuite/g++.dg/pr65240-1.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-1.C
index ff8910df6a1a..f735a1f7834a 100644
--- a/gcc/testsuite/g++.dg/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
similarity index 71%
rename from gcc/testsuite/g++.dg/pr65240-2.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-2.C
index bdb7a62d73d2..e201e3a74d71 100644
--- a/gcc/testsuite/g++.dg/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-3.C b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
similarity index 70%
rename from gcc/testsuite/g++.dg/pr65240-3.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-3.C
index f37db9025d12..0821f68a5cf9 100644
--- a/gcc/testsuite/g++.dg/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium -Wno-return-type" } */
 
diff --git a/gcc/testsuite/g++.dg/pr65240-4.C b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
similarity index 68%
rename from gcc/testsuite/g++.dg/pr65240-4.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-4.C
index efb6a6c06e7c..92d31acb20d9 100644
--- a/gcc/testsuite/g++.dg/pr65240-4.C
+++ 

[PATCH v2 2/2] rs6000: Remove a few needless 'lp64' constraints.

2022-04-18 Thread Paul A. Clarke via Gcc-patches
A few tests need not be restricted to 'lp64', so remove the restriction.

A few of those need a simple change to the DejaGnu directives to suppress
'-mcmodel' flags for '-m32'.

2022-04-18  Paul A. Clarke  

gcc/testsuite
* g++.target/powerpc/pr65240-1.C: Adjust DejaGnu directives.
* g++.target/powerpc/pr65240-2.C: Likewise.
* g++.target/powerpc/pr65240-3.C: Likewise.
* g++.target/powerpc/pr65240-4.C: Likewise.
* g++.target/powerpc/pr65242.C: Likewise.
* g++.target/powerpc/pr67211.C: Likewise.
* g++.target/powerpc/pr69667.C: Likewise.
* g++.target/powerpc/pr71294.C: Likewise.
---
 gcc/testsuite/g++.target/powerpc/pr65240-1.C | 5 +++--
 gcc/testsuite/g++.target/powerpc/pr65240-2.C | 5 +++--
 gcc/testsuite/g++.target/powerpc/pr65240-3.C | 5 +++--
 gcc/testsuite/g++.target/powerpc/pr65240-4.C | 3 ++-
 gcc/testsuite/g++.target/powerpc/pr65242.C   | 3 ++-
 gcc/testsuite/g++.target/powerpc/pr67211.C   | 3 ++-
 gcc/testsuite/g++.target/powerpc/pr69667.C   | 3 ++-
 gcc/testsuite/g++.target/powerpc/pr71294.C   | 2 +-
 8 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-1.C b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
index f735a1f7834a..40682d5fe857 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,7 +1,8 @@
-/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mno-fp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-2.C b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
index e201e3a74d71..4e4a1c2bb897 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,7 +1,8 @@
-/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mfp-in-toc -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=small" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-3.C b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
index 0821f68a5cf9..6acd278cab50 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,7 +1,8 @@
-/* { dg-do compile { target lp64 } } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -mcmodel=medium -Wno-return-type" } */
+/* { dg-options "-mdejagnu-cpu=power8 -O3 -ffast-math -Wno-return-type" } */
+/* { dg-additional-options "-mcmodel=medium" { target lp64 } } */
 
 /* target/65240, compiler got a 'insn does not satisfy its constraints' error.  */
 
diff --git a/gcc/testsuite/g++.target/powerpc/pr65240-4.C b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
index 92d31acb20d9..57f2c769a3f3 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65240-4.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-4.C
@@ -1,4 +1,5 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-mdejagnu-cpu=power7 -O3 -ffast-math -Wno-return-type" } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr65242.C b/gcc/testsuite/g++.target/powerpc/pr65242.C
index b2984d1d6083..64ca67e246f8 100644
--- a/gcc/testsuite/g++.target/powerpc/pr65242.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65242.C
@@ -1,4 +1,5 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O3" } */
diff --git a/gcc/testsuite/g++.target/powerpc/pr67211.C b/gcc/testsuite/g++.target/powerpc/pr67211.C
index b58c08234272..946802e44cde 100644
--- a/gcc/testsuite/g++.target/powerpc/pr67211.C
+++ b/gcc/testsuite/g++.target/powerpc/pr67211.C
@@ -1,4 +1,5 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile } */
+/* Never tested on darwin, so skip there.  */
 /* { dg-skip-if "" { 

[PATCH v2 0/2] rs6000: Move g++.dg powerpc tests to g++.target

2022-04-18 Thread Paul A. Clarke via Gcc-patches
V1 patches 1/3 and 2/3 have been merged after reviews / approval.

Previous 3/3 is now 1/2, and new 2/2 is per review from Segher...

Some tests in g++.dg are target-specific for powerpc. Move those to
g++.target/powerpc. Update the DejaGnu directives as needed, since
the target restriction is perhaps no longer needed when residing in the
target-specific powerpc subdirectory.

In addition (new patch 2/2), as suggested by Segher, remove 'lp64' restriction
for a handful of tests, protecting uses of '-mcmodel' flag with
dg-additional-options.

Tested on Linux/Power9 (BE) and Linux Power8 (LE 32 and 64), full "make check".

OK for trunk?

Paul A. Clarke (2):
  rs6000: Move g++.dg powerpc PR tests to g++.target
  rs6000: Remove a few needless 'lp64' constraints.

 gcc/testsuite/g++.dg/pr65240-1.C | 8 
 gcc/testsuite/g++.dg/pr65240-2.C | 8 
 gcc/testsuite/g++.dg/pr65240-3.C | 8 
 gcc/testsuite/g++.target/powerpc/pr65240-1.C | 9 +
 gcc/testsuite/g++.target/powerpc/pr65240-2.C | 9 +
 gcc/testsuite/g++.target/powerpc/pr65240-3.C | 9 +
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 5 +++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 16 files changed, 44 insertions(+), 37 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-1.C
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-2.C
 delete mode 100644 gcc/testsuite/g++.dg/pr65240-3.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr65240-3.C
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (68%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (93%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (91%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

-- 
2.27.0



[PATCH] Fix 'modff' reference in extend.texi

2022-04-11 Thread Paul A. Clarke via Gcc-patches
In commit a2a919aa501e3 (2003), built-ins for modf and modff were added.
In extend.texi, section "Other Builtins", "modf" was added to the paragraph
"There are also built-in versions of the ISO C99 functions [...]" and
"modf" was also added to the paragraph "The ISO C90 functions [...]".
"modff" was not added to either paragraph.

Based on the context clues about where "modfl" and other similar function
pairs like "powf/powl" appear, I believe the reference to "modf" in the
first paragraph (C99) should instead be "modff".
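
For readers less familiar with the naming, a minimal sketch, assuming the usual
C library convention that modff is the float counterpart of modf (double) and
modfl (long double). The struct FloatParts and the helper split_float are
hypothetical names used only for illustration; they are not part of the patch.

```cpp
#include <math.h>

// Hypothetical container for the two results of modff.
struct FloatParts
{
  float integral;    // whole-number part, same sign as the input
  float fractional;  // fractional part, same sign as the input
};

// Split a float into integral and fractional parts using the
// float variant, modff, rather than the double variant, modf.
FloatParts split_float (float x)
{
  FloatParts p;
  p.fractional = modff (x, &p.integral);
  return p;
}
```

Calling split_float (3.25f) yields an integral part of 3.0f and a fractional
part of 0.25f; both values are exactly representable in float.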

2022-04-11  Paul A. Clarke  

gcc
* doc/extend.texi (Other Builtins): Correct reference to 'modff'.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e10b10bc1f14..05c99f4284a6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13460,7 +13460,7 @@ There are also built-in versions of the ISO C99 functions
 @code{expl}, @code{fabsf}, @code{fabsl}, @code{floorf}, @code{floorl},
 @code{fmodf}, @code{fmodl}, @code{frexpf}, @code{frexpl}, @code{ldexpf},
 @code{ldexpl}, @code{log10f}, @code{log10l}, @code{logf}, @code{logl},
-@code{modfl}, @code{modf}, @code{powf}, @code{powl}, @code{sinf},
+@code{modfl}, @code{modff}, @code{powf}, @code{powl}, @code{sinf},
 @code{sinhf}, @code{sinhl}, @code{sinl}, @code{sqrtf}, @code{sqrtl},
 @code{tanf}, @code{tanhf}, @code{tanhl} and @code{tanl}
 that are recognized in any mode since ISO C90 reserves these names for
-- 
2.27.0



Re: [PING^2 PATCH 3/3] rs6000: Move more g++.dg powerpc tests to g++.target

2022-03-29 Thread Paul A. Clarke via Gcc-patches
Ping.

On Tue, Mar 08, 2022 at 01:59:47PM -0600, Paul A. Clarke via Gcc-patches wrote:
> Ping.
> 
> On Mon, Feb 21, 2022 at 03:17:47PM -0600, Paul A. Clarke via Gcc-patches wrote:
> > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> > longer required.
> > 
> > 2022-02-21  Paul A. Clarke
> > 
> > gcc/testsuite
> > * g++.dg/debug/dwarf2/const2.C: Move to g++.target/powerpc.
> > * g++.dg/other/darwin-minversion-1.C: Likewise.
> > * g++.dg/eh/ppc64-sighandle-cr.C: Likewise.
> > * g++.dg/eh/simd-5.C: Likewise.
> > * g++.dg/eh/simd-4.C: Move to g++.target/powerpc, adjust dg directives.
> > * g++.dg/eh/uncaught3.C: Likewise.
> > * g++.dg/other/spu2vmx-1.C: Likewise.
> > ---
> >  .../{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C| 0
> >  .../{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C  | 0
> >  .../{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C  | 0
> >  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C| 2 +-
> >  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C| 0
> >  gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C  | 2 +-
> >  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C | 2 +-
> >  7 files changed, 3 insertions(+), 3 deletions(-)
> >  rename gcc/testsuite/{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C (100%)
> >  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C (100%)
> >  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C (100%)
> >  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C (95%)
> >  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C (100%)
> >  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C (84%)
> >  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C (96%)
> > 
> > diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/const2.C b/gcc/testsuite/g++.target/powerpc/const2.C
> > similarity index 100%
> > rename from gcc/testsuite/g++.dg/debug/dwarf2/const2.C
> > rename to gcc/testsuite/g++.target/powerpc/const2.C
> > diff --git a/gcc/testsuite/g++.dg/other/darwin-minversion-1.C b/gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> > similarity index 100%
> > rename from gcc/testsuite/g++.dg/other/darwin-minversion-1.C
> > rename to gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> > diff --git a/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C b/gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> > similarity index 100%
> > rename from gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C
> > rename to gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> > diff --git a/gcc/testsuite/g++.dg/eh/simd-4.C b/gcc/testsuite/g++.target/powerpc/simd-4.C
> > similarity index 95%
> > rename from gcc/testsuite/g++.dg/eh/simd-4.C
> > rename to gcc/testsuite/g++.target/powerpc/simd-4.C
> > index 8c9b58bf8684..a01f19c27369 100644
> > --- a/gcc/testsuite/g++.dg/eh/simd-4.C
> > +++ b/gcc/testsuite/g++.target/powerpc/simd-4.C
> > @@ -1,4 +1,4 @@
> > -/* { dg-do run { target powerpc*-*-darwin* } } */
> > +/* { dg-do run { target *-*-darwin* } } */
> >  /* { dg-options "-fexceptions -fnon-call-exceptions -O -maltivec" } */
> >  
> >  #include 
> > diff --git a/gcc/testsuite/g++.dg/eh/simd-5.C b/gcc/testsuite/g++.target/powerpc/simd-5.C
> > similarity index 100%
> > rename from gcc/testsuite/g++.dg/eh/simd-5.C
> > rename to gcc/testsuite/g++.target/powerpc/simd-5.C
> > diff --git a/gcc/testsuite/g++.dg/other/spu2vmx-1.C b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> > similarity index 84%
> > rename from gcc/testsuite/g++.dg/other/spu2vmx-1.C
> > rename to gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> > index d9c8faf94592..496b46c22c95 100644
> > --- a/gcc/testsuite/g++.dg/other/spu2vmx-1.C
> > +++ b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> > @@ -1,4 +1,4 @@
> > -/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-do compile } */
> >  /* { dg-require-effective-target powerpc_spu } */
> >  /* { dg-options "-maltivec" } */
> >  
> > diff --git a/gcc/testsuite/g++.dg/eh/uncaught3.C b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> > similarity index 96%
> > rename from gcc/testsuite/g++.dg/eh/uncaught3.C
> > rename to gcc/testsuite/g++.target/powerpc/uncaught3.C
> > index 1beaab3f..f891401584ec 100644
> > --- a/gcc/testsuite/g++.dg/eh/uncaught3.C
> > +++ b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> > @@ -1,4 +1,4 @@
> > -// { dg-do compile { target powerpc*-*-darwin* } }
> > +// { dg-do compile { target *-*-darwin* } }
> >  // { dg-final { scan-assembler-not "__cxa_get_exception" } }
> >  // { dg-options "-mmacosx-version-min=10.4" }
> >  // { dg-additional-options "-Wno-deprecated" { target c++17 } }
> > -- 
> > 2.27.0
> > 


Re: [PING^2 PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-03-29 Thread Paul A. Clarke via Gcc-patches
Ping.

On Tue, Mar 08, 2022 at 02:03:04PM -0600, Paul A. Clarke via Gcc-patches wrote:
> Gentle ping. I am grateful for the initial review, but seek closure on the
> final couple of discussion items. Thanks!
> 
> PC
> 
> On Tue, Feb 22, 2022 at 07:56:40PM -0600, Paul A. Clarke via Gcc-patches wrote:
> > On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote:
> > > On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> > > > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> > > > longer required.
> > > > 
> > > > 2022-02-21  Paul A. Clarke
> > > > 
> > > > gcc/testsuite
> > > > * g++.dg/pr65240.h: Move to g++.target/powerpc.
> > > > * g++.dg/pr93974.C: Likewise.
> > > > * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
> > > > * g++.dg/pr65240-2.C: Likewise.
> > > > * g++.dg/pr65240-3.C: Likewise.
> > > > * g++.dg/pr65240-4.C: Likewise.
> > > > * g++.dg/pr65242.C: Likewise.
> > > > * g++.dg/pr67211.C: Likewise.
> > > > * g++.dg/pr69667.C: Likewise.
> > > > * g++.dg/pr71294.C: Likewise.
> > > > * g++.dg/pr84264.C: Likewise.
> > > > * g++.dg/pr84279.C: Likewise.
> > > > * g++.dg/pr85657.C: Likewise.
> > > 
> > > Okay for trunk.  Thanks!
> > 
> > Thanks for the review! More below...
> > 
> > > That said...
> > > 
> > > > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > > > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > > > +/* { dg-do compile { target lp64 } } */
> > > > +/* { dg-skip-if "" { *-*-darwin* } } */
> > > 
> > > That skip-if is most likely cargo cult, and it's not clear why lp64
> > > would be needed either (there is no comment what it is needed for, for
> > > example).
> > 
> > I can't speak to darwin, nor have an easy way of testing on it.
> > 
> > As for lp64, these tests fail on -m32 with:
> >   cc1plus: error: '-mcmodel' not supported in this configuration
> > - g++.dg/pr65240-1.C
> > - g++.dg/pr65240-2.C
> > - g++.dg/pr65240-3.C
> > 
> > '-mcmodel' is in the dg-options line for the above tests.
> > 
> > The rest PASSed.  Shall I remove the 'lp64' restriction for those that PASS?
> > 
> > > > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> > > > @@ -1,4 +1,4 @@
> > > > -// { dg-do compile { target { powerpc*-*-linux* } } }
> > > > +// { dg-do compile { target { *-*-linux* } } }
> > > 
> > > A comment here would help as well.  All of that is pre-existing of
> > > course.
> > 
> > I'm not sure what such a comment would say. I suspect it was a testing issue
> > (only tested on Linux), but I have similar limitations, so I'm also reluctant
> > to enable the test for what would be untested (by me) platforms.
> > 
> > PC


Re: [PATCH] rs6000: Skip overload instances with NULL fntype [PR104967]

2022-03-23 Thread Paul A. Clarke via Gcc-patches
On Wed, Mar 23, 2022 at 05:33:21PM +0800, Kewen.Lin via Gcc-patches wrote:
> As shown in PR104967, for some overload built-in function instance,
> if it requires a date type which isn't defined on the target, its

nit: s/date/data/

> fntype would be initialized as NULL.  This patch is to consider
> this possibility in function find_instance to avoid ICE.

PC


Re: [PING PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-03-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping. I am grateful for the initial review, but seek closure on the
final couple of discussion items. Thanks!

PC

On Tue, Feb 22, 2022 at 07:56:40PM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote:
> > On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> > > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> > > longer required.
> > > 
> > > 2022-02-21  Paul A. Clarke
> > > 
> > > gcc/testsuite
> > >   * g++.dg/pr65240.h: Move to g++.target/powerpc.
> > >   * g++.dg/pr93974.C: Likewise.
> > >   * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
> > >   * g++.dg/pr65240-2.C: Likewise.
> > >   * g++.dg/pr65240-3.C: Likewise.
> > >   * g++.dg/pr65240-4.C: Likewise.
> > >   * g++.dg/pr65242.C: Likewise.
> > >   * g++.dg/pr67211.C: Likewise.
> > >   * g++.dg/pr69667.C: Likewise.
> > >   * g++.dg/pr71294.C: Likewise.
> > >   * g++.dg/pr84264.C: Likewise.
> > >   * g++.dg/pr84279.C: Likewise.
> > >   * g++.dg/pr85657.C: Likewise.
> > 
> > Okay for trunk.  Thanks!
> 
> Thanks for the review! More below...
> 
> > That said...
> > 
> > > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > > +/* { dg-do compile { target lp64 } } */
> > > +/* { dg-skip-if "" { *-*-darwin* } } */
> > 
> > That skip-if is most likely cargo cult, and it's not clear why lp64
> > would be needed either (there is no comment what it is needed for, for
> > example).
> 
> I can't speak to darwin, nor have an easy way of testing on it.
> 
> As for lp64, these tests fail on -m32 with:
>   cc1plus: error: '-mcmodel' not supported in this configuration
> - g++.dg/pr65240-1.C
> - g++.dg/pr65240-2.C
> - g++.dg/pr65240-3.C
> 
> '-mcmodel' is in the dg-options line for the above tests.
> 
> The rest PASSed.  Shall I remove the 'lp64' restriction for those that PASS?
> 
> > > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> > > @@ -1,4 +1,4 @@
> > > -// { dg-do compile { target { powerpc*-*-linux* } } }
> > > +// { dg-do compile { target { *-*-linux* } } }
> > 
> > A comment here would help as well.  All of that is pre-existing of
> > course.
> 
> I'm not sure what such a comment would say. I suspect it was a testing issue
> (only tested on Linux), but I have similar limitations, so I'm also reluctant
> to enable the test for what would be untested (by me) platforms.
> 
> PC


Re: [PING PATCH 3/3] rs6000: Move more g++.dg powerpc tests to g++.target

2022-03-08 Thread Paul A. Clarke via Gcc-patches
Ping.

On Mon, Feb 21, 2022 at 03:17:47PM -0600, Paul A. Clarke via Gcc-patches wrote:
> Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> longer required.
> 
> 2022-02-21  Paul A. Clarke
> 
> gcc/testsuite
>   * g++.dg/debug/dwarf2/const2.C: Move to g++.target/powerpc.
>   * g++.dg/other/darwin-minversion-1.C: Likewise.
>   * g++.dg/eh/ppc64-sighandle-cr.C: Likewise.
>   * g++.dg/eh/simd-5.C: Likewise.
>   * g++.dg/eh/simd-4.C: Move to g++.target/powerpc, adjust dg directives.
>   * g++.dg/eh/uncaught3.C: Likewise.
>   * g++.dg/other/spu2vmx-1.C: Likewise.
> ---
>  .../{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C| 0
>  .../{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C  | 0
>  .../{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C  | 0
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C| 2 +-
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C| 0
>  gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C  | 2 +-
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C | 2 +-
>  7 files changed, 3 insertions(+), 3 deletions(-)
>  rename gcc/testsuite/{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C (100%)
>  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C (100%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C (100%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C (95%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C (100%)
>  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C (84%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C (96%)
> 
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/const2.C b/gcc/testsuite/g++.target/powerpc/const2.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/debug/dwarf2/const2.C
> rename to gcc/testsuite/g++.target/powerpc/const2.C
> diff --git a/gcc/testsuite/g++.dg/other/darwin-minversion-1.C b/gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/other/darwin-minversion-1.C
> rename to gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> diff --git a/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C b/gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C
> rename to gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> diff --git a/gcc/testsuite/g++.dg/eh/simd-4.C b/gcc/testsuite/g++.target/powerpc/simd-4.C
> similarity index 95%
> rename from gcc/testsuite/g++.dg/eh/simd-4.C
> rename to gcc/testsuite/g++.target/powerpc/simd-4.C
> index 8c9b58bf8684..a01f19c27369 100644
> --- a/gcc/testsuite/g++.dg/eh/simd-4.C
> +++ b/gcc/testsuite/g++.target/powerpc/simd-4.C
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target powerpc*-*-darwin* } } */
> +/* { dg-do run { target *-*-darwin* } } */
>  /* { dg-options "-fexceptions -fnon-call-exceptions -O -maltivec" } */
>  
>  #include 
> diff --git a/gcc/testsuite/g++.dg/eh/simd-5.C b/gcc/testsuite/g++.target/powerpc/simd-5.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/eh/simd-5.C
> rename to gcc/testsuite/g++.target/powerpc/simd-5.C
> diff --git a/gcc/testsuite/g++.dg/other/spu2vmx-1.C b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> similarity index 84%
> rename from gcc/testsuite/g++.dg/other/spu2vmx-1.C
> rename to gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> index d9c8faf94592..496b46c22c95 100644
> --- a/gcc/testsuite/g++.dg/other/spu2vmx-1.C
> +++ b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do compile } */
>  /* { dg-require-effective-target powerpc_spu } */
>  /* { dg-options "-maltivec" } */
>  
> diff --git a/gcc/testsuite/g++.dg/eh/uncaught3.C b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> similarity index 96%
> rename from gcc/testsuite/g++.dg/eh/uncaught3.C
> rename to gcc/testsuite/g++.target/powerpc/uncaught3.C
> index 1beaab3f..f891401584ec 100644
> --- a/gcc/testsuite/g++.dg/eh/uncaught3.C
> +++ b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> @@ -1,4 +1,4 @@
> -// { dg-do compile { target powerpc*-*-darwin* } }
> +// { dg-do compile { target *-*-darwin* } }
>  // { dg-final { scan-assembler-not "__cxa_get_exception" } }
>  // { dg-options "-mmacosx-version-min=10.4" }
>  // { dg-additional-options "-Wno-deprecated" { target c++17 } }
> -- 
> 2.27.0
> 


Re: [PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-02-22 Thread Paul A. Clarke via Gcc-patches
On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote:
> On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> > longer required.
> > 
> > 2022-02-21  Paul A. Clarke
> > 
> > gcc/testsuite
> > * g++.dg/pr65240.h: Move to g++.target/powerpc.
> > * g++.dg/pr93974.C: Likewise.
> > * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
> > * g++.dg/pr65240-2.C: Likewise.
> > * g++.dg/pr65240-3.C: Likewise.
> > * g++.dg/pr65240-4.C: Likewise.
> > * g++.dg/pr65242.C: Likewise.
> > * g++.dg/pr67211.C: Likewise.
> > * g++.dg/pr69667.C: Likewise.
> > * g++.dg/pr71294.C: Likewise.
> > * g++.dg/pr84264.C: Likewise.
> > * g++.dg/pr84279.C: Likewise.
> > * g++.dg/pr85657.C: Likewise.
> 
> Okay for trunk.  Thanks!

Thanks for the review! More below...

> That said...
> 
> > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-do compile { target lp64 } } */
> > +/* { dg-skip-if "" { *-*-darwin* } } */
> 
> That skip-if is most likely cargo cult, and it's not clear why lp64
> would be needed either (there is no comment what it is needed for, for
> example).

I can't speak to darwin, nor have an easy way of testing on it.

As for lp64, these tests fail on -m32 with:
  cc1plus: error: '-mcmodel' not supported in this configuration
- g++.dg/pr65240-1.C
- g++.dg/pr65240-2.C
- g++.dg/pr65240-3.C

'-mcmodel' is in the dg-options line for the above tests.

The rest PASSed.  Shall I remove the 'lp64' restriction for those that PASS?

> > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> > @@ -1,4 +1,4 @@
> > -// { dg-do compile { target { powerpc*-*-linux* } } }
> > +// { dg-do compile { target { *-*-linux* } } }
> 
> A comment here would help as well.  All of that is pre-existing of
> course.

I'm not sure what such a comment would say. I suspect it was a testing issue
(only tested on Linux), but I have similar limitations, so I'm also reluctant
to enable the test for what would be untested (by me) platforms.

PC


Re: [PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target

2022-02-22 Thread Paul A. Clarke via Gcc-patches
On Tue, Feb 22, 2022 at 12:28:56PM -0600, Segher Boessenkool wrote:
> On Mon, Feb 21, 2022 at 03:17:44PM -0600, Paul A. Clarke wrote:
> > Some tests in g++.dg are target-specific for powerpc. Move those to
> > g++.target/powerpc. Update the DejaGnu directives as needed, since
> > the target restriction is perhaps no longer needed when residing in the
> > target-specific powerpc subdirectory.
> 
> Not "perhaps" :-)  More specifically, powerpc.exp has
> 
> # Exit immediately if this isn't a PowerPC target.
> if {![istarget powerpc*-*-*] } then {
>   return
> }
> 
> so anything run from that driver does not have to test for powerpc
> separately anymore.

The context for "perhaps" is for cases like:
// { dg-do compile { target powerpc*-*-darwin* } }
and
// { dg-do compile { target { powerpc*-*-linux* } } }

where the target is still needed, albeit without the "powerpc"
restriction itself.

PC


[PATCH 3/3] rs6000: Move more g++.dg powerpc tests to g++.target

2022-02-21 Thread Paul A. Clarke via Gcc-patches
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2021-02-21  Paul A. Clarke  

gcc/testsuite
* g++.dg/debug/dwarf2/const2.C: Move to g++.target/powerpc.
* g++.dg/other/darwin-minversion-1.C: Likewise.
* g++.dg/eh/ppc64-sighandle-cr.C: Likewise.
* g++.dg/eh/simd-5.C: Likewise.
* g++.dg/eh/simd-4.C: Move to g++.target/powerpc, adjust dg directives.
* g++.dg/eh/uncaught3.C: Likewise.
* g++.dg/other/spu2vmx-1.C: Likewise.
---
 .../{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C| 0
 .../{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C  | 0
 .../{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C  | 0
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C| 2 +-
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C| 0
 gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C  | 2 +-
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C | 2 +-
 7 files changed, 3 insertions(+), 3 deletions(-)
 rename gcc/testsuite/{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C 
(100%)
 rename gcc/testsuite/{g++.dg/other => 
g++.target/powerpc}/darwin-minversion-1.C (100%)
 rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C 
(100%)
 rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C (95%)
 rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C (100%)
 rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C (84%)
 rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C (96%)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/const2.C 
b/gcc/testsuite/g++.target/powerpc/const2.C
similarity index 100%
rename from gcc/testsuite/g++.dg/debug/dwarf2/const2.C
rename to gcc/testsuite/g++.target/powerpc/const2.C
diff --git a/gcc/testsuite/g++.dg/other/darwin-minversion-1.C 
b/gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
similarity index 100%
rename from gcc/testsuite/g++.dg/other/darwin-minversion-1.C
rename to gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
diff --git a/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C 
b/gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
similarity index 100%
rename from gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C
rename to gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
diff --git a/gcc/testsuite/g++.dg/eh/simd-4.C 
b/gcc/testsuite/g++.target/powerpc/simd-4.C
similarity index 95%
rename from gcc/testsuite/g++.dg/eh/simd-4.C
rename to gcc/testsuite/g++.target/powerpc/simd-4.C
index 8c9b58bf8684..a01f19c27369 100644
--- a/gcc/testsuite/g++.dg/eh/simd-4.C
+++ b/gcc/testsuite/g++.target/powerpc/simd-4.C
@@ -1,4 +1,4 @@
-/* { dg-do run { target powerpc*-*-darwin* } } */
+/* { dg-do run { target *-*-darwin* } } */
 /* { dg-options "-fexceptions -fnon-call-exceptions -O -maltivec" } */
 
 #include 
diff --git a/gcc/testsuite/g++.dg/eh/simd-5.C 
b/gcc/testsuite/g++.target/powerpc/simd-5.C
similarity index 100%
rename from gcc/testsuite/g++.dg/eh/simd-5.C
rename to gcc/testsuite/g++.target/powerpc/simd-5.C
diff --git a/gcc/testsuite/g++.dg/other/spu2vmx-1.C 
b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
similarity index 84%
rename from gcc/testsuite/g++.dg/other/spu2vmx-1.C
rename to gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
index d9c8faf94592..496b46c22c95 100644
--- a/gcc/testsuite/g++.dg/other/spu2vmx-1.C
+++ b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
@@ -1,4 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target powerpc_spu } */
 /* { dg-options "-maltivec" } */
 
diff --git a/gcc/testsuite/g++.dg/eh/uncaught3.C 
b/gcc/testsuite/g++.target/powerpc/uncaught3.C
similarity index 96%
rename from gcc/testsuite/g++.dg/eh/uncaught3.C
rename to gcc/testsuite/g++.target/powerpc/uncaught3.C
index 1beaab3f..f891401584ec 100644
--- a/gcc/testsuite/g++.dg/eh/uncaught3.C
+++ b/gcc/testsuite/g++.target/powerpc/uncaught3.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target powerpc*-*-darwin* } }
+// { dg-do compile { target *-*-darwin* } }
 // { dg-final { scan-assembler-not "__cxa_get_exception" } }
 // { dg-options "-mmacosx-version-min=10.4" }
 // { dg-additional-options "-Wno-deprecated" { target c++17 } }
-- 
2.27.0



[PATCH 1/3] rs6000: Move g++.dg/ext powerpc tests to g++.target

2022-02-21 Thread Paul A. Clarke via Gcc-patches
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2021-02-21  Paul A. Clarke  

gcc/testsuite
* g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg
directives.
* g++.dg/ext/altivec-2.C: Likewise.
* g++.dg/ext/altivec-3.C: Likewise.
* g++.dg/ext/altivec-4.C: Likewise.
* g++.dg/ext/altivec-5.C: Likewise.
* g++.dg/ext/altivec-6.C: Likewise.
* g++.dg/ext/altivec-7.C: Likewise.
* g++.dg/ext/altivec-8.C: Likewise.
* g++.dg/ext/altivec-9.C: Likewise.
* g++.dg/ext/altivec-10.C: Likewise.
* g++.dg/ext/altivec-11.C: Likewise.
* g++.dg/ext/altivec-12.C: Likewise.
* g++.dg/ext/altivec-13.C: Likewise.
* g++.dg/ext/altivec-14.C: Likewise.
* g++.dg/ext/altivec-15.C: Likewise.
* g++.dg/ext/altivec-16.C: Likewise.
* g++.dg/ext/altivec-17.C: Likewise.
* g++.dg/ext/altivec-18.C: Likewise.
* g++.dg/ext/altivec-cell-1.C: Likewise.
* g++.dg/ext/altivec-cell-2.C: Likewise.
* g++.dg/ext/altivec-cell-3.C: Likewise.
* g++.dg/ext/altivec-cell-4.C: Likewise.
* g++.dg/ext/altivec-cell-5.C: Likewise.
* g++.dg/ext/altivec-types-1.C: Likewise.
* g++.dg/ext/altivec-types-2.C: Likewise.
* g++.dg/ext/altivec-types-3.C: Likewise.
* g++.dg/ext/altivec-types-4.C: Likewise.
* g++.dg/ext/undef-bool-1.C: Likewise.
---
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-1.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-10.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-11.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-12.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-13.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-14.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-15.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-16.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-17.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-18.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-2.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-3.C  | 4 ++--
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-4.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-5.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-6.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-7.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-8.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-9.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-1.C   | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-2.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-3.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-4.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-5.C   | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-1.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-2.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-3.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-4.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/undef-bool-1.C | 2 +-
 28 files changed, 32 insertions(+), 32 deletions(-)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-1.C (83%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-10.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-11.C (80%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-12.C (87%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-13.C (97%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-14.C (86%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-15.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-16.C (88%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-17.C (91%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-18.C (83%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-2.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-3.C (96%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-4.C (81%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-5.C (83%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-6.C (94%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-7.C (96%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-8.C (93%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-9.C (86%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-cell-1.C (96%)
 rename gcc/testsuite/{g++.dg/ext => 

[PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-02-21 Thread Paul A. Clarke via Gcc-patches
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
longer required.

2021-02-21  Paul A. Clarke  

gcc/testsuite
* g++.dg/pr65240.h: Move to g++.target/powerpc.
* g++.dg/pr93974.C: Likewise.
* g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
* g++.dg/pr65240-2.C: Likewise.
* g++.dg/pr65240-3.C: Likewise.
* g++.dg/pr65240-4.C: Likewise.
* g++.dg/pr65242.C: Likewise.
* g++.dg/pr67211.C: Likewise.
* g++.dg/pr69667.C: Likewise.
* g++.dg/pr71294.C: Likewise.
* g++.dg/pr84264.C: Likewise.
* g++.dg/pr84279.C: Likewise.
* g++.dg/pr85657.C: Likewise.
---
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h   | 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C   | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C   | 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C   | 0
 13 files changed, 19 insertions(+), 19 deletions(-)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C (76%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C (76%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C (76%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C (75%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h (100%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C (94%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C (92%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C (97%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C (96%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C (79%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C (91%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C (90%)
 rename gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C (100%)

diff --git a/gcc/testsuite/g++.dg/pr65240-1.C 
b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
similarity index 76%
rename from gcc/testsuite/g++.dg/pr65240-1.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-1.C
index d2e25b65fcae..d2f4a229773e 100644
--- a/gcc/testsuite/g++.dg/pr65240-1.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-1.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mno-fp-in-toc 
-Wno-return-type" } */
diff --git a/gcc/testsuite/g++.dg/pr65240-2.C 
b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
similarity index 76%
rename from gcc/testsuite/g++.dg/pr65240-2.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-2.C
index 38d5020bd198..12e36994d27b 100644
--- a/gcc/testsuite/g++.dg/pr65240-2.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-2.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=small -mfp-in-toc 
-Wno-return-type" } */
diff --git a/gcc/testsuite/g++.dg/pr65240-3.C 
b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
similarity index 76%
rename from gcc/testsuite/g++.dg/pr65240-3.C
rename to gcc/testsuite/g++.target/powerpc/pr65240-3.C
index e8463c914946..9ded3e3ab1d3 100644
--- a/gcc/testsuite/g++.dg/pr65240-3.C
+++ b/gcc/testsuite/g++.target/powerpc/pr65240-3.C
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { *-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 -ffast-math -mcmodel=medium 
-Wno-return-type" } */
diff --git a/gcc/testsuite/g++.dg/pr65240-4.C 

[PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target

2022-02-21 Thread Paul A. Clarke via Gcc-patches
Some tests in g++.dg are target-specific for powerpc. Move those to
g++.target/powerpc. Update the DejaGnu directives as needed, since
the target restriction is perhaps no longer needed when residing in the
target-specific powerpc subdirectory.

Tested with Linux on Power9, full "make check".

OK for trunk?

Paul A. Clarke (3):
  rs6000: Move g++.dg/ext powerpc tests to g++.target
  rs6000: Move g++.dg powerpc PR tests to g++.target
  rs6000: Move more g++.dg powerpc tests to g++.target

 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-1.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-10.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-11.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-12.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-13.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-14.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-15.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-16.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-17.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-18.C | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-2.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-3.C  | 4 ++--
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-4.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-5.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-6.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-7.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-8.C  | 2 +-
 gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-9.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-1.C   | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-2.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-3.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-4.C   | 4 ++--
 .../{g++.dg/ext => g++.target/powerpc}/altivec-cell-5.C   | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-1.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-2.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-3.C  | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/altivec-types-4.C  | 2 +-
 .../{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C  | 0
 .../other => g++.target/powerpc}/darwin-minversion-1.C| 0
 .../{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C| 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-1.C  | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-2.C  | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-3.C  | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240-4.C  | 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65240.h| 0
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr65242.C| 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr67211.C| 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr69667.C| 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr71294.C| 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84264.C| 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr84279.C| 4 ++--
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr85657.C| 2 +-
 gcc/testsuite/{g++.dg => g++.target/powerpc}/pr93974.C| 0
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C  | 2 +-
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C  | 0
 .../{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C  | 2 +-
 gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C   | 2 +-
 .../{g++.dg/ext => g++.target/powerpc}/undef-bool-1.C | 2 +-
 48 files changed, 54 insertions(+), 54 deletions(-)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-1.C (83%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-10.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-11.C (80%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-12.C (87%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-13.C (97%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-14.C (86%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-15.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-16.C (88%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-17.C (91%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-18.C (83%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-2.C (92%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-3.C (96%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-4.C (81%)
 rename gcc/testsuite/{g++.dg/ext => g++.target/powerpc}/altivec-5.C (83%)
 rename 

Re: [PATCH] rs6000: Fix up #include <immintrin.h> or <x86gprintrin.h> [PR104239]

2022-01-27 Thread Paul A. Clarke via Gcc-patches
On Wed, Jan 26, 2022 at 03:50:35PM -0500, David Edelsohn via Gcc-patches wrote:
> On Wed, Jan 26, 2022 at 3:45 PM Jakub Jelinek  wrote:
> > r12-4717-g7d37abedf58d66 added immintrin.h and x86gprintrin.h headers
> > to rs6000, these headers are on x86 standalone headers that various
> > programs include directly rather than including them through
> > <x86intrin.h>.
> > Unfortunately, for that change the bmiintrin.h and bmi2intrin.h
> > headers haven't been adjusted, so the effect is that if one includes them
> > (without including also x86intrin.h first) #error will trigger.
> > Furthermore, when including such headers conditionally as some real-world
> > packages do, this means a regression.
> >
> > The following patch fixes it and matches what the x86 bmi{,2}intrin.h
> > headers do.
> 
> Okay.
> 
> Thanks for catching this.

Indeed, thanks. And thanks for reviewing, David.

Should we add similar compile-only tests for all of the standalone include
files?

PC


[v2 COMMITTED] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-10 Thread Paul A. Clarke via Gcc-patches
This is the patch that was committed. Thanks for the review!
---
Power10 ISA added `xxblendv*` instructions which are realized in the
`vec_blendv` intrinsic.

Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
`_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.

Update original implementation of _mm_blendv_epi8 to use signed types,
to better match the function parameters. Realization is unchanged.

Also, copy a test from i386 for testing `_mm_blendv_ps`.
This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
but was inadvertently omitted.

2022-01-10  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
when _ARCH_PWR10. Use signed types.
(_mm_blendv_ps): Use vec_blendv when _ARCH_PWR10.
(_mm_blendv_pd): Likewise.

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
adjust dg directives to suit.
---
v2: Used signed types within new and original implementation of
_mm_blendv_epi8.

 gcc/config/rs6000/smmintrin.h | 14 +++-
 .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
 2 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1fda04881554..b9cb46b3c1dd 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int 
__imm8)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128i) vec_blendv ((__v16qi) __A, (__v16qi) __B, (__v16qu) __mask);
+#else
   const __v16qu __seven = vec_splats ((unsigned char) 0x07);
   __v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
-  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
+  return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask);
+#endif
 }
 
 extern __inline __m128
@@ -149,9 +153,13 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
+#else
   const __v4si __zero = {0};
   const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
   return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+#endif
 }
 
 extern __inline __m128d
@@ -174,9 +182,13 @@ extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) __mask);
+#else
   const __v2di __zero = {0};
   const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
   return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+#endif
 }
 #endif
 
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
new file mode 100644
index ..8fcb55383047
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_blendvps (float *src1, float *src2, float *mask)
+{
+  int i, msk, sign = 1; 
+
+  msk = -1;
+  for (i = 0; i < NUM * 4; i++)
+{
+  if((i % 4) == 0)
+   msk++;
+  src1[i] = i* (i + 1) * sign;
+  src2[i] = (i + 20) * sign;
+  mask[i] = (i + 120) * i;
+  if( (msk & (1 << (i % 4))))
+   mask[i] = -mask[i];
+  sign = -sign;
+}
+}
+
+static int
+check_blendvps (__m128 *dst, float *src1, float *src2,
+   float *mask)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if (mask [j] < 0.0)
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2, mask;
+  int i;
+
+  init_blendvps (src1.f, src2.f, mask.f);
+
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blendv_ps (src1.x[i], src2.x[i], mask.x[i]);
+  if (check_blendvps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4],
+ &mask.f[i * 4]))
+   abort ();
+}
+}
-- 
2.27.0



Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8

2022-01-07 Thread Paul A. Clarke via Gcc-patches
On Fri, Jan 07, 2022 at 02:40:51PM -0500, David Edelsohn via Gcc-patches wrote:
> +#ifdef __LITTLE_ENDIAN__
> +  /* Sum across four integers with two integer results.  */
> +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> + shifts are added that are not needed for this use-case.
> + A vector shift to correctly position the 32-bit integer results
> + (currently at [0] and [2]) to [1] and [3] would then need to be
> + swapped back again since the desired results are two 64-bit
> + integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> +#else
>/* Sum across four integers with two integer results.  */
>result = vec_sum2s (vsum, (__vector signed int) zero);
> 
> If little-endian adds shifts to correct for the position and
> big-endian does not, why not use the inline asm without the shifts for
> both?  It seems confusing to add the inline asm only for LE instead of
> always using it with the appropriate comment.
> 
> It's a good and valuable optimization for LE.  Fewer variants are less
> fragile, easier to test and easier to maintain.  If you're going to go
> to the trouble of using inline asm for LE, use it for both.

BE (only) _does_ need a shift as seen on the next two lines after the
code snippet above:
  /* Sum across four integers with two integer results.  */
  result = vec_sum2s (vsum, (__vector signed int) zero);
  /* Rotate the sums into the correct position.  */
  result = vec_sld (result, result, 6);

So, when using {vec_sum2s;vec_sld}:
- LE gets an implicit shift in vec_sum2s which just needs to be undone
  by the vec_sld, and those shifts don't "cancel out" and get removed
  by GCC.
- BE does not get any implicit shifts, but needs one that comes from
  vec_sld.

Are you saying use the asm(vsum2sws) and then conditionally call
vec_sld on BE only?

I viewed this change as a temporary bandage unless and until GCC can
remove the unnecessary swaps.  It seems like the preferred code is
vec_sum2s/vec_sld, not the asm, but that currently is suboptimal for LE.

PC
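
For reference, the result that either code path has to deliver can be
modeled in plain scalar C. This is only a sketch of the x86 semantics of
_mm_sad_epu8 (two 64-bit lanes, each holding the sum of absolute
differences of eight unsigned bytes); the name sad_epu8_ref is made up for
illustration and is not part of GCC or the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not GCC code: for each 8-byte half,
   sum the absolute differences of the unsigned bytes and store the
   sum zero-extended in a 64-bit lane.  */
void
sad_epu8_ref (const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
  for (int half = 0; half < 2; half++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        sum += (uint64_t) abs ((int) a[half * 8 + i] - (int) b[half * 8 + i]);
      out[half] = sum;
    }
}
```

Each sum is at most 8 * 255 = 2040, so it sits in the low bits of its
64-bit lane. That is the layout the 32-bit partial sums have to be placed
into, with or without a shift.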


Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2022-01-07 Thread Paul A. Clarke via Gcc-patches
On Fri, Jan 07, 2022 at 02:23:14PM -0500, David Edelsohn wrote:
> > Power10 ISA added `vextract*` instructions which are realized in the
> > `vec_extractm` intrinsic.
> >
> > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> >
> > 2021-10-21  Paul A. Clarke  
> >
> > gcc
> > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> > when _ARCH_PWR10.
> > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> > (_mm_movemask_epi8): Likewise.
> > ---
> > Tested on Power10 powerpc64le-linux (compiled with and without
> > `-mcpu=power10`).
> >
> > OK for trunk?
> 
> This is okay modulo
> 
> > + return vec_extractm ((__v16qu) __A);
> 
> Should the above be __v16qi like x86?

That would match x86 better, but we don't have a function signature
for vec_extractm which accepts a signed type.

PC
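
As a side note on the signed/unsigned question: a scalar model of what
_mm_movemask_epi8 computes on x86 suggests that the element signedness
should not affect the result, since only the top bit of each byte is
collected. The sketch below is illustrative only; movemask_epi8_ref is a
hypothetical name, and the byte index follows the x86 element order.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not GCC code: gather the most
   significant bit of each of the 16 bytes into bits 0..15 of the
   result.  */
int
movemask_epi8_ref (const uint8_t v[16])
{
  int mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (v[i] >> 7) << i;
  return mask;
}
```

Reinterpreting the same bytes as signed char would leave every top bit,
and therefore the returned mask, unchanged.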


Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread Paul A. Clarke via Gcc-patches
On Fri, Jan 07, 2022 at 02:15:22PM -0500, David Edelsohn wrote:
> > Power10 ISA added `xxblendv*` instructions which are realized in the
> > `vec_blendv` intrinsic.
> >
> > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> >
> > Also, copy a test from i386 for testing `_mm_blendv_ps`.
> > This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> > but was inadvertently omitted.
> >
> > 2021-10-20  Paul A. Clarke  
> >
> > gcc
> > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> > when _ARCH_PWR10.
> > (_mm_blendv_ps): Likewise.
> > (_mm_blendv_pd): Likewise.
> >
> > gcc/testsuite
> > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> > adjust dg directives to suit.
> > ---
> > Tested on Power10 powerpc64le-linux (compiled with and without
> > `-mcpu=power10`).
> >
> > OK for trunk?
> 
> This is okay modulo
> 
> > + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> > __mask);
> 
> Should the above be __v16qi like x86?

That does arguably match the types involved (epi8) better.

Shall I change the original implementation as well (4 lines later)?

>   return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask);

PC
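
To make the intended behavior concrete, here is a scalar sketch of the
byte blend that both the vec_blendv path and the vec_sra/vec_sel fallback
are meant to implement, following the x86 definition of _mm_blendv_epi8.
The function name blendv_epi8_ref is an assumption for illustration; it is
not in smmintrin.h.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not GCC code: for each byte, take the
   byte from b when the top bit of the mask byte is set, otherwise take
   the byte from a.  */
void
blendv_epi8_ref (const uint8_t a[16], const uint8_t b[16],
                 const uint8_t mask[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* Widen the mask's top bit to a full byte: 0xFF or 0x00.  This
         mirrors what an arithmetic shift right by 7 produces.  */
      uint8_t lmask = (mask[i] & 0x80) ? 0xFF : 0x00;
      out[i] = (uint8_t) ((a[i] & ~lmask) | (b[i] & lmask));
    }
}
```

Only bit 7 of each mask byte matters here, so casting the data operands to
__v16qi or __v16qu changes the types seen by the compiler but not the
selected bytes.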


Re: [PING^3 PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread Paul A. Clarke via Gcc-patches
On Thu, Nov 18, 2021 at 08:25:35PM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Mon, Nov 08, 2021 at 11:42:27AM -0600, Paul A. Clarke via Gcc-patches 
> wrote:
> > Gentle ping...
> 
> Gentle re-ping.

Gentle re-re-ping.

> > On Wed, Oct 20, 2021 at 08:42:07PM -0500, Paul A. Clarke via Gcc-patches 
> > wrote:
> > > Power10 ISA added `xxblendv*` instructions which are realized in the
> > > `vec_blendv` intrinsic.
> > > 
> > > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> > > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> > > 
> > > Also, copy a test from i386 for testing `_mm_blendv_ps`.
> > > This should have come with commit 
> > > ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> > > but was inadvertently omitted.
> > > 
> > > 2021-10-20  Paul A. Clarke  
> > > 
> > > gcc
> > >   * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> > >   when _ARCH_PWR10.
> > >   (_mm_blendv_ps): Likewise.
> > >   (_mm_blendv_pd): Likewise.
> > > 
> > > gcc/testsuite
> > >   * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> > >   adjust dg directives to suit.
> > > ---
> > > Tested on Power10 powerpc64le-linux (compiled with and without
> > > `-mcpu=power10`).
> > > 
> > > OK for trunk?
> > > 
> > >  gcc/config/rs6000/smmintrin.h | 12 
> > >  .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
> > >  2 files changed, 77 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > > 
> > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > > index b732fbca7b09..5d87fd7b6f61 100644
> > > --- a/gcc/config/rs6000/smmintrin.h
> > > +++ b/gcc/config/rs6000/smmintrin.h
> > > @@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int 
> > > __imm8)
> > >  extern __inline __m128i __attribute__((__gnu_inline__, 
> > > __always_inline__, __artificial__))
> > >  _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> > >  {
> > > +#ifdef _ARCH_PWR10
> > > +  return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> > > __mask);
> > > +#else
> > >const __v16qu __seven = vec_splats ((unsigned char) 0x07);
> > >__v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
> > >return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> > > +#endif
> > >  }
> > >  
> > >  __inline __m128
> > > @@ -149,9 +153,13 @@ __inline __m128
> > >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > >  _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> > >  {
> > > +#ifdef _ARCH_PWR10
> > > +  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) 
> > > __mask);
> > > +#else
> > >const __v4si __zero = {0};
> > >const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, 
> > > __zero);
> > >return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) 
> > > __boolmask);
> > > +#endif
> > >  }
> > >  
> > >  __inline __m128d
> > > @@ -174,9 +182,13 @@ __inline __m128d
> > >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > >  _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> > >  {
> > > +#ifdef _ARCH_PWR10
> > > +  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) 
> > > __mask);
> > > +#else
> > >const __v2di __zero = {0};
> > >const __vector __bool long long __boolmask = vec_cmplt ((__v2di) 
> > > __mask, __zero);
> > >return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
> > > __boolmask);
> > > +#endif
> > >  }
> > >  #endif
> > >  
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c 
> > > b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > > new file mode 100644
> > > index ..8fcb55383047
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > > @@ -0,0 +1,65 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target p8vector_hw } */
> > > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> > > +
> > > 

Re: [PING^4 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2022-01-03 Thread Paul A. Clarke via Gcc-patches
On Thu, Nov 18, 2021 at 08:24:52PM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Mon, Nov 08, 2021 at 11:40:42AM -0600, Paul A. Clarke via Gcc-patches 
> wrote:
> > On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches 
> > wrote:
> > > Patches 1/3 and 3/3 have been committed.
> > > This is only a ping for 2/3.
> > 
> > Gentle re-ping.
> 
> Gentle re-re-ping.

and once more. :-)

> > > On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches 
> > > wrote:
> > > > Suppress exceptions (when specified), by saving, manipulating, and
> > > > restoring the FPSCR.  Similarly, save, set, and restore the 
> > > > floating-point
> > > > rounding mode when required.
> > > > 
> > > > No attempt is made to optimize writing the FPSCR (by checking if the new
> > > > value would be the same), other than using lighter weight instructions
> > > > when possible. Note that explicit instruction scheduling "barriers" are
> > > > added to prevent floating-point computations from being moved before or
> > > > after the explicit FPSCR manipulations.  (That these are required has
> > > > been reported as an issue in GCC: PR102783.)
> > > > 
> > > > The scalar versions naively use the parallel versions to compute the
> > > > single scalar result and then construct the remainder of the result.
> > > > 
> > > > Of minor note, the values of _MM_FROUND_TO_NEG_INF and 
> > > > _MM_FROUND_TO_ZERO
> > > > are swapped from the corresponding values on x86 so as to match the
> > > > corresponding rounding mode values in the Power ISA.
> > > > 
> > > > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > > > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > > > analogous implementations in config/i386/smmintrin.h.
> > > > 
> > > > Function signatures match the analogous functions in 
> > > > config/i386/smmintrin.h.
> > > > 
> > > > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > > > modeled after the very similar "floor" and "ceil" tests.
> > > > 
> > > > Include basic tests, plus tests at the boundaries for floating-point
> > > > representation, positive and negative, test all of the parameterized
> > > > rounding modes as well as the C99 rounding modes and interactions
> > > > between the two.
> > > > 
> > > > Exceptions are not explicitly tested.
> > > > 
> > > > 2021-10-18  Paul A. Clarke  
> > > > 
> > > > gcc
> > > > * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > > > _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > > > _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, 
> > > > _MM_FROUND_TO_NEG_INF,
> > > > _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, 
> > > > _MM_FROUND_NO_EXC,
> > > > _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, 
> > > > _MM_FROUND_TRUNC,
> > > > _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > > > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, 
> > > > _mm_ceil_sd,
> > > > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, 
> > > > _mm_floor_ss):
> > > > Convert from function to macro.
> > > > 
> > > > gcc/testsuite
> > > > * gcc.target/powerpc/sse4_1-round3.h: New.
> > > > * gcc.target/powerpc/sse4_1-roundpd.c: New.
> > > > * gcc.target/powerpc/sse4_1-roundps.c: New.
> > > > * gcc.target/powerpc/sse4_1-roundsd.c: New.
> > > > * gcc.target/powerpc/sse4_1-roundss.c: New.
> > > > ---
> > > >  gcc/config/rs6000/smmintrin.h | 292 ++
> > > >  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
> > > >  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
> > > >  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
> > > >  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
> > > >  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
> > > >  6 files changed, 1014 insertions(+), 64 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/p

[COMMITTED] rs6000: Fix errant "vector" instead of "__vector"

2021-12-06 Thread Paul A. Clarke via Gcc-patches
Committed as trivial and obvious.

Fixes 85289ba36c2e62de84cc0232c954d9a74bda708a.

2021-12-06  Paul A. Clarke  

gcc
PR target/103545
* config/rs6000/xmmintrin.h (_mm_movemask_ps): Replace "vector" with
"__vector".
---
 gcc/config/rs6000/xmmintrin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
index 4c093fd1d5ae..31d26add50b3 100644
--- a/gcc/config/rs6000/xmmintrin.h
+++ b/gcc/config/rs6000/xmmintrin.h
@@ -1353,7 +1353,7 @@ extern __inline int __attribute__((__gnu_inline__, 
__always_inline__, __artifici
 _mm_movemask_ps (__m128  __A)
 {
 #ifdef _ARCH_PWR10
-  return vec_extractm ((vector unsigned int) __A);
+  return vec_extractm ((__vector unsigned int) __A);
 #else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
-- 
2.27.0



Re: [PING^2 PATCH] rs6000: Add optimizations for _mm_sad_epu8

2021-11-18 Thread Paul A. Clarke via Gcc-patches
On Mon, Nov 08, 2021 at 11:43:26AM -0600, Paul A. Clarke via Gcc-patches wrote:
> Gentle ping...

Gentle re-ping.

> On Fri, Oct 22, 2021 at 12:28:49PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Power9 ISA added `vabsdub` instruction which is realized in the
> > `vec_absd` intrinsic.
> > 
> > Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
> > `_ARCH_PWR9`.
> > 
> > Also, the realization of `vec_sum2s` on little-endian includes
> > two shifts in order to position the input and output to match
> > the semantics of `vec_sum2s`:
> > - Shift the second input vector left 12 bytes. In the current usage,
> >   that vector is `{0}`, so this shift is unnecessary, but is currently
> >   not eliminated under optimization.
> > - Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
> >   The two words within each doubleword of this (shifted) result must then
> >   be explicitly swapped to match the semantics of `_mm_sad_epu8`,
> >   effectively reversing this shift.  So, this shift (and a subsequent swap)
> >   are unnecessary, but not currently removed under optimization.
> > 
> > Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
> > option for removing the shifts.
> > 
> > For little-endian, use the `vsum2sws` instruction directly, and
> > eliminate the explicit shift (swap).
> > 
> > 2021-10-22  Paul A. Clarke  
> > 
> > gcc
> > * config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
> > when _ARCH_PWR9, optimize vec_sum2s when LE.
> > ---
> > Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`,
> > and on powerpc/powerpc64-linux on Power8.
> > 
> > OK for trunk?
> > 
> >  gcc/config/rs6000/emmintrin.h | 24 +---
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> > index ab16c13c379e..c4758be0e777 100644
> > --- a/gcc/config/rs6000/emmintrin.h
> > +++ b/gcc/config/rs6000/emmintrin.h
> > @@ -2197,27 +2197,37 @@ extern __inline __m128i 
> > __attribute__((__gnu_inline__, __always_inline__, __arti
> >  _mm_sad_epu8 (__m128i __A, __m128i __B)
> >  {
> >__v16qu a, b;
> > -  __v16qu vmin, vmax, vabsdiff;
> > +  __v16qu vabsdiff;
> >__v4si vsum;
> >const __v4su zero = { 0, 0, 0, 0 };
> >__v4si result;
> >  
> >a = (__v16qu) __A;
> >b = (__v16qu) __B;
> > -  vmin = vec_min (a, b);
> > -  vmax = vec_max (a, b);
> > +#ifndef _ARCH_PWR9
> > +  __v16qu vmin = vec_min (a, b);
> > +  __v16qu vmax = vec_max (a, b);
> >vabsdiff = vec_sub (vmax, vmin);
> > +#else
> > +  vabsdiff = vec_absd (a, b);
> > +#endif
> >/* Sum four groups of bytes into integers.  */
> >vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
> > +#ifdef __LITTLE_ENDIAN__
> > +  /* Sum across four integers with two integer results.  */
> > +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> > +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> > + shifts are added that are not needed for this use-case.
> > + A vector shift to correctly position the 32-bit integer results
> > + (currently at [0] and [2]) to [1] and [3] would then need to be
> > + swapped back again since the desired results are two 64-bit
> > + integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> > +#else
> >/* Sum across four integers with two integer results.  */
> >result = vec_sum2s (vsum, (__vector signed int) zero);
> >/* Rotate the sums into the correct position.  */
> > -#ifdef __LITTLE_ENDIAN__
> > -  result = vec_sld (result, result, 4);
> > -#else
> >result = vec_sld (result, result, 6);
> >  #endif
> > -  /* Rotate the sums into the correct position.  */
> >return (__m128i) result;
> >  }
> >  
> > -- 
> > 2.27.0
> > 
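
For readers checking the lane positions discussed above, the following is a minimal scalar sketch of the result `_mm_sad_epu8` is expected to produce: one sum of absolute byte differences per 64-bit half. The name `sad_u8` is hypothetical, and the sketch says nothing about how the vector instructions arrange intermediate words:

```c
#include <stdint.h>

/* Scalar model of _mm_sad_epu8: r[0] holds the sum of absolute
   differences of bytes 0..7, r[1] the sum for bytes 8..15.
   Hypothetical helper, for illustration only.  */
static void
sad_u8 (uint64_t r[2], const uint8_t a[16], const uint8_t b[16])
{
  for (int half = 0; half < 2; half++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        {
          uint8_t x = a[half * 8 + i];
          uint8_t y = b[half * 8 + i];
          sum += (x > y) ? (x - y) : (y - x);
        }
      r[half] = sum;
    }
}
```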


Re: [PING^2 PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2021-11-18 Thread Paul A. Clarke via Gcc-patches
On Mon, Nov 08, 2021 at 11:42:56AM -0600, Paul A. Clarke via Gcc-patches wrote:
> Gentle ping...

Gentle re-ping.

> On Thu, Oct 21, 2021 at 12:22:12PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Power10 ISA added `vextract*` instructions which are realized in the
> > `vec_extractm` intrinsic.
> > 
> > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> > 
> > 2021-10-21  Paul A. Clarke  
> > 
> > gcc
> > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> > when _ARCH_PWR10.
> > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> > (_mm_movemask_epi8): Likewise.
> > ---
> > Tested on Power10 powerpc64le-linux (compiled with and without
> > `-mcpu=power10`).
> > 
> > OK for trunk?
> > 
> >  gcc/config/rs6000/emmintrin.h | 8 
> >  gcc/config/rs6000/xmmintrin.h | 4 
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> > index 32ad72b4cc35..ab16c13c379e 100644
> > --- a/gcc/config/rs6000/emmintrin.h
> > +++ b/gcc/config/rs6000/emmintrin.h
> > @@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
> >  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> >  _mm_movemask_pd (__m128d  __A)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return vec_extractm ((__v2du) __A);
> > +#else
> >__vector unsigned long long result;
> >static const __vector unsigned int perm_mask =
> >  {
> > @@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
> >  #else
> >return result[0];
> >  #endif
> > +#endif /* !_ARCH_PWR10 */
> >  }
> >  #endif /* _ARCH_PWR8 */
> >  
> > @@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
> >  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> >  _mm_movemask_epi8 (__m128i __A)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return vec_extractm ((__v16qu) __A);
> > +#else
> >__vector unsigned long long result;
> >static const __vector unsigned char perm_mask =
> >  {
> > @@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
> >  #else
> >return result[0];
> >  #endif
> > +#endif /* !_ARCH_PWR10 */
> >  }
> >  #endif /* _ARCH_PWR8 */
> >  
> > diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
> > index ae1a33e8d95b..4c093fd1d5ae 100644
> > --- a/gcc/config/rs6000/xmmintrin.h
> > +++ b/gcc/config/rs6000/xmmintrin.h
> > @@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
> >  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> >  _mm_movemask_ps (__m128  __A)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return vec_extractm ((vector unsigned int) __A);
> > +#else
> >__vector unsigned long long result;
> >static const __vector unsigned int perm_mask =
> >  {
> > @@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
> >  #else
> >return result[0];
> >  #endif
> > +#endif /* !_ARCH_PWR10 */
> >  }
> >  #endif /* _ARCH_PWR8 */
> >  
> > -- 
> > 2.27.0
> > 


Re: [PING^2 PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2021-11-18 Thread Paul A. Clarke via Gcc-patches
On Mon, Nov 08, 2021 at 11:42:27AM -0600, Paul A. Clarke via Gcc-patches wrote:
> Gentle ping...

Gentle re-ping.

> On Wed, Oct 20, 2021 at 08:42:07PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Power10 ISA added `xxblendv*` instructions which are realized in the
> > `vec_blendv` intrinsic.
> > 
> > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> > 
> > Also, copy a test from i386 for testing `_mm_blendv_ps`.
> > This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> > but was inadvertently omitted.
> > 
> > 2021-10-20  Paul A. Clarke  
> > 
> > gcc
> > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> > when _ARCH_PWR10.
> > (_mm_blendv_ps): Likewise.
> > (_mm_blendv_pd): Likewise.
> > 
> > gcc/testsuite
> > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> > adjust dg directives to suit.
> > ---
> > Tested on Power10 powerpc64le-linux (compiled with and without
> > `-mcpu=power10`).
> > 
> > OK for trunk?
> > 
> >  gcc/config/rs6000/smmintrin.h | 12 
> >  .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
> >  2 files changed, 77 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > 
> > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > index b732fbca7b09..5d87fd7b6f61 100644
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int 
> > __imm8)
> >  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> >  _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> > __mask);
> > +#else
> >const __v16qu __seven = vec_splats ((unsigned char) 0x07);
> >__v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
> >return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> > +#endif
> >  }
> >  
> >  __inline __m128
> > @@ -149,9 +153,13 @@ __inline __m128
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
> > +#else
> >const __v4si __zero = {0};
> >const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, 
> > __zero);
> >return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) 
> > __boolmask);
> > +#endif
> >  }
> >  
> >  __inline __m128d
> > @@ -174,9 +182,13 @@ __inline __m128d
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> >  {
> > +#ifdef _ARCH_PWR10
> > +  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) 
> > __mask);
> > +#else
> >const __v2di __zero = {0};
> >const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
> > __zero);
> >return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
> > __boolmask);
> > +#endif
> >  }
> >  #endif
> >  
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c 
> > b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > new file mode 100644
> > index ..8fcb55383047
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> > @@ -0,0 +1,65 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target p8vector_hw } */
> > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> > +
> > +#include "sse4_1-check.h"
> > +
> > +#include 
> > +#include 
> > +
> > +#define NUM 20
> > +
> > +static void
> > +init_blendvps (float *src1, float *src2, float *mask)
> > +{
> > +  int i, msk, sign = 1; 
> > +
> > +  msk = -1;
> > +  for (i = 0; i < NUM * 4; i++)
> > +{
> > +  if((i % 4) == 0)
> > +   msk++;
> > +  src1[i] = i* (i + 1) * sign;
> > +  src2[i] = (i + 20) * sign;
> > +  mask[i] = (i + 120) * i;
> > +  if( (msk & (1 << (i % 4
> > +   mask[i]

Re: [PING^3 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-11-18 Thread Paul A. Clarke via Gcc-patches
On Mon, Nov 08, 2021 at 11:40:42AM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Patches 1/3 and 3/3 have been committed.
> > This is only a ping for 2/3.
> 
> Gentle re-ping.

Gentle re-re-ping.

> > On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches 
> > wrote:
> > > Suppress exceptions (when specified), by saving, manipulating, and
> > > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > > rounding mode when required.
> > > 
> > > No attempt is made to optimize writing the FPSCR (by checking if the new
> > > value would be the same), other than using lighter weight instructions
> > > when possible. Note that explicit instruction scheduling "barriers" are
> > > added to prevent floating-point computations from being moved before or
> > > after the explicit FPSCR manipulations.  (That these are required has
> > > been reported as an issue in GCC: PR102783.)
> > > 
> > > The scalar versions naively use the parallel versions to compute the
> > > single scalar result and then construct the remainder of the result.
> > > 
> > > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > > are swapped from the corresponding values on x86 so as to match the
> > > corresponding rounding mode values in the Power ISA.
> > > 
> > > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > > analogous implementations in config/i386/smmintrin.h.
> > > 
> > > Function signatures match the analogous functions in 
> > > config/i386/smmintrin.h.
> > > 
> > > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > > modeled after the very similar "floor" and "ceil" tests.
> > > 
> > > Include basic tests, plus tests at the boundaries for floating-point
> > > representation, positive and negative, test all of the parameterized
> > > rounding modes as well as the C99 rounding modes and interactions
> > > between the two.
> > > 
> > > Exceptions are not explicitly tested.
> > > 
> > > 2021-10-18  Paul A. Clarke  
> > > 
> > > gcc
> > >   * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > >   _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > >   _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > >   _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > >   _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > >   _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > >   * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > >   _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > >   Convert from function to macro.
> > > 
> > > gcc/testsuite
> > >   * gcc.target/powerpc/sse4_1-round3.h: New.
> > >   * gcc.target/powerpc/sse4_1-roundpd.c: New.
> > >   * gcc.target/powerpc/sse4_1-roundps.c: New.
> > >   * gcc.target/powerpc/sse4_1-roundsd.c: New.
> > >   * gcc.target/powerpc/sse4_1-roundss.c: New.
> > > ---
> > >  gcc/config/rs6000/smmintrin.h | 292 ++
> > >  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
> > >  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
> > >  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
> > >  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
> > >  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
> > >  6 files changed, 1014 insertions(+), 64 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > 
> > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > > index 90ce03d22709..6bb03e6e20ac 100644
> > > --- a/gcc/config/rs6000/smmintrin.h
> > > +++ b/gcc/config/rs6000/smmintrin.h
> > > @@ -42,6 +42,234 @@
> > >  #include 
>
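
On the note above about `_MM_FROUND_TO_NEG_INF` and `_MM_FROUND_TO_ZERO` being swapped: the x86 MXCSR rounding-control field and the Power FPSCR RN field encode "toward zero" and "toward negative infinity" with each other's values. The sketch below shows the translation that would be needed if the x86 values had been kept. The enum and function names are hypothetical; only the two-bit encodings are taken from the respective architectures:

```c
/* x86 MXCSR rounding-control (RC) encodings.  */
enum { X86_RC_NEAREST = 0, X86_RC_DOWN = 1, X86_RC_UP = 2, X86_RC_ZERO = 3 };

/* Power FPSCR rounding-mode (RN) encodings.  */
enum { PPC_RN_NEAREST = 0, PPC_RN_ZERO = 1, PPC_RN_UP = 2, PPC_RN_DOWN = 3 };

/* Hypothetical translation from an x86 RC value to a Power RN value.
   Values 0 and 2 are shared; 1 and 3 trade places.  */
static int
x86_rc_to_ppc_rn (int rc)
{
  return (rc & 1) ? (rc ^ 2) : rc;
}
```

Defining the macros with the Power encodings directly presumably avoids this translation at every use, at the cost of the numeric values differing from x86.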

Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Wed, Nov 17, 2021 at 02:00:02PM -0600, Segher Boessenkool wrote:
> On Wed, Nov 17, 2021 at 11:45:02AM -0600, Paul A. Clarke wrote:
> > I guess I'm being pedantic.  "requires -mcpu=power8 and -mvsx" is not
> > accurate from a user's point of view, as "-mcpu=power8" is sufficient,
> > since "-mvsx" is enabled when "-mcpu=power8" is specified.
> 
> To be really pedantic, -mcpu=power8 isn't required either: anything that
> enables the subset of ISA 2.07 that is needed is enough already.  But we
> don't want to encourage users to use those interfaces.
> 
> > The real "requires" is "-mcpu=power8" and no "-mno-vsx".
> 
> And no -mno-altivec.  And and and.  There is a huge web.
> 
> > It's not a strong objection, since specifying "-mno-vsx" should be
> > uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.
> 
> Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"?  Is
> that clearer, to your eye?

Hrm. No, but let me withdraw my expression of concern. Both "power8" and
"vsx" are required, and those two options get that explicitly.
That "-mcpu=power8" also pulls in "-mvsx" is a subtlety that is
perhaps not terribly relevant.

Thanks for entertaining my concern, but we've spent too much time on it
already.  :-)

PC


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Wed, Nov 17, 2021 at 11:00:07AM -0600, Bill Schmidt via Gcc-patches wrote:
> On 11/17/21 10:54 AM, Paul A. Clarke wrote:
> > On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt via Gcc-patches 
> > wrote:
> >> Hi!  During a previous patch review, Segher asked that I provide better
> >> messages when builtins are unavailable because they require both a minimum
> >> CPU and the enablement of VSX instructions.  This patch does just that.
> > ...
> >> gcc/
> >>* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
> >>error messages for ENB_P8V and ENB_P9V.
> >> ---
> >>  gcc/config/rs6000/rs6000-call.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/gcc/config/rs6000/rs6000-call.c 
> >> b/gcc/config/rs6000/rs6000-call.c
> >> index 85fec80c6d7..035266eb001 100644
> >> --- a/gcc/config/rs6000/rs6000-call.c
> >> +++ b/gcc/config/rs6000/rs6000-call.c
> >> @@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum 
> >> rs6000_gen_builtins fncode)
> >>error ("%qs requires the %qs option", name, "-mcpu=power8");
> >>break;
> >>  case ENB_P8V:
> >> -  error ("%qs requires the %qs option", name, "-mpower8-vector");
> >> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
> >> +   "-mvsx");
> > "-mcpu=power8" itself enables "-mvsx", doesn't it?
> 
> Of course, but it can be disabled with -mno-vsx.  Then you get this error.
> You won't get it unless you deliberately did something strange with the
> compile options.
> 
> >
> >>break;
> >>  case ENB_P9:
> >>error ("%qs requires the %qs option", name, "-mcpu=power9");
> >> @@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum 
> >> rs6000_gen_builtins fncode)
> >> name, "-mcpu=power9", "-m64", "-mpowerpc64");
> >>break;
> >>  case ENB_P9V:
> >> -  error ("%qs requires the %qs option", name, "-mpower9-vector");
> >> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
> >> +   "-mvsx");
> > Similarly, "-mcpu=power9" itself enables "-mvsx", doesn't it?
> >
> > Are you trying to also say "don't use -mno-vsx"?  If so, maybe s/and/with/
> > would be slightly less confusing? This is going to be awkward unless it can
> > be more precise, like two messages depending on actual context:
> > > - with "-mcpu=power8 -mno-vsx":  "...requires -mvsx".
> > - without "-mcpu=power8":  "...requires -mcpu=power8".
> 
> This seems like a YMMV situation...I don't see the confusion myself.

I guess I'm being pedantic.  "requires -mcpu=power8 and -mvsx" is not
accurate from a user's point of view, as "-mcpu=power8" is sufficient,
since "-mvsx" is enabled when "-mcpu=power8" is specified.

The real "requires" is "-mcpu=power8" and no "-mno-vsx".

(I'm just picturing myself fumbling around in a Makefile written by
somebody else. ;-)

It's not a strong objection, since specifying "-mno-vsx" should be
uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.

PC


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt via Gcc-patches wrote:
> Hi!  During a previous patch review, Segher asked that I provide better
> messages when builtins are unavailable because they require both a minimum
> CPU and the enablement of VSX instructions.  This patch does just that.
...
> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
>   error messages for ENB_P8V and ENB_P9V.
> ---
>  gcc/config/rs6000/rs6000-call.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 85fec80c6d7..035266eb001 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
> fncode)
>error ("%qs requires the %qs option", name, "-mcpu=power8");
>break;
>  case ENB_P8V:
> -  error ("%qs requires the %qs option", name, "-mpower8-vector");
> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
> +  "-mvsx");

"-mcpu=power8" itself enables "-mvsx", doesn't it?

>break;
>  case ENB_P9:
>error ("%qs requires the %qs option", name, "-mcpu=power9");
> @@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
> fncode)
>name, "-mcpu=power9", "-m64", "-mpowerpc64");
>break;
>  case ENB_P9V:
> -  error ("%qs requires the %qs option", name, "-mpower9-vector");
> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
> +  "-mvsx");

Similarly, "-mcpu=power9" itself enables "-mvsx", doesn't it?

Are you trying to also say "don't use -mno-vsx"?  If so, maybe s/and/with/
would be slightly less confusing? This is going to be awkward unless it can
be more precise, like two messages depending on actual context:
- with "-mcpu=power8 -mno-vsx":  "...requires -mvsx".
- without "-mcpu=power8":  "...requires -mcpu=power8".
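
To make the two-message idea concrete, here is a rough sketch of the selection logic only. It is not a proposed change to `rs6000_invalid_new_builtin`; the helper `p8v_missing_option` and its two flags are hypothetical stand-ins for whatever target-flag checks would actually be used:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: name the single option a user still needs for a
   builtin gated on ENB_P8V, or NULL when nothing is missing.  The CPU
   level is checked first because "-mcpu=power8" enables VSX unless it
   has been explicitly disabled.  */
static const char *
p8v_missing_option (bool have_power8, bool have_vsx)
{
  if (!have_power8)
    return "-mcpu=power8";
  if (!have_vsx)
    return "-mvsx";
  return NULL;
}
```

The returned string could then feed the existing single-option form of the message, "%qs requires the %qs option".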

PC


[PING PATCH] rs6000: Add optimizations for _mm_sad_epu8

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Fri, Oct 22, 2021 at 12:28:49PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Power9 ISA added `vabsdub` instruction which is realized in the
> `vec_absd` intrinsic.
> 
> Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
> `_ARCH_PWR9`.
> 
> Also, the realization of `vec_sum2s` on little-endian includes
> two shifts in order to position the input and output to match
> the semantics of `vec_sum2s`:
> - Shift the second input vector left 12 bytes. In the current usage,
>   that vector is `{0}`, so this shift is unnecessary, but is currently
>   not eliminated under optimization.
> - Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
>   The two words within each doubleword of this (shifted) result must then
>   be explicitly swapped to match the semantics of `_mm_sad_epu8`,
>   effectively reversing this shift.  So, this shift (and a subsequent swap)
>   are unnecessary, but not currently removed under optimization.
> 
> Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
> option for removing the shifts.
> 
> For little-endian, use the `vsum2sws` instruction directly, and
> eliminate the explicit shift (swap).
> 
> 2021-10-22  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
>   when _ARCH_PWR9, optimize vec_sum2s when LE.
> ---
> Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`,
> and on powerpc/powerpc64-linux on Power8.
> 
> OK for trunk?
> 
>  gcc/config/rs6000/emmintrin.h | 24 +---
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> index ab16c13c379e..c4758be0e777 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -2197,27 +2197,37 @@ extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __arti
>  _mm_sad_epu8 (__m128i __A, __m128i __B)
>  {
>    __v16qu a, b;
> -  __v16qu vmin, vmax, vabsdiff;
> +  __v16qu vabsdiff;
>    __v4si vsum;
>    const __v4su zero = { 0, 0, 0, 0 };
>    __v4si result;
>  
>    a = (__v16qu) __A;
>    b = (__v16qu) __B;
> -  vmin = vec_min (a, b);
> -  vmax = vec_max (a, b);
> +#ifndef _ARCH_PWR9
> +  __v16qu vmin = vec_min (a, b);
> +  __v16qu vmax = vec_max (a, b);
>    vabsdiff = vec_sub (vmax, vmin);
> +#else
> +  vabsdiff = vec_absd (a, b);
> +#endif
>    /* Sum four groups of bytes into integers.  */
>    vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
> +#ifdef __LITTLE_ENDIAN__
> +  /* Sum across four integers with two integer results.  */
> +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> +     shifts are added that are not needed for this use-case.
> +     A vector shift to correctly position the 32-bit integer results
> +     (currently at [0] and [2]) to [1] and [3] would then need to be
> +     swapped back again since the desired results are two 64-bit
> +     integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> +#else
>    /* Sum across four integers with two integer results.  */
>    result = vec_sum2s (vsum, (__vector signed int) zero);
>    /* Rotate the sums into the correct position.  */
> -#ifdef __LITTLE_ENDIAN__
> -  result = vec_sld (result, result, 4);
> -#else
>    result = vec_sld (result, result, 6);
>  #endif
> -  /* Rotate the sums into the correct position.  */
>    return (__m128i) result;
>  }
>  
> -- 
> 2.27.0
> 


[PING PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Thu, Oct 21, 2021 at 12:22:12PM -0500, Paul A. Clarke via Gcc-patches wrote:
> The Power10 ISA added `vextract*` instructions, which are exposed through the
> `vec_extractm` intrinsic.
> 
> Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> 
> 2021-10-21  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
>   when _ARCH_PWR10.
>   * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
>   (_mm_movemask_epi8): Likewise.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
> 
> OK for trunk?
> 
>  gcc/config/rs6000/emmintrin.h | 8 
>  gcc/config/rs6000/xmmintrin.h | 4 
>  2 files changed, 12 insertions(+)
> 
> diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> index 32ad72b4cc35..ab16c13c379e 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movemask_pd (__m128d  __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((__v2du) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned int perm_mask =
>  {
> @@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> @@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movemask_epi8 (__m128i __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((__v16qu) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned char perm_mask =
>  {
> @@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
> index ae1a33e8d95b..4c093fd1d5ae 100644
> --- a/gcc/config/rs6000/xmmintrin.h
> +++ b/gcc/config/rs6000/xmmintrin.h
> @@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movemask_ps (__m128  __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((vector unsigned int) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned int perm_mask =
>  {
> @@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> -- 
> 2.27.0
> 
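
For readers less familiar with the x86 side, the following scalar model
shows what the three movemask intrinsics compute: bit i of the result is
the most significant bit of element i.  This is a portable reference
sketch only, with hypothetical `ref_` names; it is not the vec_extractm
implementation from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_movemask_epi8: 16 byte elements -> 16 mask bits.  */
static int
ref_movemask_epi8 (const uint8_t v[16])
{
  int mask = 0;
  assert (v);
  for (int i = 0; i < 16; i++)
    mask |= (int) (v[i] >> 7) << i;
  return mask;
}

/* Scalar model of _mm_movemask_ps, operating on the raw 32-bit lanes.  */
static int
ref_movemask_ps (const uint32_t v[4])
{
  int mask = 0;
  assert (v);
  for (int i = 0; i < 4; i++)
    mask |= (int) (v[i] >> 31) << i;
  return mask;
}

/* Scalar model of _mm_movemask_pd, operating on the raw 64-bit lanes.  */
static int
ref_movemask_pd (const uint64_t v[2])
{
  int mask = 0;
  assert (v);
  for (int i = 0; i < 2; i++)
    mask |= (int) (v[i] >> 63) << i;
  return mask;
}
```

A model like this can serve as the expected value when comparing the
Power10 path against the pre-Power10 path.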


[PING PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Wed, Oct 20, 2021 at 08:42:07PM -0500, Paul A. Clarke via Gcc-patches wrote:
> The Power10 ISA added `xxblendv*` instructions, which are exposed through the
> `vec_blendv` intrinsic.
> 
> Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> 
> Also, copy a test from i386 for testing `_mm_blendv_ps`.
> This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> but was inadvertently omitted.
> 
> 2021-10-20  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
>   when _ARCH_PWR10.
>   (_mm_blendv_ps): Likewise.
>   (_mm_blendv_pd): Likewise.
> 
> gcc/testsuite
>   * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
>   adjust dg directives to suit.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
> 
> OK for trunk?
> 
>  gcc/config/rs6000/smmintrin.h | 12 
>  .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
>  2 files changed, 77 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index b732fbca7b09..5d87fd7b6f61 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) __mask);
> +#else
>const __v16qu __seven = vec_splats ((unsigned char) 0x07);
>__v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
>return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> +#endif
>  }
>  
>  __inline __m128
> @@ -149,9 +153,13 @@ __inline __m128
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
> +#else
>const __v4si __zero = {0};
>const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
>return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +#endif
>  }
>  
>  __inline __m128d
> @@ -174,9 +182,13 @@ __inline __m128d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) __mask);
> +#else
>const __v2di __zero = {0};
>const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
>return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +#endif
>  }
>  #endif
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> new file mode 100644
> index 000000000000..8fcb55383047
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> @@ -0,0 +1,65 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +
> +#define NUM 20
> +
> +static void
> +init_blendvps (float *src1, float *src2, float *mask)
> +{
> +  int i, msk, sign = 1; 
> +
> +  msk = -1;
> +  for (i = 0; i < NUM * 4; i++)
> +{
> +  if((i % 4) == 0)
> + msk++;
> +  src1[i] = i* (i + 1) * sign;
> +  src2[i] = (i + 20) * sign;
> +  mask[i] = (i + 120) * i;
> +  if( (msk & (1 << (i % 4))))
> + mask[i] = -mask[i];
> +  sign = -sign;
> +}
> +}
> +
> +static int
> +check_blendvps (__m128 *dst, float *src1, float *src2,
> + float *mask)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +if (mask [j] < 0.0)
> +  tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  union
> +{
> +  __m128 x[NUM];
> +  float f[NUM * 4];
> +} dst, src1, src2, mask;
> +  int i;
> +
> +  init_blendvps (src1.f, src2.f, mask.f);
> +
> +  for (i = 0; i < NUM; i++)
> +{
> +  dst.x[i] = _mm_blendv_ps (src1.x[i], src2.x[i], mask.x[i]);
> +  if (check_blendvps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4],
> +   &mask.f[i * 4]))
> + abort ();
> +}
> +}
> -- 
> 2.27.0
> 
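
As a reading aid for the blendv patch above, here is a scalar model of
one lane of `_mm_blendv_ps`: the result takes the lane from the second
operand when the sign bit of the mask lane is set, and from the first
operand otherwise.  The name `ref_blendv_ps_lane` is hypothetical, and
this is a sketch of the x86 semantics, not of the vec_blendv code path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a single _mm_blendv_ps lane.  Only the sign bit of
   MASK is examined; its exponent and fraction are ignored.  */
static float
ref_blendv_ps_lane (float a, float b, float mask)
{
  uint32_t bits;

  /* Read the raw IEEE-754 bits without violating aliasing rules.  */
  assert (sizeof bits == sizeof mask);
  memcpy (&bits, &mask, sizeof bits);
  return (bits >> 31) ? b : a;
}
```

Note that the sign-bit test and the `mask [j] < 0.0` comparison used by
the test above agree for ordinary values, but differ for -0.0 and for
NaNs with the sign bit set.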


Re: [PING^2 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-11-08 Thread Paul A. Clarke via Gcc-patches
On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Patches 1/3 and 3/3 have been committed.
> This is only a ping for 2/3.

Gentle re-ping.

> On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > Suppress exceptions (when specified), by saving, manipulating, and
> > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > rounding mode when required.
> > 
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible. Note that explicit instruction scheduling "barriers" are
> > added to prevent floating-point computations from being moved before or
> > after the explicit FPSCR manipulations.  (That these are required has
> > been reported as an issue in GCC: PR102783.)
> > 
> > The scalar versions naively use the parallel versions to compute the
> > single scalar result and then construct the remainder of the result.
> > 
> > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > are swapped from the corresponding values on x86 so as to match the
> > corresponding rounding mode values in the Power ISA.
> > 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> > 
> > Function signatures match the analogous functions in config/i386/smmintrin.h.
> > 
> > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > modeled after the very similar "floor" and "ceil" tests.
> > 
> > Include basic tests, plus tests at the boundaries for floating-point
> > representation, positive and negative, test all of the parameterized
> > rounding modes as well as the C99 rounding modes and interactions
> > between the two.
> > 
> > Exceptions are not explicitly tested.
> > 
> > 2021-10-18  Paul A. Clarke  
> > 
> > gcc
> > * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > Convert from function to macro.
> > 
> > gcc/testsuite
> > * gcc.target/powerpc/sse4_1-round3.h: New.
> > * gcc.target/powerpc/sse4_1-roundpd.c: New.
> > * gcc.target/powerpc/sse4_1-roundps.c: New.
> > * gcc.target/powerpc/sse4_1-roundsd.c: New.
> > * gcc.target/powerpc/sse4_1-roundss.c: New.
> > ---
> >  gcc/config/rs6000/smmintrin.h | 292 ++
> >  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
> >  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
> >  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
> >  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
> >  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
> >  6 files changed, 1014 insertions(+), 64 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > 
> > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > index 90ce03d22709..6bb03e6e20ac 100644
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -42,6 +42,234 @@
> >  #include 
> >  #include 
> >  
> > +/* Rounding mode macros. */
> > +#define _MM_FROUND_TO_NEAREST_INT   0x00
> > +#define _MM_FROUND_TO_ZERO  0x01
> > +#define _MM_FROUND_TO_POS_INF   0x02
> > +#define _MM_FROUND_TO_NEG_INF   0x03
> > +#define _MM_FROUND_CUR_DIRECTION    0x04
> > +
> > +#define _MM_FROUND_NINT \
> > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_FLOOR

Re: [PING PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-10-26 Thread Paul A. Clarke via Gcc-patches
Patches 1/3 and 3/3 have been committed.
This is only a ping for 2/3.

On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Suppress exceptions (when specified), by saving, manipulating, and
> restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> rounding mode when required.
> 
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible. Note that explicit instruction scheduling "barriers" are
> added to prevent floating-point computations from being moved before or
> after the explicit FPSCR manipulations.  (That these are required has
> been reported as an issue in GCC: PR102783.)
> 
> The scalar versions naively use the parallel versions to compute the
> single scalar result and then construct the remainder of the result.
> 
> Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> are swapped from the corresponding values on x86 so as to match the
> corresponding rounding mode values in the Power ISA.
> 
> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.
> 
> Function signatures match the analogous functions in config/i386/smmintrin.h.
> 
> Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> modeled after the very similar "floor" and "ceil" tests.
> 
> Include basic tests, plus tests at the boundaries for floating-point
> representation, positive and negative, test all of the parameterized
> rounding modes as well as the C99 rounding modes and interactions
> between the two.
> 
> Exceptions are not explicitly tested.
> 
> 2021-10-18  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
>   _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
>   _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
>   _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
>   _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
>   _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
>   * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
>   _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
>   Convert from function to macro.
> 
> gcc/testsuite
>   * gcc.target/powerpc/sse4_1-round3.h: New.
>   * gcc.target/powerpc/sse4_1-roundpd.c: New.
>   * gcc.target/powerpc/sse4_1-roundps.c: New.
>   * gcc.target/powerpc/sse4_1-roundsd.c: New.
>   * gcc.target/powerpc/sse4_1-roundss.c: New.
> ---
>  gcc/config/rs6000/smmintrin.h | 292 ++
>  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
>  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
>  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
>  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
>  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
>  6 files changed, 1014 insertions(+), 64 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 90ce03d22709..6bb03e6e20ac 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -42,6 +42,234 @@
>  #include 
>  #include 
>  
> +/* Rounding mode macros. */
> +#define _MM_FROUND_TO_NEAREST_INT   0x00
> +#define _MM_FROUND_TO_ZERO  0x01
> +#define _MM_FROUND_TO_POS_INF   0x02
> +#define _MM_FROUND_TO_NEG_INF   0x03
> +#define _MM_FROUND_CUR_DIRECTION    0x04
> +
> +#define _MM_FROUND_NINT  \
> +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_FLOOR \
> +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_CEIL  \
> +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_TRUNC \
> +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_RINT  \
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_NEARBYINT \
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> +
> +#define _MM_FROUND_RAISE_EXC    0x00
> +#define _MM_F
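
The note in the message above about _MM_FROUND_TO_NEG_INF and
_MM_FROUND_TO_ZERO being swapped relative to x86 can be illustrated
with a small mapping.  This is a sketch only: the enum and function
names are hypothetical, the x86 values are those of Intel's
smmintrin.h, and the Power values are the ones defined in the patch.

```c
#include <assert.h>

/* x86 rounding-direction encodings.  */
enum { X86_NEAREST = 0x00, X86_NEG_INF = 0x01,
       X86_POS_INF = 0x02, X86_ZERO = 0x03 };

/* Encodings used by the patch, matching the Power ISA rounding modes.  */
enum { PPC_NEAREST = 0x00, PPC_ZERO = 0x01,
       PPC_POS_INF = 0x02, PPC_NEG_INF = 0x03 };

/* Translate a hard-coded x86 rounding direction (0..3) into the value
   the Power header expects.  Only two of the four values differ.  */
static int
x86_to_ppc_rounding (int x86_mode)
{
  assert (x86_mode >= 0 && x86_mode <= 3);
  switch (x86_mode)
    {
    case X86_NEG_INF:
      return PPC_NEG_INF;
    case X86_ZERO:
      return PPC_ZERO;
    default:
      return x86_mode;
    }
}
```

Code that spells the modes with the _MM_FROUND_* macro names needs no
such translation; it only matters for code that hard-codes the numeric
x86 values.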

Re: [PATCH v2 COMMITTED] rs6000: Fixes for tests including only

2021-10-26 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 25, 2021 at 05:32:51PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 25, 2021 at 03:33:21PM -0500, Paul A. Clarke wrote:
> > * config/rs6000/x86intrin.h: Move some included headers to new
> > headers; include new immintrin.h instead.
> 
> s/; i/.  I/  (And instead of what?)
> 
> > * config/rs6000/immintrin.h: New.
> > * config/rs6000/x86gprintrin.h: New.
> 
> (That is a filename worse than our worst mnemonic :-) )

Not my choice. ;-)

> > * config/config.gcc (powerpc-*-*): Add new headers to extra_headers.
> 
> powerpc*-*-*
> 
> > --- a/gcc/testsuite/gcc.target/powerpc/pr78102.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mvsx" } */
> > -/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mpower8-vector -DNO_WARN_X86_INTRINSICS" } */
> > +/* { dg-require-effective-target p8vector_hw } */
> 
> Please use -mcpu=power8 instead?  (And -mdejagnu-cpu=power8 in
> testcases).

So, -mdejagnu-cpu=power8 here.

> (The changelog should say you added the -D btw).

OK

> If you run you need *_hw.  If you only compile, like here, you want to
> use *_ok instead.

Yep, my mistake.  Fixed.

> Okay for trunk with those things tuned up.  Thanks!

Thanks for the review!  This has been committed:
--
Tests which only include  expect many other include files
to be brought in, but not enough are.

Try to increase compatibility with x86 headers by:
- Create new immintrin.h, including the analogous subset of intrinsics
  headers available for powerpc.
- Create new x86gprintrin.h, serving exclusively as the umbrella for
  bmiintrin.h and bmi2intrin.h.
- Modify x86intrin.h:
  - Include new immintrin.h.
  - Remove mmintrin.h, xmmintrin.h, emmintrin.h, now included indirectly
from immintrin.h.
  - Remove bmiintrin.h, bmi2intrin.h, now included indirectly from
x86gprintrin.h (which is now included from immintrin.h).

Add the new files to gcc/config.gcc.

Also, fix up the testcase that provoked PR102719, which requires
Power8 vector support.

Fixes commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-25  Paul A. Clarke  

gcc
PR target/102719
* config/rs6000/x86intrin.h: Move some included headers to new
headers.  Include new immintrin.h instead of those headers.
* config/rs6000/immintrin.h: New.
* config/rs6000/x86gprintrin.h: New.
* config.gcc (powerpc*-*-*): Add new headers to extra_headers.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Fix dg directives to require Power8
vector support.  Also, add -DNO_WARN_X86_INTRINSICS.
---
 gcc/config.gcc |  2 +-
 gcc/config/rs6000/immintrin.h  | 41 ++
 gcc/config/rs6000/x86gprintrin.h   | 31 
 gcc/config/rs6000/x86intrin.h  | 10 +-
 gcc/testsuite/gcc.target/powerpc/pr78102.c |  4 +--
 5 files changed, 76 insertions(+), 12 deletions(-)
 create mode 100644 gcc/config/rs6000/immintrin.h
 create mode 100644 gcc/config/rs6000/x86gprintrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb1f06f3da89..efd1f42ac234 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,7 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
-   extra_headers="${extra_headers} nmmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
diff --git a/gcc/config/rs6000/immintrin.h b/gcc/config/rs6000/immintrin.h
new file mode 100644
index 000000000000..647a5ae49b5a
--- /dev/null
+++ b/gcc/config/rs6000/immintrin.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   

[PATCH] rs6000: Fixes for tests including only

2021-10-25 Thread Paul A. Clarke via Gcc-patches
Tests which only include  expect many other include files
to be brought in, but not enough are.

Try to increase compatibility with x86 headers by:
- Create new immintrin.h, including the analogous subset of intrinsics
  headers available for powerpc.
- Create new x86gprintrin.h, serving exclusively as the umbrella for
  bmiintrin.h and bmi2intrin.h.
- Modify x86intrin.h:
  - Include new immintrin.h.
  - Remove mmintrin.h, xmmintrin.h, emmintrin.h, now included indirectly
from immintrin.h.
  - Remove bmiintrin.h, bmi2intrin.h, now included indirectly from
x86gprintrin.h (which is now included from immintrin.h).

Add the new files to gcc/config.gcc.

Also, fix up the testcase that provoked PR102719, which requires
Power8 vector support.

Fixes commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-25  Paul A. Clarke  

gcc
PR target/102719
* config/rs6000/x86intrin.h: Move some included headers to new
headers; include new immintrin.h instead.
* config/rs6000/immintrin.h: New.
* config/rs6000/x86gprintrin.h: New.
* config/config.gcc (powerpc-*-*): Add new headers to extra_headers.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Fix dg directives to require Power8
vector support.
---
Tested on powerpc64le-linux (Power9), powerpc64-linux (Power8) and
powerpc-linux (Power8).

OK for trunk?

 gcc/config.gcc |  2 +-
 gcc/config/rs6000/immintrin.h  | 41 ++
 gcc/config/rs6000/x86gprintrin.h   | 31 
 gcc/config/rs6000/x86intrin.h  | 10 +-
 gcc/testsuite/gcc.target/powerpc/pr78102.c |  4 +--
 5 files changed, 76 insertions(+), 12 deletions(-)
 create mode 100644 gcc/config/rs6000/immintrin.h
 create mode 100644 gcc/config/rs6000/x86gprintrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb1f06f3da89..efd1f42ac234 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,7 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
-   extra_headers="${extra_headers} nmmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
diff --git a/gcc/config/rs6000/immintrin.h b/gcc/config/rs6000/immintrin.h
new file mode 100644
index 000000000000..647a5ae49b5a
--- /dev/null
+++ b/gcc/config/rs6000/immintrin.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#define _IMMINTRIN_H_INCLUDED
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#include 
+
+#endif /* _IMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/rs6000/x86gprintrin.h b/gcc/config/rs6000/x86gprintrin.h
new file mode 100644
index 000000000000..57ef120f805f
--- /dev/null
+++ b/gcc/config/rs6000/x86gprintrin.h
@@ -0,0 +1,31 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library 

[COMMITTED] rs6000: Fix missing "externs" in smmintrin.h

2021-10-25 Thread Paul A. Clarke via Gcc-patches
Inline functions defined in smmintrin.h need "extern" as part of their
declaration, otherwise instances of those functions are created in the
objects which include them.
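
For context, a minimal sketch of the declaration pattern in question,
using a hypothetical function `demo_add_one` (not part of smmintrin.h).
With GCC's `__gnu_inline__` attribute, a definition declared
`extern __inline` is used only for inlining, so no standalone copy is
emitted; dropping `extern` makes the compiler also emit an out-of-line
definition in every object that includes the header.

```c
#include <assert.h>

/* Hypothetical function following the header's declaration pattern.
   Because it is "extern" + gnu_inline, this body is only ever inlined.  */
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
demo_add_one (int x)
{
  return x + 1;
}

/* An ordinary caller.  The body above is inlined here, and this object
   does not define a demo_add_one symbol of its own.  */
int
demo_caller (int x)
{
  assert (x < 2147483647);
  return demo_add_one (x);
}
```

Inspecting the resulting object with `nm` should show `demo_caller` but
no definition of `demo_add_one`.  Taking the function's address would
require an out-of-line definition elsewhere, so the sketch avoids that.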

Fixes commits:
- acd4b9103c1a30c833de4eee31fb69c3ff13cd77
- 9d352c68e8c8b642a36a6bcfc7f6b5dba11ac748
- bd9a8737d478f7f1d01a9d5f1cc4309ffbb53103
- 5f500715438761f59de5fb992267748c5d4dc4b6
- eaa93a0f3d9f67c8cbc1dc849ea6feba432ff412
- 29fb1e831bf1c25e4574bf2f98a9f534e5c67665

2021-10-25  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_testz_si128): Add "extern" to
function signature.
(_mm_testc_si128): Likewise.
(_mm_testnzc_si128): Likewise.
(_mm_blend_ps): Likewise.
(_mm_blendv_ps): Likewise.
(_mm_blend_pd): Likewise.
(_mm_blendv_pd): Likewise.
(_mm_ceil_pd): Likewise.
(_mm_ceil_sd): Likewise.
(_mm_ceil_ps): Likewise.
(_mm_ceil_ss): Likewise.
(_mm_floor_pd): Likewise.
(_mm_floor_sd): Likewise.
(_mm_floor_ps): Likewise.
(_mm_floor_ss): Likewise.
(_mm_minpos_epu16): Likewise.
(_mm_mul_epi32): Likewise.
(_mm_cvtepi8_epi16): Likewise.
(_mm_packus_epi32): Likewise.
(_mm_cmpgt_epi64): Likewise.
---
Tested on powerpc64le-linux (Power9), powerpc64-linux (Power8),
powerpc-linux (Power8).

Committed as trivial, obvious.

 gcc/config/rs6000/smmintrin.h | 40 +--
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index b732fbca7b09..0fab308b1951 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -118,7 +118,7 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
 {
@@ -145,7 +145,7 @@ _mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
   return (__m128) __r;
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
 {
@@ -154,7 +154,7 @@ _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
   return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
 {
@@ -170,7 +170,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
 }
 
 #ifdef _ARCH_PWR8
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 {
@@ -180,7 +180,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 }
 #endif
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
 {
@@ -189,7 +189,7 @@ _mm_testz_si128 (__m128i __A, __m128i __B)
   return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
 }
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testc_si128 (__m128i __A, __m128i __B)
 {
@@ -199,7 +199,7 @@ _mm_testc_si128 (__m128i __A, __m128i __B)
   return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
 }
 
-__inline int
+extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testnzc_si128 (__m128i __A, __m128i __B)
 {
@@ -214,14 +214,14 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_pd (__m128d __A)
 {
   return (__m128d) vec_ceil ((__v2df) __A);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_sd (__m128d __A, __m128d __B)
 {
@@ -230,14 +230,14 @@ _mm_ceil_sd (__m128d __A, __m128d __B)
   return (__m128d) __r;
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_floor_pd (__m128d __A)
 {
   return (__m128d) vec_floor ((__v2df) __A);
 }
 
-__inline __m128d
+extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_floor_sd (__m128d __A, __m128d __B)
 {
@@ -246,14 +246,14 @@ _mm_floor_sd (__m128d __A, __m128d __B)
   return (__m128d) __r;
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_ceil_ps (__m128 __A)
 {
   return (__m128) vec_ceil ((__v4sf) __A);
 }
 
-__inline __m128
+extern __inline __m128
 __attribute__ ((__gnu_inline__, 

[PATCH] rs6000: Add optimizations for _mm_sad_epu8

2021-10-22 Thread Paul A. Clarke via Gcc-patches
Power9 ISA added `vabsdub` instruction which is realized in the
`vec_absd` intrinsic.

Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
`_ARCH_PWR9`.

Also, the realization of `vec_sum2s` on little-endian includes
two shifts in order to position the input and output to match
the semantics of `vec_sum2s`:
- Shift the second input vector left 12 bytes. In the current usage,
  that vector is `{0}`, so this shift is unnecessary, but is currently
  not eliminated under optimization.
- Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
  The two words within each doubleword of this (shifted) result must then
  be explicitly swapped to match the semantics of `_mm_sad_epu8`,
  effectively reversing this shift.  So, this shift (and a subsequent swap)
  are unnecessary, but not currently removed under optimization.

Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
option for removing the shifts.

For little-endian, use the `vsum2sws` instruction directly, and
eliminate the explicit shift (swap).
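
As a reference point for the semantics being preserved, here is a minimal scalar sketch of what `_mm_sad_epu8` computes. It is portable C, not the header code; the name `sad_epu8_model` and the array-based interface are assumptions made for illustration only.

```c
#include <stdint.h>

/* Scalar model of _mm_sad_epu8 (hypothetical helper, not from the patch).
   a and b each hold 16 unsigned bytes, numbered as x86 numbers them.
   out receives two 64-bit sums, one per 8-byte half.  */
static void
sad_epu8_model (const uint8_t a[16], const uint8_t b[16], uint64_t out[2])
{
  for (int half = 0; half < 2; half++)
    {
      uint64_t sum = 0;
      for (int i = 0; i < 8; i++)
        {
          uint8_t x = a[half * 8 + i];
          uint8_t y = b[half * 8 + i];
          /* max - min, as in the pre-POWER9 path; an absolute
             difference instruction yields the same per-byte value.  */
          sum += (uint8_t) (x > y ? x - y : y - x);
        }
      out[half] = sum;
    }
}
```

Each 64-bit result holds a sum of at most 8 * 255, so only the low word of each doubleword is ever nonzero, which is why the placement of the two 32-bit sums is the only thing the shifts and swaps affect.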

2021-10-22  Paul A. Clarke  

gcc
* config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
when _ARCH_PWR9, optimize vec_sum2s when LE.
---
Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`,
and on powerpc/powerpc64-linux on Power8.

OK for trunk?

 gcc/config/rs6000/emmintrin.h | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ab16c13c379e..c4758be0e777 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -2197,27 +2197,37 @@ extern __inline __m128i __attribute__((__gnu_inline__, 
__always_inline__, __arti
 _mm_sad_epu8 (__m128i __A, __m128i __B)
 {
   __v16qu a, b;
-  __v16qu vmin, vmax, vabsdiff;
+  __v16qu vabsdiff;
   __v4si vsum;
   const __v4su zero = { 0, 0, 0, 0 };
   __v4si result;
 
   a = (__v16qu) __A;
   b = (__v16qu) __B;
-  vmin = vec_min (a, b);
-  vmax = vec_max (a, b);
+#ifndef _ARCH_PWR9
+  __v16qu vmin = vec_min (a, b);
+  __v16qu vmax = vec_max (a, b);
   vabsdiff = vec_sub (vmax, vmin);
+#else
+  vabsdiff = vec_absd (a, b);
+#endif
   /* Sum four groups of bytes into integers.  */
   vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
+#ifdef __LITTLE_ENDIAN__
+  /* Sum across four integers with two integer results.  */
+  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
+  /* Note: vec_sum2s could be used here, but on little-endian, vector
+ shifts are added that are not needed for this use-case.
+ A vector shift to correctly position the 32-bit integer results
+ (currently at [0] and [2]) to [1] and [3] would then need to be
+ swapped back again since the desired results are two 64-bit
+ integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
+#else
   /* Sum across four integers with two integer results.  */
   result = vec_sum2s (vsum, (__vector signed int) zero);
   /* Rotate the sums into the correct position.  */
-#ifdef __LITTLE_ENDIAN__
-  result = vec_sld (result, result, 4);
-#else
   result = vec_sld (result, result, 6);
 #endif
-  /* Rotate the sums into the correct position.  */
   return (__m128i) result;
 }
 
-- 
2.27.0



[PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2021-10-21 Thread Paul A. Clarke via Gcc-patches
Power10 ISA added `vextract*` instructions which are realized in the
`vec_extractm` intrinsic.

Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
`_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
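
For readers unfamiliar with the movemask family, the following is a rough scalar sketch of the result these intrinsics produce: the most significant bit of each element, packed into an integer. The helper names and the array interface are hypothetical; elements are numbered the way x86 numbers them (element 0 supplies bit 0).

```c
#include <stdint.h>

/* Model of _mm_movemask_epi8: bit i of the result is the top bit
   of byte i.  Hypothetical helper, not the header code.  */
static int
movemask_epi8_model (const uint8_t v[16])
{
  int mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (v[i] >> 7) << i;
  return mask;
}

/* Model of _mm_movemask_ps, operating on the raw 32-bit patterns
   of the four floats: bit i of the result is the sign bit of
   element i.  */
static int
movemask_ps_model (const uint32_t v[4])
{
  int mask = 0;
  for (int i = 0; i < 4; i++)
    mask |= (int) (v[i] >> 31) << i;
  return mask;
}
```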

2021-10-21  Paul A. Clarke  

gcc
* config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
when _ARCH_PWR10.
* config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
(_mm_movemask_epi8): Likewise.
---
Tested on Power10 powerpc64le-linux (compiled with and without
`-mcpu=power10`).

OK for trunk?

 gcc/config/rs6000/emmintrin.h | 8 
 gcc/config/rs6000/xmmintrin.h | 4 
 2 files changed, 12 insertions(+)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index 32ad72b4cc35..ab16c13c379e 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_pd (__m128d  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v2du) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
 {
@@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
@@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_epi8 (__m128i __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((__v16qu) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned char perm_mask =
 {
@@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
index ae1a33e8d95b..4c093fd1d5ae 100644
--- a/gcc/config/rs6000/xmmintrin.h
+++ b/gcc/config/rs6000/xmmintrin.h
@@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movemask_ps (__m128  __A)
 {
+#ifdef _ARCH_PWR10
+  return vec_extractm ((vector unsigned int) __A);
+#else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
 {
@@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
 #else
   return result[0];
 #endif
+#endif /* !_ARCH_PWR10 */
 }
 #endif /* _ARCH_PWR8 */
 
-- 
2.27.0



[PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2021-10-20 Thread Paul A. Clarke via Gcc-patches
Power10 ISA added `xxblendv*` instructions which are realized in the
`vec_blendv` intrinsic.

Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
`_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
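
As a point of reference, here is a minimal scalar sketch of the selection rule that `_mm_blendv_ps` follows: the sign bit of each mask element decides whether the result comes from the second or the first operand. The helper name and array interface are assumptions for illustration, not the header implementation.

```c
#include <stdint.h>
#include <string.h>

/* Model of _mm_blendv_ps (hypothetical helper): for each element,
   take b[i] when the sign bit of mask[i] is set, else a[i].  */
static void
blendv_ps_model (const float a[4], const float b[4],
                 const float mask[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    {
      uint32_t bits;
      /* Inspect the raw bit pattern so that -0.0 also selects b.  */
      memcpy (&bits, &mask[i], sizeof bits);
      out[i] = (bits >> 31) ? b[i] : a[i];
    }
}
```

Only the top bit of each mask element matters here, which is why the pre-POWER10 fallback first has to expand that bit into a full-width element mask before it can use `vec_sel`.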

Also, copy a test from i386 for testing `_mm_blendv_ps`.
This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
but was inadvertently omitted.

2021-10-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
when _ARCH_PWR10.
(_mm_blendv_ps): Likewise.
(_mm_blendv_pd): Likewise.

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
adjust dg directives to suit.
---
Tested on Power10 powerpc64le-linux (compiled with and without
`-mcpu=power10`).

OK for trunk?

 gcc/config/rs6000/smmintrin.h | 12 
 .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
 2 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index b732fbca7b09..5d87fd7b6f61 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) __mask);
+#else
   const __v16qu __seven = vec_splats ((unsigned char) 0x07);
   __v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
+#endif
 }
 
 __inline __m128
@@ -149,9 +153,13 @@ __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
+#else
   const __v4si __zero = {0};
   const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
   return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+#endif
 }
 
 __inline __m128d
@@ -174,9 +182,13 @@ __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) __mask);
+#else
   const __v2di __zero = {0};
   const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
   return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+#endif
 }
 #endif
 
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
new file mode 100644
index 000000000000..8fcb55383047
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_blendvps (float *src1, float *src2, float *mask)
+{
+  int i, msk, sign = 1; 
+
+  msk = -1;
+  for (i = 0; i < NUM * 4; i++)
+{
+  if((i % 4) == 0)
+   msk++;
+  src1[i] = i* (i + 1) * sign;
+  src2[i] = (i + 20) * sign;
+  mask[i] = (i + 120) * i;
+  if( (msk & (1 << (i % 4))))
+   mask[i] = -mask[i];
+  sign = -sign;
+}
+}
+
+static int
+check_blendvps (__m128 *dst, float *src1, float *src2,
+   float *mask)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if (mask [j] < 0.0)
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2, mask;
+  int i;
+
+  init_blendvps (src1.f, src2.f, mask.f);
+
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blendv_ps (src1.x[i], src2.x[i], mask.x[i]);
+  if (check_blendvps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4],
+ &mask.f[i * 4]))
+   abort ();
+}
+}
-- 
2.27.0



Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations

2021-10-19 Thread Paul A. Clarke via Gcc-patches
On Tue, Oct 19, 2021 at 09:32:20AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Some compatibility implementations of x86 intrinsics include
> > Power intrinsics which require POWER8.  Guard them.
> 
> I assume this improves on all previous commented things (you don't say
> if it does).

Sorry, I summarized the changes in the v4 cover letter. This patch
required no changes other than adding a new PR addressed by it.
The reasons for no changes were in my reply to your review of v3.

> > gcc
> > PR target/101893
> > PR target/102719
> > * config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
> > * config/rs6000/pmmintrin.h: Same.
> > * config/rs6000/smmintrin.h: Same.
> > * config/rs6000/tmmintrin.h: Same.
> 
> Okay for trunk.  Thanks!

Thanks!

PC


[PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
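
To make the encoding concrete, the sketch below decodes a rounding argument into its parts and maps an x86-encoded mode to the Power encoding. All names here are hypothetical; only the numeric values are taken from the macros in this patch and from the x86 definitions (nearest 0, toward negative infinity 1, toward positive infinity 2, toward zero 3). It is not how the header itself is structured.

```c
/* Hypothetical names; the values mirror the macros added by the patch.  */
enum
{
  ROUND_CUR_DIRECTION = 0x04,  /* _MM_FROUND_CUR_DIRECTION */
  ROUND_NO_EXC = 0x08          /* _MM_FROUND_NO_EXC */
};

struct round_request
{
  int rn;                   /* Power RN value: 0 nearest, 1 zero, 2 +inf, 3 -inf */
  int use_current_mode;     /* nonzero: keep the mode already in effect */
  int suppress_exceptions;  /* nonzero: exceptions must be suppressed */
};

/* Split a rounding argument, encoded with this patch's values,
   into its components.  */
static struct round_request
decode_rounding (int rounding)
{
  struct round_request req;
  req.rn = rounding & 0x03;
  req.use_current_mode = (rounding & ROUND_CUR_DIRECTION) != 0;
  req.suppress_exceptions = (rounding & ROUND_NO_EXC) != 0;
  return req;
}

/* Map an x86-encoded rounding mode (low two bits) to the Power RN
   encoding.  Only the values 1 and 3 differ.  */
static int
x86_to_power_rn (int x86_mode)
{
  static const int map[4] = { 0, 3, 2, 1 };
  return map[x86_mode & 0x03];
}
```

Code that uses the macro names is unaffected by the swap; only code that hard-codes the x86 numeric values would see a difference.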

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative, test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.

2021-10-18  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
 gcc/config/rs6000/smmintrin.h | 292 ++
 .../gcc.target/powerpc/sse4_1-round3.h|  81 +
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
 6 files changed, 1014 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 90ce03d22709..6bb03e6e20ac 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,234 @@
 #include 
 #include 
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO  0x01
+#define _MM_FROUND_TO_POS_INF   0x02
+#define _MM_FROUND_TO_NEG_INF   0x03
+#define _MM_FROUND_CUR_DIRECTION0x04
+
+#define _MM_FROUND_NINT\
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR   \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL\
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC   \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT   \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+#define _MM_FROUND_RAISE_EXC0x00
+#define _MM_FROUND_NO_EXC   0x08
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __A, int __rounding)
+{
+  __v2df __r;
+  union {
+double __fr;
+long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+{
+  /* Save enabled exceptions, disable all exceptions,
+and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+  __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+  __fpscr_save.__fr = __builtin_mffs ();
+  

[PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v4si) uses vsububm.
- _mm_sign_epi16: vec_neg(v4si) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.
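
The `_mm_cmpord_pd` simplification relies on a NaN never comparing equal to itself. A minimal scalar sketch of that idea follows; the helper name and array interface are assumptions for illustration, and the code is portable C rather than the vector implementation.

```c
#include <stdint.h>

/* Model of _mm_cmpord_pd (hypothetical helper): each result element
   is all ones when neither input element is a NaN, else zero.  */
static void
cmpord_pd_model (const double a[2], const double b[2], uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      /* Compare against self is false only for NaN.  */
      uint64_t c = (a[i] == a[i]) ? ~UINT64_C (0) : 0;
      uint64_t d = (b[i] == b[i]) ? ~UINT64_C (0) : 0;
      /* A != NaN and B != NaN.  */
      out[i] = c & d;
    }
}
```

This needs only a floating-point equality compare, which is why the retained variant does not depend on the POWER8-only doubleword integer compare.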

2021-10-18  Paul A. Clarke  

gcc
PR target/101893
PR target/102719
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
 gcc/config/rs6000/emmintrin.h | 12 ++--
 gcc/config/rs6000/pmmintrin.h |  4 
 gcc/config/rs6000/smmintrin.h |  4 
 gcc/config/rs6000/tmmintrin.h | 12 
 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c |  4 ++--
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0000000000000, 0x7ff0000000000000};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
   return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6bb03e6e20ac..24adc95589ad 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -324,6 +324,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -335,6 +336,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -395,6 +397,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 __inline __m128d
 __attribute__ ((__gnu_inline__, 

[PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Fix an omission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-18  Paul A. Clarke  

gcc
* config/config.gcc (extra_headers): Add nmmintrin.h.
---
 gcc/config.gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index aa5bd5d14590..1cb9303b3a85 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,6 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h"
 extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-- 
2.27.0



[PATCH v4 0/3] rs6000: Support more SSE4 intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
v4:
- Of original 6 patches in this series, I committed patches 2-5.
- Found an issue from v3. New file "nmmintrin.h" also needs to be added
to gcc/config.gcc "extra_headers".  Unfortunately, I discovered this
after committing the patch which added "nmmintrin.h", so I've added a
new patch here.
- Added scheduling "barriers" to patch 2 after review from Segher.
- Noted additional PR fixed by patch 3.

v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
and users will expect to be able to include "nmmintrin.h",
even though "nmmintrin.h" just includes "smmintrin.h"
where all of the SSE4.2 implementations actually appear.
Only patch 5/6 changed from v2.

Tested ppc64le (POWER9) and ppc64/32 (POWER7).

OK for trunk?

Paul A. Clarke (3):
  rs6000: Add nmmintrin.h to extra_headers
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config.gcc|   1 +
 gcc/config/rs6000/emmintrin.h |  12 +-
 gcc/config/rs6000/pmmintrin.h |   4 +
 gcc/config/rs6000/smmintrin.h | 296 ++
 gcc/config/rs6000/tmmintrin.h |  12 +
 .../gcc.target/powerpc/sse4_1-round3.h|  81 +
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |   4 +-
 11 files changed, 1039 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

-- 
2.27.0



Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
On Tue, Oct 12, 2021 at 05:25:32PM -0500, Segher Boessenkool wrote:
> On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> > {
> >   fenv_union_t old;
> >   register fenv_union_t __fr;
> >   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
> >   ctx->env = old.fenv = __fr.fenv; 
> >   ctx->updated_status = (r != (old.l & 3));
> > }
> 
> (Should use "n", not "i", only numbers are allowed, not e.g. the address
> of something.  This actually can matter, in unusual cases.)

Noted, will submit a change to glibc when I get a chance. Thanks!

> This orders the updating of RN before the store to __fr.fenv .  There is
> no other ordering ensured here.
> 
> The store to __fr.fenv obviously has to stay in order with anything that
> can alias it, if that store isn't optimised away completely later.
> 
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feresetround_ppc (fenv_t *envp)
> > { 
> >   fenv_union_t new = { .fenv = *envp };
> >   register fenv_union_t __fr;
> >   __fr.l = new.l & 3;
> >   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
> > }
> 
> This both reads from and stores to __fr.fenv, the asm has to stay
> between those two accesses (in the machine code).  If the code that
> actually depends on the modified RN depends on that __fr.fenv some way,
> all will be fine.
> 
> > double
> > __sin (double x)
> > {
> >   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
> >   libc_feholdsetround_ppc_ctx (&ctx, (0));
> >   /* floating point intensive code.  */
> >   return retval;
> > }
> 
> ... but there is no such dependency.  The cleanup attribute does not
> give any such ordering either afaik.
> 
> > There's not much to it, really.  "mffscrni" on the way in to save and set
> > a required rounding mode, and "mffscrn" on the way out to restore it.
> 
> Yes.  But the code making use of the modified RN needs to have some
> artificial dependencies with the RN setters, perhaps via __fr.fenv .
> 
> > > Calling a real function (that does not even need a stack frame, just a
> > > blr) is not terribly expensive, either.
> > 
> > Not ideal, better would be better.
> 
> Yes.  But at least it *works* :-)  I'll take a stupid, simply, stupidly
> simple, *robust* solution over some nice, faster, nicely faster way of
> doing the wrong thing.

Understand, and agree. 

> > > > > > Would creating a __builtin_mffsce be another solution?
> > > > > 
> > > > > Yes.  And not a bad idea in the first place.
> > > > 
> > > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > > difference between "asm" and builtin, how does using a builtin solve the
> > > > problem?
> > > 
> > > You will have to make the builtin solve it.  What a builtin can do is
> > > virtually unlimited.  What an asm can do is not: it just outputs some
> > > assembler language, and does in/out/clobber constraints.  You can do a
> > > *lot* with that, but it is much more limited than everything you can do
> > > in the compiler!  :-)
> > > 
> > > The fact remains that there is no way in RTL (or Gimple for that matter)
> > > to express things like rounding mode changes.  You will need to
> > > artificially make some barriers.
> > 
> > I know there is __builtin_set_fpscr_rn that generates mffscrn.
> 
> Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
> provide the ordering either.  It *cannot*: you need to cooperate with
> whatever you are ordering against.  There is no way in GCC to say "this
> is an FP insn and has to stay in order with all FP control writes and FP
> status reads".
> 
> Maybe now you see why I like external functions for this :-)
> 
> > This
> > is not used in the code above because I believe it first appears in
> > GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> > a return value, which would be handy in this case).  Does the
> > implementation of that builtin meet the requirements needed here,
> > to prevent reordering of FP computation across instantiations of the
> > builtin?  If not, is there a model on which to base an implementation
> > of __builtin_mffsce (or some preferred name)?
> 
> It depends on what you are actually ordering, unfortunately.

What I hear is that for the specific requirements and restrictions here,
there is nothing special that another builtin, like a theoretical
__builtin_mffsce implemented like __builtin_set_fpscr_rn, can provide
to solve the issue under discussion.  The dependencies need to be expressed
such that the compiler understands them, and there is no way to do so
with the current implementation of __builtin_set_fpscr_rn.

With some effort, and proper visibility, the dependencies can be expressed
using "asm". I believe that's the case here, and will submit a v2 for
review shortly.
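
A rough sketch of the kind of artificial dependency meant here, assuming GCC-style extended asm: an empty `asm` that claims to read and write a value ties the computation on that value to the `asm`'s position. The macro name `FP_BARRIER` and the function `guarded_add` are hypothetical, and the memory constraint is chosen only for portability; this is not the code from the patch.

```c
/* Hypothetical helper: an empty asm statement that claims to both
   read and modify x.  The compiler must have x computed before this
   point, and must treat later uses of x as depending on it.  */
#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m" (x))

static double
guarded_add (double a, double b)
{
  /* A real implementation would save and change the rounding mode
     here, using a volatile asm of its own.  */
  FP_BARRIER (a);       /* the addition cannot be hoisted above this */
  double r = a + b;
  FP_BARRIER (r);       /* the addition cannot sink below this */
  /* A real implementation would restore the rounding mode here.  */
  return r;
}
```

The ordering rests on two things: the data dependency through the barrier operands, and volatile asm statements not being reordered relative to one another. The first barrier only pins `a`, so a computation involving other inputs alone would need those inputs passed through a barrier as well.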

For the general case of inlines, 

Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-10-18 Thread Paul A. Clarke via Gcc-patches
On Wed, Oct 13, 2021 at 06:47:21PM -0500, Segher Boessenkool wrote:
> On Wed, Oct 13, 2021 at 12:04:39PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 07:11:13PM -0500, Segher Boessenkool wrote:
> > > > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
> > > 
> > > Did this fail on p7?  If not, add a test that *does*?
> > 
> > Do you mean fail if not for "dg-require-effective-target p8vector_hw"?
> > We have that, in gcc/testsuite/gcc.target/powerpc/sse2-pmuludq-1.c.
> 
> "Some compatibility implementations of x86 intrinsics include
> Power intrinsics which require POWER8."
> 
> Plus, everything this patch does.  None of that would be needed if it
> worked on p7!

The tests that are permitted to compile/link on P7, gated by dg directives,
work on P7.

> So things in this patch are either not needed (so add noise only, and
> reduce functionality on older systems for no reason), or they do fix a
> bug.  It would be nice if we could have detected such bugs earlier.

Most, if not all of the intrinsics tests were originally limited to
P8 and up, 64bit, and little-endian. At your request, I have lowered
many of those restrictions in areas that are capable of support.
Such is the case here, to enable compiling and running as much as
possible on P7.

If you want a different approach, do let me know.

> > > > gcc
> > > > PR target/101893
> > > 
> > > This is a different bug (the vgbdd one)?
> > 
> > PR 101893 is the same issue: things not being properly masked by
> > #ifdefs.
> 
> But PR101893 does not mention anything you touch here, and this patch
> does not fix PR101893.  The main purpose of bug tracking systems is the
> tracking part!

The error message in PR101893 is in smmintrin.h:
| gcc/include/smmintrin.h:103:3: error: AltiVec argument passed to unprototyped function
| 
| That line is
| 
|   __charmask = vec_gb (__charmask);

smmintrin.h is changed by this patch, including `#ifdef _ARCH_PWR8` around
the code which has vec_gb.

PC


Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-10-13 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 07:11:13PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:10PM -0500, Paul A. Clarke wrote:
> > Some compatibility implementations of x86 intrinsics include
> > Power intrinsics which require POWER8.  Guard them.
> 
> > emmintrin.h:
> > - _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
> >   but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
> >   The "POWER8" version works fine on pre-POWER8.
> 
> Huh.  It just generates xvcmpeqdp I suppose?

Yes.
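
For reference, a scalar sketch of what an "ordered" compare computes per lane, assuming the x86 semantics of _mm_cmpord_pd (all ones when neither input is a NaN). model_cmpord_lane is a hypothetical name; this is a model of the result, not the vector code in emmintrin.h.

```c
#include <math.h>
#include <stdint.h>

/* One 64-bit lane of an ordered compare.  A value compares equal to
   itself unless it is a NaN, so two self-compares are enough.  */
uint64_t
model_cmpord_lane (double a, double b)
{
  return (a == a && b == b) ? UINT64_MAX : 0;
}
```

For example, model_cmpord_lane (1.0, 2.0) is all ones, while model_cmpord_lane (NAN, 2.0) is zero.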

> > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
> 
> Did this fail on p7?  If not, add a test that *does*?

Do you mean fail if not for "dg-require-effective-target p8vector_hw"?
We have that, in gcc/testsuite/gcc.target/powerpc/sse2-pmuludq-1.c.

> > pmmintrin.h:
> > - _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
> > - _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
> 
> Similar.

gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c
gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c

> > smmintrin.h:
> > - _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
> > - _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
> > - _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
> > tmmintrin.h:
> > - _mm_sign_epi8: vec_neg(v4si) uses vsububm.
> > - _mm_sign_epi16: vec_neg(v4si) uses vsubuhm.
> > - _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
> >   Note that the above three could actually be supported pre-POWER8,
> >   but current GCC does not support them before POWER8.
> > - _mm_sign_pi8: depends on _mm_sign_epi8.
> > - _mm_sign_pi16: depends on _mm_sign_epi16.
> > - _mm_sign_pi32: depends on _mm_sign_epi32.
> 
> And more.

gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
- although this one will _actually_ fail on P7, as it only requires
"vsx_hw". I'll fix this.
gcc/testsuite/gcc.target/powerpc/ssse3-psignb.c
gcc/testsuite/gcc.target/powerpc/ssse3-psignw.c
gcc/testsuite/gcc.target/powerpc/ssse3-psignd.c

> > gcc
> > PR target/101893
> 
> This is a different bug (the vgbdd one)?

PR 101893 is the same issue: things not being properly masked by
#ifdefs.

> All looks good, but we need such failing tests :-)

Thanks for the review! Let me know what you mean by "failing tests".
("Would fail if not for ..."?)

PC


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-12 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 05:04:12PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 12:31:07PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> > > > Very similar methods are used in glibc today. Are those broken?
> > > 
> > > Maybe.
> > 
> > Ouch.
> 
> So show the code?

You asked for it. ;-)  Boiled down to remove macroisms and code that
should be removed by optimization:
--
static __inline __attribute__ ((__always_inline__)) void
libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
{
  fenv_union_t old;
  register fenv_union_t __fr;
  __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
  ctx->env = old.fenv = __fr.fenv; 
  ctx->updated_status = (r != (old.l & 3));
}
static __inline __attribute__ ((__always_inline__)) void
libc_feresetround_ppc (fenv_t *envp)
{ 
  fenv_union_t new = { .fenv = *envp };
  register fenv_union_t __fr;
  __fr.l = new.l & 3;
  __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" (__fr.fenv));
}
double
__sin (double x)
{
  struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
  libc_feholdsetround_ppc_ctx (&ctx, (0));
  /* floating point intensive code.  */
  return retval;
}
--

There's not much to it, really.  "mffscrni" on the way in to save and set
a required rounding mode, and "mffscrn" on the way out to restore it.
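
The save/set on entry and restore on exit can be sketched portably with the GNU C "cleanup" attribute. This is only a model under stated assumptions: model_rounding_mode is a plain variable standing in for the FPSCR rounding-mode field, and all model_* names are hypothetical. It says nothing about whether the compiler keeps FP code between the two points.

```c
/* Stand-in for the FPSCR rounding-mode field; purely illustrative.  */
static int model_rounding_mode;

struct model_rm_ctx
{
  int saved;
};

/* On the way in: save the current mode and set the required one.  */
static inline struct model_rm_ctx
model_holdsetround (int mode)
{
  struct model_rm_ctx ctx = { model_rounding_mode };
  model_rounding_mode = mode;
  return ctx;
}

/* On the way out: restore the saved mode.  Called automatically when
   the variable carrying the cleanup attribute goes out of scope.  */
static void
model_resetround (struct model_rm_ctx *ctx)
{
  model_rounding_mode = ctx->saved;
}

void
model_set_mode (int mode)
{
  model_rounding_mode = mode;
}

int
model_get_mode (void)
{
  return model_rounding_mode;
}

/* Returns the mode that is in effect inside the bracketed region.  */
int
model_mode_seen_inside (int mode)
{
  struct model_rm_ctx ctx __attribute__ ((cleanup (model_resetround)))
    = model_holdsetround (mode);
  (void) ctx;
  return model_rounding_mode;
}
```

The return value is evaluated before the cleanup handler runs, so the function reports the mode set on entry while the caller sees the original mode again afterwards.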

> > > If you get a real (i.e. not inline) function call there, that
> > > can save you often.
> > 
> > Calling a real function in order to execute a single instruction is
> > sub-optimal. ;-)
> 
> Calling a real function (that does not even need a stack frame, just a
> blr) is not terribly expensive, either.

Not ideal, better would be better.

> > > > Would creating a __builtin_mffsce be another solution?
> > > 
> > > Yes.  And not a bad idea in the first place.
> > 
> > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > difference between "asm" and builtin, how does using a builtin solve the
> > problem?
> 
> You will have to make the builtin solve it.  What a builtin can do is
> virtually unlimited.  What an asm can do is not: it just outputs some
> assembler language, and does in/out/clobber constraints.  You can do a
> *lot* with that, but it is much more limited than everything you can do
> in the compiler!  :-)
> 
> The fact remains that there is no way in RTL (or Gimple for that matter)
> to express things like rounding mode changes.  You will need to
> artificially make some barriers.
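
One commonly used form of such an artificial barrier is an extended asm that claims to read and write a value, so that later code using the value depends on the asm. A minimal sketch follows, assuming GNU C extended asm. The asm body is empty here; on PowerPC the mode-changing instruction would go in its place. The model_* names are hypothetical, and this is not claimed to be sufficient for the intrinsics under review.

```c
/* Empty extended asm with V as an in/out memory operand.  The compiler
   has to assume the asm changed V, so computations that consume the
   returned value cannot be moved ahead of the asm.  */
static inline double
model_tie_to_asm (double v)
{
  __asm__ __volatile__ ("" : "+m" (v));
  return v;
}

/* The addition consumes the value produced by the asm, which gives it
   a data dependency on the (stand-in) mode change.  */
double
model_add_after_mode_change (double x, double y)
{
  x = model_tie_to_asm (x);
  return x + y;
}
```

The dependency only covers values routed through the asm. FP code that does not use such a value is not ordered by it.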

I know there is __builtin_set_fpscr_rn that generates mffscrn. This
is not used in the code above because I believe it first appears in
GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
a return value, which would be handy in this case).  Does the
implementation of that builtin meet the requirements needed here,
to prevent reordering of FP computation across instantiations of the
builtin?  If not, is there a model on which to base an implementation
of __builtin_mffsce (or some preferred name)?

PC


[COMMITTED] rs6000: Correct several errant dg-require-effective-target

2021-10-11 Thread Paul A. Clarke via Gcc-patches
I misspelled the dg-require-effective-target attribute "vsx_hw" in
recent commits, causing the affected tests to fail.  Correct the spelling.

2021-10-11  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Fix dg-require-effective-target.
* gcc.target/powerpc/sse4_1-packusdw.c: Likewise.
* gcc.target/powerpc/sse4_1-pmaxsb.c: Likewise.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Likewise.
* gcc.target/powerpc/sse4_1-pmaxud.c: Likewise.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Likewise.
* gcc.target/powerpc/sse4_1-pminsb.c: Likewise.
* gcc.target/powerpc/sse4_1-pminsd.c: Likewise.
* gcc.target/powerpc/sse4_1-pminud.c: Likewise.
* gcc.target/powerpc/sse4_1-pminuw.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Likewise.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Likewise.
* gcc.target/powerpc/sse4_1-pmulld.c: Likewise.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Likewise.
* gcc.target/powerpc/sse4_1-phminposuw.c: Use correct
dg-require-effective-target.
---
Committed as obvious.

 gcc/testsuite/gcc.target/powerpc/pr78102.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c| 2 +-
 22 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c
index 68898c7f9428..434e677e1714 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78102.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mvsx" } */
-/* { dg-require-effective-target powerpc_vsx_hw } */
+/* { dg-require-effective-target vsx_hw } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
index 8b757a267468..fe51003ba46b 100644
--- a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mvsx" } */
-/* { dg-require-effective-target powerpc_vsx_hw } */
+/* { dg-require-effective-target vsx_hw } */
 
 #ifndef CHECK_H
 #define CHECK_H "sse4_1-check.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
index 146df24777a9..150a8a0f5638 100644
--- a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -mvsx -Wno-psabi" } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #ifndef CHECK_H
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
index 33f168b712ea..2f5906d83cf7 100644
--- a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target powerpc_vsx_hw } */
+/* { dg-require-effective-target vsx_hw } */
 /* { dg-options "-O2 -mvsx" } */
 
 #ifndef CHECK_H
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
index 60b342587ddb..d196abedbebf 100644
--- 

Re: [COMMITTED v4 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 06:07:35PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:09PM -0500, Paul A. Clarke wrote:
> > gcc
> > * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
> > _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
> > * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.
> > 
> > gcc/testsuite
> > * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
> > adjust dg directives to suit.
> > * gcc.target/powerpc/sse4_1-packusdw.c: Same.
> > * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
> > * gcc.target/powerpc/sse4_1-pmuldq.c: Same.
> > * gcc.target/powerpc/sse4_1-pmulld.c: Same.
> > * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
> > * gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
> > tweak to suit.
> 
> Okay for trunk (with the vsx_hw thing).  Thanks!

This was committed:

rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32
- _mm_cmpgt_epi64 (SSE4.2)

from gcc/testsuite/gcc.target/i386.
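
As a reading aid for the intrinsics listed above, here is a scalar sketch of their per-lane results, assuming the documented x86 semantics. The model_* names are hypothetical; this is a model of the results, not the vector code added by the patch.

```c
#include <stdint.h>

/* _mm_mullo_epi32: four 32-bit products, keeping the low 32 bits.  */
void
model_mullo_epi32 (const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i] * b[i];	/* unsigned arithmetic wraps modulo 2^32 */
}

/* _mm_mul_epi32: full signed 64-bit products of lanes 0 and 2.  */
void
model_mul_epi32 (const int32_t a[4], const int32_t b[4], int64_t r[2])
{
  r[0] = (int64_t) a[0] * b[0];
  r[1] = (int64_t) a[2] * b[2];
}

/* _mm_packus_epi32, one lane: signed 32-bit to unsigned 16-bit with
   unsigned saturation.  */
uint16_t
model_packus_lane (int32_t v)
{
  if (v < 0)
    return 0;
  if (v > 65535)
    return 65535;
  return (uint16_t) v;
}

/* _mm_cmpeq_epi64, one lane: all ones when equal, else zero.  */
uint64_t
model_cmpeq_epi64_lane (int64_t a, int64_t b)
{
  return a == b ? UINT64_MAX : 0;
}
```

Note the difference between the two multiplies: "mullo" produces four truncated results, while "mul" uses only the even lanes and keeps the whole product.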

2021-10-11  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
* config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
* gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
tweak to suit.
---
v4: Fix "space after cast" and "vsx_hw" issues, per Segher review.

diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
new file mode 100644
index ..20a70bee3776
--- /dev/null
+++ b/gcc/config/rs6000/nmmintrin.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#endif
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+/* We just include SSE4.1 header file.  */
+#include <smmintrin.h>
+
+#endif /* _NMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index ad6b68e13cce..90ce03d22709 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -274,6 +274,15 @@ _mm_floor_ss (__m128 __A, __m128 __B)
   return __r;
 }
 
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_cmpeq ((__v2di) __X, (__v2di) __Y);
+}
+#endif
+
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_min_epi8 (__m128i __X, __m128i __Y)
@@ -332,6 +341,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mullo_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mul ((__v4su) __X, (__v4su) __Y);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mule ((__v4si) __X, (__v4si) __Y);
+}
+#endif
+
+__inline __m128i
+__attribute__ 

Re: [COMMITTED v4 4/6] rs6000: Support SSE4.1 "cvt" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 04:52:44PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:08PM -0500, Paul A. Clarke wrote:
[...]
> > +extern __inline __m128i
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_cvtepi8_epi16 (__m128i __A)
> > +{
> > +  return (__m128i) vec_unpackh ((__v16qi)__A);
> > +}
> 
> This strange mixture of sometimes writing a cast with a space and
> sometimes without one is...  strange :-)
> 
> Having up to three unpacks in a row seems suboptimal.  But it certainly
> is aesthetically pleasing :-)
> 
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> 
> Same as before here too (needs vsx_hw).
> 
> Okay for trunk with that fixed.  Thanks!

This was committed:

rs6000: Support SSE4.1 "cvt" intrinsics

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
- _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
- _mm_cvtepi32_epi64
- _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
- _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
- _mm_cvtepu32_epi64

from gcc/testsuite/gcc.target/i386.

sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
modified from using "char" types to "signed char" types, because
the default is unsigned on powerpc.
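
A small portable sketch of why the tests spell out "signed char": whether plain "char" is signed depends on the target, and the widening result depends on it. The model_* names are hypothetical; the two widening helpers mirror the sign-extending ("cvtepi8") and zero-extending ("cvtepu8") conversions on a single element.

```c
#include <limits.h>
#include <stdint.h>

/* 1 if plain "char" is signed on this target, 0 if it is unsigned.  */
int
model_plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Sign-extend one byte to 16 bits, as _mm_cvtepi8_epi16 does per lane.  */
int16_t
model_widen_signed (signed char v)
{
  return (int16_t) v;
}

/* Zero-extend one byte to 16 bits, as _mm_cvtepu8_epi16 does per lane.  */
int16_t
model_widen_unsigned (unsigned char v)
{
  return (int16_t) v;
}
```

The byte 0xff widens to -1 through the signed helper and to 255 through the unsigned one, so a reference computation written with plain "char" gives different expected values depending on the target.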

2021-10-11  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
_mm_cvtepu32_epi64): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
---
v4: Fix "space after cast" and "vsx_ok" issues, per Segher review.

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index f935ab060abc..ad6b68e13cce 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -330,6 +330,144 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
   return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi16 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v16qi) __A);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi32 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi) __A);
+  return (__m128i) vec_unpackh ((__v8hi) __A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi) __A);
+  __A = (__m128i) vec_unpackh ((__v8hi) __A);
+  return (__m128i) vec_unpackh ((__v4si) __A);
+}
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi32 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v8hi) __A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v8hi) __A);
+  return (__m128i) vec_unpackh ((__v4si) __A);
+}
+#endif
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi32_epi64 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v4si) __A);
+}
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu8_epi16 (__m128i __A)
+{
+  const __v16qu __zero = {0};
+#ifdef __LITTLE_ENDIAN__
+  __A = (__m128i) vec_mergeh ((__v16qu) __A, __zero);
+#else /* __BIG_ENDIAN__.  */
+  __A = (__m128i) vec_mergeh (__zero, (__v16qu) __A);
+#endif /* __BIG_ENDIAN__.  */
+  return __A;
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu8_epi32 (__m128i __A)
+{
+  const __v16qu __zero = {0};
+#ifdef __LITTLE_ENDIAN__
+  __A = (__m128i) vec_mergeh ((__v16qu) __A, __zero);
+  __A = (__m128i) vec_mergeh ((__v8hu) __A, (__v8hu) __zero);
+#else /* __BIG_ENDIAN__.  */
+  __A = (__m128i) vec_mergeh (__zero, (__v16qu) __A);
+  __A = (__m128i) 

Re: [COMMITTED v4 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 03:50:31PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:07PM -0500, Paul A. Clarke wrote:
> > gcc
> > * config/rs6000/smmintrin.h (_mm_test_all_zeros,
> > _mm_test_all_ones, _mm_test_mix_ones_zeros): Replace.
> 
> "Replace" does not say what it is replaced with.  "Rewrite" maybe?
> 
> Okay for trunk either way.  Thanks!

This was committed:

Copy some simple redirections from i386, for:
- _mm_test_all_zeros
- _mm_test_all_ones
- _mm_test_mix_ones_zeros

2021-10-11  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_test_all_zeros,
_mm_test_all_ones, _mm_test_mix_ones_zeros): Rewrite as macro.
--
v4: tweak commit message.

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index af782079cbcb..f935ab060abc 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -203,34 +203,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
   return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
 }
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
-}
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_ones (__m128i __A)
-{
-  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
-  return vec_all_eq ((__v16qu) __A, __ones);
-}
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
-  const int any_ones = vec_any_ne (__Amasked, __zero);
-  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
-  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
-  const int any_zeros = vec_any_ne (__notAmasked, __zero);
-  return any_ones * any_zeros;
-}
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
 __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
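
The redirections in the diff above can be checked against a scalar model. The sketch below assumes the x86 semantics of the "testz", "testc" and "testnzc" operations and represents a 128-bit value as two 64-bit halves. All model_* names are hypothetical, and model_all_ones stands in for the _mm_cmpeq_epi32 ((V), (V)) idiom used in the patch.

```c
#include <stdint.h>

typedef struct
{
  uint64_t lo, hi;
} model_v128;

/* 1 if (a AND b) is all zeros.  */
int
model_testz (model_v128 a, model_v128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* 1 if ((NOT a) AND b) is all zeros.  */
int
model_testc (model_v128 a, model_v128 b)
{
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* 1 if neither of the above holds.  */
int
model_testnzc (model_v128 a, model_v128 b)
{
  return model_testz (a, b) == 0 && model_testc (a, b) == 0;
}

/* Comparing a value with itself yields all ones in every lane.  */
model_v128
model_all_ones (void)
{
  model_v128 r = { UINT64_MAX, UINT64_MAX };
  return r;
}

/* Same shape as the macros in the patch.  */
#define model_test_all_zeros(M, V) model_testz ((M), (V))
#define model_test_all_ones(V) model_testc ((V), model_all_ones ())
#define model_test_mix_ones_zeros(M, V) model_testnzc ((M), (V))
```

With a mask of 0xff, the value 0xf0 has both set and clear bits under the mask, so the "mix" test is true, while an all-ones value makes it false.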


Re: [COMMITTED v4 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 02:28:15PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:06PM -0500, Paul A. Clarke wrote:
> > gcc
> > * config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
> > _mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
> > _mm_max_epi32, _mm_max_epu32): New.
> > 
> > gcc/testsuite
> > * gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
> > * gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
> > * gcc.target/powerpc/sse4_1-pmaxud.c: Same.
> > * gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
> > * gcc.target/powerpc/sse4_1-pminsb.c: Same.
> > * gcc.target/powerpc/sse4_1-pminsd.c: Same.
> > * gcc.target/powerpc/sse4_1-pminud.c: Same.
> > * gcc.target/powerpc/sse4_1-pminuw.c: Same.
> 
> Okay for trunk.  Thanks!

The following was committed.

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32,
_mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32
from gcc/testsuite/gcc.target/i386.

sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using
"char" types to "signed char" types, because the default is unsigned on
powerpc.

2021-10-11  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
_mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
_mm_max_epi32, _mm_max_epu32): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
* gcc.target/powerpc/sse4_1-pmaxud.c: Same.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
* gcc.target/powerpc/sse4_1-pminsb.c: Same.
* gcc.target/powerpc/sse4_1-pminsd.c: Same.
* gcc.target/powerpc/sse4_1-pminud.c: Same.
* gcc.target/powerpc/sse4_1-pminuw.c: Same.
---
v4: I fixed more "space after cast" and "vsx_hw" issues.

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..af782079cbcb 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -296,6 +296,62 @@ _mm_floor_ss (__m128 __A, __m128 __B)
   return __r;
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
and bits [18:16] respectively.  */
 __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
new file mode 100644
index ..33f168b712ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+#define NUM 1024
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x[NUM / 16];
+  signed char i[NUM];
+} dst, src1, src2;
+  int i, sign = 1;
+  signed char max;
+
+  for (i = 0; i < NUM; i++)
+{
+  src1.i[i] = i * i * sign;
+  src2.i[i] = (i + 20) * sign;
+  sign = -sign;
+}
+
+  for (i = 0; i < NUM; i += 16)
+dst.x[i / 16] = _mm_max_epi8 (src1.x[i / 16], src2.x[i / 16]);
+
+  for (i = 0; i < NUM; i++)
+{
+  max = src1.i[i] <= src2.i[i] ? src2.i[i] : src1.i[i];
+  if (max != 

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 11, 2021 at 11:28:39AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 11, 2021 at 08:46:17AM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
[...]
> > > > With respect to volatile, I worry about removing it, because I do
> > > > indeed need that instruction to execute in order to clear the FPSCR
> > > > exception enable bits. That side-effect is not otherwise known to the
> > > > compiler.
> > > 
> > > Yes.  But as said above, volatile isn't enough to get this to behave
> > > correctly.
> > > 
> > > The easiest way out is to write this all in one piece of (inline) asm.
> > 
> > Ugh. I really don't want to go there, not just because it's work, but
> > I think this is a paradigm that should work without needing to drop
> > fully into asm.
> 
> Yes.  Let's say GCC still has some challenges here :-(
> 
> > Is there something unique about using an "asm" statement versus using,
> > say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
> 
> Nope.
> 
> > Very similar methods are used in glibc today. Are those broken?
> 
> Maybe.

Ouch.

> If you get a real (i.e. not inline) function call there, that
> can save you often.

Calling a real function in order to execute a single instruction is
sub-optimal. ;-)

> > Would creating a __builtin_mffsce be another solution?
> 
> Yes.  And not a bad idea in the first place.

The previous "Nope" and this "Yes" seem in contradiction. If there is no
difference between "asm" and builtin, how does using a builtin solve the
problem?

PC


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-11 Thread Paul A. Clarke via Gcc-patches
On Fri, Oct 08, 2021 at 05:31:11PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 08, 2021 at 02:27:28PM -0500, Paul A. Clarke wrote:
> > On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> > I see. Thanks for the reference. If I understand correctly, volatile
> > prevents some optimizations based on the defined inputs/outputs, but
> > the asm could still be subject to reordering.
> 
> "asm volatile" means there is a side effect in the asm.  This means that
> it has to be executed on the real machine the same as on the abstract
> machine, with the side effects in the same order.
> 
> It can still be reordered, modulo those restrictions.  It can be merged
> with an identical asm as well.  And the compiler can split this into two
> identical asms on two paths.

It seems odd to me that the compiler can make any assumptions about
the side-effect(s). How does it know that a side-effect does not alter
computation (as it indeed does in this case), such that reordering is
still correct (which it wouldn't be in this case)?

> In this case you might want a side effect (the instruction writes to
> the FPSCR after all).  But you need this to be tied to the FP code that
> you want the flags to be changed for, and to the restore of the flags,
> and finally you need to prevent other FP code from being scheduled in
> between.
> 
> You need more for that than just volatile, and the solution may well
> make volatile not wanted: tying the insns together somehow will
> naturally make the flags restored to a sane situation again, so the
> whole group can be removed if you want, etc.
> 
> > In this particular case, I don't think it's an issue with respect to
> > reordering.  The code in question is:
> > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > +  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > 
> > The output (__fpscr_save) is a source for the following assignment,
> > so the order should be respected, no?
> 
> Other FP code can be interleaved, and then do the wrong thing.
> 
> > With respect to volatile, I worry about removing it, because I do
> > indeed need that instruction to execute in order to clear the FPSCR
> > exception enable bits. That side-effect is not otherwise known to the
> > compiler.
> 
> Yes.  But as said above, volatile isn't enough to get this to behave
> correctly.
> 
> The easiest way out is to write this all in one piece of (inline) asm.

Ugh. I really don't want to go there, not just because it's work, but
I think this is a paradigm that should work without needing to drop
fully into asm.

Is there something unique about using an "asm" statement versus using,
say, a builtin like __builtin_mtfsf or a hypothetical __builtin_mffsce?
Very similar methods are used in glibc today. Are those broken?

Would creating a __builtin_mffsce be another solution?

Would adding memory barriers between the FPSCR manipulations and the
code which is bracketed by them be sufficient?

PC


Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-08 Thread Paul A. Clarke via Gcc-patches
On Fri, Oct 08, 2021 at 12:39:15PM -0500, Segher Boessenkool wrote:
> On Thu, Oct 07, 2021 at 08:04:23PM -0500, Paul A. Clarke wrote:
> > On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> > > > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > 
> > > The __volatile__ does likely not do what you want.  As far as I can see
> > > you do not want one here anyway?
> > > 
> > > "volatile" does not order asm wrt fp insns, which you likely *do* want.
> > 
> > Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
> > has no effect at all (6.47.1):
> > 
> > | The optional volatile qualifier has no effect. All basic asm blocks are
> > | implicitly volatile.
> > 
> > So, it could be removed without concern.
> 
> This is not a basic asm (it contains a ":"; that is not just an easy way
> to see it, it is the *definition* of basic vs. extended asm).

Ah, basic vs extended. I learned something today... thanks for your
patience!
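
For anyone else tripping over the same distinction, here is a minimal
sketch of the three forms.  It is not from the patch: the templates are
empty so it assembles on any target, and the function names are made up
for illustration.

```c
#include <assert.h>

/* Basic asm: no ":" and no operands.  Basic asm is implicitly volatile,
   so adding the qualifier here would change nothing.  */
static void
basic_asm_nop (void)
{
  __asm__ ("");
}

/* Extended asm: the ":" introduces operands.  Without "volatile" the
   compiler may delete the statement if its outputs are unused, or
   combine two identical copies.  */
static int
extended_asm_passthrough (int x)
{
  __asm__ ("" : "+r" (x));
  return x;
}

/* Extended asm with "volatile": the statement is not deleted, but it is
   still not ordered against unrelated code such as FP arithmetic.  */
static int
extended_volatile_passthrough (int x)
{
  __asm__ __volatile__ ("" : "+r" (x));
  return x;
}
```

Calling basic_asm_nop () and checking that both passthrough functions
return their argument (e.g. with assert from a test main) is enough to
see that all three forms build; the differences are only in what the
optimizer is allowed to do with them.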

> The manual explains:
> 
> """
> Note that the compiler can move even 'volatile asm' instructions
> relative to other code, including across jump instructions.  For
> example, on many targets there is a system register that controls the
> rounding mode of floating-point operations.  Setting it with a 'volatile
> asm' statement, as in the following PowerPC example, does not work
> reliably.
> 
>  asm volatile("mtfsf 255, %0" : : "f" (fpenv));
>  sum = x + y;
> 
> The compiler may move the addition back before the 'volatile asm'
> statement.  To make it work as expected, add an artificial dependency to
> the 'asm' by referencing a variable in the subsequent code, for example:
> 
>  asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
>  sum = x + y;
> """

I see. Thanks for the reference. If I understand correctly, volatile
prevents some optimizations based on the defined inputs/outputs, but
the asm could still be subject to reordering.
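
To check my understanding of the "artificial dependency" idea, here is a
target-neutral sketch.  It is only an analogue of the manual's example:
the templates are empty (real code would put the FPSCR-writing
instruction in the first one), it uses "+m" operands rather than the
manual's "=X" output, and the function name is hypothetical.

```c
#include <assert.h>

/* Bracket one FP addition between two asm statements by making the
   operands and the result in/out operands of the asms.  */
static double
add_between_barriers (double x, double y)
{
  double sum;

  /* The compiler must assume x and y are rewritten here, so the
     addition below cannot be computed before this statement.  */
  __asm__ __volatile__ ("" : "+m" (x), "+m" (y));

  sum = x + y;

  /* sum is read and written here, so the addition must be complete
     before this statement (where a restore of the flags would go).  */
  __asm__ __volatile__ ("" : "+m" (sum));

  return sum;
}
```

This only ties down the one addition that is named in the operands.
Other FP code that does not touch x, y, or sum can still be scheduled
in between, which I take to be the remaining problem.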

In this particular case, I don't think it's an issue with respect to
reordering.  The code in question is:
+  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;

The output (__fpscr_save) is a source for the following assignment,
so the order should be respected, no?

With respect to volatile, I worry about removing it, because I do
indeed need that instruction to execute in order to clear the FPSCR
exception enable bits. That side-effect is not otherwise known to the
compiler.

> > > You do not need any of that __ either.
> > 
> > I'm surprised that I don't. A .h file needs to be concerned about the
> > namespace it inherits, no?
> 
> These are local variables in a function though.  You get such
> complexities in macros, but never in functions, where everything is
> scoped.  Local variables are a great thing.  And macros are a bad thing!

They are local variables in a function *in an include file*, though.
If a user's preprocessor macro just happens to match a local variable name
there could be problems, right?

a.h:
inline void foo () {
  int A = 0;
}

a.c:
#define A a+b
#include <a.h>

$ gcc -c -I. a.c
In file included from a.c:1:
a.c: In function ‘foo’:
a.h:1:12: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘+’ token
 #define A a+b
^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
a.h:1:13: error: ‘b’ undeclared (first use in this function)
 #define A a+b
 ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
a.h:1:13: note: each undeclared identifier is reported only once for each function it appears in
 #define A a+b
 ^
a.c:2:17: note: in expansion of macro ‘A’
 int foo() { int A = 0; }
 ^
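
For contrast, a sketch of the same situation with the local spelled
using a reserved "__" name, as the installed headers do.  Everything
here is made up for illustration; header_func stands in for a function
from an installed header, and a_value and b_value are hypothetical user
variables.

```c
#include <assert.h>

/* A user macro that happens to use a plain identifier.  */
#define A (a_value + b_value)

static int a_value = 1, b_value = 2;

/* Stand-in for a function defined in an installed header.  Its local is
   spelled __A, a name reserved for the implementation, so the user's
   macro A above cannot rewrite it.  Spelling the local "A" instead
   would fail to compile, as in the output above.  */
static inline int
header_func (void)
{
  int __A = 40;
  return __A + 2;
}
```

With this spelling the user's macro and the header's local coexist:
header_func () returns 42 and A still expands to the user's expression.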
PC


RE: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-07 Thread Paul A. Clarke via Gcc-patches
On Thu, Oct 07, 2021 at 06:39:06PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:05PM -0500, Paul A. Clarke wrote:
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> __builtin_set_fpscr_rn makes optimised code (using mtfsb[01])
> automatically, fwiw.
> 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> 
> Hrm.  Using function-like macros is begging for trouble, as usual.  But
> the x86 version does this, so meh.
> 
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +double __fr;
> > +long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +{
> > +  /* Save enabled exceptions, disable all exceptions,
> > +and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +  __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> 
> The __volatile__ does likely not do what you want.  As far as I can see
> you do not want one here anyway?
> 
> "volatile" does not order asm wrt fp insns, which you likely *do* want.

Reading the GCC docs, it looks like the "volatile" qualifier for "asm"
has no effect at all (6.47.1):

| The optional volatile qualifier has no effect. All basic asm blocks are
| implicitly volatile.

So, it could be removed without concern.

> > +  __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] };
> 
> You put spaces after only some casts, btw?  Well maybe I found the one
> place you did it wrong, heh :-)  And you can avoid having so many parens
> by making extra variables -- much more readable.

I'll fix this.

> > +  switch (__rounding)
> 
> You do not need any of that __ either.

I'm surprised that I don't. A .h file needs to be concerned about the
namespace it inherits, no?

> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> 
> "dg-do run" requires vsx_hw, not just vsx_ok.  Testing on a machine
> without VSX (so before p7) would have shown that, but do you have access
> to any?  This is one of those things we are only told about a year after
> it was added, because no one who tests often does that on so old
> hardware :-)
> 
> So, okay for trunk (and backports after some burn-in) with that vsx_ok
> fixed.  That asm needs fixing, but you can do that later.

OK.

Thanks!

PC


Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-10-07 Thread Paul A. Clarke via Gcc-patches
On Thu, Oct 07, 2021 at 05:25:54PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke wrote:
> > v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
> 
> There should not be a "v3" in the commit message.  The easy way to
> achieve this is put it inside the [] in the subject (as you did), and to
> mention the version history after a --- (see --notes for git-format-patch
> for example).

This is just a cover letter. Does it matter in that context?
(I have done as described in the patches which followed.)

> > Tested ppc64le (POWER9) and ppc64/32 (POWER7).
> 
> Please write the full triples -- well at least enough that they are
> usable, like, powerpc64-linux.  I'll assume you tested on Linux :-)

Yes, sorry.  All are "-linux", and I'll try to remember that for next time.

PC


Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-10-04 Thread Paul A. Clarke via Gcc-patches
Ping.

On Thu, Sep 16, 2021 at 09:59:39AM -0500, Paul A. Clarke via Gcc-patches wrote:
> Ping.
> 
> On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
> > and users will expect to be able to include "nmmintrin.h",
> > even though "nmmintrin.h" just includes "smmintrin.h"
> > where all of the SSE4.2 implementations actually appear.
> > 
> > Only patch 5/6 changed from v2.
> > 
> > Tested ppc64le (POWER9) and ppc64/32 (POWER7).
> > 
> > OK for trunk?
> > 
> > Paul A. Clarke (6):
> >   rs6000: Support SSE4.1 "round" intrinsics
> >   rs6000: Support SSE4.1 "min" and "max" intrinsics
> >   rs6000: Simplify some SSE4.1 "test" intrinsics
> >   rs6000: Support SSE4.1 "cvt" intrinsics
> >   rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics
> >   rs6000: Guard some x86 intrinsics implementations
> > 

Re: [PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-09-16 Thread Paul A. Clarke via Gcc-patches
Ping.

On Mon, Aug 23, 2021 at 02:03:04PM -0500, Paul A. Clarke via Gcc-patches wrote:
> v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
> and users will expect to be able to include "nmmintrin.h",
> even though "nmmintrin.h" just includes "smmintrin.h"
> where all of the SSE4.2 implementations actually appear.
> 
> Only patch 5/6 changed from v2.
> 
> Tested ppc64le (POWER9) and ppc64/32 (POWER7).
> 
> OK for trunk?
> 
> Paul A. Clarke (6):
>   rs6000: Support SSE4.1 "round" intrinsics
>   rs6000: Support SSE4.1 "min" and "max" intrinsics
>   rs6000: Simplify some SSE4.1 "test" intrinsics
>   rs6000: Support SSE4.1 "cvt" intrinsics
>   rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics
>   rs6000: Guard some x86 intrinsics implementations
> 
>  gcc/config/rs6000/emmintrin.h |  12 +-
>  gcc/config/rs6000/nmmintrin.h |  40 ++
>  gcc/config/rs6000/pmmintrin.h |   4 +
>  gcc/config/rs6000/smmintrin.h | 427 --
>  gcc/config/rs6000/tmmintrin.h |  12 +
>  gcc/testsuite/gcc.target/powerpc/pr78102.c|  23 +
>  .../gcc.target/powerpc/sse4_1-packusdw.c  |  73 +++
>  .../gcc.target/powerpc/sse4_1-pcmpeqq.c   |  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxsb.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxsd.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pmaxud.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pmaxuw.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pminsb.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pminsd.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-pminud.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pminuw.c|  47 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
>  .../gcc.target/powerpc/sse4_1-pmuldq.c|  51 +++
>  .../gcc.target/powerpc/sse4_1-pmulld.c|  46 ++
>  .../gcc.target/powerpc/sse4_1-round3.h|  81 
>  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
>  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 
>  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
>  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
>  .../gcc.target/powerpc/sse4_2-check.h |  18 +
>  .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |  46 ++
>  37 files changed, 2407 insertions(+), 59 deletions(-)
>  create mode 100644 gcc/config/rs6000/nmmintrin.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-30 Thread Paul A. Clarke via Gcc-patches
On Fri, Aug 27, 2021 at 08:44:43AM -0500, Bill Schmidt via Gcc-patches wrote:
> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> > +   __fpscr_save.__fr = __builtin_mffsl ();
> 
> As pointed out in the v1 review, __builtin_mffsl is enabled (or supposed to
> be) only for POWER9 and later.  This will fail to work on POWER8 and earlier
> when the new builtins support is complete and this is enforced more
> carefully.  Please #ifdef and use __builtin_mffs on earlier processors. 
> Please do this everywhere this occurs.
> 
> I think you got some contradictory guidance on this, but trust me, this will
> break.

The confusing thing is that __builtin_mffsl is explicitly supported on earlier
processors, if I read the code right (from gcc/config/rs6000/rs6000.md):
--
(define_expand "rs6000_mffsl"
  [(set (match_operand:DF 0 "gpc_reg_operand")
(unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSL))]
  "TARGET_HARD_FLOAT"
{
  /* If the low latency mffsl instruction (ISA 3.0) is available use it,
 otherwise fall back to the older mffs instruction to emulate the mffsl
 instruction.  */
  
  if (!TARGET_P9_MISC)
{
  rtx tmp1 = gen_reg_rtx (DFmode);

  /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl 
 instruction using the mffs instruction and masking the result.  */
  emit_insn (gen_rs6000_mffs (tmp1));
...
--

Is that going away?  If so, that would be a possible (undesirable?)
API change, no?

PC


Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-27 Thread Paul A. Clarke via Gcc-patches
On Fri, Aug 27, 2021 at 10:21:35AM -0500, Bill Schmidt via Gcc-patches wrote:
> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> > Function signatures and decorations match gcc/config/i386/smmintrin.h.

> > gcc

> > * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

> > ---
> > v3:
> > - Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
> >ostensibly defined in nmmintrin.h. Following the i386 implementation,
> >however, nmmintrin.h only includes smmintrin.h, and the actual
> >implementations appear there.

> > v2:
> > - Added "extern" to functions to maintain compatible decorations with
> >like implementations in gcc/config/i386.

> > diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
> > new file mode 100644
> > index ..20a70bee3776
> > --- /dev/null
> > +++ b/gcc/config/rs6000/nmmintrin.h
> > @@ -0,0 +1,40 @@
> > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify
> > +   it under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef NO_WARN_X86_INTRINSICS
> > +/* This header is distributed to simplify porting x86_64 code that
> > +   makes explicit use of Intel intrinsics to powerpc64le.
> > +   It is the user's responsibility to determine if the results are
> > +   acceptable and make additional changes as necessary.
> > +   Note that much code that uses Intel intrinsics can be rewritten in
> > +   standard C or GNU C extensions, which are more portable and better
> > +   optimized across multiple targets.  */
> > +#endif
> > +
> > +#ifndef _NMMINTRIN_H_INCLUDED
> > +#define _NMMINTRIN_H_INCLUDED
> > +
> > +/* We just include SSE4.1 header file.  */
> > +#include <smmintrin.h>
> > +
> > +#endif /* _NMMINTRIN_H_INCLUDED */
> 
> Should there be something in here indicating that nmmintrin.h is for SSE
> 4.2?  Otherwise it's a bit of a head-scratcher to a new person wondering why
> this file exists.  No big deal either way.

For good or bad, I have been trying to minimize differences with the
analogous i386 files.  With the exception of the copyright and our annoying
little warning, the only difference was this comment:

--
/* Implemented from the specification included in the Intel C++ Compiler
   User Guide and Reference, version 10.0.  */
--

I didn't find that (1) accurate, since there are no implementations therein,
or (2) particularly informative, as I imagine that document has a much
bigger scope than SSE4.2.  And keeping it would be a bit misleading, I think.
So, I intentionally removed the comment.

> This looks fine to me with or without that.  Recommend approval.

Thanks for the review!

PC


[PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible.

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
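
A small sketch of what the swap means in practice.  The PPC_* values are
the ones defined in this patch; the X86_* values are the encodings used
by the x86 headers.  The enum names and x86_to_ppc_rn are hypothetical
and exist only for this illustration; the patch itself performs no such
translation.

```c
#include <assert.h>

/* Encodings of the _MM_FROUND_TO_* macros on x86.  */
enum { X86_NEAREST = 0x00, X86_NEG_INF = 0x01,
       X86_POS_INF = 0x02, X86_ZERO = 0x03 };

/* Encodings used by this patch, equal to the Power ISA RN values.  */
enum { PPC_NEAREST = 0x00, PPC_ZERO = 0x01,
       PPC_POS_INF = 0x02, PPC_NEG_INF = 0x03 };

/* Translate a hard-coded x86 rounding value to the Power encoding.  */
static int
x86_to_ppc_rn (int x86_mode)
{
  static const int map[4]
    = { PPC_NEAREST, PPC_NEG_INF, PPC_POS_INF, PPC_ZERO };
  return map[x86_mode & 0x3];
}
```

Code that uses the symbolic names is unaffected by the swap.  Only code
that passes numeric literals would see a difference: a literal 1 means
"toward negative infinity" on x86 but "toward zero" here.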

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
v3: No change.
v2:
- Replaced clever (and broken) exception masking with more straightforward
  implementation, per v1 review and closer inspection. mtfsf was only
  writing the final nybble (1) instead of the final two nybbles (2), so
  not all of the exception-enable bits were cleared.
- Renamed some variables from cryptic "tmp" and "save" to
  "fpscr_save" and "enables_save".
- Retained use of __builtin_mffsl, since that is supported pre-POWER8
  (with an alternate instruction sequence).
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Added some additional text to the commit message about some of the
  (unpleasant?) implementations and decorations coming from
  like implementations in gcc/config/i386, per v1 review.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Fixed indentation and other minor formatting changes, per v1 review.
- Noted testing in patch series cover letter.
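
To spell out the nybble problem from the first v2 item, here is a sketch
in plain integer arithmetic on the low byte of the FPSCR (exception
enables VE OE UE ZE XE in 0xf8, NI in 0x04, RN in 0x03).  The helper
names are hypothetical and this models only the bit manipulation, not
the mtfsf instruction itself.

```c
#include <assert.h>

#define FPSCR_ENABLES 0xf8u   /* VE OE UE ZE XE */
#define FPSCR_NI      0x04u   /* non-IEEE mode */
#define FPSCR_RN      0x03u   /* rounding mode */

/* v1 behavior: only the final nybble is rewritten, so only XE (0x08)
   is cleared and VE OE UE ZE (0xf0) keep their old values.  */
static unsigned
write_one_nybble (unsigned old)
{
  unsigned desired = old & ~FPSCR_ENABLES;
  return (old & ~0x0fu) | (desired & 0x0fu);
}

/* v2 behavior: the final two nybbles are rewritten, so all five
   enable bits are cleared while NI and RN are preserved.  */
static unsigned
write_two_nybbles (unsigned old)
{
  unsigned desired = old & ~FPSCR_ENABLES;
  return (old & ~0xffu) | (desired & 0xffu);
}
```

Starting from 0xfb (all enables set, RN = 3), the one-nybble write
leaves 0xf3, with four enables still set, while the two-nybble write
leaves 0x03.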

 gcc/config/rs6000/smmintrin.h | 240 +++-
 .../gcc.target/powerpc/sse4_1-round3.h|  81 ++
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 +++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 ++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 ++
 6 files changed, 962 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..a6b88d313ad0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,182 @@
 #include 
 #include 
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO  0x01
+#define _MM_FROUND_TO_POS_INF   0x02
+#define _MM_FROUND_TO_NEG_INF   0x03
+#define _MM_FROUND_CUR_DIRECTION    0x04
+
+#define _MM_FROUND_NINT \
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL \
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT \
+  

[PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32
- _mm_cmpgt_epi64 (SSE4.2)

from gcc/testsuite/gcc.target/i386.

2021-08-23  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
* config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
* gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
tweak to suit.
---
v3:
- Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
  ostensibly defined in nmmintrin.h. Following the i386 implementation,
  however, nmmintrin.h only includes smmintrin.h, and the actual
  implementations appear there.
- Add sse4_2-check.h, required by sse4_2-pcmpgtq.c. My testing was
  obviously inadequate.
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/nmmintrin.h | 40 ++
 gcc/config/rs6000/smmintrin.h | 41 +++
 gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
 .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
 .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
 .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
 .../gcc.target/powerpc/sse4_2-check.h | 18 +
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
 9 files changed, 384 insertions(+)
 create mode 100644 gcc/config/rs6000/nmmintrin.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
new file mode 100644
index ..20a70bee3776
--- /dev/null
+++ b/gcc/config/rs6000/nmmintrin.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#endif
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+/* We just include SSE4.1 header file.  */
+#include <smmintrin.h>
+
+#endif /* _NMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fdef6674d16c..c04d2bb5b6d3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -386,6 +386,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
+{
+  return 

[PATCH v3 4/6] rs6000: Support SSE4.1 "cvt" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
- _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
- _mm_cvtepi32_epi64
- _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
- _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
- _mm_cvtepu32_epi64

from gcc/testsuite/gcc.target/i386.

sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
modified from using "char" types to "signed char" types, because
the default is unsigned on powerpc.
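
A short sketch of why the type change matters for sign-extension tests.
The function names are hypothetical and this is not taken from the
tests themselves.

```c
#include <assert.h>
#include <limits.h>

/* Widening plain "char": the result depends on whether the target
   treats char as signed (x86) or unsigned (powerpc).  */
static int
widen_plain (char c)
{
  return c;
}

/* Widening "signed char": sign-extends on every target.  */
static int
widen_signed (signed char c)
{
  return c;
}

/* Nonzero when plain char is unsigned on this target.  */
static int
plain_char_is_unsigned (void)
{
  return CHAR_MIN == 0;
}
```

widen_signed ((signed char) -1) is -1 everywhere, whereas
widen_plain ((char) 0xff) yields 255 where char is unsigned, so a test
that expects sign extension needs the explicit "signed char".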

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
_mm_cvtepu32_epi64): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
---
v3: No change.
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 138 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 13 files changed, 648 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 363534cb06a2..fdef6674d16c 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -442,6 +442,144 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
   return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi16 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v16qi)__A);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi32 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi32 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+

[PATCH v3 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32,
_mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32
from gcc/testsuite/gcc.target/i386.

sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using
"char" types to "signed char" types, because the default is unsigned on
powerpc.
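
For reference, the lane-wise behavior that the signed byte tests check can be sketched in scalar C as below. The helpers min_s8 and max_s8 are hypothetical stand-ins for what _mm_min_epi8 and _mm_max_epi8 compute over 16 lanes; they are not the implementation in the patch, which uses vec_min and vec_max.

```c
#include <assert.h>
#include <stddef.h>

/* Lane-wise signed byte minimum over 16 lanes (hypothetical helper).  */
static void
min_s8 (const signed char *a, const signed char *b, signed char *r)
{
  assert (a != NULL && b != NULL && r != NULL);
  for (int i = 0; i < 16; i++)
    r[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Lane-wise signed byte maximum over 16 lanes (hypothetical helper).  */
static void
max_s8 (const signed char *a, const signed char *b, signed char *r)
{
  assert (a != NULL && b != NULL && r != NULL);
  for (int i = 0; i < 16; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}
```

The comparisons are only signed because the element type is "signed char"; with an unsigned "char" the same loops would order negative bytes above positive ones.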

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
_mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
_mm_max_epi32, _mm_max_epu32): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
* gcc.target/powerpc/sse4_1-pmaxud.c: Same.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
* gcc.target/powerpc/sse4_1-pminsb.c: Same.
* gcc.target/powerpc/sse4_1-pminsd.c: Same.
* gcc.target/powerpc/sse4_1-pminud.c: Same.
* gcc.target/powerpc/sse4_1-pminuw.c: Same.
---
v3: No change.
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 56 +++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c | 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c | 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxud.c | 47
 .../gcc.target/powerpc/sse4_1-pmaxuw.c | 47
 .../gcc.target/powerpc/sse4_1-pminsb.c | 46 +++
 .../gcc.target/powerpc/sse4_1-pminsd.c | 46 +++
 .../gcc.target/powerpc/sse4_1-pminud.c | 47
 .../gcc.target/powerpc/sse4_1-pminuw.c | 47
 9 files changed, 428 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index a6b88d313ad0..505fe4ce22a8 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -408,6 +408,62 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
    and bits [18:16] respectively.  */
 __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
new file mode 100644
index 000000000000..7a465b01dd11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST 

[PATCH v3 0/6] rs6000: Support more SSE4 intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
and users will expect to be able to include "nmmintrin.h",
even though "nmmintrin.h" just includes "smmintrin.h"
where all of the SSE4.2 implementations actually appear.

Only patch 5/6 changed from v2.

Tested ppc64le (POWER9) and ppc64/32 (POWER7).

OK for trunk?

Paul A. Clarke (6):
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Support SSE4.1 "min" and "max" intrinsics
  rs6000: Simplify some SSE4.1 "test" intrinsics
  rs6000: Support SSE4.1 "cvt" intrinsics
  rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config/rs6000/emmintrin.h |  12 +-
 gcc/config/rs6000/nmmintrin.h |  40 ++
 gcc/config/rs6000/pmmintrin.h |   4 +
 gcc/config/rs6000/smmintrin.h | 427 --
 gcc/config/rs6000/tmmintrin.h |  12 +
 gcc/testsuite/gcc.target/powerpc/pr78102.c |  23 +
 .../gcc.target/powerpc/sse4_1-packusdw.c  |  73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   |  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c    |  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c    |  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxud.c    |  47 ++
 .../gcc.target/powerpc/sse4_1-pmaxuw.c    |  47 ++
 .../gcc.target/powerpc/sse4_1-pminsb.c    |  46 ++
 .../gcc.target/powerpc/sse4_1-pminsd.c    |  46 ++
 .../gcc.target/powerpc/sse4_1-pminud.c    |  47 ++
 .../gcc.target/powerpc/sse4_1-pminuw.c    |  47 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmuldq.c    |  51 +++
 .../gcc.target/powerpc/sse4_1-pmulld.c    |  46 ++
 .../gcc.target/powerpc/sse4_1-round3.h    |  81
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
 .../gcc.target/powerpc/sse4_2-check.h |  18 +
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |  46 ++
 37 files changed, 2407 insertions(+), 59 deletions(-)
 create mode 100644 gcc/config/rs6000/nmmintrin.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
 create mode 100644 

[PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v16qi) uses vsububm.
- _mm_sign_epi16: vec_neg(v8hi) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.
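
To make the _mm_cmpord_pd item above concrete: the version that is kept relies on the fact that a NaN never compares equal to itself, which needs no POWER8-only instruction. Below is a scalar sketch of one lane of that logic. The names cmpord_lane and cmpord_selfcheck are hypothetical and only model the vec_cmpeq/vec_and sequence shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of an "ordered" compare: all ones when neither input is a
   NaN, all zeros otherwise.  x == x is false exactly when x is a NaN.
   Hypothetical helper, for illustration only.  */
static uint64_t
cmpord_lane (double a, double b)
{
  uint64_t c = (a == a) ? UINT64_MAX : 0;
  uint64_t d = (b == b) ? UINT64_MAX : 0;
  /* A != NAN and B != NAN.  */
  return c & d;
}

/* Small self-check of the expected lane values.  */
static void
cmpord_selfcheck (void)
{
  volatile double zero = 0.0;
  double nan = zero / zero;   /* quiet NaN produced at run time */

  assert (cmpord_lane (1.0, 2.0) == UINT64_MAX);
  assert (cmpord_lane (nan, 2.0) == 0);
  assert (cmpord_lane (1.0, nan) == 0);
}
```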

2021-08-20  Paul A. Clarke  

gcc
PR target/101893
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
v3: No change.
v2:
- Ensured that new "#ifdef _ARCH_PWR8" bracket each function so
  impacted, rather than groups of functions, per v1 review.
- Noted testing in patch series cover letter.
- Added PR number to commit message.

 gcc/config/rs6000/emmintrin.h | 12 ++--
 gcc/config/rs6000/pmmintrin.h |  4 
 gcc/config/rs6000/smmintrin.h |  4 
 gcc/config/rs6000/tmmintrin.h | 12 
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0000000000000, 0x7ff0000000000000};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
   return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index c04d2bb5b6d3..29719367e205 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -343,6 +345,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 

[PATCH v3 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Copy some simple redirections from i386 <smmintrin.h>, for:
- _mm_test_all_zeros
- _mm_test_all_ones
- _mm_test_mix_ones_zeros

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_test_all_zeros,
_mm_test_all_ones, _mm_test_mix_ones_zeros): Replace.
---
v3: No change.
v2:
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 505fe4ce22a8..363534cb06a2 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -379,34 +379,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
   return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
 }
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
-}
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_ones (__m128i __A)
-{
-  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
-  return vec_all_eq ((__v16qu) __A, __ones);
-}
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
-  const int any_ones = vec_any_ne (__Amasked, __zero);
-  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
-  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
-  const int any_zeros = vec_any_ne (__notAmasked, __zero);
-  return any_ones * any_zeros;
-}
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-- 
2.27.0



[PATCH v2 6/6] rs6000: Guard some x86 intrinsics implementations

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v16qi) uses vsububm.
- _mm_sign_epi16: vec_neg(v8hi) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.

2021-08-20  Paul A. Clarke  

gcc
PR target/101893
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
v2:
- Ensured that new "#ifdef _ARCH_PWR8" bracket each function so
  impacted, rather than groups of functions, per v1 review.
- Noted testing in patch series cover letter.
- Added PR number to commit message.

 gcc/config/rs6000/emmintrin.h | 12 ++--
 gcc/config/rs6000/pmmintrin.h |  4 
 gcc/config/rs6000/smmintrin.h |  4 
 gcc/config/rs6000/tmmintrin.h | 12 
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0000000000000, 0x7ff0000000000000};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
   return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index c04d2bb5b6d3..29719367e205 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -343,6 +345,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 __inline __m128d
 

[PATCH v2 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible.

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
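
As a rough picture of what the four explicit rounding directions select, here is a scalar sketch keyed on the encodings this patch defines for powerpc (0x00 through 0x03). The MODEL_* names and round_model are hypothetical; the sketch ignores the FPSCR, exceptions, signed zeros, and values outside the range of "long long", all of which the real implementation has to deal with.

```c
#include <assert.h>

/* Encodings matching the _MM_FROUND_TO_* values defined in this patch.
   The MODEL_* names are hypothetical.  */
enum
{
  MODEL_TO_NEAREST_INT = 0x00,
  MODEL_TO_ZERO = 0x01,
  MODEL_TO_POS_INF = 0x02,
  MODEL_TO_NEG_INF = 0x03
};

/* Round x to an integral value in the given direction.  */
static double
round_model (double x, int mode)
{
  /* Keep the sketch within the range where the cast below is exact.  */
  assert (x > -1e15 && x < 1e15);

  double t = (double) (long long) x;     /* toward zero */
  double lo = (t > x) ? t - 1.0 : t;     /* toward negative infinity */
  double hi = (t < x) ? t + 1.0 : t;     /* toward positive infinity */

  switch (mode)
    {
    case MODEL_TO_ZERO:
      return t;
    case MODEL_TO_POS_INF:
      return hi;
    case MODEL_TO_NEG_INF:
      return lo;
    default:
      {
        /* MODEL_TO_NEAREST_INT: ties go to the even neighbor.  */
        double d = x - lo;
        if (d < 0.5)
          return lo;
        if (d > 0.5)
          return hi;
        return ((long long) lo % 2 == 0) ? lo : hi;
      }
    }
}
```

Because the numeric values differ between targets, portable code should always pass the _MM_FROUND_* macro names to _mm_round_* rather than literal constants.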

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries of the floating-point
representation, both positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.

Exceptions are not explicitly tested.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
v2:
- Replaced clever (and broken) exception masking with more straightforward
  implementation, per v1 review and closer inspection. mtfsf was only
  writing the final nybble (1) instead of the final two nybbles (2), so
  not all of the exception-enable bits were cleared.
- Renamed some variables from cryptic "tmp" and "save" to
  "fpscr_save" and "enables_save".
- Retained use of __builtin_mffsl, since that is supported pre-POWER8
  (with an alternate instruction sequence).
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Added some additional text to the commit message about some of the
  (unpleasant?) implementations and decorations coming from
  like implementations in gcc/config/i386, per v1 review.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Fixed indentation and other minor formatting changes, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 240 +++-
 .../gcc.target/powerpc/sse4_1-round3.h    |  81 ++
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 +++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 ++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 ++
 6 files changed, 962 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..a6b88d313ad0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,182 @@
 #include 
 #include 
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO          0x01
+#define _MM_FROUND_TO_POS_INF       0x02
+#define _MM_FROUND_TO_NEG_INF       0x03
+#define _MM_FROUND_CUR_DIRECTION    0x04
+
+#define _MM_FROUND_NINT \
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL \
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT \
+  

[PATCH v2 4/6] rs6000: Support SSE4.1 "cvt" intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
- _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
- _mm_cvtepi32_epi64
- _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
- _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
- _mm_cvtepu32_epi64

from gcc/testsuite/gcc.target/i386.

sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
modified from using "char" types to "signed char" types, because
the default is unsigned on powerpc.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
_mm_cvtepu32_epi64): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
---
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 138 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 13 files changed, 648 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 363534cb06a2..fdef6674d16c 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -442,6 +442,144 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
   return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi16 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v16qi)__A);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi32 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi32 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+#ifdef 

[PATCH v2 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32,
_mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32
from gcc/testsuite/gcc.target/i386.
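
As a rough scalar reference for what the signed and unsigned variants
compute, the sketch below models two of the intrinsics lane by lane in
portable C. The function names are hypothetical and the arrays stand in for
the vector registers; this is not the vec_min/vec_max implementation from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise signed byte minimum over 16 lanes (models _mm_min_epi8).  */
void
min_epi8_model (const int8_t a[16], const int8_t b[16], int8_t r[16])
{
  for (int i = 0; i < 16; i++)
    r[i] = a[i] < b[i] ? a[i] : b[i];
}

/* Lane-wise unsigned 16-bit maximum over 8 lanes (models _mm_max_epu16).  */
void
max_epu16_model (const uint16_t a[8], const uint16_t b[8], uint16_t r[8])
{
  for (int i = 0; i < 8; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}

void
example_min_max (void)
{
  int8_t sa[16] = { -1 }, sb[16] = { 1 }, sr[16];
  uint16_t ua[8] = { 0xffff }, ub[8] = { 1 }, ur[8];

  min_epi8_model (sa, sb, sr);
  max_epu16_model (ua, ub, ur);

  /* Signed: -1 is smaller than 1.  Unsigned: 0xffff is larger than 1.  */
  assert (sr[0] == -1);
  assert (ur[0] == 0xffff);
}
```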

sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using
"char" types to "signed char" types, because the default is unsigned on
powerpc.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
_mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
_mm_max_epi32, _mm_max_epu32): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
* gcc.target/powerpc/sse4_1-pmaxud.c: Same.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
* gcc.target/powerpc/sse4_1-pminsb.c: Same.
* gcc.target/powerpc/sse4_1-pminsd.c: Same.
* gcc.target/powerpc/sse4_1-pminud.c: Same.
* gcc.target/powerpc/sse4_1-pminuw.c: Same.
---
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 56 +++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxud.c| 47 
 .../gcc.target/powerpc/sse4_1-pmaxuw.c| 47 
 .../gcc.target/powerpc/sse4_1-pminsb.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pminsd.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pminud.c| 47 
 .../gcc.target/powerpc/sse4_1-pminuw.c| 47 
 9 files changed, 428 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index a6b88d313ad0..505fe4ce22a8 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -408,6 +408,62 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
and bits [18:16] respectively.  */
 __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
new file mode 100644
index ..7a465b01dd11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif

[PATCH v2 0/6] rs6000: Support more SSE4.1 intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Tested ppc64le (POWER9) and ppc64/32 (POWER7).

OK for trunk?

Paul A. Clarke (6):
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Support SSE4.1 "min" and "max" intrinsics
  rs6000: Simplify some SSE4.1 "test" intrinsics
  rs6000: Support SSE4.1 "cvt" intrinsics
  rs6000: Support more SSE4.1 "cmp", "mul", "pack" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config/rs6000/emmintrin.h |  12 +-
 gcc/config/rs6000/pmmintrin.h |   4 +
 gcc/config/rs6000/smmintrin.h | 427 --
 gcc/config/rs6000/tmmintrin.h |  12 +
 gcc/testsuite/gcc.target/powerpc/pr78102.c|  23 +
 .../gcc.target/powerpc/sse4_1-packusdw.c  |  73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   |  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxud.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pmaxuw.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pminsb.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pminsd.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pminud.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pminuw.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmuldq.c|  51 +++
 .../gcc.target/powerpc/sse4_1-pmulld.c|  46 ++
 .../gcc.target/powerpc/sse4_1-round3.h|  81 
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |  46 ++
 35 files changed, 2349 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

-- 
2.27.0



[PATCH v2 5/6] rs6000: Support more SSE4.1 "cmp", "mul", "pack" intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64, _mm_cmpgt_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32

from gcc/testsuite/gcc.target/i386.
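
The two multiply intrinsics differ in which lanes they read and how wide the
result is. The following is a scalar sketch of the x86 semantics in portable
C, using x86 lane numbering; the function names are hypothetical and this is
not the vec_mul/vec_mule code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models _mm_mullo_epi32: keep the low 32 bits of each 32x32 product.  */
void
mullo_epi32_model (const uint32_t a[4], const uint32_t b[4], uint32_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i] * b[i];	/* Unsigned arithmetic wraps modulo 2^32.  */
}

/* Models _mm_mul_epi32: full signed 64-bit products of lanes 0 and 2.  */
void
mul_epi32_model (const int32_t a[4], const int32_t b[4], int64_t r[2])
{
  r[0] = (int64_t) a[0] * b[0];
  r[1] = (int64_t) a[2] * b[2];
}

void
example_multiplies (void)
{
  uint32_t ua[4] = { 0x10000, 2, 3, 4 }, ub[4] = { 0x10000, 5, 6, 7 };
  uint32_t ur[4];
  int32_t sa[4] = { -2, 9, 3, 9 }, sb[4] = { 3, 9, -4, 9 };
  int64_t sr[2];

  mullo_epi32_model (ua, ub, ur);
  mul_epi32_model (sa, sb, sr);

  assert (ur[0] == 0);		/* 2^32 truncated to 32 bits.  */
  assert (ur[1] == 10);
  assert (sr[0] == -6);		/* Lanes 1 and 3 are ignored.  */
  assert (sr[1] == -12);
}
```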

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
---
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 41 +++
 gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
 .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
 .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
 .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
 7 files changed, 326 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fdef6674d16c..c04d2bb5b6d3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -386,6 +386,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_cmpeq ((__v2di)__X, (__v2di)__Y);
+}
+#endif
+
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_min_epi8 (__m128i __X, __m128i __Y)
@@ -444,6 +453,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mullo_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mul ((__v4su)__X, (__v4su)__Y);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mule ((__v4si)__X, (__v4si)__Y);
+}
+#endif
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtepi8_epi16 (__m128i __A)
 {
   return (__m128i) vec_unpackh ((__v16qi)__A);
@@ -607,4 +632,20 @@ _mm_minpos_epu16 (__m128i __A)
   return __r.__m;
 }
 
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_packus_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_packsu ((__v4si)__X, (__v4si)__Y);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_cmpgt ((__v2di)__X, (__v2di)__Y);
+}
+#endif
+
 #endif
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c
new file mode 100644
index ..56a2d497bbff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#include 
+
+__m128i
+foo (const __m128i x, const __m128i y)
+{
+  return _mm_cmpeq_epi64 (x, y);
+}
+
+__v2di
+bar (const __v2di x, const __v2di y)
+{
+  return x == y;
+}
+
+__v2di
+baz (const __v2di x, const __v2di y)
+{
+  return x != y;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
new file mode 100644
index ..15b8ca418f54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+#define NUM 64
+
+static unsigned short
+int_to_ushort (int iVal)
+{
+  unsigned short sVal;
+
+  if (iVal < 0)
+

[PATCH v2 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics

2021-08-20 Thread Paul A. Clarke via Gcc-patches
Copy some simple redirections from i386, for:
- _mm_test_all_zeros
- _mm_test_all_ones
- _mm_test_mix_ones_zeros
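
The redirections can be checked against a small scalar model. The sketch
below is portable C with a hypothetical bits128 type standing in for
__m128i; the *_model names are made up for illustration and only mirror the
bitwise definitions of the testz/testc/testnzc predicates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } bits128;

/* 1 if (a AND b) is all zeros (models _mm_testz_si128).  */
int
testz_model (bits128 a, bits128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* 1 if ((NOT a) AND b) is all zeros (models _mm_testc_si128).  */
int
testc_model (bits128 a, bits128 b)
{
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* 1 if neither of the two predicates above holds.  */
int
testnzc_model (bits128 a, bits128 b)
{
  return testz_model (a, b) == 0 && testc_model (a, b) == 0;
}

/* The three redirections, written as functions.  */
int
test_all_zeros_model (bits128 m, bits128 v)
{
  return testz_model (m, v);
}

int
test_all_ones_model (bits128 v)
{
  /* Comparing a value with itself yields an all-ones mask.  */
  bits128 ones = { ~(uint64_t) 0, ~(uint64_t) 0 };
  return testc_model (v, ones);
}

int
test_mix_ones_zeros_model (bits128 m, bits128 v)
{
  return testnzc_model (m, v);
}

void
example_test_redirections (void)
{
  bits128 ones = { ~(uint64_t) 0, ~(uint64_t) 0 };
  bits128 one = { 1, 0 }, three = { 3, 0 };
  bits128 hi_nibble = { 0xf0, 0 }, lo_nibble = { 0x0f, 0 };

  assert (test_all_ones_model (ones) == 1);
  assert (test_all_ones_model (one) == 0);
  assert (test_all_zeros_model (hi_nibble, lo_nibble) == 1);
  assert (test_mix_ones_zeros_model (one, three) == 1);
  assert (test_mix_ones_zeros_model (three, one) == 0);
}
```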

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_test_all_zeros,
_mm_test_all_ones, _mm_test_mix_ones_zeros): Replace.
---
v2:
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/smmintrin.h | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 505fe4ce22a8..363534cb06a2 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -379,34 +379,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
   return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
 }
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
-}
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_ones (__m128i __A)
-{
-  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
-  return vec_all_eq ((__v16qu) __A, __ones);
-}
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
-  const int any_ones = vec_any_ne (__Amasked, __zero);
-  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
-  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
-  const int any_zeros = vec_any_ne (__notAmasked, __zero);
-  return any_ones * any_zeros;
-}
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-- 
2.27.0



Re: [PATCH 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-19 Thread Paul A. Clarke via Gcc-patches
On Wed, Aug 18, 2021 at 05:46:58PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 09, 2021 at 03:23:50PM -0500, Paul A. Clarke wrote:
> > Suppress exceptions (when specified), by saving, manipulating, and
> > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > rounding mode when required.
> > 
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> There are __builtin_set_fpscr_rn and friends, please use those, those
> are optimised for any platform.

I do.  (Unless I missed an opportunity somewhere?)

The "optimize" comment refers to, for example, not checking the current
rounding mode before setting and restoring it.
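
A toy model of that trade-off, in portable C, might look like the sketch
below. Everything here is hypothetical: struct ex_fp_state merely stands in
for the FPSCR rounding field and counts writes, and neither function is the
code from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a floating-point status register; "writes"
   counts how often the rounding field is written.  */
struct ex_fp_state
{
  unsigned rn;
  unsigned writes;
};

static void
ex_write_rn (struct ex_fp_state *s, unsigned rn)
{
  s->rn = rn;
  s->writes++;
}

/* Save, set, and restore unconditionally (no check of the old value).  */
void
ex_with_rn_unchecked (struct ex_fp_state *s, unsigned rn)
{
  unsigned saved = s->rn;
  ex_write_rn (s, rn);
  /* ... the rounding operation would run here ... */
  ex_write_rn (s, saved);
}

/* Alternative: skip both writes when the mode already matches.  */
void
ex_with_rn_checked (struct ex_fp_state *s, unsigned rn)
{
  unsigned saved = s->rn;
  int changed = saved != rn;
  if (changed)
    ex_write_rn (s, rn);
  /* ... the rounding operation would run here ... */
  if (changed)
    ex_write_rn (s, saved);
}

void
example_rn_save_restore (void)
{
  struct ex_fp_state a = { 0, 0 }, b = { 0, 0 };

  ex_with_rn_unchecked (&a, 0);
  ex_with_rn_checked (&b, 0);

  /* Both leave the mode untouched; only the write counts differ.  */
  assert (a.rn == 0 && a.writes == 2);
  assert (b.rn == 0 && b.writes == 0);
}
```

The checked variant trades two register writes for a read and a compare,
which is the kind of optimization the description says is not attempted.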

> > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > Convert from function to macro.
> 
> Please explain why you regress this (not in the changelog of course).

I'm not sure what "regress" means here?

I should've said that these are now identical implementations to those
found in config/i386/smmintrin.h.  I'll add that to the commit message
in v2.

> > +/* Rounding mode macros. */
> > +#define _MM_FROUND_TO_NEAREST_INT   0x00
> > +#define _MM_FROUND_TO_ZERO  0x01
> > +#define _MM_FROUND_TO_POS_INF   0x02
> > +#define _MM_FROUND_TO_NEG_INF   0x03
> > +#define _MM_FROUND_CUR_DIRECTION 0x04
> 
> You can just write "0" .. "4", heh.

Copied from config/i386/smmintrin.h.

> > +#define _MM_FROUND_NINT \
> > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_FLOOR \
> > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_CEIL \
> > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_TRUNC \
> > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_RINT \
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_NEARBYINT \
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> 
> All these macro definitions will comfortably fit on one line.

Copied from config/i386/smmintrin.h.
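
For reference, a minimal sketch of how these flags combine and how a
rounding argument can be split back into its parts. The EX_ names are
local to this example so it compiles anywhere. The values of
EX_FROUND_RAISE_EXC (0x00) and EX_FROUND_NO_EXC (0x08) are not shown in the
quoted hunk; they are assumed here to match the i386 header.

```c
#include <assert.h>

#define EX_FROUND_TO_NEAREST_INT 0x00
#define EX_FROUND_TO_ZERO        0x01
#define EX_FROUND_TO_POS_INF     0x02
#define EX_FROUND_TO_NEG_INF     0x03
#define EX_FROUND_CUR_DIRECTION  0x04

#define EX_FROUND_RAISE_EXC      0x00	/* Assumed value.  */
#define EX_FROUND_NO_EXC         0x08	/* Assumed value.  */

#define EX_FROUND_FLOOR \
  (EX_FROUND_TO_NEG_INF | EX_FROUND_RAISE_EXC)
#define EX_FROUND_NEARBYINT \
  (EX_FROUND_CUR_DIRECTION | EX_FROUND_NO_EXC)

/* Nonzero when the caller asked for exceptions to be suppressed.  */
int
ex_suppresses_exceptions (int rounding)
{
  return (rounding & EX_FROUND_NO_EXC) != 0;
}

/* The rounding-direction part, with the exception flag masked off.  */
int
ex_rounding_direction (int rounding)
{
  return rounding & ~EX_FROUND_NO_EXC;
}

void
example_fround_flags (void)
{
  assert (EX_FROUND_FLOOR == 0x03);
  assert (ex_suppresses_exceptions (EX_FROUND_FLOOR) == 0);

  assert (EX_FROUND_NEARBYINT == 0x0c);
  assert (ex_suppresses_exceptions (EX_FROUND_NEARBYINT) == 1);
  assert (ex_rounding_direction (EX_FROUND_NEARBYINT)
	  == EX_FROUND_CUR_DIRECTION);
}
```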

> > +__inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> 
> Non-static inline is not what you want, esp. with gnu-inline?  Or, what
> is the goal, and why can you not do it with modern inline?

This is the same basic signature as the other 600+ intrinsics.
Actually, they were all described as "extern", but in a previous
review, you said:
> "extern" on definitions is superfluous
So, I've dropped that for newer ones.
Should they all instead be "static"?

The goal is to be compatible with the i386 implementations.
Those typically use something like:

  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))

(which kinda makes me want to put "extern" back, now that I think
about it).

I'm not sure what you mean by "modern inline".
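
To make the decoration concrete, here is a tiny sketch that applies it to a
made-up function (ex_add_one and ex_caller are hypothetical and have nothing
to do with the intrinsics themselves). It assumes a GCC-compatible compiler,
since the attributes are GNU extensions.

```c
#include <assert.h>

/* Same decoration style as the i386 headers; the function itself is
   invented for illustration.  */
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
ex_add_one (int __x)
{
  return __x + 1;
}

int
ex_caller (int v)
{
  /* __always_inline__ forces the call to be expanded here, even without
     optimization; "extern" together with __gnu_inline__ means this
     definition is used only for inlining, so no out-of-line copy is
     emitted from the header.  */
  return ex_add_one (v);
}
```

Dropping "extern" changes the GNU inline semantics: the compiler may then
emit an externally visible definition in every translation unit that
includes the header.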

> > +  __v2df __r;
> > +  union {
> > +double __fr;
> > +long long __fpscr;
> > +  } __save, __tmp;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +  {
> 
> Wrong indent.  This code is very hard to read because of that.

OK, will fix in v2.

> If you figure that gee, it would be a nice if we had a builtin for
> mffsce, then please make one?  :-)

Is one use-case sufficient grounds?  I can give it a shot if so.

> > +case _MM_FROUND_TO_NEAREST_INT:
> > +  __tmp.__fr = __builtin_mffsl ();
> > +  __attribute__((fallthrough));
> 
> Space before (.

OK

> > +case _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC:
> 
> Space after |.

OK

> Please fix these things and resend.

Will do.  Thanks!

PC


[PATCH 5/6] rs6000: Support more SSE4.1 "cmp", "mul", "pack" intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Also, copy tests for:
- _mm_cmpeq_epi64, _mm_cmpgt_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32

from gcc/testsuite/gcc.target/i386.
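
For orientation, a scalar sketch of the pack operation in portable C. The
helper names are hypothetical; the model follows the x86 lane order (first
operand fills result lanes 0..3, second operand lanes 4..7) and is not the
vec_packsu code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a signed 32-bit value to the unsigned 16-bit range.  */
uint16_t
saturate_u16 (int32_t v)
{
  if (v < 0)
    return 0;
  if (v > 0xffff)
    return 0xffff;
  return (uint16_t) v;
}

/* Models _mm_packus_epi32 lane by lane.  */
void
packus_epi32_model (const int32_t x[4], const int32_t y[4], uint16_t r[8])
{
  for (int i = 0; i < 4; i++)
    {
      r[i] = saturate_u16 (x[i]);
      r[i + 4] = saturate_u16 (y[i]);
    }
}

void
example_packus (void)
{
  int32_t x[4] = { -5, 0, 70000, 65535 };
  int32_t y[4] = { 1, 65536, -1, 42 };
  uint16_t r[8];

  packus_epi32_model (x, y, r);

  /* Negative inputs clamp to 0, large inputs clamp to 0xffff.  */
  assert (r[0] == 0 && r[1] == 0 && r[2] == 0xffff && r[3] == 0xffff);
  assert (r[4] == 1 && r[5] == 0xffff && r[6] == 0 && r[7] == 42);
}
```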

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
---
 gcc/config/rs6000/smmintrin.h | 41 +++
 gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
 .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
 .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
 .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
 7 files changed, 326 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 7f6ff7baff50..8d6ae98c7ce3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -392,6 +392,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_cmpeq ((__v2di)__X, (__v2di)__Y);
+}
+#endif
+
 __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_min_epi8 (__m128i __X, __m128i __Y)
@@ -448,6 +457,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
   return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
 }
 
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mullo_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mul ((__v4su)__X, (__v4su)__Y);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_mule ((__v4si)__X, (__v4si)__Y);
+}
+#endif
+
 __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtepi8_epi16 (__m128i __A)
@@ -611,4 +636,20 @@ _mm_minpos_epu16 (__m128i __A)
   return __r.__m;
 }
 
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_packus_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_packsu ((__v4si)__X, (__v4si)__Y);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_cmpgt ((__v2di)__X, (__v2di)__Y);
+}
+#endif
+
 #endif
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c
new file mode 100644
index ..a9db140f7335
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mvsx -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#include 
+
+__m128i
+foo (const __m128i x, const __m128i y)
+{
+  return _mm_cmpeq_epi64 (x, y);
+}
+
+__v2di
+bar (const __v2di x, const __v2di y)
+{
+  return x == y;
+}
+
+__v2di
+baz (const __v2di x, const __v2di y)
+{
+  return x != y;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
new file mode 100644
index ..2438a755cbe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
@@ -0,0 +1,73 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mvsx -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+#define NUM 64
+
+static unsigned short
+int_to_ushort (int iVal)
+{
+  unsigned short sVal;
+
+  if (iVal < 0)
+    sVal = 0;
+  else if (iVal > 0xffff)
+    sVal = 0xffff;
+  else sVal = iVal;
+
+  return sVal;
+}
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x[NUM / 4];
+  int i[NUM];
+} src1, src2;
+  union
+{
+  __m128i x[NUM / 4];
+  unsigned short s[NUM * 2];
+} 

[PATCH 4/6] rs6000: Support SSE4.1 "cvt" intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Also, copy tests for:
- _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
- _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
- _mm_cvtepi32_epi64
- _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
- _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
- _mm_cvtepu32_epi64

from gcc/testsuite/gcc.target/i386.
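
As a scalar reference for the "epi" versus "epu" conversions, the sketch
below models the byte-to-doubleword pair in portable C, using x86 lane
numbering (only the low four source bytes are used). The function names are
hypothetical and this is not the vec_unpackh-based code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models _mm_cvtepi8_epi32: sign-extend source lanes 0..3.  */
void
cvtepi8_epi32_model (const int8_t a[16], int32_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i];
}

/* Models _mm_cvtepu8_epi32: zero-extend source lanes 0..3.  */
void
cvtepu8_epi32_model (const uint8_t a[16], uint32_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = a[i];
}

void
example_cvt (void)
{
  int8_t s[16] = { -1, 2, -128, 127 };
  uint8_t u[16] = { 0xff, 2, 0x80, 0x7f };
  int32_t sr[4];
  uint32_t ur[4];

  cvtepi8_epi32_model (s, sr);
  cvtepu8_epi32_model (u, ur);

  /* Same source bit patterns, different widened values.  */
  assert (sr[0] == -1 && sr[2] == -128);
  assert (ur[0] == 255 && ur[2] == 128);
  assert (sr[1] == 2 && ur[1] == 2);
}
```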

sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
modified from using "char" types to "signed char" types, because
the default is unsigned on powerpc.

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
_mm_cvtepu32_epi64): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
---
 gcc/config/rs6000/smmintrin.h | 136 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 13 files changed, 646 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 5d345e3fd56b..7f6ff7baff50 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -448,6 +448,142 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
   return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
 }
 
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi16 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v16qi)__A);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi32 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi32 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi32_epi64 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepu8_epi16 (__m128i __A)
+{
+  const __v16qu __zero = {0};
+#ifdef 

[PATCH 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Copy some simple redirections from i386, for:
- _mm_test_all_zeros
- _mm_test_all_ones
- _mm_test_mix_ones_zeros
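
As a rough illustration of why these redirections are equivalent, here is a portable scalar model. The `v128` type and the lower-case function names are hypothetical stand-ins, not the real intrinsics; the model only mirrors the bitwise definitions of the three "test" primitives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value, modelled as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } v128;

/* Model of _mm_testz_si128: 1 if (a & b) is all zeros.  */
static int testz (v128 a, v128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Model of _mm_testc_si128: 1 if (~a & b) is all zeros.  */
static int testc (v128 a, v128 b)
{
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* Model of _mm_testnzc_si128: both of the above tests fail.  */
static int testnzc (v128 a, v128 b)
{
  return testz (a, b) == 0 && testc (a, b) == 0;
}

/* The three redirections, written as functions instead of macros.  */
static int test_all_zeros (v128 m, v128 v)
{
  return testz (m, v);
}

static int test_all_ones (v128 v)
{
  /* Comparing a value with itself for equality yields all ones.  */
  const v128 ones = { UINT64_MAX, UINT64_MAX };
  return testc (v, ones);
}

static int test_mix_ones_zeros (v128 m, v128 v)
{
  return testnzc (m, v);
}
```

In this model, `test_all_ones` returns 1 only when every bit of its argument is set, because `~v & ones` is zero exactly in that case.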

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_test_all_zeros,
_mm_test_all_ones, _mm_test_mix_ones_zeros): Replace.
---
 gcc/config/rs6000/smmintrin.h | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index f7f03d8d7782..5d345e3fd56b 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -385,34 +385,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
   return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
 }
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
-}
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_ones (__m128i __A)
-{
-  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
-  return vec_all_eq ((__v16qu) __A, __ones);
-}
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
 
-__inline int
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
-  const int any_ones = vec_any_ne (__Amasked, __zero);
-  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
-  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
-  const int any_zeros = vec_any_ne (__notAmasked, __zero);
-  return any_ones * any_zeros;
-}
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
 __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-- 
2.27.0



[PATCH 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible.

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.
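
A scalar sketch of that "naive" construction follows. Everything here is a hypothetical model: `vec2d` stands in for `__m128d`, `floor_small` stands in for the real parallel rounding (and is only valid for values that fit in a `long long`), and the lane choice (element 0 from the rounded second operand, element 1 from the first operand) is assumed from the x86 `_mm_round_sd` convention.

```c
#include <assert.h>

/* Hypothetical two-element double vector.  */
typedef struct { double e[2]; } vec2d;

/* Stand-in rounding: floor for values representable as long long.  */
static double floor_small (double x)
{
  long long t = (long long) x;          /* truncates toward zero */
  return (double) t > x ? (double) (t - 1) : (double) t;
}

/* "Parallel" version: rounds every element.  */
static vec2d floor_pd (vec2d a)
{
  vec2d r = { { floor_small (a.e[0]), floor_small (a.e[1]) } };
  return r;
}

/* "Scalar" version: run the parallel version on b, keep only its
   element 0, and take the remainder of the result from a.  */
static vec2d floor_sd (vec2d a, vec2d b)
{
  vec2d rb = floor_pd (b);
  vec2d r = { { rb.e[0], a.e[1] } };
  return r;
}
```

The cost of this approach is that the unused element of `b` is rounded as well; the benefit is that the scalar and parallel paths share one implementation.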

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
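
To make the encoding concrete, the sketch below decodes a rounding argument into its parts. The `MODEL_*` names and `decode_rounding` are hypothetical; the numeric values are the ones used in this patch, where the low two bits follow the Power ISA rounding-mode field (0 nearest, 1 toward zero, 2 toward +infinity, 3 toward -infinity). The patch itself switches on the whole value rather than decoding it this way.

```c
#include <assert.h>

/* Values as defined in the patch (x86 has NEG_INF = 1 and ZERO = 3).  */
enum
{
  MODEL_TO_NEAREST_INT = 0x00,
  MODEL_TO_ZERO        = 0x01,
  MODEL_TO_POS_INF     = 0x02,
  MODEL_TO_NEG_INF     = 0x03,
  MODEL_CUR_DIRECTION  = 0x04,
  MODEL_RAISE_EXC      = 0x00,
  MODEL_NO_EXC         = 0x08
};

struct round_request
{
  int mode;                 /* explicit mode, meaningful if !use_current */
  int use_current;          /* use the mode already set in the FPSCR */
  int suppress_exceptions;  /* save, disable, and restore exceptions */
};

/* Split a rounding argument into mode and flags.  */
static struct round_request decode_rounding (int rounding)
{
  struct round_request r;
  r.suppress_exceptions = (rounding & MODEL_NO_EXC) != 0;
  r.use_current = (rounding & MODEL_CUR_DIRECTION) != 0;
  r.mode = rounding & 0x03;
  return r;
}
```

With these values, a "floor" request (`MODEL_TO_NEG_INF | MODEL_RAISE_EXC`) decodes to mode 3, which can be written to the rounding-mode field without translation.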

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries of floating-point
representation, both positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.

Exceptions are not explicitly tested.

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
 gcc/config/rs6000/smmintrin.h | 246 -
 .../gcc.target/powerpc/sse4_1-round3.h|  81 ++
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 +++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 ++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 ++
 6 files changed, 968 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..862e78ac7d60 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,188 @@
 #include 
 #include 
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO          0x01
+#define _MM_FROUND_TO_POS_INF       0x02
+#define _MM_FROUND_TO_NEG_INF       0x03
+#define _MM_FROUND_CUR_DIRECTION    0x04
+
+#define _MM_FROUND_NINT \
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL \
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+#define _MM_FROUND_RAISE_EXC        0x00
+#define _MM_FROUND_NO_EXC           0x08
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __A, int __rounding)
+{
+  __v2df __r;
+  union {
+    double __fr;
+    long long __fpscr;
+  } __save, __tmp;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+  {
+    /* Save enabled exceptions, and disable all exceptions.
+       Pre-POWER9, mffsce decodes to mffs, requiring the additional
+       mtfsf, below, to disable exceptions.  */
+    __asm__ __volatile__ (
+      ".machine push; .machine \"power9\"; mffsce %0; .machine pop"
+      : "=f" (__save.__fr));
+    __save.__fpscr &= 0xf8;
+    __tmp.__fpscr = __save.__fpscr;
+#ifndef _ARCH_PWR9
+    __tmp.__fpscr &= ~0xf8;
+    __builtin_mtfsf (0x01, __tmp.__fr);
+#endif
+  }
+
+  switch (__rounding)
+  {
+    case _MM_FROUND_TO_NEAREST_INT:
+      __tmp.__fr = __builtin_mffsl ();
+      __attribute__((fallthrough));
+case 

[PATCH 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32,
_mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32
from gcc/testsuite/gcc.target/i386.

sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using
"char" types to "signed char" types, because the default is unsigned on
powerpc.
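
The signedness issue can be shown with a small portable sketch. The helper names below are hypothetical; the point is only that the same bit pattern orders differently as `signed char` and `unsigned char`, so a test that relies on plain `char` being signed gives different expected values on a target where it is unsigned.

```c
#include <assert.h>
#include <limits.h>

/* Reports whether plain char is signed on this target.
   CHAR_MIN is 0 when plain char is unsigned.  */
static int plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Element-wise maximum with explicitly signed elements.  */
static signed char max_s8 (signed char a, signed char b)
{
  return a > b ? a : b;
}

/* The same operation with unsigned elements.  */
static unsigned char max_u8 (unsigned char a, unsigned char b)
{
  return a > b ? a : b;
}
```

Spelling the element type as `signed char` makes the expected results independent of what `plain_char_is_signed` returns.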

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
_mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
_mm_max_epi32, _mm_max_epu32): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
* gcc.target/powerpc/sse4_1-pmaxud.c: Same.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
* gcc.target/powerpc/sse4_1-pminsb.c: Same.
* gcc.target/powerpc/sse4_1-pminsd.c: Same.
* gcc.target/powerpc/sse4_1-pminud.c: Same.
* gcc.target/powerpc/sse4_1-pminuw.c: Same.
---
 gcc/config/rs6000/smmintrin.h | 56 +++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pmaxud.c| 47 
 .../gcc.target/powerpc/sse4_1-pmaxuw.c| 47 
 .../gcc.target/powerpc/sse4_1-pminsb.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pminsd.c| 46 +++
 .../gcc.target/powerpc/sse4_1-pminud.c| 47 
 .../gcc.target/powerpc/sse4_1-pminuw.c| 47 
 9 files changed, 428 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 862e78ac7d60..f7f03d8d7782 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -414,6 +414,62 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y);
+}
+
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
and bits [18:16] respectively.  */
 __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
new file mode 100644
index ..24a74da309b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+#define NUM 1024
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x[NUM / 16];
+  signed char i[NUM];
+} dst, src1, src2;
+  int i, sign = 1;
+  signed char max;
+
+  for (i = 0; i < NUM; i++)
+{
+  src1.i[i] = i * i * sign;
+  src2.i[i] = (i + 20) * sign;
+  sign = -sign;

[PATCH 0/6] rs6000: Support more SSE4.1 intrinsics

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Paul A. Clarke (6):
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Support SSE4.1 "min" and "max" intrinsics
  rs6000: Simplify some SSE4.1 "test" intrinsics
  rs6000: Support SSE4.1 "cvt" intrinsics
  rs6000: Support more SSE4.1 "cmp", "mul", "pack" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config/rs6000/emmintrin.h |  12 +-
 gcc/config/rs6000/pmmintrin.h |   2 +
 gcc/config/rs6000/smmintrin.h | 431 --
 gcc/config/rs6000/tmmintrin.h |   2 +
 gcc/testsuite/gcc.target/powerpc/pr78102.c|  23 +
 .../gcc.target/powerpc/sse4_1-packusdw.c  |  73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   |  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsb.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxsd.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pmaxud.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pmaxuw.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pminsb.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pminsd.c|  46 ++
 .../gcc.target/powerpc/sse4_1-pminud.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pminuw.c|  47 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
 .../gcc.target/powerpc/sse4_1-pmuldq.c|  51 +++
 .../gcc.target/powerpc/sse4_1-pmulld.c|  46 ++
 .../gcc.target/powerpc/sse4_1-round3.h|  81 
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |  46 ++
 35 files changed, 2341 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

-- 
2.27.0



[PATCH 6/6] rs6000: Guard some x86 intrinsics implementations

2021-08-09 Thread Paul A. Clarke via Gcc-patches
Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmulesw.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v16qi) uses vsububm.
- _mm_sign_epi16: vec_neg(v8hi) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.
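
For the _mm_cmpord_pd item above, the retained code path relies on the fact that a NaN never compares equal to itself. A per-lane scalar model of that idea is sketched below; `cmpord_lane` is a hypothetical name, and the all-ones/all-zeros mask mimics what a vector compare produces for one element.

```c
#include <assert.h>
#include <math.h>    /* only for the NAN macro */
#include <stdint.h>

/* A value is "ordered" (not a NaN) exactly when it equals itself.  */
static int is_ordered (double x)
{
  return x == x;
}

/* One lane of the result: all ones if neither input is a NaN.  */
static uint64_t cmpord_lane (double a, double b)
{
  uint64_t c = is_ordered (a) ? UINT64_MAX : 0;  /* a compared with a */
  uint64_t d = is_ordered (b) ? UINT64_MAX : 0;  /* b compared with b */
  return c & d;                                  /* A != NAN and B != NAN */
}
```

Since this needs only an equality compare on doubles and a bitwise AND, it avoids the 64-bit integer compare that the removed fallback depended on.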

2021-08-09  Paul A. Clarke  

gcc
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
 gcc/config/rs6000/emmintrin.h | 12 ++--
 gcc/config/rs6000/pmmintrin.h |  2 ++
 gcc/config/rs6000/smmintrin.h |  4 
 gcc/config/rs6000/tmmintrin.h |  2 ++
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0, 0x7ff0};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..d5da1a6daa34 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,6 +123,7 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
@@ -134,6 +135,7 @@ _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 8d6ae98c7ce3..d2ba5f11de2e 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -278,6 +278,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -289,6 +290,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -349,6 +351,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
@@ -357,6 +360,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
   const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
   return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
 }
+#endif
 
 __inline int
 __attribute__ 

Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16

2021-08-03 Thread Paul A. Clarke via Gcc-patches
On Mon, Aug 02, 2021 at 05:29:08PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 15, 2021 at 06:29:17PM -0500, Paul A. Clarke wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> 
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
> >return any_ones * any_zeros;
> >  }
> >  
> > +/* Return horizontal packed word minimum and its index in bits [15:0]
> > +   and bits [18:16] respectively.  */
> > +__inline __m128i
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_minpos_epu16 (__m128i __A)
> > +{
> > +  union __u
> > +{
> > +  __m128i __m;
> > +  __v8hu __uh;
> > +};
> > +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> > +  unsigned short __ridx = 0;
> > +  unsigned short __rmin = __u.__uh[__ridx];
> > +  for (unsigned long __i = 1; __i < 8; __i++)
> > +{
> > +  if (__u.__uh[__i] < __rmin)
> > +   {
> > + __rmin = __u.__uh[__i];
> > + __ridx = __i;
> > +   }
> > +}
> > +  __r.__uh[0] = __rmin;
> > +  __r.__uh[1] = __ridx;
> > +  return __r.__m;
> > +}
> 
> As before: does this work correctly on BE?  Was it tested there?

Per the "cover letter":
| Tested on BE, LE (32 and 64bit).

> Okay for trunk if so.  Thanks!

Thanks! I'll push this shortly.

PC


Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-30 Thread Paul A. Clarke via Gcc-patches
On Wed, Jul 28, 2021 at 05:16:32PM -0500, Segher Boessenkool wrote:
> On Fri, Jul 16, 2021 at 08:50:20AM -0500, Paul A. Clarke wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> > @@ -0,0 +1,27 @@
> > +#include 
> > +#include 
> > +#include "sse4_1-check.h"
> > +
> > +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> 
> Pet peeve: sizeof is an operator, not a function, so even if you want to
> protect the macro parameter this just is
>   #define DIM(a) (sizeof (a) / sizeof (a)[0])
> 
> > +  (void) fesetround (round_save);
> 
> Please don't cast to (void).  That never does *anything*.
> 
> Okay for trunk (these are all testsuite files after all, and we should
> test horrrible style as well! :-P )

I didn't want to be responsible for promulgating horrible style, so
I incorporated the above changes and pushed as
d656a3d3ce88d402a14e8c120f1b0e78a3979deb.  :-)
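
For reference, a small sketch showing that the two spellings of the macro discussed above count elements identically. The `DIM_PARENS` and `DIM_PLAIN` names and the helper functions are hypothetical, chosen only to put the two forms side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: extra parentheses around the sizeof operand.  */
#define DIM_PARENS(a) (sizeof (a) / sizeof ((a)[0]))

/* Suggested form: sizeof is an operator, so (a)[0] is already its operand.  */
#define DIM_PLAIN(a) (sizeof (a) / sizeof (a)[0])

static size_t count_parens (void)
{
  static const int data[5] = { 0 };
  return DIM_PARENS (data);
}

static size_t count_plain (void)
{
  static const int data[5] = { 0 };
  return DIM_PLAIN (data);
}
```

Both forms only work on true arrays; applied to a pointer they would divide the pointer size by the element size.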

PC


Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16

2021-07-29 Thread Paul A. Clarke via Gcc-patches
On Tue, Jul 27, 2021 at 10:29:13PM -0400, David Edelsohn via Gcc-patches wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> >
> > 2021-07-15  Paul A. Clarke  
> >
> > gcc
> > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
> 
> Segher already approved this with the changes requested.

Segher said:
| This does not compute the index correctly for big endian (it needs to
| walk from right to left for that).  The construction of the return value
| looks wrong as well.
| 
| Okay for trunk with that fixed.  Thanks!

I responded:
| I'm not seeing the issue here. The values are numbered by element order,
| and the results are in the "first" (minimum value) and "second" (index of
| first encountered minimum value in element order) elements of the result.

I did not get a response, nor did I change any code. It feels like a stretch
to equate the above exchange to "approved", so I'll continue to wait for
explicit approval.
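
To make the element-order argument concrete, here is a portable scalar model of the loop in the patch. The `v8hu_model` type and function name are hypothetical stand-ins for `__v8hu` and the intrinsic; elements are indexed in array order, which is the sense of "element order" used above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a vector of eight unsigned halfwords.  */
typedef struct { uint16_t h[8]; } v8hu_model;

/* Returns the minimum in element 0 and the index of the first
   encountered minimum in element 1; all other elements are zero.  */
static v8hu_model minpos_epu16_model (v8hu_model a)
{
  v8hu_model r = { { 0 } };
  unsigned idx = 0;
  uint16_t min = a.h[0];

  for (unsigned i = 1; i < 8; i++)
    if (a.h[i] < min)       /* strict compare keeps the first minimum */
      {
        min = a.h[i];
        idx = i;
      }

  r.h[0] = min;
  r.h[1] = (uint16_t) idx;
  return r;
}
```

Because both the scan and the result placement are expressed through element indices, the model itself has no dependence on byte order.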

PC


[PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
gcc/testsuite/gcc.target/i386.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 
 .../gcc.target/powerpc/sse4_1-floorps.c   |  41 ++
 .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 ++
 .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
 .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 ++
 5 files changed, 342 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index ..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.cp+50 } },
+  { { .f = {  0x1.ep+50,  0x1.0p+51 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.2p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.ep+51 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+52 } },
+   { -0x1.0p+52, -0x1.ep+52 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.4p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.2p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.0p+51, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index ..a53ef9aa9e8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.f8p+21,  0x1.fap+21,
+ 0x1.fcp+21,  0x1.fep+21 } },
+   {  0x1.f8p+21,  0x1.f8p+21,
+ 0x1.f8p+21,  0x1.f8p+21 } },
+
+  { { .f = {  0x1.fap+22,  0x1.fcp+22,
+ 0x1.fep+22,  0x1.fep+23 } },
+   {  0x1.f8p+22,  0x1.fcp+22,
+ 0x1.fcp+22,  0x1.fep+23 } },
+
+  { { .f = { -0x1.fep+23, -0x1.fep+22,
+-0x1.fcp+22, -0x1.fap+22 } },
+   { 
