Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-18 Thread will schmidt via Gcc-patches
On Mon, 2022-10-17 at 13:08 -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote:
> >   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> > and can be disabled by dependent options when it should not be.
> > This manifests in the issue seen in PR101865 where -mno-vsx
> > mistakenly disables _ARCH_PWR8.
> > This change replaces the relevant TARGET_DIRECT_MOVE references
> > with a TARGET_POWER8 entry so that the direct_move and power8
> > features can be enabled or disabled independently.
> 
> We should get rid of TARGET_DIRECT_MOVE altogether.  Please see
> 57f108f5a1e1:
> rs6000: Disable -m[no-]direct-move (PR85293)
> 
> The -mno-direct-move option causes a lot of problems, since it forces
> us to be able to generate code for p8 and up with some crucial
> instructions missing.  This patch removes the -m[no-]direct-move
> options so that the user cannot put us into this unexpected situation
> anymore.  Internally we still have all the same flags, and they are
> automatically set based on -mcpu; getting rid of that is a lot more
> work and will have to wait for GCC 9 (in some places the flag is used
> to see if we are compiling for a p8 _at all_).
> 
> It did not happen in GCC 9 obviously.  Do you want to take a shot?  It
> doesn't have to be all at once, it's probably best if not even -- as I
> wrote in the commit message, the flag always was used to mean different
> things.

As long as it's OK to remove it, I'll certainly take a shot at it.
With that in mind, that may simplify things for me here.  I expect that
anything currently guarded by DIRECT_MOVE should instead be guarded by
POWER8.


> 
> > The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> > in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> > if-defined logic there will now set a __DIRECT_MOVE__ define when
> > TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> > purposes, but is otherwise unused.  This can be removed in a
> > subsequent patch, or in an update of this patch, depending on
> > feedback.
> 
> There should be no such macro, for the same reason there should be no
> -mdirect-move option: it is so very essential to all code we generate,
> it *always* is enabled if we have P8 or later.

Fair enough.

> 
> > gcc/
> > PR Target/101865
> > * config/rs6000/rs6000-builtin.cc
> > (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
> > usage with TARGET_POWER8.
> 
> Please don't arbitrarily wrap lines.  It is harder to read, and it looks
> like something is missing.

> 
> > * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
> > Add OPTION_MASK_POWER8 entry.
> 
> Especially in cases like this, where it looks like you forgot to write
> something after the colon.
> 
> > @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
> >    { "block-ops-vector-pair",   OPTION_MASK_BLOCK_OPS_VECTOR_PAIR,
> >                                                         false, true  },
> >    { "cmpb",                    OPTION_MASK_CMPB,       false, true  },
> >    { "crypto",                  OPTION_MASK_CRYPTO,     false, true  },
> >    { "direct-move",             OPTION_MASK_DIRECT_MOVE, false, true  },
> > +  { "power8",                  OPTION_MASK_POWER8,     false, true  },
> 
> Why would we want a #pragma power8 ?

Hmm, thinko on my part, I'll reevaluate.


> 
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -490,10 +490,15 @@ mcrypto
> >  Target Mask(CRYPTO) Var(rs6000_isa_flags)
> >  Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
> >  
> >  mdirect-move
> >  Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags) WarnRemoved
> > +Enable direct move (ISA 2.07).
> 
> It is undocumented and should remain that, except eventually we should
> remove it completely (but leave some stubs so that code in the wild
> keeps compiling).
> 
> > +mpower8
> > +Target Mask(POWER8) Var(rs6000_isa_flags)
> > +Use instructions added in ISA 2.07 (power8).
> 
> There should not be such an option.  It is set by -mcpu=power8 and
> later, but can never be enabled or disabled directly by the user.

OK.


Thanks for the detailed review.  :-)
-Will


> 
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -3407,11 +3407,11 @@ (define_insn "vsx_extract_"
> >if (element == VECTOR_ELEMENT_SCALAR_64BIT)
> >  {
> >if (op0_regno == op1_regno)
> > return ASM_COMMENT_START " vec_extract to same register";
> >  
> > -  else if (INT_REGNO_P (op0_regno) && TARGET_DIRECT_MOVE
> > +  else if (INT_REGNO_P (op0_regno) && TARGET_POWER8
> >&& TARGET_POWERPC64)
> 
> That fits on one line now.
> 
> Thanks,
> 
> 
> Segher



Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread will schmidt via Gcc-patches
On Mon, 2022-10-17 at 10:32 -0500, Segher Boessenkool wrote:
> Hi!
> 
> Everything Ke Wen said.  Some more comments / hints:

Thanks for the reviews. :-)

I'll rework things and repost 'soon'.

Thanks
-Will



Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-13 Thread will schmidt via Gcc-patches


Ping.

On Mon, 2022-09-19 at 11:13 -0500, will schmidt wrote:
> [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]
> 
> Hi,
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.
> 
> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.
> 
> This is done via the OPTION_MASK definitions, so this
> means that some references to the OPTION_MASK_DIRECT_MOVE
> option are now replaced with OPTION_MASK_POWER8.
> 
> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> purposes, but is otherwise unused.  This can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.
> 
> This regtests cleanly (power8,power9,power10), and resolves
> PR 101865 as represented in the tests from (1/2).
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
>   Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
>   conditional with OPTION_MASK_POWER8.
>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.
>   (POWERPC_MASKS): Same.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
>   (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
>   * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
>   * config/rs6000/vsx.md (vsx_extract_): Replace
>   TARGET_DIRECT_MOVE usage with TARGET_POWER8.
>   (define_peephole2): Same.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 3ce729c1e6de..91a0f39bd796 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
>  case ENB_P7:
>return TARGET_POPCNTD;
>  case ENB_P7_64:
>return TARGET_POPCNTD && TARGET_POWERPC64;
>  case ENB_P8:
> -  return TARGET_DIRECT_MOVE;
> +  return TARGET_POWER8;
>  case ENB_P8V:
>return TARGET_P8_VECTOR;
>  case ENB_P9:
>return TARGET_MODULO;
>  case ENB_P9_64:
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ca9cc42028f7..41d51b039061 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
>   turned off in any of the following conditions:
>   1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
>   disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
>   enabled.
>   2. TARGET_VSX is off.  */
> -  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +  if ((OPTION_MASK_DIRECT_MOVE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
> +  if ((flags & OPTION_MASK_POWER8) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
>if ((flags & OPTION_MASK_MODULO) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
> index c3825bcccd84..c873f6d58989 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -48,10 +48,11 @@
> system.  */
>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER   \
>| OPTION_MASK_P8_VECTOR\
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DIRECT_MOVE  \
> +  | OPTION_MASK_POWER8   \
>| OPTION_

[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693)

2022-10-06 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693)

Hi,
  As reported in PR 100693, attempts to use __builtin_addg6s
with long long arguments result in truncated results.

Since the int and long long types can be implicitly coerced into each
other (documented further near the rs6000-c.cc change), this is handled
by adding a builtin overload (ADDG6S_OV) and some special handling in
altivec_resolve_overloaded_builtin() to map the calls to addg6s_32 or
addg6s_64, similar to how the SCAL_CMPB builtins are currently handled.
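
To make the resolution rule concrete, here is a minimal standalone C
sketch of the width test.  It is illustrative only: the hyp_* names and
the enum are hypothetical stand-ins, not the real RS6000_BIF_* codes or
the find_instance() machinery.  It also shows the truncation that
occurs when a 64-bit value is passed through a 32-bit parameter.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 32-bit and 64-bit built-in instances.  */
enum hyp_addg6s_variant { HYP_ADDG6S_32, HYP_ADDG6S_64 };

/* Bit width of a C type, for feeding the resolver below.  */
#define HYP_BITS(type) ((unsigned int) (sizeof (type) * CHAR_BIT))

/* Pick the 64-bit variant when either argument is wider than 32 bits;
   otherwise fall back to the 32-bit variant.  */
enum hyp_addg6s_variant
hyp_resolve_addg6s (unsigned int arg1_bits, unsigned int arg2_bits)
{
  if (arg1_bits > 32 || arg2_bits > 32)
    return HYP_ADDG6S_64;
  return HYP_ADDG6S_32;
}

/* Converting a 64-bit value to a 32-bit parameter drops the high half,
   which is the kind of truncation reported in the PR.  */
uint32_t
hyp_truncate_to_u32 (uint64_t value)
{
  return (uint32_t) value;
}

/* Small self-check of the behaviour described above.  */
void
hyp_check_addg6s (void)
{
  assert (hyp_resolve_addg6s (32, 32) == HYP_ADDG6S_32);
  assert (hyp_resolve_addg6s (HYP_BITS (uint64_t), 32) == HYP_ADDG6S_64);
  assert (hyp_truncate_to_u32 (UINT64_C (0x100000001)) == 1);
}
```

The sketch keys off the argument width rather than signedness, which
matches the "wider than 32 bits" test in the rs6000-c.cc hunk.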

This has sniff-tested cleanly.

I'm seeing a regression failure show up in
testsuite/g++.dg/modules/adl-3*.c, which seems entirely unrelated
to the content of this change.  I'm poking at that a bit more to
see if I can tell the what/why for that.

OK for trunk?

Thanks,
-Will

gcc/
PR target/100693

* config/rs6000/rs6000-builtins.def ([POWER7]): Replace bif-name
__builtin_addg6s with bif-name __builtin_addg6s_32.
([POWER7-64]): New bif-name __builtin_addg6s_64.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add handler mapping RS6000_OVLD_ADDG6S_OV to RS6000_BIF_ADDG6S
and RS6000_BIF_ADDG6S_32.
* config/rs6000/rs6000-overload.def (ADDG6S_OV): Add overloaded
entry __builtin_addg6s mapped to ADDG6S_32 and ADDG6S.
* config/rs6000/rs6000.md ("addg6s", UNSPEC_ADDG6S): Replace with
("addg6s3") and rework.
* doc/extend.texi (__builtin_addg6s): Add documentation for
__builtin_addg6s with unsigned long long parameters.

gcc/testsuite/
* testsuite/gcc.target/powerpc/pr100693-compile.c: New.
* testsuite/gcc.target/powerpc/pr100693-run.c: New.

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d73..11050e4c26d5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2010,12 +2010,13 @@
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-ADDG6S addg6s {}
+
+  const unsigned int __builtin_addg6s_32 (unsigned int, unsigned int);
+ADDG6S_32 addg6ssi3 {}
 
   const signed long __builtin_bpermd (signed long, signed long);
 BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);
@@ -2041,10 +2042,14 @@
 UNPACK_V1TI unpackv1ti {}
 
 
 ; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
 [power7-64]
+
+  const unsigned long __builtin_addg6s_64 (unsigned long, unsigned long);
+ADDG6S addg6sdi3 {no32bit}
+
   const signed long long __builtin_divde (signed long long, signed long long);
 DIVDE dive_di {}
 
   const unsigned long long __builtin_divdeu (unsigned long long, \
  unsigned long long);
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 566094626293..28e8b6761ce5 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1919,10 +1919,35 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   instance_code, fcode, types, args);
if (call != error_mark_node)
  return call;
break;
   }
+  /* We need to special case __builtin_addg6s because the overloaded
+forms of this function take (unsigned int, unsigned int) or
+(unsigned long long, unsigned long long).  Since C conventions
+allow the respective argument types to be implicitly coerced into
+each other, the default handling does not provide adequate
+discrimination between the desired forms of the function.  */
+case RS6000_OVLD_ADDG6S_OV:
+  {
+   machine_mode arg1_mode = TYPE_MODE (types[0]);
+   machine_mode arg2_mode = TYPE_MODE (types[1]);
+
+   /* If any supplied arguments are wider than 32 bits, resolve to
+  64-bit variant of built-in function.  */
+   if (GET_MODE_PRECISION (arg1_mode) > 32
+   || GET_MODE_PRECISION (arg2_mode) > 32)
+ instance_code = RS6000_BIF_ADDG6S;
+   else
+ instance_code = RS6000_BIF_ADDG6S_32;
+
+   tree call = find_instance (&unsupported_builtin, &instance,
+  instance_code, fcode, types, args);
+   if (call != error_mark_node)
+ return call;
+   break;
+  }
 case RS6000_OVLD_VEC_VSIE:
   {
machine_mode arg1_mode = TYPE_MODE (types[0]);
 
/* If supplied first argument is wider than 64 bits, resolve to
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..41b74c0c1500 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -193,10 +193,16 @@
   unsigned int __builtin_cmpb (unsigned int, unsigned int);
 CMPB_32
   unsigned long long __built

Re: [PATCH] fixincludes: Deal also with the _Float128x cases [PR107059]

2022-10-04 Thread will schmidt via Gcc-patches
On Fri, 2022-09-30 at 09:20 +0200, Jakub Jelinek via Gcc-patches wrote:
> On Wed, Sep 28, 2022 at 08:19:43PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Another case are the following 3 snippets:
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128X supported but no constant suffix"
> > #  else
> > #   define __f128x(x) x##f128x
> > #  endif
> > ...
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128X supported but no complex type"
> > #  else
> > #   define __CFLOAT128X _Complex _Float128x
> > #  endif
> > ...
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128x supported but no type"
> > #  endif
> > but as no target has _Float128x right now and don't see it
> > coming soon, it isn't a big deal (on the glibc side it is of
> > course ok to adjust those).
> 
> This incremental patch handles the above 3 cases, so we
> fixinclude what glibc itself changed too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (together with the
> previously posted fixincludes/ change too), ok for trunk?

Hi,

The combination of these two patches allows me to build gcc
successfully.  (PPC64LE with RHEL9).

One nit: Part 1 needed massaging of the path/to/files (i.e.
gcc/inclhack.def versus fixincludes/inclhack.def) to apply.

I can't otherwise speak to the changes, aside from noting that they
seem to work for me.
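
For anyone skimming, the effect of the c_fix_arg rewrite on the glibc
guard can be modelled in a few lines of C.  This is only a sketch: the
hyp_* functions are made up for illustration and simplify
__GNUC_PREREQ to a major-version comparison.

```c
#include <assert.h>

/* Simplified model of __GNUC_PREREQ (want_major, 0).  */
static int
hyp_gnuc_prereq (int gcc_major, int want_major)
{
  return gcc_major >= want_major;
}

/* Guard as shipped in the header:
   !__GNUC_PREREQ (7, 0) || defined __cplusplus  */
int
hyp_error_before_fix (int gcc_major, int is_cplusplus)
{
  return !hyp_gnuc_prereq (gcc_major, 7) || is_cplusplus;
}

/* Guard after the fixincludes rewrite:
   !__GNUC_PREREQ (7, 0)
   || (defined __cplusplus && !__GNUC_PREREQ (13, 0))  */
int
hyp_error_after_fix (int gcc_major, int is_cplusplus)
{
  return !hyp_gnuc_prereq (gcc_major, 7)
	 || (is_cplusplus && !hyp_gnuc_prereq (gcc_major, 13));
}

/* Small self-check: C++ on GCC 13 trips the old guard but not the new.  */
void
hyp_check_guard (void)
{
  assert (hyp_error_before_fix (13, 1) == 1);
  assert (hyp_error_after_fix (13, 1) == 0);
}
```

In this model, C builds and compilers older than GCC 7 behave the same
before and after; only C++ with GCC 13 or later stops hitting the
#error.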

Thanks
-Will



> 
> 2022-09-30  Jakub Jelinek  
> 
>   PR bootstrap/107059
>   * inclhack.def (glibc_cxx_floatn_5): New.
>   * fixincl.x: Regenerated.
>   * tests/base/bits/floatn.h: Regenerated.
> 
> --- fixincludes/inclhack.def.jj   2022-09-29 22:18:47.974402688 +0200
> +++ fixincludes/inclhack.def  2022-09-29 22:22:48.151145670 +0200
> @@ -2131,6 +2131,23 @@ fix = {
>   EOT;
>  };
> 
> +fix = {
> +hackname  = glibc_cxx_floatn_5;
> +files = bits/floatn.h, bits/floatn-common.h, "*/bits/floatn.h", "*/bits/floatn-common.h";
> +select= "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\| )defined __cplusplus\n"
> + "([ \t]*#[ \t]+error \"_Float128[xX] supported but no )";
> +c_fix = format;
> +c_fix_arg = "%1(defined __cplusplus && !__GNUC_PREREQ (13, 0))\n%2";
> +test_text = <<-EOT
> + #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> + #   error "_Float128X supported but no constant suffix"
> + #  endif
> + #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> + #   error "_Float128x supported but no type"
> + #  endif
> + EOT;
> +};
> +
>  /*  glibc-2.3.5 defines pthread mutex initializers incorrectly,
>   *  so we replace them with versions that correspond to the
>   *  definition.
> --- fixincludes/fixincl.x.jj  2022-09-29 22:18:47.975402675 +0200
> +++ fixincludes/fixincl.x 2022-09-29 22:22:55.675909244 +0200
> @@ -2,11 +2,11 @@
>   *
>   * DO NOT EDIT THIS FILE   (fixincl.x)
>   *
> - * It has been AutoGen-ed  September 28, 2022 at 07:56:15 PM by AutoGen 5.18.16
> + * It has been AutoGen-ed  September 29, 2022 at 10:22:55 PM by AutoGen 5.18.16
>   * From the definitions    inclhack.def
>   * and the template file   fixincl
>   */
> -/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Sep 28 19:56:15 CEST 2022
> +/* DO NOT SVN-MERGE THIS FILE, EITHER Thu Sep 29 22:22:55 CEST 2022
>   *
>   * You must regenerate it.  Use the ./genfixes script.
>   *
> @@ -15,7 +15,7 @@
>   * certain ANSI-incompatible system header files which are fixed to work
>   * correctly with ANSI C and placed in a directory that GNU C will search.
>   *
> - * This file contains 271 fixup descriptions.
> + * This file contains 272 fixup descriptions.
>   *
>   * See README for more information.
>   *
> @@ -4273,6 +4273,43 @@ static const char* apzGlibc_Cxx_Floatn_4
> 
>  /* * * * * * * * * * * * * * * * * * * * * * * * * *
>   *
> + *  Description of Glibc_Cxx_Floatn_5 fix
> + */
> +tSCC zGlibc_Cxx_Floatn_5Name[] =
> + "glibc_cxx_floatn_5";
> +
> +/*
> + *  File name selection pattern
> + */
> +tSCC zGlibc_Cxx_Floatn_5List[] =
> +  "bits/floatn.h\0bits/floatn-common.h\0*/bits/floatn.h\0*/bits/floatn-common.h\0";
> +/*
> + *  Machine/OS name selection pattern
> + */
> +#define apzGlibc_Cxx_Floatn_5Machs (const char**)NULL
> +
> +/*
> + *  content selection pattern - do fix if pattern found
> + */
> +tSCC zGlibc_Cxx_Floatn_5Select0[] =
> +   "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\| )defined __cplusplus\n\
> +([ \t]*#[ \t]+error \"_Float128[xX] supported but no )";
> +
> +#define GLIBC_CXX_FLOATN_5_TEST_CT  1
> +static tTestDesc aGlibc_Cxx_Floatn_5Tests[] = {
> +  { TT_EGREP, zGlibc_Cxx_Floatn_5Select0, (regex_t*)NULL }, };
> +
> +/*
> + *  Fix Command Arguments for Glibc_Cxx_Floatn_5
> + */
> +static const char* apzGlibc_Cxx_Floatn_5Patch[] = {
> +"format",
> +"%1(defined __cplusplus && !__GNUC_PREREQ (13, 0))\n\
> +%2",
> +(char*)NULL };
> +
> +/* * * * * * * * * * * * * * * * * * * * * * * * * *

Re: [PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

2022-09-20 Thread will schmidt via Gcc-patches
On Tue, 2022-09-20 at 16:14 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Sep 19, 2022 at 06:19:15PM -0500, will schmidt wrote:
> >   This is the first of a batch of changes that eliminate a number
> > of define TARGET_foo entries we have collected over time.
> 
> Good good :-)
> 
> > TARGET_CTZ is defined as TARGET_MODULO, and has a low number
> > of uses.  References to TARGET_CTZ should be safe to replace
> > with TARGET_MODULO throughout.
> 
> No, please don't.  This has nothing to do with "modulo".  If you want to
> say this is just whether we have ISA 3.0 or p9, make a new target macro
> for *that* and use that everywhere.
> 
> This is a general issue, that will make the code much more sane if you
> can fix it!

> 
> > TARGET_FCTIDZ is entirely unused, and safe to remove.
> 
> Please make separate patches for separate issues.  This makes it much
> easier to review, and MUCH easier for all other ways we need to handle
> it (backports, reverts, everything else).  With Git it is *easier* to
> keep separate patches separate than it is to lump it all together.  So,
> the trick is to keep things in separate commits during development
> already (and you will find more benefits doing that, too!)

Yup, I actually developed these three (plus a bunch more) separately,
but combined the first three for posting.   I'll split them back out
and repost after a bit. 

> 
> TARGET_FCTIDZ was never used, it always used TARGET_FCFID directly.
> 
> The original PEM mistakenly said this insn is "64-bit only".  This was
> fixed in ISA 2.01.
> 
> > TARGET_FCTIWUZ has a low number of uses, and can be directly
> > replaced with TARGET_POPCNTD.
> 
> It is a p7 (ISA 2.06) insn.  Please make a TARGET_P7 or such?


Yes.  I do have a change later in the (unposted) series to replace
POPCNTD with POWER7; at a glance that's #17 down the line.  In review I
agree with your comment that the in-between changes aren't the best
choices.  I'll see about skipping the in-between values and going
straight for POPCNTD->POWER7.

I am looking at the TARGET_POWER10 notation as the target style, versus
TARGET_P7, but I can go that direction if we think that would be
preferred.   Maybe it is since this is a retro-fix versus new. :-)
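
To sketch the direction being discussed (purely illustrative: the
HYP_* masks and hyp_* predicates below are invented for this example
and do not exist in rs6000.h), an ISA-level predicate can be kept
separate from a single-instruction predicate, so that disabling one
instruction does not change the answer to "are we compiling for p7".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real OPTION_MASK_* values differ.  */
#define HYP_MASK_POPCNTD  (UINT64_C (1) << 0)	/* The popcntd insn itself.  */
#define HYP_MASK_POWER7   (UINT64_C (1) << 1)	/* ISA 2.06 as a whole.  */

/* "Can we use the popcntd insn?"  */
int
hyp_target_popcntd (uint64_t flags)
{
  return (flags & HYP_MASK_POPCNTD) != 0;
}

/* "Are we compiling for ISA 2.06 (p7) or later?"  */
int
hyp_target_power7 (uint64_t flags)
{
  return (flags & HYP_MASK_POWER7) != 0;
}

/* fctiwuz is an ISA 2.06 insn, so in this sketch it is gated on the
   ISA level rather than on the unrelated popcntd bit.  */
int
hyp_target_fctiwuz (uint64_t flags)
{
  return hyp_target_power7 (flags);
}

/* Self-check: with popcntd switched off but p7 still selected, the
   fctiwuz gate stays on.  */
void
hyp_check_isa_level (void)
{
  uint64_t flags = HYP_MASK_POWER7;
  assert (!hyp_target_popcntd (flags));
  assert (hyp_target_fctiwuz (flags));
}
```

Whether the ISA-level name ends up as TARGET_P7 or TARGET_POWER7 does
not affect the shape of this separation.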


> 
> In the current situation target macros like TARGET_POPCNTD are abused to
> mean either "can we use the popcntd insn", or to mean "can we use insn
> new on p7".  Or sometimes something in between, or something in this
> general neighbourhood.  It is never clear which is meant, which makes it
> very hard to untangle this.  But thanks for trying!  :-)
>
> (Don't let me discourage you btw, most is pretty straightforward).

Absolutely..   I do have this mostly covered locally, I just need to
refine a few parts.  :-)

> 
> 
> > * config/rs6000/rs6000.h (TARGET_CTZ): Replace with
> > TARGET_MODULO.
> 
> Changelogs are indented with tabs, and this fits on one line.
> 
> So, please make TARGET_P7 and such, and OPTION_MASKs for those in
> rs6000-cpus.def?

Will do,
thanks
-Will


> 
> 
> Segher



[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

Hi,
  This is the first of a batch of changes that eliminate a number
of define TARGET_foo entries we have collected over time.

TARGET_CTZ is defined as TARGET_MODULO, and has a low number
of uses.  References to TARGET_CTZ should be safe to replace
with TARGET_MODULO throughout.
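
For context on what TARGET_CTZ gates: the rs6000.h comment beside
CTZ_DEFINED_VALUE_AT_ZERO (in the diff further down) notes that ctz
built from clz returns -1 for a zero input, while the hardware insn
and the popcount-based sequence return the mode width.  The following
portable C sketch reproduces that difference for 32 bits.  The hyp_*
helpers are written for illustration only and are not the sequences
GCC actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for 32 bits; returns 32 for zero.  */
static int
hyp_clz32 (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & UINT32_C (0x80000000)) == 0)
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Portable population count.  */
static int
hyp_popcount32 (uint32_t x)
{
  int n = 0;
  while (x != 0)
    {
      x &= x - 1;	/* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* ctz via clz: isolate the lowest set bit, then 31 - clz.  Zero input
   gives 31 - 32 = -1.  */
int
hyp_ctz32_via_clz (uint32_t x)
{
  return 31 - hyp_clz32 (x & (uint32_t) (0 - x));
}

/* ctz via popcount: count the zero bits below the lowest set bit.
   Zero input gives popcount of all ones, i.e. 32.  */
int
hyp_ctz32_via_popcount (uint32_t x)
{
  return hyp_popcount32 ((uint32_t) (~x & (x - 1)));
}

/* Self-check: both agree for nonzero input and differ only at zero.  */
void
hyp_check_ctz (void)
{
  assert (hyp_ctz32_via_clz (8) == 3 && hyp_ctz32_via_popcount (8) == 3);
  assert (hyp_ctz32_via_clz (0) == -1);
  assert (hyp_ctz32_via_popcount (0) == 32);
}
```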

TARGET_FCTIDZ is entirely unused, and safe to remove.

TARGET_FCTIWUZ has a low number of uses, and can be directly
replaced with TARGET_POPCNTD.

This eliminates three defines.

There should be no codegen changes, and this has regtested OK.
OK for trunk?
Thanks,

gcc/
* config/rs6000/rs6000.h (TARGET_CTZ): Replace with
TARGET_MODULO.
(TARGET_FCTIDZ): Remove.
(TARGET_FCTIWUZ): Replace with TARGET_POPCNTD.
* config/rs6000/rs6000.cc (TARGET_CTZ): Replace with TARGET_MODULO.
* config/rs6000/rs6000.md (ctz2): Replace TARGET_CTZ
with TARGET_MODULO.
(ctz2_hw): Same.
(fixuns_truncsi2): Replace TARGET_FCTIWUZ
with TARGET_POPCNTD.
(fixuns_truncsi2_stfiwx): Same.
(fctiwz_): Same.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index fcca062a8709..eea427b1ca51 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21998,11 +21998,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
   if (!TARGET_MODULO && (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
 case CTZ:
-  *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+  *total = COSTS_N_INSNS (TARGET_MODULO ? 1 : 4);
   return false;
 
 case FFS:
   *total = COSTS_N_INSNS (4);
   return false;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb7b21584970..ee887efd1122 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -456,20 +456,17 @@ extern int rs6000_vector_align[];
 || TARGET_PPC_GPOPT/* 970/power4 */\
 || TARGET_POPCNTB  /* ISA 2.02 */  \
 || TARGET_CMPB /* ISA 2.05 */  \
 || TARGET_POPCNTD) /* ISA 2.06 */
 
-#define TARGET_FCTIDZ  TARGET_FCFID
 #define TARGET_STFIWX  TARGET_PPC_GFXOPT
 #define TARGET_LFIWAX  TARGET_CMPB
 #define TARGET_LFIWZX  TARGET_POPCNTD
 #define TARGET_FCFIDS  TARGET_POPCNTD
 #define TARGET_FCFIDU  TARGET_POPCNTD
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
-#define TARGET_FCTIWUZ TARGET_POPCNTD
-#define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI (TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
@@ -1751,11 +1748,11 @@ typedef struct rs6000_args
 
 /* The CTZ patterns that are implemented in terms of CLZ return -1 for input of
zero.  The hardware instructions added in Power9 and the sequences using
popcount return 32 or 64.  */
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  (TARGET_CTZ || TARGET_POPCNTD                        \
+  (TARGET_MODULO || TARGET_POPCNTD                     \
? ((VALUE) = GET_MODE_BITSIZE (MODE), 2)\
: ((VALUE) = -1, 2))
 
 /* Specify the machine mode that pointers have.
After generation of rtl, the compiler makes no further distinction
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad5a4cf2ef83..619a87374734 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2414,11 +2414,11 @@ (define_insn "clz2"
 (define_expand "ctz2"
[(set (match_operand:GPR 0 "gpc_reg_operand")
 (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
   ""
 {
-  if (TARGET_CTZ)
+  if (TARGET_MODULO)
 {
   emit_insn (gen_ctz2_hw (operands[0], operands[1]));
   DONE;
 }
 
@@ -2445,11 +2445,11 @@ (define_expand "ctz2"
 })
 
 (define_insn "ctz2_hw"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
-  "TARGET_CTZ"
+  "TARGET_MODULO"
   "cnttz %0,%1"
   [(set_attr "type" "cntlz")])
 
 (define_expand "ffs2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
@@ -6326,11 +6326,11 @@ (define_insn_and_split "*fix_trunc2_mem"
 })
 
 (define_expand "fixuns_truncsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand")
(unsigned_fix:SI (match_operand:SFDF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_FCTIWUZ && TARGET_STFIWX"
+  "TARGET_HARD_FLOAT && TARGET_POPCNTD && TARGET_STFIWX"
 {
   if (!TARGET_P8_VECTOR)
 {
   emit_insn (gen_fixuns_truncsi2_stfiwx (operands[0], operands[1]));
   DONE;
@@ -6339,11 +6339,11 @@ (define_expand "

[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]

Hi,
  The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
and can be disabled by dependent options when it should not be.
This manifests in the issue seen in PR101865 where -mno-vsx
mistakenly disables _ARCH_PWR8.

This change replaces the relevant TARGET_DIRECT_MOVE references
with a TARGET_POWER8 entry so that the direct_move and power8
features can be enabled or disabled independently.

This is done via the OPTION_MASK definitions, so this
means that some references to the OPTION_MASK_DIRECT_MOVE
option are now replaced with OPTION_MASK_POWER8.
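
As a rough illustration of the dependency problem described above,
here is a small standalone C model.  It is only a sketch: the HYP_*
mask values and hyp_* helpers are made up for this example and are not
the real OPTION_MASK_* definitions or rs6000 functions.  It shows how
deriving the _ARCH_PWR8 decision from the direct-move bit goes wrong
once -mno-vsx clears that bit, and how a dedicated POWER8 bit avoids it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real
   OPTION_MASK_* constants are generated from rs6000.opt.  */
#define HYP_MASK_VSX          (UINT64_C (1) << 0)
#define HYP_MASK_DIRECT_MOVE  (UINT64_C (1) << 1)
#define HYP_MASK_POWER8       (UINT64_C (1) << 2)

/* Model of -mcpu=power8: all of the ISA 2.07 bits are turned on.  */
uint64_t
hyp_flags_for_power8 (void)
{
  return HYP_MASK_VSX | HYP_MASK_DIRECT_MOVE | HYP_MASK_POWER8;
}

/* Model of -mno-vsx: clearing VSX also clears the dependent
   direct-move bit, but leaves the CPU-level bit alone.  */
uint64_t
hyp_apply_no_vsx (uint64_t flags)
{
  return flags & ~(HYP_MASK_VSX | HYP_MASK_DIRECT_MOVE);
}

/* Old behaviour: _ARCH_PWR8 follows the direct-move bit.  */
int
hyp_arch_pwr8_old (uint64_t flags)
{
  return (flags & HYP_MASK_DIRECT_MOVE) != 0;
}

/* Proposed behaviour: _ARCH_PWR8 follows a dedicated POWER8 bit.  */
int
hyp_arch_pwr8_new (uint64_t flags)
{
  return (flags & HYP_MASK_POWER8) != 0;
}

/* Self-check of the scenario from the PR: power8 plus -mno-vsx.  */
void
hyp_check_model (void)
{
  uint64_t flags = hyp_apply_no_vsx (hyp_flags_for_power8 ());
  assert (!hyp_arch_pwr8_old (flags));	/* Define is lost.  */
  assert (hyp_arch_pwr8_new (flags));	/* Define is kept.  */
}
```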

The existing (and rather lengthy) commentary for DIRECT_MOVE remains
in place in rs6000-c.cc:rs6000_target_modify_macros().  The
if-defined logic there will now set a __DIRECT_MOVE__ define when
TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
purposes, but is otherwise unused.  This can be removed in a
subsequent patch, or in an update of this patch, depending on feedback.

This regtests cleanly (power8,power9,power10), and resolves
PR 101865 as represented in the tests from (1/2).

OK for trunk?
Thanks,
-Will


gcc/
PR target/101865
* config/rs6000/rs6000-builtin.cc
(rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
usage with TARGET_POWER8.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8 define
conditional with OPTION_MASK_POWER8.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
Add OPTION_MASK_POWER8 entry.
(POWERPC_MASKS): Same.
* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
(rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
* config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
* config/rs6000/vsx.md (vsx_extract_): Replace
TARGET_DIRECT_MOVE usage with TARGET_POWER8.
(define_peephole2): Same.

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 3ce729c1e6de..91a0f39bd796 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P7:
   return TARGET_POPCNTD;
 case ENB_P7_64:
   return TARGET_POPCNTD && TARGET_POWERPC64;
 case ENB_P8:
-  return TARGET_DIRECT_MOVE;
+  return TARGET_POWER8;
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
   return TARGET_MODULO;
 case ENB_P9_64:
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index ca9cc42028f7..41d51b039061 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
  turned off in any of the following conditions:
  1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
enabled.
  2. TARGET_VSX is off.  */
-  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
+  if ((OPTION_MASK_DIRECT_MOVE) != 0)
+rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
+  if ((flags & OPTION_MASK_POWER8) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
   if ((flags & OPTION_MASK_MODULO) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index c3825bcccd84..c873f6d58989 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -48,10 +48,11 @@
system.  */
 #define ISA_2_7_MASKS_SERVER   (ISA_2_6_MASKS_SERVER   \
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   \
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
 | OPTION_MASK_QUAD_MEMORY  \
 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
 
 /* ISA masks setting fusion options.  */
@@ -124,10 +125,11 @@
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   

[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option.

Hi,

This adds an assortment of tests to exercise the -mno-vsx option and
confirm the impacts on the ARCH_PWR8 define.

These are based on and inspired by PR 101865, which
reports that _ARCH_PWR8 is disabled when -mno-vsx
is passed on the command line.

A small number of failures are introduced by these tests;
those are resolved by the changes in part 2.
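
For background on why the predefine matters to users, here is a hedged
sketch of how application code might key off the _ARCH_PWRn macros.
The hyp_* helpers are hypothetical and exist only for this example; on
a non-Power host they simply report that no such macro is defined.

```c
#include <assert.h>
#include <string.h>

/* Report the newest _ARCH_PWRn macro the compiler predefined, or
   "none" when building for something else.  */
const char *
hyp_newest_arch_macro (void)
{
#if defined (_ARCH_PWR10)
  return "_ARCH_PWR10";
#elif defined (_ARCH_PWR9)
  return "_ARCH_PWR9";
#elif defined (_ARCH_PWR8)
  return "_ARCH_PWR8";
#elif defined (_ARCH_PWR7)
  return "_ARCH_PWR7";
#else
  return "none";
#endif
}

/* Nonzero when the compiler predefined _ARCH_PWR8.  */
int
hyp_have_power8 (void)
{
#ifdef _ARCH_PWR8
  return 1;
#else
  return 0;
#endif
}

/* Self-check: if _ARCH_PWR8 is defined, the newest macro reported
   cannot be "none" or the older _ARCH_PWR7.  */
void
hyp_check_predefines (void)
{
  const char *name = hyp_newest_arch_macro ();
  assert (name != NULL);
  if (hyp_have_power8 ())
    assert (strcmp (name, "none") != 0 && strcmp (name, "_ARCH_PWR7") != 0);
}
```

Code of this shape silently takes the older path when _ARCH_PWR8 goes
missing, which is the user-visible effect the new tests guard against.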

OK for trunk?
Thanks,
-Will


gcc/testsuite:
* gcc.target/powerpc/predefine_p7-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-novsx.c: New test.
* gcc.target/powerpc/predefine_p9-novsx.c: New test.
* gcc.target/powerpc/predefine_pragma_vsx.c: New test.


diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
new file mode 100644
index ..e842025b4d3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines get set
+   when we specify power7, plus options.  */
+/* This is a variation of the test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
new file mode 100644
index ..c3b705ca3d48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
@@ -0,0 +1,7 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling both altivec and vsx. */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
new file mode 100644
index ..8b6c69b20104
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
@@ -0,0 +1,8 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
new file mode 100644
index ..eef42c111663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
+/* {xfail *-*-*} */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
new file mode 100644
index ..b300600af999
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+
+/* Ensure that if we set a pragma gcc target for an
+   older processor, we do not compile builtins that
+   the older target does not supp

[PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies

2022-07-19 Thread will schmidt via Gcc-patches

Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions.

[V2]
Used 'direct' instead of 'internal', and cosmetically reworked
the changelog.

OK for trunk?

Thanks,

gcc/
* config/rs6000/altivec.md:
(vstrir_code_): Rename to...
(vstrir_direct_): ... this.
(vstrir_p_code_): Rename to...
(vstrir_p_direct_): ... this.
(vstril_code_): Rename to...
(vstril_direct_): ... this.
(vstril_p_code_): Rename to...
(vstril_p_direct_): ... this.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..2c4940f2e21c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstrir_code_"
+(define_insn "vstrir_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])
 
-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   else
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))
(set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstril_code_"
+(define_insn "vstril_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   else
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))
(set (reg:CC CR6_REGNO)



[PATCH, rs6000, v2] Additional cleanup of rs6000_builtin_mask

2022-07-19 Thread will schmidt via Gcc-patches

Hi,
  Post the rs6000 builtins rewrite, some of the leftover builtin
code is redundant and can be removed.
  This replaces the usage of bu_mask in rs6000_target_modify_macros
with checks against the rs6000_isa_flags equivalent directly.  Thusly
the bu_mask variable can be removed.  After this update there
are no other uses of rs6000_builtin_mask_calculate, so that function
can also be safely removed.

No functional change, though some output under debug has been removed.

[V2]
  Per patch review and subsequent investigations, the
rs6000_builtin_mask and x_rs6000_builtin_mask can also be removed, as
well as the entirety of the rs6000_builtin_mask_names table.
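
As a rough illustration of the single-mask interface, the sketch below
models the define/undefine flow with one flags argument.  All names here
(sketch_*, SKETCH_MASK_*) are hypothetical stand-ins, a bitset replaces
the real preprocessor state, and computing the changed bits with XOR is
an assumption about how the pragma handling derives diff_flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few OPTION_MASK_* bits.  */
#define SKETCH_MASK_ALTIVEC (UINT64_C(1) << 0)
#define SKETCH_MASK_VSX     (UINT64_C(1) << 1)
#define SKETCH_MASK_POWER8  (UINT64_C(1) << 2)

/* One bit per macro that is currently "defined"; this stands in for
   the preprocessor's macro table.  */
static uint64_t sketch_defined;

/* Model of the one-mask interface: define or undefine every macro
   whose controlling bit is set in FLAGS.  */
static void sketch_modify_macros(bool define_p, uint64_t flags)
{
  if (define_p)
    sketch_defined |= flags;
  else
    sketch_defined &= ~flags;
}

/* Model of a target pragma: only the bits that changed are touched.
   Old macros are removed first, then new ones are added.  */
static void sketch_pragma_target(uint64_t prev_flags, uint64_t cur_flags)
{
  uint64_t diff_flags = prev_flags ^ cur_flags;  /* assumed derivation */

  if (diff_flags != 0)
    {
      sketch_modify_macros(false, prev_flags & diff_flags);
      sketch_modify_macros(true, cur_flags & diff_flags);
    }
}
```

Switching from (ALTIVEC | VSX | POWER8) to (ALTIVEC | POWER8) in this
model clears only the VSX bit, which is the behavior the second mask
argument never contributed to once bu_mask became redundant.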

gcc/
* config/rs6000/rs6000-c.cc: Update comments.
(rs6000_target_modify_macros): Remove bu_mask references.
(rs6000_define_or_undefine_macro): Replace bu_mask reference
with a rs6000_cpu value check.
(rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
parameter from call to rs6000_target_modify_macros.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
rs6000_target_modify_macros_ptr): Remove parameter from extern
for the prototype.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
parameter from prototype, update calls to this function.
(rs6000_print_builtin_options): Remove prototype, call and function.
(rs6000_builtin_mask_calculate): Remove function.
(rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
(rs6000_option_override_internal): Remove rs6000_builtin_mask var
and builtin_mask debug output.
(rs6000_builtin_mask_names): Remove.
(rs6000_pragma_target_parse): Remove prev_bumask, cur_bumask,
diff_bumask references; Update calls to
rs6000_target_modify_macros_ptr.
* config/rs6000/rs6000.opt (rs6000_builtin_mask): Remove.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
 cpp_undef (parse_in, name);
 }
 
 /* Define or undefine macros based on the current target.  If the user does
-   #pragma GCC target, we need to adjust the macros dynamically.  Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments.  */
+   #pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
-"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-", " HOST_WIDE_INT_PRINT_HEX ")\n",
+"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
 (define_p) ? "define" : "undef",
-flags, bu_mask);
+flags);
 
   /* Each of the flags mentioned below controls whether certain
  preprocessor macros will be automatically defined when
  preprocessing source files for compilation by this compiler.
  While most of these flags can be enabled or disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
-  /* options from the builtin masks.  */
-  /* Note that OPTION_MASK_FPRND is enabled only if
- (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL.  */
+  if (rs6000_cpu == PROCESSOR_CELL)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros.  */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-  rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);
 
   if (TARGET_FRE)
 builtin_define ("__RECIP__");
   if (TARGET_FRES)
 builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3ea010236090..b3c16e7448d8 100644
--- a/gcc/config/rs6000/rs6000-pr

Re: [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask

2022-07-15 Thread will schmidt via Gcc-patches
On Thu, 2022-07-14 at 11:28 +0800, Kewen.Lin wrote:
> Hi Will,
> 
> Thanks for the cleanup!  Some comments are inlined.

Hi, 
Thanks for the review.  A few comments and responses below.  TLDR I'll
incorporate the suggestions in V2 that will show up ... after.  :-)

> 
> on 2022/7/14 05:39, will schmidt wrote:
> > [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask
> > 
> > Hi,
> >   Post the rs6000 builtins rewrite, some of the leftover builtin
> > code is redundant and can be removed.
> >   This replaces the remaining usage of bu_mask in
> > rs6000_target_modify_macros() with checks against the rs6000_cpu
> > directly.
> > Thusly the bu_mask variable can be removed.  After that variable
> > is eliminated there are no other uses of
> > rs6000_builtin_mask_calculate(),
> > so that function can also be safely removed.
> > 
> 
> The TargetVariable rs6000_builtin_mask in rs6000.opt is useless, it
> seems
> it can be removed together?

Yes, if I also remove usage of x_rs6000_builtin_mask.  There are a few
remaining references to x_r_b_m, but those appear safe to remove after
this cleanup as well.  I'll confirm and likely include the removal in
V2.


> 
> > I have tested this on current systems (P8,P9,P10) without
> > regressions.
> > 
> > OK for trunk?
> > 
> > 
> > Thanks,
> > -Will
> > 
> > 


> >  
> > -  /* Set the builtin mask of the various options used that could
> > affect which
> > - builtins were used.  In the past we used target_flags, but
> > we've run out
> > - of bits, and some options are no longer in target_flags.  */
> > -  rs6000_builtin_mask = rs6000_builtin_mask_calculate ();
> > -  if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
> > -rs6000_print_builtin_options (stderr, 0, "builtin mask",
> > - rs6000_builtin_mask);
> > -
> 
> I wonder if it's a good idea to still dump some information for
> built-in
> functions debugging even with new bif framework, it can be handled in
> a
> separated patch if yes.  The new bif framework adopts stanzas for bif
> guarding, if we want to do similar things, we can refer to the code
> like:





> TARGET_POPCNTB means all bifs with ENB_P5 are available
> TARGET_CMPB means all bifs with ENB_P6 are available
> ...
> 
> , dump information like "current enabled stanzas: ENB_xx, ENB_xxx,
> ..."
> (even without ENB_ prefix).

Possibly.  There does exist some debug already, and I still have some
work in progress related to some of the OPTION and TARGET handling. 
I'll keep this in mind as I continue poking in this space. :-)
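
For reference, a dump along the lines suggested above could look roughly
like the sketch below.  It is only an illustration: the struct and the
sketch_* helpers are hypothetical, and the two predicates stand in for
TARGET_POPCNTB and TARGET_CMPB, with the stanza names taken from the
mapping quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of two target predicates.  In GCC these would
   be the TARGET_* macros rather than struct fields.  */
struct sketch_target
{
  bool popcntb;  /* stands in for TARGET_POPCNTB, i.e. ENB_P5 */
  bool cmpb;     /* stands in for TARGET_CMPB, i.e. ENB_P6 */
};

/* Append NAME to BUF, using " " before the first entry and ", "
   before every later one.  Output is truncated to fit LEN bytes.  */
static void sketch_append(char *buf, size_t len, int *count, const char *name)
{
  size_t used = strlen(buf);

  snprintf(buf + used, len - used, "%s%s", *count > 0 ? ", " : " ", name);
  ++*count;
}

/* Write "current enabled stanzas: ..." into BUF and return BUF.
   LEN must be at least 1.  */
static char *sketch_dump_stanzas(const struct sketch_target *t,
                                 char *buf, size_t len)
{
  int count = 0;

  snprintf(buf, len, "current enabled stanzas:");
  if (t->popcntb)
    sketch_append(buf, len, &count, "ENB_P5");
  if (t->cmpb)
    sketch_append(buf, len, &count, "ENB_P6");
  return buf;
}
```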


> >/* Initialize all of the registers.  */
> >rs6000_init_hard_regno_mode_ok (global_init_p);
> >  
> >/* Save the initial options in case the user does function
> > specific options */
> >if (global_init_p)
> > @@ -24495,17 +24442,15 @@ rs6000_pragma_target_parse (tree args,
> > tree pop_target)
> >  
> >if ((diff_flags != 0) || (diff_bumask != 0))
> > {
> >   /* Delete old macros.  */
> >   rs6000_target_modify_macros_ptr (false,
> > -  prev_flags & diff_flags,
> > -  prev_bumask & diff_bumask);
> > +  prev_flags & diff_flags);
> >  
> >   /* Define new macros.  */
> >   rs6000_target_modify_macros_ptr (true,
> > -  cur_flags & diff_flags,
> > -  cur_bumask & diff_bumask);
> > +  cur_flags & diff_flags);
> > }
> >  }
> >  
> >return true;
> >  }
> > @@ -24732,19 +24677,10 @@ rs6000_print_isa_options (FILE *file, int
> > indent, const char *string,
> >rs6000_print_options_internal (file, indent, string, flags, "-
> > m",
> >  &rs6000_opt_masks[0],
> >  ARRAY_SIZE (rs6000_opt_masks));
> >  }
> >  
> > -static void
> > -rs6000_print_builtin_options (FILE *file, int indent, const char
> > *string,
> > - HOST_WIDE_INT flags)
> > -{
> > -  rs6000_print_options_internal (file, indent, string, flags, "",
> > -&rs6000_builtin_mask_names[0],
> > -ARRAY_SIZE
> > (rs6000_builtin_mask_names));
> > -}
> 
> rs6000_builtin_mask_names becomes useless too, can be removed too?

It can.  I'll include removal in V2.
Thanks
-Will

> 
> BR,
> Kewen



[PATCH, rs6000] Additional cleanup of rs6000_builtin_mask

2022-07-13 Thread will schmidt via Gcc-patches

Hi,
  Post the rs6000 builtins rewrite, some of the leftover builtin
code is redundant and can be removed.
  This replaces the remaining usage of bu_mask in
rs6000_target_modify_macros() with checks against the rs6000_cpu directly.
Thusly the bu_mask variable can be removed.  After that variable
is eliminated there are no other uses of rs6000_builtin_mask_calculate(),
so that function can also be safely removed.

I have tested this on current systems (P8,P9,P10) without regressions.

OK for trunk?


Thanks,
-Will

gcc/
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Remove
bu_mask references.  (rs6000_define_or_undefine_macro): Replace
bu_mask reference with a rs6000_cpu value check.
(rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
parameter from call to rs6000_target_modify_macros.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
rs6000_target_modify_macros_ptr): Remove parameter from extern
for the prototype.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
parameter from prototype, update calls to this function.
(rs6000_print_builtin_options): Remove prototype, call and function.
(rs6000_builtin_mask_calculate): Remove function.
(rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
(rs6000_option_override_internal): Remove rs6000_builtin_mask var
and builtin_mask debug output.
(rs6000_pragma_target_parse): Update calls to
rs6000_target_modify_macros_ptr.


diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
 cpp_undef (parse_in, name);
 }
 
 /* Define or undefine macros based on the current target.  If the user does
-   #pragma GCC target, we need to adjust the macros dynamically.  Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments.  */
+   #pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
-"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-", " HOST_WIDE_INT_PRINT_HEX ")\n",
+"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
 (define_p) ? "define" : "undef",
-flags, bu_mask);
+flags);
 
   /* Each of the flags mentioned below controls whether certain
  preprocessor macros will be automatically defined when
  preprocessing source files for compilation by this compiler.
  While most of these flags can be enabled or disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
-  /* options from the builtin masks.  */
-  /* Note that OPTION_MASK_FPRND is enabled only if
- (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL.  */
+  if (rs6000_cpu == PROCESSOR_CELL)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros.  */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-  rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);
 
   if (TARGET_FRE)
 builtin_define ("__RECIP__");
   if (TARGET_FRES)
 builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3ea010236090..b3c16e7448d8 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -318,13 +318,12 @@ extern void rs6000_pragma_longcall (struct cpp_reader *);
 extern void rs6000_cpu_cpp_builtins (struct cpp_reader *);
 #ifdef TREE_CODE
 extern bool rs6000_pragma_target_parse (tree, tree);
 #endif
 extern void rs6000_activate_target_options (tree new_tree);
-extern void rs6000

Re: [PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread will schmidt via Gcc-patches
On Wed, 2022-07-13 at 14:39 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jul 13, 2022 at 01:18:29PM -0500, will schmidt wrote:
> >   This cleans up some of the naming around the vstrir and vstril
> > instruction definitions, with some cosmetic changes for
> > consistency.
> > gcc/
> > * config/rs6000/altivec.md (vstrir_code_): Rename
> > to vstrir_internal_.
> > (vstrir_p_code_): Rename to vstrir_p_internal_.
> > (vstril_code_): Rename to vstril_internal_.
> > (vstril_p_code_): Rename to vstril_p_internal_.
> 
> It doesn't show the new names on the lhs this way.  One way to do
> better
> is to write e.g.
>   (vstril_code_): Rename to...
>   (vstril_internal_): ... this.

Ok.

> 
> It often is a good idea to say "... for VIshort" and similar
> btw.

Ok. 

> 
> I'm not a fan of "internal" either, it doesn't say anything.  At
> least
> put it at the very end of the names please?

I'm easily convinced. ;-)  I wonder if I should just drop "_internal"
entirely and go with "vstrir_<mode>".  Otherwise I'll rework to be
"vstrir_<mode>_internal".
At a glance I see we do have some other existing define_insn entries
with _internal at the tail and a few others embedded in the middle. 
I'll leave a note and perhaps review those after.  :-)

Thanks,
-Will

> 
> Okay for trunk with that changed.  Thanks!
> 
> 
> Segher



[PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread will schmidt via Gcc-patches

Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions.  :-)
OK for trunk?

Thanks,

gcc/
* config/rs6000/altivec.md (vstrir_code_): Rename
to vstrir_internal_.
(vstrir_p_code_): Rename to vstrir_p_internal_.
(vstril_code_): Rename to vstril_internal_.
(vstril_p_code_): Rename to vstril_p_internal_.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..5aea02e9ad6e 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstrir_code_"
+(define_insn "vstrir_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])
 
-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   else
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))
(set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstril_code_"
+(define_insn "vstril_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   else
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))
(set (reg:CC CR6_REGNO)



Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-08 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 23:16 -0400, Michael Meissner wrote:
> On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> > On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> > > > I think I mentioned this offline, but I'd prefer a negative target flag,
> > > > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, 
> > > > meaning we'd
> > > > generate stxvp by default.
> > > 
> > > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > > positive.  All of them.
> > 
> > That's not what I was asking for.  I totally agree that 
> > -mno-store-vector-pair
> > should disable generating stxvp and that -mstore-vector-pair should enable
> > generating it.  What I asked for was that the internal flag we use to enable
> > and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR 
> > is
> > true when we use -mno-store-vector-pair and false when using 
> > -mstore-vector-pair.
> > That way we can add that flag to power10's rs6000-cpus.def entry and then
> > we're done.  What I don't want to have to do is that if/when power87 is
> > released, we still have to add TARGET_STORE_VECTOR_PAIR to its
> > rs6000-cpus.def entry just to get stxvp insns generated.  That adds a cost
> > to every cpu after power10 since we'd have to remember to add that flag to
> > every follow-on cpu.
> 
> FWIW, I really dislike having negative flags like that (just talking about the
> option mask internals, not the user option).

I can't tell whether there is agreement in either direction, so I'll
throw some comments out and see if that helps make a decision.

I agree with avoiding the negative flags.  Whenever I run across a code
snippet reading  "if (! TARGET_NOT_FOO) ... " it's time to double-check 
everything.  :-)  

If the proposal is to have "TARGET_NO_STORE_VECTOR_PAIR" set to "off",
I'd counter-propose whatever variation is possible that drops the "NO"
from the string, i.e. "TARGET_STORE_VECTOR_PAIR" set however it makes
sense to indicate enabled or not.

All that said, I have a strong preference for having the internal flags
match the option flags as closely as possible.


> 
> I don't view the cost to add one positive flag to the next CPU as bad, as it
> will be a one time cost.  Presumably it would be set also next++ CPU.  This is
> like power8 is all of the power7 flags + new flags.  Power9 is all of the
> power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
> there are times when there are breaks, but hopefully those are rare.

This sounds reasonable.   Some weight could be added for which way to
bias the flag based on a guess of what the 'power87' release will
allow, but ultimately that shouldn't really matter. 
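
To make the positive-flag approach concrete, here is a rough sketch.
Everything in it is hypothetical: the SK_* masks are not the real
rs6000 option masks, SK_FUTURE_FLAGS stands in for some later cpu, and
the explicit mask/value pair is just one way to model a user passing
-mstore-vector-pair or -mno-store-vector-pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bits.  */
#define SK_MASK_MMA               (UINT64_C(1) << 0)
#define SK_MASK_STORE_VECTOR_PAIR (UINT64_C(1) << 1)

/* Cumulative per-cpu defaults.  The positive flag is simply absent
   from the power10 set and added once for a follow-on cpu.  */
#define SK_POWER10_FLAGS (SK_MASK_MMA)
#define SK_FUTURE_FLAGS  (SK_POWER10_FLAGS | SK_MASK_STORE_VECTOR_PAIR)

/* Layer explicit user choices on top of the cpu defaults.
   EXPLICIT_MASK has a bit set for every option the user spelled out,
   and EXPLICIT_VALUE holds the requested state of those bits.  */
static uint64_t sk_effective_flags(uint64_t cpu_flags,
                                   uint64_t explicit_mask,
                                   uint64_t explicit_value)
{
  return (cpu_flags & ~explicit_mask) | (explicit_value & explicit_mask);
}

/* The check at the point of use reads positively, with no double
   negation.  */
static bool sk_use_stxvp(uint64_t flags)
{
  return (flags & SK_MASK_STORE_VECTOR_PAIR) != 0;
}
```

In this model power10 does not generate stxvp unless the user asks for
it, a later cpu does by default, and an explicit "no" still wins on
either.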

And no, power87 isn't real AFAIK; I'm just repeating the example
provided by Peter :-)

Thanks
-Will

> 
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a positive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.
> 
> We can do it, but I just prefer to go with the positive flag approach.
> 



Re: [PATCH, V3] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293

2022-06-07 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 15:21 -0500, Segher Boessenkool wrote:
> On Tue, Jun 07, 2022 at 02:26:17PM -0500, will schmidt wrote:
> > On Mon, 2022-06-06 at 20:31 -0400, Michael Meissner wrote:
> > >  (define_insn "vsx_xxspltd_"
> > >[(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> > > -(unspec:VSX_D [(match_operand:VSX_D 1
> > > "vsx_register_operand"
> > > "wa")
> 
> Someone (you?) uses format=flawed.  You cannot reply to emails that
> contain patches that way, it messes up everything :-(

Right..  Something on my end may be possessed, several of my emails
today have tried to go all HTML on me, and/or otherwise gone
format-wonky, which I do not want.  ;-)


> 
> > > -(match_operand:QI 2 "u5bit_cint_operand" "i")]
> > > -  UNSPEC_VSX_XXSPLTD))]
> > > + (vec_duplicate:VSX_D
> > > +  (vec_select:
> > > +   (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
> > > +   (parallel [(match_operand:QI 2 "const_0_to_1_operand"
> > > "i")]]
> > >"VECTOR_MEM_VSX_P (mode)"
> > 
> > Noting that
> > (define_mode_iterator VSX_D [V2DF V2DI])
> > (define_mode_attr VS_scalar [(V1TI  "TI")
> >  (V2DF  "DF")
> >  (V2DI  "DI")
> >  (V4SF  "SF")
> >  (V4SI  "SI")
> >  (V8HI  "HI")
> >  (V16QI "QI")])
> 
> Yeah, the comment
> ;; Map the scalar mode for a vector type
> is misleading, in more ways than one :-(
> 
> And the whole thing is just the same as VEC_base anyway, so it is
> much
> better to just use that.
> 
> 
> Segher



Re: [PATCH 3/3] Adjust MMA tests to account for no store vector pair.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:56 -0400, Michael Meissner wrote:
> [PATCH 3/3] Adjust MMA tests to account for no store vector pair.
> 
> In changing the default for generating the store vector pair instructions,
> I had to adjust several of the MMA tests to remove checking for these
> instructions.  Mostly I just deleted the scan-assembler lines checking for
> stxvp.  In two of the tests, I added the -mstore-vector-pair option since
> the point of the test was to check for specific cases with store vector
> pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/mma-builtin-1.c: Eliminate checking for store
>   vector pair instructions.
>   * gcc.target/powerpc/mma-builtin-10-pair.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-10-quad.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-2.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-3.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-4.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-5.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-6.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-7.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-9.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-8.c: Add -mstore-vector-pair.
>   * gcc.target/powerpc/pr102976.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c   | 2 +-
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/pr102976.c| 6 +-
>  12 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c 
> b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> index 69ee826e1be..47b45b00403 100644
> --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -260,7 +260,6 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t 
> *vec)
> 
>  /* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> -/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
>  /* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */




This all seems straightforward.   LGTM, thanks. 
-Will




Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 1/3] Disable generating store vector pair.
> 
> Testing has revealed that the power10 has some slowdowns if the store
> vector pair instruction is generated in some cases.  This patch disables
> generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
> unless an undocumented switch is used.  It is anticipated that perhaps
> with future machines we can generate the store vector pair instruction.
> 
> This patch does a split after reload to convert a store vector pair
> instruction into a pair of store vector instructions.
> 
> We do continue to generate the load vector pair instructions (lxvp, plxvp,
> and lxvpx), since we have found that in code that heavily uses MMA, it is
> still a win to generate the load vector pair instructions.
> 
> There are two future patches planned:
> 
> 1)Disable block moves from generating load/store vector pair
>   instructions unless the store vector pair instructions are
>   being generated.
> 
> 2)Make the built-in functions for generating store vector pair
>   always generate those instructions even if store vector pair
>   instructions are disabled.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs except for the tests that are
> modified in patch #3 in this series of patches.  Can I check this patch
> into the trunk?  If there are no changes needed for the backports, can I
> check this code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/mma.md (movoo): Disable generating store vector
>   pair instructions unless these are enabled by the user.
>   (movxo): Likewise.
>   * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
>   vector pair instructions are disabled, do not allow vector pair
>   addresses to be indexed.
>   (rs6000_split_multireg_move): Do not split XOmode stores into two
>   store vector pair instructions unless store vector pair
>   instructions are enabled.
>   * config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
>   (enabled attribute): Disable alternative using store vector pair
>   instructions unless they are enabled.
>   * config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
>   * gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
> ---
>  gcc/config/rs6000/mma.md  | 41 ++
>  gcc/config/rs6000/rs6000.cc   |  9 +-
>  gcc/config/rs6000/rs6000.md   |  8 +-
>  gcc/config/rs6000/rs6000.opt  |  4 +
>  .../powerpc/p10-store-vector-pair-1.c | 82 +++
>  .../powerpc/p10-store-vector-pair-2.c | 81 ++
>  6 files changed, 206 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index a183b6a168a..9b5f243b88d 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -274,26 +274,35 @@ (define_expand "movoo"
>DONE;
>  })
> 
> +;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
> +;; instructions.  Instead, split these instructions into two separate store
> +;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx 
> instruction.
> +;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
> +;; machines.

... and if (undocumented) STORE_VECTOR_PAIR option is indicated ?

Nothing else jumps out at me.  

Thanks
-Will
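
To make the split described in the quoted patch concrete (one store
vector pair turned into two store vector instructions), here is a plain
C model.  The names vec128_t, vec_pair_t, store_vector,
store_pair_split and store_pair_single are hypothetical; this is not
the GCC splitter, and it ignores the endian-dependent register ordering
within a pair.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical models of a 16-byte vector and a 32-byte vector pair.  */
typedef struct { uint8_t b[16]; } vec128_t;
typedef struct { vec128_t lo; vec128_t hi; } vec_pair_t;

_Static_assert (sizeof (vec_pair_t) == 32, "pair must be 32 bytes");

/* Model of a single 16-byte vector store.  */
static void
store_vector (uint8_t *dst, const vec128_t *v)
{
  assert (dst != 0 && v != 0);
  memcpy (dst, v->b, sizeof v->b);
}

/* Model of the split form: two 16-byte stores at dst and dst + 16.  */
static void
store_pair_split (uint8_t *dst, const vec_pair_t *p)
{
  store_vector (dst, &p->lo);
  store_vector (dst + 16, &p->hi);
}

/* Model of the unsplit form: one 32-byte store.  */
static void
store_pair_single (uint8_t *dst, const vec_pair_t *p)
{
  memcpy (dst, p, sizeof *p);
}

/* Return 1 if both forms write the same 32 bytes.  */
static int
split_matches_single (void)
{
  vec_pair_t p;
  uint8_t a[32], b[32];

  for (unsigned i = 0; i < 16; i++)
    {
      p.lo.b[i] = (uint8_t) i;
      p.hi.b[i] = (uint8_t) (16 + i);
    }
  store_pair_single (a, &p);
  store_pair_split (b, &p);
  return memcmp (a, b, sizeof a) == 0;
}
```

The memory image is the same either way; per the patch description the
motivation is performance on power10, not a change in what gets stored.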




Re: [PATCH 2/3] Disable generating load/store vector pairs for block copies.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 2/3] Disable generating load/store vector pairs for block copies.
> 
> If the store vector pair instruction is disabled, do not generate block
> copies that use load and store vector pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000-string.cc (expand_block_move): If the store
>   vector pair instructions are disabled, do not generate block
>   copies using load and store vector pairs.
> ---
>  gcc/config/rs6000/rs6000-string.cc | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 59d901ac68d..1b18e043269 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -2787,14 +2787,16 @@ expand_block_move (rtx operands[], bool might_overlap)
>rtx src, dest;
>bool move_with_length = false;
> 
> -  /* Use OOmode for paired vsx load/store.  Use V2DI for single
> -  unaligned vsx load/store, for consistency with what other
> -  expansions (compare) already do, and so we can use lxvd2x on
> -  p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
> -  with length < 16 (if allowed), then gpr load/store.  */
> +  /* Use OOmode for paired vsx load/store unless the store vector pair
> +  instructions are disabled.  Use V2DI for single unaligned vsx
> +  load/store, for consistency with what other expansions (compare)
> +  already do, and so we can use lxvd2x on p8.  Order is VSX pair
> +  unaligned, VSX unaligned, Altivec, VSX with length < 16 (if allowed),
> +  then gpr load/store.  */
> 
>if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
> && TARGET_BLOCK_OPS_VECTOR_PAIR
> +   && TARGET_STORE_VECTOR_PAIR
> && bytes >= 32
> && (align >= 256 || !STRICT_ALIGNMENT))


Seems straightforward.  LGTM, 
Thanks
-Will
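
The guard added in the quoted hunk can be read as a simple predicate.
A minimal sketch in C follows, assuming a hypothetical struct
block_move_opts that stands in for the TARGET_* macros and
STRICT_ALIGNMENT; the function name is also an assumption and this is
not the code in rs6000-string.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros tested in the hunk.  */
struct block_move_opts
{
  bool mma;                      /* TARGET_MMA */
  bool block_ops_unaligned_vsx;  /* TARGET_BLOCK_OPS_UNALIGNED_VSX */
  bool block_ops_vector_pair;    /* TARGET_BLOCK_OPS_VECTOR_PAIR */
  bool store_vector_pair;        /* TARGET_STORE_VECTOR_PAIR (new) */
  bool strict_alignment;         /* STRICT_ALIGNMENT */
};

/* Return true if a block move of BYTES bytes with ALIGN_BITS alignment
   may use the paired (OOmode) load/store path.  */
bool
use_vector_pair_block_move (const struct block_move_opts *t,
                            unsigned long bytes, unsigned int align_bits)
{
  assert (t != 0);
  return (t->mma
          && t->block_ops_unaligned_vsx
          && t->block_ops_vector_pair
          && t->store_vector_pair
          && bytes >= 32
          && (align_bits >= 256 || !t->strict_alignment));
}
```

With store_vector_pair false the predicate fails regardless of size or
alignment, so the expansion falls through to the next choice in the
order listed in the comment (single unaligned VSX, Altivec, and so on).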




>   {
> -- 
> 2.35.3
> 
> 



Re: [PATCH, V3] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:31 -0400, Michael Meissner wrote:
> Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target
> 99293.
> 
> This is version 3 of the patch.  The original patch was:
> 
> > Date: Mon, 28 Mar 2022 12:26:02 -0400
> > Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract
> > for V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
> 
> Version 2 of the patch was:
> 
> > Date: Fri, 13 May 2022 10:49:26 -0400
> > Subject: [PATCH] Optimize vec_splats of constant V2DI/V2DF
> > vec_extract, PR target/99293
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594797.html
> 
> The difference between version 2 and version 3 was to clean up the
> description of what the patch does, and to make the example test case
> clear.
> 
> In PR target/99293, it was pointed out that doing:
> 
>   vector long long dest0, dest1, src;
>   /* ... */
>   dest0 = vec_splats (vec_extract (src, 0));
>   dest1 = vec_splats (vec_extract (src, 1));
> 
> would generate slower code.
> 
> It generates the following code on power8:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   xxpermdi 0,34,34,3
>   xxpermdi 34,0,0,0
> 
>   ;; vec_splats (vec_extract (src, 1))
>   xxlor 0,34,34
>   xxpermdi 34,0,0,0
> 
> However on power9 and power10 it generates:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   mfvsrld 3,34
>   mtvsrdd 34,9,9
> 
>   ;; vec_splats (vec_extract (src, 1))
>   mfvsrd 9,34
>   mtvsrdd 34,9,9
> 
> This is due to the power9 having the mfvsrld instruction which can
> extract
> either 64-bit element into a GPR.  While there are alternatives for
> both
> vector registers and GPR registers, the register allocator prefers to
> put
> DImode into GPR registers.
> 
> In this case, it is better to have a single combiner pattern that can
> generate
> a single xxpermdi, instead of 2 insns (the extract and then the
> concat).
> This is true if the two operations are move from vector register and
> move to
> vector register.  As Segher pointed out in a previous version of the
> patch, the
> combiner already tries creating a (vec_duplicate (vec_select
> ...))
> pattern, but we didn't provide one.
> 
> This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so
> that it now
> uses VEC_DUPLICATE, which the combiner checks for.

Ok.
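
As a reference for what the combined pattern computes, here is a plain
C model of the operation.  It is only a sketch: v2di_t, extract_dw,
splat_dw, xxpermdi_model and splat_element are hypothetical names, the
doubleword numbering follows the ISA's big-endian convention, and the
little-endian element renumbering that GCC applies is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V2DI register: two 64-bit doublewords.  */
typedef struct { int64_t dw[2]; } v2di_t;

/* Model of vec_extract: pick doubleword N.  */
static int64_t
extract_dw (v2di_t v, unsigned n)
{
  assert (n < 2);
  return v.dw[n];
}

/* Model of vec_splats: copy X into both doublewords.  */
static v2di_t
splat_dw (int64_t x)
{
  v2di_t r = { { x, x } };
  return r;
}

/* Model of xxpermdi: the high bit of DM selects the doubleword taken
   from A, the low bit selects the doubleword taken from B.  */
static v2di_t
xxpermdi_model (v2di_t a, v2di_t b, unsigned dm)
{
  v2di_t r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}

/* The combined operation: one xxpermdi with both inputs the same.  */
static v2di_t
splat_element (v2di_t v, unsigned n)
{
  return xxpermdi_model (v, v, n ? 3 : 0);
}

/* Return 1 if both doublewords match.  */
static int
v2di_equal (v2di_t a, v2di_t b)
{
  return a.dw[0] == b.dw[0] && a.dw[1] == b.dw[1];
}
```

In this model splat_element (src, n) gives the same result as
splat_dw (extract_dw (src, n)), which is the equivalence the
(vec_duplicate (vec_select ...)) pattern lets the combiner use.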

> 
> I have built Spec 2017 with this patch installed, and the cam4_r
> benchmark
> is the only benchmark that generated different code (3
> mfvsrld/mtvsrdd
> pairs of instructions were replaced with xxpermdi).
> 
> I have built bootstrap versions on the following systems and I have
> run
> the regression tests.  There were no regressions in the runs:
> 
>   Power9 little endian, --with-cpu=power9
>   Power10 little endian, --with-cpu=power10
>   Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit
> tests)

Ok.


> 
> Can I install this into the trunk?  After a burn-in period, can I
> backport
> and install this into GCC 11 and GCC 10 branches?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
>   PR target/99293
>   * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
>   UNSPEC_VSX_XXSPLTD case.
>   * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
>   (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE.
> 
> gcc/testsuite:
>   PR target/99293
>   * gcc.target/powerpc/builtins-1.c: Update insn count.
>   * gcc.target/powerpc/pr99293.c: New test.
> ---
>  gcc/config/rs6000/rs6000-p8swap.cc|  1 -
>  gcc/config/rs6000/vsx.md  | 19 +++
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c| 51
> +++
>  4 files changed, 62 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 275702fee1b..3160fcbdeca 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -807,7 +807,6 @@ rtx_is_swappable_p (rtx op, unsigned int
> *special)
> case UNSPEC_VUPKLU_V4SF:
>   return 0;
> case UNSPEC_VSPLT_DIRECT:
> -   case UNSPEC_VSX_XXSPLTD:
>   *special = SH_SPLAT;
>   return 1;
> case UNSPEC_REDUC_PLUS:
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1b75538f42f..a1a1ce95195 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -296,7 +296,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_XXPERM
> 
> UNSPEC_VSX_XXSPLTW
> -   UNSPEC_VSX_XXSPLTD
> UNSPEC_VSX_DIVSD
> UNSPEC_VSX_DIVUD
> UNSPEC_VSX_DIVSQ

Ok.

> @@ -4673,16 +4672,18 @@ (define_insn "vsx_vsplt_di"
>  ;; V2DF/V2DI splat for use by vec_splat builtin
>  (define_insn "vsx_xxspltd_"
>[(set (match_operand:VSX_D 0 "vsx_register_operand" "

Re: [PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

2022-06-07 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 10:50 +0800, Kewen.Lin wrote:
> Hi Will,


Hi!

> 
> The whole series looks good to me, thanks!

:-)

> IMHO one place can be further refactored, not sure if it's worth
> updating together in this series, it's ...

Additional comments below.  
I've made note of the comments, and request (ask) that this be
approved, with a pinky promise that I intend to follow up on the
suggestions in my next patch series.


> 
> on 2022/6/7 06:05, will schmidt wrote:
> > [PATCH,RS6000 2/5] Rework the RS6000_BTM defines.
> > 
> > The RS6000_BTM_ definitions are mostly unused after the
> > rs6000
> > builtin code was reworked.  The remaining references can be
> > replaced
> > with the OPTION_MASK_ and MASK_ equivalents.
> > 
> > This patch removes the defines:
> > RS6000_BTM_FRES, RS6000_BTM_FRSQRTE, RS6000_BTM_FRSQRTES,
> > RS6000_BTM_POPCNTD, RS6000_BTM_CELL, RS6000_BTM_DFP,
> > RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_64BIT,
> > RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW,
> > RS6000_BTM_MMA, RS6000_BTM_P10.
> > 
> > I note that the BTM -> OPTION_MASK mappings are not always 1-to-1.
> > In particular the BTM_FRES and BTM_FRSQRTE values were both mapped
> > to
> > OPTION_MASK_PPC_GFXOPT, while the BTM_FRE and BTM_FRSQRTES both
> > mapped
> > to OPTION_MASK_POPCNTB.  In total I spent quite a bit of time
> > double-checking these since it looked like copy/paste errors.  I
> > split
> > some of these changes out into a subsequent patch to limit the
> > amount
> > of potential confusion in any particular patch.
> > 
> > gcc/
> > * config/rs6000/rs6000-c.cc: Update comments.
> > * config/rs6000/rs6000.cc (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
> > RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
> > RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_DFP,
> > RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_FLOAT128,
> > RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10):
> > Replace
> > with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_PPC_GFXOPT,
> > OPTION_MASK_POPCNTB, OPTION_MASK_POPCNTD,
> > OPTION_MASK_FPRND, MASK_64BIT, MASK_POWERPC64,
> > OPTION_MASK_DFP, OPTION_MASK_SOFT_FLOAT, OPTION_MASK_MULTIPLE,
> > OPTION_MASK_FLOAT128_KEYWORD, OPTION_MASK_FLOAT128_HW,
> > OPTION_MASK_MMA, OPTION_MASK_POWER10.
> > * config/rs6000/rs6000.h (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
> > RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
> > RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
> > RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128,
> > RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10):
> > Delete.
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 9c8cbd7a66e4..4c99afc761ae 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -594,13 +594,13 @@ rs6000_target_modify_macros (bool define_p,
> > HOST_WIDE_INT flags,
> >   via the target attribute/pragma.  */
> >if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
> >  rs6000_define_or_undefine_macro (define_p,
> > "__FLOAT128_HARDWARE__");
> >  
> >/* options from the builtin masks.  */
> > -  /* Note that RS6000_BTM_CELL is enabled only if (rs6000_cpu ==
> > - PROCESSOR_CELL) (e.g. -mcpu=cell).  */
> > -  if ((bu_mask & RS6000_BTM_CELL) != 0)
> > +  /* Note that OPTION_MASK_FPRND is enabled only if
> > + (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
> > +  if ((bu_mask & OPTION_MASK_FPRND) != 0)
> >  rs6000_define_or_undefine_macro (define_p, "__PPU__");
> >  
> 
> ... here.  In function rs6000_target_modify_macros, bu_mask is used
> by
> two places, the beginning debug outputting and the above
> OPTION_MASK_FPRND
> check.  I wonder if we can get rid of bu_mask and just use sth. like:
> 
> (rs6000_cpu == PROCESSOR_CELL) && (flags & OPTION_MASK_FPRND)
> 

Agreed.

> // the others are using "flags &", it's passed by rs6000_isa_flags,
> // should be the same as just using OPTION_MASK_FPRND.
> 
> If we drop bu_mask in function rs6000_target_modify_macros, function

> rs6000_builtin_mask_calculate will have only one use place in
> function
> rs6000_option_override_internal.  IMHO this function
> rs6000_builtin_mask_calculate also becomes stale after built-in
> function
> rewriting and needs some updates with new bif framework later.

The DEBUG output using the builtin_mask still appeared to have some
potential value, but I can make a point to investigate that further.

I do have in my queue to try to resolve PR 101865, that is the bug with
ARCH_PWR8.  I got into this OPTION_MASK side-quest as part of the
investigation into that bug.   I can make a point to investigate and
clean up the bu_mask usage as part of that series.

Thanks
-Will

> 
> BR,
> Kewen
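
To illustrate the "not always 1-to-1" remark in the quoted patch
description, here is a small C sketch of a name table in the style of
rs6000_builtin_mask_names.  The entry names and their pairing come from
the patches in this series; the MASK_* bit positions, struct mask_name
and names_for_mask are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; not GCC's real OPTION_MASK_ values.  */
#define MASK_GFXOPT   (UINT64_C (1) << 0)
#define MASK_POPCNTB  (UINT64_C (1) << 1)
#define MASK_POPCNTD  (UINT64_C (1) << 2)

struct mask_name
{
  const char *name;
  uint64_t mask;
};

/* Several builtin names share one option mask.  */
static const struct mask_name builtin_mask_names[] =
{
  { "fre",      MASK_POPCNTB },
  { "fres",     MASK_GFXOPT },
  { "frsqrte",  MASK_GFXOPT },
  { "frsqrtes", MASK_POPCNTB },
  { "popcntd",  MASK_POPCNTD },
};

/* Count how many table names are selected by MASK.  */
size_t
names_for_mask (uint64_t mask)
{
  size_t count = 0;

  assert (mask != 0);
  for (size_t i = 0;
       i < sizeof builtin_mask_names / sizeof builtin_mask_names[0];
       i++)
    if ((builtin_mask_names[i].mask & mask) != 0)
      count++;
  return count;
}
```

Because two names resolve to the same bit, a debug dump driven by the
mask alone cannot tell "fres" from "frsqrte", which is one reason the
mappings looked like copy/paste errors at first glance.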



[PATCH,RS6000 4/5] Replace MASK_ with OPTION_MASK_

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 4/5] Replace MASK_ with OPTION_MASK_

This replaces the MASK_ references with OPTION_MASK_
and removes the now unused defines.

This patch removes the defines for
MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO, MASK_DFP,
MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_FPRND, MASK_P8_FUSION, MASK_HARD_FLOAT,
MASK_HTM, MASK_MFCRF, MASK_MMA, MASK_MULHW, MASK_MULTIPLE,
MASK_NO_UPDATE.

gcc/
* config/rs6000/aix71.h (TARGET_DEFAULT): Replace MASK_MFCRF with
OPTION_MASK_MFCRF.
* config/rs6000/darwin.h (TARGET_DEFAULT): Replace MASK_MULTIPLE with
OPTION_MASK_MULTIPLE.
* config/rs6000/darwin64-biarch.h (TARGET_DEFAULT): Same.
* config/rs6000/default64.h (TARGET_DEFAULT): Replace MASK_MFCRF with
OPTION_MASK_MFCRF.
* config/rs6000/eabi.h (TARGET_DEFAULT): Replace MASK_EABI with
OPTION_MASK_EABI.
* config/rs6000/eabialtivec.h (TARGET_DEFAULT): Same.
* config/rs6000/linuxaltivec.h (TARGET_DEFAULT): Replace
MASK_ALTIVEC with OPTION_MASK_ALTIVEC.
* config/rs6000/rs6000-cpus.def (MASK_ALTIVEC, MASK_CMPB,
MASK_CRYPTO, MASK_DFP, MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI,
MASK_FLOAT128_KEYWORD, MASK_FLOAT128_HW, MASK_FPRND,
MASK_P8_FUSION, MASK_HARD_FLOAT, MASK_HTM, MASK_ISEL, MASK_MFCRF,
MASK_MMA, MASK_MULHW, MASK_MULTIPLE, MASK_NO_UPDATE):
Replace with
OPTION_MASK_ALTIVEC, OPTION_MASK_CMPB, OPTION_MASK_CRYPTO,
OPTION_MASK_DFP, OPTION_MASK_DIRECT_MOVE, OPTION_MASK_DLMZB,
OPTION_MASK_EABI, OPTION_MASK_FLOAT128_KEYWORD,
OPTION_MASK_FLOAT128_HW, OPTION_MASK_FPRND, OPTION_MASK_P8_FUSION,
OPTION_MASK_HARD_FLOAT, OPTION_MASK_HTM, OPTION_MASK_ISEL,
OPTION_MASK_MFCRF, OPTION_MASK_MMA, OPTION_MASK_MULHW,
OPTION_MASK_MULTIPLE, OPTION_MASK_NO_UPDATE.
* config/rs6000/rs6000.cc (rs6000_darwin_file_start): Replace
MASK_MFCRF, MASK_ALTIVEC with OPTION_MASK_MFCRF, OPTION_MASK_ALTIVEC.
* config/rs6000/rs6000.h (TARGET_DEFAULT): Replace MASK_MULTIPLE
with OPTION_MASK_MULTIPLE.
(MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO, MASK_DFP,
MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_FPRND, MASK_P8_FUSION, MASK_HARD_FLOAT,
MASK_HTM, MASK_ISEL, MASK_MFCRF, MASK_MMA, MASK_MULHW,
MASK_MULTIPLE, MASK_NO_UPDATE): Delete.
* config/rs6000/vxworks.h (TARGET_DEFAULT): Replace MASK_EABI
with OPTION_MASK_EABI.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 57e07bcc65ee..3f7e6e380ca8 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,13 +135,14 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF)
+#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index b5cef42610f7..ec02022c6a9f 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 57b0fab084e3..a53e567f8b73 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 4bf0feef2f8e..f3a81404eff3 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -22,14 +22,16 @@ along with GCC; see the file COPYING3.  If not see
 #incl

[PATCH,RS6000 5/5] Replace MASK_ usage with OPTION_MASK_

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 5/5] Replace MASK_ usage with OPTION_MASK_

This continues the changes of replacing the MASK_ defines
with their OPTION_MASK_ equivalents.

This patch removes the defines for
MASK_P8_VECTOR, MASK_P9_VECTOR, MASK_P9_MISC, MASK_POPCNTB,
MASK_POPCNTD, MASK_PPC_GFXOPT, MASK_PPC_GPOPT, MASK_RECIP_PRECISION,
MASK_SOFT_FLOAT, MASK_VSX, MASK_POWER10, MASK_P10_FUSION.

gcc/
* config/rs6000/aix71.h (MASK_PPC_GPOPT, MASK_PPC_GFXOPT): Replace with
OPTION_MASK_PPC_GPOPT, OPTION_MASK_PPC_GFXOPT.
* config/rs6000/darwin.h (MASK_PPC_GFXOPT): Replace with
OPTION_MASK_PPC_GFXOPT.
* config/rs6000/darwin64-biarch.h (MASK_PPC_GFXOPT): Same.
* config/rs6000/default64.h (MASK_PPC_GPOPT, MASK_PPC_GFXOPT): Replace 
with
OPTION_MASK_PPC_GPOPT, OPTION_MASK_PPC_GFXOPT.
* config/rs6000/rs6000-c.cc: Update comment.
* config/rs6000/rs6000-cpus.def: Update RS6000_CPU macro calls.
* config/rs6000/rs6000.cc (rs6000_darwin_file_start): Replace
MASK_PPC_GPOPT with OPTION_MASK_PPC_GPOPT.
(rs6000_builtin_mask_names): Replace MASK_PPC_GFXOPT, MASK_POPCNTB
with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_POPCNTB.
* config/rs6000/rs6000.h (MASK_P8_VECTOR, MASK_P9_VECTOR,
MASK_P9_MISC, MASK_POPCNTB, MASK_POPCNTD, MASK_PPC_GFXOPT,
MASK_PPC_GPOPT, MASK_RECIP_PRECISION, MASK_SOFT_FLOAT,
MASK_VSX, MASK_POWER10, MASK_P10_FUSION): Delete.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 3f7e6e380ca8..323d7c884d18 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,14 +135,15 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT \
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | OPTION_MASK_MFCRF)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index ec02022c6a9f..6a8845eb3bb7 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index a53e567f8b73..6515bcc8bf5a 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index f3a81404eff3..0bec94935e2b 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -28,10 +28,10 @@ along with GCC; see the file COPYING3.  If not see
| MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
 #else
 #undef TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_PPC_GFXOPT | MASK_PPC_GPOPT \
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
 #endif
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 4c99afc761ae..0d13645040ff 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -382,11 +382,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags,
 
  3. If either of the above two conditions apply except that the
TARGET_DEFAULT macro is defined to equal zero, and
TARGET_POWERPC64 and
a) BYTES_BIG_ENDIAN and the flag to be enabled is either
-  MASK_PPC_GFXOPT or MASK_POWERPC64 (flags for "powerpc64"
+  OPTION_MASK_PPC_GFXOPT or MASK_POWERPC64 (flags for "powerpc64"
   

[PATCH, RS6000 3/5] Rework the RS6000_BTM defines, continued.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH, RS6000 3/5] Rework the RS6000_BTM defines, continued.

The RS6000_BTM_ definitions are mostly unused after
the rs6000 builtin code was reworked.   This cleans
up the remaining RS6000_BTM_ references by replacing
them with their OPTION_MASK_ equivalents.

This patch removes the defines
RS6000_BTM_MODULO, RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB,
RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR, RS6000_BTM_P9_VECTOR,
RS6000_BTM_P9_MISC, RS6000_BTM_CRYPTO, RS6000_BTM_HTM,
RS6000_BTM_FRE.

gcc/
* config/rs6000/rs6000.cc (RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB,
RS6000_BTM_VSX, RS6000_BTM_FRE, RS6000_BTM_P8_VECTOR,
RS6000_BTM_P9_VECTOR, RS6000_BTM_P9_MISC, RS6000_BTM_MODULO,
RS6000_BTM_CRYPTO, RS6000_BTM_HTM): Replace with OPTION_MASK_ALTIVEC,
OPTION_MASK_CMPB, OPTION_MASK_VSX, OPTION_MASK_POPCNTB,
OPTION_MASK_P8_VECTOR, OPTION_MASK_P9_VECTOR, OPTION_MASK_P9_MISC,
OPTION_MASK_MODULO, OPTION_MASK_CRYPTO, OPTION_MASK_HTM.
* config/rs6000/rs6000.h (RS6000_BTM_MODULO, RS6000_BTM_ALTIVEC,
RS6000_BTM_CMPB, RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR,
RS6000_BTM_P9_VECTOR, RS6000_BTM_P9_MISC, RS6000_BTM_CRYPTO,
RS6000_BTM_HTM, RS6000_BTM_FRE): Remove.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 253110910bfa..6b7a6db9a445 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3377,27 +3377,27 @@ darwin_rs6000_override_options (void)
bits, and some options are no longer in target_flags.  */
 
 HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
-  return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
- | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
- | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
- | ((TARGET_FRE)   ? RS6000_BTM_FRE   : 0)
+  return (((TARGET_ALTIVEC)? OPTION_MASK_ALTIVEC: 0)
+ | ((TARGET_CMPB)  ? OPTION_MASK_CMPB   : 0)
+ | ((TARGET_VSX)   ? OPTION_MASK_VSX: 0)
+ | ((TARGET_FRE)   ? OPTION_MASK_POPCNTB: 0)
  | ((TARGET_FRES)  ? OPTION_MASK_PPC_GFXOPT : 0)
  | ((TARGET_FRSQRTE)   ? OPTION_MASK_PPC_GFXOPT : 0)
  | ((TARGET_FRSQRTES)  ? OPTION_MASK_POPCNTB: 0)
  | ((TARGET_POPCNTD)   ? OPTION_MASK_POPCNTD: 0)
  | ((rs6000_cpu == PROCESSOR_CELL) ? OPTION_MASK_FPRND  : 0)
- | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
- | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
- | ((TARGET_P9_MISC)   ? RS6000_BTM_P9_MISC   : 0)
- | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
+ | ((TARGET_P8_VECTOR) ? OPTION_MASK_P8_VECTOR  : 0)
+ | ((TARGET_P9_VECTOR) ? OPTION_MASK_P9_VECTOR  : 0)
+ | ((TARGET_P9_MISC)   ? OPTION_MASK_P9_MISC: 0)
+ | ((TARGET_MODULO)? OPTION_MASK_MODULO : 0)
  | ((TARGET_64BIT) ? MASK_64BIT : 0)
  | ((TARGET_POWERPC64) ? MASK_POWERPC64 : 0)
- | ((TARGET_CRYPTO)? RS6000_BTM_CRYPTO: 0)
- | ((TARGET_HTM)   ? RS6000_BTM_HTM   : 0)
+ | ((TARGET_CRYPTO)? OPTION_MASK_CRYPTO : 0)
+ | ((TARGET_HTM)   ? OPTION_MASK_HTM: 0)
  | ((TARGET_DFP)   ? OPTION_MASK_DFP: 0)
  | ((TARGET_HARD_FLOAT)? OPTION_MASK_SOFT_FLOAT : 0)
  | ((TARGET_LONG_DOUBLE_128
  && TARGET_HARD_FLOAT
  && !TARGET_IEEEQUAD)  ? OPTION_MASK_MULTIPLE   : 0)
@@ -24044,23 +24044,23 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
 };
 
 /* Builtin mask mapping for printing the flags.  */
 static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
 {
-  { "altivec",  RS6000_BTM_ALTIVEC,false, false },
-  { "vsx",  RS6000_BTM_VSX,false, false },
-  { "fre",  RS6000_BTM_FRE,false, false },
+  { "altivec",  OPTION_MASK_ALTIVEC,   false, false },
+  { "vsx",  OPTION_MASK_VSX,   false, false },
+  { "fre",  OPTION_MASK_POPCNTB,   false, false },
   { "fres", OPTION_MASK_PPC_GFXOPT, false, false },
   { "frsqrte",  OPTION_MASK_PPC_GFXOPT, false, false },
   { "frsqrtes", OPTION_MASK_POPCNTB,   false, false },
   { "popcntd",  OPTION_MASK_POPCNTD,   false, false },
   { "cell", OPTION_MASK_FPRND, false, false },
-  { "power8-vector",RS6000_BTM_P8_VECTOR,  false, false },
-  { "power9-vector",RS6000_BTM_P9_VECTOR,  false, false },
-  { "power9-misc",  RS6000_BTM_P9_MISC,false, false },
-  { "crypto",  

[PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

The RS6000_BTM_ definitions are mostly unused after the rs6000
builtin code was reworked.  The remaining references can be replaced
with the OPTION_MASK_ and MASK_ equivalents.

This patch removes the defines:
RS6000_BTM_FRES, RS6000_BTM_FRSQRTE, RS6000_BTM_FRSQRTES,
RS6000_BTM_POPCNTD, RS6000_BTM_CELL, RS6000_BTM_DFP,
RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_64BIT,
RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW,
RS6000_BTM_MMA, RS6000_BTM_P10.

I note that the BTM -> OPTION_MASK mappings are not always 1-to-1.
In particular, the BTM_FRES and BTM_FRSQRTE values were both mapped to
OPTION_MASK_PPC_GFXOPT, while the BTM_FRE and BTM_FRSQRTES values were
both mapped to OPTION_MASK_POPCNTB.  I spent quite a bit of time
double-checking these, since they looked like copy/paste errors.  I split
some of these changes out into a subsequent patch to limit the amount
of potential confusion in any particular patch.
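
The many-to-one mapping described above can be illustrated with a small
sketch.  The DEMO_ names and bit values below are hypothetical and chosen
only for illustration (the real OPTION_MASK_ values come from the options
machinery); the function only mirrors the shape of
rs6000_builtin_mask_calculate for the four flags in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only.  */
#define DEMO_MASK_PPC_GFXOPT  (UINT64_C (1) << 0)
#define DEMO_MASK_POPCNTB     (UINT64_C (1) << 1)

static_assert (DEMO_MASK_PPC_GFXOPT != DEMO_MASK_POPCNTB,
	       "the two demo bits must be distinct");

/* Four input conditions share two output bits: FRE and FRSQRTES both
   set the POPCNTB bit, FRES and FRSQRTE both set the PPC_GFXOPT bit.  */
static uint64_t
demo_mask_calculate (int fre, int fres, int frsqrte, int frsqrtes)
{
  return ((fre ? DEMO_MASK_POPCNTB : 0)
	  | (fres ? DEMO_MASK_PPC_GFXOPT : 0)
	  | (frsqrte ? DEMO_MASK_PPC_GFXOPT : 0)
	  | (frsqrtes ? DEMO_MASK_POPCNTB : 0));
}
```

Under these assumptions, enabling only FRE yields the same mask as enabling
only FRSQRTES, so the mask alone cannot tell the two apart.  That is why the
replacement looks like a copy/paste error at first glance even though it
preserves the existing behavior.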

gcc/
* config/rs6000/rs6000-c.cc: Update comments.
* config/rs6000/rs6000.cc (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_DFP,
RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_FLOAT128,
RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10): Replace
with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_PPC_GFXOPT,
OPTION_MASK_POPCNTB, OPTION_MASK_POPCNTD,
OPTION_MASK_FPRND, MASK_64BIT, MASK_POWERPC64,
OPTION_MASK_DFP, OPTION_MASK_SOFT_FLOAT, OPTION_MASK_MULTIPLE,
OPTION_MASK_FLOAT128_KEYWORD, OPTION_MASK_FLOAT128_HW,
OPTION_MASK_MMA, OPTION_MASK_POWER10.
* config/rs6000/rs6000.h (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128,
RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10): Delete.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 9c8cbd7a66e4..4c99afc761ae 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -594,13 +594,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
   /* options from the builtin masks.  */
-  /* Note that RS6000_BTM_CELL is enabled only if (rs6000_cpu ==
- PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & RS6000_BTM_CELL) != 0)
+  /* Note that OPTION_MASK_FPRND is enabled only if
+ (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
+  if ((bu_mask & OPTION_MASK_FPRND) != 0)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d4defc855d02..253110910bfa 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3381,32 +3381,32 @@ rs6000_builtin_mask_calculate (void)
 {
   return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
  | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
  | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
  | ((TARGET_FRE)   ? RS6000_BTM_FRE   : 0)
- | ((TARGET_FRES)  ? RS6000_BTM_FRES  : 0)
- | ((TARGET_FRSQRTE)   ? RS6000_BTM_FRSQRTE   : 0)
- | ((TARGET_FRSQRTES)  ? RS6000_BTM_FRSQRTES  : 0)
- | ((TARGET_POPCNTD)   ? RS6000_BTM_POPCNTD   : 0)
- | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL  : 0)
+ | ((TARGET_FRES)  ? OPTION_MASK_PPC_GFXOPT : 0)
+ | ((TARGET_FRSQRTE)   ? OPTION_MASK_PPC_GFXOPT : 0)
+ | ((TARGET_FRSQRTES)  ? OPTION_MASK_POPCNTB: 0)
+ | ((TARGET_POPCNTD)   ? OPTION_MASK_POPCNTD: 0)
+ | ((rs6000_cpu == PROCESSOR_CELL) ? OPTION_MASK_FPRND  : 0)
  | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
  | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
  | ((TARGET_P9_MISC)   ? RS6000_BTM_P9_MISC   : 0)
  | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
- | ((TARGET_64BIT) ? RS6000_BTM_64BIT : 0)
- | ((TARGET_POWERPC64) ? RS6000_BTM_POWERPC64 : 0)
+ | ((TARGET_64BIT) ? MASK_64BIT : 0)
+ | ((TARGET_POWERPC64) ? MASK_POWERPC64 : 0)
  | ((TARGET_CRYPTO)? RS6000_BTM

[PATCH,RS6000 1/5] Clean-up MASK_ and RS6000_BTM_ definitions.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 1/5] Clean-up MASK_ and RS6000_BTM_ definitions.

Hi,

This patch removes the defines that are no longer used, and
updates the comment for the set of MASK_ defines.

This patch removes the defines for
MASK_REGNAMES, MASK_PROTOTYPE, RS6000_BTM_ALWAYS, RS6000_BTM_COMMON.

gcc/
* config/rs6000/rs6000.h (RS6000_BTM_COMMON, RS6000_BTM_ALWAYS,
MASK_REGNAMES, OPTION_MASK_REGNAMES, MASK_PROTOTYPE,
OPTION_MASK_PROTOTYPE, MASK_UPDATE, OPTION_MASK_UPDATE): Remove.

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3b8941a86584..2ff17a16e43c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -503,12 +503,13 @@ extern int rs6000_vector_align[];
answers if the arguments are not in the normal range.  */
 #define TARGET_MINMAX  (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
 /* In switching from using target_flags to using rs6000_isa_flags, the options
-   machinery creates OPTION_MASK_ instead of MASK_.  For now map
-   OPTION_MASK_ back into MASK_.  */
+   machinery creates OPTION_MASK_ instead of MASK_.  The MASK_
+   options that have not yet been replaced by their OPTION_MASK_
+   equivalents are defined here.  */
 #define MASK_ALTIVEC   OPTION_MASK_ALTIVEC
 #define MASK_CMPB  OPTION_MASK_CMPB
 #define MASK_CRYPTO OPTION_MASK_CRYPTO
 #define MASK_DFP   OPTION_MASK_DFP
 #define MASK_DIRECT_MOVE   OPTION_MASK_DIRECT_MOVE
@@ -534,11 +535,10 @@ extern int rs6000_vector_align[];
 #define MASK_PPC_GFXOPT OPTION_MASK_PPC_GFXOPT
 #define MASK_PPC_GPOPT OPTION_MASK_PPC_GPOPT
 #define MASK_RECIP_PRECISION   OPTION_MASK_RECIP_PRECISION
 #define MASK_SOFT_FLOAT OPTION_MASK_SOFT_FLOAT
 #define MASK_STRICT_ALIGN  OPTION_MASK_STRICT_ALIGN
-#define MASK_UPDATE OPTION_MASK_UPDATE
 #define MASK_VSX   OPTION_MASK_VSX
 #define MASK_POWER10   OPTION_MASK_POWER10
 #define MASK_P10_FUSION OPTION_MASK_P10_FUSION
 
 #ifndef IN_LIBGCC2
@@ -551,18 +551,10 @@ extern int rs6000_vector_align[];
 
 #ifdef TARGET_LITTLE_ENDIAN
 #define MASK_LITTLE_ENDIAN OPTION_MASK_LITTLE_ENDIAN
 #endif
 
-#ifdef TARGET_REGNAMES
-#define MASK_REGNAMES  OPTION_MASK_REGNAMES
-#endif
-
-#ifdef TARGET_PROTOTYPE
-#define MASK_PROTOTYPE OPTION_MASK_PROTOTYPE
-#endif
-
 #ifdef TARGET_MODULO
 #define RS6000_BTM_MODULO  OPTION_MASK_MODULO
 #endif
 
 
@@ -2250,11 +2242,10 @@ extern int frame_pointer_needed;
 
 
 /* Builtin targets.  For now, we reuse the masks for those options that are in
target flags, and pick a random bit for ldbl128, which isn't in
target_flags.  */
-#define RS6000_BTM_ALWAYS  0   /* Always enabled.  */
 #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors.  */
 #define RS6000_BTM_CMPB MASK_CMPB   /* ISA 2.05: compare bytes.  */
 #define RS6000_BTM_VSX MASK_VSX/* VSX (vector/scalar).  */
 #define RS6000_BTM_P8_VECTOR   MASK_P8_VECTOR  /* ISA 2.07 vector.  */
 #define RS6000_BTM_P9_VECTOR   MASK_P9_VECTOR  /* ISA 3.0 vector.  */
@@ -2275,32 +2266,10 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_FLOAT128 MASK_FLOAT128_KEYWORD /* IEEE 128-bit float.  */
 #define RS6000_BTM_FLOAT128_HW MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
 #define RS6000_BTM_MMA MASK_MMA/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10 MASK_POWER10
 
-#define RS6000_BTM_COMMON  (RS6000_BTM_ALTIVEC \
-| RS6000_BTM_VSX   \
-| RS6000_BTM_P8_VECTOR \
-| RS6000_BTM_P9_VECTOR \
-| RS6000_BTM_P9_MISC   \
-| RS6000_BTM_MODULO\
-| RS6000_BTM_CRYPTO\
-| RS6000_BTM_FRE   \
-| RS6000_BTM_FRES  \
-| RS6000_BTM_FRSQRTE   \
-| RS6000_BTM_FRSQRTES  \
-| RS6000_BTM_HTM   \
-| RS6000_BTM_POPCNTD   \
-| RS6000_BTM_CELL  \
-| RS6000_BTM_DFP   \
-| RS6000_BTM_HARD_FLOAT\
-| RS6000_BTM_LDBL128   \
-   

[PATCH,RS6000 0/5] Clean up MASK_ and RS6000_BTM_ defines

2022-06-06 Thread will schmidt via Gcc-patches
Hi,
  This series cleans up the assorted MASK_, OPTION_MASK_,
and RS6000_BTM_ defines that we have sprinkled through the
rs6000 target code.

The MASK_ entries are currently defined as their OPTION_MASK_
equivalents since their introduction when the rs6000_isa_flags was
added via commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 .
This series replaces references to the MASK_ entries with their
OPTION_MASK equivalents as much as possible.

The RS6000_BTM_ defines are mostly unused since the built-in rewrites
from late 2021 and early 2022, and the remaining usage is
straightforward to replace with OPTION_MASK_ values.

The OPTION_MASK_ definitions themselves remain.

Due to size and to keep some of these changes clean I have split this
into several parts.

After this series there are a few remaining MASK_ entries
(MASK_POWERPC64, MASK_64BIT and MASK_LITTLE_ENDIAN) which are
conditionally defined, and potentially more invasive to resolve.
Those are deliberately not addressed as part of this series.

This has cleanly regtested (no functional change).  When approved
this series will be committed as a group, though it should be
bisectable.

OK for trunk?

1/5: Remove unused defines and touch up comments.
2/5: Rework RS6000_BTM_foo defines, part 1.
3/5: Rework RS6000_BTM_foo defines, part 2.
4/5: Rework MASK_foo defines, part 1.
5/5: Rework MASK_foo defines, part 2.
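
As a minimal sketch of why the MASK_ to OPTION_MASK_ replacement is
mechanical: when the legacy name is a plain alias, a use written against
either spelling tests the same bit.  The DEMO_ names and the bit position
below are hypothetical and not taken from the rs6000 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit value, for illustration only.  */
#define DEMO_OPTION_MASK_ALTIVEC  (UINT64_C (1) << 3)

/* The legacy spelling is a plain alias for the new one.  */
#define DEMO_MASK_ALTIVEC  DEMO_OPTION_MASK_ALTIVEC

static_assert (DEMO_MASK_ALTIVEC == DEMO_OPTION_MASK_ALTIVEC,
	       "alias and target must name the same bit");

/* A use written against the legacy name.  */
static int
demo_has_altivec_legacy (uint64_t isa_flags)
{
  return (isa_flags & DEMO_MASK_ALTIVEC) != 0;
}

/* The same use after the mechanical rename.  */
static int
demo_has_altivec_renamed (uint64_t isa_flags)
{
  return (isa_flags & DEMO_OPTION_MASK_ALTIVEC) != 0;
}
```

The conditionally defined entries (MASK_POWERPC64, MASK_64BIT and
MASK_LITTLE_ENDIAN) do not fit this simple picture, because the alias may
not exist in every configuration, which is one reason they are left for
later.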



Re: [PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-26 Thread will schmidt via Gcc-patches
On Thu, 2022-05-26 at 13:31 -0500, Segher Boessenkool wrote:
> > > 



> On Thu, May 26, 2022 at 09:40:18AM -0500, will schmidt wrote:
> > On Thu, 2022-05-26 at 05:47 -0500, Segher Boessenkool wrote:
> 
> > I'll dig a bit more, but would handle that in a separate
> > patch.
> 
> Can you please make a new patch series that just does everything?  This
> is so much easier to handle for everyone, even you yourself :-)

Yes, will do.  Thanks.


-Will


> 
> First some small preparatory patches; then the long *boring* patches
> that are the meat of the matter, but are completely mechanical
> (formatting notwithstanding), so are easy to review; and then some more
> small patches to do final cleanup.
> 
> So each patch will be easy to write, write a commit message for, write a
> changelog for, and easy to review as well.  Long patches are no problem
> at all if they are completely boring!
> 
> 
> Segher



Re: [PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-26 Thread will schmidt via Gcc-patches
On Thu, 2022-05-26 at 05:47 -0500, Segher Boessenkool wrote:
> Hi!
> 

Hi, 
Thanks Kewen and Segher for the reviews.  Additional comments below.


> On Thu, May 26, 2022 at 03:01:37PM +0800, Kewen.Lin wrote:
> > on 2022/5/26 14:12, Kewen.Lin via Gcc-patches wrote:
> > > on 2022/5/26 04:25, will schmidt via Gcc-patches wrote:
> > > > We have an assortment of MASK and OPTION_MASK #defines throughout
> > > > the rs6000 code, MASK_ALTIVEC and OPTION_MASK_ALTIVEC as an example.
> > > > 
> > > > We currently #define the MASK_ entries to their OPTION_MASK_
> > > > equivalents so the two names could be used interchangeably.
> > > > 
> > > > The mapping is in place from when we switched from using
> > > > target_flags to rs6000_isa_flags via
> > > > commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 in 2012.
> > > > 
> > > > This patch converts the references for most of the lingering MASK_*
> > > > values to OPTION_MASK_*  and removes the now redundant defines.
> > > 
> > > Nice, thanks for the cleanup!
> 
> +1
> 
> > > I found there are still some masks left:
> > > 
> > > MASK_POWERPC64, MASK_64BIT and MASK_LITTLE_ENDIAN.
> > > 
> > > Is there one part 4 for them?  Or is there some particular reason
> > > not to clean up them?
> > 
> > aha, I see.  Those three are conditional definitions, I agree it's
> > better to leave them alone. :)
> 
> It is much better to untangle this mess, and fix it :-)  But that is
> (potentially) a bigger job, of course, so let's not balloon this
> patch.

Right.  I have looked briefly at those, and was not convinced those
three would be trivial to rework.  In the interest of incremental
progress I didn't address those in this set.   :-)   If anything I'll
address those in a later patch, which could be part 4 but is more likely
to be a different patchset.

> 
> > > > -{ "970", "ppc970", MASK_PPC_GPOPT | MASK_MFCRF | MASK_POWERPC64 },
> > > > +{ "970", "ppc970", OPTION_MASK_PPC_GPOPT | OPTION_MASK_MFCRF | MASK_POWERPC64 },
> > 
> > Nit: This line is too long.
> 
Yup, I missed that one. :-)

> Yeah, the longer names are a bit annoying in any case.  We'll get used
> to it (if those long lines are fixed ;-) )

Agree.  I would not be opposed to somewhat shorter names for these, but
naming is hard, and the long names are existing and sufficient for the
moment.

> 
> > Nit: Some of these BTM lines below exceed 80 characters, a few
> > already existed previously.
> 
> Yes, and it is easily avoidable in this case.  Most of these comments
> have no content at all, and the rest could just be on separate lines.
> 
> But, are those builtin masks still used at all?  Can't we just use the
> option masks where they still are?  The builtins do not use them
> anymore :-)

They are still referenced in the rs6000_builtin_mask_calculate() function,
which is used to assign a value to rs6000_builtin_mask, which is still
in use.  I have not yet dug deeper there, but I agree that it appears to be
used only to print the current options, so it could probably be safely
eliminated.  I'll dig a bit more, but would handle that in a separate
patch.
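
For readers unfamiliar with that printing path, here is a rough sketch of
what "only used to print the current options" means: a name/mask table is
walked and the names of the set bits are emitted.  The struct, table, and
function below are hypothetical stand-ins, not the actual
rs6000_builtin_mask_names code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a name/mask table entry.  */
struct demo_opt_mask
{
  const char *name;
  uint64_t mask;
};

/* Hypothetical bit assignments, for illustration only.  */
static const struct demo_opt_mask demo_mask_names[] =
{
  { "altivec", UINT64_C (1) << 0 },
  { "vsx",     UINT64_C (1) << 1 },
  { "htm",     UINT64_C (1) << 2 },
};

/* Write a comma-separated list of the names whose bits are set in MASK
   into BUF (LEN bytes).  Return the number of characters written.  */
static size_t
demo_format_mask (uint64_t mask, char *buf, size_t len)
{
  size_t used = 0;

  assert (buf != NULL && len > 0);
  buf[0] = '\0';
  for (size_t i = 0;
       i < sizeof demo_mask_names / sizeof demo_mask_names[0]; i++)
    {
      if ((mask & demo_mask_names[i].mask) == 0)
	continue;
      int n = snprintf (buf + used, len - used, "%s%s",
			used ? ", " : "", demo_mask_names[i].name);
      if (n < 0 || (size_t) n >= len - used)
	break;			/* Out of room; stop early.  */
      used += (size_t) n;
    }
  return used;
}
```

If the mask has no other consumer, both the table and the function that
computes the mask become candidates for removal.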

Thanks
-Will


> 
> 
> Segher



[PATCH, rs6000] Clean up the option_mask defines (part 3)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 3)

Hi,

Per code review, the MASK_REGNAMES, OPTION_MASK_REGNAMES,
MASK_PROTOTYPE, OPTION_MASK_PROTOTYPE options are not used
elsewhere in the codebase.  Thus it should be safe to remove them.

This includes an update to a nearby comment to hint that most
of the MASK_ options have now been replaced with their
OPTION_MASK_ equivalents.

Regtested OK on power10.  OK for trunk?

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index dcf632c1f1ad..fe77a343d2e1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -503,12 +503,13 @@ extern int rs6000_vector_align[];
answers if the arguments are not in the normal range.  */
 #define TARGET_MINMAX  (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
 /* In switching from using target_flags to using rs6000_isa_flags, the options
-   machinery creates OPTION_MASK_ instead of MASK_.  For now map
-   OPTION_MASK_ back into MASK_.  */
+   machinery creates OPTION_MASK_ instead of MASK_.  The MASK_
+   options that have not yet been replaced by their OPTION_MASK_
+   equivalents are defined here.  */
 #define MASK_STRICT_ALIGN  OPTION_MASK_STRICT_ALIGN
 
 #ifndef IN_LIBGCC2
 #define MASK_POWERPC64 OPTION_MASK_POWERPC64
 #endif
@@ -519,18 +520,10 @@ extern int rs6000_vector_align[];
 
 #ifdef TARGET_LITTLE_ENDIAN
 #define MASK_LITTLE_ENDIAN OPTION_MASK_LITTLE_ENDIAN
 #endif
 
-#ifdef TARGET_REGNAMES
-#define MASK_REGNAMES  OPTION_MASK_REGNAMES
-#endif
-
-#ifdef TARGET_PROTOTYPE
-#define MASK_PROTOTYPE OPTION_MASK_PROTOTYPE
-#endif
-
 #ifdef TARGET_MODULO
 #define RS6000_BTM_MODULO  OPTION_MASK_MODULO
 #endif
 
 



Re: [PATCH, rs6000] Clean up the option_mask defines (part 2)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 2)

Hi,
This patch reworks most of the lingering MASK_*
values to OPTION_MASK_* and removes the now redundant defines.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR, RS6000_BTM_P9_VECTOR,
RS6000_BTM_P9_MISC, RS6000_BTM_HTM, RS6000_BTM_POPCNTD,
RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA,
RS6000_BTM_P10): Rework defines to use OPTION_MASK_.
(MASK_DFP, MASK_DIRECT_MOVE, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_P8_FUSION, MASK_HARD_FLOAT, MASK_HTM,
MASK_MMA, MASK_MULTIPLE, MASK_NO_UPDATE, MASK_P8_VECTOR,
MASK_P9_VECTOR, MASK_P9_MISC, MASK_POPCNTD, MASK_RECIP_PRECISION,
MASK_SOFT_FLOAT, MASK_UPDATE, MASK_VSX, MASK_POWER10,
MASK_P10_FUSION): Remove unused defines.
* config/rs6000/rs6000-cpus.def (RS6000_CPU): Rework macro calls to
use OPTION_MASK_ defines.
* config/rs6000/darwin.h (TARGET_DEFAULT): Update define to use
OPTION_MASK_MULTIPLE.
* config/rs6000/darwin64-biarch.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 86556ccbbf58..6a8845eb3bb7 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h b/gcc/config/rs6000/darwin64-biarch.h
index 6a700c61c4c2..6515bcc8bf5a 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index ca78bd8cf89f..4301b1bcb120 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -174,29 +174,31 @@
 
RS6000_CPU (NAME, CPU, FLAGS)
 
where the arguments are the fields of struct rs6000_ptt.  */
 
-RS6000_CPU ("401", PROCESSOR_PPC403, MASK_SOFT_FLOAT)
-RS6000_CPU ("403", PROCESSOR_PPC403, MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
-RS6000_CPU ("405", PROCESSOR_PPC405, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
-   | OPTION_MASK_DLMZB)
+RS6000_CPU ("401", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT)
+RS6000_CPU ("403", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
+RS6000_CPU ("405", PROCESSOR_PPC405, OPTION_MASK_SOFT_FLOAT
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
 RS6000_CPU ("405fp", PROCESSOR_PPC405, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("440", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("440", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("440fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("464", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("464", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("464fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476", PROCESSOR_PPC476, MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476fp", PROCESSOR_PPC476, OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476", PROCESSOR_PPC476,
+   OPTION_MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF
+   | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND | OPTION_MASK_CMPB
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476fp", PROCESSOR_PPC476,
+   OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB
+   | OPTION_MASK_FPRND | OPTION_MASK_CMPB | OPTION_MASK_MULHW
+   | OPTION_MASK_DLMZB)
 RS6000_CPU ("505", PROCESSOR_MPCCORE, 0)
-RS6000_CPU ("601", PROCESSOR_PPC601, MASK_MULTIPLE)
+RS6000_C

[PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines

Hi,

We have an assortment of MASK and OPTION_MASK #defines throughout
the rs6000 code, MASK_ALTIVEC and OPTION_MASK_ALTIVEC as an example.

We currently #define the MASK_ entries to their OPTION_MASK_
equivalents so the two names could be used interchangeably.

The mapping is in place from when we switched from using
target_flags to rs6000_isa_flags via
commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 in 2012.

This patch converts the references for most of the lingering MASK_*
values to OPTION_MASK_*  and removes the now redundant defines.

I have split this into multiple parts due to size.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO,
MASK_DLMZB, MASK_EABI, MASK_FPRND, MASK_ISEL,
MASK_MFCRF, MASK_MULHW, MASK_POPCNTB, MASK_PPC_GFXOPT,
MASK_PPC_GPOPT): Remove defines.
(RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB, RS6000_BTM_CRYPTO,
RS6000_BTM_FRE, RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_CELL): Redefine using
OPTION_MASK_ instead of MASK_.
* rs6000-cpus.def (RS6000_CPU): Update macro calls to use
OPTION_MASK_ instead of MASK_.
* rs6000.cc (rs6000_darwin_file_start): Update mapping[] table
entries to use OPTION_MASK_PPC_GPOPT, OPTION_MASK_MFCRF,
OPTION_MASK_ALTIVEC instead of their MASK_ variants.
* rs6000-c.cc: Update comment to reference OPTION_MASK_GFXOPT.
* aix71.h (TARGET_DEFAULT): Update define to use OPTION_MASK_
instead of MASK_.
* darwin.h (TARGET_DEFAULT): Same.
* darwin64-biarch.h (TARGET_DEFAULT): Same.
* default64.h (TARGET_DEFAULT): Same.
* eabi.h (TARGET_DEFAULT): Same.
* eabialtivec.h (TARGET_DEFAULT): Same.
* linuxaltivec.h (TARGET_DEFAULT): Same.
* vxworks.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 57e07bcc65ee..8c2ec5d36375 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,13 +135,15 @@ do { \
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index b5cef42610f7..86556ccbbf58 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h b/gcc/config/rs6000/darwin64-biarch.h
index 57b0fab084e3..6a700c61c4c2 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 4bf0feef2f8e..08b58c965d19 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -27,9 +27,10 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_DEFAULT (ISA_2_7_MASKS_SERVER | MASK_POWERPC64 | MASK_64BIT | MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
 #else
 #undef TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_PPC_GFXOPT | MASK_PPC_GPOPT | MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
 #endif
diff --git a/gcc/config/rs6000/eabi.h b/gcc/config/rs6000/eabi.h
index e58283fe5d4e..367de7bc27

Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.

2022-05-18 Thread will schmidt via Gcc-patches
On Tue, 2022-05-17 at 23:15 -0400, Michael Meissner wrote:
> On Fri, May 13, 2022 at 01:20:30PM -0500, will schmidt wrote:
> > On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> > > 
> > > 



> > > gcc/
> > >   PR target/103109
> > >   * config/rs6000/rs6000.md (su_int32): New code attribute.
> > >   (mul3): Convert from define_expand to
> > >   define_insn_and_split.
> > >   (maddld4): Add generator function.
> > 
> > -(define_insn "*maddld4"
> > +(define_insn "maddld4"
> > 
> > Is the removal of the "*" what is meant by adding a generator?  (That's
> > terminology that I'm not immediately familiar with.)
> 
> Yes.  If you have a pattern:
> 
>   (define_insn "foosi2"
> [(set (match_operand:SI 0 "register_operand" "=r")
>   (foo:SI (match_operand:SI 1 "register_operand" "r")))]
>   ""
>   "foo %0,%1")
> 
> It creates a 'gen_foosi2' function that has 2 arguments, and it makes the
> insn listed.
> 
> It then has support for insn recognition and output.
> 
> If the pattern starts with a '*', there is no 'gen_foosi2' function created,
> but the insn recognition and output are still done.
> 
> In practice, you typically use the '*' names for patterns that are used as
> the targets of combination, or separate insns for different machines.
> 
> Here is the verbiage from rtl.texi:
> 
> These names serve one of two purposes.  The first is to indicate that the
> instruction performs a certain standard job for the RTL-generation
> pass of the compiler, such as a move, an addition, or a conditional
> jump.  The second is to help the target generate certain target-specific
> operations, such as when implementing target-specific intrinsic
> functions.
> 
> It is better to prefix target-specific names with the name of the
> target, to avoid any clash with current or future standard names.
> 
> The absence of a name is indicated by writing an empty string
> where the name should go.  Nameless instruction patterns are never
> used for generating RTL code, but they may permit several simpler insns
> to be combined later on.
> 
> For the purpose of debugging the compiler, you may also specify a
> name beginning with the @samp{*} character.  Such a name is used only
> for identifying the instruction in RTL dumps; it is equivalent to having
> a nameless pattern for all other purposes.  Names beginning with the
> @samp{*} character are not required to be unique.


Thanks for the explanation.  :-)

-Will
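
The naming rule from the explanation above can be restated as a toy model:
a pattern gets a 'gen_' function only if its name is non-empty and does not
begin with '*'.  The helper names below are hypothetical, and this is only
a restatement of the rule in C, not how GCC's generator programs are
implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the naming rule only: nameless patterns and names
   starting with '*' get no generator function.  */
static bool
demo_pattern_has_generator (const char *name)
{
  assert (name != NULL);
  return name[0] != '\0' && name[0] != '*';
}

/* Write the generator function name for NAME into BUF (LEN bytes).
   Return false if NAME gets no generator or BUF is too small.  */
static bool
demo_generator_name (const char *name, char *buf, size_t len)
{
  if (!demo_pattern_has_generator (name))
    return false;
  int n = snprintf (buf, len, "gen_%s", name);
  return n >= 0 && (size_t) n < len;
}
```

In this model, dropping the leading '*' from a pattern name is exactly what
switches the pattern from "recognized only" to "recognized and has a
generator", which matches the change made to the maddld pattern in the
patch.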





[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

2022-05-17 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

Hi, 

These defines have been unused since the rs6000 built-in rework was
completed, so it would be good to remove them.

There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def.  That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the RS6000_BTC_SPECIAL macro.

2022-05-17  Will Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def: Rephrase
RS6000_BTC_SPECIAL in comment.
* config/rs6000/rs6000.h: Remove definitions
RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS


diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f4a9f24bcc5c..9a63a9eda580 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1423,11 +1423,11 @@
 
   pure vsc __builtin_vsx_ld_elemrev_v16qi (signed long, const void *);
 LD_ELEMREV_V16QI vsx_ld_elemrev_v16qi {ldvec,endian}
 
 ; TODO: There is apparent intent in rs6000-builtin.def to have
-; RS6000_BTC_SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
+; SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
 ; no def_builtin calls for any of them.  At some point, we may want to add a
 ; set of built-ins for whichever vector types make sense for these.
 
   pure vsq __builtin_vsx_lxvd2x_v1ti (signed long, const void *);
 LXVD2X_V1TI vsx_load_v1ti {ldvec}
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 523256a5c9d5..90a357ab7932 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2247,58 +2247,10 @@ extern char rs6000_reg_names[][8];  /* register names (0 vs. %r0).  */
 /* #define  MACHINE_no_sched_speculative_load */
 
 /* General flags.  */
 extern int frame_pointer_needed;
 
-/* Classification of the builtin functions as to which switches enable the
-   builtin, and what attributes it should have.  We used to use the target
-   flags macros, but we've run out of bits, so we now map the options into new
-   settings used here.  */
-
-/* Builtin operand count.  */
-#define RS6000_BTC_UNARY   0x0001  /* normal unary function.  */
-#define RS6000_BTC_BINARY  0x0002  /* normal binary function.  */
-#define RS6000_BTC_TERNARY 0x0003  /* normal ternary function.  */
-#define RS6000_BTC_QUATERNARY  0x0004  /* normal quaternary
-  function. */
-#define RS6000_BTC_QUINARY 0x0005  /* normal quinary function.  */
-#define RS6000_BTC_SENARY  0x0006  /* normal senary function.  */
-#define RS6000_BTC_OPND_MASK   0x0007  /* Mask to isolate operands. */
-
-/* Builtin attributes.  */
-#define RS6000_BTC_SPECIAL 0x  /* Special function.  */
-#define RS6000_BTC_PREDICATE   0x0008  /* predicate function.  */
-#define RS6000_BTC_ABS 0x0010  /* Altivec/VSX ABS
-  function.  */
-#define RS6000_BTC_DST 0x0020  /* Altivec DST function.  */
-
-#define RS6000_BTC_TYPE_MASK   0x003f  /* Mask to isolate types */
-
-#define RS6000_BTC_MISC 0x  /* No special attributes.  */
-#define RS6000_BTC_CONST   0x0100  /* Neither uses, nor
-  modifies global state.  */
-#define RS6000_BTC_PURE 0x0200  /* reads global
-  state/mem and does
-  not modify global state.  */
-#define RS6000_BTC_FP  0x0400  /* depends on rounding mode.  */
-#define RS6000_BTC_QUAD 0x0800  /* Uses a register quad.  */
-#define RS6000_BTC_PAIR 0x1000  /* Uses a register pair.  */
-#define RS6000_BTC_QUADPAIR 0x1800  /* Uses a quad and a pair.  */
-#define RS6000_BTC_ATTR_MASK   0x1f00  /* Mask of the attributes.  */
-
-/* Miscellaneous information.  */
-#define RS6000_BTC_SPR 0x0100  /* function references SPRs.  */
-#define RS6000_BTC_VOID 0x0200  /* function has no return value.  */
-#define RS6000_BTC_CR  0x0400  /* function 

Re: [PATCH] Generate vadduqm and vsubuqm for TImode add/subtract

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:19 -0400, Michael Meissner wrote:
> Generate vadduqm and vsubuqm for TImode add/subtract
> 
> If the TImode variable is in an Altivec register instead of a GPR
> register, then generate vadduqm and vsubuqm instead of having to move the
> value to the GPR registers and doing the add and subtract with carry
> instructions.  To do this, we have to delay the splitting of the addition
> and subtraction until after register allocation.

Ok.
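
For anyone following along, here is a minimal C sketch of the two code
paths the description contrasts.  It is only a model: the struct and
function names are hypothetical and do not appear in the patch, and it
assumes a GCC-compatible compiler with __int128 on a 64-bit target.  The
first function mimics the GPR split (add the low halves, then add the
high halves plus the carry); the second is the plain 128-bit add that a
single vadduqm can implement when the values live in Altivec registers.

```c
#include <stdint.h>

/* Hypothetical two-register view of a 128-bit integer.  */
struct u128_parts
{
  uint64_t lo;
  uint64_t hi;
};

/* Model of the GPR sequence: low-half add that produces a carry,
   followed by a high-half add that consumes it.  */
struct u128_parts
add128_gpr_model (struct u128_parts a, struct u128_parts b)
{
  struct u128_parts r;
  r.lo = a.lo + b.lo;
  uint64_t carry = r.lo < a.lo;	/* 1 if the low add wrapped.  */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* The same operation expressed as one 128-bit add.  */
unsigned __int128
add128_single (unsigned __int128 a, unsigned __int128 b)
{
  return a + b;
}
```

Both functions compute the same value; which instruction sequence is
actually emitted depends on where the register allocator places the
operands.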


> 
> I have built this patch on little endian power10, little endian power9, and big
> endian power8 systems.  There were no regressions.  Can I install this patch to
> the GCC 13 master branch?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/rs6000.md (addti3): Generate vadduqm if we are
>   using the Altivec registers.
>   (subti3): Generate vsubuqm if we using the Altivec registers.
>   (negti3): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vadduqm-vsubuqm.c: New test.
> ---
>  gcc/config/rs6000/rs6000.md   | 82 ++-
>  .../gcc.target/powerpc/vadduqm-vsubuqm.c  | 22 +
>  2 files changed, 83 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vadduqm-vsubuqm.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 83eacec57ba..f120ca0b48d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7139,15 +7139,22 @@ (define_expand "feraiseexceptsi"
>  ;;
>  ;; Addti3/subti3 are define_insn_and_splits instead of define_expand, to allow
>  ;; for combine to make things like multiply and add with extend operations.
> +;;
> +;; Also add support in case the 128-bit integer happens to be an Altivec
> +;; register.
> 
>  (define_insn_and_split "addti3"
> -  [(set (match_operand:TI 0 "gpc_reg_operand"   "=&r,r,r")
> - (plus:TI (match_operand:TI 1 "gpc_reg_operand"   "r, 0,r")
> -  (match_operand:TI 2 "reg_or_short_operand"  "rI,r,0")))
> +  [(set (match_operand:TI 0 "gpc_reg_operand"  "=&r, r,r,v")
> + (plus:TI (match_operand:TI 1 "gpc_reg_operand"   "r, 0,r,v")
> +  (match_operand:TI 2 "reg_or_short_operand"  "rI,r,0,v")))

Nit..  I still can't tell if the "r, 0,r,v" should be comma-space, or
comma delimited.

Remainder looks OK.  
thanks
-Will



> (clobber (reg:DI CA_REGNO))]
>"TARGET_64BIT"
> -  "#"
> -  "&& 1"
> +  "@
> +   #
> +   #
> +   #
> +   vadduqm %0,%1,%2"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
>[(pc)]
>  {
>rtx lo0 = gen_lowpart (DImode, operands[0]);
> @@ -7157,27 +7164,27 @@ (define_insn_and_split "addti3"
>rtx hi1 = gen_highpart (DImode, operands[1]);
>rtx hi2 = gen_highpart_mode (DImode, TImode, operands[2]);
> 
> -  if (!reg_or_short_operand (lo2, DImode))
> -lo2 = force_reg (DImode, lo2);
> -  if (!adde_operand (hi2, DImode))
> -hi2 = force_reg (DImode, hi2);
> -
>emit_insn (gen_adddi3_carry (lo0, lo1, lo2));
>emit_insn (gen_adddi3_carry_in (hi0, hi1, hi2));
>DONE;
>  }
> -  [(set_attr "length" "8")
> +  [(set_attr "length" "8,8,8,*")
> +   (set_attr "isa""*,*,*,p8v")
> (set_attr "type"   "add")
> (set_attr "size"   "128")])
> 
>  (define_insn_and_split "subti3"
> -  [(set (match_operand:TI 0 "gpc_reg_operand""=&r,r,r")
> - (minus:TI (match_operand:TI 1 "reg_or_short_operand" "rI,0,r")
> -   (match_operand:TI 2 "gpc_reg_operand"  "r, r,0")))
> +  [(set (match_operand:TI 0 "gpc_reg_operand""=&r, r,r,v")
> + (minus:TI (match_operand:TI 1 "reg_or_short_operand"  "rI,0,r,v")
> +   (match_operand:TI 2 "gpc_reg_operand"   "r, r,0,v")))
> (clobber (reg:DI CA_REGNO))]
>"TARGET_64BIT"
> -  "#"
> -  "&& 1"
> +  "@
> +   #
> +   #
> +   #
> +   vsubuqm %0,%1,%2"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
>[(pc)]
>  {
>rtx lo0 = gen_lowpart (DImode, operands[0]);
> @@ -7187,16 +7194,49 @@ (define_insn_and_split "subti3"
>rtx hi1 = gen_highpart_mode (DImode, TImode, operands[1]);
>rtx hi2 = gen_highpart (DImode, operands[2]);
> 
> -  if (!reg_or_short_operand (lo1, DImode))
> -lo1 = force_reg (DImode, lo1);
> -  if (!adde_operand (hi1, DImode))
> -hi1 = force_reg (DImode, hi1);
> -
>emit_insn (gen_subfdi3_carry (lo0, lo2, lo1));
>emit_insn (gen_subfdi3_carry_in (hi0, hi2, hi1));
>DONE;
> +}
> +  [(set_attr "length" "8,8,8,*")
> +   (set_attr "isa""*,*,*,p8v")
> +   (set_attr "type"   "add")
> +   (set_attr "size"   "128")])
> +
> +;; 128-bit integer negation, normally use GPRs.  If we are using Altivec
> +;; registers, create a 0 and do a vsubuqm.
> +(define_insn_and_split "negti3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r,&v")
> + (neg:TI (match_operand:TI 1 "gpc_reg_operand"   "r,v")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_64BIT"
> +  "#"
> 

Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> Optimize multiply/add of DImode extended to TImode, PR target/103109.
> 
> On power9 and power10 systems, we have instructions that support doing
> 64-bit integers converted to 128-bit integers and producing 128-bit
> results.  This patch adds support to generate these instructions.
> 
> Previously GCC had define_expands to handle conversion of the 64-bit
> extend to 128-bit and multiply.  This patch changes these define_expands
> to define_insn_and_split and then it provides combiner patterns to
> generate thes multiply/add instructions.
> 
> To support using this optimization on power9, this patch extends the sign
> extend DImode to TImode to also run on power9 (added for PR
> target/104698).
> 
> This patch needs the previous patch to add unsigned DImode to TImode
> conversion so that the combiner can combine the extend, multiply, and add
> instructions.
> 
> I have built this patch on little endian power10, little endian power9, and big
> endian power8 systems.  There were no regressions when I ran it.  Can I install
> this patch into the GCC 13 master branch?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   PR target/103109
>   * config/rs6000/rs6000.md (su_int32): New code attribute.
>   (mul3): Convert from define_expand to
>   define_insn_and_split.
>   (maddld4): Add generator function.

-(define_insn "*maddld4"
+(define_insn "maddld4"

Is the removal of the "*" what "Add generator function" refers to?  (That's
terminology that I'm not immediately familiar with.)




>   (mulditi3_adddi3): New insn.
>   (mulditi3_add_const): New insn.
>   (mulditi3_adddi3_upper): New insn.
> 
> gcc/testsuite/
>   PR target/103109
>   * gcc.target/powerpc/pr103109.c: New test.


ok
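
As a rough illustration of the source shape the description is about,
here is a minimal C sketch of a 64-bit multiply/add widened to 128 bits.
The function names are hypothetical (I have not copied them from
pr103109.c), and it assumes a GCC-compatible compiler with __int128 on a
64-bit target.  Whether a given compiler actually emits maddld plus an
upper-half multiply/add for these depends on -mcpu and on the combiner
matching the new patterns.

```c
#include <stdint.h>

/* Unsigned: zero-extend all three 64-bit inputs, multiply, then add.  */
unsigned __int128
umul_add (uint64_t a, uint64_t b, uint64_t c)
{
  return (unsigned __int128) a * b + c;
}

/* Signed: sign-extend all three 64-bit inputs, multiply, then add.  */
__int128
smul_add (int64_t a, int64_t b, int64_t c)
{
  return (__int128) a * b + c;
}
```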


> ---
>  gcc/config/rs6000/rs6000.md | 128 +++-
>  gcc/testsuite/gcc.target/powerpc/pr103109.c |  62 ++
>  2 files changed, 184 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2aba70393d8..83eacec57ba 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -667,6 +667,9 @@ (define_code_attr uns [(fix   "")
>  (float   "")
>  (unsigned_float  "uns")])
> 
> +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand")
> + (zero_extend "c32bit_cint_operand")])
> +
>  ; Various instructions that come in SI and DI forms.
>  ; A generic w/d attribute, for things like cmpw/cmpd.
>  (define_mode_attr wd [(QI"b")
> @@ -3190,13 +3193,16 @@ (define_insn "mulsi3_highpart_64"
>"mulhw %0,%1,%2"
>[(set_attr "type" "mul")])
> 
> -(define_expand "mul3"
> -  [(set (match_operand: 0 "gpc_reg_operand")
> +(define_insn_and_split "mul3"
> +  [(set (match_operand: 0 "gpc_reg_operand" "=&r")
>   (mult: (any_extend:
> - (match_operand:GPR 1 "gpc_reg_operand"))
> +(match_operand:GPR 1 "gpc_reg_operand" "r"))
> (any_extend:
> - (match_operand:GPR 2 "gpc_reg_operand"]
> +(match_operand:GPR 2 "gpc_reg_operand" "r"]
>"!(mode == SImode && TARGET_POWERPC64)"
> +  "#"
> +  "&& 1"
> +  [(pc)]
>  {
>rtx l = gen_reg_rtx (mode);
>rtx h = gen_reg_rtx (mode);
> @@ -3205,9 +3211,10 @@ (define_expand "mul3"
>emit_move_insn (gen_lowpart (mode, operands[0]), l);
>emit_move_insn (gen_highpart (mode, operands[0]), h);
>DONE;
> -})
> +}
> +  [(set_attr "length" "8")])
> 

ok


> -(define_insn "*maddld4"
> +(define_insn "maddld4"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>   (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>   (match_operand:GPR 2 "gpc_reg_operand" "r"))

ok

> @@ -3216,6 +3223,115 @@ (define_insn "*maddld4"
>"maddld %0,%1,%2,%3"
>[(set_attr "type" "mul")])
> 
> +(define_insn_and_split "*mulditi3_adddi3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r")
> + (plus:TI
> +  (mult:TI
> +   (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
> +   (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
> +  (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"]
> +  "TARGET_MADDLD && TARGET_POWERPC64"
> +  "#"
> +  "&& 1"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +  rtx op1 = operands[1];
> +  rtx op2 = operands[2];
> +  rtx op3 = operands[3];
> +  rtx tmp_hi, tmp_lo;
> +
> +  if (can_create_pseudo_p ())
> +{
> +  tmp_hi = gen_reg_rtx (DImode);
> +  tmp_lo = gen_reg_rtx (DImode);
> +}
> +  else
> +{
> +  tmp_hi = dest_hi;
> +  tmp_lo = dest_lo;
> +}
> +
> +  emit_insn (gen_mulditi3_adddi3_upper (tmp_hi, op1, o

Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:13 -0400, Michael Meissner wrote:
> Add zero_extendditi2.  Improve lxvr*x code generation.
> 


Content here matches what I commented on in the prior email with
subject "Delay splitting addti3...".  






> This pattern adds zero_extendditi2 so that if we are extending DImode that
> is in a GPR register to TImode in a vector register, the compiler can
> generate MTVSRDDD.
> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow
> loading to gpr registers.  This prevents needlessly doing direct moves to
> get the value into the vector registers if the gpr register was already
> selected.
> 
> In updating the insn counts for two tests due to these changes, I noticed
> the tests were done at -O0.  I changed this so that the tests are now done
> at the normal -O2 optimization level.
> 
> This patch will be needed for an upcoming patch for PR target/103109.
> 
> I have built this patch on little endian power10, little endian power9,
> and big endian power8 systems.  There were no regressions with this
> patch.  Can I install this on the GCC 13 trunk?
> 
> 2022-05-013   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.
>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> ---
>  gcc/config/rs6000/vsx.md  | 82 +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a GPR register.
> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and LXVRDX
> +;; in the case of power10), we use the machine independent code.  If we are
> +;; loading up GPRs, we fall back to the old code.
> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r, wa,&wa")
> + (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && (int_reg_operand (operands[0], TImode)
> +   || vsx_register_operand (operands[1], DImode))"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> + zero out the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +{
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +  emit_move_insn (dest_lo, src);
> +  emit_move_insn (dest_hi, const0_rtx);
> +  DONE;
> +}
> +
> +  /

Re: [PATCH] Delay splitting addti3/subti3 until first split pass.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 11:08 -0400, Michael Meissner wrote:
> Add zero_extendditi2.  Improve lxvr*x code generation.
> 

Hi,


> Subject: Re: [PATCH] Delay splitting addti3/subti3 until first split pass.

Subject does not seem to match contents?





> This pattern adds zero_extendditi2 so that if we are extending DImode that
> is in a GPR register to TImode in a vector register, the compiler can
> generate MTVSRDDD.

Just "mtvsrdd".   
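
To spell out what the zero extension amounts to, here is a minimal C
model.  The struct and function names are hypothetical, not taken from
the patch.  The GPR split in the patch moves the source into the low
doubleword and zeroes the high doubleword; as I read the ISA, mtvsrdd
with 0 as its first source register gives the same result in a vector
register.

```c
#include <stdint.h>

/* Hypothetical view of a 128-bit value as two doublewords.  */
struct ti_parts
{
  uint64_t hi;
  uint64_t lo;
};

/* Zero-extend a 64-bit value to 128 bits: copy it into the low
   doubleword and clear the high doubleword.  */
struct ti_parts
zero_extend_di_ti_model (uint64_t src)
{
  struct ti_parts dest;
  dest.lo = src;
  dest.hi = 0;
  return dest;
}
```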


> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow
> loading to gpr registers.  This prevents needlessly doing direct moves to
> get the value into the vector registers if the gpr register was already
> selected.
> 
> In updating the insn counts for two tests due to these changes, I noticed
> the tests were done at -O0.  I changed this so that the tests are now done
> at the normal -O2 optimization level.
> 
s/normal/default/ ?


> This patch will be needed for an upcoming patch for PR target/103109.
> 
ok

> I have built this patch on little endian power10, little endian power9,
> and big endian power8 systems.  There were no regressions with this
> patch.  Can I install this on the GCC 13 trunk?
> 
> 2022-05-013   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.

swap the froms and tos ?


>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> ---
>  gcc/config/rs6000/vsx.md  | 82 +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a GPR register.

Perhaps break this into two sentences, with the description of what is
prevented in a separate sentence?


> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and LXVRDX
> +;; in the case of power10), we use the machine independent code.  If we are
> +;; loading up GPRs, we fall back to the old code.

Will 'old code' have meaning to future readers of this lump of code?


> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r, wa,&wa")
> + (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && (int_reg_operand (operands[0], TImode)
> +   || vsx_register_operand (operands[1], DImode))"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> + zero out the upper part.  */


s/do//

> +  if (IN

Re: [PATCH] Replace UNSPEC with RTL code for extendditi2.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 10:52 -0400, Michael Meissner wrote:
> Replace UNSPEC with RTL code for extendditi2.
> 

Hi,


> When I submitted my patch on March 12th for extendditi2, Segher wished I
> had removed the use of the UNSPEC for the vextsd2q instruction.  This
> patch rewrites extendditi2_vector to use VEC_SELECT rather than UNSPEC.


I'd suggest a paragraph break between the two sentences.   


> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (UNSPEC_EXTENDDITI2): Delete.

>   (extendditi2_vector): Rewrite to use VEC_SELECT as a
>   define_expand.

>   (extendditi2_vector2): New insn.


Ok, so per my interpretation of the patch below, it converts the
define_insn extendditi2_vector into a define_expand, and creates a new
extendditi2_vector2 instruction.  
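
For reference, the operation that the new RTL describes (sign_extend of
a 64-bit element selected from a V2DI) can be modelled in C as below.
This is only a sketch of the arithmetic; the names are hypothetical and
it says nothing about which vector element the hardware selects.

```c
#include <stdint.h>

/* Hypothetical view of a 128-bit result as two doublewords.  */
struct ti_halves
{
  uint64_t hi;
  uint64_t lo;
};

/* Sign-extend a 64-bit value to 128 bits: the low doubleword is the
   value itself, the high doubleword is all copies of its sign bit.  */
struct ti_halves
sign_extend_di_ti_model (int64_t src)
{
  struct ti_halves dest;
  dest.lo = (uint64_t) src;
  dest.hi = src < 0 ? UINT64_MAX : 0;
  return dest;
}
```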


Content below seems reasonable, I've not reviewed it extensively.  

Thanks
-Will

> ---
>  gcc/config/rs6000/vsx.md | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index a1a1ce95195..c091e5e2f47 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -358,7 +358,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
> UNSPEC_XXGENPCV
> UNSPEC_MTVSBM
> -   UNSPEC_EXTENDDITI2
> UNSPEC_VCNTMB
> UNSPEC_VEXPAND
> UNSPEC_VEXTRACT
> @@ -5083,10 +5082,25 @@ (define_insn_and_split "extendditi2"
> (set_attr "type" "shift,load,vecmove,vecperm,load")])
> 
>  ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> -(define_insn "extendditi2_vector"
> +(define_expand "extendditi2_vector"
> +  [(use (match_operand:TI 0 "gpc_reg_operand"))
> +   (use (match_operand:TI 1 "gpc_reg_operand"))]
> +  "TARGET_POWER10"
> +{
> +  rtx dest = operands[0];
> +  rtx src_v2di = gen_lowpart (V2DImode, operands[1]);
> +  rtx element = GEN_INT (VECTOR_ELEMENT_SCALAR_64BIT);
> +
> +  emit_insn (gen_extendditi2_vector2 (dest, src_v2di, element));
> +  DONE;
> +})
> +
> +(define_insn "extendditi2_vector2"
>[(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> - (unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> -  UNSPEC_EXTENDDITI2))]
> + (sign_extend:TI
> +  (vec_select:DI
> +   (match_operand:V2DI 1 "gpc_reg_operand" "v")
> +   (parallel [(match_operand 2 "vsx_scalar_64bit" "wD")]]
>"TARGET_POWER10"
>"vextsd2q %0,%1"
>[(set_attr "type" "vecexts")])
> -- 
> 2.35.3
> 
> 



Re: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 10:49 -0400, Michael Meissner wrote:
> Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293.
> 
> This patch has been previously posted, but it seemed to get lost.:
> 
> > Date: Tue, 29 Mar 2022 23:25:31 -0400
> > Subject: [PATCH, V2] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592509.html
> 
> I had originally posted a previous version of this patch here.  There were
> changes asked for, which I did in this patch.
> 
> > Date: Mon, 28 Mar 2022 12:26:02 -0400
> > Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
> 


Hi, 
generally lgtm.  A few typos and comment suggestions are called out below.
Thanks.
-Will



> In PR target/99293, it was pointed out that doing:
> 
>   vector long long dest0, dest1, src;
>   /* ... */
>   dest0 = vec_splats (vec_extract (src, 0));
>   dest1 = vec_splats (vec_extract (src, 1));
> 
> would generate slower code.
> 
> It generates the following code on power8:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   xxpermdi 0,34,34,3
>   xxpermdi 34,0,0,0
> 
>   ;; vec_splats (vec_extract (src, 1))
>   xxlor 0,34,34
>   xxpermdi 34,0,0,0
> 
> However on power9 and power10 it generates:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   mfvsld 3,34
>   mtvsrdd 34,9,9
> 
>   ;; vec_splats (vec_extract (src, 1))
>   mfvsrd 9,34
>   mtvsrdd 34,9,9
> 
> This is due to the power9 having the mfvsrld instruction which can extract
> either 64-bit element into a GPR.  While there are alternatives for both
> vector registers and GPR registers, the register allocator prefers to put
> DImode into GPR registers.
> 
> However in this case, it is better to have a single combiner pattern that
> can generate a single xxpermdi, instead of doing 2 insnsns (the extract

I like the idea of insnsns being the plural of insn. 

> and then the concat).  This is particularly true if the two operations are
> move from vector register and move to vector register.  As Segher pointed
> out in a previous version of the patch, the combiner already tries doing

s/doing//

> creating a (vec_duplicate (vec_select ...)) pattern, but we didn't provide
> one.



> 
> This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so that it
> no longer uses an UNSPEC.  Instead it uses VEC_DUPLICATE, which the
> combiner checks for.

potentially
s/no longer uses an UNSPEC.  Instead it uses/now uses/
and possibly 
s/checks for/can find/
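
As an aside, the (vec_duplicate (vec_select ...)) shape corresponds to
the following target-independent C, written with GCC's generic vector
extension rather than altivec.h so that it builds anywhere.  The typedef
and function name are hypothetical; the PR's own example uses vec_splats
and vec_extract instead.

```c
/* Two 64-bit lanes, using GCC's generic vector extension.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Select one element (the vec_select part) and copy it into both
   lanes (the vec_duplicate part).  */
v2di
splat_element (v2di src, int index)
{
  long long elem = src[index & 1];
  return (v2di) { elem, elem };
}
```

The hope expressed in the patch description is that this whole operation
becomes a single xxpermdi rather than a move to a GPR and back.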

> 





> I have built Spec 2017 with this patch installed, and the cam4_r benchmark
> is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
> pairs of instructions were replaced with xxpermdi).
> 
> I have built bootstrap versions on the following systems and I have run
> the regression tests.  There were no regressions in the runs:
> 
>   Power9 little endian, --with-cpu=power9
>   Power10 little endian, --with-cpu=power10
>   Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
> 
> Can I install this into the trunk?  After a burn-in period, can I backport
> and install this into GCC 12, GCC 11 and GCC 10 branches?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   PR target/99293
>   * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
>   UNSPEC_VSX_XXSPLTD case.
>   * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
>   (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE.
> 
> gcc/testsuite:
>   PR target/99293
>   * gcc.target/powerpc/builtins-1.c: Update insn count.
>   * gcc.target/powerpc/pr99293.c: New test.
> ---
>  gcc/config/rs6000/rs6000-p8swap.cc|  1 -
>  gcc/config/rs6000/vsx.md  | 19 +-
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c| 36 +++
>  4 files changed, 47 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
> index d301bc3fe59..1973d9c8245 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -805,7 +805,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
> case UNSPEC_VUPKLU_V4SF:
>   return 0;
> case UNSPEC_VSPLT_DIRECT:
> -   case UNSPEC_VSX_XXSPLTD:
>   *special = SH_SPLAT;
>   return 1;
> case UNSPEC_REDUC_PLUS:

ok

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1b75538f42f..a1a1ce95195 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -296,7 +296,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_XXPERM
> 
> UNSPEC_VSX_XXSPLTW
> -   UNSPEC_VSX_XXSPL

Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-20 Thread will schmidt via Gcc-patches
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
> 
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> 


Hi, 
No comments on code, a few comments about the comments below.



> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code.  I assume this describes the implementation that
remains.  
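
My understanding of the halfword case of that split, modelled in
portable C: an unsigned (zero-extending) load followed by a separate
sign extension gives the same value as a sign-extending load.  The
function name is hypothetical, and the comments naming lhz and extsh
are my assumption about which instructions are meant.

```c
#include <stdint.h>

/* Load a halfword as unsigned, then sign-extend it in a second step.  */
int64_t
load_half_signed_split (const uint16_t *p)
{
  uint64_t zero_extended = *p;		/* lhz-like unsigned load.  */
  /* extsh-like step: flip the sign bit, then subtract its weight.  */
  return (int64_t) (zero_extended ^ 0x8000) - 0x8000;
}
```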

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.

specification.  :-)

> 
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.

inlining. 
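
To make the failure mode concrete, here is a minimal sketch of the
subset test as the description words it.  The mask names and bit
positions are hypothetical (they are not GCC's OPTION_MASK_* values),
and the real rs6000_can_inline_p does more than this one comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits, for illustration only.  */
#define ISA_POWER8_VECTOR (UINT64_C (1) << 0)
#define ISA_POWER9_VECTOR (UINT64_C (1) << 1)
#define ISA_P8_FUSION     (UINT64_C (1) << 2)

/* Inlining is allowed only if every ISA bit set for the callee is
   also set for the caller.  */
bool
can_inline_model (uint64_t caller_isa, uint64_t callee_isa)
{
  return (callee_isa & ~caller_isa) == 0;
}
```

With a power9 caller that lacks the fusion bit and a power8 callee that
has it, the test fails; once fusion is no longer an ISA bit, the same
pair passes.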


> 
> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.
> 
> The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> I left in the pragma target and attribute target support for power8-fusion, but
> using it doesn't do anything now.  This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
> 
> I have tested this patch on a little endian power10 system.  I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
> 
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
> 
> 2022-04-12   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target
>   power8-fusion option for backwards compatibility.
>   (rs6000_print_options_internal): Skip printing backward
>   compatibility options that are just ignored.
>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.
>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore it completely.
>   (-mpower8-fusion-sign): Likewise.
>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete
>   -mpower8-fusion.
> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def | 18 +++
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 13 -
>  gcc/config/rs6000/rs6000.opt  |  8 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 62 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
>| OPTION_MASK_QUAD_MEMORY  \
>| OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS   (OPTION_MASK_P8_FUSION  \
> -  | OPTION_MASK_P8_FUSION_SIGN)
> -
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not 

Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-08 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>   This patch corrects the match pattern in pr56605.c. The former pattern
> is wrong and test case fails with GCC11. It should match following insn on
> each subtarget after mode promotion is disabled. The patch need to be
> backported to GCC11.
> 

Hi,

I note this patch appears to (partially?) address the P1 [11 regression] PR.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146


That issue makes reference to a different proposed patch, named
"rs6000: Disparage lfiwzx and similar", in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197
which is titled "ppc inline expansion of memcpy/memmove should not use
lxsibzx/stxsibx for a single byte".

I can't address any of the background or history there.  :-)


> //gimple
> _17 = (unsigned int) _20;
>  prolog_loop_niters.4_23 = _17 & 3;
> 
> //rtl
> (insn 19 18 20 2 (parallel [
> (set (reg:CC 208)
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3]))
> (const_int 0 [0])))
> (set (reg:SI 129 [ prolog_loop_niters.5 ])
> (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3])))
> ]) 197 {*andsi3_imm_mask_dot2}
> 
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no 
> regressions.
> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-02-28 Haochen Gui 
> 
> gcc/testsuite/
>   PR target/102146
>   * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index fdedbfc573d..231d808aa99 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>  ia[i] = (int) sb[i];
>  }
> 
> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
> \(reg:DI} 1 "combine" } } */


So with the update (I squint, so this is an approximate handwave), this
drops the zero_extend alternative and narrows the scan-rtl pattern to an
and:SI on a subreg:SI of a DI register.  This appears to match the RTL as
mentioned in the patch comments.
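
As a sanity check of the above, here is a small sketch that runs both
patterns against the compare from the quoted RTL.  Assumptions: it uses
POSIX extended regexps from <regex.h> rather than the Tcl regexps the
testsuite uses, so the "(?:" groups are rewritten as plain groups, and the
helper and variable names are made up.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (a POSIX extended regex) matches anywhere in
   TEXT.  Returns false if the pattern fails to compile.  */
static bool
rtl_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* The compare line from the RTL quoted in the patch description.  */
static const char rtl_line[]
  = "(compare:CC (and:SI (subreg:SI (reg:DI 207) 0)";

/* Old and new scan patterns, translated to POSIX ERE syntax.  */
static const char old_pattern[]
  = "\\(compare:CC \\((and|zero_extend):(DI) \\((sub)?reg:[SD]I";
static const char new_pattern[]
  = "\\(compare:CC \\(and:SI \\(subreg:SI \\(reg:DI";
```

The old pattern insists on a DI-mode and/zero_extend, so it cannot match
an and:SI; the new one matches the quoted insn.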


> 



Re: [PATCH] rs6000/test: Adjust p9-vec-length-7 sensitive to unroll [PR103196]

2022-04-07 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 13:37 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted as the
> complete unrolling can happen on some of its loops.  This patch is to
> use pragma "GCC unroll 0" to disable all possible loop unrollings.
> Hope it can help the case not that fragile.

ok

Is the lack of effectiveness of "-fno-unroll-loops" otherwise
understood, or is there a further issue behind that option?

I would expect the effect of the option and of the pragma to be roughly
equivalent.  Obviously it is not.  :-)
> 
> There are some other p9-vec-length* cases, I noticed that some of them
> use either bigger or unknown loop iteration counts, and
> "p9-vec-length-3*" have considered the effects of complete unrolling.
> So I just leave them alone for now.
> 
> Tested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR testsuite/103196
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/p9-vec-length-7.h: Add DO_PRAGMA macro.
>   * gcc.target/powerpc/p9-vec-length-epil-7.c: Use unroll pragma to
>   disable any unrollings.
>   * gcc.target/powerpc/p9-vec-length-full-7.c: Remove useless option.
>   * gcc.target/powerpc/p9-vec-length.h: Likewise.

I suggest a slight rearrangement and correction.

The -fno-unroll-loops options are removed from *-epil-7.c and *-full-7.c.

p9-vec-length.h  adds the DO_PRAGMA macro.

p9-vec-length-7.h updates (corrects?) whitespace and adds the PRAGMA call for 
"GCC unroll 0" around the test loop. 
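
To illustrate the arrangement described above, a cut-down sketch of the
macro plumbing follows.  Assumptions: the DO_PRAGMA definition shown here
is the usual _Pragma stringify idiom and may not be character-for-character
what p9-vec-length.h adds, N is a made-up array size, and DEFINE_FILL is a
hypothetical stand-in for the test() macro in the header.

```c
/* Stringify the argument and hand it to _Pragma, so that a pragma can
   be emitted from inside another macro.  */
#define DO_PRAGMA(x) _Pragma (#x)

#define N 64     /* Assumed array size; the real header may differ.  */
#define START 1
#define END 59

/* Define an array x_TYPE and a function fill_TYPE that stores an
   increasing sequence into elements START .. END-1.  "GCC unroll 0"
   asks GCC not to unroll the loop that follows it.  */
#define DEFINE_FILL(TYPE)                              \
  TYPE x_##TYPE[N];                                    \
  void fill_##TYPE (void)                              \
  {                                                    \
    TYPE v = 0;                                        \
    DO_PRAGMA (GCC unroll 0)                           \
    for (unsigned int i = START; i < END; i++)         \
      {                                                \
        x_##TYPE[i] = v;                               \
        v += 1;                                        \
      }                                                \
  }

DEFINE_FILL (int)
```

The point of going through DO_PRAGMA is that a plain #pragma line cannot
appear inside a macro body, while _Pragma can.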




> > ---
> >  .../gcc.target/powerpc/p9-vec-length-7.h| 17 +++--
> >  .../gcc.target/powerpc/p9-vec-length-epil-7.c   |  2 +-
> >  .../gcc.target/powerpc/p9-vec-length-full-7.c   |  2 +-
> >  .../gcc.target/powerpc/p9-vec-length.h  |  2 ++
> >  4 files changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > index 4ef8f974a04..4f338565619 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > @@ -7,14 +7,19 @@
> >  #define START 1
> >  #define END 59
> >  
> > +/* Note that we use pragma unroll to disable any loop unrollings.  */
> > +
> >  #define test(TYPE) 
> > \
> > -  TYPE x_##TYPE[N] __attribute__((aligned(16)));   
> >  \
> > -  void __attribute__((noinline, noclone)) test_npeel_##TYPE() {
> > \
> > +  TYPE x_##TYPE[N] __attribute__ ((aligned (16))); 
> > \
> > +  void __attribute__ ((noinline, noclone)) test_npeel_##TYPE ()
> > \
> > +  {
> > \
> >  TYPE v = 0;
> > \
> > -for (unsigned int i = START; i < END; i++) {   
> > \
> > -  x_##TYPE[i] = v; 
> > \
> > -  v += 1;  
> > \
> > -}  
> > \
> > +DO_PRAGMA (GCC unroll 0)   
> > \
> > +for (unsigned int i = START; i < END; i++) 
> > \
> > +  {
> > \
> > +   x_##TYPE[i] = v;   \
> > +   v += 1;\
> > +  }
> > \
> >}


Some whitespace fix-ups (ok), and the addition of
the "DO_PRAGMA (GCC unroll 0)".

ok.


> >  
> >  TEST_ALL (test)
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > index a27ee347ca1..859fedd5679 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { lp64 && powerpc_p9vector_ok } } } */
> > -/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
> > -fno-vect-cost-model -fno-unroll-loops -ffast-math" } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
> > -fno-vect-cost-model -ffast-math" } */

ok

> >  
> >  /* { dg-additional-options "--param=vect-partial-vector-usage=1" } */
> >  
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> > index 89ff38443e7..5fe542bba20 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.

Re: [PATCH] Disable float128 tests on VxWorks, PR target/104253.

2022-04-07 Thread will schmidt via Gcc-patches
On Thu, 2022-04-07 at 06:00 -0500, Segher Boessenkool wrote:
> On Thu, Apr 07, 2022 at 12:29:45AM -0400, Michael Meissner wrote:
> > In PR target/104253, it was pointed out the that test case added as part
> > of fixing the PR does not work on VxWorks because float128 is not
> > supported on that system.  I have modified the three tests for float128 so
> > that they are manually excluded on VxWorks systems.  In looking at the
> > code, I also added checks in check_effective_target_ppc_ieee128_ok to
> > disable the systems that will never support VSX instructions which are
> > required for float128 support (eabi, eabispe, darwin).
> 
> It's just one extra to the big list here, but, why do we need all these
> manual exclusions anyway?  What is broken about the test itself?



From the PR, it looks like this test noted an error, not actually a
failure.  

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253#c17

cc1: warning: The '-mfloat128' option may not be fully supported


which comes out of gcc/config/rs6000/rs6000.cc 
rs6000_option_override_internal() via 

  /* IEEE 128-bit floating point requires VSX support.  */
  if (TARGET_FLOAT128_KEYWORD)
    {
      if (!TARGET_VSX)
        {

        }
      else if (!TARGET_FLOAT128_TYPE)
        {
          TARGET_FLOAT128_TYPE = 1;
          warning (0, "The %<-mfloat128%> option may not be fully supported");
        }
    }


> 
> It would be so much more useful if the tests would help us, instead of
> producing a lot of extra busy-work.





> 
> 
> Segher



Re: [PATCH v2] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-04-07 Thread will schmidt via Gcc-patches
On Thu, 2022-04-07 at 17:29 +0800, Kewen.Lin wrote:
> Hi,
> 
> As PR103353 shows, we may want to continue to expand a MMA built-in
> function like a normal function, even if we have already emitted
> error messages about some missing required conditions.  As shown in
> that PR, without one explicit mov optab on OOmode provided, it would
> call emit_move_insn recursively.
> 
> So this patch is to allow the mov pattern to be generated when we are
> expanding to RTL and have seen errors even without MMA supported, it's
> expected that the generated pattern would not cause further ICEs as the
> compilation would stop soon after expanding.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591150.html
> 
> v2: Polish some comments and add one test case as Will and Peter suggested.

Thanks.

> 
> Is it ok for trunk or upcoming stage1?
> 
> BR,
> Kewen
> --
> 
>   PR target/103353
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
>   check to preparation statements and add handlings for !TARGET_MMA.
>   (define_expand movxo): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr103353.c: New test.
> ---
>  gcc/config/rs6000/mma.md| 42 ++---
>  gcc/testsuite/gcc.target/powerpc/pr103353.c | 22 +++
>  2 files changed, 58 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103353.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 907c9d6d516..746a77a0957 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4  
> [(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
>  (define_expand "movoo"
>[(set (match_operand:OO 0 "nonimmediate_operand")
>   (match_operand:OO 1 "input_operand"))]
> -  "TARGET_MMA"
> +  ""
>  {
> -  rs6000_emit_move (operands[0], operands[1], OOmode);
> -  DONE;
> +  if (TARGET_MMA) {
> +rs6000_emit_move (operands[0], operands[1], OOmode);
> +DONE;
> +  }
> +  /* Opaque modes are only expected to be available when MMA is supported,
> + but PR103353 shows we may want to continue to expand a MMA built-in
> + function, even if we have already emitted error messages about some
> + missing required conditions.  As shown in that PR, without one
> + explicit mov optab on OOmode provided, it would call emit_move_insn
> + recursively.  So we allow this pattern to be generated when we are
> + expanding to RTL and have seen errors, even though there is no MMA
> + support.  It would not cause further ICEs as the compilation would
> + stop soon after expanding.  */
> +  else if (currently_expanding_to_rtl && seen_error ())
> +;
> +  else
> +gcc_unreachable ();
>  })

ok
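
For the record, the control flow in the new preparation statements, as I
read it, boils down to the three-way decision below.  This is only a
stand-alone model with made-up names (the enum and the function are
hypothetical); the real code uses TARGET_MMA, currently_expanding_to_rtl,
seen_error () and gcc_unreachable ().

```c
#include <stdbool.h>

/* Possible outcomes of the expander's preparation statements.  */
enum mov_action
{
  MOV_EMIT,        /* Emit the move through the normal MMA path.  */
  MOV_FALLTHROUGH, /* Do nothing; let the plain set pattern be used.  */
  MOV_UNREACHABLE  /* Must not happen; the real code aborts here.  */
};

/* Model of the decision made for an opaque-mode (OO/XO) move.  */
static enum mov_action
opaque_mov_action (bool target_mma, bool expanding_to_rtl, bool seen_error)
{
  if (target_mma)
    return MOV_EMIT;
  else if (expanding_to_rtl && seen_error)
    return MOV_FALLTHROUGH;
  else
    return MOV_UNREACHABLE;
}
```

The middle case is the one the PR is about: no MMA, but an error has
already been reported during expansion, so the pattern is allowed through.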

> 
>  (define_insn_and_split "*movoo"
> @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo"
>  (define_expand "movxo"
>[(set (match_operand:XO 0 "nonimmediate_operand")
>   (match_operand:XO 1 "input_operand"))]
> -  "TARGET_MMA"
> +  ""
>  {
> -  rs6000_emit_move (operands[0], operands[1], XOmode);
> -  DONE;
> +  if (TARGET_MMA) {
> +rs6000_emit_move (operands[0], operands[1], XOmode);
> +DONE;
> +  }
> +  /* Opaque modes are only expected to be available when MMA is supported,
> + but PR103353 shows we may want to continue to expand a MMA built-in
> + function, even if we have already emitted error messages about some
> + missing required conditions.  As shown in that PR, without one
> + explicit mov optab on XOmode provided, it would call emit_move_insn
> + recursively.  So we allow this pattern to be generated when we are
> + expanding to RTL and have seen errors, even though there is no MMA
> + support.  It would not cause further ICEs as the compilation would
> + stop soon after expanding.  */
> +  else if (currently_expanding_to_rtl && seen_error ())
> +;
> +  else
> +gcc_unreachable ();
>  })

ok


> 
>  (define_insn_and_split "*movxo"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103353.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103353.c
> new file mode 100644
> index 000..6b0bedbb958
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103353.c
> @@ -0,0 +1,22 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* If the default cpu type is power10 or later, MMA is enabled by default.
> +   To keep the test point available all the time, this case specifies
> +   -mdejagnu-cpu=power6 to make it be tested without MMA.  */
> +/* { dg-options "-maltivec -mdejagnu-cpu=power6" } */
> +
> +/* Verify there is no ICE and don't check the error messages on MMA
> +   requirement since they could be fragile and are not test points
> +   of this case.  */
> +
> +void
> +foo (__vector_pair *dst, double *x)
> +{
> +  dst[0] = __builtin_vsx_

Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

2022-04-06 Thread will schmidt via Gcc-patches
On Wed, 2022-04-06 at 14:21 -0400, Michael Meissner wrote:
> From bf51c49f1481001c7b3223474d261dcbf9365eda Mon Sep 17 00:00:00 2001
> From: Michael Meissner 
> Date: Fri, 1 Apr 2022 22:27:13 -0400
> Subject: [PATCH] Add zero_extendditi2.  Improve lxvr*x code generation.
> 

Hi,

> This pattern adds zero_extendditi2 so that if we are extending DImode to
> TImode, and we want the result in a vector register, the compiler can
> generate MTVSRDDD.
> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow
> loading to gpr registers.  This prevents needlessly doing direct moves to
> get the value into the vector registers if the gpr register was already
> selected.

ok

> 
> In updating the insn counts for two tests due to these changes, I noticed
> the tests were done at -O0.  I changed this so that the tests are now done
> at the normal -O2 optimization level.

Per the comments (which you fixed up later in the patch), I note they were
deliberately done at -O0 since under higher optimizations gcc would
generate other load instructions during those tests.  Presumably with
these changes that is no longer the case.  :-)
> 
> I have tested this patch with bootstrap builds and running the regression
> testsuite using this patch on:
> 
>   Little endian power10, --with-cpu=power10
>   Little endian power9, --with-cpu=power9
>   Big endian power8, --with-cpu=power8 (both 64/32-bit tests done).
> 
> There were no regressions.  Can I check this into the master branch?
> 
> 2022-04-06   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.
>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> 
> ---
>  gcc/config/rs6000/vsx.md  | 82 +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, 
> lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a GPR 
> register.
> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and 
> LXVRDX
> +;; in the case of power10), we use the machine independent code.  If we are
> +;; loading up GPRs, we fall back to the old code.

In this context it's not clear what the "old code" is.  The mtvsrdd
instruction is referenced in this code path.  I see no direct reference
to lxvrdx here, though I suppose it's assumed somewhere behind the
emit_ calls.
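
For context, the kind of source this pattern is about is a plain zero
extension from 64 to 128 bits, along the lines of the sketch below.  The
function names are made up, and whether the compiler actually picks
mtvsrdd for it depends on the target and on where the value needs to end
up, so treat this as a description of the semantics only.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Zero extend a 64-bit value to 128 bits (DImode to TImode).  */
static u128
zext_di_ti (uint64_t x)
{
  return (u128) x;
}

/* Helpers to look at the two 64-bit halves of the result.  */
static uint64_t
low_half (u128 v)
{
  return (uint64_t) v;
}

static uint64_t
high_half (u128 v)
{
  return (uint64_t) (v >> 64);
}
```

The low half carries the original value and the high half is zero, which
is what the "set lowpart, clear highpart" style of split in the quoted
lxvr*x pattern produces for the GPR case.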


> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r, 
> wa,&wa")
> + (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  
> wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && (int_reg_op

Re: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-04-01 Thread will schmidt via Gcc-patches
On Thu, 2022-03-03 at 16:38 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 

Hi

> As PR103353 shows, we may want to continue to expand a MMA built-in
> function like a normal function, even if we have already emitted
> error messages about some missing required conditions.  As shown in
> that PR, without one explicit mov optab on OOmode provided, it would
> call emit_move_insn recursively.
> 
> So this patch is to allow the mov pattern to be generated when we are
> expanding to RTL and have seen errors even without MMA supported, it's
> expected that the generated pattern would not cause further ICEs as the
> compilation would stop soon after expanding.

Is there a testcase, new or existing, that illustrates this error path?

> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> --
> 
>   PR target/103353
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
>   check to preparation statements and add handlings for !TARGET_MMA.
>   (define_expand movxo): Likewise.

> > ---
> >  gcc/config/rs6000/mma.md | 42 ++--
> >  1 file changed, 36 insertions(+), 6 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> > index 907c9d6d516..f76a87b4a21 100644
> > --- a/gcc/config/rs6000/mma.md
> > +++ b/gcc/config/rs6000/mma.md
> > @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4
> > [(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
> >  (define_expand "movoo"
> >[(set (match_operand:OO 0 "nonimmediate_operand")
> > (match_operand:OO 1 "input_operand"))]
> > -  "TARGET_MMA"
> > +  ""
> >  {
> > -  rs6000_emit_move (operands[0], operands[1], OOmode);
> > -  DONE;
> > +  if (TARGET_MMA) {
> > +rs6000_emit_move (operands[0], operands[1], OOmode);
> > +DONE;
> > +  }
> > +  /* Opaque modes are only expected to be available when MMA is supported,
> > + but PR103353 shows we may want to continue to expand a MMA built-in
> > + function like a normal function, even if we have already emitted
> > + error messages about some missing required conditions.

perhaps drop "like a normal function".  


> > + As shown in that PR, without one explicit mov optab on OOmode 
> > provided,
> > + it would call emit_move_insn recursively.  So we allow this pattern to
> > + be generated when we are expanding to RTL and have seen errors, even
> > + though there is no MMA support.  It would not cause further ICEs as
> > + the compilation would stop soon after expanding.  */

A testcase would be particularly helpful to illustrate this, I think.

Thanks,
-Will

> > +  else if (currently_expanding_to_rtl && seen_error ())
> > +;
> > +  else
> > +gcc_unreachable ();
> >  })
> >  
> >  (define_insn_and_split "*movoo"
> > @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo"
> >  (define_expand "movxo"
> >[(set (match_operand:XO 0 "nonimmediate_operand")
> > (match_operand:XO 1 "input_operand"))]
> > -  "TARGET_MMA"
> > +  ""
> >  {
> > -  rs6000_emit_move (operands[0], operands[1], XOmode);
> > -  DONE;
> > +  if (TARGET_MMA) {
> > +rs6000_emit_move (operands[0], operands[1], XOmode);
> > +DONE;
> > +  }
> > +  /* Opaque modes are only expected to be available when MMA is supported,
> > + but PR103353 shows we may want to continue to expand a MMA built-in
> > + function like a normal function, even if we have already emitted
> > + error messages about some missing required conditions.
> > + As shown in that PR, without one explicit mov optab on OOmode 
> > provided,
> > + it would call emit_move_insn recursively.  So we allow this pattern to
> > + be generated when we are expanding to RTL and have seen errors, even
> > + though there is no MMA support.  It would not cause further ICEs as
> > + the compilation would stop soon after expanding.  */
> > +  else if (currently_expanding_to_rtl && seen_error ())
> > +;
> > +  else
> > +gcc_unreachable ();
> >  })
> >  
> >  (define_insn_and_split "*movxo"
> > -- 
> > 2.25.1
> > 



Re: [PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]

2022-03-30 Thread will schmidt via Gcc-patches
On Fri, 2022-01-28 at 11:50 -0600, Bill Schmidt via Gcc-patches wrote:
> PR104004 caught some misses on my part in converting to the new
> built-in
> function infrastructure.  In particular, I forgot to mark all of the
> "nosoft"
> built-ins, and one of those should also have been marked "no32bit".
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk?
> 
> Thanks,
> Bill
> 
Hi,

The patch here seems reasonable to me. 
There are comments/subsequent pings that include commentary about
additional test coverage.

I see all of the builtins referenced here appear to be touched by
the existing test gcc.target/powerpc/test_fpscr_drn_builtin.c.
I could create a variation of that test forcing !hard_dfp in case that
would help, though I'm uncertain of the value there.

Thanks
-Will

> 
> 2022-01-27  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtin.def (MFFSL): Mark nosoft.
>   (MTFSB0): Likewise.
>   (MTFSB1): Likewise.
>   (SET_FPSCR_RN): Likewise.
>   (SET_FPSCR_DRN): Mark nosoft and no32bit.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index c8f0cf332eb..98619a649e3 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -215,7 +215,7 @@
>  ; processors, this builtin automatically falls back to mffs on older
>  ; platforms.  Thus it appears here in the [always] stanza.
>double __builtin_mffsl ();
> -MFFSL rs6000_mffsl {}
> +MFFSL rs6000_mffsl {nosoft}
> 
>  ; This is redundant with __builtin_pack_ibm128, as it requires long
>  ; double to be __ibm128.  Should probably be deprecated.
> @@ -226,10 +226,10 @@
>  MFTB rs6000_mftb_di {32bit}
> 
>void __builtin_mtfsb0 (const int<0,31>);
> -MTFSB0 rs6000_mtfsb0 {}
> +MTFSB0 rs6000_mtfsb0 {nosoft}
> 
>void __builtin_mtfsb1 (const int<0,31>);
> -MTFSB1 rs6000_mtfsb1 {}
> +MTFSB1 rs6000_mtfsb1 {nosoft}
> 
>void __builtin_mtfsf (const int<0,255>, double);
>  MTFSF rs6000_mtfsf {}
> @@ -238,7 +238,7 @@
>  PACK_IF packif {}
> 
>void __builtin_set_fpscr_rn (const int[0,3]);
> -SET_FPSCR_RN rs6000_set_fpscr_rn {}
> +SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
> 
>const double __builtin_unpack_ibm128 (__ibm128, const int<0,1>);
>  UNPACK_IF unpackif {}
> @@ -2969,7 +2969,7 @@
>  PACK_TD packtd {}
> 
>void __builtin_set_fpscr_drn (const int[0,7]);
> -SET_FPSCR_DRN rs6000_set_fpscr_drn {}
> +SET_FPSCR_DRN rs6000_set_fpscr_drn {nosoft,no32bit}
> 
>const unsigned long long __builtin_unpack_dec128 (_Decimal128, \
>  const int<0,1>);



Re: [PATCH v3, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-21 Thread will schmidt via Gcc-patches
On Mon, 2022-03-21 at 09:51 +0800, HAO CHEN GUI wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands.Without the patch, the comparisons between two vector
> __int128 are converted to scalar comparisons with branches. The code is
> suboptimal.The patch fixes the issue. Now all comparisons between two
> vector __int128 generates P10 new comparison instructions. Also the
> relative built-ins generate the same instructions after gimple folding.
> So they're added back to the list.
> 

Hi,
Thanks for reworking the description, this clears up my uncertainty.  :-)
There are a few spots where spaces should be added after periods.  No
need to re-post for just that.  The patch content otherwise seems OK to
me, though I defer to others for any subtleties with the actual VEC_IC
related changes.
Thanks
-Will


>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"

Re: [PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-17 Thread will schmidt via Gcc-patches
On Thu, 2022-03-17 at 13:35 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands.With the patch, both built-ins and direct
> comparison
> could generate P10 new V1TI comparison instructions.

Hi,


-/* We deliberately omit RS6000_BIF_CMPGE_1TI ...
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */


I assume the 'worse code' concern no longer applies, but I don't see a
before/after example to clarify the situation.  A clear statement that
the 'worse code' situation has been resolved by this addition of TI modes
into the iterators would be good.

Otherwise lgtm.  :-)
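
For reference, here is a small portable C sketch of the element-wise
result a signed V1TI greater-than compare is expected to produce (all
ones when true, all zeros when false).  It is only a model: the i128
struct and the helper names are made up for illustration, and it says
nothing about the instructions GCC emits before or after the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for one V1TI element: a signed 128-bit value
   split into a signed high half and an unsigned low half.  */
typedef struct { int64_t hi; uint64_t lo; } i128;

/* Signed 128-bit "greater than": compare the high halves as signed,
   and fall back to an unsigned compare of the low halves on a tie.  */
static int i128_gt (i128 a, i128 b)
{
  if (a.hi != b.hi)
    return a.hi > b.hi;
  return a.lo > b.lo;
}

/* Model of a vector compare result: every bit set when the predicate
   holds, every bit clear otherwise.  */
static i128 i128_cmpgt_mask (i128 a, i128 b)
{
  i128 mask;
  int gt = i128_gt (a, b);
  mask.hi = gt ? -1 : 0;
  mask.lo = gt ? UINT64_MAX : 0;
  return mask;
}
```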

Thanks,
-Will


> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no
> regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and
> RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and
> RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and
> RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI
> "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -(match_operand:VEC_I 3 "vint_operand")]))]
> +  

Re: rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote:
> > For PR100693, we currently provide an addg6s builtin using unsigned
> > int arguments, but we are missing an unsigned long long argument
> > equivalent.  This patch adds an overload to provide the long long
> > version of the builtin.
> > 
> > unsigned long long __builtin_addg6s (unsigned long long, unsigned
> > long long);
> > 
> > RFC/concerns: This patch works, but looking briefly at intermediate
> > stages
> > is not behaving quite as I expected.   Looking at the intermediate
> > dumps, I
> > see in pr100693.original that calls I expect to be routed to the
> > internal
> > __builtin_addg6s_si() that uses (unsigned int) arguments are
> > instead being
> > handled by __builtin_addg6s_di() with casts that convert the
> > arguments to
> > (unsigned long long).
> 
> Did you test with actual 32-bit variables, instead of just function
> arguments?  Function arguments are always passed in (sign-extended)
> registers.
> 
> Like,
> 
> unsigned int f(unsigned int *a, unsigned int *b)
> {
>   return __builtin_addg6s(*a, *b);
> }


I perhaps missed that subtlety.  I'll investigate that further.

> 
> > As a test, I see if I swap the order of the builtins in rs6000-
> > overload.def
> > I end up with code casting the ULL values to UI, which provides
> > truncated
> > results, and is similar to what occurs today without this patch.
> > 
> > All that said, this patch seems to work.  OK for next stage 1?
> > Tested on power8BE as well as LE power8,power9,power10.
> 
> Please ask again when stage 1 has started?
> 
> > gcc/
> > PR target/100693
> > * config/rs6000/rs6000-builtins.def: Remove entry for
> > __builtin_addg6s()
> >   and add entries for __builtin_addg6s_di() and
> > __builtin_addg6s_si().
> 
> Indent of second and further lines should be at the "*", not two
> spaces
> after that.
> 
> > -   UNSPEC_ADDG6S
> > +   UNSPEC_ADDG6S_SI
> > +   UNSPEC_ADDG6S_DI
> 
> You do not need multiple unspec numbers.  You can differentiate them
> based on the modes of the arguments, already :-)
> 
> >  ;; Miscellaneous ISA 2.06 (power7) instructions
> > -(define_insn "addg6s"
> > +(define_insn "addg6s_si"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> > (unspec:SI [(match_operand:SI 1 "register_operand" "r")
> > (match_operand:SI 2 "register_operand" "r")]
> > -  UNSPEC_ADDG6S))]
> > +  UNSPEC_ADDG6S_SI))]
> > +  "TARGET_POPCNTD"
> > +  "addg6s %0,%1,%2"
> > +  [(set_attr "type" "integer")])
> > +
> > +(define_insn "addg6s_di"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
> > +   (match_operand:DI 2 "register_operand" "r")]
> > +  UNSPEC_ADDG6S_DI))]
> >"TARGET_POPCNTD"
> >"addg6s %0,%1,%2"
> >[(set_attr "type" "integer")])
> 
> (define_insn "addg6s"
>   [(set (match_operand:GPR 0 "register_operand" "=r")
>         (unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
>                      (match_operand:GPR 2 "register_operand" "r")]
>                     UNSPEC_ADDG6S))]
>   "TARGET_POPCNTD"
>   "addg6s %0,%1,%2"
>   [(set_attr "type" "integer")])
>
> You do not need multiple unspec numbers.  You can differentiate them
> based on the modes of the arguments, already :-)


Yeah, that's what I thought, which is a big part of why I posted this
with RFC. :-)  When I attempted this there was an issue with multiple
<mode>s (behind the GPR predicate) versus the singular "addg6s"
define_insn.  It's possible I had something else wrong there, but I'll
go back to that attempt and work in that direction.

> 
> We do not want DI (here, and in most places) for -m32!
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile { target { powerpc*-*-linux* } } } */
> 
> Why only on Linux?
> 
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> 
> Why not on Darwin?  And why skip it anyway, given the previous line
> :-)
> 
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
> That is the wrong requirement.  You want to test for Power7, not for
> VSX.  I realise you probably copied this from elsewhere :-(  (If from
> another addg6s testcase, just keep it).

Because reasons. :-)   The stanzas are copied from the nearby bcd-1.c
testcase that has a simpler test for addg6s.  Given the input I'll
try to correct the stanzas here and limit how much error I carry along.

Thanks for the feedback and review.   I'll investigate further, and
resubmit at stage1.   

Thanks,
-Will

> 
> 
> Segher



rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
Hi,

RFC/Update support for addg6s instruction.  PR100693

For PR100693, we currently provide an addg6s builtin using unsigned
int arguments, but we are missing an unsigned long long argument
equivalent.  This patch adds an overload to provide the long long
version of the builtin.

unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);

RFC/concerns: This patch works, but a brief look at the intermediate stages
shows it is not behaving quite as I expected.   Looking at the intermediate
dumps, I see in pr100693.original that calls I expect to be routed to the
internal __builtin_addg6s_si() that uses (unsigned int) arguments are instead
being handled by __builtin_addg6s_di() with casts that convert the arguments
to (unsigned long long).
i.e.
 return (unsigned int) __builtin_addg6s_di
 ((long long unsigned int) a, (long long unsigned int) b);

As a test, I see that if I swap the order of the builtins in
rs6000-overload.def I end up with code casting the ULL values to UI, which
provides truncated results, and is similar to what occurs today without this
patch.
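
To make the truncation concern concrete, here is a software model of
addg6s in portable C.  The model follows my reading of the ISA description
(a six in every decimal digit position that does not produce a carry, a
zero otherwise); treat that, and the names addg6s_model and
addg6s_model_truncated, as assumptions rather than a reference
implementation.

```c
#include <stdint.h>

/* Assumed semantics: for each of the 16 four-bit digit positions, emit
   0x6 when adding A and B does not carry out of that digit, else 0x0.  */
static uint64_t addg6s_model (uint64_t a, uint64_t b)
{
  uint64_t sum = a + b;
  /* Bit p of carry_in is set when the addition carries into bit p.  */
  uint64_t carry_in = a ^ b ^ sum;
  uint64_t result = 0;

  for (int digit = 0; digit < 16; digit++)
    {
      int carry_out;
      if (digit < 15)
        carry_out = (int) ((carry_in >> (4 * (digit + 1))) & 1);
      else
        carry_out = sum < a;	/* Carry out of bit 63.  */

      if (!carry_out)
        result |= (uint64_t) 6 << (4 * digit);
    }
  return result;
}

/* What a caller sees when both operands and the result are narrowed
   to unsigned int, as with the casts to UI described above.  */
static uint32_t addg6s_model_truncated (uint64_t a, uint64_t b)
{
  return (uint32_t) addg6s_model ((uint32_t) a, (uint32_t) b);
}
```

With a = 0x9000000000 and b = 0x7000000000 the digits in position 9 sum
to 16, so the full-width model clears that digit and returns
0x6666660666666666, while the truncated variant only sees zeros in the
low words and returns 0x66666666.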

All that said, this patch seems to work.  OK for next stage 1?
Tested on power8BE as well as LE power8,power9,power10.

2022-03-15  Will Schmidt  

gcc/
PR target/100693
* config/rs6000/rs6000-builtins.def: Remove entry for __builtin_addg6s()
  and add entries for __builtin_addg6s_di() and __builtin_addg6s_si().
* config/rs6000/rs6000-overload.def: Add overloaded entries allowing
  __builtin_addg6s() to map to either of the __builtin_addg6s_{di,si}
  builtins.
* config/rs6000/rs6000.md: Add UNSPEC_ADDG6S_SI and UNSPEC_ADDG6S_DI
  unspecs.   Add define_insn entries for addg6s_si and addg6s_di based
  on those unspecs.
* doc/extend.texi:  Add entry for ULL __builtin_addg6s (ULL, ULL);

testsuite/
PR target/100693
* gcc.target/powerpc/pr100693.c:  New test.

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index ae2760c33389..4c23cac26932 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1993,12 +1993,16 @@
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-ADDG6S addg6s {}
+  const unsigned long long __builtin_addg6s_di (unsigned long long, \
+   unsigned long long);
+ADDG6S_DI addg6s_di {}
+
+  const unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI addg6s_si {}
 
   const signed long __builtin_bpermd (signed long, signed long);
 BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..931f85b738c5 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -76,10 +76,15 @@
 ; Blank lines may be used as desired in this file between the lines as
 ; defined above; that is, you can introduce as many extra newlines as you
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[ADDG6S, __builtin_i_addg6s, __builtin_addg6s]
+  unsigned long long __builtin_addg6s_di (unsigned long long, unsigned long long);
+ADDG6S_DI
+  unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
 BCDADD_V1TI
   vuc __builtin_vec_bcdadd (vuc, vuc, const int);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fdfbc6566a5c..d040f127eb55 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -122,11 +122,12 @@ (define_c_enum "unspec"
UNSPEC_P8V_MTVSRWZ
UNSPEC_P8V_RELOAD_FROM_GPR
UNSPEC_P8V_MTVSRD
UNSPEC_P8V_XXPERMDI
UNSPEC_P8V_RELOAD_FROM_VSX
-   UNSPEC_ADDG6S
+   UNSPEC_ADDG6S_SI
+   UNSPEC_ADDG6S_DI
UNSPEC_CDTBCD
UNSPEC_CBCDTD
UNSPEC_DIVE
UNSPEC_DIVEU
UNSPEC_UNPACK_128BIT
@@ -14495,15 +14496,24 @@ (define_peephole2
   operands[5] = change_address (mem, mode, new_addr);
 })

 
 ;; Miscellaneous ISA 2.06 (power7) instructions
-(define_insn "addg6s"
+(define_insn "addg6s_si"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "register_operand" "r")]
-  UNSPEC_ADDG6S))]
+  UNSPEC_ADDG6S_SI))]
+  "TARGET_POPCNTD"
+  "addg6s %0,%1,%2"
+  [(set_attr "type" "integer")])
+
+(define_insn "addg6s_di"
+  [(set (match_operand:DI 0 "regi

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Thu, 2022-03-10 at 13:49 -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >| OPTION_MASK_ALTIVEC  
> > > \
> > >| OPTION_MASK_VSX)
> > > 
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do
> > > not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for
> > > a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> > 
> > ok.  (as far as removing the comment, I'm not clear what the
> > remaining
> > comment is telling me, but that's outside of the scope of this
> > patch).
> 
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded
> even
> maybe :-)

Lol, Ok.  The small-e in embedded did not clue me in that this was
referring to the big-E Embedded category.  :-)

> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.


Thanks,
-Will

> 
> 
> Segher



Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059

Hi,

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.
> 
specification. 
(ok :-)


> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't 
> do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.
> 

inlining.
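
As an aside, the subset test described in the paragraph above can be
sketched in plain C.  This is a minimal sketch and not the actual
rs6000_can_inline_p code: the SKETCH_MASK_* bits and the function name
are hypothetical, and the real OPTION_MASK_* values differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits, for illustration only.  */
#define SKETCH_MASK_VSX        (UINT64_C (1) << 0)
#define SKETCH_MASK_P8_VECTOR  (UINT64_C (1) << 1)
#define SKETCH_MASK_P8_FUSION  (UINT64_C (1) << 2)
#define SKETCH_MASK_P9_VECTOR  (UINT64_C (1) << 3)

/* Inlining is allowed only when every flag set for the callee is also
   set for the caller, i.e. the callee's flags are a subset.  */
static bool sketch_can_inline_p (uint64_t caller_flags, uint64_t callee_flags)
{
  return (callee_flags & ~caller_flags) == 0;
}
```

A power8 callee that carries the fusion bit is not a subset of a power9
caller that lacks it, so the check fails; once the fusion bit is no
longer part of the compared flags, the same pair passes.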


> This code removes the -mpower8-fusion and -mpower8-fusion-sign options, and
> only enables power8 fusion if we are tuning for a power8.  Power8 sign fusion
> is only enabled if we are tuning for a power8 and we have -O3 optimization or
> higher.
> 
> I left the options -mno-power8-fusion and -mno-power8-fusion-sign in 
> rs6000.opt
> and they don't issue a warning.  If the user explicitly used -mpower8-fusion 
> or
> -mpower8-fusion-sign, then they will get a warning that the swtich has been
> removed.
> 

switch


> Similarly, I left in the pragma target and attribute target support for the
> fusion options, but they don't do anything now.  This is because I believe the
> customer who encountered this problem now is explicitly setting the
> no-power8-fusion option in the pragma or attribute to avoid the warning.
> 
> I have tested this on the following systems, and they all bootstraps fine and
> there were no regressions in the test suite:
> 
> big endian power8 (both 64-bit and 32-bit)
> little endian power9
> little endian power10
> 

ok.

> Can I check this patch into the current master branch for GCC and after a
> cooling period check in the patch to the GCC 11 and GCC 10 branches.  The
> customer is currently using GCC 10.
> 
> 2022-03-09   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.

ok

>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target to set
>   power8-fusion and power8-fusion-sign, but these no longer represent
>   options that the user can set.
>   (rs6000_print_options_internal): Skip printing nop options.

ok


>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.

ok


>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore the no form and warn that the option was removed for the regular
>   form.
>   (-mpower8-fusion-sign): Likewise.

ok

>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mpower8-fusion
>   and -mpower8-fusion-sign.

This change removes the -mpower8-fusion and -mno-power8-fusion options.
There is no direct reference to -mpower8-fusion-sign in the change
here.  It may be an implied removal, but that is not immediately obvious
to me.


> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000-cpus.def | 22 +++--
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 14 +-
>  gcc/config/rs6000/rs6000.opt  | 19 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 75 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..a05b2d8c41a 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -43,9 +43,7 @@
>| OPTION_MASK_ALTIVEC  \
>| OPTION_MASK_VSX)
> 
> -/* For now, don't provide an embedded version of ISA 2.07.  Do n

Re: [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698

2022-03-01 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 22:21 -0500, Michael Meissner wrote:
> Optimize signed DImode -> TImode on power10, PR target/104698.
> 

Hi,
  Logic seems OK to me; a few suggestions on the comments are intermixed
below.  As always, I defer if there are counter arguments. :-)


> On power10, GCC tries to optimize the signed conversion from DImode to
> TImode by using the vextsd2q instruction.  However to generate this
> instruction, it would have to generate 3 direct moves (1 from the GPR
> registers to the altivec registers, and 2 from the altivec registers to
> the GPR register).
> 
> This patch adds code back in to use the shift right immediate instruction
> to do the conversion if the target/source is GPR registers.


Perhaps drop "back in".   If it's necessary to call out a previous
commit that removed the code for whatever reason, certainly do so. 
It's not clear from context if that was the case.


> 
> 2022-02-28   Michael Meissner  
> 
> gcc/
>   PR target/104698
>   * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
>   (extendditi2): Replace with code to deal with both GPR registers
>   and with altivec registers.

Perhaps enhance with 
(extendditi2):  Convert from define_expand to
define_insn_and_split.  Replace with code ...


> ---
>  gcc/config/rs6000/vsx.md | 73 
>  1 file changed, 52 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index b53de103872..62464f67f4d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> -;; ISA 3.1 vector sign extend
> -;; Move DI value from GPR to TI mode in VSX register, word 1.
> -(define_insn "mtvsrdd_diti_w1"
> -  [(set (match_operand:TI 0 "register_operand" "=wa")
> - (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> -  UNSPEC_MTVSRD_DITI_W1))]
> -  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "mtvsrdd %x0,0,%1"
> -  [(set_attr "type" "vecmove")])
> +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets.  
> If
> +;; the register allocator prefers the GPRs, we won't have to move the value 
> to
> +;; the altivec registers, do the vextsd2q instruction and move it back.  If 
> we
> +;; aren't compiling for 64-bit power10, don't provide the service and let the
> +;; machine independent code handle the extension.

So, the ".. we won't have to ..." applies to the altivec target path
here?   Describing things in terms of what the code doesn't do doesn't
seem right.
If so, and perhaps even if not, I suggest rearranging the comment
slightly so it can be read as an either/or:
  If the register allocator prefers the GPRs, ...
  Otherwise, for altivec registers we do the vextsd2q ...


> +(define_insn_and_split "extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> + (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "#"
> +  "&& reload_completed"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> + a sign extension to the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +{
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +  emit_move_insn (dest_lo, src);
> +  emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> +  DONE;
> +}
ok
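
For illustration, the GPR path above amounts to the following in plain
C.  This is a sketch only: ti_pair and sketch_extendditi2_gpr are
made-up names, and the conditional stands in for the arithmetic shift
right by 63 so the example does not depend on how a C compiler shifts
negative values.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in two GPRs.  */
typedef struct { uint64_t lo; uint64_t hi; } ti_pair;

static ti_pair sketch_extendditi2_gpr (int64_t src)
{
  ti_pair dest;

  /* Low doubleword: a plain copy of the source.  */
  dest.lo = (uint64_t) src;

  /* High doubleword: 64 copies of the sign bit, which is what an
     arithmetic shift right by 63 of the low doubleword produces.  */
  dest.hi = src < 0 ? UINT64_MAX : 0;
  return dest;
}
```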

> +
> +  /* For conversion to Altivec register, generate either a splat operation or
> + a load rightmost double word instruction.  Both instructions gets the
> + DImode value into the lower 64 bits, and then do the vextsd2q
> + instruction.  */

consider   s/instruction. Both instructions gets/to get/

> +  else if (ALTIVEC_REGNO_P (dest_regno))
> +{
> +  if (MEM_P (src))
> + emit_insn (gen_vsx_lxvrdx (dest, src));
> +  else
> + {
> +   rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +   emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> + }
> +
> +  emit_insn (gen_extendditi2_vector (dest, dest));
> +  DONE;
> +}

ok

lgtm, thanks
-Will

> +
> +  else
> +gcc_unreachable ();
> +}
> +  [(set_attr "length" "8")])
> 
>  ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
>  (define_insn "extendditi2_vector"
> @@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector"
>"vextsd2q %0,%1"
>[(set_attr "type" "vecexts")])
> 
> -(define_expand "extendditi2"
> -  [(set (match_operand:TI 0 "gpc_reg_operand")
> - (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> -  "TARGET_POWER10"
> -  {
> -/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.  
> */
> -rtx temp = gen_reg_rtx (TImode);
> -emit

Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.

2021-11-05 Thread will schmidt via Gcc-patches
On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP for scalars on power10.
> 
> This patch implements XXSPLTIDP support for SF, and DF scalar constants.
> The previous patch added support for vector constants.  This patch adds
> the support for SFmode and DFmode scalar constants.
> 
> I added 2 new tests to test loading up SF and DF scalar constants.


ok

> 
> 2021-11-05  Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
>   (UNSPEC_XXSPLTIW_CONST): New unspec.
>   (movsf_hardfloat): Add support for generating XXSPLTIDP.
>   (mov_hardfloat32): Likewise.
>   (mov_hardfloat64): Likewise.
>   (xxspltidp__internal): New insns.
>   (xxspltiw__internal): New insns.
>   (splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/vec-splat-constant-df.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> ---

ok


>  gcc/config/rs6000/rs6000.md   | 97 +++
>  .../powerpc/vec-splat-constant-df.c   | 60 
>  .../powerpc/vec-splat-constant-sf.c   | 60 
>  3 files changed, 199 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 3a7bcd2426e..4122acb98cf 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -156,6 +156,8 @@ (define_c_enum "unspec"
> UNSPEC_PEXTD
> UNSPEC_HASHST
> UNSPEC_HASHCHK
> +   UNSPEC_XXSPLTIDP_CONST
> +   UNSPEC_XXSPLTIW_CONST
>])
> 
>  ;;
> @@ -7764,17 +7766,17 @@ (define_split
>  ;;
>  ;;   LWZ  LFSLXSSP   LXSSPX STFS   STXSSP
>  ;;   STXSSPX  STWXXLXOR  LI FMRXSCPSGNDP
> -;;   MR   MT  MF   NOP
> +;;   MR   MT  MF   NOPXXSPLTIDP
> 
>  (define_insn "movsf_hardfloat"
>[(set (match_operand:SF 0 "nonimmediate_operand"
>"=!r,   f, v,  wa,m, wY,
> Z, m, wa, !r,f, wa,
> -   !r,*c*l,  !r, *h")
> +   !r,*c*l,  !r, *h,wa")
>   (match_operand:SF 1 "input_operand"
>"m, m, wY, Z, f, v,
> wa,r, j,  j, f, wa,
> -   r, r, *h, 0"))]
> +   r, r, *h, 0, eP"))]
>"(register_operand (operands[0], SFmode)
> || register_operand (operands[1], SFmode))
> && TARGET_HARD_FLOAT
> @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
> mr %0,%1
> mt%0 %1
> mf%1 %0
> -   nop"
> +   nop
> +   #"
>[(set_attr "type"
>   "load,   fpload,fpload, fpload,fpstore,   fpstore,
>fpstore,store, veclogical, integer,   fpsimple,  fpsimple,
> -  *,  mtjmpr,mfjmpr, *")
> +  *,  mtjmpr,mfjmpr, *, vecperm")
> (set_attr "isa"
>   "*,  *, p9v,p8v,   *, p9v,
>p8v,*, *,  *, *, *,
> -  *,  *, *,  *")])
> +  *,  *, *,  *, p10")])
> 
>  ;;   LWZ  LFIWZX STWSTFIWX MTVSRWZMFVSRWZ
>  ;;   FMR  MR MT%0   MF%1   NOP
> @@ -8064,18 +8067,18 @@ (define_split
> 
>  ;;   STFD LFD FMR LXSDSTXSD
>  ;;   LXSD STXSD   XXLOR   XXLXOR  GPR<-0
> -;;   LWZ  STW MR
> +;;   LWZ  STW MR  XXSPLTIDP
> 
> 
>  (define_insn "*mov_hardfloat32"
>[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>  "=m,  d,  d,  ,   wY,
>,   Z,  ,  ,  !r,
> -  Y,  r,  !r")
> +  Y,  r,  !r, wa")
>   (match_operand:FMOVE64 1 "input_operand"
>   "d,  m,  d,  wY, ,
>Z,  ,   ,  ,  ,
> -  r,  Y,  r"))]
> +  r,  Y,  r,  eP"))]
>"! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> && (gpc_reg_operand (operands[0], mode)
> || gpc_reg_operand (operands[1], mode))"
> @@ -8092,20 +8095,21 @@ (define_insn "*mov_hardfloat32"
> #
> #
> #
> +   #
> #"
>[(set_attr "type"
>  "fpstore, fpload, fpsimple,   fpload, fpstore,
>   fpload,  fpstore,veclogical, veclogical, two,
> - store,   load,   two")
> +

Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants

2021-11-05 Thread will schmidt via Gcc-patches
On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP for vectors on power10.
> 
> This patch implements XXSPLTIDP support for all vector constants.  The
> XXSPLTIDP instruction is given a 32-bit immediate that is converted to a 
> vector
> of two DFmode constants.  The immediate is in SFmode format, so only constants
> that fit as SFmode values can be loaded with XXSPLTIDP.
> 
> The constraint (eP) added in the previous patch for XXSPLTIW is also used
> for XXSPLTIDP.
> 

ok
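
To illustrate the "fits as SFmode" condition, here is a small C sketch
of the round-trip test on a host with IEEE float/double.  It is not the
code in constant_generates_xxspltidp: the function name is hypothetical,
it conservatively rejects NaNs and infinities, and the real check may
apply further restrictions that are not modelled here.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return true and store the 32-bit single-precision image of VALUE in
   *IMM when VALUE survives a round trip through float unchanged.  */
static bool sketch_double_fits_sf (double value, uint32_t *imm)
{
  /* Reject NaNs and anything outside the finite float range up front,
     so the narrowing conversion below is well defined.  */
  if (value != value || value < -FLT_MAX || value > FLT_MAX)
    return false;

  float narrowed = (float) value;
  if ((double) narrowed != value)
    return false;

  /* Copy out the bit pattern; assumes float is 32 bits wide.  */
  memcpy (imm, &narrowed, sizeof narrowed);
  return true;
}
```

For example, 1.5 is exactly representable in single precision, so it
passes, while 0.1 rounds differently in float and double and is
rejected.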


> DImode scalar constants are not handled.  This is due to the majority of 
> DImode
> constants will be in the GPR registers.  With vector registers, you have the
> problem that XXSPLTIDP splats the double word into both elements of the
> vector.  However, if TImode is loaded with an integer constant, it wants a 
> full
> 128-bit constant.

This may be worth adding as a TODO somewhere in the code.

> 
> SFmode and DFmode scalar constants are not handled in this patch.  The
> support for for those constants will be in the next patch.

ok

> 
> I have added a temporary switch (-msplat-float-constant) to control whether or
> not the XXSPLTIDP instruction is generated.
> 
> I added 2 new tests to test loading up V2DI and V2DF vector constants.




> 
> 2021-11-05  Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/predicates.md (easy_fp_constant): Add support for
>   generating XXSPLTIDP.
>   (vsx_prefixed_constant): Likewise.
>   (easy_vector_constant): Likewise.
>   * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
>   New declaration.
>   * config/rs6000/rs6000.c (output_vec_const_move): Add support for
>   generating XXSPLTIDP.
>   (prefixed_xxsplti_p): Likewise.
>   (constant_generates_xxspltidp): New function.
>   * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
>   regex for power10.
>   * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> ---


ok

>  gcc/config/rs6000/predicates.md   |   9 ++
>  gcc/config/rs6000/rs6000-protos.h |   1 +
>  gcc/config/rs6000/rs6000.c| 108 ++
>  gcc/config/rs6000/rs6000.opt  |   4 +
>  .../powerpc/pr86731-fwrapv-longlong.c |   9 +-
>  .../powerpc/vec-splat-constant-v2df.c |  64 +++
>  .../powerpc/vec-splat-constant-v2di.c |  50 
>  7 files changed, 241 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> 
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ed6252bd0c4..d748b11857c 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> 
>if (constant_generates_xxspltiw (&vsx_const))
>   return true;
> +
> +  if (constant_generates_xxspltidp (&vsx_const))
> + return true;
>  }
> 
>/* Otherwise consider floating point constants hard, so that the
> @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
>if (constant_generates_xxspltiw (&vsx_const))
>  return true;
> 
> +  if (constant_generates_xxspltidp (&vsx_const))
> +return true;
> +
>return false;
>  })
> 
> @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> 
> if (constant_generates_xxspltiw (&vsx_const))
>   return true;
> +
> +   if (constant_generates_xxspltidp (&vsx_const))
> + return true;
>   }


ok
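For readers following along: XXSPLTIDP takes a 32-bit single-precision immediate, converts it to double precision, and splats it into both doublewords. The body of constant_generates_xxspltidp is not visible in this part of the thread, so the following is only a rough sketch of the kind of test involved, assuming the constant qualifies when the double value survives a round trip through float and is not a single-precision denormal. The name double_fits_in_float is made up for illustration and is not from the patch.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: return true if D can be
   represented exactly as a normal (or zero) single-precision value,
   so that a 32-bit immediate could reproduce it.  */
static bool
double_fits_in_float (double d)
{
  if (d != d)
    return false;		/* NaN: conservatively rejected in this sketch.  */

  double mag = d < 0 ? -d : d;
  if (mag > FLT_MAX)
    return false;		/* Too large for float (also rejects infinity).  */
  if (mag != 0.0 && mag < FLT_MIN)
    return false;		/* Would become a single-precision denormal.  */

  /* Exact round trip through float means no precision is lost.  */
  return (double) (float) d == d;
}
```

With this sketch, 1.0 and 0.5 qualify, while 0.1 does not, since 0.1 rounds differently in float and double.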

> 
>if (TARGET_P9_VECTOR
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 99c6a671289..2d28df7442d 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
>  vec_const_128bit_type *);
>  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
>  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index be24f56eb31..8fde48cf2b3 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
> operands[2] = GEN_INT (imm);
> return "xxspltiw %x0,%2";
>   }
> +
> +   imm = constant_generates_xxspltidp (&vsx_const);
> +   if (imm)


Just a nit that the two lines could be combined into a similar form
as used elsewhere as ...
if (constant_generates_xxsp

Re: [PATCH 3/5] Add Power10 XXSPLTIW

2021-11-05 Thread will schmidt via Gcc-patches
On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> Generate XXSPLTIW on power10.
> 

Hi,


> This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> adding support for vector constants that can be used, and adding a
> VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> 
> The eP constraint was added to recognize constants that can be loaded into
> vector registers with a single prefixed instruction.

Perhaps swap "... the eP constraint was added ..." for "Add the eP
constraint to ..."


> 
> I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> constants.


> 
> 2021-11-05  Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/constraints.md (eP): Update comment.
>   * config/rs6000/predicates.md (easy_fp_constant): Add support for
>   generating XXSPLTIW.
>   (vsx_prefixed_constant): New predicate.
>   (easy_vector_constant): Add support for
>   generating XXSPLTIW.
>   * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
>   declaration.
>   (constant_generates_xxspltiw): Likewise.
>   * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
>   XXSPLTIW, don't do XXSPLTIB and sign extend.

Perhaps just 'generate XXSPLTIW if possible'.  

>   (output_vec_const_move): Add support for XXSPLTIW.
>   (prefixed_xxsplti_p): New function.
>   (constant_generates_xxspltiw): New function.
>   * config/rs6000/rs6000.md (prefixed attribute): Add support to
>   mark XXSPLTI* instructions as being prefixed.
>   * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
>   switch.
>   * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
>   generating XXSPLTIW or XXSPLTIDP.
>   (vsx_mov_32bit): Likewise.
>   * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
>   eP constraint.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
>   * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> ---
>  gcc/config/rs6000/constraints.md  |  6 ++
>  gcc/config/rs6000/predicates.md   | 46 ++-
>  gcc/config/rs6000/rs6000-protos.h |  2 +
>  gcc/config/rs6000/rs6000.c| 81 +++
>  gcc/config/rs6000/rs6000.md   |  5 ++
>  gcc/config/rs6000/rs6000.opt  |  4 +
>  gcc/config/rs6000/vsx.md  | 28 +++
>  gcc/doc/md.texi   |  4 +
>  .../powerpc/vec-splat-constant-v16qi.c| 27 +++
>  .../powerpc/vec-splat-constant-v4sf.c | 67 +++
>  .../powerpc/vec-splat-constant-v4si.c | 51 
>  .../powerpc/vec-splat-constant-v8hi.c | 62 ++
>  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
>  13 files changed, 369 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index e72132b4c28..a4b05837fa6 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -213,6 +213,12 @@ (define_constraint "eI"
>"A signed 34-bit integer constant if prefixed instructions are supported."
>(match_operand 0 "cint34_operand"))
> 
> +;; A SF/DF scalar constant or a vector constant that can be loaded into vector
> +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> +(define_constraint "eP"
> +  "A constant that can be loaded into a VSX register with one prefixed insn."
> +  (match_operand 0 "vsx_prefixed_constant"))
> +
>  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
>  ;; 128-bit constants into vector registers using LXVKQ.
>  (define_constraint "eQ"
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index e0d1c718e9f..ed6252bd0c4 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
>vec_const_128bit_type vsx_const;
>if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
>  {
> -  if (constant_generates_lxvkq (&vsx_const) != 0)
> +  if (constant_generates_lxvkq (&vsx_const))
> + return true;
> +
> +  if (constant_generates_xxspltiw (&vsx_const))
>   return true;
>  }
> 

o

Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)

2021-11-05 Thread will schmidt via Gcc-patches
On Fri, 2021-11-05 at 00:07 -0400, Michael Meissner wrote:
> Add LXVKQ support.
> 
> This patch adds support to generate the LXVKQ instruction to load specific
> IEEE-128 floating point constants.
> 
> Compared to the last time I submitted this patch, I modified it so that it
> uses the bit pattern of the vector to see if it can generate the LXVKQ
> instruction.  This means on a little endian Power system, the
> following code will generate a LXVKQ 34,16 instruction:
> 
> vector long long foo (void)
> {
> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>   return (vector long long) { 0x, 0x8000 };
> #else
>   return (vector long long) { 0x8000, 0x };
> #endif
> }
> 
> because that vector pattern is the same bit pattern as -0.0F128.
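As a side note on the bit pattern: IEEE binary128 -0.0 has only the sign bit set, with the 15-bit exponent and 112-bit fraction all zero. A small stand-alone sketch (helper names are mine, not from the patch) shows how that splits into the two 64-bit doublewords when the bytes are kept in big-endian order:

```c
#include <stdint.h>
#include <string.h>

/* Fill OUT with the big-endian byte image of IEEE binary128 -0.0:
   sign bit set, exponent and fraction all zero.  */
static void
neg_zero_binary128_bytes (unsigned char out[16])
{
  memset (out, 0, 16);
  out[0] = 0x80;
}

/* Combine 8 big-endian bytes starting at P into one 64-bit doubleword.  */
static uint64_t
be_doubleword (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}
```

The most significant doubleword comes out as 0x8000000000000000 and the least significant one as zero, which is why the element order in the initializer has to flip between little and big endian.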
> 
> 2021-11-05  Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/constraints.md (eQ): New constraint.
>   * config/rs6000/predicates.md (easy_fp_constant): Add support for
>   generating the LXVKQ instruction.
>   (easy_vector_constant_ieee128): New predicate.
>   (easy_vector_constant): Add support for generating the LXVKQ
>   instruction.
>   * config/rs6000/rs6000-protos.h (constant_generates_lxvkq): New
>   declaration.
>   * config/rs6000/rs6000.c (output_vec_const_move): Add support for
>   generating LXVKQ.
>   (constant_generates_lxvkq): New function.
>   * config/rs6000/rs6000.opt (-mieee128-constant): New debug
>   option.
>   * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
>   generating LXVKQ.
>   (vsx_mov_32bit): Likewise.
>   * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
>   eQ constraint.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/float128-constant.c: New test.
> ---
>  gcc/config/rs6000/constraints.md  |   6 +
>  gcc/config/rs6000/predicates.md   |  34 
>  gcc/config/rs6000/rs6000-protos.h |   1 +
>  gcc/config/rs6000/rs6000.c|  62 +++
>  gcc/config/rs6000/rs6000.opt  |   4 +
>  gcc/config/rs6000/vsx.md  |  14 ++
>  gcc/doc/md.texi   |   4 +
>  .../gcc.target/powerpc/float128-constant.c| 160 ++
>  8 files changed, 285 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-constant.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..e72132b4c28 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -213,6 +213,12 @@ (define_constraint "eI"
>"A signed 34-bit integer constant if prefixed instructions are supported."
>(match_operand 0 "cint34_operand"))
> 
> +;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> +;; 128-bit constants into vector registers using LXVKQ.
> +(define_constraint "eQ"
> +  "An IEEE 128-bit constant that can be loaded into VSX registers."
> +  (match_operand 0 "easy_vector_constant_ieee128"))
> +
>  ;; Floating-point constraints.  These two are defined so that insn
>  ;; length attributes can be calculated exactly.
> 

ok


> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..e0d1c718e9f 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
>if (TARGET_VSX && op == CONST0_RTX (mode))
>  return 1;
> 
> +  /* Constants that can be generated with ISA 3.1 instructions are easy.  */

Easy is relative, but OK.

> +  vec_const_128bit_type vsx_const;
> +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> +{
> +  if (constant_generates_lxvkq (&vsx_const) != 0)
> + return true;
> +}
> +
>/* Otherwise consider floating point constants hard, so that the
>   constant gets pushed to memory during the early RTL phases.  This
>   has the advantage that double precision constants that can be
> @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
> return 0;
>  })
> 
> +;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
> +;; via the LXVKQ instruction.
> +
> +(define_predicate "easy_vector_constant_ieee128"
> +  (match_code "const_vector,const_double")
> +{
> +  vec_const_128bit_type vsx_const;
> +
> +  /* Can we generate the LXVKQ instruction?  */
> +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> +  || !TARGET_VSX)
> +return false;

Presumably all of the checks there are valid.  (Can we have power10
without float128_hw or ieee128_constant flags set?)  I do notice the
addition of an ieee128_constant flag below.
> +
> +  return (vec_const_128bit_to_bytes (op, mode, &vsx_const)
> +   && constant_generates_lxvkq (&vsx_const) != 0);
> +})
> +

ok


>  ;; Return 1 if the operand is a constant that ca

Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)

2021-11-05 Thread will schmidt via Gcc-patches
On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> Add new constant data structure.
> 
> This patch provides the data structure and function to convert a
> constant (CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a
> constant) to an array of bytes, half-words, words, and double words that
> can be loaded into a 128-bit vector register.
> 
> The next patches will use this data structure to generate code that
> loads the vector/floating point registers using the XXSPLTIDP,
> XXSPLTIW, and LXVKQ instructions that were added in power10.
> 
> 2021-11-05  Michael Meissner  
> 

Email here is different than the from:.  No big deal either way.  

> gcc/
> 
>   * config/rs6000/rs6000-protos.h (VECTOR_128BIT_*): New macros.

I defer to maintainers.  I like to explicitly include the full macro names here 
so a grep later on can easily find it.  


>   (vec_const_128bit_type): New structure type.
>   (vec_const_128bit_to_bytes): New declaration.
>   * config/rs6000/rs6000.c (constant_int_to_128bit_vector): New
>   helper function.
>   (constant_fp_to_128bit_vector): New helper function.
>   (vec_const_128bit_to_bytes): New function.

ok

> ---
>  gcc/config/rs6000/rs6000-protos.h |  28 
>  gcc/config/rs6000/rs6000.c| 253 ++
>  2 files changed, 281 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 14f6b313105..490d6e33736 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -222,6 +222,34 @@ address_is_prefixed (rtx addr,
>return (iform == INSN_FORM_PREFIXED_NUMERIC
> || iform == INSN_FORM_PCREL_LOCAL);
>  }
> +
> +/* Functions and data structures relating to 128-bit constants that are
> +   converted to byte, half-word, word, and double-word values.  All fields are
> +   kept in big endian order.  We also convert scalar values to 128-bits if they
> +   are going to be loaded into vector registers.  */
> +#define VECTOR_128BIT_BITS   128
> +#define VECTOR_128BIT_BYTES  (128 / 8)
> +#define VECTOR_128BIT_HALF_WORDS (128 / 16)
> +#define VECTOR_128BIT_WORDS  (128 / 32)
> +#define VECTOR_128BIT_DOUBLE_WORDS   (128 / 64)

ok

> +
> +typedef struct {
> +  /* Constant as various sized items.  */
> +  unsigned HOST_WIDE_INT double_words[VECTOR_128BIT_DOUBLE_WORDS];
> +  unsigned int words[VECTOR_128BIT_WORDS];
> +  unsigned short half_words[VECTOR_128BIT_HALF_WORDS];
> +  unsigned char bytes[VECTOR_128BIT_BYTES];
> +
> +  unsigned original_size;       /* Constant size before splat.  */
> +  bool fp_constant_p;           /* Is the constant floating point?  */
> +  bool all_double_words_same;   /* Are the double words all equal?  */
> +  bool all_words_same;          /* Are the words all equal?  */
> +  bool all_half_words_same;  /* Are the halft words all equal?  */

half

> +  bool all_bytes_same;   /* Are the bytes all equal?  */




> +} vec_const_128bit_type;
> +

ok.  
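To make the intent of the structure concrete, here is a simplified sketch of how the wider views and the "all same" flags could be derived once the byte array is filled in big-endian order. This is an illustration under my own assumptions, not the code from the patch: the type vec128_sketch and the function vec128_sketch_finish are hypothetical, and fixed-width types stand in for HOST_WIDE_INT and friends.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for vec_const_128bit_type.  */
typedef struct {
  uint64_t double_words[2];
  uint32_t words[4];
  uint16_t half_words[8];
  uint8_t bytes[16];
  bool all_double_words_same;
  bool all_words_same;
  bool all_half_words_same;
  bool all_bytes_same;
} vec128_sketch;

/* Derive the wider views and the flags from INFO->bytes, which is
   assumed to already hold the constant in big-endian order.  */
static void
vec128_sketch_finish (vec128_sketch *info)
{
  for (int i = 0; i < 8; i++)
    info->half_words[i]
      = (uint16_t) ((info->bytes[2 * i] << 8) | info->bytes[2 * i + 1]);
  for (int i = 0; i < 4; i++)
    info->words[i] = (((uint32_t) info->half_words[2 * i] << 16)
		      | info->half_words[2 * i + 1]);
  for (int i = 0; i < 2; i++)
    info->double_words[i] = (((uint64_t) info->words[2 * i] << 32)
			     | info->words[2 * i + 1]);

  info->all_bytes_same = true;
  for (int i = 1; i < 16; i++)
    if (info->bytes[i] != info->bytes[0])
      info->all_bytes_same = false;

  info->all_half_words_same = true;
  for (int i = 1; i < 8; i++)
    if (info->half_words[i] != info->half_words[0])
      info->all_half_words_same = false;

  info->all_words_same = true;
  for (int i = 1; i < 4; i++)
    if (info->words[i] != info->words[0])
      info->all_words_same = false;

  info->all_double_words_same
    = info->double_words[0] == info->double_words[1];
}
```

A constant whose words are all equal is then a natural candidate for a word splat, and one whose doublewords are equal for a doubleword splat.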


> +extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> +vec_const_128bit_type *);
>  #endif /* RTX_CODE */
> 
>  #ifdef TREE_CODE
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 01affc7a47c..f285022294a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -28619,6 +28619,259 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
>fprintf (file, "\n");
>  }
> 
> +
> +/* Copy an integer constant to the vector constant structure.  */
> +

Here and subsequent comments, I'd debate on whether to enhance the
comment to be explicit on the structure name being copied to/from.
(vec_const_128bit_type is easy to search for, vector or constant or
structure are not as unique)

> +static void
> +constant_int_to_128bit_vector (rtx op,
> +machine_mode mode,
> +size_t byte_num,
> +vec_const_128bit_type *info)
> +{
> +  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
> +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> +
> +  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
> +info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
> +}

I didn't confirm the maths, but looks OK at a glance.
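For anyone who does want to check the maths, the loop can be tried outside of GCC. The sketch below mirrors the shape of the quoted loop with plain C types, assuming a value of at most 64 bits; int_to_be_bytes is a hypothetical name, and it returns the updated byte index purely for convenience.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low BITSIZE bits of VALUE into BYTES starting at index
   BYTE_NUM, most significant byte first.  BITSIZE is assumed to be a
   multiple of 8 and at most 64.  Returns the next free index.  */
static size_t
int_to_be_bytes (uint64_t value, unsigned bitsize,
		 unsigned char *bytes, size_t byte_num)
{
  for (int shift = (int) bitsize - 8; shift >= 0; shift -= 8)
    bytes[byte_num++] = (unsigned char) ((value >> shift) & 0xff);
  return byte_num;
}
```

For a 16-bit value 0x1234 the first pass uses shift 8 and stores 0x12, the second uses shift 0 and stores 0x34, so the bytes land in big-endian order as the structure's comment requires.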


> +
> +/* Copy an floating point constant to the vector constant structure.  */
> +

s/an/a/

> +static void
> +constant_fp_to_128bit_vector (rtx op,
> +   machine_mode mode,
> +   size_t byte_num,
> +   vec_const_128bit_type *info)
> +{
> +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> +  unsigned num_words = bitsize / 32;
> +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> +  long real_words[VECTOR_128BIT_WORDS];
> +
> +  /* Make sure we don't overflow the real_words array and that it is
> +

Re: PING^4: [RS6000] rotate and mask constants [PR94393]

2021-11-03 Thread will schmidt via Gcc-patches
On Mon, 2021-10-25 at 14:41 -0500, Pat Haugen via Gcc-patches wrote:
> Ping.
> 
> On 8/10/21 10:49 AM, Pat Haugen via Gcc-patches wrote:
> > On 7/27/21 1:35 PM, will schmidt wrote:
> > > On Fri, 2021-07-23 at 15:23 -0500, Pat Haugen via Gcc-patches wrote:
> > > > Ping 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html
> > > > 
> > > > I've done a current bootstrap/regtest on powerpc64/powerpc64le with
> > > > no regressions.
> > > > 
> > > > -Pat
> > > 
> > > That patch was previously posted by Alan Modra.
> > > Given the time lapse this may need to be re-posted entirely, pending
> > > what the maintainers suggest.. :-)
> > > 
> > > 
> > 
> > Including full patch.

A non-authoritative re-review.  :-)

> > 
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 60f406a4ff6..8e758a1f2dd 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -1131,6 +1131,8 @@ static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
> >  static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
> >  static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
> >  static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
> > +static bool rotate_and_mask_constant (unsigned HOST_WIDE_INT, HOST_WIDE_INT *,
> > + int *, unsigned HOST_WIDE_INT *);
> >  static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
> >  static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
> >  static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,

Prototype for rotate_and_mask_constant function.  OK.


> > @@ -5973,7 +5975,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
> >  }
> >  
> >  /* Helper for num_insns_constant.  Allow constants formed by the
> > -   num_insns_constant_gpr sequences, plus li -1, rldicl/rldicr/rlwinm,
> > +   num_insns_constant_gpr sequences, and li/lis+rldicl/rldicr/rldic/rlwinm,
> > and handle modes that require multiple gprs.  */

Ok.

> >  
> >  static int
> > @@ -5988,8 +5990,8 @@ num_insns_constant_multi (HOST_WIDE_INT value, machine_mode mode)
> >if (insns > 2
> >   /* We won't get more than 2 from num_insns_constant_gpr
> >  except when TARGET_POWERPC64 and mode is DImode or
> > -wider, so the register mode must be DImode.  */
> > - && rs6000_is_valid_and_mask (GEN_INT (low), DImode))
> > +wider.  */
> > + && rotate_and_mask_constant (low, NULL, NULL, NULL))
> > insns = 2;
> >total += insns;
> >/* If BITS_PER_WORD is the number of bits in HOST_WIDE_INT, doing
> > @@ -10077,6 +10079,244 @@ rs6000_emit_set_const (rtx dest, rtx source)
> >return true;
> >  }
> >  
> > +/* Rotate DImode word, being careful to handle the case where
> > +   HOST_WIDE_INT is larger than DImode.  */
> > +
> > +static inline unsigned HOST_WIDE_INT
> > +rotate_di (unsigned HOST_WIDE_INT x, unsigned int shift)
> > +{
> > +  unsigned HOST_WIDE_INT mask_hi, mask_lo;
> > +
> > +  mask_hi = (HOST_WIDE_INT_1U << 63 << 1) - (HOST_WIDE_INT_1U << shift);
> > +  mask_lo = (HOST_WIDE_INT_1U << shift) - 1;
> > +  x = ((x << shift) & mask_hi) | ((x >> (64 - shift)) & mask_lo);
> > +  x = (x ^ (HOST_WIDE_INT_1U << 63)) - (HOST_WIDE_INT_1U << 63);
> > +  return x;
> > +}

ok.
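For reference, the two steps in rotate_di (rotate within 64 bits, then sign-extend from bit 63) can be tried in isolation. This is only a sketch with fixed-width types, assuming the host type is exactly 64 bits wide, in which case the masking in the patch is not needed. The names rotl64 and sign_extend are mine; sign_extend takes a bit width so that the xor/subtract idiom can be seen doing something visible.

```c
#include <stdint.h>

/* Rotate X left by SHIFT bits within 64 bits.  A shift of zero is
   handled separately to avoid shifting by the full type width.  */
static uint64_t
rotl64 (uint64_t x, unsigned shift)
{
  shift &= 63;
  if (shift == 0)
    return x;
  return (x << shift) | (x >> (64 - shift));
}

/* Sign-extend the low BITS bits of X (1 <= BITS <= 64) using the same
   xor/subtract idiom as the patch.  The final conversion to int64_t
   relies on the usual two's complement wrap, as GCC provides.  */
static int64_t
sign_extend (uint64_t x, unsigned bits)
{
  uint64_t sign = UINT64_C (1) << (bits - 1);
  x &= (sign << 1) - 1;		/* Keep only the low BITS bits.  */
  return (int64_t) ((x ^ sign) - sign);
}
```

For example, rotating 0x8000000000000001 left by one gives 3, and sign-extending 0xffff from 16 bits gives -1.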

> > +
> > +/* Can C be formed by rotating a 16-bit positive value left by C16LSB?  */

Perhaps rephrase as "Attempt to form a constant by ...  C16LSB."
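The mask test at the top of the function quoted below checks that every set bit of C lies inside a 15-bit window starting at C16LSB, that is, C is a positive 16-bit value shifted left. A stand-alone sketch of just that test, with fixed-width types and a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if C has no bits set outside the 15-bit field that
   starts at bit LSB, so C == (positive 16-bit value) << LSB, with
   any bits that would land above bit 63 required to be zero.  */
static bool
fits_shifted_positive16 (uint64_t c, unsigned lsb)
{
  return lsb < 64 && (c & ~(UINT64_C (0x7fff) << lsb)) == 0;
}
```

Taking the values from the patch's own comment, val = 0x3000 with shift = 49 is the same constant as val = 3 with shift = 61, and it passes the test with lsb = 49.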

> > +
> > +static inline bool
> > +is_rotate_positive_constant (unsigned HOST_WIDE_INT c, int c16lsb,
> > +HOST_WIDE_INT *val, int *shift,
> > +unsigned HOST_WIDE_INT *mask)
> > +{
> > +  if ((c & ~(HOST_WIDE_INT_UC (0x7fff) << c16lsb)) == 0)
> > +{
> > +  /* eg. c = 1100   ... 
> > +-> val = 0x3000, shift = 49, mask = -1ull.  */
> > +  if (val)
> > +   {
> > + c >>= c16lsb;
> > + /* Make the value and shift canonical in the sense of
> > +selecting the smallest value.  For the example above
> > +-> val = 3, shift = 61.  */

Comment seems reasonable.  I did not review or c

Re: [PATCH] rs6000: Add psabi diagnostic for C++ zero-width bit field ABI change (PR102024)

2021-09-22 Thread will schmidt via Gcc-patches
On Tue, 2021-09-21 at 17:35 -0500, Bill Schmidt wrote:
> Hi!
> 
> Previously zero-width bit fields were removed from structs, so that otherwise
> homogeneous aggregates were treated as such and passed in FPRs and VSRs.
> This was incorrect behavior per the ELFv2 ABI.  Now that these fields are no
> longer being removed, we generate the correct parameter passing code.  Alert
> the unwary user in the rare cases where this behavior changes.
> 
> As noted in the PR, once the GCC 12 Changes page has text describing this issue,
> we can update the diagnostic message to reference that URL.  I'll handle that
> in a follow-up patch.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this okay for trunk?

How previously?  Is this one that will need all the backports?
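For context, the kind of type affected looks like the declaration below. It is written so that it is valid in both C and C++, although the diagnostic in this patch concerns the C++ front end only; the struct name is made up. Going by the description above, a struct like this was previously treated as a homogeneous aggregate of two floats once the zero-width bit field had been removed.

```c
#include <stddef.h>

/* Hypothetical example type: two floats separated by a zero-width
   bit field.  The bit field occupies no storage; it only forces the
   next member onto an alignment boundary.  */
struct with_zero_width_bf
{
  float a;
  int : 0;
  float b;
};
```

Whether such a struct is passed in floating-point registers or not is exactly the behavior change the new psABI note is meant to flag.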

> 
> Thanks!
> Bill
> 
> 
> 2021-09-21  Bill Schmidt  
> 
> gcc/
>   PR target/102024
>   * config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Detect
>   zero-width bit fields and return indicator.
>   (rs6000_discover_homogeneous_aggregate): Diagnose when the
>   presence of a zero-width bit field changes parameter passing in
>   GCC 12.
> 
> gcc/testsuite/
>   PR target/102024
>   * g++.target/powerpc/pr102024.C: New.


ok

> ---
>  gcc/config/rs6000/rs6000-call.c | 39 ++---
>  gcc/testsuite/g++.target/powerpc/pr102024.C | 23 
>  2 files changed, 57 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr102024.C
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 7d485480225..c02b202b0cd 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -6227,7 +6227,7 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
>  
>  static int
>  rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
> - int *empty_base_seen)
> + int *empty_base_seen, int *zero_width_bf_seen)
>  {
>machine_mode mode;
>HOST_WIDE_INT size;
> @@ -6298,7 +6298,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
> return -1;
>  
>   count = rs6000_aggregate_candidate (TREE_TYPE (type), modep,
> - empty_base_seen);
> + empty_base_seen,
> + zero_width_bf_seen);
>   if (count == -1
>   || !index
>   || !TYPE_MAX_VALUE (index)
> @@ -6336,6 +6337,12 @@ rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
>   if (TREE_CODE (field) != FIELD_DECL)
> continue;
>  
> + if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
> +   {
> + *zero_width_bf_seen = 1;
> + continue;
> +   }
> +

Noting that the definition comes from tree.h and is 
#define SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD(NODE, VAL) \
  do {  \
gcc_checking_assert (DECL_BIT_FIELD (NODE));\
FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0 = (VAL);   \
  } while (0)

ok.



>   if (DECL_FIELD_ABI_IGNORED (field))
> {
>   if (lookup_attribute ("no_unique_address",
> @@ -6347,7 +6354,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
> }
>  
>   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,
> - empty_base_seen);
> + empty_base_seen,
> + zero_width_bf_seen);
>   if (sub_count < 0)
> return -1;
>   count += sub_count;
> @@ -6381,7 +6389,8 @@ rs6000_aggregate_candidate (const_tree type, machine_mode *modep,
> continue;
>  
>   sub_count = rs6000_aggregate_candidate (TREE_TYPE (field), modep,
> - empty_base_seen);
> + empty_base_seen,
> + zero_width_bf_seen);
>   if (sub_count < 0)
> return -1;
>   count = count > sub_count ? count : sub_count;
> @@ -6423,8 +6432,10 @@ rs6000_discover_homogeneous_aggregate (machine_mode mode, const_tree type,
>  {
>machine_mode field_mode = VOIDmode;
>int empty_base_seen = 0;
> +  int zero_width_bf_seen = 0;
>int field_count = rs6000_aggregate_candidate (type, &field_mode,
> - &empty_base_seen);
> + &empty_base_seen,
> + &zero_width_bf_seen);
>  

That appears to be all of the callers of rs6000_aggregate_candidate. 
(ok).

>if (field_co

Re: [PATCH 05/18] rs6000: Support for vectorizing built-in functions

2021-09-13 Thread will schmidt via Gcc-patches
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> This patch just duplicates a couple of functions and adjusts them to use the
> new builtin names.  There's no logical change otherwise.
> 
> 2021-08-31  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000.c (rs6000-builtins.h): New include.
>   (rs6000_new_builtin_vectorized_function): New function.
>   (rs6000_new_builtin_md_vectorized_function): Likewise.
>   (rs6000_builtin_vectorized_function): Call
>   rs6000_new_builtin_vectorized_function.
>   (rs6000_builtin_md_vectorized_function): Call
>   rs6000_new_builtin_md_vectorized_function.

ok

> ---
>  gcc/config/rs6000/rs6000.c | 253 +
>  1 file changed, 253 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b7ea1483da5..52c78c7500c 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -78,6 +78,7 @@
>  #include "case-cfn-macros.h"
>  #include "ppc-auxv.h"
>  #include "rs6000-internal.h"
> +#include "rs6000-builtins.h"
>  #include "opts.h"
> 
>  /* This file should be included last.  */
> @@ -5501,6 +5502,251 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
>return nunroll;
>  }
> 
> +/* Returns a function decl for a vectorized version of the builtin function
> +   with builtin function code FN and the result vector type TYPE, or NULL_TREE
> +   if it is not available.  */
> +
> +static tree
> +rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
> + tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
> +  combined_fn_name (combined_fn (fn)),
> +  GET_MODE_NAME (TYPE_MODE (type_out)),
> +  GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +  || TREE_CODE (type_in) != VECTOR_TYPE)
> +return NULL_TREE;
> +
> +  out_mode = TYPE_MODE (TREE_TYPE (type_out));
> +  out_n = TYPE_VECTOR_SUBPARTS (type_out);
> +  in_mode = TYPE_MODE (TREE_TYPE (type_in));
> +  in_n = TYPE_VECTOR_SUBPARTS (type_in);
> +
> +  switch (fn)
> +{
> +CASE_CFN_COPYSIGN:
> +  if (VECTOR_UNIT_VSX_P (V2DFmode)
> +   && out_mode == DFmode && out_n == 2
> +   && in_mode == DFmode && in_n == 2)
> + return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP];
> +  if (VECTOR_UNIT_VSX_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP];
> +  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF];
> +  break;
> +CASE_CFN_CEIL:
> +  if (VECTOR_UNIT_VSX_P (V2DFmode)
> +   && out_mode == DFmode && out_n == 2
> +   && in_mode == DFmode && in_n == 2)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP];
> +  if (VECTOR_UNIT_VSX_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP];
> +  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_VRFIP];
> +  break;
> +CASE_CFN_FLOOR:
> +  if (VECTOR_UNIT_VSX_P (V2DFmode)
> +   && out_mode == DFmode && out_n == 2
> +   && in_mode == DFmode && in_n == 2)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM];
> +  if (VECTOR_UNIT_VSX_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM];
> +  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_VRFIM];
> +  break;
> +CASE_CFN_FMA:
> +  if (VECTOR_UNIT_VSX_P (V2DFmode)
> +   && out_mode == DFmode && out_n == 2
> +   && in_mode == DFmode && in_n == 2)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP];
> +  if (VECTOR_UNIT_VSX_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP];
> +  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +   && out_mode == SFmode && out_n == 4
> +   && in_mode == SFmode && in_n == 4)
> + return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP];
> +  break;
> +CASE_CFN_TRUNC:
> +  if (VECTOR_UNIT_VSX_P (V2DFmode)
> +   && out_mode == DFmode && out_n == 2
> +   && in_mode == DFmode && in_n == 2)
> + return rs6000_builtin_decls_x[RS6000_BIF_XVRDP

Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes

2021-09-13 Thread will schmidt via Gcc-patches
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
> __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
> I had been using to automate gimple folding of MMA builtins.  Previously,
> every MMA function that could be folded had an associated internal function
> that it was folded into.  The LXVP/STXVP builtins are just folded directly
> into memory operations.
> 
> Instead of relying on this pattern, this patch adds a new attribute to
> builtins called "mmaint," which is set for all MMA builtins that have an
> associated internal builtin.  The naming convention that adds _INTERNAL to
> the builtin index name remains.
> 
> The rest of the patch is just duplicating Peter's patch, using the new
> builtin infrastructure.
> 
> 2021-08-23  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
>   (ASSEMBLE_PAIR): Likewise.
>   (BUILD_ACC): Likewise.
>   (DISASSEMBLE_ACC): Likewise.
>   (DISASSEMBLE_PAIR): Likewise.
>   (PMXVBF16GER2): Likewise.
>   (PMXVBF16GER2NN): Likewise.
>   (PMXVBF16GER2NP): Likewise.
>   (PMXVBF16GER2PN): Likewise.
>   (PMXVBF16GER2PP): Likewise.
>   (PMXVF16GER2): Likewise.
>   (PMXVF16GER2NN): Likewise.
>   (PMXVF16GER2NP): Likewise.
>   (PMXVF16GER2PN): Likewise.
>   (PMXVF16GER2PP): Likewise.
>   (PMXVF32GER): Likewise.
>   (PMXVF32GERNN): Likewise.
>   (PMXVF32GERNP): Likewise.
>   (PMXVF32GERPN): Likewise.
>   (PMXVF32GERPP): Likewise.
>   (PMXVF64GER): Likewise.
>   (PMXVF64GERNN): Likewise.
>   (PMXVF64GERNP): Likewise.
>   (PMXVF64GERPN): Likewise.
>   (PMXVF64GERPP): Likewise.
>   (PMXVI16GER2): Likewise.
>   (PMXVI16GER2PP): Likewise.
>   (PMXVI16GER2S): Likewise.
>   (PMXVI16GER2SPP): Likewise.
>   (PMXVI4GER8): Likewise.
>   (PMXVI4GER8PP): Likewise.
>   (PMXVI8GER4): Likewise.
>   (PMXVI8GER4PP): Likewise.
>   (PMXVI8GER4SPP): Likewise.
>   (XVBF16GER2): Likewise.
>   (XVBF16GER2NN): Likewise.
>   (XVBF16GER2NP): Likewise.
>   (XVBF16GER2PN): Likewise.
>   (XVBF16GER2PP): Likewise.
>   (XVF16GER2): Likewise.
>   (XVF16GER2NN): Likewise.
>   (XVF16GER2NP): Likewise.
>   (XVF16GER2PN): Likewise.
>   (XVF16GER2PP): Likewise.
>   (XVF32GER): Likewise.
>   (XVF32GERNN): Likewise.
>   (XVF32GERNP): Likewise.
>   (XVF32GERPN): Likewise.
>   (XVF32GERPP): Likewise.
>   (XVF64GER): Likewise.
>   (XVF64GERNN): Likewise.
>   (XVF64GERNP): Likewise.
>   (XVF64GERPN): Likewise.
>   (XVF64GERPP): Likewise.
>   (XVI16GER2): Likewise.
>   (XVI16GER2PP): Likewise.
>   (XVI16GER2S): Likewise.
>   (XVI16GER2SPP): Likewise.
>   (XVI4GER8): Likewise.
>   (XVI4GER8PP): Likewise.
>   (XVI8GER4): Likewise.
>   (XVI8GER4PP): Likewise.
>   (XVI8GER4SPP): Likewise.
>   (XXMFACC): Likewise.
>   (XXMTACC): Likewise.
>   (XXSETACCZ): Likewise.
>   (ASSEMBLE_PAIR_V): Likewise.
>   (BUILD_PAIR): Likewise.
>   (DISASSEMBLE_PAIR_V): Likewise.
>   (LXVP): New.
>   (STXVP): New.

ok

>   * config/rs6000/rs6000-call.c
>   (rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
>   RS6000_BIF_STXVP.
>   * config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
>   (parse_bif_attrs): Handle ismmaint.
>   (write_decls): Add bif_mmaint_bit and bif_is_mmaint.
>   (write_bif_static_init): Handle ismmaint.

ok
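As an aside for readers not familiar with the generator: the ChangeLog mentions a bif_mmaint_bit and a bif_is_mmaint accessor. The generated code is not shown in this part of the thread, so the following is only a guess at the general shape of such an attribute bit mask. Every name here carries a sketch_ prefix and the bit positions are invented; the real values come out of rs6000-gen-builtins.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define SKETCH_BIF_MMA_BIT	(UINT32_C (1) << 0)
#define SKETCH_BIF_MMAINT_BIT	(UINT32_C (1) << 1)

/* Hypothetical per-builtin record carrying an attribute mask.  */
struct sketch_bifdata
{
  const char *name;
  uint32_t attrs;
};

/* Does builtin B have an associated _INTERNAL builtin to fold into?  */
static bool
sketch_bif_is_mmaint (const struct sketch_bifdata *b)
{
  return (b->attrs & SKETCH_BIF_MMAINT_BIT) != 0;
}
```

With an explicit flag like this, the folding code can ask directly whether an internal counterpart exists instead of inferring it from the builtin being an MMA builtin at all.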

> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 145 ---
>  gcc/config/rs6000/rs6000-call.c  |  38 +-
>  gcc/config/rs6000/rs6000-gen-builtins.c  |  38 +++---
>  3 files changed, 135 insertions(+), 86 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index a8c6b9e988f..1966516551e 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -129,6 +129,7 @@
>  ;   mma  Needs special handling for MMA
>  ;   quad MMA instruction using a register quad as an input operand
>  ;   pair MMA instruction using a register pair as an input operand
> +;   mmaint   MMA instruction expanding to internal call at GIMPLE time
>  ;   no32bit  Not valid for TARGET_32BIT
>  ;   32bit    Requires different handling for TARGET_32BIT
>  ;   cpu  This is a "cpu_is" or "cpu_supports" builtin
> @@ -3584,415 +3585,421 @@
> 
>  [mma]
>void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
> -ASSEMBLE_ACC nothing {mma}
> +ASSEMBLE_ACC nothing {mma,mmaint}
> 
>v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
>  ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
> 
>void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
> -ASSEMBLE_PAIR nothing {mma}
> + 

Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins

2021-09-13 Thread will schmidt via Gcc-patches
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> This is another patch that looks bigger than it really is.  Because we
> have a new namespace for the builtins, allowing us to have both the old
> and new builtin infrastructure supported at once, we need versions of
> these functions that use the new builtin namespace.  Otherwise the code is
> unchanged.
> 
> 2021-08-31  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
>   New forward decl.
>   (rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
>   (rs6000_new_builtin_valid_without_lhs): New function.
>   (rs6000_gimple_fold_new_mma_builtin): Likewise.
>   (rs6000_gimple_fold_new_builtin): Likewise.
> ---
>  gcc/config/rs6000/rs6000-call.c | 1165 +++
>  1 file changed, 1165 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 2c68aa3580c..eae4e15df1e 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
>  static void rs6000_common_init_builtins (void);
>  static void htm_init_builtins (void);
>  static void mma_init_builtins (void);
> +static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
> 
> 
>  /* Hash table to keep track of the argument types for builtin functions.  */
> @@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
>  bool
>  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  {
> +  if (new_builtins_are_live)
> +return rs6000_gimple_fold_new_builtin (gsi);
> +
>gimple *stmt = gsi_stmt (*gsi);
>tree fndecl = gimple_call_fndecl (stmt);
>gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);

ok
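
For anyone following along, a minimal sketch of the hand-off pattern used
here.  Only the flag name new_builtins_are_live comes from the patch; the
function names and the integer standing in for a statement are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the GCC definitions.  */
bool new_builtins_are_live = false;

/* Stand-in for the existing folding path.  */
int
fold_old (int stmt)
{
  return stmt + 1;
}

/* Stand-in for the folding path built on the new infrastructure.  */
int
fold_new (int stmt)
{
  return stmt + 100;
}

/* Single entry point: hand off to the new path when the flag is set,
   otherwise fall through to the old behavior unchanged.  */
int
fold_builtin (int stmt)
{
  if (new_builtins_are_live)
    return fold_new (stmt);

  return fold_old (stmt);
}
```

Keeping the check at the very top of the old entry point means both
implementations can live side by side until the old one is removed.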

> @@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>return false;
>  }
> 
> +/*  Helper function to sort out which built-ins may be valid without having
> +a LHS.  */
> +static bool
> +rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
> +   tree fndecl)
> +{
> +  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
> +return true;

Is that a better or improved version of the code as seen in
rs6000_builtin_valid_without_lhs?
That is
>  if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
>return true;

ok either way.
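
To make the question concrete, a rough sketch of the two ways to ask "does
this builtin return void".  Everything here (ATTR_VOID, bif_entry, the enum)
is a hypothetical stand-in, not the rs6000 data structures.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real rs6000 definitions.  */
#define ATTR_VOID 0x1u

enum bif_code { BIF_VADD, BIF_STVX, BIF_STXVW4X };

struct bif_entry
{
  bool decl_returns_void;   /* What the declared function type says.  */
  unsigned attr;            /* Separately maintained attribute bits.  */
};

/* Store-like builtins that are accepted without an LHS in any case.  */
static bool
is_store_builtin (enum bif_code code)
{
  switch (code)
    {
    case BIF_STVX:
    case BIF_STXVW4X:
      return true;
    default:
      return false;
    }
}

/* New style: ask the declaration for its return type.  */
bool
valid_without_lhs_by_type (enum bif_code code, const struct bif_entry *e)
{
  return e->decl_returns_void || is_store_builtin (code);
}

/* Old style: trust an attribute bit recorded in a table.  */
bool
valid_without_lhs_by_attr (enum bif_code code, const struct bif_entry *e)
{
  return (e->attr & ATTR_VOID) != 0 || is_store_builtin (code);
}
```

The two agree whenever the attribute bit is kept in sync with the
declaration; the type-based check simply has one less thing to keep in sync.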


> +
> +  switch (fn_code)
> +{
> +case RS6000_BIF_STVX_V16QI:
> +case RS6000_BIF_STVX_V8HI:
> +case RS6000_BIF_STVX_V4SI:
> +case RS6000_BIF_STVX_V4SF:
> +case RS6000_BIF_STVX_V2DI:
> +case RS6000_BIF_STVX_V2DF:
> +case RS6000_BIF_STXVW4X_V16QI:
> +case RS6000_BIF_STXVW4X_V8HI:
> +case RS6000_BIF_STXVW4X_V4SF:
> +case RS6000_BIF_STXVW4X_V4SI:
> +case RS6000_BIF_STXVD2X_V2DF:
> +case RS6000_BIF_STXVD2X_V2DI:
> +  return true;
> +default:
> +  return false;
> +}
> +}
> +
>  /* Check whether a builtin function is supported in this target
> configuration.  */
>  bool
> @@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
>gcc_unreachable ();
>  }
> 
> +/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
> +   __vector_quad arguments into pass-by-value arguments, leading to more
> +   efficient code generation.  */
> +static bool
> +rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
> + rs6000_gen_builtins fn_code)
> +{
> +  gimple *stmt = gsi_stmt (*gsi);
> +  size_t fncode = (size_t) fn_code;
> +
> +  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
> +return false;
> +
> +  /* Each call that can be gimple-expanded has an associated built-in
> + function that it will expand into.  If this one doesn't, we have
> + already expanded it!  */
> +  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
> +return false;
> +
> +  bifdata *bd = &rs6000_builtin_info_x[fncode];
> +  unsigned nopnds = bd->nargs;
> +  gimple_seq new_seq = NULL;
> +  gimple *new_call;
> +  tree new_decl;
> +
> +  /* Compatibility built-ins; we used to call these
> + __builtin_mma_{dis,}assemble_pair, but now we call them
> + __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
> +  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
> +fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
> +  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
> +fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
> +
> +  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
> +  || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
> +{
> +  /* This is an MMA disassemble built-in function.  */
> +  push_gimplify_context (true);
> +  unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
> +  tree dst_ptr = gimple_call_arg (stmt, 0);
> +  tree src_ptr = gimple_call_ar

Re: [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza

2021-09-13 Thread will schmidt via Gcc-patches
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> I over-restricted use of __builtin_mffsl, since I was unaware that it
> automatically uses mffs when mffsl is not available.  Paul Clarke pointed
> this out in discussion of his SSE 4.1 compatibility patches.
> 
> 2021-08-31  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-call.c (__builtin_mffsl): Move from [power9]
>   to [always].
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index 6a28d5189f8..a8c6b9e988f 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -208,6 +208,12 @@
>double __builtin_mffs ();
>  MFFS rs6000_mffs {}
> 
> +; Although the mffsl instruction is only available on POWER9 and later
> +; processors, this builtin automatically falls back to mffs on older
> +; platforms.  Thus it appears here in the [always] stanza.
> +  double __builtin_mffsl ();
> +MFFSL rs6000_mffsl {}
> +
>  ; This thing really assumes long double == __ibm128, and I'm told it has
>  ; been used as such within libgcc.  Given that __builtin_pack_ibm128
>  ; exists for the same purpose, this should really not be used at all.
> @@ -2784,9 +2790,6 @@
>signed long long __builtin_darn_raw ();
>  DARN_RAW darn_raw {}
> 
> -  double __builtin_mffsl ();
> -MFFSL rs6000_mffsl {}
> -
>const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
>  TSTSFI_EQ_DD dfptstsfi_eq_dd {}
> 


Looks reasonable,
Thanks
-Will



Re: [PATCH 01/18] rs6000: Handle overloads during program parsing

2021-09-13 Thread will schmidt via Gcc-patches
On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:

Hi, 
  Just a couple of cosmetic nits noted below, the majority of which are also in
the original code this is based on.
Thanks
-Will


> Although this patch looks quite large, the changes are fairly minimal.
> Most of it is duplicating the large function that does the overload
> resolution using the automatically generated data structures instead of
> the old hand-generated ones.  This doesn't make the patch terribly easy to
> review, unfortunately.  Just be aware that generally we aren't changing
> the logic and functionality of overload handling.

ok


> 
> 2021-08-31  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
>   (altivec_resolve_new_overloaded_builtin): New forward decl.
>   (rs6000_new_builtin_type_compatible): New function.
>   (altivec_resolve_overloaded_builtin): Call
>   altivec_resolve_new_overloaded_builtin.
>   (altivec_build_new_resolved_builtin): New function.
>   (altivec_resolve_new_overloaded_builtin): Likewise.
>   * config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
>   Likewise.
>   * config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
>   name of rs6000_new_builtin_is_supported.


ok

> ---
>  gcc/config/rs6000/rs6000-c.c            | 1088 +++
>  gcc/config/rs6000/rs6000-call.c         |   53 ++
>  gcc/config/rs6000/rs6000-gen-builtins.c |    2 +-
>  3 files changed, 1142 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index afcb5bb6e39..aafb4e6a98f 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -35,6 +35,9 @@
>  #include "langhooks.h"
>  #include "c/c-tree.h"
> 
> +#include "rs6000-builtins.h"
> +
> +static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
> 
> 
>  /* Handle the machine specific pragma longcall.  Its syntax is
> @@ -811,6 +814,30 @@ is_float128_p (tree t)
> && t == long_double_type_node));
>  }
> 
> +static bool
> +rs6000_new_builtin_type_compatible (tree t, tree u)
> +{
> +  if (t == error_mark_node)
> +return false;
> +
> +  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
> +return true;
> +
> +  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
> +  && is_float128_p (t) && is_float128_p (u))
> +return true;
> +
> +  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
> +{
> +  t = TREE_TYPE (t);
> +  u = TREE_TYPE (u);
> +  if (TYPE_READONLY (u))
> + t = build_qualified_type (t, TYPE_QUAL_CONST);
> +}
> +
> +  return lang_hooks.types_compatible_p (t, u);
> +}
> +

ok
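
As a note for later readers, the shape of that compatibility check can be
modelled roughly as below.  This is a toy model under stated assumptions:
struct type, same_shape, and the kinds are hypothetical, a NULL pointer
stands in for error_mark_node, the float128 clause is left out, and
same_shape is only a crude substitute for lang_hooks.types_compatible_p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced type representation.  */
enum kind { K_INT, K_FLOAT, K_POINTER };

struct type
{
  enum kind kind;
  int width;                   /* Bits; unused for pointers.  */
  bool is_const;
  const struct type *target;   /* Pointee, for K_POINTER only.  */
};

/* Crude stand-in for the language hook: same kind and same width.  */
static bool
same_shape (const struct type *t, const struct type *u)
{
  return t->kind == u->kind && t->width == u->width;
}

/* T is the argument type, U the parameter type.  */
bool
type_compatible (const struct type *t, const struct type *u)
{
  if (t == NULL)               /* Models t == error_mark_node.  */
    return false;

  /* Any two integral types are accepted.  */
  if (t->kind == K_INT && u->kind == K_INT)
    return true;

  if (t->kind == K_POINTER && u->kind == K_POINTER)
    {
      const struct type *tt = t->target;
      const struct type *ut = u->target;
      /* If the parameter's pointee is const, treat the argument's pointee
	 as const too, so "int *" can be passed for "const int *".  */
      bool t_const = tt->is_const || ut->is_const;
      return same_shape (tt, ut) && t_const == ut->is_const;
    }

  return same_shape (t, u);
}
```

Note the asymmetry: const is only ever added to the argument side, so a
pointer to const is not accepted where a pointer to non-const is expected.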

>  static inline bool
>  rs6000_builtin_type_compatible (tree t, int id)
>  {
> @@ -927,6 +954,10 @@ tree
>  altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>   void *passed_arglist)
>  {
> +  if (new_builtins_are_live)
> +return altivec_resolve_new_overloaded_builtin (loc, fndecl,
> +passed_arglist);
> +
>vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
>unsigned int nargs = vec_safe_length (arglist);
>enum rs6000_builtins fcode

ok

> @@ -1930,3 +1961,1060 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>  return error_mark_node;
>}
>  }
> +
> +/* Build a tree for a function call to an Altivec non-overloaded builtin.
> +   The overloaded builtin that matched the types and args is described
> +   by DESC.  The N arguments are given in ARGS, respectively.
> +
> +   Actually the only thing it does is calling fold_convert on ARGS, with
> +   a small exception for vec_{all,any}_{ge,le} predicates. */
> +
> +static tree
> +altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
> + tree ret_type,
> + rs6000_gen_builtins bif_id,
> + rs6000_gen_builtins ovld_id)
> +{
> +  tree argtypes = TYPE_ARG_TYPES (fntype);
> +  tree arg_type[MAX_OVLD_ARGS];
> +  tree fndecl = rs6000_builtin_decls_x[bif_id];
> +  tree call;
> +
> +  for (int i = 0; i < n; i++)
> +arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);
> +
> +  /* The AltiVec overloading implementation is overall gross, but this
> + is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
> + are completely different for floating-point vs. integer vector
> + types, because the former has vcmpgefp, but the latter should use
> + vcmpgtXX.
> +
> + In practice, the second and third arguments are swapped, and the
> + condition (LT vs. EQ, which is recognizable by bit 1 of the first
> + argument) is reversed.  Patch the arguments here before building
> + the resolved CALL_EXPR.  */
> +  if (n == 3
> +  && ovld_id == RS6000_OVLD_VEC_CMPGE_P
> +  && bif_id != RS6000_BIF_VCMPGEFP

Re: [PATCH] Generate XXSPLTIDP on power10.

2021-08-26 Thread will schmidt via Gcc-patches
On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP on power10.
> 
> This patch implements XXSPLTIDP support for SF and DF scalar constants and
> V2DF vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate
> that is converted to a vector of two DFmode constants.  The immediate is in
> SFmode format, so only constants that fit as SFmode values can be loaded with
> XXSPLTIDP.

ok
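
As I read it, the "fits as an SFmode value" part boils down to a
round-trip test like the sketch below.  This is only an illustration in
plain C; fits_in_float is a hypothetical name, and the predicate in the
patch may well apply further restrictions that this ignores.

```c
#include <stdbool.h>
#include <float.h>

/* Return true if the double D can be represented exactly as a float,
   i.e. converting to float and back loses nothing.  */
bool
fits_in_float (double d)
{
  /* NaN never compares equal to itself; reject it up front.  */
  if (d != d)
    return false;

  /* Reject values outside the finite float range (this also rejects
     infinities) so the conversion below is always well defined.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return false;

  float f = (float) d;
  return (double) f == d;
}
```

So 1.0 and 0.5 would qualify, while 0.1 (not exact in single precision)
and 1e300 (out of range) would still need a load from memory.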

> 
> I added a new constraint (eF) to match constants that can be loaded with the
> XXSPLTIDP instruction.

> 
> I have added a temporary switch (-mxxspltidp) to control whether or not the
> XXSPLTIDP instruction is generated.

How temporary?  

> 
> I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> constants.
> 
> I have tested this with bootstrap compilers on power10 systems and there was
> no regression.  I have built GCC with these patches on little endian power9
> and big endian power8 systems, and there were no regressions.
> 
> In addition, I have built and run the full Spec 2017 rate suite, comparing
> with the patches enabled and not enabled.  There were roughly 66,000
> XXSPLTIDP's generated in the rate build for Spec 2017.  On a stand-alone
> system that is running single threaded, blender_r has a 1.9% increase in
> performance, and the rest of the benchmarks are performance neutral.  However,
> I would expect that in a real world scenario, switching to use XXSPLTIDP will
> increase performance due to removing all of the loads.

ok

> 
> Can I check this into the master branch?
> 
> 2021-08-25  Michael Meissner  
> 
> gcc/
>   * config/rs6000/constraints.md (eF): New constraint.
>   * config/rs6000/predicates.md (easy_fp_constant): If we can load
>   the scalar constant with XXSPLTIDP, the floating point constant is
>   easy.

Could be shortened to something like this?
  Add clause to accept xxspltidp_operand as easy.

>   (xxspltidp_operand): New predicate.

Will there ever be another instruction using the SF/DF CONST_DOUBLE or
V2DF CONST_VECTOR?  I tentatively question the name of the operand,
but defer.

>   (easy_vector_constant): If we can generate XXSPLTIDP, mark the
>   vector constant as easy.

Duplicated from above.

>   * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
>   declaration.
>   (prefixed_permute_p): Likewise.


>   * config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
>   (output_vec_const_move): Add support for XXSPLTIDP.
>   (prefixed_permute_p): New function.

Duplicated.

>   * config/rs6000/rs6000.md (prefixed attribute): Add support for
>   permute prefixed instructions.
>   (movsf_hardfloat): Add XXSPLTIDP support.
>   (mov_hardfloat32, FMOVE64 iterator): Likewise.
>   (mov_hardfloat64, FMOVE64 iterator): Likewise.
>   * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
>   * config/rs6000/vsx.md (vsx_move_64bit): Add XXSPLTIDP
>   support.
>   (vsx_move_32bit): Likewise.

No e in mov (per patch contents below).

>   (vsx_splat_v2df_xxspltidp): New insn.
>   (XXSPLTIDP): New mode iterator.
>   (xxspltidp__internal): New insn and splits.
>   (xxspltidp__inst): Replace xxspltidp_v2df_inst with an
>   iterated form that also does SFmode, and DFmode.
Swap "an iterated form" with "xxspltidp__inst"?




> 
> gcc/testsuite/
>   * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-df.c: New test.
>   * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> ---
>  gcc/config/rs6000/constraints.md  |   5 +
>  gcc/config/rs6000/predicates.md   |  17 +++
>  gcc/config/rs6000/rs6000-protos.h |   2 +
>  gcc/config/rs6000/rs6000.c        | 106 ++
>  gcc/config/rs6000/rs6000.md       |  45 +---
>  gcc/config/rs6000/rs6000.opt      |   4 +
>  gcc/config/rs6000/vsx.md          |  64 ++-
>  .../powerpc/vec-splat-constant-df.c   |  60 ++
>  .../powerpc/vec-splat-constant-sf.c   |  60 ++
>  .../powerpc/vec-splat-constant-v2df.c |  64 +++
>  10 files changed, 405 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..ea2e4a267c3 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -208,6 +208,11 @@ (define_constraint "P"
>(and (match_code "const_int")
> (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 
> 0x1")))
> 
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constra

Re: [PATCH 06/34] rs6000: Add power7 and power7-64 builtins

2021-08-10 Thread will schmidt via Gcc-patches
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote:
> 2021-04-02  Bill Schmidt  
> 

Hi,


> gcc/
>   * config/rs6000/rs6000-builtin-new.def: Add power7 and power7-64
>   stanzas.


ok

> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 39 
>  1 file changed, 39 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index ca694be1ac3..bffce52ee47 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -1957,3 +1957,42 @@
>  
>const vsll __builtin_vsx_xxspltd_2di (vsll, const int<1>);
>  XXSPLTD_V2DI vsx_xxspltd_v2di {}
> +
> +
> +; Power7 builtins (ISA 2.06).
> +[power7]
> +  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
> +ADDG6S addg6s {}

Add all of the sixes...   (ok).

> +
> +  const signed long __builtin_bpermd (signed long, signed long);
> +BPERMD bpermd_di {}
> +
> +  const unsigned int __builtin_cbcdtd (unsigned int);
> +CBCDTD cbcdtd {}
> +
> +  const unsigned int __builtin_cdtbcd (unsigned int);
> +CDTBCD cdtbcd {}
> +
> +  const signed int __builtin_divwe (signed int, signed int);
> +DIVWE dive_si {}
> +
> +  const unsigned int __builtin_divweu (unsigned int, unsigned int);
> +DIVWEU diveu_si {}
> +
> +  const vsq __builtin_pack_vector_int128 (unsigned long long, unsigned long long);
> +PACK_V1TI packv1ti {}
> +
> +  void __builtin_ppc_speculation_barrier ();
> +SPECBARR speculation_barrier {}
> +
> +  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
> +UNPACK_V1TI unpackv1ti {}
> +
> +
> +; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
> +[power7-64]
> +  const signed long long __builtin_divde (signed long long, signed long long);
> +DIVDE dive_di {}
> +
> +  const unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
> +DIVDEU diveu_di {}

ok

thanks
-Will





Re: [PATCH 05/34] rs6000: Add available-everywhere and ancient builtins

2021-08-10 Thread will schmidt via Gcc-patches
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtin-new.def: Add always, power5, and
>   power6 stanzas.
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 72 
>  1 file changed, 72 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index 974cdc8c37c..ca694be1ac3 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -184,6 +184,78 @@
>  
>  
>  
> +; Builtins that have been around since time immemorial or are just
> +; considered available everywhere.
> +[always]
> +  void __builtin_cpu_init ();
> +CPU_INIT nothing {cpu}
> +
> +  bool __builtin_cpu_is (string);
> +CPU_IS nothing {cpu}
> +
> +  bool __builtin_cpu_supports (string);
> +CPU_SUPPORTS nothing {cpu}
> +
> +  unsigned long long __builtin_ppc_get_timebase ();
> +GET_TB rs6000_get_timebase {}
> +
> +  double __builtin_mffs ();
> +MFFS rs6000_mffs {}
> +
> +; This will break for long double == _Float128.  libgcc history.

Add a few more words to provide bigger hints for future archeological
digs?  (This is perhaps an obvious issue, but I'd need to do some
spelunking)
I see similar comments below, maybe just a wordier comment for the
first occurrence.   Unsure...  

> +  const long double __builtin_pack_longdouble (double, double);
> +PACK_TF packtf {}
> +
> +  unsigned long __builtin_ppc_mftb ();
> +MFTB rs6000_mftb_di {32bit}
> +
> +  void __builtin_mtfsb0 (const int<5>);
> +MTFSB0 rs6000_mtfsb0 {}
> +
> +  void __builtin_mtfsb1 (const int<5>);
> +MTFSB1 rs6000_mtfsb1 {}
> +
> +  void __builtin_mtfsf (const int<8>, double);
> +MTFSF rs6000_mtfsf {}
> +
> +  const __ibm128 __builtin_pack_ibm128 (double, double);
> +PACK_IF packif {}
> +
> +  void __builtin_set_fpscr_rn (const int[0,3]);
> +SET_FPSCR_RN rs6000_set_fpscr_rn {}
> +
> +  const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
> +UNPACK_IF unpackif {}
> +
> +; This will break for long double == _Float128.  libgcc history.
> +  const double __builtin_unpack_longdouble (long double, const int<1>);
> +UNPACK_TF unpacktf {}
> +
> +
> +; Builtins that have been around just about forever, but not quite.
> +[power5]
> +  fpmath double __builtin_recipdiv (double, double);
> +RECIP recipdf3 {}
> +
> +  fpmath float __builtin_recipdivf (float, float);
> +RECIPF recipsf3 {}
> +
> +  fpmath double __builtin_rsqrt (double);
> +RSQRT rsqrtdf2 {}
> +
> +  fpmath float __builtin_rsqrtf (float);
> +RSQRTF rsqrtsf2 {}
> +
> +
> +; Power6 builtins.

I see in subsequent patches you also call out the ISA version in the
comment, so perhaps
; Power6 builtins (ISA 2.05).

Similar comment for Power5 reference
above.


> +[power6]
> +  const signed long __builtin_p6_cmpb (signed long, signed long);
> +CMPB cmpbdi3 {}
> +
> +  const signed int __builtin_p6_cmpb_32 (signed int, signed int);
> +CMPB_32 cmpbsi3 {}
> +
> +

ok.


>  ; AltiVec builtins.
>  [altivec]
>const vsc __builtin_altivec_abs_v16qi (vsc);



Re: [PATCH 04/34] rs6000: Add VSX builtins

2021-08-10 Thread will schmidt via Gcc-patches
On Thu, 2021-07-29 at 08:30 -0500, Bill Schmidt wrote:
> 2021-06-07  Bill Schmidt  
> 


Hi,

> gcc/
>   * config/rs6000/rs6000-builtin-new.def: Add vsx stanza.
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 857 +++
>  1 file changed, 857 insertions(+)
> 


ok

> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index f1aa5529cdd..974cdc8c37c 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -1028,3 +1028,860 @@
>  
>const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
>  VEC_SET_V8HI nothing {set}
> +
> +
> +; VSX builtins.
> +[vsx]
> +  pure vd __builtin_altivec_lvx_v2df (signed long, const void *);
> +LVX_V2DF altivec_lvx_v2df {ldvec}
> +
> +  pure vsll __builtin_altivec_lvx_v2di (signed long, const void *);
> +LVX_V2DI altivec_lvx_v2di {ldvec}
> +
> +  pure vd __builtin_altivec_lvxl_v2df (signed long, const void *);
> +LVXL_V2DF altivec_lvxl_v2df {ldvec}
> +
> +  pure vsll __builtin_altivec_lvxl_v2di (signed long, const void *);
> +LVXL_V2DI altivec_lvxl_v2di {ldvec}
> +
> +  const vd __builtin_altivec_nabs_v2df (vd);
> +NABS_V2DF vsx_nabsv2df2 {}
> +
> +  const vsll __builtin_altivec_nabs_v2di (vsll);
> +NABS_V2DI nabsv2di2 {}
> +
> +  void __builtin_altivec_stvx_v2df (vd, signed long, void *);
> +STVX_V2DF altivec_stvx_v2df {stvec}
> +
> +  void __builtin_altivec_stvx_v2di (vsll, signed long, void *);
> +STVX_V2DI altivec_stvx_v2di {stvec}
> +
> +  void __builtin_altivec_stvxl_v2df (vd, signed long, void *);
> +STVXL_V2DF altivec_stvxl_v2df {stvec}
> +
> +  void __builtin_altivec_stvxl_v2di (vsll, signed long, void *);
> +STVXL_V2DI altivec_stvxl_v2di {stvec}
> +
> +  const vd __builtin_altivec_vand_v2df (vd, vd);
> +VAND_V2DF andv2df3 {}
> +
> +  const vsll __builtin_altivec_vand_v2di (vsll, vsll);
> +VAND_V2DI andv2di3 {}
> +
> +  const vull __builtin_altivec_vand_v2di_uns (vull, vull);
> +VAND_V2DI_UNS andv2di3 {}
> +
> +  const vd __builtin_altivec_vandc_v2df (vd, vd);
> +VANDC_V2DF andcv2df3 {}
> +
> +  const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
> +VANDC_V2DI andcv2di3 {}
> +
> +  const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
> +VANDC_V2DI_UNS andcv2di3 {}
> +
> +  const vsll __builtin_altivec_vcmpequd (vull, vull);
> +VCMPEQUD vector_eqv2di {}
> +
> +  const int __builtin_altivec_vcmpequd_p (int, vsll, vsll);
> +VCMPEQUD_P vector_eq_v2di_p {pred}
> +
> +  const vsll __builtin_altivec_vcmpgtsd (vsll, vsll);
> +VCMPGTSD vector_gtv2di {}
> +
> +  const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll);
> +VCMPGTSD_P vector_gt_v2di_p {pred}
> +
> +  const vsll __builtin_altivec_vcmpgtud (vull, vull);
> +VCMPGTUD vector_gtuv2di {}
> +
> +  const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll);
> +VCMPGTUD_P vector_gtu_v2di_p {pred}
> +
> +  const vd __builtin_altivec_vnor_v2df (vd, vd);
> +VNOR_V2DF norv2df3 {}
> +
> +  const vsll __builtin_altivec_vnor_v2di (vsll, vsll);
> +VNOR_V2DI norv2di3 {}
> +
> +  const vull __builtin_altivec_vnor_v2di_uns (vull, vull);
> +VNOR_V2DI_UNS norv2di3 {}
> +
> +  const vd __builtin_altivec_vor_v2df (vd, vd);
> +VOR_V2DF iorv2df3 {}
> +
> +  const vsll __builtin_altivec_vor_v2di (vsll, vsll);
> +VOR_V2DI iorv2di3 {}
> +
> +  const vull __builtin_altivec_vor_v2di_uns (vull, vull);
> +VOR_V2DI_UNS iorv2di3 {}
> +
> +  const vd __builtin_altivec_vperm_2df (vd, vd, vuc);
> +VPERM_2DF altivec_vperm_v2df {}
> +
> +  const vsll __builtin_altivec_vperm_2di (vsll, vsll, vuc);
> +VPERM_2DI altivec_vperm_v2di {}
> +
> +  const vull __builtin_altivec_vperm_2di_uns (vull, vull, vuc);
> +VPERM_2DI_UNS altivec_vperm_v2di_uns {}
> +
> +  const vd __builtin_altivec_vreve_v2df (vd);
> +VREVE_V2DF altivec_vrevev2df2 {}
> +
> +  const vsll __builtin_altivec_vreve_v2di (vsll);
> +VREVE_V2DI altivec_vrevev2di2 {}
> +
> +  const vd __builtin_altivec_vsel_2df (vd, vd, vd);
> +VSEL_2DF vector_select_v2df {}
> +
> +  const vsll __builtin_altivec_vsel_2di (vsll, vsll, vsll);
> +VSEL_2DI_B vector_select_v2di {}
> +
> +  const vull __builtin_altivec_vsel_2di_uns (vull, vull, vull);
> +VSEL_2DI_UNS vector_select_v2di_uns {}
> +
> +  const vd __builtin_altivec_vsldoi_2df (vd, vd, const int<4>);
> +VSLDOI_2DF altivec_vsldoi_v2df {}
> +
> +  const vsll __builtin_altivec_vsldoi_2di (vsll, vsll, const int<4>);
> +VSLDOI_2DI altivec_vsldoi_v2di {}
> +
> +  const vd __builtin_altivec_vxor_v2df (vd, vd);
> +VXOR_V2DF xorv2df3 {}
> +
> +  const vsll __builtin_altivec_vxor_v2di (vsll, vsll);
> +VXOR_V2DI xorv2di3 {}
> +
> +  const vull __builtin_altivec_vxor_v2di_uns (vull, vull);
> +VXOR_V2DI_UNS xorv2di3 {}
> +
> +  const signed __int128 __builtin_vec_ext_v1ti (vsq, signed int);
> +VEC_EXT_V1TI nothing {extract}
> +
> +  const double __builtin_vec_ex

Re: [PATCH 42/55] rs6000: Handle gimple folding of target built-ins

2021-07-28 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:


Hi,


> This is another patch that looks bigger than it really is.  Because we
> have a new namespace for the builtins, allowing us to have both the old
> and new builtin infrastructure supported at once, we need versions of
> these functions that use the new builtin namespace.  Otherwise the code is
> unchanged.

> 
> 2021-06-17  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
>   New forward decl.
>   (rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
>   (rs6000_new_builtin_valid_without_lhs): New function.
>   (rs6000_gimple_fold_new_mma_builtin): Likewise.
>   (rs6000_gimple_fold_new_builtin): Likewise.

ok

> ---
>  gcc/config/rs6000/rs6000-call.c | 1152 +++
>  1 file changed, 1152 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 269fddcdc7e..52df3d165e1 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
>  static void rs6000_common_init_builtins (void);
>  static void htm_init_builtins (void);
>  static void mma_init_builtins (void);
> +static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
> 
> 
>  /* Hash table to keep track of the argument types for builtin functions.  */
> @@ -11992,6 +11993,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
>  bool
>  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  {
> +  if (new_builtins_are_live)
> +return rs6000_gimple_fold_new_builtin (gsi);
> +
>gimple *stmt = gsi_stmt (*gsi);
>tree fndecl = gimple_call_fndecl (stmt);
>gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
> @@ -12939,6 +12943,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>return false;
>  }
> 
> +/*  Helper function to sort out which built-ins may be valid without having
> +a LHS.  */
> +static bool
> +rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
> +   tree fndecl)
> +{
> +  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
> +return true;
> +
> +  switch (fn_code)
> +{
> +case RS6000_BIF_STVX_V16QI:
> +case RS6000_BIF_STVX_V8HI:
> +case RS6000_BIF_STVX_V4SI:
> +case RS6000_BIF_STVX_V4SF:
> +case RS6000_BIF_STVX_V2DI:
> +case RS6000_BIF_STVX_V2DF:
> +case RS6000_BIF_STXVW4X_V16QI:
> +case RS6000_BIF_STXVW4X_V8HI:
> +case RS6000_BIF_STXVW4X_V4SF:
> +case RS6000_BIF_STXVW4X_V4SI:
> +case RS6000_BIF_STXVD2X_V2DF:
> +case RS6000_BIF_STXVD2X_V2DI:
> +  return true;
> +default:
> +  return false;
> +}
> +}

ok

> +
>  /* Check whether a builtin function is supported in this target
> configuration.  */
>  bool
> @@ -13030,6 +13063,1125 @@ rs6000_new_builtin_is_supported_p (enum rs6000_gen_builtins fncode)
>return true;
>  }
> 
> +/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
> +   __vector_quad arguments into pass-by-value arguments, leading to more
> +   efficient code generation.  */
> +static bool
> +rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
> + rs6000_gen_builtins fn_code)
> +{
> +  gimple *stmt = gsi_stmt (*gsi);
> +  size_t fncode = (size_t) fn_code;
> +
> +  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
> +return false;
> +
> +  /* Each call that can be gimple-expanded has an associated built-in
> + function that it will expand into.  If this one doesn't, we have
> + already expanded it!  */
> +  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
> +return false;
> +
> +  bifdata *bd = &rs6000_builtin_info_x[fncode];
> +  unsigned nopnds = bd->nargs;
> +  gimple_seq new_seq = NULL;
> +  gimple *new_call;
> +  tree new_decl;
> +
> +  /* Compatibility built-ins; we used to call these
> + __builtin_mma_{dis,}assemble_pair, but now we call them
> + __builtin_vsx_{dis,}assemble_pair.  Handle the old verions.  */

versions.
(this snippet appears new to this version, so don't need to search for
an existing typo in current code. :-)

> +  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
> +fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
> +  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
> +fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
> +
> +  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
> +  || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
> +{
> +  /* This is an MMA disassemble built-in function.  */
> +  push_gimplify_context (true);
> +  unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
> +  tree dst_ptr = gimple_call_arg (stmt, 0);
> +  tree src_ptr = gimple_call_arg (stmt, 1);
> +  tree src_type = TREE_TYPE (src_ptr);
> +  tree

Re: [PATCH v2] rs6000: Add load density heuristic

2021-07-27 Thread will schmidt via Gcc-patches
On Wed, 2021-05-26 at 10:59 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 


Hi,


> This is the updated version of patch to deal with the bwaves_r
> degradation due to vector construction fed by strided loads.
> 
> As Richi's comments [1], this follows the similar idea to over
> price the vector construction fed by VMAT_ELEMENTWISE or
> VMAT_STRIDED_SLP.  Instead of adding the extra cost on vector
> construction costing immediately, it firstly records how many
> loads and vectorized statements in the given loop, later in
> rs6000_density_test (called by finish_cost) it computes the
> load density ratio against all vectorized stmts, and check
> with the corresponding thresholds DENSITY_LOAD_NUM_THRESHOLD
> and DENSITY_LOAD_PCT_THRESHOLD, do the actual extra pricing
> if both thresholds are exceeded.

ok
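
For readers following along, the check described above can be sketched in a few lines of C. This is only an illustration, not the patch itself: the struct and function names are hypothetical, the threshold values are the ones that appear in the hunk quoted further down, and I am assuming "exceeded" means a strict greater-than comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold values as they appear in the quoted hunk.  */
#define DENSITY_LOAD_PCT_THRESHOLD 45u
#define DENSITY_LOAD_NUM_THRESHOLD 20u

/* Hypothetical stand-in for the counters the patch adds to the cost data.  */
struct load_density
{
  unsigned nstmts;           /* vectorized statements seen in the loop */
  unsigned nloads;           /* how many of those are loads */
  unsigned extra_ctor_cost;  /* penalty recorded for vector constructions */
};

/* Return the cost to add to the loop body: the recorded penalty when both
   thresholds are exceeded (strict comparison assumed), otherwise zero.  */
static unsigned
load_density_penalty (const struct load_density *d)
{
  if (d->extra_ctor_cost == 0 || d->nstmts == 0)
    return 0;

  assert (d->nloads <= d->nstmts);
  unsigned load_pct = (d->nloads * 100) / d->nstmts;

  bool many_loads = d->nloads > DENSITY_LOAD_NUM_THRESHOLD;
  bool dense_loads = load_pct > DENSITY_LOAD_PCT_THRESHOLD;
  return (many_loads && dense_loads) ? d->extra_ctor_cost : 0;
}
```

With 24 loads out of 40 statements the load percentage is 60, so both conditions hold and the penalty applies; with 24 loads out of 100 statements only the count threshold is exceeded and nothing is added.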

> 
> Note that this new load density heuristic check is based on
> some fields in target cost which are updated as needed when
> scanning each add_stmt_cost entry, it's independent of the
> current function rs6000_density_test which requires to scan
> non_vect stmts.  Since it's checking the load stmts count
> vs. all vectorized stmts, it's kind of density, so I put
> it in function rs6000_density_test.  With the same reason to
> keep it independent, I didn't put it as an else arm of the
> current existing density threshold check hunk or before this
> hunk.

ok

> 
> In the investigation of -1.04% degradation from 526.blender_r
> on Power8, I noticed that the extra penalized cost 320 on one
> single vector construction with type V16QI is much exaggerated,
> which makes the final body cost unreliable, so this patch adds
> one maximum bound for the extra penalized cost for each vector
> construction statement.

ok
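
The per-statement bound mentioned above amounts to a clamp. A minimal sketch, assuming a hypothetical bound passed in by the caller (the quoted text does not say what value the patch uses, so none is assumed here):

```c
#include <assert.h>

/* Clamp the extra penalty charged to one vector construction statement.
   MAX_PENALTY is a hypothetical bound supplied by the caller; the quoted
   text does not give the value the patch uses.  */
static unsigned
bounded_ctor_penalty (unsigned raw_penalty, unsigned max_penalty)
{
  assert (max_penalty > 0);
  return raw_penalty < max_penalty ? raw_penalty : max_penalty;
}
```

A raw penalty such as the 320 mentioned for V16QI would be reduced to whatever the bound is, while small penalties pass through unchanged.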

> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> 
> Full SPEC2017 performance evaluation on Power8/Power9 with
> option combinations:
>   * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
>   * {-O3, -Ofast} {,-funroll-loops}
> 
> bwaves_r degradations on P8/P9 have been fixed, nothing else
> remarkable was observed.

So, this fixes the "-1.04% degradation from 526.blender_r on Power8"
with no additional regressions.  That sounds good.

> 
> Is it ok for trunk?
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html
> 
> BR,
> Kewen
> -
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.c (struct rs6000_cost_data): New members
>   nstmts, nloads and extra_ctor_cost.
>   (rs6000_density_test): Add load density related heuristics and the
>   checks, do extra costing on vector construction statements if need.
>   (rs6000_init_cost): Init new members.
>   (rs6000_update_target_cost_per_stmt): New function.
>   (rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
>   rs6000_update_target_cost_per_stmt and call it.
> 

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 83d29cbfac1..806c3335cbc 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> 
> @@ -5231,6 +5231,12 @@ typedef struct _rs6000_cost_data
>  {
>struct loop *loop_info;
>unsigned cost[3];
> +  /* Total number of vectorized stmts (loop only).  */
> +  unsigned nstmts;
> +  /* Total number of loads (loop only).  */
> +  unsigned nloads;
> +  /* Possible extra penalized cost on vector construction (loop only).  */
> +  unsigned extra_ctor_cost;
> 
>/* For each vectorized loop, this var holds TRUE iff a non-memory vector
>   instruction is needed by the vectorization.  */
>bool vect_nonmem;
> @@ -5292,9 +5298,45 @@ rs6000_density_test (rs6000_cost_data *data)
>if (dump_enabled_p ())
>   dump_printf_loc (MSG_NOTE, vect_location,
>"density %d%%, cost %d exceeds threshold, penalizing "
> -  "loop body cost by %d%%", density_pct,
> +  "loop body cost by %d%%\n", density_pct,
>vec_cost + not_vec_cost, DENSITY_PENALTY);
>  }
> +
> +  /* Check if we need to penalize the body cost for latency and
> + execution resources bound from strided or elementwise loads
> + into a vector.  */
> +  if (data->extra_ctor_cost > 0)
> +{
> +  /* Threshold for load stmts percentage in all vectorized stmts.  */
> +  const int DENSITY_LOAD_PCT_THRESHOLD = 45;
> +  /* Threshold for total number of load stmts.  */
> +  const int DENSITY_LOAD_NUM_THRESHOLD = 20;
> +
> +  gcc_assert (data->nloads <= data->nstmts);
> +  unsigned int load_pct = (data->nloads * 100) / (data->nstmts);
> +
> +  /* It's likely to be bounded by latency and execution resources
> +  from many scalar loads which are strided or elementwise loads
> +  into a vector if both conditions below are found:
> +1. there are many loads, it's easy to result in a long wait
> +   for load units;
> +2. load has a big proportion of all vectorized statements,
> +

Re: [PATCH 51/55] rs6000: Miscellaneous uses of rs6000_builtin_decls_x

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 



Hi,

Could use a longer description. 


> gcc/
>   * config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
>   rs6000_builtin_decls_x when appropriate.
>   (add_condition_to_bb): Likewise.
>   (rs6000_atomic_assign_expand_fenv): Likewise.
> ---
>  gcc/config/rs6000/rs6000.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 70a2be90787..7f6c1f8036e 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -22767,12 +22767,16 @@ rs6000_builtin_reciprocal (tree fndecl)
>if (!RS6000_RECIP_AUTO_RSQRTE_P (V2DFmode))
>   return NULL_TREE;
> 
> +  if (new_builtins_are_live)
> + return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
>return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
> 
>  case VSX_BUILTIN_XVSQRTSP:
>if (!RS6000_RECIP_AUTO_RSQRTE_P (V4SFmode))
>   return NULL_TREE;
> 
> +  if (new_builtins_are_live)
> + return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_4SF];
>return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_4SF];
> 

ok


>  default:
> @@ -25361,7 +25365,10 @@ add_condition_to_bb (tree function_decl,
> tree version_decl,
> 
>tree bool_zero = build_int_cst (bool_int_type_node, 0);
>tree cond_var = create_tmp_var (bool_int_type_node);
> -  tree predicate_decl = rs6000_builtin_decls [(int)
> RS6000_BUILTIN_CPU_SUPPORTS];
> +  tree predicate_decl
> += (new_builtins_are_live
> +   ? rs6000_builtin_decls_x[(int) RS6000_BIF_CPU_SUPPORTS]
> +   : rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS]);
>const char *arg_str = rs6000_clone_map[clone_isa].name;
>tree predicate_arg = build_string_literal (strlen (arg_str) + 1,
> arg_str);
>gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1,
> predicate_arg);
> @@ -27586,8 +27593,14 @@ rs6000_atomic_assign_expand_fenv (tree
> *hold, tree *clear, tree *update)
>return;
>  }
> 
> -  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
> -  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
> +  tree mffs
> += (new_builtins_are_live
> +   ? rs6000_builtin_decls_x[RS6000_BIF_MFFS]
> +   : rs6000_builtin_decls[RS6000_BUILTIN_MFFS]);
> +  tree mtfsf
> += (new_builtins_are_live
> +   ? rs6000_builtin_decls_x[RS6000_BIF_MTFSF]
> +   : rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]);
>tree call_mffs = build_call_expr (mffs, 0);

ok,

lgtm,
thanks
-Will
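
The pattern repeated in all three hunks is "pick the decl from the new table when the flag is set, else from the old one". A stripped-down sketch, with hypothetical enums and string tables standing in for the real decl arrays (note that the two tables are indexed by different enumerations, which is why both indices are passed):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: two decl tables indexed by separate enums.  */
enum old_code { OLD_MFFS, OLD_MTFSF, OLD_COUNT };
enum new_code { NEW_MTFSF, NEW_MFFS, NEW_COUNT };

static const char *old_decls[OLD_COUNT] = { "old:mffs", "old:mtfsf" };
static const char *new_decls[NEW_COUNT] = { "new:mtfsf", "new:mffs" };

/* Select the decl from the new table when NEW_LIVE is nonzero, otherwise
   from the old table.  Each table is indexed with its own enum.  */
static const char *
select_decl (int new_live, enum new_code n, enum old_code o)
{
  assert ((size_t) n < NEW_COUNT && (size_t) o < OLD_COUNT);
  return new_live ? new_decls[n] : old_decls[o];
}
```

Once the old table goes away, each call site collapses back to a single array lookup.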

> 
>/* Generates the equivalent of feholdexcept (&fenv_var)



Re: [PATCH 50/55] rs6000: Update rs6000_builtin_decl

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 

Hi,
  Description could be a bit longer. :-)  (Even just a duplicate of the
mail subject to fill the space would prob be fine.) 

> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New
>   function.
>   (rs6000_builtin_decl): Call it.
> ---
>  gcc/config/rs6000/rs6000-call.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 6b60f0852ef..54cf014ed23 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -16258,11 +16258,31 @@ rs6000_init_builtins (void)
>  }
>  }
> 
> +static tree
> +rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
> +{
> +  rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;
> +
> +  if (fcode >= RS6000_OVLD_MAX)
> +return error_mark_node;
> +
> +  if (!rs6000_new_builtin_is_supported_p (fcode))
> +{
> +  rs6000_invalid_new_builtin (fcode);
> +  return error_mark_node;
> +}
> +
> +  return rs6000_builtin_decls_x[code];
> +}
> +
>  /* Returns the rs6000 builtin decl for CODE.  */
> 
>  tree
>  rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>  {
> +  if (new_builtins_are_live)
> +return rs6000_new_builtin_decl (code, initialize_p);
> +


Ok,
lgtm, 
thanks
-Will
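
The shape of the new function is a guarded table lookup: range check, then support check, then index. A self-contained sketch with hypothetical tables and a string sentinel standing in for error_mark_node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CODE_MAX = 4 };   /* hypothetical table size */

static const char *const decl_table[CODE_MAX] = { "a", "b", "c", "d" };
static const bool supported[CODE_MAX] = { true, false, true, true };

/* Sentinel standing in for error_mark_node.  */
static const char *const error_decl = "<error>";

/* Return the decl for CODE, or the error sentinel when CODE is out of
   range or not supported in the current configuration.  */
static const char *
lookup_decl (unsigned code)
{
  if (code >= CODE_MAX)
    return error_decl;
  if (!supported[code])
    return error_decl;   /* the patch also issues a diagnostic here */

  assert (decl_table[code] != NULL);
  return decl_table[code];
}
```

The order matters: the range check has to come first so that the support check never indexes past the end of the table.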

>HOST_WIDE_INT fnmask;
> 
>if (code >= RS6000_BUILTIN_COUNT)



Re: [PATCH 55/55] rs6000: Enable the new builtin support

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (write_init_file):
>   Initialize new_builtins_are_live to 1.
> ---
>  gcc/config/rs6000/rs6000-gen-builtins.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c
> b/gcc/config/rs6000/rs6000-gen-builtins.c
> index acb213ca606..658c4e042f1 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -2755,7 +2755,7 @@ write_init_file (void)
>fprintf (init_file, "#include \"rs6000-builtins.h\"\n");
>fprintf (init_file, "\n");
> 
> -  fprintf (init_file, "int new_builtins_are_live = 0;\n\n");
> +  fprintf (init_file, "int new_builtins_are_live = 1;\n\n");


Needs moar fanfare! :-)

lgtm, 
thanks
-Will

> 
>fprintf (init_file, "tree
> rs6000_builtin_decls_x[RS6000_OVLD_MAX];\n\n");
> 



Re: [PATCH 53/55] rs6000: Update altivec.h for automated interfaces

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-06-10  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/altivec.h: Delete a number of #defines that are
>   now superfluous; include rs6000-vecdefines.h; include some
>   synonyms.
> ---
>  gcc/config/rs6000/altivec.h | 522 +++---
> --
>  1 file changed, 41 insertions(+), 481 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 5b631c7ebaf..8daf933e53e 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -55,32 +55,36 @@
>  #define __CR6_LT 2
>  #define __CR6_LT_REV 3
> 
> -/* Synonyms.  */
> +#include "rs6000-vecdefines.h"
> +
> +/* Deprecated interfaces.  */
> +#define vec_lvx vec_ld
> +#define vec_lvxl vec_ldl
> +#define vec_stvx vec_st
> +#define vec_stvxl vec_stl
>  #define vec_vaddcuw vec_addc
>  #define vec_vand vec_and
>  #define vec_vandc vec_andc
> -#define vec_vrfip vec_ceil
>  #define vec_vcmpbfp vec_cmpb
>  #define vec_vcmpgefp vec_cmpge
>  #define vec_vctsxs vec_cts
>  #define vec_vctuxs vec_ctu
>  #define vec_vexptefp vec_expte
> -#define vec_vrfim vec_floor
> -#define vec_lvx vec_ld
> -#define vec_lvxl vec_ldl
>  #define vec_vlogefp vec_loge
>  #define vec_vmaddfp vec_madd
>  #define vec_vmhaddshs vec_madds
> -#define vec_vmladduhm vec_mladd
>  #define vec_vmhraddshs vec_mradds
> +#define vec_vmladduhm vec_mladd
>  #define vec_vnmsubfp vec_nmsub
>  #define vec_vnor vec_nor
>  #define vec_vor vec_or
> -#define vec_vpkpx vec_packpx
>  #define vec_vperm vec_perm
> -#define vec_permxor __builtin_vec_vpermxor
> +#define vec_vpkpx vec_packpx
>  #define vec_vrefp vec_re
> +#define vec_vrfim vec_floor
>  #define vec_vrfin vec_round
> +#define vec_vrfip vec_ceil
> +#define vec_vrfiz vec_trunc
>  #define vec_vrsqrtefp vec_rsqrte
>  #define vec_vsel vec_sel
>  #define vec_vsldoi vec_sld
> @@ -91,440 +95,56 @@
>  #define vec_vspltisw vec_splat_s32
>  #define vec_vsr vec_srl
>  #define vec_vsro vec_sro
> -#define vec_stvx vec_st
> -#define vec_stvxl vec_stl
>  #define vec_vsubcuw vec_subc
>  #define vec_vsum2sws vec_sum2s
>  #define vec_vsumsws vec_sums
> -#define vec_vrfiz vec_trunc
>  #define vec_vxor vec_xor

Appears to be rearranged/alphabetized.  OK.

> +#ifdef _ARCH_PWR8
> +#define vec_vclz vec_cntlz
> +#define vec_vgbbd vec_gb
> +#define vec_vmrgew vec_mergee
> +#define vec_vmrgow vec_mergeo
> +#define vec_vpopcntu vec_popcnt
> +#define vec_vrld vec_rl
> +#define vec_vsld vec_sl
> +#define vec_vsrd vec_sr
> +#define vec_vsrad vec_sra
> +#endif


Does anything bad happen if these are simply defined, without the 
#ifdef/#endif protection? 
I'm wondering if there is some scenario with
pragma GCC target "cpu=powerX" where we may want them defined
anyway.  
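
To illustrate the concern: an #ifdef in a header is evaluated once, at the point where the header is included. The sketch below uses hypothetical macro names and does not involve the real altivec.h; it only assumes a scenario in which the architecture macro becomes defined after the guarded section has already been processed (I have not verified that the pragma case behaves this way).

```c
#include <assert.h>

/* Stand-in for the guarded part of the header.  HYPOTHETICAL_ARCH is not
   defined yet when the preprocessor reaches this point, so the alias is
   skipped.  */
#ifdef HYPOTHETICAL_ARCH
#define legacy_popcnt modern_popcnt
#endif

/* Stand-in for the architecture macro becoming defined later in the file.  */
#define HYPOTHETICAL_ARCH 1

/* Report whether the alias from the guarded section exists at this point.  */
static int
legacy_alias_visible (void)
{
  /* The architecture macro is defined by now ...  */
  assert (HYPOTHETICAL_ARCH == 1);
  /* ... but the guard above was evaluated earlier, so the alias is not.  */
#ifdef legacy_popcnt
  return 1;
#else
  return 0;
#endif
}
```

Dropping the guard would make the aliases available unconditionally, at the cost of exposing names whose targets may not be usable on older CPUs.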


Everything else appears straightforward on this one; it appears to be
mostly deletions.

lgtm,
thanks
-Will


> +
> +#ifdef _ARCH_PWR9
> +#define vec_extract_fp_from_shorth vec_extract_fp32_from_shorth
> +#define vec_extract_fp_from_shortl vec_extract_fp32_from_shortl
> +#define vec_vctz vec_cnttz
> +#endif
> +
> +/* Synonyms.  */
>  /* Functions that are resolved by the backend to one of the
> typed builtins.  */
> -#define vec_vaddfp __builtin_vec_vaddfp
> -#define vec_addc __builtin_vec_addc
> -#define vec_adde __builtin_vec_adde
> -#define vec_addec __builtin_vec_addec
> -#define vec_vaddsws __builtin_vec_vaddsws
> -#define vec_vaddshs __builtin_vec_vaddshs
> -#define vec_vaddsbs __builtin_vec_vaddsbs
> -#define vec_vavgsw __builtin_vec_vavgsw
> -#define vec_vavguw __builtin_vec_vavguw
> -#define vec_vavgsh __builtin_vec_vavgsh
> -#define vec_vavguh __builtin_vec_vavguh
> -#define vec_vavgsb __builtin_vec_vavgsb
> -#define vec_vavgub __builtin_vec_vavgub
> -#define vec_ceil __builtin_vec_ceil
> -#define vec_cmpb __builtin_vec_cmpb
> -#define vec_vcmpeqfp __builtin_vec_vcmpeqfp
> -#define vec_cmpge __builtin_vec_cmpge
> -#define vec_vcmpgtfp __builtin_vec_vcmpgtfp
> -#define vec_vcmpgtsw __builtin_vec_vcmpgtsw
> -#define vec_vcmpgtuw __builtin_vec_vcmpgtuw
> -#define vec_vcmpgtsh __builtin_vec_vcmpgtsh
> -#define vec_vcmpgtuh __builtin_vec_vcmpgtuh
> -#define vec_vcmpgtsb __builtin_vec_vcmpgtsb
> -#define vec_vcmpgtub __builtin_vec_vcmpgtub
> -#define vec_vcfsx __builtin_vec_vcfsx
> -#define vec_vcfux __builtin_vec_vcfux
> -#define vec_cts __builtin_vec_cts
> -#define vec_ctu __builtin_vec_ctu
> -#define vec_cpsgn __builtin_vec_copysign
> -#define vec_double __builtin_vec_double
> -#define vec_doublee __builtin_vec_doublee
> -#define vec_doubleo __builtin_vec_doubleo
> -#define vec_doublel __builtin_vec_doublel
> -#define vec_doubleh __builtin_vec_doubleh
> -#define vec_expte __builtin_vec_expte
> -#define vec_float __builtin_vec_float
> -#define vec_float2 __builtin_vec_float2
> -#define vec_floate __builtin_vec_floate
> -#define vec_floato __builtin_vec_floato
> -#define vec_floor __builtin_vec_floor
> -#define vec_loge __builtin_v

Re: [PATCH 52/55] rs6000: Debug support

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-04-01  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
>   (def_builtin): Change debug formatting for easier parsing and
>   include more information.
>   (rs6000_init_builtins): Add dump of autogenerated builtins.
>   (altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
>   completeness.
> ---
>  gcc/config/rs6000/rs6000-call.c | 193 +++-
>  1 file changed, 189 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 54cf014ed23..d28bb14b0bb 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -8880,6 +8880,106 @@ rs6000_gimplify_va_arg (tree valist, tree type, 
> gimple_seq *pre_p,
> 
>  /* Builtins.  */
> 
> +/* Debug utility to translate a type node to a single token.  */
> +static
> +const char *rs6000_debug_type (tree type)
> +{
> +  if (type == void_type_node)
> +return "void";
> +  else if (type == long_integer_type_node)
> +return "long";
> +  else if (type == long_unsigned_type_node)
> +return "ulong";
> +  else if (type == long_long_integer_type_node)
> +return "longlong";
> +  else if (type == long_long_unsigned_type_node)
> +return "ulonglong";
> +  else if (type == bool_V16QI_type_node)
> +return "vbc";
> +  else if (type == bool_V2DI_type_node)
> +return "vbll";
> +  else if (type == bool_V4SI_type_node)
> +return "vbi";
> +  else if (type == bool_V8HI_type_node)
> +return "vbs";

I'd be strongly tempted to rearrange the order and put V16 after V8 in
the list.  Similar to the order you previously used in
rs6000_expand_new_builtin(). Same comment elsewhere.
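
To show the ordering I have in mind, here is a table-driven sketch for just the four bool vector types. This is an alternative formulation, not what the patch does (the patch compares type nodes in an if/else chain); the enum is a hypothetical stand-in for the type nodes, and the tokens are the ones from the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers standing in for the bool vector type nodes.  */
enum bool_vec { BOOL_V2DI, BOOL_V4SI, BOOL_V8HI, BOOL_V16QI, BOOL_VEC_COUNT };

struct type_token
{
  enum bool_vec type;
  const char *token;
};

/* Entries in ascending element count: V2, V4, V8, V16.  */
static const struct type_token bool_vec_tokens[] = {
  { BOOL_V2DI,  "vbll" },
  { BOOL_V4SI,  "vbi"  },
  { BOOL_V8HI,  "vbs"  },
  { BOOL_V16QI, "vbc"  },
};

/* Translate TYPE to its debug token, or "unknown" if it is not listed.  */
static const char *
bool_vec_token (enum bool_vec type)
{
  size_t n = sizeof bool_vec_tokens / sizeof bool_vec_tokens[0];
  assert (n == BOOL_VEC_COUNT);
  for (size_t i = 0; i < n; i++)
    if (bool_vec_tokens[i].type == type)
      return bool_vec_tokens[i].token;
  return "unknown";
}
```

The same ordering applies equally well to the if/else chain; the table just makes the sequence easy to see at a glance.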


> +  else if (type == bool_int_type_node)
> +return "bool";
> +  else if (type == dfloat64_type_node)
> +return "_Decimal64";
> +  else if (type == double_type_node)
> +return "double";
> +  else if (type == intDI_type_node)
> +return "sll";
> +  else if (type == intHI_type_node)
> +return "ss";
> +  else if (type == ibm128_float_type_node)
> +return "__ibm128";
> +  else if (type == opaque_V4SI_type_node)
> +return "opaque";
> +  else if (POINTER_TYPE_P (type))
> +return "void*";
> +  else if (type == intQI_type_node || type == char_type_node)
> +return "sc";
> +  else if (type == dfloat32_type_node)
> +return "_Decimal32";
> +  else if (type == float_type_node)
> +return "float";
> +  else if (type == intSI_type_node || type == integer_type_node)
> +return "si";
> +  else if (type == dfloat128_type_node)
> +return "_Decimal128";
> +  else if (type == long_double_type_node)
> +return "longdouble";
> +  else if (type == intTI_type_node)
> +return "sq";
> +  else if (type == unsigned_intDI_type_node)
> +return "ull";
> +  else if (type == unsigned_intHI_type_node)
> +return "us";
> +  else if (type == unsigned_intQI_type_node)
> +return "uc";
> +  else if (type == unsigned_intSI_type_node)
> +return "ui";
> +  else if (type == unsigned_intTI_type_node)
> +return "uq";
> +  else if (type == unsigned_V16QI_type_node)
> +return "vuc";
> +  else if (type == unsigned_V1TI_type_node)
> +return "vuq";
> +  else if (type == unsigned_V2DI_type_node)
> +return "vull";
> +  else if (type == unsigned_V4SI_type_node)
> +return "vui";
> +  else if (type == unsigned_V8HI_type_node)
> +return "vus";
> +  else if (type == V16QI_type_node)
> +return "vsc";
> +  else if (type == V1TI_type_node)
> +return "vsq";
> +  else if (type == V2DF_type_node)
> +return "vd";
> +  else if (type == V2DI_type_node)
> +return "vsll";
> +  else if (type == V4SF_type_node)
> +return "vf";
> +  else if (type == V4SI_type_node)
> +return "vsi";
> +  else if (type == V8HI_type_node)
> +return "vss";
> +  else if (type == pixel_V8HI_type_node)
> +return "vp";
> +  else if (type == pcvoid_type_node)
> +return "voidc*";
> +  else if (type == float128_type_node)
> +return "_Float128";
> +  else if (type == vector_pair_type_node)
> +return "__vector_pair";
> +  else if (type == vector_quad_type_node)
> +return "__vector_quad";
> +  else
> +return "unknown";
> +}
> +

Ok

>  static void
>  def_builtin (const char *name, tree type, enum rs6000_builtins code)
>  {
> @@ -8908,7 +9008,7 @@ def_builtin (const char *name, tree type, enum 
> rs6000_builtins code)
>/* const function, function only depends on the inputs.  */
>TREE_READONLY (t) = 1;
>TREE_NOTHROW (t) = 1;
> -  attr_string = ", const";
> +  attr_string = "= const";
>  }
>else if ((classify & RS6000_BTC_PURE) != 0)
>  {
> @@ -8916,7 +9016,7 @@ def_builtin (const char *name, tree type, enum 
> rs6000_builtins code)
>external state.  */
>DECL_PURE_P (t) = 1;
>TREE_NOTHROW (t) = 1;
> -   

Re: [PATCH 47/55] rs6000: Builtin expansion, part 4

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 


Hi,


> gcc/
>   * config/rs6000/rs6000-call.c (elemrev_icode): Implement.
>   (ldv_expand_builtin): Likewise.
>   (lxvrse_expand_builtin): Likewise.
>   (lxvrze_expand_builtin): Likewise.
>   (stv_expand_builtin): Likewise.


> ---
>  gcc/config/rs6000/rs6000-call.c | 217 
>  1 file changed, 217 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index ad3e6a4bbe5..981eabc1187 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -14710,12 +14710,114 @@ new_cpu_expand_builtin (enum rs6000_gen_builtins 
> fcode,
>  static insn_code
>  elemrev_icode (rs6000_gen_builtins fcode)
>  {
> +  switch (fcode)
> +{
> +default:
> +  gcc_unreachable ();
> +case RS6000_BIF_ST_ELEMREV_V1TI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v1ti
> +   : CODE_FOR_vsx_st_elemrev_v1ti);


Hmm, would it be worthwhile to rename one of the pair so they both match "_st_" or
"_store_"?

CODE_FOR_vsx_store_v1ti
CODE_FOR_vsx_st_elemrev_v1ti

Same for _ld_ and _load_ , but it's all a conversation for elsewhere... :-)

Ok,



> +case RS6000_BIF_ST_ELEMREV_V2DF:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2df
> +   : CODE_FOR_vsx_st_elemrev_v2df);
> +case RS6000_BIF_ST_ELEMREV_V2DI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2di
> +   : CODE_FOR_vsx_st_elemrev_v2di);
> +case RS6000_BIF_ST_ELEMREV_V4SF:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4sf
> +   : CODE_FOR_vsx_st_elemrev_v4sf);
> +case RS6000_BIF_ST_ELEMREV_V4SI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4si
> +   : CODE_FOR_vsx_st_elemrev_v4si);
> +case RS6000_BIF_ST_ELEMREV_V8HI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v8hi
> +   : CODE_FOR_vsx_st_elemrev_v8hi);
> +case RS6000_BIF_ST_ELEMREV_V16QI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v16qi
> +   : CODE_FOR_vsx_st_elemrev_v16qi);
> +case RS6000_BIF_LD_ELEMREV_V2DF:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2df
> +   : CODE_FOR_vsx_ld_elemrev_v2df);
> +case RS6000_BIF_LD_ELEMREV_V1TI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v1ti
> +   : CODE_FOR_vsx_ld_elemrev_v1ti);
> +case RS6000_BIF_LD_ELEMREV_V2DI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2di
> +   : CODE_FOR_vsx_ld_elemrev_v2di);
> +case RS6000_BIF_LD_ELEMREV_V4SF:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4sf
> +   : CODE_FOR_vsx_ld_elemrev_v4sf);
> +case RS6000_BIF_LD_ELEMREV_V4SI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4si
> +   : CODE_FOR_vsx_ld_elemrev_v4si);
> +case RS6000_BIF_LD_ELEMREV_V8HI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v8hi
> +   : CODE_FOR_vsx_ld_elemrev_v8hi);
> +case RS6000_BIF_LD_ELEMREV_V16QI:
> +  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v16qi
> +   : CODE_FOR_vsx_ld_elemrev_v16qi);
> +}
> +  gcc_unreachable ();
>return (insn_code) 0;
>  }

ok
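
Every case in that switch follows the same rule: on big-endian the plain load/store pattern is used, on little-endian the element-reversing one. A condensed sketch with hypothetical enums in place of the real insn codes:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two insn codes available per access kind.  */
enum insn
{
  INSN_PLAIN_STORE,
  INSN_ELEMREV_STORE,
  INSN_PLAIN_LOAD,
  INSN_ELEMREV_LOAD
};

enum access { ACCESS_STORE, ACCESS_LOAD };

/* Pick the plain pattern on big-endian and the element-reversing pattern
   on little-endian, mirroring the BYTES_BIG_ENDIAN conditionals above.  */
static enum insn
elemrev_insn (enum access kind, bool bytes_big_endian)
{
  switch (kind)
    {
    case ACCESS_STORE:
      return bytes_big_endian ? INSN_PLAIN_STORE : INSN_ELEMREV_STORE;
    case ACCESS_LOAD:
      return bytes_big_endian ? INSN_PLAIN_LOAD : INSN_ELEMREV_LOAD;
    }
  assert (!"unhandled access kind");
  return INSN_PLAIN_LOAD;
}
```

The real function additionally distinguishes the vector mode, which is why it needs one case per builtin.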


>  static rtx
>  ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
>  {
> +  rtx pat, addr;
> +  bool blk = (icode == CODE_FOR_altivec_lvlx
> +   || icode == CODE_FOR_altivec_lvlxl
> +   || icode == CODE_FOR_altivec_lvrx
> +   || icode == CODE_FOR_altivec_lvrxl);
> +
> +  if (target == 0
> +  || GET_MODE (target) != tmode
> +  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))

Should there be no space after "!"?  (Here and later on 'pat'.)

> +target = gen_reg_rtx (tmode);
> +

> +  op[1] = copy_to_mode_reg (Pmode, op[1]);
> +
> +  /* For LVX, express the RTL accurately by ANDing the address with -16.
> + LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
> + so the raw address is fine.  */
good comment. :-)
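
For anyone unfamiliar with the idiom, ANDing an address with -16 clears its low four bits, which rounds it down to a 16-byte boundary. A small sketch on plain integers (the function name is hypothetical; the real code builds the equivalent RTL instead):

```c
#include <assert.h>
#include <stdint.h>

/* Compute the address an LVX-style access would use: base plus offset,
   rounded down to a 16-byte boundary.  */
static uintptr_t
lvx_effective_address (uintptr_t base, uintptr_t offset)
{
  uintptr_t raw = base + offset;
  uintptr_t aligned = raw & ~(uintptr_t) 15;   /* same as ANDing with -16 */
  assert (aligned % 16 == 0 && raw - aligned < 16);
  return aligned;
}
```

For example, base 0x1003 with offset 0x20 gives a raw address of 0x1023, which is masked down to 0x1020.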

> +  if (icode == CODE_FOR_altivec_lvx_v1ti
> +  || icode == CODE_FOR_altivec_lvx_v2df
> +  || icode == CODE_FOR_altivec_lvx_v2di
> +  || icode == CODE_FOR_altivec_lvx_v4sf
> +  || icode == CODE_FOR_altivec_lvx_v4si
> +  || icode == CODE_FOR_altivec_lvx_v8hi
> +  || icode == CODE_FOR_altivec_lvx_v16qi)
> +{
> +  rtx rawaddr;
> +  if (op[0] == const0_rtx)
> + rawaddr = op[1];
> +  else
> + {
> +   op[0] = copy_to_mode_reg (Pmode, op[0]);
> +   rawaddr = gen_rtx_PLUS (Pmode, op[1], op[0]);
> + }
> +  addr = gen_rtx_AND (Pmode, rawaddr, gen_rtx_CONST_INT (Pmode, -16));
> +  addr = gen_rtx_MEM (blk ? BLKmode : tmode, addr);
> +
> +  emit_insn (gen_rtx_SET (target, addr));
> +}
> +  else
> +{
> +  if (op[0] == const0_rtx)
> + addr = gen_rtx_MEM (blk ? BLKmode : tmode, op[1]);
> +  else
> + {
> +   op[0] = copy_to_mo

Re: [PATCH 49/55] rs6000: Builtin expansion, part 6

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-24  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-call.c (new_htm_spr_num): New function.
>   (new_htm_expand_builtin): Implement.
>   (rs6000_expand_new_builtin): Handle 32-bit and endian cases.
> ---
>  gcc/config/rs6000/rs6000-call.c | 202 
>  1 file changed, 202 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index e1b685fb874..6b60f0852ef 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -15068,11 +15068,171 @@ new_mma_expand_builtin (tree exp, rtx target, 
> insn_code icode,
>return target;
>  }
> 
> +/* Return the appropriate SPR number associated with the given builtin.  */
> +static inline HOST_WIDE_INT
> +new_htm_spr_num (enum rs6000_gen_builtins code)
> +{
> +  if (code == RS6000_BIF_GET_TFHAR
> +  || code == RS6000_BIF_SET_TFHAR)
> +return TFHAR_SPR;
> +  else if (code == RS6000_BIF_GET_TFIAR
> +|| code == RS6000_BIF_SET_TFIAR)
> +return TFIAR_SPR;
> +  else if (code == RS6000_BIF_GET_TEXASR
> +|| code == RS6000_BIF_SET_TEXASR)
> +return TEXASR_SPR;
> +  gcc_assert (code == RS6000_BIF_GET_TEXASRU
> +   || code == RS6000_BIF_SET_TEXASRU);
> +  return TEXASRU_SPR;
> +}

Ok,
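
The mapping pairs each get/set builtin with one register. The same logic as a switch, using hypothetical enums (no real SPR numbers are assumed here):

```c
#include <assert.h>

/* Hypothetical stand-ins for the get/set builtin codes.  */
enum htm_builtin
{
  GET_TFHAR, SET_TFHAR,
  GET_TFIAR, SET_TFIAR,
  GET_TEXASR, SET_TEXASR,
  GET_TEXASRU, SET_TEXASRU
};

/* Hypothetical stand-ins for the SPR identifiers.  */
enum htm_spr { SPR_TFHAR, SPR_TFIAR, SPR_TEXASR, SPR_TEXASRU };

/* Map a get/set builtin to the SPR it accesses.  */
static enum htm_spr
htm_spr_for (enum htm_builtin code)
{
  switch (code)
    {
    case GET_TFHAR:
    case SET_TFHAR:
      return SPR_TFHAR;
    case GET_TFIAR:
    case SET_TFIAR:
      return SPR_TFIAR;
    case GET_TEXASR:
    case SET_TEXASR:
      return SPR_TEXASR;
    case GET_TEXASRU:
    case SET_TEXASRU:
      return SPR_TEXASRU;
    }
  assert (!"not an SPR builtin");
  return SPR_TEXASRU;
}
```

Either form works; the if/else chain in the patch keeps the final case as an assertion, which serves the same purpose as the fall-through assert here.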


> +
>  /* Expand the HTM builtin in EXP and store the result in TARGET.  */
>  static rtx
>  new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
>   tree exp, rtx target)
>  {
> +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> +  bool nonvoid = TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node;
> +
> +  if (!TARGET_POWERPC64
> +  && (fcode == RS6000_BIF_TABORTDC
> +   || fcode == RS6000_BIF_TABORTDCI))
> +{
> +  error ("builtin %qs is only valid in 64-bit mode", bifaddr->bifname);
> +  return const0_rtx;
> +}
ok

> +
> +  rtx op[MAX_HTM_OPERANDS], pat;
> +  int nopnds = 0;
> +  tree arg;
> +  call_expr_arg_iterator iter;
> +  insn_code icode = bifaddr->icode;
> +  bool uses_spr = bif_is_htmspr (*bifaddr);
> +  rtx cr = NULL_RTX;
> +
> +  if (uses_spr)
> +icode = rs6000_htm_spr_icode (nonvoid);
> +  const insn_operand_data *insn_op = &insn_data[icode].operand[0];
> +
> +  if (nonvoid)
> +{
> +  machine_mode tmode = (uses_spr) ? insn_op->mode : E_SImode;
> +  if (!target
> +   || GET_MODE (target) != tmode
> +   || (uses_spr && !(*insn_op->predicate) (target, tmode)))
> + target = gen_reg_rtx (tmode);
> +  if (uses_spr)
> + op[nopnds++] = target;
> +}
> +
> +  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> +{
> +  if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
> + return const0_rtx;
> +
> +  insn_op = &insn_data[icode].operand[nopnds];
> +  op[nopnds] = expand_normal (arg);
> +
> +  if (!(*insn_op->predicate) (op[nopnds], insn_op->mode))
> + {
> +   if (!strcmp (insn_op->constraint, "n"))
> + {
> +   int arg_num = (nonvoid) ? nopnds : nopnds + 1;
> +   if (!CONST_INT_P (op[nopnds]))
> + error ("argument %d must be an unsigned literal", arg_num);
> +   else
> + error ("argument %d is an unsigned literal that is "
> +"out of range", arg_num);
> +   return const0_rtx;
> + }
> +   op[nopnds] = copy_to_mode_reg (insn_op->mode, op[nopnds]);
> + }
> +
> +  nopnds++;
> +}
> +
> +  /* Handle the builtins for extended mnemonics.  These accept
> + no arguments, but map to builtins that take arguments.  */
> +  switch (fcode)
> +{
> +case RS6000_BIF_TENDALL:  /* Alias for: tend. 1  */
> +case RS6000_BIF_TRESUME:  /* Alias for: tsr. 1  */
> +  op[nopnds++] = GEN_INT (1);
> +  break;
> +case RS6000_BIF_TSUSPEND: /* Alias for: tsr. 0  */
> +  op[nopnds++] = GEN_INT (0);
> +  break;
> +default:
> +  break;
> +}

ok
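
In other words, the argument-less builtins get an implicit immediate appended to their operand list. A sketch using plain integers for the operands, with a hypothetical enum and operand limit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the extended-mnemonic builtins.  */
enum htm_alias { ALIAS_NONE, ALIAS_TENDALL, ALIAS_TRESUME, ALIAS_TSUSPEND };

enum { MAX_OPERANDS = 4 };   /* hypothetical capacity */

/* Append the implicit immediate for an extended mnemonic, if any, and
   return the new operand count.  */
static size_t
append_alias_operand (enum htm_alias alias, long ops[MAX_OPERANDS],
                      size_t nops)
{
  switch (alias)
    {
    case ALIAS_TENDALL:    /* alias for: tend. 1 */
    case ALIAS_TRESUME:    /* alias for: tsr. 1 */
      assert (nops < MAX_OPERANDS);
      ops[nops++] = 1;
      break;
    case ALIAS_TSUSPEND:   /* alias for: tsr. 0 */
      assert (nops < MAX_OPERANDS);
      ops[nops++] = 0;
      break;
    case ALIAS_NONE:
      break;
    }
  return nops;
}
```

Builtins that are not aliases leave the operand list untouched, matching the default arm of the switch above.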

> +
> +  /* If this builtin accesses SPRs, then pass in the appropriate
> + SPR number and SPR regno as the last two operands.  */
> +  if (uses_spr)
> +{
> +  machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;
> +  op[nopnds++] = gen_rtx_CONST_INT (mode, new_htm_spr_num (fcode));
> +}
> +  /* If this builtin accesses a CR, then pass in a scratch
> + CR as the last operand.  */
> +  else if (bif_is_htmcr (*bifaddr))

Given this is an if/else, presumably there are no builtins that both use
an SPR and access a CR?

> +{
> +  cr = gen_reg_rtx (CCmode);
> +  op[nopnds++] = cr;
> +}
> +
> +  switch (nopnds)
> +{
> +case 1:
> +  pat = GEN_FCN (icode) (op[0]);
> +  break;
> +case 2:
> +  pat = GEN_FCN (icode) (op[0], op[1]);
> +  break;
> +case 3:
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
> +  break;
> +case 4:
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
> +  

Re: [PATCH 48/55] rs6000: Builtin expansion, part 5

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-06-17  Bill Schmidt  
> 

Hi,


> gcc/
>   * config/rs6000/rs6000-call.c (new_mma_expand_builtin):
>   Implement.

Ok,

> ---
>  gcc/config/rs6000/rs6000-call.c | 103 
>  1 file changed, 103 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 981eabc1187..e1b685fb874 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -14962,6 +14962,109 @@ static rtx
>  new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
>   rs6000_gen_builtins fcode)
>  {
> +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> +  tree arg;
> +  call_expr_arg_iterator iter;
> +  const struct insn_operand_data *insn_op;
> +  rtx op[MAX_MMA_OPERANDS];
> +  unsigned nopnds = 0;
> +  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
> +  machine_mode tmode = VOIDmode;
> +
> +  if (!void_func)
> +{
> +  tmode = insn_data[icode].operand[0].mode;
> +  if (!target
> +   || GET_MODE (target) != tmode
> +   || !(*insn_data[icode].operand[0].predicate) (target, tmode))
> + target = gen_reg_rtx (tmode);
> +  op[nopnds++] = target;
> +}
> +  else
> +target = const0_rtx;
> +
> +  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> +{
> +  if (arg == error_mark_node)
> + return const0_rtx;
> +
> +  rtx opnd;
> +  insn_op = &insn_data[icode].operand[nopnds];
> +  if (TREE_CODE (arg) == ADDR_EXPR
> +   && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0
> + opnd = DECL_RTL (TREE_OPERAND (arg, 0));
> +  else
> + opnd = expand_normal (arg);
> +
> +  if (!(*insn_op->predicate) (opnd, insn_op->mode))
> + {
> +   if (!strcmp (insn_op->constraint, "n"))
> + {
> +   if (!CONST_INT_P (opnd))
> + error ("argument %d must be an unsigned literal", nopnds);
> +   else
> + error ("argument %d is an unsigned literal that is "
> +"out of range", nopnds);
> +   return const0_rtx;
> + }
> +   opnd = copy_to_mode_reg (insn_op->mode, opnd);
> + }
> +
> +  /* Some MMA instructions have INOUT accumulator operands, so force
> +  their target register to be the same as their input register.  */
> +  if (!void_func
> +   && nopnds == 1
> +   && !strcmp (insn_op->constraint, "0")
> +   && insn_op->mode == tmode
> +   && REG_P (opnd)
> +   && (*insn_data[icode].operand[0].predicate) (opnd, tmode))
> + target = op[0] = opnd;
> +
> +  op[nopnds++] = opnd;
> +}
> +
> +  rtx pat;
> +  switch (nopnds)
> +{
> +case 1:
> +  pat = GEN_FCN (icode) (op[0]);
> +  break;
> +case 2:
> +  pat = GEN_FCN (icode) (op[0], op[1]);
> +  break;
> +case 3:
> +  /* The ASSEMBLE builtin source operands are reversed in little-endian
> +  mode, so reorder them.  */
> +  if (fcode == RS6000_BIF_ASSEMBLE_PAIR_V_INTERNAL && !WORDS_BIG_ENDIAN)
> + std::swap (op[1], op[2]);
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
> +  break;
> +case 4:
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
> +  break;
> +case 5:
> +  /* The ASSEMBLE builtin source operands are reversed in little-endian
> +  mode, so reorder them.  */

I'd be tempted to consolidate the source operand reversal comments for
RS6000_BIF_ASSEMBLE_PAIR_V_INTERNAL and RS6000_BIF_ASSEMBLE_ACC_INTERNAL
up at the start of the switch statement, but on reflection I think this
makes sense as-is.
Ok.
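
For what it's worth, the two little-endian reversals (cases 3 and 5) look
like the same operation at different widths.  A minimal sketch of that
reading, with plain ints standing in for rtx; reverse_source_operands and
swap_int are hypothetical names and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Plain-int stand-in for std::swap on rtx operands.  */
static void
swap_int (int *a, int *b)
{
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

/* Reverse the source operands op[1]..op[nsrc] in place.  The destination
   operand op[0] is left untouched.  With nsrc == 2 this is the PAIR swap
   (op[1] <-> op[2]); with nsrc == 4 it is the ACC swaps
   (op[1] <-> op[4], op[2] <-> op[3]).  */
static void
reverse_source_operands (int *op, size_t nsrc)
{
  assert (op != NULL);
  for (size_t lo = 1, hi = nsrc; lo < hi; lo++, hi--)
    swap_int (&op[lo], &op[hi]);
}
```

In the patch itself the swaps are applied only when !WORDS_BIG_ENDIAN and
only for the two ASSEMBLE builtins, so keeping them inline is reasonable.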


> +  if (fcode == RS6000_BIF_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
> + {
> +   std::swap (op[1], op[4]);
> +   std::swap (op[2], op[3]);
> + }
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
> +  break;
> +case 6:
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
> +  break;
> +case 7:
> +  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], 
> op[6]);
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +  if (!pat)
> +return NULL_RTX;
> +  emit_insn (pat);
> +

Ok,
lgtm,
thanks
-Will

>return target;
>  }
> 



Re: [PATCH 46/55] rs6000: Builtin expansion, part 3

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 

Hi,


> gcc/
>   * config/rs6000/rs6000-call.c (new_cpu_expand_builtin):
>   Implement.

ok


> ---
>  gcc/config/rs6000/rs6000-call.c | 100 
>  1 file changed, 100 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 754cd46b1c1..ad3e6a4bbe5 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -14604,6 +14604,106 @@ static rtx
>  new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
>   tree exp ATTRIBUTE_UNUSED, rtx target)
>  {
> +  /* __builtin_cpu_init () is a nop, so expand to nothing.  */
> +  if (fcode == RS6000_BIF_CPU_INIT)
> +return const0_rtx;
> +
> +  if (target == 0 || GET_MODE (target) != SImode)
> +target = gen_reg_rtx (SImode);
> +
> +#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
> +  tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
> +  /* Target clones creates an ARRAY_REF instead of STRING_CST, convert it 
> back
> + to a STRING_CST.  */
> +  if (TREE_CODE (arg) == ARRAY_REF
> +  && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
> +  && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
> +  && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
> +arg = TREE_OPERAND (arg, 0);
> +
> +  if (TREE_CODE (arg) != STRING_CST)
> +{
> +  error ("builtin %qs only accepts a string argument",
> +  rs6000_builtin_info_x[(size_t) fcode].bifname);
> +  return const0_rtx;
> +}
> +
> +  if (fcode == RS6000_BIF_CPU_IS)
> +{
> +  const char *cpu = TREE_STRING_POINTER (arg);
> +  rtx cpuid = NULL_RTX;
> +  for (size_t i = 0; i < ARRAY_SIZE (cpu_is_info); i++)
> + if (strcmp (cpu, cpu_is_info[i].cpu) == 0)
> +   {
> + /* The CPUID value in the TCB is offset by _DL_FIRST_PLATFORM.  */
> + cpuid = GEN_INT (cpu_is_info[i].cpuid + _DL_FIRST_PLATFORM);
> + break;
> +   }

ok

> +  if (cpuid == NULL_RTX)
> + {
> +   /* Invalid CPU argument.  */
> +   error ("cpu %qs is an invalid argument to builtin %qs",
> +  cpu, rs6000_builtin_info_x[(size_t) fcode].bifname);
> +   return const0_rtx;
> + }
> +
> +  rtx platform = gen_reg_rtx (SImode);
> +  rtx tcbmem = gen_const_mem (SImode,
> +   gen_rtx_PLUS (Pmode,
> + gen_rtx_REG (Pmode, TLS_REGNUM),
> + GEN_INT (TCB_PLATFORM_OFFSET)));
> +  emit_move_insn (platform, tcbmem);
> +  emit_insn (gen_eqsi3 (target, platform, cpuid));
> +}
> +  else if (fcode == RS6000_BIF_CPU_SUPPORTS)
> +{
> +  const char *hwcap = TREE_STRING_POINTER (arg);
> +  rtx mask = NULL_RTX;
> +  int hwcap_offset;
> +  for (size_t i = 0; i < ARRAY_SIZE (cpu_supports_info); i++)
> + if (strcmp (hwcap, cpu_supports_info[i].hwcap) == 0)
> +   {
> + mask = GEN_INT (cpu_supports_info[i].mask);
> + hwcap_offset = TCB_HWCAP_OFFSET (cpu_supports_info[i].id);
> + break;
> +   }
> +  if (mask == NULL_RTX)
> + {
> +   /* Invalid HWCAP argument.  */
> +   error ("%s %qs is an invalid argument to builtin %qs",
> +  "hwcap", hwcap,
> +  rs6000_builtin_info_x[(size_t) fcode].bifname);
> +   return const0_rtx;
> + }
> +
> +  rtx tcb_hwcap = gen_reg_rtx (SImode);
> +  rtx tcbmem = gen_const_mem (SImode,
> +   gen_rtx_PLUS (Pmode,
> + gen_rtx_REG (Pmode, TLS_REGNUM),
> + GEN_INT (hwcap_offset)));
> +  emit_move_insn (tcb_hwcap, tcbmem);
> +  rtx scratch1 = gen_reg_rtx (SImode);
> +  emit_insn (gen_rtx_SET (scratch1, gen_rtx_AND (SImode, tcb_hwcap, 
> mask)));
> +  rtx scratch2 = gen_reg_rtx (SImode);
> +  emit_insn (gen_eqsi3 (scratch2, scratch1, const0_rtx));
> +  emit_insn (gen_rtx_SET (target, gen_rtx_XOR (SImode, scratch2, 
> const1_rtx)));
> +}
> +  else
> +gcc_unreachable ();
> +
> +  /* Record that we have expanded a CPU builtin, so that we can later
> + emit a reference to the special symbol exported by LIBC to ensure we
> + do not link against an old LIBC that doesn't support this feature.  */
> +  cpu_builtin_p = true;
> +
> +#else
> +  warning (0, "builtin %qs needs GLIBC (2.23 and newer) that exports 
> hardware "
> +"capability bits", rs6000_builtin_info_x[(size_t) fcode].bifname);
> +

This seems OK. 
It appears to comply with the documentation at least  :-)
"If GCC was configured to use a GLIBC before 2.23, the built-in
function __builtin_cpu_is always returns a 0 and the compiler
issues a warning."
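
As a sanity check on the __builtin_cpu_supports path above, here is a
rough model of what the three emitted instructions compute.  This is only
a sketch: cpu_supports_sketch and cpu_supports_old_libc are hypothetical
names, and the hwcap and mask values a caller passes are made up rather
than taken from cpu_supports_info.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the RTL sequence in the patch:
     scratch1 = tcb_hwcap & mask
     scratch2 = (scratch1 == 0)
     target   = scratch2 ^ 1
   so the result is 1 when any bit of MASK is set in TCB_HWCAP.  */
static int
cpu_supports_sketch (uint32_t tcb_hwcap, uint32_t mask)
{
  assert (mask != 0);	/* A zero mask could never match.  */
  uint32_t scratch1 = tcb_hwcap & mask;
  int scratch2 = (scratch1 == 0);
  return scratch2 ^ 1;
}

/* Models the #else branch: with an old LIBC the builtin is always
   false (and the compiler warns).  */
static int
cpu_supports_old_libc (void)
{
  return 0;
}
```

The eq-then-xor pair is just a "not equal to zero" test spelled with the
patterns that are available.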

ok
lgtm,
thanks
-Will

> +  /* For old LIBCs, always return FALSE.  */
> +  emit_move_insn 

Re: [PATCH 45/55] rs6000: Builtin expansion, part 2

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-05  Bill Schmidt  
> 

Hi,

> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin):
>   Implement.
>   (rs6000_expand_ldst_mask): Likewise.
>   (rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.

ok

> ---
>  gcc/config/rs6000/rs6000-call.c | 101 +++-
>  1 file changed, 100 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 8693836cd5a..754cd46b1c1 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -11671,6 +11671,75 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
>  static void
>  rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
>  {
> +  size_t uns_fncode = (size_t) fncode;
> +  const char *name = rs6000_builtin_info_x[uns_fncode].bifname;
> +
> +  switch (rs6000_builtin_info_x[uns_fncode].enable)
> +{
> +case ENB_P5:
> +  error ("%qs requires the %qs option", name, "-mcpu=power5");
> +  break;
> +case ENB_P6:
> +  error ("%qs requires the %qs option", name, "-mcpu=power6");
> +  break;
> +case ENB_ALTIVEC:
> +  error ("%qs requires the %qs option", name, "-maltivec");
> +  break;
> +case ENB_CELL:
> +  error ("%qs is only valid for the cell processor", name);
> +  break;
> +case ENB_VSX:
> +  error ("%qs requires the %qs option", name, "-mvsx");
> +  break;
> +case ENB_P7:
> +  error ("%qs requires the %qs option", name, "-mcpu=power7");
> +  break;
> +case ENB_P7_64:
> +  error ("%qs requires the %qs option and either the %qs or %qs option",
> +  name, "-mcpu=power7", "-m64", "-mpowerpc64");
> +  break;
> +case ENB_P8:
> +  error ("%qs requires the %qs option", name, "-mcpu=power8");
> +  break;
> +case ENB_P8V:
> +  error ("%qs requires the %qs option", name, "-mpower8-vector");
> +  break;
> +case ENB_P9:
> +  error ("%qs requires the %qs option", name, "-mcpu=power9");
> +  break;
> +case ENB_P9_64:
> +  error ("%qs requires the %qs option and either the %qs or %qs option",
> +  name, "-mcpu=power9", "-m64", "-mpowerpc64");
> +  break;
> +case ENB_P9V:
> +  error ("%qs requires the %qs option", name, "-mpower9-vector");
> +  break;
> +case ENB_IEEE128_HW:
> +  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
> +  break;
> +case ENB_DFP:
> +  error ("%qs requires the %qs option", name, "-mhard-dfp");
> +  break;
> +case ENB_CRYPTO:
> +  error ("%qs requires the %qs option", name, "-mcrypto");
> +  break;
> +case ENB_HTM:
> +  error ("%qs requires the %qs option", name, "-mhtm");
> +  break;
> +case ENB_P10:
> +  error ("%qs requires the %qs option", name, "-mcpu=power10");
> +  break;
> +case ENB_P10_64:
> +  error ("%qs requires the %qs option and either the %qs or %qs option",
> +  name, "-mcpu=power10", "-m64", "-mpowerpc64");
> +  break;
> +case ENB_MMA:
> +  error ("%qs requires the %qs option", name, "-mmma");
> +  break;
> +default:
> +case ENB_ALWAYS:
> +  gcc_unreachable ();
> +};

ok

>  }
> 
>  /* Target hook for early folding of built-ins, shamelessly stolen
> @@ -14501,7 +14570,33 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
> subtarget ATTRIBUTE_UNUSED,
>  rtx
>  rs6000_expand_ldst_mask (rtx target, tree arg0)
>   {
> -  return target;
> +  int icode2 = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr_direct
> + : (int) CODE_FOR_altivec_lvsl_direct);
> +  machine_mode tmode = insn_data[icode2].operand[0].mode;
> +  machine_mode mode = insn_data[icode2].operand[1].mode;
> +  rtx op, addr, pat;
> +
> +  gcc_assert (TARGET_ALTIVEC);
> +
> +  gcc_assert (POINTER_TYPE_P (TREE_TYPE (arg0)));
> +  op = expand_expr (arg0, NULL_RTX, Pmode, EXPAND_NORMAL);
> +  addr = memory_address (mode, op);
> +  /* We need to negate the address.  */
> +  op = gen_reg_rtx (GET_MODE (addr));
> +  emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
> +  op = gen_rtx_MEM (mode, op);
> +
> +  if (target == 0
> +  || GET_MODE (target) != tmode
> +  || ! (*insn_data[icode2].operand[0].predicate) (target, tmode))
> +target = gen_reg_rtx (tmode);
> +
> +  pat = GEN_FCN (icode2) (target, op);
> +  if (!pat)
> +return 0;
> +  emit_insn (pat);
> +
> +   return target;

ok

>   }
> 
>  /* Expand the CPU builtin in FCODE and store the result in TARGET.  */
> @@ -15401,6 +15496,10 @@ rs6000_init_builtins (void)
>/* Execute the autogenerated initialization code for builtins.  */
>rs6000_autoinit_builtins ();
> 
> +  if (new_builtins_are_live)
> +altivec_builtin_mask_for_load
> +  = rs6000_builtin_decls_x[RS6000_BIF_MASK_FOR_LOAD];
> +

ok

>if (new_builtins_are_live)
>  {
>  #ifdef SUBTAR

Re: [PATCH 44/55] rs6000: Builtin expansion, part 1

2021-07-27 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-06-17  Bill Schmidt  
> 

Hi,

> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
>   forward decl.
>   (rs6000_invalid_new_builtin): New stub function.
>   (rs6000_expand_builtin): Call rs6000_expand_new_builtin.
>   (rs6000_expand_ldst_mask): New stub function.
>   (new_cpu_expand_builtin): Likewise.
>   (elemrev_icode): Likewise.
>   (ldv_expand_builtin): Likewise.
>   (lxvrse_expand_builtin): Likewise.
>   (lxvrze_expand_builtin): Likewise.
>   (stv_expand_builtin): Likewise.
>   (new_mma_expand_builtin): Likewise.
>   (new_htm_expand_builtin): Likewise.
>   (rs6000_expand_new_builtin): New function.
> ---
>  gcc/config/rs6000/rs6000-call.c | 526 
>  1 file changed, 526 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 52df3d165e1..8693836cd5a 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, 
> machine_mode,
>  static void rs6000_common_init_builtins (void);
>  static void htm_init_builtins (void);
>  static void mma_init_builtins (void);
> +static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
>  static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);

Ok, assuming the forward declaration is necessary.  I think we still have a
TODO to rearrange the code and eliminate some of those, but IMO that is not
worth holding up this patch series.

> 
> 
> @@ -11664,6 +11665,14 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
>  error ("%qs is not supported with the current options", name);
>  }
> 
> +/* Raise an error message for a builtin function that is called without the
> +   appropriate target options being set.  */
> +
> +static void
> +rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
> +{
> +}
> +
>  /* Target hook for early folding of built-ins, shamelessly stolen
> from ia64.c.  */
> 
> @@ -14193,6 +14202,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
> subtarget ATTRIBUTE_UNUSED,
>  machine_mode mode ATTRIBUTE_UNUSED,
>  int ignore ATTRIBUTE_UNUSED)
>  {
> +  if (new_builtins_are_live)
> +return rs6000_expand_new_builtin (exp, target, subtarget, mode, ignore);
> +
>tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
>enum rs6000_builtins fcode
>  = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> @@ -14485,6 +14497,520 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
> subtarget ATTRIBUTE_UNUSED,
>gcc_unreachable ();
>  }
> 
> +/* Expand ALTIVEC_BUILTIN_MASK_FOR_LOAD.  */
> +rtx
> +rs6000_expand_ldst_mask (rtx target, tree arg0)
> + {
> +  return target;
> + }
> +
> +/* Expand the CPU builtin in FCODE and store the result in TARGET.  */
> +static rtx
> +new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
> + tree exp ATTRIBUTE_UNUSED, rtx target)
> +{
> +  return target;
> +}
> +
> +static insn_code
> +elemrev_icode (rs6000_gen_builtins fcode)
> +{
> +  return (insn_code) 0;
> +}
> +
> +static rtx
> +ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
> +{
> +  return target;
> +}
> +
> +static rtx
> +lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
> +machine_mode tmode, machine_mode smode)
> +{
> +  return target;
> +}
> +
> +static rtx
> +lxvrze_expand_builtin (rtx target, insn_code icode, rtx *op,
> +machine_mode tmode, machine_mode smode)
> +{
> +  return target;
> +}
> +
> +static rtx
> +stv_expand_builtin (insn_code icode, rtx *op,
> + machine_mode tmode, machine_mode smode)
> +{
> +  return NULL_RTX;
> +}
> +
> +/* Expand the MMA built-in in EXP.  */
> +static rtx
> +new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
> + rs6000_gen_builtins fcode)
> +{
> +  return target;
> +}
> +
> +/* Expand the HTM builtin in EXP and store the result in TARGET.  */
> +static rtx
> +new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
> + tree exp, rtx target)
> +{
> +  return const0_rtx;
> +}
> +
> +/* Expand an expression EXP that calls a built-in function,
> +   with result going to TARGET if that's convenient
> +   (and in mode MODE if that's convenient).
> +   SUBTARGET may be used as the target for computing one of EXP's operands.
> +   IGNORE is nonzero if the value is to be ignored.
> +   Use the new builtin infrastructure.  */
> +static rtx
> +rs6000_expand_new_builtin (tree exp, rtx target,
> +rtx subtarget ATTRIBUTE_UNUSED,
> +machine_mode ignore_mode ATTRIBUTE_UNUSED,
> +int ignore ATTRIBUTE_UNUSED)
> +{
> +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> +  enum r

Re: PING: [RS6000] rotate and mask constants [PR94393]

2021-07-27 Thread will schmidt via Gcc-patches
On Fri, 2021-07-23 at 15:23 -0500, Pat Haugen via Gcc-patches wrote:
> Ping 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html
> 
> I've done a current bootstrap/regtest on powerpc64/powerpc64le with
> no regressions.
> 
> -Pat

That patch was previously posted by Alan Modra.
Given the time that has passed, it may need to be re-posted in full,
depending on what the maintainers suggest. :-)


> gcc/
>   * config/rs6000/rs6000.c (rotate_and_mask_constant): New function.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.

> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.
> 

I did a brief look-over at the time, and things looked OK to me then.


Thanks, 
-Will



Re: [PATCH 05/55] rs6000: Add helper functions for parsing

2021-07-09 Thread will schmidt via Gcc-patches
On Thu, 2021-06-17 at 10:18 -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-06-07  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-gen-builtins.c (consume_whitespace): New
>   function.
>   (advance_line): Likewise.
>   (safe_inc_pos): Likewise.
>   (match_identifier): Likewise.
>   (match_integer): Likewise.
>   (match_to_right_bracket): Likewise.
> ---
>  gcc/config/rs6000/rs6000-gen-builtins.c | 111 
>  1 file changed, 111 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c 
> b/gcc/config/rs6000/rs6000-gen-builtins.c
> index 3c53c3401b2..c5d5590e865 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -210,3 +210,114 @@ ovld_diag (const char * fmt, ...)
>vfprintf (stderr, fmt, args);
>va_end (args);
>  }
> +
> +/* Pass over unprintable characters and whitespace (other than a newline,
> +   which terminates the scan).  */

AFAIK isspace() only matches whitespace characters, so this helper skips
whitespace and nothing else; non-whitespace unprintable characters are not
handled or skipped here.
Beyond that comment nit the function seems OK.
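
To illustrate the nit, a stand-alone variant of the loop.  This is a
sketch only: skip_whitespace is a hypothetical name, and it takes the
buffer, length and position as parameters rather than using the
generator's linebuf/pos globals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Advance POS past whitespace in BUF, stopping at a newline, at the
   first non-whitespace character, or at LEN.  Returns the new position.
   The cast to unsigned char keeps isspace() well defined for bytes with
   the high bit set.  */
static size_t
skip_whitespace (const char *buf, size_t len, size_t pos)
{
  assert (buf != NULL);
  while (pos < len
         && isspace ((unsigned char) buf[pos])
         && buf[pos] != '\n')
    pos++;
  return pos;
}
```

Blanks and tabs are skipped, but a control character such as 0x01 stops
the scan at once, which is why the "unprintable characters" wording in
the comment is misleading.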

> +static void
> +consume_whitespace (void)
> +{
> +  while (pos < LINELEN && isspace(linebuf[pos]) && linebuf[pos] != '\n')
> +pos++;
> +  return;
> +}
> +
> +/* Get the next nonblank, noncomment line, returning 0 on EOF, 1 otherwise.  
> */
> +static int
> +advance_line (FILE *file)
> +{
> +  while (1)
> +{
> +  /* Read ahead one line and check for EOF.  */
> +  if (!fgets (linebuf, sizeof linebuf, file))
> + return 0;
> +  line++;
> +  size_t len = strlen (linebuf);
> +  if (linebuf[len - 1] != '\n')
> + (*diag) ("line doesn't terminate with newline\n");
> +  pos = 0;
> +  consume_whitespace ();
> +  if (linebuf[pos] != '\n' && linebuf[pos] != ';')
> + return 1;
> +}
> +}
ok

> +
> +static inline void
> +safe_inc_pos (void)
> +{
> +  if (pos++ >= LINELEN)
> +{
> +  (*diag) ("line length overrun.\n");
> +  exit (1);
> +}
> +}

ok

> +
> +/* Match an identifier, returning NULL on failure, else a pointer to a
> +   buffer containing the identifier.  */
> +static char *
> +match_identifier (void)
> +{
> +  int lastpos = pos - 1;
> +  while (isalnum (linebuf[lastpos + 1]) || linebuf[lastpos + 1] == '_')
> +++lastpos;
> +
> +  if (lastpos < pos)
> +return 0;
> +
> +  char *buf = (char *) malloc (lastpos - pos + 2);
> +  memcpy (buf, &linebuf[pos], lastpos - pos + 1);
> +  buf[lastpos - pos + 1] = '\0';
> +
> +  pos = lastpos + 1;
> +  return buf;
> +}
ok


> +
> +/* Match an integer and return the string representing its value,
> +   or a null string on failure.  */
> +static char *
> +match_integer (void)
> +{
> +  int startpos = pos;
> +  if (linebuf[pos] == '-')
> +safe_inc_pos ();
> +
> +  int lastpos = pos - 1;
> +  while (isdigit (linebuf[lastpos + 1]))
> +++lastpos;
> +
> +  if (lastpos < pos)
> +return NULL;
> +
> +  pos = lastpos + 1;
> +  char *buf = (char *) malloc (lastpos - startpos + 2);
> +  memcpy (buf, &linebuf[startpos], lastpos - startpos + 1);
> +  buf[lastpos - startpos + 1] = '\0';
> +  return buf;
> +}
Ok

> +
> +/* Match a string up to but not including a ']', and return its value,
> +   or zero if there is nothing before the ']'.  Error if we don't find
> +   such a character.  */
> +static const char *
> +match_to_right_bracket (void)
> +{
> +  int lastpos = pos - 1;
> +  while (linebuf[lastpos + 1] != ']')
> +{
> +  if (linebuf[lastpos + 1] == '\n')
> + {
> +   (*diag) ("no ']' found before end of line.\n");
> +   exit (1);
> + }
> +  ++lastpos;
> +}
> +
> +  if (lastpos < pos)
> +return 0;
> +
> +  char *buf = (char *) malloc (lastpos - pos + 2);
> +  memcpy (buf, &linebuf[pos], lastpos - pos + 1);
> +  buf[lastpos - pos + 1] = '\0';
> +
> +  pos = lastpos + 1;
> +  return buf;
> +}

Ok.

Presumably all tested OK. :-)

lgtm, 
thanks
-Will



Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-07-09 Thread will schmidt via Gcc-patches
On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote:
> Gentle ping ^2, thanks.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html
> 
> 
> On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:
> > Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%,
> > 526.blender_r +1.72%, no obvious changes to others.

Ok.

> > 
> > 
> > On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:
> > > Gentle ping, thanks.
> > > 
> > > 
> > > On 2021/4/16 15:10, Xiong Hu Luo wrote:
> > > > fmod/fmodf and remainder/remainderf could be expanded instead of library
> > > > call when fast-math build, which is much faster.
> > > > 
> > > > fmodf:
> > > >   fdivs   f0,f1,f2
> > > >   frizf0,f0
> > > >   fnmsubs f1,f2,f0,f1
> > > > 
> > > > remainderf:
> > > >   fdivs   f0,f1,f2
> > > >   frinf0,f0
> > > >   fnmsubs f1,f2,f0,f1
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2021-04-16  Xionghu Luo  
> > > > 
> > > > PR target/97142

That PR is "Bug 97142 - __builtin_fmod not optimized on POWER".

OK.


> > > > * config/rs6000/rs6000.md (fmod3): New define_expand.
> > > > (remainder3): Likewise.


> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > 2021-04-16  Xionghu Luo  
> > > > 
> > > > PR target/97142
> > > > * gcc.target/powerpc/pr97142.c: New test.

Ok.

> > > > ---
> > > >   gcc/config/rs6000/rs6000.md| 36 ++
> > > >   gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++
> > > >   2 files changed, 66 insertions(+)
> > > >   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c
> > > > 
> > > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > > > index a1315523fec..7e0e94e6ba4 100644
> > > > --- a/gcc/config/rs6000/rs6000.md
> > > > +++ b/gcc/config/rs6000/rs6000.md
> > > > @@ -4902,6 +4902,42 @@ (define_insn "fre"
> > > > [(set_attr "type" "fp")
> > > >  (set_attr "isa" "*,")])
> > > > +(define_expand "fmod3"
> > > > +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> > > > +(use (match_operand:SFDF 1 "gpc_reg_operand"))
> > > > +(use (match_operand:SFDF 2 "gpc_reg_operand"))]
> > > > +  "TARGET_HARD_FLOAT
> > > > +  && TARGET_FPRND
> > > > +  && flag_unsafe_math_optimizations"
> > > > +{
> > > > +  rtx div = gen_reg_rtx (mode);
> > > > +  emit_insn (gen_div3 (div, operands[1], operands[2]));
> > > > +
> > > > +  rtx friz = gen_reg_rtx (mode);
> > > > +  emit_insn (gen_btrunc2 (friz, div));
> > > > +
> > > > +  emit_insn (gen_nfms4 (operands[0], operands[2], friz, 
> > > > operands[1]));
> > > > +  DONE;
> > > > + })
> > > > +
> > > > +(define_expand "remainder3"
> > > > +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> > > > +(use (match_operand:SFDF 1 "gpc_reg_operand"))
> > > > +(use (match_operand:SFDF 2 "gpc_reg_operand"))]
> > > > +  "TARGET_HARD_FLOAT
> > > > +  && TARGET_FPRND
> > > > +  && flag_unsafe_math_optimizations"
> > > > +{
> > > > +  rtx div = gen_reg_rtx (mode);
> > > > +  emit_insn (gen_div3 (div, operands[1], operands[2]));
> > > > +
> > > > +  rtx frin = gen_reg_rtx (mode);
> > > > +  emit_insn (gen_round2 (frin, div));
> > > > +
> > > > +  emit_insn (gen_nfms4 (operands[0], operands[2], frin, 
> > > > operands[1]));
> > > > +  DONE;
> > > > + })

I notice the order of arguments to the final emit
is op[0], op[2], fri*, op[1],
while the description comment suggests the generated instruction
will be fnmsubs f1,f2,f0,f1.

I don't see any rearranging in the nfms4 expansions, but
presumably this is correct and just a cosmetic nit that catches my eye.
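
Working it through: fnmsub FRT,FRA,FRC,FRB computes -((FRA * FRC) - FRB),
so with x in f1, y in f2 and the rounded quotient in f0 the two orders
agree and the result is x - y * q.  A rough C model of that, as a sketch
only: the function names are hypothetical, the quotient is assumed to fit
in a long long, and the multiply-subtract here is not fused as it is in
the real instruction.

```c
#include <assert.h>

/* Models fnmsub FRT,FRA,FRC,FRB: FRT = -((FRA * FRC) - FRB).  The real
   instruction rounds once; this model may round twice, which does not
   matter for exactly representable inputs.  */
static double
fnmsub_model (double fra, double frc, double frb)
{
  return -((fra * frc) - frb);
}

/* fmod-style expansion: divide, round toward zero (friz), fnmsub.
   Assumes x / y fits in a long long.  */
static double
fmod_sketch (double x, double y)
{
  assert (y != 0.0);
  double div = x / y;
  double friz = (double) (long long) div;
  return fnmsub_model (y, friz, x);
}

/* remainder-style expansion: divide, round to nearest with ties away
   from zero (frin), fnmsub.  Same range assumption as above.  */
static double
remainder_sketch (double x, double y)
{
  assert (y != 0.0);
  double div = x / y;
  double frin = (double) (long long) (div + (div < 0.0 ? -0.5 : 0.5));
  return fnmsub_model (y, frin, x);
}
```

So operands[2], fri*, operands[1] map onto FRA, FRC, FRB, which lines up
with the f2,f0,f1 in the description.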

Ok.


> > > > +
> > > >   (define_insn "*rsqrt2"
> > > > [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
> > > >   (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c 
> > > > b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> > > > new file mode 100644
> > > > index 000..48f25ca5b5b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> > > > @@ -0,0 +1,30 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-Ofast" } */
> > > > +
> > > > +#include 
> > > > +
> > > > +float test1 (float x, float y)
> > > > +{
> > > > +  return fmodf (x, y);
> > > > +}
> > > > +
> > > > +double test2 (double x, double y)
> > > > +{
> > > > +  return fmod (x, y);
> > > > +}
> > > > +
> > > > +float test3 (float x, float y)
> > > > +{
> > > > +  return remainderf (x, y);
> > > > +}
> > > > +
> > > > +double test4 (double x, double y)
> > > > +{
> > > > +  return remainder (x, y);
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
> > > > +/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
> > > > +/* { dg-final { scan-assembler-not {\mbl remainder\M} } } */
> > > > +/* { dg-final { scan-assembler-not {\mbl remainderf\M} } } */


Ok.
I'd be tempted to add scan-assembler checks for the fdivs,

Re: Generate 128-bit divide/modulus

2021-06-04 Thread will schmidt via Gcc-patches
On Fri, 2021-06-04 at 11:10 -0400, Michael Meissner wrote:


Hi,


> Generate 128-bit divide/modulus.
> 
> This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
> instructions to do 128-bit arithmetic.

vdivsq, vdivuq, vmodsq, vmoduq should be lowercase?

> 
> I have tested this on 3 compilers:
> * Power9 little endian, --with-cpu=power9
> * Power8 big endian, --with-cpu=power8, both 32/64-bit tested
> * Power10 little endian, --with-cpu=power10
> 
> There were no issues found in the runs.  Can I check this into the
> master
> branch and later into the GCC 11 branch after a soak-in period?





> 
> gcc/
> 2021-06-03  Michael Meissner  
> 
>   PR target/100809

Add some reference to [PR/100809] in the subject?

From the GCC bugzilla:

> [tag] [reply] [−] Comment 3 Michael Meissner 2021-06-01 22:55:20 UTC
> 
> Carl Love submitted a patch for this on April 26th.
> 
> [tag] [reply] [−] Comment 4 Michael Meissner 2021-06-01 22:58:31 UTC
> 
> Note, in looking at Carl's patch, it is only for adding the built-
> ins.  I don't believe it adds direct support for {,u}divti3 and
> {,u}moddti3 to implement these for normal __int128 variables.
> 

A few words to clarify the situation in the description may be good.
Since that patch did not directly address the PR, I imagine it was a
happy accident that it partially implemented/resolved the situation
here.
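
To make the distinction concrete: the built-ins are one way to reach the
new instructions, while this patch is about the plain / and % operators
on __int128.  A sketch of the kind of source affected, assuming a 64-bit
GCC target with __int128; the typedefs and the s_div/s_mod names are
hypothetical.  Without named divti3/modti3 patterns, my understanding is
that these operators become libgcc calls such as __udivti3.

```c
#include <assert.h>

typedef unsigned __int128 u128;
typedef __int128 s128;

/* Unsigned 128-bit divide: candidate for udivti3.  */
u128
u_div (u128 a, u128 b)
{
  assert (b != 0);
  return a / b;
}

/* Unsigned 128-bit modulus: candidate for umodti3.  */
u128
u_mod (u128 a, u128 b)
{
  assert (b != 0);
  return a % b;
}

/* Signed 128-bit divide: candidate for divti3.  */
s128
s_div (s128 a, s128 b)
{
  assert (b != 0);
  return a / b;
}

/* Signed 128-bit modulus: candidate for modti3.  */
s128
s_mod (s128 a, s128 b)
{
  assert (b != 0);
  return a % b;
}
```

No built-in appears anywhere in that source, which is the case Carl's
patch on its own would not have covered.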


>   * config/rs6000/rs6000.md (udivti3): New insn.
>   (divti3): New insn.
>   (umodti3): New insn.
>   (modti3): New insn.

ok

> 
> gcc/testsuite/
> 2021-06-03  Michael Meissner  
> 
>   PR target/100809
>   * gcc.target/powerpc/p10-vdiv-vmod.c: New test.


ok



> ---
>  gcc/config/rs6000/rs6000.md   | 34
> +++
>  .../gcc.target/powerpc/p10-vdivq-vmodq.c  | 27 +++
>  2 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-vdivq-
> vmodq.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md
> b/gcc/config/rs6000/rs6000.md
> index 2517901f239..e70dbe409df 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -3234,6 +3234,14 @@ (define_insn "udiv3"
>[(set_attr "type" "div")
> (set_attr "size" "")])
> 
> +(define_insn "udivti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +(udiv:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +  (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vdivuq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> 
>  ;; For powers of two we can do sra[wd]i/addze for divide and then
> adjust for
>  ;; modulus.  If it isn't a power of two, force operands into
> register and do
> @@ -3324,6 +3332,15 @@ (define_insn_and_split "*div3_sra_dot2"
> (set_attr "length" "8,12")
> (set_attr "cell_micro" "not")])
> 
> +(define_insn "divti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +(div:TI (match_operand:TI 1 "altivec_register_operand" "v")
> + (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vdivsq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
>  (define_expand "mod3"
>[(set (match_operand:GPR 0 "gpc_reg_operand")
>   (mod:GPR (match_operand:GPR 1 "gpc_reg_operand")
> @@ -3424,6 +3441,23 @@ (define_peephole2
>   (minus:GPR (match_dup 1)
>  (match_dup 3)))])
> 
> +(define_insn "umodti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +  (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vmoduq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "modti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +(mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
> + (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vmodsq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])

ok

>  
>  ;; Logical instructions
>  ;; The logical instructions are mostly combined by using
> match_operator,
> diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
> b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
> new file mode 100644
> index 000..cd29b0a4b6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
> @@ -0,0 +1,27 @@
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */

ok

> +
> +unsigned __int128 u_div(unsigned __int128 a, unsigned __int128 b)
> +{
> +   return a/b;
> +}
> +
> +unsigned __int128 u_mod(unsigned __int128 a, unsigned __int128 b)
> +{
> +   return a

Re: [PATCH 2/2] Fix tests when running on power10, PR testsuite/100166

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:59 -0400, Michael Meissner wrote:
> [PATCH 2/2] Fix tests when running on power10, PR testsuite/100166
> 
Hi,


> This patch updates the various tests in the testsuite to adjust the test
> if power10 code generation is used.
> 
> Some tests would not generate the expected instructions because power10
> provides new instructions that the compiler now generates.  These tests are
> adjusted to use '#pragma GCC target ("cpu=power9"), or the new instructions
> were added to regex.

ok

> 
> One test was checking for 64-bit TOC calls, and it was adjusted to also allow
> PC-relative calls.
> 
> I have bootstraped this on LE power9 and BE power8 systems.  There were no
> regressions in the tests.  Can I check this into the trunk?
> 
> I would like to back port these patches to GCC 11 after a cooling off period.
> Is that ok?
> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
>   PR testsuite/100166
>   * gcc.dg/pr56727-2.c: Add support for PC-relative calls.
>   * gcc.target/powerpc/fold-vec-div-longlong.c:
>   * gcc.target/powerpc/fold-vec-mult-longlong.c: Disable power10
>   code generation.
>   * gcc.target/powerpc/ppc-eq0-1.c: Add support for the setbc
>   instruction.
>   * gcc.target/powerpc/ppc-ne0-1.c: Disable power10 code
>   generation.
> ---
>  gcc/testsuite/gcc.dg/pr56727-2.c  | 2 +-
>  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c  | 7 +++
>  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 7 +++
>  gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c  | 2 +-
>  gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c  | 8 
>  5 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c 
> b/gcc/testsuite/gcc.dg/pr56727-2.c
> index c54369ed25e..77fdf4bc350 100644
> --- a/gcc/testsuite/gcc.dg/pr56727-2.c
> +++ b/gcc/testsuite/gcc.dg/pr56727-2.c
> @@ -18,4 +18,4 @@ void h ()
> 
>  /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */
>  /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && ilp32 } } } } */
> -/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && lp64 } } } } */
> +/* { dg-final { scan-assembler "(bl f\n\\s*nop)|(bl f@notoc)" { target { powerpc*-*-linux* && lp64 } } } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> index 312e984d3cc..1d20b7ff100 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> @@ -6,6 +6,13 @@
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-options "-mvsx -O2" } */
> 
> +/* If the compiler was configured to automatically generate power10 support with
> +   --with-cpu=power10, turn it off.  Otherwise, it will generate VDIVSD and
> +   VDIVUD instructions.  */
> +#ifdef _ARCH_PWR10
> +#pragma GCC target ("cpu=power9")
> +#endif
> +
>  #include 
> 
>  vector signed long long
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> index 38dba9f5023..7510dc5c7a7 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> @@ -6,6 +6,13 @@
>  /* { dg-options "-maltivec -mvsx -mpower8-vector" } */
>  /* { dg-additional-options "-maix64" { target powerpc-ibm-aix* } } */
> 
> +/* If the compiler was configured to automatically generate power10 support with
> +   --with-cpu=power10, turn it off.  Otherwise, it will generate VMULLD
> +   instructions.  */
> +#ifdef _ARCH_PWR10
> +#pragma GCC target ("cpu=power9")
> +#endif
> +
>  #include 
> 
>  vector signed long long
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c b/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
> index 496a6e340c0..2ddf03117ab 100644
> --- a/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-eq0-1.c
> @@ -7,4 +7,4 @@ int foo(int x)
>return x == 0;
>  }
> 
> -/* { dg-final { scan-assembler "cntlzw|isel" } } */
> +/* { dg-final { scan-assembler "cntlzw|isel|setbc" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c b/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
> index 63c4b6087df..bf777979833 100644
> --- a/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-ne0-1.c
> @@ -2,6 +2,14 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mno-isel" } */
> 
> +/* If the compiler was configured to automatically generate power10 support with
> +   --with-cpu=power10, turn it off.  Otherwise, it will generate a SETBCR
> +   instruction instead of ADDIC/SUBFE.  */
> +
> +#ifdef _ARCH_PWR10
> +#pragma GCC target ("cpu=power9")
> +#endif
> +
>  /* { dg-final { scan-assembler-times "addic" 4 } } */
>  /*

Re: [PATCH 1/2] Deal with prefixed loads/stores in tests, PR testsuite/100166

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:57 -0400, Michael Meissner wrote:
> [PATCH 1/2] Deal with prefixed loads/stores in tests, PR testsuite/100166
> 

Hi,

> This patch updates the various tests in the testsuite to treat plxv
> and pstxv as being vector loads/stores.  This shows up if you run the
> testsuite with a compiler configured with the option: --with-cpu=power10.
> 
> I have bootstrapped this on LE power9 and BE power8 systems.  There were no
> regressions in the tests.  Can I check this into the trunk?
> 
> I would like to back port these patches to GCC 11 after a cooling off period.
> Is that ok?
> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
>   PR testsuite/100166
>   * gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
>   * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c:
>   * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-char.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-double.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-float.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-int.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c:
>   * gcc.target/powerpc/fold-vec-load-vec_xl-short.c:
>   * gcc.target/powerpc/fold-vec-splat-floatdouble.c:
>   * gcc.target/powerpc/fold-vec-splat-longlong.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c:
>   * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c:
>   * gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-char.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-double.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-float.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-int.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c:
>   * gcc.target/powerpc/fold-vec-store-vec_xst-short.c:
>   * gcc.target/powerpc/lvsl-lvsr.c:
>   * gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c:
>   Update insn counts to account for power10 prefixed loads and
>   stores.
> ---
>  .../vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c   | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c | 2 +-
>  .../powerpc/fold-vec-load-builtin_vec_xl-double.c  | 2 +-
>  .../powerpc/fold-vec-load-builtin_vec_xl-float.c   | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c  | 2 +-
>  .../powerpc/fold-vec-load-builtin_vec_xl-longlong.c| 2 +-
>  .../powerpc/fold-vec-load-builtin_vec_xl-short.c   | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c   | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c| 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c  | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c| 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-char.c | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-double.c   | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-float.c| 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-int.c  | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c | 2 +-
>  .../gcc.target/powerpc/fold-vec-load-vec_xl-short.c| 2 +-
>  .../gcc.target/powerpc/fold-vec-splat-floatdouble.c| 7 ---
>  gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c | 2 +-
>  .../powerpc/fold-vec-store-builtin_vec_xst-char.c  | 2 +-
>  .../powerpc/fold-vec-store-builtin_vec_xst-double.c| 2 +-
>  .../powerpc/fold

Re: [PATCH] Fix vec-splati-runnable.c test.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:49 -0400, Michael Meissner wrote:
> [PATCH] Fix vec-splati-runnable.c test.
> 

hi,


> I noticed that the vec-splati-runnable.c did not have an abort after one
> of the tests.  If the test was run with optimization, the optimizer could
> delete some of the tests and throw off the count.
> 


> I have bootstrapped this on LE power9 and BE power8 systems.  There were no
> regressions in the tests.  Can I check this into the trunk?
> 
> I do not expect to back port this to GCC 11 unless we will be back porting the
> future patches that add support for the XXSPLITW, XXSPLTIDP, and XXSPLTI32DX
> instructions.
> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
>   * gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
>   optimization.  Do not check what XXSPLTIDP generates if the value
>   is undefined.
> ---
>  .../gcc.target/powerpc/vec-splati-runnable.c  | 29 ++-
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> index e84ce77a21d..a135279b1d7 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> @@ -1,7 +1,7 @@
>  /* { dg-do run { target { power10_hw } } } */
>  /* { dg-do link { target { ! power10_hw } } } */
>  /* { dg-require-effective-target power10_ok } */
> -/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps -O2" } */
>  #include 
> 
>  #define DEBUG 0
> @@ -12,6 +12,8 @@
> 
>  extern void abort (void);
> 
> +volatile vector double vresult_d_undefined;
> +
>  int
>  main (int argc, char *argv [])
>  {
> @@ -85,25 +87,12 @@ main (int argc, char *argv [])
>  #endif
>}
> 
> -  /* This test will generate a "note" to the user that the argument
> - is subnormal.  It is not an error, but results are not defined.  */
> -  vresult_d = (vector double) { 2.0, 3.0 };
> -  expected_vresult_d = (vector double) { 6.6E-42f, 6.6E-42f };
> -
> -  vresult_d = vec_splatid (6.6E-42f);
> -
> -  /* Although the instruction says the results are not defined, it does seem
> - to work, at least on Mambo.  But no guarentees!  */
> -  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
> -#if DEBUG
> -printf("ERROR, vec_splati (6.6E-42f)\n");
> -for(i = 0; i < 2; i++)
> -  printf(" vresult_d[%i] = %e, expected_vresult_d[%i] = %e\n",
> -  i, vresult_d[i], i, expected_vresult_d[i]);
> -#else
> -;
> -#endif
> -  }
> +  /* This test will generate a "note" to the user that the argument is
> + subnormal.  It is not an error, but results are not defined.  Because this
> + is undefined, we cannot check that any value is correct.  Just store it in

as in undefined-behavior..?

> + a volatile variable so the XXSPLTIDP instruction gets generated and the
> + warning message printed. */
> +  vresult_d_undefined = vec_splatid (6.6E-42f);


This does not look like it adds an abort() call, as I would have
expected from the patch description.

So this still calls vec_splatid(), but now assigns the result to a
variable named vresult_d_undefined.  It also removes some DEBUG code,
which is fine.  So only the vec_all_eq() check is removed?
I'm not certain I see how that will change the results; is it just the
-O2 optimization that makes the difference?
I may be missing something...
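
One possible reading, going by the new comment in the patch: the store to
a volatile is what keeps the vec_splatid() call from being optimized away
at -O2, even though nothing checks the value.  A plain C sketch of that
effect, with made-up names (sink, compute) rather than the vector types
from the test:

```c
/* Sketch only: 'sink' and 'compute' are hypothetical names.  */
volatile double sink;	/* stores to a volatile object must be kept */

static double
compute (double x)
{
  return x * 2.0;
}

void
unchecked_but_kept (void)
{
  /* The result is never compared, but the store to 'sink' is a side
     effect, so the compiler has to produce the value and store it.  */
  sink = compute (21.0);
}

void
may_be_deleted (void)
{
  /* The result is dead, so an optimizing compiler is free to delete the
     whole computation.  */
  double unused = compute (21.0);
  (void) unused;
}
```

That would explain the volatile variable, but not the missing abort()
mentioned in the description.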


Thanks,
-Will

> 
>/* Vector splat immediate */
>vsrc_a_int = (vector int) { 2, 3, 4, 5 };
> -- 
> 2.31.1
> 



Re: [PATCH 2/2] Fix xxeval predicates.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:47 -0400, Michael Meissner wrote:
> [PATCH 2/2] Fix xxeval predicates.
> 
> In doing the patch to move the XX* built-in functions from altivec.md to
> vsx.md, I noticed that the xxeval built-in function used the
> altivec_register_operand predicate.  Since it takes vsx registers, this
> might force the register allocate to issue a move when it could use a
> traditional floating point register.  This patch fixes that.

allocator ?

> 
> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/vsx.md (xxeval): Use register_predicate instead of
>   altivec_register_predicate.
> ---
>  gcc/config/rs6000/vsx.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index a859038d399..15a8c0e22d8 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6410,9 +6410,9 @@ (define_insn "xxpermx_inst"
>  ;; XXEVAL built-in function support
>  (define_insn "xxeval"
>[(set (match_operand:V2DI 0 "register_operand" "=wa")
> - (unspec:V2DI [(match_operand:V2DI 1 "altivec_register_operand" "wa")
> -   (match_operand:V2DI 2 "altivec_register_operand" "wa")
> -   (match_operand:V2DI 3 "altivec_register_operand" "wa")
> + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa")
> +   (match_operand:V2DI 2 "register_operand" "wa")
> +   (match_operand:V2DI 3 "register_operand" "wa")
> (match_operand:QI 4 "u8bit_cint_operand" "n")]
>UNSPEC_XXEVAL))]
> "TARGET_POWER10"
> -- 


ok
Thanks,
-Will

> 2.31.1
> 



Re: [PATCH 1/2] Move xx* builtins to vsx.md.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:46 -0400, Michael Meissner wrote:
> [PATCH 1/2] Move xx* builtins to vsx.md.
> 

Hi,


> I noticed that the xx built-in functions (xxspltiw, xxspltidp, xxsplti32dx,
> xxeval, xxblend, and xxpermx) were all defined in altivec.md.  However, since
> the XX instructions can take both traditional floating point and Altivec
> registers, these built-in functions should be in vsx.md.
> 
> This patch just moves the insns from altivec.md to vsx.md.
> 
> I also moved the VM3 mode iterator and VM3_char mode attribute from altivec.md
> to vsx.md, since the only use of these were for the XXBLEND insns.
> 
> I have bootstrapped this on LE power9 and BE power8 systems.  There were no
> regressions in the tests.  Can I check this into the trunk?
> 
> I do not expect to back port this to GCC 11 unless we will be back porting the
> future patches that add support for the XXSPLITW, XXSPLTIDP, and XXSPLTI32DX
> instructions.
> 
> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/altivec.md (UNSPEC_XXEVAL): Move to vsx.md.
>   (UNSPEC_XXSPLTIW): Move to vsx.md.
>   (UNSPEC_XXSPLTID): Move to vsx.md.
>   (UNSPEC_XXSPLTI32DX): Move to vsx.md.
>   (UNSPEC_XXBLEND): Move to vsx.md.
>   (UNSPEC_XXPERMX): Move to vsx.md.
>   (VM3): Move to vsx.md.
>   (VM3_char): Move to vsx.md.
>   (xxspltiw_v4si): Move to vsx.md.
>   (xxspltiw_v4sf): Move to vsx.md.
>   (xxspltiw_v4sf_inst): Move to vsx.md.
>   (xxspltidp_v2df): Move to vsx.md.
>   (xxspltidp_v2df_inst): Move to vsx.md.
>   (xxsplti32dx_v4si_inst): Move to vsx.md.
>   (xxsplti32dx_v4sf): Move to vsx.md.
>   (xxsplti32dx_v4sf_inst): Move to vsx.md.
>   (xxblend_): Move to vsx.md.
>   (xxpermx): Move to vsx.md.
>   (xxpermx_inst): Move to vsx.md.
>   * config/rs6000/vsx.md (UNSPEC_XXEVAL): Move from altivec.md.
>   (UNSPEC_XXSPLTIW): Move from altivec.md.
>   (UNSPEC_XXSPLTID): Move from altivec.md.
>   (UNSPEC_XXSPLTI32DX): Move from altivec.md.
>   (UNSPEC_XXBLEND): Move from altivec.md.
>   (UNSPEC_XXPERMX): Move from altivec.md.
>   (VM3): Move from altivec.md.
>   (VM3_char): Move from altivec.md.
>   (xxspltiw_v4si): Move from altivec.md.
>   (xxspltiw_v4sf): Move from altivec.md.
>   (xxspltiw_v4sf_inst): Move from altivec.md.
>   (xxspltidp_v2df): Move from altivec.md.
>   (xxspltidp_v2df_inst): Move from altivec.md.
>   (xxsplti32dx_v4si_inst): Move from altivec.md.
>   (xxsplti32dx_v4sf): Move from altivec.md.
>   (xxsplti32dx_v4sf_inst): Move from altivec.md.
>   (xxblend_): Move from altivec.md.
>   (xxpermx): Move from altivec.md.
>   (xxpermx_inst): Move from altivec.md.
> ---
>  gcc/config/rs6000/altivec.md | 196 -
>  gcc/config/rs6000/vsx.md | 204 +++
>  2 files changed, 204 insertions(+), 196 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 1351dafbc41..8a9f55c561b 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -171,16 +171,10 @@ (define_c_enum "unspec"
> UNSPEC_VPEXTD
> UNSPEC_VCLRLB
> UNSPEC_VCLRRB
> -   UNSPEC_XXEVAL
> UNSPEC_VSTRIR
> UNSPEC_VSTRIL
> UNSPEC_SLDB
> UNSPEC_SRDB
> -   UNSPEC_XXSPLTIW
> -   UNSPEC_XXSPLTID
> -   UNSPEC_XXSPLTI32DX
> -   UNSPEC_XXBLEND
> -   UNSPEC_XXPERMX
>  ])
> 
>  (define_c_enum "unspecv"
> @@ -221,21 +215,6 @@ (define_mode_iterator VM2 [V4SI
>  (KF "FLOAT128_VECTOR_P (KFmode)")
>  (TF "FLOAT128_VECTOR_P (TFmode)")])
> 
> -;; Like VM2, just do char, short, int, long, float and double
> -(define_mode_iterator VM3 [V4SI
> -V8HI
> -V16QI
> -V4SF
> -V2DF
> -V2DI])
> -
> -(define_mode_attr VM3_char [(V2DI "d")
> -(V4SI "w")
> -(V8HI "h")
> -(V16QI "b")
> -(V2DF  "d")
> -(V4SF  "w")])
> -
>  ;; Map the Vector convert single precision to double precision for integer
>  ;; versus floating point
>  (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")])
> @@ -820,169 +799,6 @@ (define_insn "vsdb_"
>"vsdbi %0,%1,%2,%3"
>[(set_attr "type" "vecsimple")])
> 
> -(define_insn "xxspltiw_v4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=wa")
> - (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
> -  UNSPEC_XXSPLTIW))]
> - "TARGET_POWER10"
> - "xxspltiw %x0,%1"
> - [(set_attr "type" "vecsimple")
> -  (set_attr "prefixed" "yes")])
> -
> -(define_expand "xxspltiw_v4sf"
> -  [(set (match_operand:V4SF 0 "register_operand" "=wa")
> - (unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
> -  UNSPEC_XXSPLTIW))]
> - "TARGET_POWER10"

Re: [PATCH] Change rs6000_const_f32_to_i32 return type.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:39 -0400, Michael Meissner wrote:
> [PATCH] Change rs6000_const_f32_to_i32 return type.
> 
> The function rs6000_const_f32_to_i32 called REAL_VALUE_TO_TARGET_SINGLE
> with a long long type and returns it.  This patch changes the type to long
> which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.

ok

That seems consistent with the tm.texi blurb: 
For @code{REAL_VALUE_TO_TARGET_SINGLE} and
@code{REAL_VALUE_TO_TARGET_DECIMAL32}, this variable should be
a simple @code{long int}. 
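
To illustrate why long is enough here: the single-precision bit pattern is
32 bits, and long is at least that wide.  The sketch below is plain C, not
the GCC code; f32_bits_as_long is a made-up name, and it assumes float is
the 32-bit IEEE single format.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: hypothetical stand-in, not rs6000_const_f32_to_i32.
   Returns the IEEE single-precision bit pattern of F in a long.
   Assumes sizeof (float) == 4.  */
long
f32_bits_as_long (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);	/* well-defined way to read the bits */
  return (long) bits;
}
```

For example, 1.0f has the bit pattern 0x3f800000.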

> 
> I have done bootstraps on little endian power9 and big endian power8 systems.
> Can I check this into the trunk?
> 
> This does not need to go into GCC 11, unless some of the other patches that
> use this function are also back ported.
> 
> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
>   return type to long.
>   * config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
>   type to long.
> ---
>  gcc/config/rs6000/rs6000-protos.h | 2 +-
>  gcc/config/rs6000/rs6000.c| 6 --
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index bef727e0a64..c407034d58c 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -282,7 +282,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
>  const char *label);
>  extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
>const char *label);
> -extern long long rs6000_const_f32_to_i32 (rtx operand);
> +extern long rs6000_const_f32_to_i32 (rtx operand);
> 
>  /* Declare functions in rs6000-c.c */

ok

> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 86f53297cb9..ef1ebaaee05 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -27937,10 +27937,12 @@ rs6000_invalid_conversion (const_tree fromtype, const_tree totype)
>return NULL;
>  }
> 
> -long long
> +/* Convert a SFmode constant to the integer bit pattern.  */
> +
> +long
>  rs6000_const_f32_to_i32 (rtx operand)
>  {
> -  long long value;
> +  long value;
>const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);

ok

Thanks
-Will

> 
>gcc_assert (GET_MODE (operand) == SFmode);
> -- 
> 2.31.1
> 



Re: [PATCH] Allow __ibm128 on older PowerPC systems.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:36 -0400, Michael Meissner wrote:
> [PATCH] Allow __ibm128 on older PowerPC systems.
> 

Hi,


> On January 8th, 2018, I added code to ibm-ldouble.c to use the built-in
> function __builtin_pack_ibm128 if long double is IEEE 128-bit and continue to
> use __builtin_pack_longdouble if long double is IBM extended double.  This code
> was needed because __builtin_pack_ibm128 is not available unless the __ibm128
> keyword is availabe.  In the current code, __ibm128 is only enabled if we have
> support for both IBM and IEEE 128-bit long double.

"available."

It may be worth trimming the description to drop the history that is not
directly applicable to what this patch is doing.

> 
> Segher suggested that instead I should make __ibm128, __builtin_pack_ibm128,
> and __builtin_unpack_ibm128 available on older systems that don't support IEEE
> 128-bit floating point but do support the IBM extended double floating point.
> 
> This patch changes the code so that __ibm128 is now exported if either
> long double uses the IBM extended double format, or IEEE 128-bit floating
> point is available.
> 
> I changed the internal built-in types from float128 to ibm128, since the
> only built-in functions that use this are __builtin_pack_ibm128 and
> __builtin_unpack_ibm128, and the new name matches the function.
> 
> In addition, this patch changes the function within libgcc that handles
> IBM long double to use the __builtin_pack_ibm128 function.

ok

> 
> I have done bootstrap builds with this patch on the following 3 systems:
> 1)power9 running LE Linux using --with-cpu=power9
> 2)power8 running BE Linux using --with-cpu=power8, testing both
>   32/64-bit.
> 3)power10 prototype running LE Linux using --with-cpu=power10.
> 
> There were no regressions to the tests, and the new test added passed.  Can I
> check these patches into trunk branch for GCC 12?
> 
> At the moment, I'm not sure this should be backported to GCC 11.  But I can
> easily do the back port after a stabilizing period.
> 
> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/rs6000-builtin.def (BU_IBM128_2): Rename
>   RS6000_BTM_IBM128 from RS6000_BTM_FLOAT128.

>   * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Update
>   error message for __ibm128 built-in functions.
>   (rs6000_init_builtins): Create the __ibm128 keyword on older
>   systems where long double uses the IBM extended double format,
>   even if they don't support IEEE 128-bit floating point.

Could drop 'older', ok.

>   * config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Rename
>   RS6000_BTM_IBM128 from RS6000_BTM_FLOAT128.
>   (rs6000_builtin_mask_names): Rename RS6000_BTM_IBM128 from
>   RS6000_BTM_FLOAT128.
>   * config/rs6000/rs6000.h (TARGET_IBM128): New macro.
>   (RS6000_BTM_IBM128): Rename from RS6000_BTM_FLOAT128.
>   (RS6000_BTM_COMMON): Rename RS6000_BTM_IBM128 from
>   RS6000_BTM_FLOAT128.
ok
> 
> libgcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/ibm-ldouble.c (pack_ldouble): Use
>   __builtin_pack_ibm128 instead of __builtin_pack_longdouble.
> ---
>  gcc/config/rs6000/rs6000-builtin.def |  5 ++---
>  gcc/config/rs6000/rs6000-call.c  | 14 ++
>  gcc/config/rs6000/rs6000.c   |  4 ++--
>  gcc/config/rs6000/rs6000.h   | 12 +---
>  libgcc/config/rs6000/ibm-ldouble.c   |  4 ++--
>  5 files changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
> index 609bebdfd74..6d82ed224fb 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -796,13 +796,12 @@
>| RS6000_BTC_BINARY),  \
>   CODE_FOR_ ## ICODE) /* ICODE */
> 
> -/* 128-bit __ibm128 floating point builtins (use -mfloat128 to indicate that
> -   __ibm128 is available).  */
> +/* 128-bit __ibm128 floating point builtins.  */
>  #define BU_IBM128_2(ENUM, NAME, ATTR, ICODE) \
>RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,   /* ENUM */  \
>   "__builtin_" NAME,  /* NAME */  \
>   (RS6000_BTM_HARD_FLOAT  /* MASK */  \
> -  | RS6000_BTM_FLOAT128),\
> +  | RS6000_BTM_IBM128),  \
>   (RS6000_BTC_ ## ATTR/* ATTR */  \
>| RS6000_BTC_BINARY),  \
>   CODE_FOR_ ## ICODE) /* ICODE */
ok


> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index c4332a61862..7bdc4eeca5f 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -11540,8 +11540,8 @@ rs6000_invalid_buil

Re: [PATCH] Fix long double tests when default long double is not IBM.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:32 -0400, Michael Meissner wrote:
> [PATCH] Fix long double tests when default long double is not IBM.
> 

Hi,


> This patch adds 3 more selections to target-supports.exp to see if we can
> force the compiler to use a particular long double format (IEEE 128-bit, IBM
> extended double, 64-bit), and the library support will track the changes for
> the long double.  This is needed because two of the tests in the test suite
> use long double, and they are actually testing IBM extended double.
> 
> This patch also forces the two tests that explicitly require long double
> to use the IBM double-double encoding to explicitly run the test.  This
> requires GLIBC 2.32 or greater in order to do the switch.
> 
> I have run tests on a little endian power9 system with 3 compilers.  There
> were no regressions with these patches, and the two tests in the following
> patches now work if the default long double is not IBM 128-bit:
> 
> * One compiler used the default IBM 128-bit format;
> * One compiler used the IEEE 128-bit format; (and)
> * One compiler used 64-bit long doubles.
> 
> I have also tested compilers on a big endian power8 system with a compiler
> defaulting to power8 code generation and another with the default cpu
> set.  There were no regressions.
> 
> Can I check this patch into the master branch?
> 
> I have done bootstrap builds with this patch on the following 4 systems:
> 1)power9 running LE Linux using --with-cpu=power9 with long double == IBM
> 2)power9 running LE Linux using --with-cpu=power9 with long double == IEEE
> 3)power8 running BE Linux using --with-cpu=power8, testing both
>   32/64-bit.
> 4)power10 prototype running LE Linux using --with-cpu=power10.
> 
> There were no regressions to the tests, and the two test cases that previously
> failed when I ran the compiler defaulting to long double using IEEE 128-bit
> now passed.  Can I check these patches into trunk branch for GCC 12?
> 
> I would like to check these patches into GCC 11 after a cooling off period,
> but I can also not do the backport if desired.
> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
>   PR target/70117
>   * gcc.target/powerpc/pr70117.c: Force the long double type to use
>   the IBM 128-bit format.
>   * c-c++-common/dfp/convert-bfp-11.c: Force using IBM 128-bit long
>   double.  Remove check for 64-bit long double.
>   * lib/target-supports.exp
>   (add_options_for_ppc_long_double_override_ibm128): New function.
>   (check_effective_target_ppc_long_double_override_ibm128): New
>   function.
>   (add_options_for_ppc_long_double_override_ieee128): New function.
>   (check_effective_target_ppc_long_double_override_ieee128): New
>   function.
>   (add_options_for_ppc_long_double_override_64bit): New function.
>   (check_effective_target_ppc_long_double_override_64bit): New
>   function.

ok.

> ---
>  .../c-c++-common/dfp/convert-bfp-11.c |  18 +--
>  gcc/testsuite/gcc.target/powerpc/pr70117.c|   6 +-
>  gcc/testsuite/lib/target-supports.exp | 107 ++
>  3 files changed, 121 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> index 95c433d2c24..35da07d1fa4 100644
> --- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> +++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> @@ -1,9 +1,14 @@
> -/* { dg-skip-if "" { ! "powerpc*-*-linux*" } } */
> +/* { dg-require-effective-target dfp } */
> +/* { dg-require-effective-target ppc_long_double_override_ibm128 } */
> +/* { dg-add-options ppc_long_double_override_ibm128 } */
> 
> -/* Test decimal float conversions to and from IBM 128-bit long double. 
> -   Checks are skipped at runtime if long double is not 128 bits.
> -   Don't force 128-bit long doubles because runtime support depends
> -   on glibc.  */
> +/* We force the long double type to be IBM 128-bit because the CONVERT_TO_PINF
> +   tests will fail if we use IEEE 128-bit floating point.  This is due to IEEE
> +   128-bit having a larger exponent range than IBM 128-bit extended double.  So
> +   tests that would generate an infinity with IBM 128-bit will generate a
> +   normal number with IEEE 128-bit.  */

ok
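
A rough way to see the exponent range point, as a sketch: IBM extended
double is a pair of doubles, so its range is essentially that of double,
while IEEE binary128 has a 15-bit exponent.  The code below only models
the IBM side with a plain double.  The macro IEEE128_MAX_EXP and the
function name are made up for illustration; 16384 is the binary128 maximum
exponent from IEEE 754, not something taken from the patch.

```c
#include <float.h>

/* Maximum binary exponent of IEEE binary128 (assumption from IEEE 754;
   compare with DBL_MAX_EXP, which is normally 1024).  */
#define IEEE128_MAX_EXP 16384

/* Sketch only: returns nonzero if scaling X by 10 overflows the double
   exponent range, which models the high double of an IBM long double
   overflowing to infinity.  */
int
overflows_double_range (double x)
{
  volatile double scaled = x * 10.0;	/* volatile: force rounding to double */
  return scaled > DBL_MAX;		/* true only for +infinity */
}
```

A value around 1.6e+308 times 10 is out of range for double, but it is far
below the binary128 limit, so with IEEE 128-bit the same operation gives a
normal number.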

> +
> +/* Test decimal float conversions to and from IBM 128-bit long double.   */
> 
>  #include "convert.h"
> 
> @@ -36,9 +41,6 @@ CONVERT_TO_PINF (312, tf, sd, 1.6e+308L, d32)
>  int
>  main ()
>  {
> -  if (sizeof (long double) != 16)
> -return 0;
> -
>convert_101 ();
>convert_102 ();
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70117.c b/gcc/testsuite/gcc.target/powerpc/pr70117.c
> index 3bbd2c595e0..8a5fad1dee0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
> @@ -1,5 +1,7 @@
> -/* { dg-do run { target {

Re: [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:28 -0400, Michael Meissner wrote:
> [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.
> 

Hi,


> This patch adds the support for power10 IEEE 128-bit floating point
> conditional move and for automatically generating min/max.
> 
> In this patch, I simplified things compared to previous patches.  Instead of
> allowing any four of the modes to be used for the conditional move comparison
> and the move itself could use different modes, I restricted the conditional
> move to just the same mode.  I.e. you can do:

ok.

> 
> _Float128 a, b, c, d, e, r;
> 
> r = (a == b) ? c : d;
> 
> But you can't do:
> 
> _Float128 c, d, r;
> double a, b;
> 
> r = (a == b) ? c : d;
> 
> or:
> 
> _Float128 a, b;
> double c, d, r;
> 
> r = (a == b) ? c : d;
> 
> This eliminates a lot of the complexity of the code, because you don't have to
> worry about the sizes being different, and the IEEE 128-bit types being
> restricted to Altivec registers, while the SF/DF modes can use any VSX
> register.
> 
> I did not modify the existing support that allowed conditional moves where
> SFmode operands are compared and DFmode operands are moved (and vice versa).
> 
> I modified the test cases that I added to reflect this change.  I have also
> fixed the test for not equal to use '!=' instead of '=='.
> 
> I have done bootstrap builds with this patch on the following 3 systems:
> 1)power9 running LE Linux using --with-cpu=power9
> 2)power8 running BE Linux using --with-cpu=power8, testing both
>   32/64-bit.
> 3)power10 prototype running LE Linux using --with-cpu=power10.
> 
> There were no regressions to the tests, and the new test added passed.  Can I
> check these patches into trunk branch for GCC 12?
> 
> I would like to check these patches into GCC 11 after a cooling off period,
> but I can also not do the backport if desired.
> 
> gcc/
> 2021-05-18 Michael Meissner  
> 
> * config/rs6000/rs6000.c (rs6000_maybe_emit_fp_cmove): Add IEEE
>   128-bit floating point conditional move support.
>   (have_compare_and_set_mask): Add IEEE 128-bit floating point
>   types.
>   * config/rs6000/rs6000.md (movcc, IEEE128 iterator): New insn.
>   (movcc_p10, IEEE128 iterator): New insn.
>   (movcc_invert_p10, IEEE128 iterator): New insn.
>   (fpmask, IEEE128 iterator): New insn.
>   (xxsel, IEEE128 iterator): New insn.
> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
> * gcc.target/powerpc/float128-cmove.c: New test.
> * gcc.target/powerpc/float128-minmax-3.c: New test.

ok


> ---
>  gcc/config/rs6000/rs6000.c|  38 ++-
>  gcc/config/rs6000/rs6000.md   | 106 ++
>  .../gcc.target/powerpc/float128-cmove.c   |  58 ++
>  .../gcc.target/powerpc/float128-minmax-3.c|  15 +++
>  4 files changed, 215 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index fdaf12aeda0..ef1ebaaee05 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15706,8 +15706,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
>return 1;
>  }
> 
> -/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
> -   minimum with "C" semantics.
> +/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
> +   maximum or minimum with "C" semantics.
> 
> Unless you use -ffast-math, you can't use these instructions to replace
> conditions that implicitly reverse the condition because the comparison
> @@ -15783,6 +15783,7 @@ rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>enum rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
>rtx op1 = XEXP (op, 1);
> +  machine_mode compare_mode = GET_MODE (op0);
>machine_mode result_mode = GET_MODE (dest);
>rtx compare_rtx;
>rtx cmove_rtx;
> @@ -15791,6 +15792,35 @@ rs6000_maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
>if (!can_create_pseudo_p ())
>  return 0;
> 
> +  /* We allow the comparison to be either SFmode/DFmode and the true/false
> + condition to be either SFmode/DFmode.  I.e. we allow:
> +
> + float a, b;
> + double c, d, r;
> +
> + r = (a == b) ? c : d;
> +
> +and:
> +
> + double a, b;
> + float c, d, r;
> +
> + r = (a == b) ? c : d;


This new comment does not seem to align with the comments in the
description, which state "But you can't do ...".
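
To keep the cases straight, here is a minimal sketch of the two
SFmode/DFmode mixes that the new comment lists.  It is plain C, the
function names are mine (hypothetical, not from the patch), and it only
shows the source-level shape; whether a given mix actually becomes a
conditional move is up to the backend.

```c
/* Compare in float (SFmode), select between doubles (DFmode).
   Hypothetical helper name, for illustration only.  */
static double
sel_sf_cmp_df_move (float a, float b, double c, double d)
{
  return (a == b) ? c : d;
}

/* Compare in double (DFmode), select between floats (SFmode).
   Hypothetical helper name, for illustration only.  */
static float
sel_df_cmp_sf_move (double a, double b, float c, float d)
{
  return (a == b) ? c : d;
}
```

The _Float128 mixes from the description have the same shape, just with
one of the two types replaced by _Float128.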


> +
> +but we don't allow intermixing the IEEE 128-bit floating point types with
> +the 32/64-bit scalar types.
> +
> +It gets too messy where SFmode/DFmode can use any register and 
> TFm

Re: [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.

2021-05-20 Thread will schmidt via Gcc-patches
On Tue, 2021-05-18 at 16:26 -0400, Michael Meissner wrote:
> [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.
> 

Hi,


> This patch adds the support for the IEEE 128-bit floating point C minimum and
> maximum instructions.  The next patch will add the support for using the
> compare and set mask instruction to implement conditional moves.
> 
> This patch does not try to re-use the code used for SF/DF min/max
> support.  It defines a separate insn for the IEEE 128-bit support.  It
> uses the code iterator  to simplify adding both operations.
> 
> GCC will not convert ?: operations into using min/max instructions provided in

I'd throw the ternary term in there, easier to search for later. 
s/?: operations/ternary (?:) operations /
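
As a small illustration of one way NaNs get in the way, the sketch
below shows that a ternary minimum depends on operand order when one
input is a NaN.  It uses double rather than _Float128 purely for
portability, and the helper names are hypothetical.

```c
#include <math.h>   /* NAN */

/* Ternary minimum: any comparison involving a NaN is false, so the
   second operand is returned whenever either input is a NaN.  */
static double
min_ternary (double a, double b)
{
  return (a < b) ? a : b;
}

/* A NaN is the only value that compares unequal to itself.  */
static int
is_nan (double x)
{
  return x != x;
}
```

With these, min_ternary (NAN, 1.0) gives 1.0 while min_ternary (1.0, NAN)
gives a NaN, so swapping the operands is not a safe rewrite unless the
user has opted in with -ffast-math or similar.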

> this patch unless the user uses -Ofast or similar switches due to issues with
> NaNs.  The next patch that adds conditional move instructions will enable the
> ?: conversion in many cases.
> 
> I have done bootstrap builds with this patch on the following 3 systems:
> 1)power9 running LE Linux using --with-cpu=power9
> 2)power8 running BE Linux using --with-cpu=power8, testing both
>   32/64-bit.
> 3)power10 prototype running LE Linux using --with-cpu=power10.
> 
> There were no regressions to the tests, and the new test added passed.  Can I
> check these patches into trunk branch for GCC 12?
> 
> I would like to check these patches into GCC 11 after a cooling off period,
> but I can also not do the backport if desired.
> 
> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA
>   3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp
>   instructions.
>   * config/rs6000/rs6000.md (s<minmax><mode>3, IEEE128 iterator):
>   New insns.

ok

> 
> gcc/testsuite/
> 2021-05-18  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-minmax-2.c: New test.
>   * gcc.target/powerpc/float128-minmax.c: Turn off power10 code
>   generation.

So, presumably the float128-minmax-2.c test adds/replaces the power10
code gen tests that were removed or disabled from float128-minmax.c. 



> ---
>  gcc/config/rs6000/rs6000.c|  3 ++-
>  gcc/config/rs6000/rs6000.md   | 11 +++
>  .../gcc.target/powerpc/float128-minmax-2.c| 15 +++
>  .../gcc.target/powerpc/float128-minmax.c  |  7 +++
>  4 files changed, 35 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 0d05956..fdaf12aeda0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16111,7 +16111,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1)
>/* VSX/altivec have direct min/max insns.  */
>if ((code == SMAX || code == SMIN)
>&& (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
> -   || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))))
> +   || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
> +   || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))))
>  {
>emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
>return;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 0bfeb24d9e8..3a1bc1f8547 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -5196,6 +5196,17 @@ (define_insn "*s<minmax><mode>3_vsx"
>  }
>[(set_attr "type" "fp")])
> 
> +;; Min/max for ISA 3.1 IEEE 128-bit floating point
> +(define_insn "s<minmax><mode>3"
> +  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
> + (fp_minmax:IEEE128
> +  (match_operand:IEEE128 1 "altivec_register_operand" "v")
> +  (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "xs<minmax>cqp %0,%1,%2"
> +  [(set_attr "type" "vecfloat")
> +   (set_attr "size" "128")])
> +
>  ;; The conditional move instructions allow us to perform max and min operations
>  ;; even when we don't have the appropriate max/min instruction using the FSEL
>  ;; instruction.

ok


> diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> new file mode 100644
> index 000..c71ba08c9f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
> +
> +#ifndef TYPE
> +#define TYPE _Float128
> +#endif
> +
> +/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
> +   call.  */
> +TYPE f128_min (TYPE a, TYPE b) { return __builtin_fminf128 (a, b); }
> +TYPE f128_max (TYPE a, TYPE b) { return __builtin_fmaxf128 (a, b); }
> +
> +/* { dg-final { scan-assembler {\mxsmaxcqp\M} }

Re: PowerPC64 ELFv1 -fpatchable-function-entry

2021-05-07 Thread will schmidt via Gcc-patches
On Fri, 2021-05-07 at 12:19 +0930, Alan Modra via Gcc-patches wrote:
> On PowerPC64 ELFv1 function symbols are defined on function
> descriptors in an .opd section rather than in the function code.
> .opd is not split up by the PowerPC64 backend for comdat groups or
> other situations where per-function sections are required.  Thus
> SECTION_LINK_ORDER can't use the function name to reference a
> suitable
> section for ordering:  The .opd section might contain many other
> function descriptors and they may be in a different order to the
> final
> function code placement.  This patch arranges to use a code label
> instead of the function name symbol.
> 
> I chose to emit the label inside default_elf_asm_named_section,
> immediately before the .section directive using the label, and in
> case
> someone uses .previous or the like, need to save and restore the
> current section when switching to the function code section to emit
> the label.  That requires a tweak to switch_to_section in order to
> get
> the current section.  I checked all the TARGET_ASM_NAMED_SECTION
> functions and unnamed.callback functions and it appears none will be
> affected by that tweak.


Hi,

good description.  thanks :-)


> 
>   PR target/98125
>   * varasm.c (default_elf_asm_named_section): Use a function
>   code label rather than the function symbol as the "o" argument.
>   (switch_to_section): Don't set in_section until section
>   directive has been emitted.
> 
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 97c1e6fff25..5f95f8cfa75 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -6866,6 +6866,26 @@ default_elf_asm_named_section (const char
> *name, unsigned int flags,
>*f = '\0';
>  }
> 
> +  char func_label[256];
> +  if (flags & SECTION_LINK_ORDER)
> +{
> +  static int recur;
> +  if (recur)
> + gcc_unreachable ();

Interesting..   Is there any anticipation of re-entry or parallel runs
through this function that requires the recur lock/protection?
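
For what it's worth, the pattern as I read it is a plain re-entry trap
rather than a lock.  A minimal sketch, with hypothetical names that are
not GCC code and abort () standing in for gcc_unreachable ():

```c
#include <stdlib.h>

/* Counts completed passes through the guarded region.  */
static int labels_emitted;

/* Hypothetical stand-in for the guarded block.  NESTED models whatever
   work is done inside the guard; if it calls back into this function
   while recur is nonzero, we abort instead of recursing further.  */
static void
emit_label_guarded (void (*nested) (void))
{
  static int recur;
  if (recur)
    abort ();
  ++recur;
  if (nested)
    nested ();
  ++labels_emitted;
  --recur;
}
```

Sequential calls pass straight through because recur is back to zero on
exit; only a nested call made while the guard is held would trip it.
My guess (an assumption on my part) is that the nested
switch_to_section call is the path being guarded against.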


> +  else
> + {
> +   ++recur;
> +   section *save_section = in_section;
> +   static int func_code_labelno;
> +   switch_to_section (function_section (decl));
> +   ++func_code_labelno;
> +   ASM_GENERATE_INTERNAL_LABEL (func_label, "LPFC",
> func_code_labelno);
> +   ASM_OUTPUT_LABEL (asm_out_file, func_label);
> +   switch_to_section (save_section);
> +   --recur;
> + }
> +}


ok

> +
>fprintf (asm_out_file, "\t.section\t%s,\"%s\"", name, flagchars);
> 
>/* default_section_type_flags (above) knows which flags need
> special
> @@ -6893,11 +6913,8 @@ default_elf_asm_named_section (const char
> *name, unsigned int flags,
>   fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>if (flags & SECTION_LINK_ORDER)
>   {
> -   tree id = DECL_ASSEMBLER_NAME (decl);
> -   ultimate_transparent_alias_target (&id);
> -   const char *name = IDENTIFIER_POINTER (id);
> -   name = targetm.strip_name_encoding (name);
> -   fprintf (asm_out_file, ",%s", name);
> +   fputc (',', asm_out_file);
> +   assemble_name_raw (asm_out_file, func_label);


ok as far as I can tell :-)  assemble_name_raw is an if/else that
outputs 'name' or a LABELREF based on the file & name.  It's not an
obvious analog to the ultimate_transparent_alias_target() and name
processing that is being replaced, but seems to fit the changes as
described.


>   }
>if (HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
>   {
> @@ -7821,11 +7838,6 @@ switch_to_section (section *new_section, tree
> decl)
>else if (in_section == new_section)
>  return;
> 
> -  if (new_section->common.flags & SECTION_FORGET)
> -in_section = NULL;
> -  else
> -in_section = new_section;
> -
>switch (SECTION_STYLE (new_section))
>  {
>  case SECTION_NAMED:
> @@ -7843,6 +7855,11 @@ switch_to_section (section *new_section, tree
> decl)
>break;
>  }
> 
> +  if (new_section->common.flags & SECTION_FORGET)
> +in_section = NULL;
> +  else
> +in_section = new_section;
> +
>new_section->common.flags |= SECTION_DECLARED;


OK. 
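
A minimal sketch of why the ordering matters, as I understand it: with
the assignment moved after the switch, code that runs while the section
directive is being emitted still sees the previous section and can save
and restore it.  All names below are hypothetical stand-ins, not the
real varasm.c types.

```c
#include <stddef.h>

/* Hypothetical stand-in for GCC's section type.  */
struct section_sketch { const char *name; };

/* Stand-in for in_section: the section we are currently emitting into.  */
static struct section_sketch *in_section_sketch;

/* What the emitting callback observed as the current section.  */
static struct section_sketch *seen_by_callback;

/* Stand-in for the named-section callback that prints the directive.  */
static void
named_callback (void)
{
  seen_by_callback = in_section_sketch;
}

static void
switch_to_section_sketch (struct section_sketch *new_section)
{
  if (in_section_sketch == new_section)
    return;
  named_callback ();                 /* still sees the old section */
  in_section_sketch = new_section;   /* updated only afterwards */
}
```
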
lgtm, thx
-Will

>  }
> 


