Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-18 Thread will schmidt via Gcc-patches
On Mon, 2022-10-17 at 13:08 -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote:
> >   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> > and can be disabled by dependent options when it should not be.
> > This manifests in the issue seen in PR101865 where -mno-vsx
> > mistakenly disables _ARCH_PWR8.
> > This change replaces the relevant TARGET_DIRECT_MOVE references
> > with a TARGET_POWER8 entry so that the direct_move and power8
> > features can be enabled or disabled independently.
> 
> We should get rid of TARGET_DIRECT_MOVE altogether.  Please see
> 57f108f5a1e1:
> rs6000: Disable -m[no-]direct-move (PR85293)
> 
> The -mno-direct-move option causes a lot of problems, since it forces
> us to be able to generate code for p8 and up with some crucial
> instructions missing.  This patch removes the -m[no-]direct-move
> options so that the user cannot put us into this unexpected situation
> anymore.  Internally we still have all the same flags, and they are
> automatically set based on -mcpu; getting rid of that is a lot more
> work and will have to wait for GCC 9 (in some places the flag is used
> to see if we are compiling for a p8 _at all_).
> 
> It did not happen in GCC 9 obviously.  Do you want to take a shot?  It
> doesn't have to be all at once, it's probably best if not even -- as I
> wrote in the commit message, the flag always was used to mean different
> things.

As long as it's OK to remove it, I'll certainly take a shot at it.
With that in mind, that may simplify things for me here.  I expect that
anything currently guarded by DIRECT_MOVE should instead be guarded by
POWER8.


> 
> > The existing (and rather lengthy) commentary for DIRECT_MOVE
> > remains
> > in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> > if-defined logic there will now set a __DIRECT_MOVE__ define when
> > TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> > purposes, but is otherwise unused.  This can be removed in a
> > subsequent patch, or in an update of this patch, depending on
> > feedback.
> 
> There should be no such macro, for the same reason there should be no
> -mdirect-move option: it is so very essential to all code we generate,
> it *always* is enabled if we have P8 or later.

fair enough.

> 
> > gcc/
> > PR Target/101865
> > * config/rs6000/rs6000-builtin.cc
> > (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
> > usage with TARGET_POWER8.
> 
> Please don't arbitrarily wrap lines.  It is harder to read, and it looks
> like something is missing.

> 
> > * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
> > Add OPTION_MASK_POWER8 entry.
> 
> Especially in cases like this, where it looks like you forgot to write
> something after the colon.
> 
> > @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
> >    { "block-ops-vector-pair", OPTION_MASK_BLOCK_OPS_VECTOR_PAIR, false, true  },
> >    { "cmpb", OPTION_MASK_CMPB, false, true  },
> >    { "crypto", OPTION_MASK_CRYPTO, false, true  },
> >    { "direct-move", OPTION_MASK_DIRECT_MOVE, false, true  },
> > +  { "power8", OPTION_MASK_POWER8, false, true  },
> 
> Why would we want a #pragma power8 ?

Hmm, thinko on my part; I'll reevaluate.


> 
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -490,10 +490,15 @@ mcrypto
> >  Target Mask(CRYPTO) Var(rs6000_isa_flags)
> >  Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2
> > instructions.
> >  
> >  mdirect-move
> >  Target Undocumented Mask(DIRECT_MOVE) Var(rs6000_isa_flags)
> > WarnRemoved
> > +Enable direct move (ISA 2.07).
> 
> It is undocumented and should remain that, except eventually we should
> remove it completely (but leave some stubs so that code in the wild
> keeps compiling).
> 
> > +mpower8
> > +Target Mask(POWER8) Var(rs6000_isa_flags)
> > +Use instructions added in ISA 2.07 (power8).
> 
> There should not be such an option.  It is set by -mcpu=power8 and
> later, but can never be enabled or disabled directly by the user.

OK.


Thanks for the detailed review.  :-)
-Will


> 
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -3407,11 +3407,11 @@ (define_insn "vsx_extract_"
> >if (element == VECTOR_ELEMENT_SCALAR_64BIT)
> >  {
> >if (op0_regno == op1_regno)
> > return ASM_COMMENT_START " vec_extract to same register";
> >  
> > -  else if (INT_REGNO_P (op0_regno) && TARGET_DIRECT_MOVE
> > +  else if (INT_REGNO_P (op0_regno) && TARGET_POWER8
> >&& TARGET_POWERPC64)
> 
> That fits on one line now.
> 
> Thanks,
> 
> 
> Segher



Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread will schmidt via Gcc-patches
On Mon, 2022-10-17 at 10:32 -0500, Segher Boessenkool wrote:
> Hi!
> 
> Everything Ke Wen said.  Some more comments / hints:

Thanks for the reviews. :-)

I'll rework things and repost 'soon'.

Thanks
-Will



Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-13 Thread will schmidt via Gcc-patches


Ping.

On Mon, 2022-09-19 at 11:13 -0500, will schmidt wrote:
> [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]
> 
> Hi,
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.
> 
> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.
> 
> This is done via the OPTION_MASK definitions, so this
> means that some references to the OPTION_MASK_DIRECT_MOVE
> option are now replaced with OPTION_MASK_POWER8.
> 
> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> purposes, but is otherwise unused.  This can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.
> 
> This regtests cleanly (power8,power9,power10), and resolves
> PR 101865 as represented in the tests from (1/2).
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
>   Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
>   conditional with OPTION_MASK_POWER8.
>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.
>   (POWERPC_MASKS): Same.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
>   (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
>   * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
>   * config/rs6000/vsx.md (vsx_extract_): Replace
>   TARGET_DIRECT_MOVE usage with TARGET_POWER8.
>   (define_peephole2): Same.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 3ce729c1e6de..91a0f39bd796 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
> fncode)
>  case ENB_P7:
>return TARGET_POPCNTD;
>  case ENB_P7_64:
>return TARGET_POPCNTD && TARGET_POWERPC64;
>  case ENB_P8:
> -  return TARGET_DIRECT_MOVE;
> +  return TARGET_POWER8;
>  case ENB_P8V:
>return TARGET_P8_VECTOR;
>  case ENB_P9:
>return TARGET_MODULO;
>  case ENB_P9_64:
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ca9cc42028f7..41d51b039061 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, 
> HOST_WIDE_INT flags)
>   turned off in any of the following conditions:
>   1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
>   disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
>   enabled.
>   2. TARGET_VSX is off.  */
> -  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +  if ((OPTION_MASK_DIRECT_MOVE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
> +  if ((flags & OPTION_MASK_POWER8) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
>if ((flags & OPTION_MASK_MODULO) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index c3825bcccd84..c873f6d58989 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -48,10 +48,11 @@
> system.  */
>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER   \
>| OPTION_MASK_P8_VECTOR\
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DIRECT_MOVE  \
> +  | OPTION_MASK_POWER8   \
>| OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
>| OPTION_MASK_QUAD_MEMORY  \
>| OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
>  /* ISA masks setting fusion options.  */
> @@ -124,10 +125,11 @@
>  #define POWERPC_MASKS(OPTION_MASK_ALTIVEC
> \
>| OPTION_MASK_CMPB \
>| 

[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693)

2022-10-06 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Fix addg6s builtin with long long parameters. (PR100693)

Hi,
  As reported in PR 100693, attempts to use __builtin_addg6s
with long long arguments result in truncated results.

Since the int and long long types can be coerced into each other,
(documented further near the rs6000-c.cc change), this is handled
by adding a builtin overload (ADDG6S_OV), and the addition of some
special handling in altivec_resolve_overloaded_builtin() to map
the calls to addg6s_32 or addg6s_64; similar to how the SCAL_CMPB
builtins are currently handled.
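
The dispatch rule described above can be sketched in plain C.  This is
only an illustration of the idea, not the code in rs6000-c.cc: the
names sketch_pick_addg6s, sketch_variant and sketch_through_32bit_param
are hypothetical, and the precisions are passed in as plain integers
rather than taken from machine modes.  The second helper shows the
truncation that happens when a 64-bit value is passed through a 32-bit
parameter, which is the symptom reported in the PR.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two built-in instances.  */
enum sketch_variant
{
  SKETCH_ADDG6S_32,
  SKETCH_ADDG6S_64
};

/* Pick the 64-bit instance if either argument is wider than 32 bits,
   otherwise the 32-bit instance.  PREC1 and PREC2 are the argument
   precisions in bits.  */
static enum sketch_variant
sketch_pick_addg6s (unsigned int prec1, unsigned int prec2)
{
  return (prec1 > 32 || prec2 > 32) ? SKETCH_ADDG6S_64 : SKETCH_ADDG6S_32;
}

/* A 32-bit parameter silently drops the upper half of a wider
   argument; this is why always resolving to the 32-bit form gives
   truncated results.  */
static uint64_t
sketch_through_32bit_param (uint32_t x)
{
  return x;
}
```

With this rule, a call mixing an unsigned int and an unsigned long long
argument resolves to the 64-bit form, so no bits are lost.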

This has sniff-tested cleanly.

I'm seeing a regression failure show up in
testsuite/g++.dg/modules/adl-3*.c; which seems entirely unrelated
to the content in this change.  I'm poking at that a bit more to
see if I can tell the what/why for that.

OK for trunk?

Thanks,
-Will

gcc/
PR target/100693

* config/rs6000/rs6000-builtins.def ([POWER7]): Replace bif-name
__builtin_addg6s with bif-name __builtin_addg6s_32.
([POWER7-64]): New bif-name __builtin_addg6s_64.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add handler mapping RS6000_OVLD_ADDG6S_OV to RS6000_BIF_ADDG6S
and RS6000_BIF_ADDG6S_32.
* config/rs6000/rs6000-overload.def (ADDG6S_OV): Add overloaded
entry __builtin_addg6s mapped to ADDG6S_32 and ADDG6S.
* config/rs6000/rs6000.md ("addg6s", UNSPEC_ADDG6S): Replace with
("addg6s3") and rework.
* doc/extend.texi (__builtin_addg6s): Add documentation for
__builtin_addg6s with unsigned long long parameters.

gcc/testsuite/
* testsuite/gcc.target/powerpc/pr100693-compile.c: New.
* testsuite/gcc.target/powerpc/pr100693-run.c: New.

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d73..11050e4c26d5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2010,12 +2010,13 @@
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-ADDG6S addg6s {}
+
+  const unsigned int __builtin_addg6s_32 (unsigned int, unsigned int);
+ADDG6S_32 addg6ssi3 {}
 
   const signed long __builtin_bpermd (signed long, signed long);
 BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);
@@ -2041,10 +2042,14 @@
 UNPACK_V1TI unpackv1ti {}
 
 
 ; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
 [power7-64]
+
+  const unsigned long __builtin_addg6s_64 (unsigned long, unsigned long);
+ADDG6S addg6sdi3 {no32bit}
+
   const signed long long __builtin_divde (signed long long, signed long long);
 DIVDE dive_di {}
 
   const unsigned long long __builtin_divdeu (unsigned long long, \
  unsigned long long);
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 566094626293..28e8b6761ce5 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1919,10 +1919,35 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   instance_code, fcode, types, args);
if (call != error_mark_node)
  return call;
break;
   }
+  /* We need to special case __builtin_addg6s because the overloaded
+forms of this function take (unsigned int, unsigned int) or
+(unsigned long long, unsigned long long).  Since C conventions
+allow the respective argument types to be implicitly coerced into
+each other, the default handling does not provide adequate
+discrimination between the desired forms of the function.  */
+case RS6000_OVLD_ADDG6S_OV:
+  {
+   machine_mode arg1_mode = TYPE_MODE (types[0]);
+   machine_mode arg2_mode = TYPE_MODE (types[1]);
+
+   /* If any supplied arguments are wider than 32 bits, resolve to
+  64-bit variant of built-in function.  */
+   if (GET_MODE_PRECISION (arg1_mode) > 32
+   || GET_MODE_PRECISION (arg2_mode) > 32)
+ instance_code = RS6000_BIF_ADDG6S;
+   else
+ instance_code = RS6000_BIF_ADDG6S_32;
+
+   tree call = find_instance (&unsupported_builtin, &instance,
+  instance_code, fcode, types, args);
+   if (call != error_mark_node)
+ return call;
+   break;
+  }
 case RS6000_OVLD_VEC_VSIE:
   {
machine_mode arg1_mode = TYPE_MODE (types[0]);
 
/* If supplied first argument is wider than 64 bits, resolve to
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..41b74c0c1500 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -193,10 +193,16 @@
   unsigned int __builtin_cmpb (unsigned int, unsigned int);
 CMPB_32
   unsigned long long __builtin_cmpb (unsigned 

Re: [PATCH] fixincludes: Deal also with the _Float128x cases [PR107059]

2022-10-04 Thread will schmidt via Gcc-patches
On Fri, 2022-09-30 at 09:20 +0200, Jakub Jelinek via Gcc-patches wrote:
> On Wed, Sep 28, 2022 at 08:19:43PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Another case are the following 3 snippets:
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128X supported but no constant suffix"
> > #  else
> > #   define __f128x(x) x##f128x
> > #  endif
> > ...
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128X supported but no complex type"
> > #  else
> > #   define __CFLOAT128X _Complex _Float128x
> > #  endif
> > ...
> > #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> > #   error "_Float128x supported but no type"
> > #  endif
> > but as no target has _Float128x right now and don't see it
> > coming soon, it isn't a big deal (on the glibc side it is of
> > course ok to adjust those).
> 
> This incremental patch handles the above 3 cases, so we
> fixinclude what glibc itself changed too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (together with the
> previously posted fixincludes/ change too), ok for trunk?

Hi,

The combination of these two patches allows me to build gcc
successfully.  (PPC64LE with RHEL9).

One nit: Part 1 needed its file paths massaged (i.e. gcc/inclhack.def
versus fixincludes/inclhack.def) before it would apply.

I can't otherwise speak to the changes, other than to say that they
seem to work for me.

Thanks
-Will



> 
> 2022-09-30  Jakub Jelinek  
> 
>   PR bootstrap/107059
>   * inclhack.def (glibc_cxx_floatn_5): New.
>   * fixincl.x: Regenerated.
>   * tests/base/bits/floatn.h: Regenerated.
> 
> --- fixincludes/inclhack.def.jj   2022-09-29 22:18:47.974402688 +0200
> +++ fixincludes/inclhack.def  2022-09-29 22:22:48.151145670 +0200
> @@ -2131,6 +2131,23 @@ fix = {
>   EOT;
>  };
> 
> +fix = {
> +hackname  = glibc_cxx_floatn_5;
> +files = bits/floatn.h, bits/floatn-common.h,
> "*/bits/floatn.h", "*/bits/floatn-common.h";
> +select= "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\|
> )defined __cplusplus\n"
> + "([ \t]*#[ \t]+error \"_Float128[xX] supported but no
> )";
> +c_fix = format;
> +c_fix_arg = "%1(defined __cplusplus && !__GNUC_PREREQ (13,
> 0))\n%2";
> +test_text = <<-EOT
> + #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> + #   error "_Float128X supported but no constant suffix"
> + #  endif
> + #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> + #   error "_Float128x supported but no type"
> + #  endif
> + EOT;
> +};
> +
>  /*  glibc-2.3.5 defines pthread mutex initializers incorrectly,
>   *  so we replace them with versions that correspond to the
>   *  definition.
> --- fixincludes/fixincl.x.jj  2022-09-29 22:18:47.975402675 +0200
> +++ fixincludes/fixincl.x 2022-09-29 22:22:55.675909244 +0200
> @@ -2,11 +2,11 @@
>   *
>   * DO NOT EDIT THIS FILE   (fixincl.x)
>   *
> - * It has been AutoGen-ed  September 28, 2022 at 07:56:15 PM by
> AutoGen 5.18.16
> + * It has been AutoGen-ed  September 29, 2022 at 10:22:55 PM by
> AutoGen 5.18.16
>   * From the definitionsinclhack.def
>   * and the template file   fixincl
>   */
> -/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Sep 28 19:56:15 CEST 2022
> +/* DO NOT SVN-MERGE THIS FILE, EITHER Thu Sep 29 22:22:55 CEST 2022
>   *
>   * You must regenerate it.  Use the ./genfixes script.
>   *
> @@ -15,7 +15,7 @@
>   * certain ANSI-incompatible system header files which are fixed to
> work
>   * correctly with ANSI C and placed in a directory that GNU C will
> search.
>   *
> - * This file contains 271 fixup descriptions.
> + * This file contains 272 fixup descriptions.
>   *
>   * See README for more information.
>   *
> @@ -4273,6 +4273,43 @@ static const char* apzGlibc_Cxx_Floatn_4
> 
>  /* * * * * * * * * * * * * * * * * * * * * * * * * *
>   *
> + *  Description of Glibc_Cxx_Floatn_5 fix
> + */
> +tSCC zGlibc_Cxx_Floatn_5Name[] =
> + "glibc_cxx_floatn_5";
> +
> +/*
> + *  File name selection pattern
> + */
> +tSCC zGlibc_Cxx_Floatn_5List[] =
> +  "bits/floatn.h\0bits/floatn-
> common.h\0*/bits/floatn.h\0*/bits/floatn-common.h\0";
> +/*
> + *  Machine/OS name selection pattern
> + */
> +#define apzGlibc_Cxx_Floatn_5Machs (const char**)NULL
> +
> +/*
> + *  content selection pattern - do fix if pattern found
> + */
> +tSCC zGlibc_Cxx_Floatn_5Select0[] =
> +   "^([ \t]*#[ \t]*if !__GNUC_PREREQ \\(7, 0\\) \\|\\| )defined
> __cplusplus\n\
> +([ \t]*#[ \t]+error \"_Float128[xX] supported but no )";
> +
> +#defineGLIBC_CXX_FLOATN_5_TEST_CT  1
> +static tTestDesc aGlibc_Cxx_Floatn_5Tests[] = {
> +  { TT_EGREP,zGlibc_Cxx_Floatn_5Select0, (regex_t*)NULL }, };
> +
> +/*
> + *  Fix Command Arguments for Glibc_Cxx_Floatn_5
> + */
> +static const char* apzGlibc_Cxx_Floatn_5Patch[] = {
> +"format",
> +"%1(defined __cplusplus && !__GNUC_PREREQ (13, 0))\n\
> +%2",
> +(char*)NULL };
> +
> +/* * * * * * * * * * * * * * * * * * * * * * * * * 

Re: [PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

2022-09-20 Thread will schmidt via Gcc-patches
On Tue, 2022-09-20 at 16:14 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Sep 19, 2022 at 06:19:15PM -0500, will schmidt wrote:
> >   This is the first of a batch of changes that eliminate a number
> > of define TARGET_foo entries we have collected over time.
> 
> Good good :-)
> 
> > TARGET_CTZ is defined as TARGET_MODULO, and has a low number
> > of uses.  References to TARGET_CTZ should be safe to replace
> > with TARGET_MODULO throughout.
> 
> No, please don't.  This has nothing to do with "modulo".  If you want to
> say this is just whether we have ISA 3.0 or p9, make a new target macro
> for *that* and use that everywhere.
> 
> This is a general issue, that will make the code much more sane if you
> can fix it!

> 
> > TARGET_FCTIDZ is entirely unused, and safe to remove.
> 
> Please make separate patches for separate issues.  This makes it much
> easier to review, and MUCH easier for all other ways we need to handle
> it (backports, reverts, everything else).  With Git it is *easier* to
> keep separate patches separate than it is to lump it all together.  So,
> the trick is to keep things in separate commits during development
> already (and you will find more benefits doing that, too!)

Yup, I actually developed these three (plus a bunch more) separately,
but combined the first three for posting.   I'll split them back out
and repost after a bit. 

> 
> TARGET_FCTIDZ was never used, it always used TARGET_FCFID directly.
> 
> The original PEM mistakenly said this insn is "64-bit only".  This was
> fixed in ISA 2.01.
> 
> > TARGET_FCTIWUZ has a low number of uses, and can be directly
> > replaced with TARGET_POPCNTD.
> 
> It is a p7 (ISA 2.06) insn.  Please make a TARGET_P7 or such?


Yes.  I do have a change later in the (unposted) series to replace
POPCNTD with POWER7; at a glance that's #17 down the line.  In review I
agree with your comment that the in-between changes aren't the best
choices.  I'll see about skipping the in-between values and going
straight for POPCNTD->POWER7.

I am looking at the TARGET_POWER10 notation as the target style, versus
TARGET_P7, but I can go that direction if we think that would be
preferred.   Maybe it is since this is a retro-fix versus new. :-)
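
As a rough illustration of the naming being discussed, here is a small
stand-alone C sketch.  Everything in it is hypothetical (struct
sketch_target, its fields, and the sketch_have_* helpers are not taken
from the rs6000 sources); it only shows the difference between keying
an instruction test off an ISA-level flag and keying it off an
unrelated instruction's flag.

```c
#include <stdbool.h>

/* Hypothetical target description: one flag per ISA level, plus
   per-instruction flags that could in principle be toggled on their
   own.  */
struct sketch_target
{
  bool power7;   /* ISA 2.06 */
  bool popcntd;  /* the popcntd insn itself */
  bool power9;   /* ISA 3.0 */
  bool modulo;   /* the modulo insns themselves */
};

/* fctiwuz is an ISA 2.06 insn, so test the ISA level rather than the
   unrelated popcntd flag.  */
static bool
sketch_have_fctiwuz (const struct sketch_target *t)
{
  return t->power7;
}

/* Likewise the count-trailing-zeros insns are new in ISA 3.0 and have
   nothing to do with modulo.  */
static bool
sketch_have_cnttz (const struct sketch_target *t)
{
  return t->power9;
}
```

In this model, turning off the popcntd or modulo flag by itself does
not change the answer for the unrelated instructions.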


> 
> In the current situation target macros like TARGET_POPCNTD are abused to
> mean either "can we use the popcntd insn", or to mean "can we use insn
> new on p7".  Or sometimes something in between, or something in this
> general neighbourhood.  It is never clear which is meant, which makes it
> very hard to untangle this.  But thanks for trying!  :-)
>
> (Don't let me discourage you btw, most is pretty straightforward).

Absolutely..   I do have this mostly covered locally, I just need to
refine a few parts.  :-)

> 
> 
> > * config/rs6000/rs6000.h (TARGET_CTZ): Replace with
> > TARGET_MODULO.
> 
> Changelogs are indented with tabs, and this fits on one line.
> 
> So, please make TARGET_P7 and such, and OPTION_MASKs for those in
> rs6000-cpus.def?

willdo, 
thanks
-Will


> 
> 
> Segher



[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

Hi,
  This is the first of a batch of changes that eliminate a number
of define TARGET_foo entries we have collected over time.

TARGET_CTZ is defined as TARGET_MODULO, and has a low number
of uses.  References to TARGET_CTZ should be safe to replace
with TARGET_MODULO throughout.

TARGET_FCTIDZ is entirely unused, and safe to remove.

TARGET_FCTIWUZ has a low number of uses, and can be directly
replaced with TARGET_POPCNTD.

This eliminates three defines.

There should be no codegen changes, and this has regtested OK.
OK for trunk?
Thanks,

gcc/
* config/rs6000/rs6000.h (TARGET_CTZ): Replace with
TARGET_MODULO.
(TARGET_FCTIDZ): Remove.
(TARGET_FCTIWUZ): Replace with TARGET_POPCNTD.
* config/rs6000/rs6000.cc (TARGET_CTZ): Replace with TARGET_MODULO.
* config/rs6000/rs6000.md (ctz2): Replace TARGET_CTZ
with TARGET_MODULO.
(ctz2_hw): Same.
(fixuns_truncsi2): Replace TARGET_FCTIWUZ
with TARGET_POPCNTD.
(fixuns_truncsi2_stfiwx): Same.
(fctiwz_): Same.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index fcca062a8709..eea427b1ca51 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21998,11 +21998,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
   if (!TARGET_MODULO && (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
 case CTZ:
-  *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+  *total = COSTS_N_INSNS (TARGET_MODULO ? 1 : 4);
   return false;
 
 case FFS:
   *total = COSTS_N_INSNS (4);
   return false;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb7b21584970..ee887efd1122 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -456,20 +456,17 @@ extern int rs6000_vector_align[];
 || TARGET_PPC_GPOPT/* 970/power4 */\
 || TARGET_POPCNTB  /* ISA 2.02 */  \
 || TARGET_CMPB /* ISA 2.05 */  \
 || TARGET_POPCNTD) /* ISA 2.06 */
 
-#define TARGET_FCTIDZ  TARGET_FCFID
 #define TARGET_STFIWX  TARGET_PPC_GFXOPT
 #define TARGET_LFIWAX  TARGET_CMPB
 #define TARGET_LFIWZX  TARGET_POPCNTD
 #define TARGET_FCFIDS  TARGET_POPCNTD
 #define TARGET_FCFIDU  TARGET_POPCNTD
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
-#define TARGET_FCTIWUZ TARGET_POPCNTD
-#define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
@@ -1751,11 +1748,11 @@ typedef struct rs6000_args
 
 /* The CTZ patterns that are implemented in terms of CLZ return -1 for input of
zero.  The hardware instructions added in Power9 and the sequences using
popcount return 32 or 64.  */
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  (TARGET_CTZ || TARGET_POPCNTD                                        \
+  (TARGET_MODULO || TARGET_POPCNTD                                     \
? ((VALUE) = GET_MODE_BITSIZE (MODE), 2)\
: ((VALUE) = -1, 2))
 
 /* Specify the machine mode that pointers have.
After generation of rtl, the compiler makes no further distinction
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad5a4cf2ef83..619a87374734 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2414,11 +2414,11 @@ (define_insn "clz2"
 (define_expand "ctz2"
[(set (match_operand:GPR 0 "gpc_reg_operand")
 (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
   ""
 {
-  if (TARGET_CTZ)
+  if (TARGET_MODULO)
 {
   emit_insn (gen_ctz2_hw (operands[0], operands[1]));
   DONE;
 }
 
@@ -2445,11 +2445,11 @@ (define_expand "ctz2"
 })
 
 (define_insn "ctz2_hw"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
-  "TARGET_CTZ"
+  "TARGET_MODULO"
   "cnttz %0,%1"
   [(set_attr "type" "cntlz")])
 
 (define_expand "ffs2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
@@ -6326,11 +6326,11 @@ (define_insn_and_split "*fix_trunc2_mem"
 })
 
 (define_expand "fixuns_truncsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand")
(unsigned_fix:SI (match_operand:SFDF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_FCTIWUZ && TARGET_STFIWX"
+  "TARGET_HARD_FLOAT && TARGET_POPCNTD && TARGET_STFIWX"
 {
   if (!TARGET_P8_VECTOR)
 {
   emit_insn (gen_fixuns_truncsi2_stfiwx (operands[0], operands[1]));
   DONE;
@@ -6339,11 +6339,11 @@ (define_expand 

[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]

Hi,
  The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
and can be disabled by dependent options when it should not be.
This manifests in the issue seen in PR101865 where -mno-vsx
mistakenly disables _ARCH_PWR8.

This change replaces the relevant TARGET_DIRECT_MOVE references
with a TARGET_POWER8 entry so that the direct_move and power8
features can be enabled or disabled independently.

This is done via the OPTION_MASK definitions, so this
means that some references to the OPTION_MASK_DIRECT_MOVE
option are now replaced with OPTION_MASK_POWER8.
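
To make the intent concrete, here is a minimal stand-alone C model of
the flag handling.  It is a sketch under assumptions, not the rs6000
code: the SKETCH_MASK_* values and the sketch_* functions are
hypothetical stand-ins for the real OPTION_MASK_* bits and option
processing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real OPTION_MASK_* bits; the values
   are illustrative only.  */
#define SKETCH_MASK_VSX          (UINT64_C (1) << 0)
#define SKETCH_MASK_DIRECT_MOVE  (UINT64_C (1) << 1)
#define SKETCH_MASK_POWER8       (UINT64_C (1) << 2)

/* Model of -mcpu=power8: every bit in this sketch is set.  */
static uint64_t
sketch_power8_flags (void)
{
  return SKETCH_MASK_VSX | SKETCH_MASK_DIRECT_MOVE | SKETCH_MASK_POWER8;
}

/* Model of -mno-vsx: clearing VSX also clears direct move, which
   depends on it.  The "this is a power8" bit is left alone.  */
static uint64_t
sketch_disable_vsx (uint64_t flags)
{
  return flags & ~(SKETCH_MASK_VSX | SKETCH_MASK_DIRECT_MOVE);
}

/* Old behaviour: _ARCH_PWR8 keyed off the direct-move bit.  */
static bool
sketch_arch_pwr8_old (uint64_t flags)
{
  return (flags & SKETCH_MASK_DIRECT_MOVE) != 0;
}

/* Proposed behaviour: _ARCH_PWR8 keyed off its own bit.  */
static bool
sketch_arch_pwr8_new (uint64_t flags)
{
  return (flags & SKETCH_MASK_POWER8) != 0;
}
```

In this model, sketch_disable_vsx (sketch_power8_flags ()) makes the
old test report false while the new test still reports true, which
mirrors the behaviour reported in the PR and the intended fix.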

The existing (and rather lengthy) commentary for DIRECT_MOVE remains
in place in rs6000-c.cc:rs6000_target_modify_macros().  The
if-defined logic there will now set a __DIRECT_MOVE__ define when
TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
purposes, but is otherwise unused.  This can be removed in a
subsequent patch, or in an update of this patch, depending on feedback.

This regtests cleanly (power8,power9,power10), and resolves
PR 101865 as represented in the tests from (1/2).

OK for trunk?
Thanks,
-Will


gcc/
PR Target/101865
* config/rs6000/rs6000-builtin.cc
(rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
usage with TARGET_POWER8.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
conditional with OPTION_MASK_POWER8.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
Add OPTION_MASK_POWER8 entry.
(POWERPC_MASKS): Same.
* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
(rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
* config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
* config/rs6000/vsx.md (vsx_extract_): Replace
TARGET_DIRECT_MOVE usage with TARGET_POWER8.
(define_peephole2): Same.

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 3ce729c1e6de..91a0f39bd796 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
 case ENB_P7:
   return TARGET_POPCNTD;
 case ENB_P7_64:
   return TARGET_POPCNTD && TARGET_POWERPC64;
 case ENB_P8:
-  return TARGET_DIRECT_MOVE;
+  return TARGET_POWER8;
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
   return TARGET_MODULO;
 case ENB_P9_64:
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index ca9cc42028f7..41d51b039061 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
  turned off in any of the following conditions:
  1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
enabled.
  2. TARGET_VSX is off.  */
-  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
+  if ((OPTION_MASK_DIRECT_MOVE) != 0)
+rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
+  if ((flags & OPTION_MASK_POWER8) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
   if ((flags & OPTION_MASK_MODULO) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index c3825bcccd84..c873f6d58989 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -48,10 +48,11 @@
system.  */
 #define ISA_2_7_MASKS_SERVER   (ISA_2_6_MASKS_SERVER   \
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   \
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
 | OPTION_MASK_QUAD_MEMORY  \
 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
 
 /* ISA masks setting fusion options.  */
@@ -124,10 +125,11 @@
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   

[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option.

Hi,

This adds an assortment of tests to exercise the -mno-vsx option and
confirm the impacts on the ARCH_PWR8 define.

These are based on and inspired by PR 101865, which
reports that _ARCH_PWR8 is disabled when -mno-vsx
is passed on the commandline.
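
For context, user code typically consumes these predefined macros as in
the sketch below.  The helper names sketch_arch_level and
sketch_has_vsx are hypothetical; the macros tested are the ones the new
tests scan for.  On a non-PowerPC host none of them is defined and the
helpers return 0.

```c
/* Returns the highest Power ISA level advertised by the predefined
   _ARCH_PWRn macros, or 0 when none of them is defined.  */
static int
sketch_arch_level (void)
{
#if defined (_ARCH_PWR10)
  return 10;
#elif defined (_ARCH_PWR9)
  return 9;
#elif defined (_ARCH_PWR8)
  return 8;
#elif defined (_ARCH_PWR7)
  return 7;
#else
  return 0;
#endif
}

/* Returns 1 when the compiler advertises VSX support, else 0.  */
static int
sketch_has_vsx (void)
{
#if defined (__VSX__)
  return 1;
#else
  return 0;
#endif
}
```

Code written this way would take its pre-power8 path when built with
-mcpu=power8 -mno-vsx if _ARCH_PWR8 goes away along with __VSX__, which
is the situation these tests are meant to catch.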

There are a small number of failures introduced by these tests,
those are resolved with the changes in part 2.

OK for trunk?
Thanks,
-Will


gcc/testsuite:
* gcc.target/powerpc/predefine_p7-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-novsx.c: New test.
* gcc.target/powerpc/predefine_p9-novsx.c: New test.
* gcc.target/powerpc/predefine_pragma_vsx.c: New test.


diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
new file mode 100644
index ..e842025b4d3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines get set
+   when we specify power7, plus options.  */
+/* This is a variation of the test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
new file mode 100644
index ..c3b705ca3d48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
@@ -0,0 +1,7 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling both altivec and vsx. */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
new file mode 100644
index ..8b6c69b20104
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
@@ -0,0 +1,8 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
new file mode 100644
index ..eef42c111663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
+/* {xfail *-*-*} */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
new file mode 100644
index ..b300600af999
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+
+/* Ensure that if we set a pragma gcc target for an
+   older processor, we do not compile builtins that
+   the older target does not 

[PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies

2022-07-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies

Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions.

[V2]
Used 'direct' instead of 'internal', and cosmetically reworked
the changelog.

OK for trunk?

Thanks,

gcc/
* config/rs6000/altivec.md:
(vstrir_code_): Rename to...
(vstrir_direct_): ... this.
(vstrir_p_code_): Rename to...
(vstrir_p_direct_): ... this.
(vstril_code_): Rename to...
(vstril_direct_): ... this.
(vstril_p_code_): Rename to...
(vstril_p_direct_): ... this.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..2c4940f2e21c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstrir_code_"
+(define_insn "vstrir_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])
 
-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   else
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))
(set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_direct_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_direct_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstril_code_"
+(define_insn "vstril_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_direct_ (scratch, operands[1]));
   else
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_direct_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_direct_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))
(set (reg:CC CR6_REGNO)



[PATCH, rs6000, v2] Additional cleanup of rs6000_builtin_mask

2022-07-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000, v2] Additional cleanup of rs6000_builtin_mask

Hi,
  After the rs6000 builtins rewrite, some of the leftover builtin
code is redundant and can be removed.
  This replaces the usage of bu_mask in rs6000_target_modify_macros
with direct checks against the rs6000_isa_flags equivalent.  Thus
the bu_mask variable can be removed.  After this update there
are no other uses of rs6000_builtin_mask_calculate, so that function
can also be safely removed.

No functional change, though some output under debug has been removed.

[V2]
  Per patch review and subsequent investigations, the
rs6000_builtin_mask and x_rs6000_builtin_mask can also be removed, as
well as the entirety of the rs6000_builtin_mask_names table.
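
For readers following the cleanup, here is a minimal sketch in C of what
driving the macro updates from a single flags word can look like.  It is a
toy model, not the rs6000 implementation: struct macro_state,
modify_macros, retarget, the F_* bits, and the use of XOR to compute the
changed bits are all assumptions made for illustration.  The shape mirrors
the patch, where old macros are removed for the changed bits of the
previous flags and new macros are defined for the changed bits of the
current flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are made up for this sketch.  */
#define F_FLOAT128_HW (UINT64_C (1) << 0)
#define F_MMA         (UINT64_C (1) << 1)

/* Toy stand-in for the preprocessor's macro table.  */
struct macro_state
{
  bool float128_hardware;	/* models __FLOAT128_HARDWARE__ */
  bool mma;			/* models __MMA__ */
};

/* Define (define_p true) or undefine (define_p false) every macro
   whose controlling bit is set in FLAGS.  One flags word, no second
   builtin mask.  */
void
modify_macros (struct macro_state *st, bool define_p, uint64_t flags)
{
  assert (st != NULL);
  if ((flags & F_FLOAT128_HW) != 0)
    st->float128_hardware = define_p;
  if ((flags & F_MMA) != 0)
    st->mma = define_p;
}

/* Model of a target switch: only the bits that changed are touched.  */
void
retarget (struct macro_state *st, uint64_t prev_flags, uint64_t cur_flags)
{
  uint64_t diff_flags = prev_flags ^ cur_flags;

  if (diff_flags != 0)
    {
      /* Delete old macros.  */
      modify_macros (st, false, prev_flags & diff_flags);
      /* Define new macros.  */
      modify_macros (st, true, cur_flags & diff_flags);
    }
}
```

Calling retarget with identical previous and current flags leaves the
state untouched, which is why a single diff word is enough in this model.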

gcc/
* config/rs6000/rs6000-c.cc: Update comments.
(rs6000_target_modify_macros): Remove bu_mask references.
(rs6000_define_or_undefine_macro): Replace bu_mask reference
with a rs6000_cpu value check.
(rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
parameter from call to rs6000_target_modify_macros.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
rs6000_target_modify_macros_ptr): Remove parameter from extern
for the prototype.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
parameter from prototype, update calls to this function.
(rs6000_print_builtin_options): Remove prototype, call and function.
(rs6000_builtin_mask_calculate): Remove function.
(rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
(rs6000_option_override_internal): Remove rs6000_builtin_mask var
and builtin_mask debug output.
(rs6000_builtin_mask_names): Remove.
(rs6000_pragma_target_parse): Remove prev_bumask, cur_bumask,
diff_bumask references; update calls to
rs6000_target_modify_macros_ptr.
* config/rs6000/rs6000.opt (rs6000_builtin_mask): Remove.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
 cpp_undef (parse_in, name);
 }
 
 /* Define or undefine macros based on the current target.  If the user does
-   #pragma GCC target, we need to adjust the macros dynamically.  Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments.  */
+   #pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
-"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-", " HOST_WIDE_INT_PRINT_HEX ")\n",
+"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
 (define_p) ? "define" : "undef",
-flags, bu_mask);
+flags);
 
   /* Each of the flags mentioned below controls whether certain
  preprocessor macros will be automatically defined when
  preprocessing source files for compilation by this compiler.
  While most of these flags can be enabled or disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
-  /* options from the builtin masks.  */
-  /* Note that OPTION_MASK_FPRND is enabled only if
- (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL.  */
+  if (rs6000_cpu == PROCESSOR_CELL)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros.  */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-  rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);
 
   if (TARGET_FRE)
 builtin_define ("__RECIP__");
   if (TARGET_FRES)
 builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3ea010236090..b3c16e7448d8 100644
--- 

Re: [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask

2022-07-15 Thread will schmidt via Gcc-patches
On Thu, 2022-07-14 at 11:28 +0800, Kewen.Lin wrote:
> Hi Will,
> 
> Thanks for the cleanup!  Some comments are inlined.

Hi, 
Thanks for the review.  A few comments and responses below.  TLDR I'll
incorporate the suggestions in V2 that will show up ... after.  :-)

> 
> on 2022/7/14 05:39, will schmidt wrote:
> > [PATCH, rs6000] Additional cleanup of rs6000_builtin_mask
> > 
> > Hi,
> >   Post the rs6000 builtins rewrite, some of the leftover builtin
> > code is redundant and can be removed.
> >   This replaces the remaining usage of bu_mask in
> > rs6000_target_modify_macros() with checks against the rs6000_cpu
> > directly.
> > Thusly the bu_mask variable can be removed.  After that variable
> > is eliminated there are no other uses of
> > rs6000_builtin_mask_calculate(),
> > so that function can also be safely removed.
> > 
> 
> The TargetVariable rs6000_builtin_mask in rs6000.opt is useless, it
> seems
> it can be removed together?

Yes, if I also remove usage of x_rs6000_builtin_mask.  There are a few
remaining references to x_r_b_m, but those appear safe to remove after
this cleanup as well.  I'll confirm and likely include the removal in
V2.


> 
> > I have tested this on current systems (P8,P9,P10) without
> > regressions.
> > 
> > OK for trunk?
> > 
> > 
> > Thanks,
> > -Will
> > 
> > 


> >  
> > -  /* Set the builtin mask of the various options used that could
> > affect which
> > - builtins were used.  In the past we used target_flags, but
> > we've run out
> > - of bits, and some options are no longer in target_flags.  */
> > -  rs6000_builtin_mask = rs6000_builtin_mask_calculate ();
> > -  if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
> > -rs6000_print_builtin_options (stderr, 0, "builtin mask",
> > - rs6000_builtin_mask);
> > -
> 
> I wonder if it's a good idea to still dump some information for
> built-in
> functions debugging even with new bif framework, it can be handled in
> a
> separated patch if yes.  The new bif framework adopts stanzas for bif
> guarding, if we want to do similar things, we can refer to the code
> like:





> TARGET_POPCNTB means all bifs with ENB_P5 are available
> TARGET_CMPB means all bifs with ENB_P6 are available
> ...
> 
> , dump information like "current enabled stanzas: ENB_xx, ENB_xxx,
> ..."
> (even without ENB_ prefix).

Possibly.  Some debug output already exists, and I still have some
work in progress related to some of the OPTION and TARGET handling.
I'll keep this in mind as I continue poking in this space. :-)


> >/* Initialize all of the registers.  */
> >rs6000_init_hard_regno_mode_ok (global_init_p);
> >  
> >/* Save the initial options in case the user does function
> > specific options */
> >if (global_init_p)
> > @@ -24495,17 +24442,15 @@ rs6000_pragma_target_parse (tree args,
> > tree pop_target)
> >  
> >if ((diff_flags != 0) || (diff_bumask != 0))
> > {
> >   /* Delete old macros.  */
> >   rs6000_target_modify_macros_ptr (false,
> > -  prev_flags & diff_flags,
> > -  prev_bumask & diff_bumask);
> > +  prev_flags & diff_flags);
> >  
> >   /* Define new macros.  */
> >   rs6000_target_modify_macros_ptr (true,
> > -  cur_flags & diff_flags,
> > -  cur_bumask & diff_bumask);
> > +  cur_flags & diff_flags);
> > }
> >  }
> >  
> >return true;
> >  }
> > @@ -24732,19 +24677,10 @@ rs6000_print_isa_options (FILE *file, int
> > indent, const char *string,
> >rs6000_print_options_internal (file, indent, string, flags, "-
> > m",
> >  _opt_masks[0],
> >  ARRAY_SIZE (rs6000_opt_masks));
> >  }
> >  
> > -static void
> > -rs6000_print_builtin_options (FILE *file, int indent, const char
> > *string,
> > - HOST_WIDE_INT flags)
> > -{
> > -  rs6000_print_options_internal (file, indent, string, flags, "",
> > -_builtin_mask_names[0],
> > -ARRAY_SIZE
> > (rs6000_builtin_mask_names));
> > -}
> 
> rs6000_builtin_mask_names becomes useless too, can be removed too?

It can.  I'll include removal in V2.
Thanks
-Will

> 
> BR,
> Kewen



[PATCH, rs6000] Additional cleanup of rs6000_builtin_mask

2022-07-13 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Additional cleanup of rs6000_builtin_mask

Hi,
  After the rs6000 builtins rewrite, some of the leftover builtin
code is redundant and can be removed.
  This replaces the remaining usage of bu_mask in
rs6000_target_modify_macros() with checks against the rs6000_cpu directly.
Thus the bu_mask variable can be removed.  After that variable
is eliminated there are no other uses of rs6000_builtin_mask_calculate(),
so that function can also be safely removed.

I have tested this on current systems (P8,P9,P10) without regressions.

OK for trunk?


Thanks,
-Will

gcc/
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Remove
bu_mask references.  (rs6000_define_or_undefine_macro): Replace
bu_mask reference with a rs6000_cpu value check.
(rs6000_cpu_cpp_builtins): Remove rs6000_builtin_mask_calculate()
parameter from call to rs6000_target_modify_macros.
* config/rs6000/rs6000-protos.h (rs6000_target_modify_macros,
rs6000_target_modify_macros_ptr): Remove parameter from extern
for the prototype.
* config/rs6000/rs6000.cc (rs6000_target_modify_macros_ptr): Remove
parameter from prototype, update calls to this function.
(rs6000_print_builtin_options): Remove prototype, call and function.
(rs6000_builtin_mask_calculate): Remove function.
(rs6000_debug_reg_global): Remove call to rs6000_print_builtin_options.
(rs6000_option_override_internal): Remove rs6000_builtin_mask var
and builtin_mask debug output.
(rs6000_pragma_target_parse): Update calls to
rs6000_target_modify_macros_ptr.


diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 0d13645040ff..4d051b906582 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -333,24 +333,20 @@ rs6000_define_or_undefine_macro (bool define_p, const char *name)
   else
 cpp_undef (parse_in, name);
 }
 
 /* Define or undefine macros based on the current target.  If the user does
-   #pragma GCC target, we need to adjust the macros dynamically.  Note, some of
-   the options needed for builtins have been moved to separate variables, so
-   have both the target flags and the builtin flags as arguments.  */
+   #pragma GCC target, we need to adjust the macros dynamically.  */
 
 void
-rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
-HOST_WIDE_INT bu_mask)
+rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 {
   if (TARGET_DEBUG_BUILTIN || TARGET_DEBUG_TARGET)
 fprintf (stderr,
-"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX
-", " HOST_WIDE_INT_PRINT_HEX ")\n",
+"rs6000_target_modify_macros (%s, " HOST_WIDE_INT_PRINT_HEX ")\n",
 (define_p) ? "define" : "undef",
-flags, bu_mask);
+flags);
 
   /* Each of the flags mentioned below controls whether certain
  preprocessor macros will be automatically defined when
  preprocessing source files for compilation by this compiler.
  While most of these flags can be enabled or disabled
@@ -593,14 +589,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
-  /* options from the builtin masks.  */
-  /* Note that OPTION_MASK_FPRND is enabled only if
- (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & OPTION_MASK_FPRND) != 0)
+  /* Tell the user if we are targeting CELL.  */
+  if (rs6000_cpu == PROCESSOR_CELL)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
@@ -614,12 +608,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags,
 
 void
 rs6000_cpu_cpp_builtins (cpp_reader *pfile)
 {
   /* Define all of the common macros.  */
-  rs6000_target_modify_macros (true, rs6000_isa_flags,
-  rs6000_builtin_mask_calculate ());
+  rs6000_target_modify_macros (true, rs6000_isa_flags);
 
   if (TARGET_FRE)
 builtin_define ("__RECIP__");
   if (TARGET_FRES)
 builtin_define ("__RECIPF__");
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3ea010236090..b3c16e7448d8 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -318,13 +318,12 @@ extern void rs6000_pragma_longcall (struct cpp_reader *);
 extern void rs6000_cpu_cpp_builtins (struct cpp_reader *);
 #ifdef TREE_CODE
 extern bool rs6000_pragma_target_parse (tree, tree);
 #endif
 extern void rs6000_activate_target_options (tree new_tree);
-extern void 

Re: [PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread will schmidt via Gcc-patches
On Wed, 2022-07-13 at 14:39 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jul 13, 2022 at 01:18:29PM -0500, will schmidt wrote:
> >   This cleans up some of the naming around the vstrir and vstril
> > instruction definitions, with some cosmetic changes for
> > consistency.
> > gcc/
> > * config/rs6000/altivec.md (vstrir_code_): Rename
> > to vstrir_internal_.
> > (vstrir_p_code_): Rename to vstrir_p_internal_.
> > (vstril_code_): Rename to vstril_internal_.
> > (vstril_p_code_): Rename to vstril_p_internal_.
> 
> It doesn't show the new names on the lhs this way.  One way to do
> better
> is to write e.g.
>   (vstril_code_): Rename to...
>   (vstril_internal_): ... this.

Ok.

> 
> It often is a good idea to say "... for VIshort" and similar
> btw.

Ok. 

> 
> I'm not a fan of "internal" either, it doesn't say anything.  At
> least
> put it at the very end of the names please?
I'm easily convinced. ;-)  I wonder if I should just drop "_internal"
entirely and go with "vstrir_".  Otherwise I'll rework to be
"vstrir__internal".
At a glance I see we do have some other existing define_insn entries
with _internal at the tail and a few others embedded in the middle. 
I'll leave a note and perhaps review those after.  :-)

Thanks,
-Will

> 
> Okay for trunk with that changed.  Thanks!
> 
> 
> Segher



[PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

Hi,
  This cleans up some of the naming around the vstrir and vstril
instruction definitions, with some cosmetic changes for consistency.
No functional changes.
Regtested just in case, no regressions.  :-)
OK for trunk?

Thanks,

gcc/
* config/rs6000/altivec.md (vstrir_code_): Rename
to vstrir_internal_.
(vstrir_p_code_): Rename to vstrir_p_internal_.
(vstril_code_): Rename to vstril_internal_.
(vstril_p_code_): Rename to vstril_p_internal_.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index efc8ae35c2e7..5aea02e9ad6e 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -884,44 +884,44 @@ (define_expand "vstrir_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstrir_code_"
+(define_insn "vstrir_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   "vstrir %0,%1"
   [(set_attr "type" "vecsimple")])
 
-;; This expands into same code as vstrir_ followed by condition logic
+;; This expands into same code as vstrir followed by condition logic
 ;; so that a single vstribr. or vstrihr. or vstribl. or vstrihl. instruction
 ;; can, for example, satisfy the needs of a vec_strir () function paired
 ;; with a vec_strir_p () function if both take the same incoming arguments.
 (define_expand "vstrir_p_"
   [(match_operand:SI 0 "gpc_reg_operand")
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   else
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstrir_p_code_"
+(define_insn "vstrir_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIR))
(set (reg:CC CR6_REGNO)
@@ -936,17 +936,17 @@ (define_expand "vstril_"
(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
UNSPEC_VSTRIR))]
   "TARGET_POWER10"
 {
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_code_ (operands[0], operands[1]));
+emit_insn (gen_vstril_internal_ (operands[0], operands[1]));
   else
-emit_insn (gen_vstrir_code_ (operands[0], operands[1]));
+emit_insn (gen_vstrir_internal_ (operands[0], operands[1]));
   DONE;
 })
 
-(define_insn "vstril_code_"
+(define_insn "vstril_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))]
   "TARGET_POWER10"
@@ -962,18 +962,18 @@ (define_expand "vstril_p_"
(match_operand:VIshort 1 "altivec_register_operand")]
   "TARGET_POWER10"
 {
   rtx scratch = gen_reg_rtx (mode);
   if (BYTES_BIG_ENDIAN)
-emit_insn (gen_vstril_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstril_p_internal_ (scratch, operands[1]));
   else
-emit_insn (gen_vstrir_p_code_ (scratch, operands[1]));
+emit_insn (gen_vstrir_p_internal_ (scratch, operands[1]));
   emit_insn (gen_cr6_test_for_zero (operands[0]));
   DONE;
 })
 
-(define_insn "vstril_p_code_"
+(define_insn "vstril_p_internal_"
   [(set (match_operand:VIshort 0 "altivec_register_operand" "=v")
(unspec:VIshort
   [(match_operand:VIshort 1 "altivec_register_operand" "v")]
   UNSPEC_VSTRIL))
(set (reg:CC CR6_REGNO)



Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-08 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 23:16 -0400, Michael Meissner wrote:
> On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> > On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> > > > I think I mentioned this offline, but I'd prefer a negative target flag,
> > > > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning
> > > > we'd generate stxvp by default.
> > > 
> > > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > > positive.  All of them.
> > 
> > That's not what I was asking for.  I totally agree that
> > -mno-store-vector-pair should disable generating stxvp and that
> > -mstore-vector-pair should enable generating it.  What I asked for was
> > that the internal flag we use to enable and disable it should be a
> > negative flag, where TARGET_NO_STORE_VECTOR_PAIR is true when we use
> > -mno-store-vector-pair and false when using -mstore-vector-pair.
> > That way we can add that flag to power10's rs6000-cpu.def entry and then
> > we're done.  What I don't want to have to do is that if/when power87 is
> > released, we still have to add TARGET_STORE_VECTOR_PAIR to its
> > rs6000-cpu.def entry just to get stxvp insns generated.  That adds a cost
> > to every cpu after power10 since we'd have to remember to add that flag to
> > every follow-on cpu.
> 
> FWIW, I really dislike having negative flags like that (just talking about the
> option mask internals, not the user option).

I can't tell whether there is agreement in either direction, so I'll
throw some comments out and see if that helps make a decision.

I agree with avoiding the negative flags.  Whenever I run across a code
snippet reading "if (! TARGET_NOT_FOO) ..." it's time to double-check
everything.  :-)

If the proposal is to have "TARGET_NO_STORE_VECTOR_PAIR" set to "off",
I'd counter-propose whatever variation lets us drop the "NO" from
the string, i.e. "TARGET_STORE_VECTOR_PAIR", set however makes
sense to indicate enabled or not.

All that said, I have a strong preference for having the internal flags
match the option flags as closely as possible.
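
To make the readability point concrete, here is a small sketch in C
comparing the two internal conventions.  Everything in it is hypothetical
and exists only for this example: enum user_option, the two
target_*_store_vector_pair helpers, and the emit_stxvp_* predicates are
not GCC's option machinery.  It only shows that both conventions encode
the same decision, and that the negative one forces a double negation at
every use.

```c
#include <assert.h>
#include <stdbool.h>

/* The user-visible choice: -mstore-vector-pair or
   -mno-store-vector-pair.  Hypothetical names.  */
enum user_option
{
  OPT_STORE_VECTOR_PAIR,
  OPT_NO_STORE_VECTOR_PAIR
};

/* Positive internal flag: true when the feature is enabled.  */
bool
target_store_vector_pair (enum user_option opt)
{
  return opt == OPT_STORE_VECTOR_PAIR;
}

/* Negative internal flag: true when the feature is disabled.  */
bool
target_no_store_vector_pair (enum user_option opt)
{
  return opt == OPT_NO_STORE_VECTOR_PAIR;
}

/* Use site with the positive flag reads directly.  */
bool
emit_stxvp_positive (enum user_option opt)
{
  return target_store_vector_pair (opt);
}

/* Use site with the negative flag needs "not no".  */
bool
emit_stxvp_negative (enum user_option opt)
{
  return !target_no_store_vector_pair (opt);
}

/* Both conventions must agree for any user option.  */
void
check_conventions_agree (enum user_option opt)
{
  assert (emit_stxvp_positive (opt) == emit_stxvp_negative (opt));
}
```

The two predicates return the same answer for either user option; the
difference is only in how easy each use site is to read and review.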


> 
> I don't view the cost to add one positive flag to the next CPU as bad, as it
> will be a one time cost.  Presumably it would be set also next++ CPU.  This is
> like power8 is all of the power7 flags + new flags.  Power9 is all of the
> power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
> there are times when there are breaks, but hopefully those are rare.

This sounds reasonable.  Some weight could be added for which way to
bias the flag based on a guess of what the 'power87' release will
allow, but ultimately that shouldn't really matter.

And no, power87 isn't real AFAIK; I'm just repeating the example
provided by Peter :-)
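
As a sketch of the cumulative-flags model discussed in the quoted text,
the following C fragment builds each CPU's flag set from the previous
one.  The F_* bits and the helpers extend_cpu_flags, clear_cpu_flags and
has_flag are assumptions invented for this example and do not correspond
to real rs6000 masks.  Under the positive convention the new bit is simply
added for a later CPU; under the negative convention, in this model, the
bit set for power10 has to be cleared again for the follow-on CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are made up for this sketch.  */
#define F_BASE                 (UINT32_C (1) << 0)
#define F_POWER10_NEW          (UINT32_C (1) << 1)
#define F_STORE_VECTOR_PAIR    (UINT32_C (1) << 2)	/* positive form */
#define F_NO_STORE_VECTOR_PAIR (UINT32_C (1) << 3)	/* negative form */

/* A newer CPU is the older CPU's flags plus some genuinely new ones.  */
uint32_t
extend_cpu_flags (uint32_t older, uint32_t added)
{
  assert ((older & added) == 0);
  return older | added;
}

/* Drop flags that must not carry over to the next CPU.  */
uint32_t
clear_cpu_flags (uint32_t flags, uint32_t removed)
{
  return flags & ~removed;
}

/* True when any bit of FLAG is set in CPU_FLAGS.  */
bool
has_flag (uint32_t cpu_flags, uint32_t flag)
{
  return (cpu_flags & flag) != 0;
}
```

With the positive bit, the follow-on CPU is a plain extension of power10.
With the negative bit, the follow-on CPU needs an explicit
clear_cpu_flags step, which is the break in the cumulative pattern that
the quoted text compares to the fusion flag.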

Thanks
-Will

> 
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a positive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.
> 
> We can do it, but I just prefer to go with the positive flag approach.
> 



Re: [PATCH, V3] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293

2022-06-07 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 15:21 -0500, Segher Boessenkool wrote:
> On Tue, Jun 07, 2022 at 02:26:17PM -0500, will schmidt wrote:
> > On Mon, 2022-06-06 at 20:31 -0400, Michael Meissner wrote:
> > >  (define_insn "vsx_xxspltd_"
> > >[(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> > > -(unspec:VSX_D [(match_operand:VSX_D 1
> > > "vsx_register_operand"
> > > "wa")
> 
> Someone (you?) uses format=flawed.  You cannot reply to emails that
> contain patches that way, it messes up everything :-(

Right.  Something on my end may be possessed; several of my emails
today have tried to go all HTML on me, and/or otherwise gone
format-wonky, which I do not want.  ;-)


> 
> > > -(match_operand:QI 2 "u5bit_cint_operand" "i")]
> > > -  UNSPEC_VSX_XXSPLTD))]
> > > + (vec_duplicate:VSX_D
> > > +  (vec_select:
> > > +   (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
> > > +   (parallel [(match_operand:QI 2 "const_0_to_1_operand"
> > > "i")]]
> > >"VECTOR_MEM_VSX_P (mode)"
> > 
> > Noting that
> > (define_mode_iterator VSX_D [V2DF V2DI])
> > (define_mode_attr VS_scalar [(V1TI  "TI")
> >  (V2DF  "DF")
> >  (V2DI  "DI")
> >  (V4SF  "SF")
> >  (V4SI  "SI")
> >  (V8HI  "HI")
> >  (V16QI "QI")])
> 
> Yeah, the comment
> ;; Map the scalar mode for a vector type
> is misleading, in more ways than one :-(
> 
> And the whole thing is just the same as VEC_base anyway, so it is
> much
> better to just use that.
> 
> 
> Segher



Re: [PATCH 3/3] Adjust MMA tests to account for no store vector pair.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:56 -0400, Michael Meissner wrote:
> [PATCH 3/3] Adjust MMA tests to account for no store vector pair.
> 
> In changing the default for generating the store vector pair instructions,
> I had to adjust several of the MMA tests to remove checking for these
> instructions.  Mostly I just deleted the scan-assembler lines checking for
> stxvp.  In two of the tests, I added the -mstore-vector-pair option since
> the point of the test was to check for specific cases with store vector
> pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/mma-builtin-1.c: Eliminate checking for store
>   vector pair instructions.
>   * gcc.target/powerpc/mma-builtin-10-pair.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-10-quad.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-2.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-3.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-4.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-5.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-6.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-7.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-9.c: Likewise.
>   * gcc.target/powerpc/mma-builtin-8.c: Add -mstore-vector-pair.
>   * gcc.target/powerpc/pr102976.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-pair.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-10-quad.c | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-2.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-3.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c   | 1 -
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-7.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-8.c   | 2 +-
>  gcc/testsuite/gcc.target/powerpc/mma-builtin-9.c   | 2 --
>  gcc/testsuite/gcc.target/powerpc/pr102976.c| 6 +-
>  12 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c 
> b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> index 69ee826e1be..47b45b00403 100644
> --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -260,7 +260,6 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t 
> *vec)
> 
>  /* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> -/* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
>  /* { dg-final { scan-assembler-times {\mxvbf16ger2\M} 1 } } */




This all seems straightforward.   LGTM, thanks. 
-Will




Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 1/3] Disable generating store vector pair.
> 
> Testing has revealed that the power10 has some slowdowns if the store
> vector pair instruction is generated in some cases.  This patch disables
> generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
> unless an undocumented switch is used.  It is anticipated that perhaps
> with future machines we can generate the store vector pair instruction.
> 
> This patch does a split after reload to convert a store vector pair
> instruction into a pair of store vector instructions.
> 
> We do continue to generate the load vector pair instructions (lxvp, plxvp,
> and lxvpx), since we have found that in code that heavily uses MMA, it is
> still a win to generate the load vector pair instructions.
> 
> There are two future patches planned:
> 
> 1)Disable block moves from generating load/store vector pair
>   instructions unless the store vector pair instructions are
>   being generated.
> 
> 2)Make the built-in functions for generating store vector pair
>   always generate those instructions even if store vector pair
>   instructions are disabled.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs except for the tests that are
> modified in patch #3 in these series of patches.  Can I check this patch
> into the trunk?  If there are no changes needed for the backports, can I
> check this code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/mma.md (movoo): Disable generating store vector
>   pair instructions unless these are enabled by the user.
>   (movxo): Likewise.
>   * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
>   vector pair instructions are disabled, do not allow vector pair
>   addresses to be indexed.
>   (rs6000_split_multireg_move): Do not split XOmode stores into two
>   store vector pair instructions unless store vector pair
>   instructions are enabled.
>   * config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
>   (enabled attribute): Disable alternative using store vector pair
>   instructions unless they are enabled.
>   * config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
>   * gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
> ---
>  gcc/config/rs6000/mma.md  | 41 ++
>  gcc/config/rs6000/rs6000.cc   |  9 +-
>  gcc/config/rs6000/rs6000.md   |  8 +-
>  gcc/config/rs6000/rs6000.opt  |  4 +
>  .../powerpc/p10-store-vector-pair-1.c | 82 +++
>  .../powerpc/p10-store-vector-pair-2.c | 81 ++
>  6 files changed, 206 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index a183b6a168a..9b5f243b88d 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -274,26 +274,35 @@ (define_expand "movoo"
>DONE;
>  })
> 
> +;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
> +;; instructions.  Instead, split these instructions into two separate store
> +;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx 
> instruction.
> +;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
> +;; machines.

... and also if the (undocumented) STORE_VECTOR_PAIR option is specified?

Nothing else jumps out at me.  
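
For readers following along, the effect described in the patch (one paired store versus two single vector stores) can be modelled roughly as below.  This is only a sketch in portable C: the function, the struct and the boolean parameter are hypothetical names, and memcpy stands in for the actual stxvp/stxv instructions.

```c
#include <stdbool.h>
#include <string.h>

/* Counts which form of store was used; hypothetical bookkeeping that
   exists only for this illustration.  */
struct store_stats { int pair_stores; int single_stores; };

/* Store a 32-byte vector pair either as one paired store or as two
   16-byte vector stores, and record which form was used.  */
static void
store_vector_pair (unsigned char *dest, const unsigned char src[32],
                   bool store_vector_pair_enabled,
                   struct store_stats *stats)
{
  if (store_vector_pair_enabled)
    {
      memcpy (dest, src, 32);            /* models one stxvp */
      stats->pair_stores++;
    }
  else
    {
      memcpy (dest, src, 16);            /* models the first stxv */
      memcpy (dest + 16, src + 16, 16);  /* models the second stxv */
      stats->single_stores += 2;
    }
}
```

Either path leaves the same 32 bytes in memory; only the instruction count differs, which is why the split can be done after reload.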

Thanks
-Will




Re: [PATCH 2/3] Disable generating load/store vector pairs for block copies.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 2/3] Disable generating load/store vector pairs for block copies.
> 
> If the store vector pair instruction is disabled, do not generate block
> copies that use load and store vector pair instructions.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs.  Can I check this patch into the
> trunk?  If there are no changes needed for the backports, can I check this
> code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000-string.cc (expand_block_move): If the store
>   vector pair instructions are disabled, do not generate block
>   copies using load and store vector pairs.
> ---
>  gcc/config/rs6000/rs6000-string.cc | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 59d901ac68d..1b18e043269 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -2787,14 +2787,16 @@ expand_block_move (rtx operands[], bool might_overlap)
>rtx src, dest;
>bool move_with_length = false;
> 
> -  /* Use OOmode for paired vsx load/store.  Use V2DI for single
> -  unaligned vsx load/store, for consistency with what other
> -  expansions (compare) already do, and so we can use lxvd2x on
> -  p8.  Order is VSX pair unaligned, VSX unaligned, Altivec, VSX
> -  with length < 16 (if allowed), then gpr load/store.  */
> +  /* Use OOmode for paired vsx load/store unless the store vector pair
> +  instructions are disabled.  Use V2DI for single unaligned vsx
> +  load/store, for consistency with what other expansions (compare)
> +  already do, and so we can use lxvd2x on p8.  Order is VSX pair
> +  unaligned, VSX unaligned, Altivec, VSX with length < 16 (if allowed),
> +  then gpr load/store.  */
> 
>if (TARGET_MMA && TARGET_BLOCK_OPS_UNALIGNED_VSX
> && TARGET_BLOCK_OPS_VECTOR_PAIR
> +   && TARGET_STORE_VECTOR_PAIR
> && bytes >= 32
> && (align >= 256 || !STRICT_ALIGNMENT))


Seems straightforward.  LGTM, 
Thanks
-Will
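
As a reading aid, the updated condition in expand_block_move can be written as a standalone predicate.  This is a sketch, not the GCC code: the struct and function names are hypothetical, and the fields stand in for the TARGET_* macros and STRICT_ALIGNMENT used in the patch.  ALIGN is in bits, as in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the target settings consulted by the check;
   in GCC these are TARGET_* macros, not struct fields.  */
struct target_settings
{
  bool mma;
  bool block_ops_unaligned_vsx;
  bool block_ops_vector_pair;
  bool store_vector_pair;
  bool strict_alignment;
};

/* Return true if a block move may use paired (32-byte) vector loads
   and stores.  ALIGN is in bits, BYTES is the remaining length.  */
static bool
use_vector_pair_for_block_move (const struct target_settings *t,
                                unsigned long bytes, unsigned int align)
{
  return (t->mma
          && t->block_ops_unaligned_vsx
          && t->block_ops_vector_pair
          && t->store_vector_pair
          && bytes >= 32
          && (align >= 256 || !t->strict_alignment));
}
```

The only new term is store_vector_pair; with it false the expansion falls through to the single unaligned VSX case, the next entry in the ordering the comment describes.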




>   {
> -- 
> 2.35.3
> 
> 



Re: [PATCH, V3] Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target 99293

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:31 -0400, Michael Meissner wrote:
> Optimize vec_splats of constant vec_extract for V2DI/V2DF, PR target
> 99293.
> 
> This is version 3 of the patch.  The original patch was:
> 
> > Date: Mon, 28 Mar 2022 12:26:02 -0400
> > Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract
> > for V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
> 
> Version 2 of the patch was:
> 
> > Date: Fri, 13 May 2022 10:49:26 -0400
> > Subject: [PATCH] Optimize vec_splats of constant V2DI/V2DF
> > vec_extract, PR target/99293
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594797.html
> 
> The differences between version 2 and version 3 were to clean up the
> description
> of what the patch does, and to make the example test case clear.
> 
> In PR target/99293, it was pointed out that doing:
> 
>   vector long long dest0, dest1, src;
>   /* ... */
>   dest0 = vec_splats (vec_extract (src, 0));
>   dest1 = vec_splats (vec_extract (src, 1));
> 
> would generate slower code.
> 
> It generates the following code on power8:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   xxpermdi 0,34,34,3
>   xxpermdi 34,0,0,0
> 
>   ;; vec_splats (vec_extract (src, 1))
>   xxlor 0,34,34
>   xxpermdi 34,0,0,0
> 
> However on power9 and power10 it generates:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   mfvsrld 3,34
>   mtvsrdd 34,9,9
> 
>   ;; vec_splats (vec_extract (src, 1))
>   mfvsrd 9,34
>   mtvsrdd 34,9,9
> 
> This is due to the power9 having the mfvsrld instruction which can
> extract
> either 64-bit element into a GPR.  While there are alternatives for
> both
> vector registers and GPR registers, the register allocator prefers to
> put
> DImode into GPR registers.
> 
> In this case, it is better to have a single combiner pattern that can
> generate
> a single xxpermdi, instead of 2 insns (the extract and then the
> concat).
> This is true if the two operations are move from vector register and
> move to
> vector register.  As Segher pointed out in a previous version of the
> patch, the
> combiner already tries creating a (vec_duplicate (vec_select
> ...))
> pattern, but we didn't provide one.
> 
> This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so
> that it now
> uses VEC_DUPLICATE, which the combiner checks for.

Ok.
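
For anyone who wants to see the select-then-duplicate shape outside of RTL, here is a small portable sketch.  It uses the GCC/Clang generic vector extension rather than altivec.h so that it builds on any target; the typedef and function name are hypothetical, and this is only an analogue of vec_splats (vec_extract (src, n)), not the builtin expansion itself.

```c
/* Two 64-bit lanes, using the GCC/Clang generic vector extension as a
   portable stand-in for "vector long long".  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Analogue of vec_splats (vec_extract (src, n)) for n equal to 0 or 1:
   select one lane, then duplicate it into both lanes.  */
static inline v2di
splat_lane (v2di src, int n)
{
  long long elt = src[n];        /* the vec_select step */
  return (v2di) { elt, elt };    /* the vec_duplicate step */
}
```

When both steps stay in vector registers, the pair is what the reworked pattern lets the combiner match as a single xxpermdi.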

> 
> I have built Spec 2017 with this patch installed, and the cam4_r
> benchmark
> is the only benchmark that generated different code (3
> mfvsrld/mtvsrdd
> pairs of instructions were replaced with xxpermdi).
> 
> I have built bootstrap versions on the following systems and I have
> run
> the regression tests.  There were no regressions in the runs:
> 
>   Power9 little endian, --with-cpu=power9
>   Power10 little endian, --with-cpu=power10
>   Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit
> tests)

Ok.


> 
> Can I install this into the trunk?  After a burn-in period, can I
> backport
> and install this into GCC 11 and GCC 10 branches?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
>   PR target/99293
>   * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
>   UNSPEC_VSX_XXSPLTD case.
>   * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
>   (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE.
> 
> gcc/testsuite:
>   PR target/99293
>   * gcc.target/powerpc/builtins-1.c: Update insn count.
>   * gcc.target/powerpc/pr99293.c: New test.
> ---
>  gcc/config/rs6000/rs6000-p8swap.cc|  1 -
>  gcc/config/rs6000/vsx.md  | 19 +++
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c| 51
> +++
>  4 files changed, 62 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 275702fee1b..3160fcbdeca 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -807,7 +807,6 @@ rtx_is_swappable_p (rtx op, unsigned int
> *special)
> case UNSPEC_VUPKLU_V4SF:
>   return 0;
> case UNSPEC_VSPLT_DIRECT:
> -   case UNSPEC_VSX_XXSPLTD:
>   *special = SH_SPLAT;
>   return 1;
> case UNSPEC_REDUC_PLUS:
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1b75538f42f..a1a1ce95195 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -296,7 +296,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_XXPERM
> 
> UNSPEC_VSX_XXSPLTW
> -   UNSPEC_VSX_XXSPLTD
> UNSPEC_VSX_DIVSD
> UNSPEC_VSX_DIVUD
> UNSPEC_VSX_DIVSQ

Ok.

> @@ -4673,16 +4672,18 @@ (define_insn "vsx_vsplt_di"
>  ;; V2DF/V2DI splat for use by vec_splat builtin
>  (define_insn "vsx_xxspltd_"
>[(set (match_operand:VSX_D 0 "vsx_register_operand" 

Re: [PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

2022-06-07 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 10:50 +0800, Kewen.Lin wrote:
> Hi Will,


Hi!

> 
> The whole series looks good to me, thanks!

:-)

> IMHO one place can be further refactored, not sure if it's worth
> updating together in this series, it's ...

Additional comments below.  
I've made note of the comments, and request (ask) that this be
approved, with a pinky promise that I intend to follow up on the
suggestions in my next patch series.


> 
> on 2022/6/7 06:05, will schmidt wrote:
> > [PATCH,RS6000 2/5] Rework the RS6000_BTM defines.
> > 
> > The RS6000_BTM_ definitions are mostly unused after the
> > rs6000
> > builtin code was reworked.  The remaining references can be
> > replaced
> > with the OPTION_MASK_ and MASK_ equivalents.
> > 
> > This patch removes the defines:
> > RS6000_BTM_FRES, RS6000_BTM_FRSQRTE, RS6000_BTM_FRSQRTES,
> > RS6000_BTM_POPCNTD, RS6000_BTM_CELL, RS6000_BTM_DFP,
> > RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_64BIT,
> > RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW
> > RS6000_BTM_MMA, RS6000_BTM_P10.
> > 
> > I note that the BTM -> OPTION_MASK mappings are not always 1-to-1.
> > In particular the BTM_FRES and BTM_FRSQRTE values were both mapped
> > to
> > OPTION_MASK_PPC_GFXOPT, while the BTM_FRE and BTM_FRSQRTES both
> > mapped
> > to OPTION_MASK_POPCNTB.  In total I spent quite a bit of time
> > double-checking these since it looked like copy/paste errors.  I
> > split
> > some of these changes out into a subsequent patch to limit the
> > amount
> > of potential confusion in any particular patch.
> > 
> > gcc/
> > * config/rs6000/rs6000-c.cc: Update comments.
> > * config/rs6000/rs6000.cc (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
> > RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
> > RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_DFP,
> > RS6000_BTM_HARD_FLOAT,RS6000_BTM_LDBL128, RS6000_BTM_FLOAT128,
> > RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10):
> > Replace
> > with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_PPC_GFXOPT,
> > OPTION_MASK_POPCNTB, OPTION_MASK_POPCNTD,
> > OPTION_MASK_FPRND, MASK_64BIT, MASK_POWERPC64,
> > OPTION_MASK_DFP, OPTION_MASK_SOFT_FLOAT, OPTION_MASK_MULTIPLE,
> > OPTION_MASK_FLOAT128_KEYWORD, OPTION_MASK_FLOAT128_HW,
> > OPTION_MASK_MMA, OPTION_MASK_POWER10.
> > * config/rs6000/rs6000.h (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
> > RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
> > RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
> > RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128,
> > RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10):
> > Delete.
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 9c8cbd7a66e4..4c99afc761ae 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -594,13 +594,13 @@ rs6000_target_modify_macros (bool define_p,
> > HOST_WIDE_INT flags,
> >   via the target attribute/pragma.  */
> >if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
> >  rs6000_define_or_undefine_macro (define_p,
> > "__FLOAT128_HARDWARE__");
> >  
> >/* options from the builtin masks.  */
> > -  /* Note that RS6000_BTM_CELL is enabled only if (rs6000_cpu ==
> > - PROCESSOR_CELL) (e.g. -mcpu=cell).  */
> > -  if ((bu_mask & RS6000_BTM_CELL) != 0)
> > +  /* Note that OPTION_MASK_FPRND is enabled only if
> > + (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
> > +  if ((bu_mask & OPTION_MASK_FPRND) != 0)
> >  rs6000_define_or_undefine_macro (define_p, "__PPU__");
> >  
> 
> ... here.  In function rs6000_target_modify_macros, bu_mask is used
> by
> two places, the beginning debug outputting and the above
> OPTION_MASK_FPRND
> check.  I wonder if we can get rid of bu_mask and just use sth. like:
> 
> (rs6000_cpu == PROCESSOR_CELL) && (flags & OPTION_MASK_FPRND)
> 

Agreed.
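
To make sure I am reading the suggestion correctly, a rough sketch of the check without bu_mask would look something like the following.  The enum, the mask value and the function name are hypothetical stand-ins so that the fragment is self-contained; in rs6000-c.cc the result would feed rs6000_define_or_undefine_macro for "__PPU__".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions come from the rs6000
   headers and have different values.  */
enum processor_type { PROCESSOR_POWER8, PROCESSOR_CELL };
#define OPTION_MASK_FPRND (UINT64_C (1) << 5)

/* Decide whether __PPU__ should be defined without consulting bu_mask:
   the CPU check is combined with the ISA flag passed in FLAGS.  */
static bool
should_define_ppu (enum processor_type cpu, uint64_t flags)
{
  return cpu == PROCESSOR_CELL && (flags & OPTION_MASK_FPRND) != 0;
}
```

The explicit CPU test matters because OPTION_MASK_FPRND on its own can be set for other processors, whereas the old builtin-mask bit was only ever set for -mcpu=cell.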

> // the others are using "flags &", it's passed by rs6000_isa_flags,
> // should be the same as just using OPTION_MASK_FPRND.
> 
> If we drop bu_mask in function rs6000_target_modify_macros, function

> rs6000_builtin_mask_calculate will have only one use place in
> function
> rs6000_option_override_internal.  IMHO this function
> rs6000_builtin_mask_calculate also becomes stale after built-in
> function
> rewriting and needs some updates with new bif framework later.

The DEBUG output using the builtin_mask still appeared to have some
potential value, but I can make a point to investigate that further.

I do have in my queue to try to resolve PR 101865, that is the bug with
ARCH_PWR8.  I got into this OPTION_MASK side-quest as part of the
investigation into that bug.   I can make a point to investigate and
clean up the bu_mask usage as part of that series.

Thanks
-Will

> 
> BR,
> Kewen



[PATCH,RS6000 4/5] Replace MASK_ with OPTION_MASK_

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 4/5] Replace MASK_ with OPTION_MASK_

This replaces the MASK_ references with OPTION_MASK_
and removes the now unused defines.

This patch removes the defines for
MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO, MASK_DFP,
MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_FPRND, MASK_P8_FUSION, MASK_HARD_FLOAT,
MASK_HTM, MASK_MFCRF, MASK_MMA, MASK_MULHW, MASK_MULTIPLE,
MASK_NO_UPDATE.

gcc/
* config/rs6000/aix71.h (TARGET_DEFAULT): Replace MASK_MFCRF with
OPTION_MASK_MFCRF.
* config/rs6000/darwin.h (TARGET_DEFAULT): Replace MASK_MULTIPLE with
OPTION_MASK_MULTIPLE.
* config/rs6000/darwin64-biarch.h (TARGET_DEFAULT): Same.
* config/rs6000/default.h (TARGET_DEFAULT): Replace MASK_MFCRF with
OPTION_MASK_MFCRF.
* config/rs6000/eabi.h (TARGET_DEFAULT): Replace MASK_EABI with
OPTION_MASK_EABI.
* config/rs6000/eabialtivec.h (TARGET_DEFAULT): Same.
* config/rs6000/linuxaltivec.h (TARGET_DEFAULT): Replace
MASK_ALTIVEC with OPTION_MASK_ALTIVEC.
* config/rs6000/rs6000-cpus.def (MASK_ALTIVEC, MASK_CMPB,
MASK_CRYPTO, MASK_DFP, MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI,
MASK_FLOAT128_KEYWORD, MASK_FLOAT128_HW, MASK_FPRND,
MASK_P8_FUSION, MASK_HARD_FLOAT, MASK_HTM, MASK_ISEL, MASK_MFCRF,
MASK_MMA, MASK_MULHW, MASK_MULTIPLE, MASK_NO_UPDATE):
Replace with
OPTION_MASK_ALTIVEC, OPTION_MASK_CMPB, OPTION_MASK_CRYPTO,
OPTION_MASK_DFP, OPTION_MASK_DIRECT_MOVE, OPTION_MASK_DLMZB,
OPTION_MASK_EABI, OPTION_MASK_FLOAT128_KEYWORD,
OPTION_MASK_FLOAT128_HW, OPTION_MASK_FPRND, OPTION_MASK_P8_FUSION,
OPTION_MASK_HARD_FLOAT, OPTION_MASK_HTM, OPTION_MASK_ISEL,
OPTION_MASK_MFCRF, OPTION_MASK_MMA, OPTION_MASK_MULHW,
OPTION_MASK_MULTIPLE, OPTION_MASK_NO_UPDATE.
* config/rs6000/rs6000.cc (rs6000_darwin_file_start): Replace
MASK_MFCRF, MASK_ALTIVEC with OPTION_MASK_MFCRF, OPTION_MASK_ALTIVEC.
* config/rs6000/rs6000.h (TARGET_DEFAULT): Replace MASK_MULTIPLE
with OPTION_MASK_MULTIPLE.
(MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO, MASK_DFP,
MASK_DIRECT_MOVE, MASK_DLMZB, MASK_EABI, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_FPRND, MASK_P8_FUSION, MASK_HARD_FLOAT,
MASK_HTM, MASK_ISEL, MASK_MFCRF, MASK_MMA, MASK_MULHW,
MASK_MULTIPLE, MASK_NO_UPDATE): Delete.
* config/rs6000/vxworks.h (TARGET_DEFAULT): Replace MASK_EABI
with OPTION_MASK_EABI.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 57e07bcc65ee..3f7e6e380ca8 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,13 +135,14 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF)
+#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index b5cef42610f7..ec02022c6a9f 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 57b0fab084e3..a53e567f8b73 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 4bf0feef2f8e..f3a81404eff3 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -22,14 +22,16 @@ along with GCC; see the file COPYING3.  If not see
 

[PATCH,RS6000 5/5] Replace MASK_ usage with OPTION_MASK_

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 5/5] Replace MASK_ usage with OPTION_MASK_

This continues the changes of replacing the MASK_ defines
with their OPTION_MASK_ equivalents.

This patch removes the defines for
MASK_P8_VECTOR, MASK_P9_VECTOR, MASK_P9_MISC, MASK_POPCNTB,
MASK_POPCNTD, MASK_PPC_GFXOPT, MASK_PPC_GPOPT, MASK_RECIP_PRECISION,
MASK_SOFT_FLOAT, MASK_VSX, MASK_POWER10, MASK_P10_FUSION.

gcc/
* config/rs6000/aix71.h (MASK_PPC_GPOPT, MASK_PPC_GFXOPT): Replace with
OPTION_MASK_PPC_GPOPT, OPTION_MASK_PPC_GFXOPT.
* config/rs6000/darwin.h (MASK_PPC_GFXOPT): Replace with
OPTION_MASK_PPC_GFXOPT.
* config/rs6000/darwin64-biarch.h (MASK_PPC_GFXOPT): Same.
* config/rs6000/default64.h (MASK_PPC_GPOPT, MASK_PPC_GFXOPT): Replace 
with
OPTION_MASK_PPC_GPOPT, OPTION_MASK_PPC_GFXOPT.
* config/rs6000/rs6000-c.cc: Update comment.
* config/rs6000/rs6000-cpus.def: Update RS6000_CPU macro calls.
* config/rs6000/rs6000.cc (rs6000_darwin_file_start): Replace
MASK_PPC_GPOPT with OPTION_MASK_PPC_GPOPT.
(rs6000_builtin_mask_names): Replace MASK_PPC_GFXOPT, MASK_POPCNTB
with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_POPCNTB.
* config/rs6000/rs6000.h: (MASK_P8_VECTOR, MASK_P9_VECTOR,
MASK_P9_MISC, MASK_POPCNTB, MASK_POPCNTD, MASK_PPC_GFXOPT,
MASK_PPC_GPOPT, MASK_RECIP_PRECISION, MASK_SOFT_FLOAT,
MASK_VSX, MASK_POWER10, MASK_P10_FUSION): Delete.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 3f7e6e380ca8..323d7c884d18 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,14 +135,15 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT \
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | OPTION_MASK_MFCRF)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index ec02022c6a9f..6a8845eb3bb7 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index a53e567f8b73..6515bcc8bf5a 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | OPTION_MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index f3a81404eff3..0bec94935e2b 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -28,10 +28,10 @@ along with GCC; see the file COPYING3.  If not see
| MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
 #else
 #undef TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_PPC_GFXOPT | MASK_PPC_GPOPT \
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
| OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
 #endif
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 4c99afc761ae..0d13645040ff 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -382,11 +382,11 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags,
 
  3. If either of the above two conditions apply except that the
TARGET_DEFAULT macro is defined to equal zero, and
TARGET_POWERPC64 and
a) BYTES_BIG_ENDIAN and the flag to be enabled is either
-  MASK_PPC_GFXOPT or MASK_POWERPC64 (flags for "powerpc64"
+  OPTION_MASK_PPC_GFXOPT or MASK_POWERPC64 (flags for "powerpc64"
   

[PATCH, RS6000 3/5] Rework the RS6000_BTM defines, continued.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH, RS6000 3/5] Rework the RS6000_BTM defines, continued.

The RS6000_BTM_ definitions are mostly unused after
the rs6000 builtin code was reworked.   This cleans
up the remaining RS6000_BTM_ references by replacing
them with their OPTION_MASK_ equivalents.

This patch removes the defines
RS6000_BTM_MODULO, RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB,
RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR, RS6000_BTM_P9_VECTOR,
RS6000_BTM_P9_MISC, RS6000_BTM_CRYPTO, RS6000_BTM_HTM,
RS6000_BTM_FRE.

gcc/
* config/rs6000/rs6000.cc (RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB,
RS6000_BTM_VSX, RS6000_BTM_FRE, RS6000_BTM_P8_VECTOR,
RS6000_BTM_P9_VECTOR, RS6000_BTM_P9_MISC, RS6000_BTM_MODULO,
RS6000_BTM_CRYPTO, RS6000_BTM_HTM): Replace with OPTION_MASK_ALTIVEC,
OPTION_MASK_CMPB, OPTION_MASK_VSX, OPTION_MASK_POPCNTB,
OPTION_MASK_P8_VECTOR, OPTION_MASK_P9_VECTOR, OPTION_MASK_P9_MISC,
OPTION_MASK_MODULO, OPTION_MASK_CRYPTO, OPTION_MASK_HTM.
* config/rs6000/rs6000.h (RS6000_BTM_MODULO, RS6000_BTM_ALTIVEC,
RS6000_BTM_CMPB, RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR,
RS6000_BTM_P9_VECTOR, RS6000_BTM_P9_MISC, RS6000_BTM_CRYPTO,
RS6000_BTM_HTM, RS6000_BTM_FRE): Remove.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 253110910bfa..6b7a6db9a445 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3377,27 +3377,27 @@ darwin_rs6000_override_options (void)
bits, and some options are no longer in target_flags.  */
 
 HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
-  return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
- | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
- | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
- | ((TARGET_FRE)   ? RS6000_BTM_FRE   : 0)
+  return (((TARGET_ALTIVEC)? OPTION_MASK_ALTIVEC: 0)
+ | ((TARGET_CMPB)  ? OPTION_MASK_CMPB   : 0)
+ | ((TARGET_VSX)   ? OPTION_MASK_VSX: 0)
+ | ((TARGET_FRE)   ? OPTION_MASK_POPCNTB: 0)
  | ((TARGET_FRES)  ? OPTION_MASK_PPC_GFXOPT : 0)
  | ((TARGET_FRSQRTE)   ? OPTION_MASK_PPC_GFXOPT : 0)
  | ((TARGET_FRSQRTES)  ? OPTION_MASK_POPCNTB: 0)
  | ((TARGET_POPCNTD)   ? OPTION_MASK_POPCNTD: 0)
  | ((rs6000_cpu == PROCESSOR_CELL) ? OPTION_MASK_FPRND  : 0)
- | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
- | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
- | ((TARGET_P9_MISC)   ? RS6000_BTM_P9_MISC   : 0)
- | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
+ | ((TARGET_P8_VECTOR) ? OPTION_MASK_P8_VECTOR  : 0)
+ | ((TARGET_P9_VECTOR) ? OPTION_MASK_P9_VECTOR  : 0)
+ | ((TARGET_P9_MISC)   ? OPTION_MASK_P9_MISC: 0)
+ | ((TARGET_MODULO)? OPTION_MASK_MODULO : 0)
  | ((TARGET_64BIT) ? MASK_64BIT : 0)
  | ((TARGET_POWERPC64) ? MASK_POWERPC64 : 0)
- | ((TARGET_CRYPTO)? RS6000_BTM_CRYPTO: 0)
- | ((TARGET_HTM)   ? RS6000_BTM_HTM   : 0)
+ | ((TARGET_CRYPTO)? OPTION_MASK_CRYPTO : 0)
+ | ((TARGET_HTM)   ? OPTION_MASK_HTM: 0)
  | ((TARGET_DFP)   ? OPTION_MASK_DFP: 0)
  | ((TARGET_HARD_FLOAT)? OPTION_MASK_SOFT_FLOAT : 0)
  | ((TARGET_LONG_DOUBLE_128
  && TARGET_HARD_FLOAT
  && !TARGET_IEEEQUAD)  ? OPTION_MASK_MULTIPLE   : 0)
@@ -24044,23 +24044,23 @@ static struct rs6000_opt_mask const 
rs6000_opt_masks[] =
 };
 
 /* Builtin mask mapping for printing the flags.  */
 static struct rs6000_opt_mask const rs6000_builtin_mask_names[] =
 {
-  { "altivec",  RS6000_BTM_ALTIVEC,false, false },
-  { "vsx",  RS6000_BTM_VSX,false, false },
-  { "fre",  RS6000_BTM_FRE,false, false },
+  { "altivec",  OPTION_MASK_ALTIVEC,   false, false },
+  { "vsx",  OPTION_MASK_VSX,   false, false },
+  { "fre",  OPTION_MASK_POPCNTB,   false, false },
   { "fres", OPTION_MASK_PPC_GFXOPT, false, false },
   { "frsqrte",  OPTION_MASK_PPC_GFXOPT, false, false },
   { "frsqrtes", OPTION_MASK_POPCNTB,   false, false },
   { "popcntd",  OPTION_MASK_POPCNTD,   false, false },
   { "cell", OPTION_MASK_FPRND, false, false },
-  { "power8-vector",RS6000_BTM_P8_VECTOR,  false, false },
-  { "power9-vector",RS6000_BTM_P9_VECTOR,  false, false },
-  { "power9-misc",  RS6000_BTM_P9_MISC,false, false },
-  { "crypto",  

[PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 2/5] Rework the RS6000_BTM defines.

The RS6000_BTM_ definitions are mostly unused after the rs6000
builtin code was reworked.  The remaining references can be replaced
with the OPTION_MASK_ and MASK_ equivalents.

This patch removes the defines:
RS6000_BTM_FRES, RS6000_BTM_FRSQRTE, RS6000_BTM_FRSQRTES,
RS6000_BTM_POPCNTD, RS6000_BTM_CELL, RS6000_BTM_DFP,
RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128, RS6000_BTM_64BIT,
RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW,
RS6000_BTM_MMA, RS6000_BTM_P10.

I note that the BTM -> OPTION_MASK mappings are not always 1-to-1.
In particular, the BTM_FRES and BTM_FRSQRTE values were both mapped to
OPTION_MASK_PPC_GFXOPT, while the BTM_FRE and BTM_FRSQRTES values were
both mapped to OPTION_MASK_POPCNTB.  I spent quite a bit of time
double-checking these since they looked like copy/paste errors.  I split
some of these changes out into a subsequent patch to limit the amount
of potential confusion in any particular patch.
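
To make the many-to-one mapping concrete, here is a minimal standalone
sketch.  It is not the GCC code: the SKETCH_ bit values, the
sketch_target struct and sketch_builtin_mask() are hypothetical names
invented for illustration, and only the pairing of conditions to masks
follows the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values chosen only for this sketch; the real
   OPTION_MASK_ values come from the GCC options machinery.  */
#define SKETCH_MASK_PPC_GFXOPT (UINT64_C (1) << 0)
#define SKETCH_MASK_POPCNTB    (UINT64_C (1) << 1)

/* Hypothetical stand-in for the TARGET_FRE etc. conditions.  */
struct sketch_target
{
  bool fre, fres, frsqrte, frsqrtes;
};

/* Four conditions feed only two distinct mask bits, so the resulting
   mask cannot tell FRE from FRSQRTES, or FRES from FRSQRTE.  */
uint64_t
sketch_builtin_mask (const struct sketch_target *t)
{
  return ((t->fre ? SKETCH_MASK_POPCNTB : 0)
          | (t->fres ? SKETCH_MASK_PPC_GFXOPT : 0)
          | (t->frsqrte ? SKETCH_MASK_PPC_GFXOPT : 0)
          | (t->frsqrtes ? SKETCH_MASK_POPCNTB : 0));
}
```

With this sketch, a target with only fre set and a target with only
frsqrtes set produce the same mask value, which is why the mappings
look like copy/paste errors at first glance.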

gcc/
* config/rs6000/rs6000-c.cc: Update comments.
* config/rs6000/rs6000.cc (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_DFP,
RS6000_BTM_HARD_FLOAT,RS6000_BTM_LDBL128, RS6000_BTM_FLOAT128,
RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10): Replace
with OPTION_MASK_PPC_GFXOPT, OPTION_MASK_PPC_GFXOPT,
OPTION_MASK_POPCNTB, OPTION_MASK_POPCNTD,
OPTION_MASK_FPRND, MASK_64BIT, MASK_POWERPC64,
OPTION_MASK_DFP, OPTION_MASK_SOFT_FLOAT, OPTION_MASK_MULTIPLE,
OPTION_MASK_FLOAT128_KEYWORD, OPTION_MASK_FLOAT128_HW,
OPTION_MASK_MMA, OPTION_MASK_POWER10.
* config/rs6000/rs6000.h (RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_POPCNTD, RS6000_BTM_CELL,
RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
RS6000_BTM_64BIT, RS6000_BTM_POWERPC64, RS6000_BTM_FLOAT128,
RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA, RS6000_BTM_P10): Delete.

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 9c8cbd7a66e4..4c99afc761ae 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -594,13 +594,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags,
  via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
 rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
   /* options from the builtin masks.  */
-  /* Note that RS6000_BTM_CELL is enabled only if (rs6000_cpu ==
- PROCESSOR_CELL) (e.g. -mcpu=cell).  */
-  if ((bu_mask & RS6000_BTM_CELL) != 0)
+  /* Note that OPTION_MASK_FPRND is enabled only if
+ (rs6000_cpu == PROCESSOR_CELL) (e.g. -mcpu=cell).  */
+  if ((bu_mask & OPTION_MASK_FPRND) != 0)
 rs6000_define_or_undefine_macro (define_p, "__PPU__");
 
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
 rs6000_define_or_undefine_macro (define_p, "__MMA__");
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d4defc855d02..253110910bfa 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3381,32 +3381,32 @@ rs6000_builtin_mask_calculate (void)
 {
   return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
  | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
  | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
  | ((TARGET_FRE)   ? RS6000_BTM_FRE   : 0)
- | ((TARGET_FRES)  ? RS6000_BTM_FRES  : 0)
- | ((TARGET_FRSQRTE)   ? RS6000_BTM_FRSQRTE   : 0)
- | ((TARGET_FRSQRTES)  ? RS6000_BTM_FRSQRTES  : 0)
- | ((TARGET_POPCNTD)   ? RS6000_BTM_POPCNTD   : 0)
- | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL  : 0)
+ | ((TARGET_FRES)  ? OPTION_MASK_PPC_GFXOPT : 0)
+ | ((TARGET_FRSQRTE)   ? OPTION_MASK_PPC_GFXOPT : 0)
+ | ((TARGET_FRSQRTES)  ? OPTION_MASK_POPCNTB: 0)
+ | ((TARGET_POPCNTD)   ? OPTION_MASK_POPCNTD: 0)
+ | ((rs6000_cpu == PROCESSOR_CELL) ? OPTION_MASK_FPRND  : 0)
  | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
  | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
  | ((TARGET_P9_MISC)   ? RS6000_BTM_P9_MISC   : 0)
  | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
- | ((TARGET_64BIT) ? RS6000_BTM_64BIT : 0)
- | ((TARGET_POWERPC64) ? RS6000_BTM_POWERPC64 : 0)
+ | ((TARGET_64BIT) ? MASK_64BIT : 0)
+ | ((TARGET_POWERPC64) ? MASK_POWERPC64 : 0)
  | ((TARGET_CRYPTO)? 

[PATCH,RS6000 1/5] Clean-up MASK_ and RS6000_BTM_ definitions.

2022-06-06 Thread will schmidt via Gcc-patches
[PATCH,RS6000 1/5] Clean-up MASK_ and RS6000_BTM_ definitions.

Hi,

This patch removes the defines that are no longer used, and
updates the comment for the set of MASK_ defines.

This patch removes the defines for
MASK_REGNAMES, MASK_PROTOTYPE, RS6000_BTM_ALWAYS, RS6000_BTM_COMMON.

gcc/
* config/rs6000/rs6000.h (RS6000_BTM_COMMON, RS6000_BTM_ALWAYS,
MASK_REGNAMES, OPTION_MASK_REGNAMES, MASK_PROTOTYPE,
OPTION_MASK_PROTOTYPE, MASK_UPDATE, OPTION_MASK_UPDATE): Remove.

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3b8941a86584..2ff17a16e43c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -503,12 +503,13 @@ extern int rs6000_vector_align[];
answers if the arguments are not in the normal range.  */
 #define TARGET_MINMAX  (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
 /* In switching from using target_flags to using rs6000_isa_flags, the options
-   machinery creates OPTION_MASK_ instead of MASK_.  For now map
-   OPTION_MASK_ back into MASK_.  */
+   machinery creates OPTION_MASK_ instead of MASK_.  The MASK_
+   options that have not yet been replaced by their OPTION_MASK_
+   equivalents are defined here.  */
 #define MASK_ALTIVEC   OPTION_MASK_ALTIVEC
 #define MASK_CMPB  OPTION_MASK_CMPB
 #define MASK_CRYPTOOPTION_MASK_CRYPTO
 #define MASK_DFP   OPTION_MASK_DFP
 #define MASK_DIRECT_MOVE   OPTION_MASK_DIRECT_MOVE
@@ -534,11 +535,10 @@ extern int rs6000_vector_align[];
 #define MASK_PPC_GFXOPTOPTION_MASK_PPC_GFXOPT
 #define MASK_PPC_GPOPT OPTION_MASK_PPC_GPOPT
 #define MASK_RECIP_PRECISION   OPTION_MASK_RECIP_PRECISION
 #define MASK_SOFT_FLOATOPTION_MASK_SOFT_FLOAT
 #define MASK_STRICT_ALIGN  OPTION_MASK_STRICT_ALIGN
-#define MASK_UPDATEOPTION_MASK_UPDATE
 #define MASK_VSX   OPTION_MASK_VSX
 #define MASK_POWER10   OPTION_MASK_POWER10
 #define MASK_P10_FUSIONOPTION_MASK_P10_FUSION
 
 #ifndef IN_LIBGCC2
@@ -551,18 +551,10 @@ extern int rs6000_vector_align[];
 
 #ifdef TARGET_LITTLE_ENDIAN
 #define MASK_LITTLE_ENDIAN OPTION_MASK_LITTLE_ENDIAN
 #endif
 
-#ifdef TARGET_REGNAMES
-#define MASK_REGNAMES  OPTION_MASK_REGNAMES
-#endif
-
-#ifdef TARGET_PROTOTYPE
-#define MASK_PROTOTYPE OPTION_MASK_PROTOTYPE
-#endif
-
 #ifdef TARGET_MODULO
 #define RS6000_BTM_MODULO  OPTION_MASK_MODULO
 #endif
 
 
@@ -2250,11 +2242,10 @@ extern int frame_pointer_needed;
 
 
 /* Builtin targets.  For now, we reuse the masks for those options that are in
target flags, and pick a random bit for ldbl128, which isn't in
target_flags.  */
-#define RS6000_BTM_ALWAYS  0   /* Always enabled.  */
 #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors.  */
 #define RS6000_BTM_CMPBMASK_CMPB   /* ISA 2.05: compare 
bytes.  */
 #define RS6000_BTM_VSX MASK_VSX/* VSX (vector/scalar).  */
 #define RS6000_BTM_P8_VECTOR   MASK_P8_VECTOR  /* ISA 2.07 vector.  */
 #define RS6000_BTM_P9_VECTOR   MASK_P9_VECTOR  /* ISA 3.0 vector.  */
@@ -2275,32 +2266,10 @@ extern int frame_pointer_needed;
 #define RS6000_BTM_FLOAT128MASK_FLOAT128_KEYWORD /* IEEE 128-bit float.  */
 #define RS6000_BTM_FLOAT128_HW MASK_FLOAT128_HW /* IEEE 128-bit float h/w.  */
 #define RS6000_BTM_MMA MASK_MMA/* ISA 3.1 MMA.  */
 #define RS6000_BTM_P10 MASK_POWER10
 
-#define RS6000_BTM_COMMON  (RS6000_BTM_ALTIVEC \
-| RS6000_BTM_VSX   \
-| RS6000_BTM_P8_VECTOR \
-| RS6000_BTM_P9_VECTOR \
-| RS6000_BTM_P9_MISC   \
-| RS6000_BTM_MODULO\
-| RS6000_BTM_CRYPTO\
-| RS6000_BTM_FRE   \
-| RS6000_BTM_FRES  \
-| RS6000_BTM_FRSQRTE   \
-| RS6000_BTM_FRSQRTES  \
-| RS6000_BTM_HTM   \
-| RS6000_BTM_POPCNTD   \
-| RS6000_BTM_CELL  \
-| RS6000_BTM_DFP   \
-| RS6000_BTM_HARD_FLOAT\
-| RS6000_BTM_LDBL128   \
-   

[PATCH,RS6000 0/5] Clean up MASK_ and RS6000_BTM_ defines

2022-06-06 Thread will schmidt via Gcc-patches
Hi,
  This series cleans up the assorted MASK_, OPTION_MASK_,
and RS6000_BTM_ defines that we have sprinkled through the
rs6000 target code.

The MASK_ entries are currently defined as their OPTION_MASK_
equivalents since their introduction when the rs6000_isa_flags was
added via commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 .
This series replaces references to the MASK_ entries with their
OPTION_MASK equivalents as much as possible.
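
As a rough illustration of why the two spellings are interchangeable,
the sketch below repeats the alias pattern from rs6000.h in a
standalone file.  The bit value and the sketch_has_altivec() helper are
assumptions made up for this example; only the
"#define MASK_ALTIVEC OPTION_MASK_ALTIVEC" line mirrors the header.

```c
#include <stdint.h>

/* Hypothetical bit value; in GCC this is generated from the .opt files.  */
#define OPTION_MASK_ALTIVEC (UINT64_C (1) << 0)

/* The alias: both names expand to the same constant, so replacing one
   with the other is a purely textual change.  */
#define MASK_ALTIVEC OPTION_MASK_ALTIVEC

/* Hypothetical helper that tests the bit through the old name.  */
int
sketch_has_altivec (uint64_t isa_flags)
{
  return (isa_flags & MASK_ALTIVEC) != 0;
}
```

Because the preprocessor substitutes the same value either way, a
conversion of this kind should produce no functional change.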

The RS6000_BTM_ defines are mostly unused since the built-in rewrites
from late 2021 and early 2022, and the remaining usage is
straightforward to replace with OPTION_MASK_ values.

The OPTION_MASK_ definitions themselves remain.

Due to size and to keep some of these changes clean I have split this
into several parts.

After this series there are a few remaining MASK_ entries
(MASK_POWERPC64, MASK_64BIT and MASK_LITTLE_ENDIAN) which are
conditionally defined, and potentially more invasive to resolve.
Those are deliberately not addressed as part of this series.

This has cleanly regtested (no functional change).  When approved
this series will be committed as a group, though it should be
bisectable.

OK for trunk?

1/5: Remove unused defines and touch up comments.
2/5: Rework RS6000_BTM_foo defines, part 1.
3/5: Rework RS6000_BTM_foo defines, part 2.
4/5: Rework MASK_foo defines, part 1.
5/5: Rework MASK_foo defines, part 2.



Re: [PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-26 Thread will schmidt via Gcc-patches
On Thu, 2022-05-26 at 13:31 -0500, Segher Boessenkool wrote:
> > > 



> On Thu, May 26, 2022 at 09:40:18AM -0500, will schmidt wrote:
> > On Thu, 2022-05-26 at 05:47 -0500, Segher Boessenkool wrote:
> 
> > I'll dig a bit more, but would handle that in a separate
> > patch.
> 
> Can you please make a new patch series that just does everything?  This
> is so much easier to handle for everyone, even you yourself :-)

Yes, will do.  Thanks.


-Will


> 
> First some small preparatory patches; then the long *boring* patches
> that are the meat of the matter, but are completely mechanical
> (formatting notwithstanding), so are easy to review; and then some more
> small patches to do final cleanup.
> 
> So each patch will be easy to write, write a commit message for, write a
> changelog for, and easy to review as well.  Long patches are no problem
> at all if they are completely boring!
> 
> 
> Segher



Re: [PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-26 Thread will schmidt via Gcc-patches
On Thu, 2022-05-26 at 05:47 -0500, Segher Boessenkool wrote:
> Hi!
> 

Hi, 
Thanks Kewen and Segher for the reviews.  Additional comments below.


> On Thu, May 26, 2022 at 03:01:37PM +0800, Kewen.Lin wrote:
> > on 2022/5/26 14:12, Kewen.Lin via Gcc-patches wrote:
> > > on 2022/5/26 04:25, will schmidt via Gcc-patches wrote:
> > > > We have an assortment of MASK and OPTION_MASK #defines
> > > > throughout
> > > > the rs6000 code, MASK_ALTIVEC and OPTION_MASK_ALTIVEC as an
> > > > example.
> > > > 
> > > > We currently #define the MASK_ entries to their
> > > > OPTION_MASK_
> > > > equivalents so the two names could be used interchangeably.
> > > > 
> > > > The mapping is in place from when we switched from using
> > > > target_flags to rs6000_isa_flags via
> > > > commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 in 2012.
> > > > 
> > > > This patch converts the references for most of the lingering
> > > > MASK_*
> > > > values to OPTION_MASK_*  and removes the now redundant defines.
> > > 
> > > Nice, thanks for the cleanup!
> 
> +1
> 
> > > I found there are still some masks left:
> > > 
> > > MASK_POWERPC64, MASK_64BIT and MASK_LITTLE_ENDIAN.
> > > 
> > > Is there one part 4 for them?  Or is there some particular reason
> > > not to clean up them?
> > 
> > aha, I see.  Those three are conditional definitions, I agree it's
> > better
> > to leave them alone. :)
> 
> It is much better to untangle this mess, and fix it :-)  But that is
> (potentially) a bigger job, of course, so let's not balloon this
> patch.

Right.  I have looked briefly at those, and was not convinced those
three would be trivial to rework.  In the interest of incremental
progress I didn't address those in this set.   :-)   If anything I'll
address those in a later patch, which could be part 4 but will more
likely be a different patchset.

> 
> > > > -{ "970", "ppc970", MASK_PPC_GPOPT | MASK_MFCRF |
> > > > MASK_POWERPC64 },
> > > > +{ "970", "ppc970", OPTION_MASK_PPC_GPOPT |
> > > > OPTION_MASK_MFCRF | MASK_POWERPC64 },
> > 
> > Nit: This line is too long.
> 
Yup, I missed that one. :-)

> Yeah, the longer names are a bit annoying in any case.  We'll get
> used
> to it (if those long lines are fixed ;-) )

Agree.  I would not be opposed to somewhat shorter names for these, but
naming is hard, and the long names are existing and sufficient for the
moment.

> 
> > Nit: Some of these BTM lines below exceed 80 characters, a few
> > already existed
> > previously.
> 
> Yes, and it is easily avoidable in this case.  Most of these comments
> have no content at all, and the rest could just be on separate lines.
> 
> But, are those builtin masks still used at all?  Can't we just use
> the
> option masks where they still are?  The builtins do not use them
> anymore :-)

They are still referenced in the rs6000_builtin_mask_calculate()
function, which is used to assign a value to rs6000_builtin_mask, which
is still in use.  I have not yet dug deeper there, but I agree it
appears to be used only to print the current options, so it could
probably be safely eliminated.  I'll dig a bit more, but would handle
that in a separate patch.
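
For reference, printing flags from a mask boils down to a table walk
like the sketch below.  This is a simplified standalone illustration,
not the rs6000 code: sketch_opt_mask, sketch_names and
sketch_format_mask() are hypothetical, and the bit values are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced name/mask pair for this sketch.  */
struct sketch_opt_mask
{
  const char *name;
  uint64_t mask;
};

/* Made-up bit assignments, for illustration only.  */
static const struct sketch_opt_mask sketch_names[] = {
  { "altivec", UINT64_C (1) << 0 },
  { "vsx", UINT64_C (1) << 1 },
  { "htm", UINT64_C (1) << 2 },
};

/* Write the space-separated names of the bits set in MASK into BUF
   (of size LEN) and return how many names were written.  Stops early,
   leaving BUF terminated, if the next name would not fit.  */
int
sketch_format_mask (uint64_t mask, char *buf, size_t len)
{
  int count = 0;
  size_t used = 0;

  if (len == 0)
    return 0;
  buf[0] = '\0';

  for (size_t i = 0; i < sizeof sketch_names / sizeof sketch_names[0]; i++)
    {
      if ((mask & sketch_names[i].mask) == 0)
        continue;

      int n = snprintf (buf + used, len - used, "%s%s",
                        count ? " " : "", sketch_names[i].name);
      if (n < 0 || (size_t) n >= len - used)
        {
          buf[used] = '\0';	/* Drop the partially written name.  */
          break;
        }
      used += (size_t) n;
      count++;
    }
  return count;
}
```

If the mask really is only consumed this way, the same walk could be
done over the option flags directly, which is the argument for
eliminating the separate builtin mask.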

Thanks
-Will


> 
> 
> Segher



[PATCH, rs6000] Clean up the option_mask defines (part 3)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 3)

Hi,

Per code review, the MASK_REGNAMES, OPTION_MASK_REGNAMES,
MASK_PROTOTYPE, OPTION_MASK_PROTOTYPE options are not used
elsewhere in the codebase.  Thus it should be safe to remove them.

This includes an update to a nearby comment to hint that most
of the MASK_ options have now been replaced with their
OPTION_MASK_ equivalents.

Regtested OK on power10.  OK for trunk?

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index dcf632c1f1ad..fe77a343d2e1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -503,12 +503,13 @@ extern int rs6000_vector_align[];
answers if the arguments are not in the normal range.  */
 #define TARGET_MINMAX  (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
 /* In switching from using target_flags to using rs6000_isa_flags, the options
-   machinery creates OPTION_MASK_ instead of MASK_.  For now map
-   OPTION_MASK_ back into MASK_.  */
+   machinery creates OPTION_MASK_ instead of MASK_.  The MASK_
+   options that have not yet been replaced by their OPTION_MASK_
+   equivalents are defined here.  */
 #define MASK_STRICT_ALIGN  OPTION_MASK_STRICT_ALIGN
 
 #ifndef IN_LIBGCC2
 #define MASK_POWERPC64 OPTION_MASK_POWERPC64
 #endif
@@ -519,18 +520,10 @@ extern int rs6000_vector_align[];
 
 #ifdef TARGET_LITTLE_ENDIAN
 #define MASK_LITTLE_ENDIAN OPTION_MASK_LITTLE_ENDIAN
 #endif
 
-#ifdef TARGET_REGNAMES
-#define MASK_REGNAMES  OPTION_MASK_REGNAMES
-#endif
-
-#ifdef TARGET_PROTOTYPE
-#define MASK_PROTOTYPE OPTION_MASK_PROTOTYPE
-#endif
-
 #ifdef TARGET_MODULO
 #define RS6000_BTM_MODULO  OPTION_MASK_MODULO
 #endif
 
 



Re: [PATCH, rs6000] Clean up the option_mask defines (part 2)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 2)

Hi,
This patch reworks most of the lingering MASK_*
values to OPTION_MASK_* and removes the now redundant defines.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR, RS6000_BTM_P9_VECTOR,
RS6000_BTM_P9_MISC, RS6000_BTM_HTM, RS6000_BTM_POPCNTD,
RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA,
RS6000_BTM_P10): Rework defines to use OPTION_MASK_.
(MASK_DFP, MASK_DIRECT_MOVE, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_P8_FUSION, MASK_HARD_FLOAT, MASK_HTM,
MASK_MMA, MASK_MULTIPLE, MASK_NO_UPDATE, MASK_P8_VECTOR,
MASK_P9_VECTOR, MASK_P9_MISC, MASK_POPCNTD, MASK_RECIP_PRECISION,
MASK_SOFT_FLOAT, MASK_UPDATE, MASK_VSX, MASK_POWER10,
MASK_P10_FUSION): Remove unused defines.
* config/rs6000/rs6000-cpus.def (RS6000_CPU): Rework macro calls to
use OPTION_MASK_ defines.
* config/rs6000/darwin.h (TARGET_DEFAULT): Update define to use
OPTION_MASK_MULTIPLE.
* config/rs6000/darwin64-biarch.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 86556ccbbf58..6a8845eb3bb7 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 6a700c61c4c2..6515bcc8bf5a 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index ca78bd8cf89f..4301b1bcb120 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -174,29 +174,31 @@
 
RS6000_CPU (NAME, CPU, FLAGS)
 
where the arguments are the fields of struct rs6000_ptt.  */
 
-RS6000_CPU ("401", PROCESSOR_PPC403, MASK_SOFT_FLOAT)
-RS6000_CPU ("403", PROCESSOR_PPC403, MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
-RS6000_CPU ("405", PROCESSOR_PPC405, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
-   | OPTION_MASK_DLMZB)
+RS6000_CPU ("401", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT)
+RS6000_CPU ("403", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT | 
MASK_STRICT_ALIGN)
+RS6000_CPU ("405", PROCESSOR_PPC405, OPTION_MASK_SOFT_FLOAT
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
 RS6000_CPU ("405fp", PROCESSOR_PPC405, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("440", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("440", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("440fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("464", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("464", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("464fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476", PROCESSOR_PPC476, MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476fp", PROCESSOR_PPC476, OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476", PROCESSOR_PPC476,
+   OPTION_MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF
+   | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND | OPTION_MASK_CMPB
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476fp", PROCESSOR_PPC476,
+   OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB
+   | OPTION_MASK_FPRND | OPTION_MASK_CMPB | OPTION_MASK_MULHW
+   | OPTION_MASK_DLMZB)
 RS6000_CPU ("505", PROCESSOR_MPCCORE, 0)
-RS6000_CPU ("601", PROCESSOR_PPC601, MASK_MULTIPLE)

[PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines

Hi,

We have an assortment of MASK and OPTION_MASK #defines throughout
the rs6000 code, MASK_ALTIVEC and OPTION_MASK_ALTIVEC as an example.

We currently #define the MASK_ entries to their OPTION_MASK_
equivalents so the two names could be used interchangeably.

The mapping is in place from when we switched from using
target_flags to rs6000_isa_flags via
commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 in 2012.

This patch converts the references for most of the lingering MASK_*
values to OPTION_MASK_*  and removes the now redundant defines.

I have split this into multiple parts due to size.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO,
MASK_DLMZB, MASK_EABI, MASK_FPRND, MASK_ISEL,
MASK_MFCRF, MASK_MULHW, MASK_POPCNTB, MASK_PPC_GFXOPT,
MASK_PPC_GPOPT): Remove defines.
(RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB, RS6000_BTM_CRYPTO,
RS6000_BTM_FRE, RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_CELL): Redefine using
OPTION_MASK_ instead of MASK_.
* rs6000-cpus.def (RS6000_CPU): Update macro calls to use
OPTION_MASK_ instead of MASK_.
* rs6000.cc (rs6000_darwin_file_start): Update mapping[] table
entries to use OPTION_MASK_PPC_GPOPT, OPTION_MASK_MFCRF,
OPTION_MASK_ALTIVEC instead of their MASK_ variants.
* rs6000-c.cc: Update comment to reference OPTION_MASK_GFXOPT.
* aix71.h (TARGET_DEFAULT): Update define to use OPTION_MASK_
instead of MASK_.
* darwin.h (TARGET_DEFAULT): Same.
* darwin64-biarch.h (TARGET_DEFAULT): Same.
* default64.h (TARGET_DEFAULT): Same.
* eabi.h (TARGET_DEFAULT): Same.
* eabialtivec.h (TARGET_DEFAULT): Same.
* linuxaltivec.h (TARGET_DEFAULT): Same.
* vxworks.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 57e07bcc65ee..8c2ec5d36375 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,13 +135,15 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index b5cef42610f7..86556ccbbf58 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 57b0fab084e3..6a700c61c4c2 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 4bf0feef2f8e..08b58c965d19 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -27,9 +27,10 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_DEFAULT (ISA_2_7_MASKS_SERVER | MASK_POWERPC64 | MASK_64BIT | 
MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
 #else
 #undef TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_PPC_GFXOPT | MASK_PPC_GPOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
 #endif
diff --git a/gcc/config/rs6000/eabi.h b/gcc/config/rs6000/eabi.h
index 

Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.

2022-05-18 Thread will schmidt via Gcc-patches
On Tue, 2022-05-17 at 23:15 -0400, Michael Meissner wrote:
> On Fri, May 13, 2022 at 01:20:30PM -0500, will schmidt wrote:
> > On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> > > 
> > > 



> > > gcc/
> > >   PR target/103109
> > >   * config/rs6000/rs6000.md (su_int32): New code attribute.
> > >   (mul3): Convert from define_expand to
> > >   define_insn_and_split.
> > >   (maddld4): Add generator function.
> > 
> > -(define_insn "*maddld4"
> > +(define_insn "maddld4"
> > 
> > Is the removal of the "*" considering adding generator?  (That's
> > terminology that I'm not immediately familiar with). 
> 
> Yes.  If you have a pattern:
> 
>   (define_insn "foosi2"
> [(set (match_operand:SI 0 "register_operand" "=r")
>   (foo:SI (match_operand:SI 1 "register_operand" "r")))]
>   ""
>   "foo %0,%1")
> 
> It creates a 'gen_foosi2' function that has 2 arguments, and it makes
> the insn
> listed.
> 
> It then has support for insn recognition and output.
> 
> If the pattern starts with a '*', there is no 'gen_foosi2' function
> created,
> but the insn recognition and output are still done.
> 
> In practice, you typically use the '*' names for patterns that are
> used as the
> targets of combination, or separate insns for different machines.
> 
> Here is the verbiage from rtl.texi:
> 
> These names serve one of two purposes.  The first is to indicate that
> the
> instruction performs a certain standard job for the RTL-generation
> pass of the compiler, such as a move, an addition, or a conditional
> jump.  The second is to help the target generate certain target-
> specific
> operations, such as when implementing target-specific intrinsic
> functions.
> 
> It is better to prefix target-specific names with the name of the
> target, to avoid any clash with current or future standard names.
> 
> The absence of a name is indicated by writing an empty string
> where the name should go.  Nameless instruction patterns are never
> used for generating RTL code, but they may permit several simpler
> insns
> to be combined later on.
> 
> For the purpose of debugging the compiler, you may also specify a
> name beginning with the @samp{*} character.  Such a name is used only
> for identifying the instruction in RTL dumps; it is equivalent to
> having
> a nameless pattern for all other purposes.  Names beginning with the
> @samp{*} character are not required to be unique.
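
The naming rule quoted above can be modelled in a few lines of C.  This
is only a toy model of the rule as described (nameless and '*'-prefixed
patterns get no generator function); sketch_has_generator() and
sketch_generator_name() are hypothetical helpers and are not part of
GCC.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model: a pattern gets a gen_ function only if its name is
   non-empty and does not start with '*'.  */
bool
sketch_has_generator (const char *name)
{
  return name[0] != '\0' && name[0] != '*';
}

/* Write "gen_<name>" into BUF (of size LEN).  Returns false if the
   pattern has no generator or the result does not fit.  */
bool
sketch_generator_name (const char *name, char *buf, size_t len)
{
  if (!sketch_has_generator (name))
    return false;

  int n = snprintf (buf, len, "gen_%s", name);
  return n >= 0 && (size_t) n < len;
}
```

Under this model "foosi2" maps to gen_foosi2, while "*foosi2" and the
empty name map to nothing, which matches why dropping the '*' from a
pattern name is what makes a generator available.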


Thanks for the explanation.  :-)

-Will





[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

2022-05-17 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Remove the (no longer used) RS6000_BTC defines.

Hi, 

These defines are no longer used now that the rs6000 built-in
reworks are complete, so it would be good to remove them.

There was a reference to RS6000_BTC_SPECIAL in a TODO comment
in rs6000-builtins.def.  That comment remains, but I have updated
the comment to refer to "SPECIAL" processing, instead of having it
refer directly to the RS6000_BTC_SPECIAL macro.

2022-05-17  Will Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def: Rephrase
RS6000_BTC_SPECIAL in comment.
* config/rs6000/rs6000.h:  Remove definitions
RS6000_BTC_UNARY, RS6000_BTC_BINARY,
RS6000_BTC_TERNARY, RS6000_BTC_QUATERNARY,
RS6000_BTC_QUINARY, RS6000_BTC_SENARY, RS6000_BTC_OPND_MASK,
RS6000_BTC_SPECIAL, RS6000_BTC_PREDICATE, RS6000_BTC_ABS,
RS6000_BTC_DST, RS6000_BTC_TYPE_MASK, RS6000_BTC_MISC,
RS6000_BTC_CONST, RS6000_BTC_PURE, RS6000_BTC_FP,
RS6000_BTC_QUAD, RS6000_BTC_PAIR, RS6000_BTC_QUADPAIR,
RS6000_BTC_ATTR_MASK, RS6000_BTC_SPR, RS6000_BTC_VOID,
RS6000_BTC_CR, RS6000_BTC_OVERLOADED, RS6000_BTC_GIMPLE,
RS6000_BTC_MISC_MASK, RS6000_BTC_MEM, RS6000_BTC_SAT,
RS6000_BTM_ALWAYS.


diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f4a9f24bcc5c..9a63a9eda580 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1423,11 +1423,11 @@
 
   pure vsc __builtin_vsx_ld_elemrev_v16qi (signed long, const void *);
 LD_ELEMREV_V16QI vsx_ld_elemrev_v16qi {ldvec,endian}
 
 ; TODO: There is apparent intent in rs6000-builtin.def to have
-; RS6000_BTC_SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
+; SPECIAL processing for LXSDX, LXVDSX, and STXSDX, but there are
 ; no def_builtin calls for any of them.  At some point, we may want to add a
 ; set of built-ins for whichever vector types make sense for these.
 
   pure vsq __builtin_vsx_lxvd2x_v1ti (signed long, const void *);
 LXVD2X_V1TI vsx_load_v1ti {ldvec}
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 523256a5c9d5..90a357ab7932 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2247,58 +2247,10 @@ extern char rs6000_reg_names[][8];  /* register names (0 vs. %r0).  */
 /* #define  MACHINE_no_sched_speculative_load */
 
 /* General flags.  */
 extern int frame_pointer_needed;
 
-/* Classification of the builtin functions as to which switches enable the
-   builtin, and what attributes it should have.  We used to use the target
-   flags macros, but we've run out of bits, so we now map the options into new
-   settings used here.  */
-
-/* Builtin operand count.  */
-#define RS6000_BTC_UNARY   0x0001  /* normal unary function.  */
-#define RS6000_BTC_BINARY  0x0002  /* normal binary function.  */
-#define RS6000_BTC_TERNARY 0x0003  /* normal ternary function.  */
-#define RS6000_BTC_QUATERNARY  0x0004  /* normal quaternary
-  function. */
-#define RS6000_BTC_QUINARY 0x0005  /* normal quinary function.  */
-#define RS6000_BTC_SENARY  0x0006  /* normal senary function.  */
-#define RS6000_BTC_OPND_MASK   0x0007  /* Mask to isolate operands. */
-
-/* Builtin attributes.  */
-#define RS6000_BTC_SPECIAL 0x  /* Special function.  */
-#define RS6000_BTC_PREDICATE   0x0008  /* predicate function.  */
-#define RS6000_BTC_ABS 0x0010  /* Altivec/VSX ABS
-  function.  */
-#define RS6000_BTC_DST 0x0020  /* Altivec DST function.  */
-
-#define RS6000_BTC_TYPE_MASK   0x003f  /* Mask to isolate types */
-
-#define RS6000_BTC_MISC0x  /* No special 
attributes.  */
-#define RS6000_BTC_CONST   0x0100  /* Neither uses, nor
-  modifies global state.  */
-#define RS6000_BTC_PURE0x0200  /* reads global
-  state/mem and does
-  not modify global state.  */
-#define RS6000_BTC_FP  0x0400  /* depends on rounding mode.  */
-#define RS6000_BTC_QUAD0x0800  /* Uses a register 
quad.  */
-#define RS6000_BTC_PAIR0x1000  /* Uses a register 
pair.  */
-#define RS6000_BTC_QUADPAIR0x1800  /* Uses a quad and a pair.  */
-#define RS6000_BTC_ATTR_MASK   0x1f00  /* Mask of the attributes.  */
-
-/* Miscellaneous information.  */
-#define RS6000_BTC_SPR 0x0100  /* function references SPRs.  */
-#define RS6000_BTC_VOID0x0200  /* function has no 
return value.  */
-#define RS6000_BTC_CR  0x0400  /* function references a CR.  

Re: [PATCH] Generate vadduqm and vsubuqm for TImode add/subtract

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:19 -0400, Michael Meissner wrote:
> Generate vadduqm and vsubuqm for TImode add/subtract
> 
> If the TImode variable is in an Altivec register instead of a GPR
> register, then generate vadduqm and vsubuqm instead of having to move the
> value to the GPR registers and doing the add and subtract with carry
> instructions.  To do this, we have to delay the splitting of the addition
> and subtraction until after register allocation.

Ok.
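
For reference, here is a minimal C sketch of what the GPR alternatives have to
compute: the low doublewords are combined first and the carry (or borrow) feeds
the high doublewords, which is why those alternatives split into two insns while
the Altivec alternative is a single vadduqm or vsubuqm.  The u128_pair type and
the function names are made up for illustration; this models the arithmetic
only and is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in a pair of GPRs.  */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Add: the low add produces a carry, the high add consumes it.  */
u128_pair add_u128 (u128_pair a, u128_pair b)
{
  u128_pair r;
  r.lo = a.lo + b.lo;
  uint64_t carry = r.lo < a.lo;	/* 1 if the low add wrapped.  */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* Subtract: the low subtract produces a borrow for the high subtract.  */
u128_pair sub_u128 (u128_pair a, u128_pair b)
{
  u128_pair r;
  uint64_t borrow = a.lo < b.lo;	/* 1 if the low subtract wraps.  */
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}

/* Usage: the carry out of the low half must reach the high half.  */
void check_u128 (void)
{
  u128_pair a = { UINT64_MAX, 1 };
  u128_pair b = { 1, 2 };
  u128_pair s = add_u128 (a, b);
  assert (s.lo == 0 && s.hi == 4);
  u128_pair d = sub_u128 (s, b);
  assert (d.lo == UINT64_MAX && d.hi == 1);
}
```

The two-step dependency through the carry is the part that a single 128-bit
vector instruction avoids.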


> 
> I have built this patch on little endian power10, little endian power9, and
> big endian power8 systems.  There were no regressions.  Can I install this
> patch to the GCC 13 master branch?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/rs6000.md (addti3): Generate vadduqm if we are
>   using the Altivec registers.
>   (subti3): Generate vsubuqm if we are using the Altivec registers.
>   (negti3): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vadduqm-vsubuqm.c: New test.
> ---
>  gcc/config/rs6000/rs6000.md   | 82 ++-
>  .../gcc.target/powerpc/vadduqm-vsubuqm.c  | 22 +
>  2 files changed, 83 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vadduqm-vsubuqm.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 83eacec57ba..f120ca0b48d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7139,15 +7139,22 @@ (define_expand "feraiseexceptsi"
>  ;;
>  ;; Addti3/subti3 are define_insn_and_splits instead of define_expand, to 
> allow
>  ;; for combine to make things like multiply and add with extend operations.
> +;;
> +;; Also add support in case the 128-bit integer happens to be an Altivec
> +;; register.
> 
>  (define_insn_and_split "addti3"
> -  [(set (match_operand:TI 0 "gpc_reg_operand"   "=,r,r")
> - (plus:TI (match_operand:TI 1 "gpc_reg_operand"   "r, 0,r")
> -  (match_operand:TI 2 "reg_or_short_operand"  "rI,r,0")))
> +  [(set (match_operand:TI 0 "gpc_reg_operand"  "=, r,r,v")
> + (plus:TI (match_operand:TI 1 "gpc_reg_operand"   "r, 0,r,v")
> +  (match_operand:TI 2 "reg_or_short_operand"  "rI,r,0,v")))

Nit..  I still can't tell if the "r, 0,r,v" should be comma-space, or
comma delimited.

Remainder looks OK.  
thanks
-Will



> (clobber (reg:DI CA_REGNO))]
>"TARGET_64BIT"
> -  "#"
> -  "&& 1"
> +  "@
> +   #
> +   #
> +   #
> +   vadduqm %0,%1,%2"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
>[(pc)]
>  {
>rtx lo0 = gen_lowpart (DImode, operands[0]);
> @@ -7157,27 +7164,27 @@ (define_insn_and_split "addti3"
>rtx hi1 = gen_highpart (DImode, operands[1]);
>rtx hi2 = gen_highpart_mode (DImode, TImode, operands[2]);
> 
> -  if (!reg_or_short_operand (lo2, DImode))
> -lo2 = force_reg (DImode, lo2);
> -  if (!adde_operand (hi2, DImode))
> -hi2 = force_reg (DImode, hi2);
> -
>emit_insn (gen_adddi3_carry (lo0, lo1, lo2));
>emit_insn (gen_adddi3_carry_in (hi0, hi1, hi2));
>DONE;
>  }
> -  [(set_attr "length" "8")
> +  [(set_attr "length" "8,8,8,*")
> +   (set_attr "isa""*,*,*,p8v")
> (set_attr "type"   "add")
> (set_attr "size"   "128")])
> 
>  (define_insn_and_split "subti3"
> -  [(set (match_operand:TI 0 "gpc_reg_operand""=,r,r")
> - (minus:TI (match_operand:TI 1 "reg_or_short_operand" "rI,0,r")
> -   (match_operand:TI 2 "gpc_reg_operand"  "r, r,0")))
> +  [(set (match_operand:TI 0 "gpc_reg_operand""=, r,r,v")
> + (minus:TI (match_operand:TI 1 "reg_or_short_operand"  "rI,0,r,v")
> +   (match_operand:TI 2 "gpc_reg_operand"   "r, r,0,v")))
> (clobber (reg:DI CA_REGNO))]
>"TARGET_64BIT"
> -  "#"
> -  "&& 1"
> +  "@
> +   #
> +   #
> +   #
> +   vsubuqm %0,%1,%2"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
>[(pc)]
>  {
>rtx lo0 = gen_lowpart (DImode, operands[0]);
> @@ -7187,16 +7194,49 @@ (define_insn_and_split "subti3"
>rtx hi1 = gen_highpart_mode (DImode, TImode, operands[1]);
>rtx hi2 = gen_highpart (DImode, operands[2]);
> 
> -  if (!reg_or_short_operand (lo1, DImode))
> -lo1 = force_reg (DImode, lo1);
> -  if (!adde_operand (hi1, DImode))
> -hi1 = force_reg (DImode, hi1);
> -
>emit_insn (gen_subfdi3_carry (lo0, lo2, lo1));
>emit_insn (gen_subfdi3_carry_in (hi0, hi2, hi1));
>DONE;
> +}
> +  [(set_attr "length" "8,8,8,*")
> +   (set_attr "isa""*,*,*,p8v")
> +   (set_attr "type"   "add")
> +   (set_attr "size"   "128")])
> +
> +;; 128-bit integer negation, normally use GPRs.  If we are using Altivec
> +;; registers, create a 0 and do a vsubuqm.
> +(define_insn_and_split "negti3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=,")
> + (neg:TI (match_operand:TI 1 "gpc_reg_operand"   "r,v")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_64BIT"
> +  "#"
> +  "&& 

Re: [PATCH] Optimize multiply/add of DImode extended to TImode, PR target/103109.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:17 -0400, Michael Meissner wrote:
> Optimize multiply/add of DImode extended to TImode, PR target/103109.
> 
> On power9 and power10 systems, we have instructions that support doing
> 64-bit integers converted to 128-bit integers and producing 128-bit
> results.  This patch adds support to generate these instructions.
> 
> Previously GCC had define_expands to handle conversion of the 64-bit
> extend to 128-bit and multiply.  This patch changes these define_expands
> to define_insn_and_split and then it provides combiner patterns to
> generate these multiply/add instructions.
> 
> To support using this optimization on power9, this patch extends the sign
> extend DImode to TImode to also run on power9 (added for PR
> target/104698).
> 
> This patch needs the previous patch to add unsigned DImode to TImode
> conversion so that the combiner can combine the extend, multiply, and add
> instructions.
> 
> I have built this patch on little endian power10, little endian power9, and
> big endian power8 systems.  There were no regressions when I ran it.  Can I
> install this patch into the GCC 13 master branch?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   PR target/103109
>   * config/rs6000/rs6000.md (su_int32): New code attribute.
>   (mul3): Convert from define_expand to
>   define_insn_and_split.
>   (maddld4): Add generator function.

-(define_insn "*maddld4"
+(define_insn "maddld4"

Is the removal of the "*" what adds the generator function?  (That's
terminology that I'm not immediately familiar with.)




>   (mulditi3_adddi3): New insn.
>   (mulditi3_add_const): New insn.
>   (mulditi3_adddi3_upper): New insn.
> 
> gcc/testsuite/
>   PR target/103109
>   * gcc.target/powerpc/pr103109.c: New test.


ok
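
As a point of reference for the multiply/add being combined here, a small C
sketch of the unsigned case: two 64-bit values are extended to 128 bits,
multiplied, and a third extended 64-bit value is added.  It assumes a compiler
that provides unsigned __int128 (GCC and Clang on 64-bit targets); the function
names are hypothetical and not taken from the patch.  The low half is the value
maddld computes; the high half is what the separate upper pattern has to
supply.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler supports unsigned __int128.  */
typedef unsigned __int128 u128;

/* Full 128-bit result of a * b + c with all inputs zero extended.  */
u128 umul_add (uint64_t a, uint64_t b, uint64_t c)
{
  return (u128) a * (u128) b + (u128) c;
}

/* Low doubleword: plain 64-bit multiply-add, wrapping modulo 2^64.  */
uint64_t umul_add_lo (uint64_t a, uint64_t b, uint64_t c)
{
  return a * b + c;
}

/* High doubleword of the 128-bit result.  */
uint64_t umul_add_hi (uint64_t a, uint64_t b, uint64_t c)
{
  return (uint64_t) (umul_add (a, b, c) >> 64);
}

/* Usage: the largest inputs still fit in 128 bits.  */
void check_umul_add (void)
{
  uint64_t m = UINT64_MAX;
  assert (umul_add_lo (m, m, m) == 0);
  assert (umul_add_hi (m, m, m) == m);
  assert (umul_add_lo (3, 4, 5) == 17 && umul_add_hi (3, 4, 5) == 0);
}
```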


> ---
>  gcc/config/rs6000/rs6000.md | 128 +++-
>  gcc/testsuite/gcc.target/powerpc/pr103109.c |  62 ++
>  2 files changed, 184 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103109.c
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2aba70393d8..83eacec57ba 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -667,6 +667,9 @@ (define_code_attr uns [(fix   "")
>  (float   "")
>  (unsigned_float  "uns")])
> 
> +(define_code_attr su_int32 [(sign_extend "s32bit_cint_operand")
> + (zero_extend "c32bit_cint_operand")])
> +
>  ; Various instructions that come in SI and DI forms.
>  ; A generic w/d attribute, for things like cmpw/cmpd.
>  (define_mode_attr wd [(QI"b")
> @@ -3190,13 +3193,16 @@ (define_insn "mulsi3_highpart_64"
>"mulhw %0,%1,%2"
>[(set_attr "type" "mul")])
> 
> -(define_expand "mul3"
> -  [(set (match_operand: 0 "gpc_reg_operand")
> +(define_insn_and_split "mul3"
> +  [(set (match_operand: 0 "gpc_reg_operand" "=")
>   (mult: (any_extend:
> - (match_operand:GPR 1 "gpc_reg_operand"))
> +(match_operand:GPR 1 "gpc_reg_operand" "r"))
> (any_extend:
> - (match_operand:GPR 2 "gpc_reg_operand"]
> +(match_operand:GPR 2 "gpc_reg_operand" "r"]
>"!(mode == SImode && TARGET_POWERPC64)"
> +  "#"
> +  "&& 1"
> +  [(pc)]
>  {
>rtx l = gen_reg_rtx (mode);
>rtx h = gen_reg_rtx (mode);
> @@ -3205,9 +3211,10 @@ (define_expand "mul3"
>emit_move_insn (gen_lowpart (mode, operands[0]), l);
>emit_move_insn (gen_highpart (mode, operands[0]), h);
>DONE;
> -})
> +}
> +  [(set_attr "length" "8")])
> 

ok


> -(define_insn "*maddld4"
> +(define_insn "maddld4"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>   (plus:GPR (mult:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>   (match_operand:GPR 2 "gpc_reg_operand" "r"))

ok

> @@ -3216,6 +3223,115 @@ (define_insn "*maddld4"
>"maddld %0,%1,%2,%3"
>[(set_attr "type" "mul")])
> 
> +(define_insn_and_split "*mulditi3_adddi3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=")
> + (plus:TI
> +  (mult:TI
> +   (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
> +   (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
> +  (any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r"]
> +  "TARGET_MADDLD && TARGET_POWERPC64"
> +  "#"
> +  "&& 1"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +  rtx op1 = operands[1];
> +  rtx op2 = operands[2];
> +  rtx op3 = operands[3];
> +  rtx tmp_hi, tmp_lo;
> +
> +  if (can_create_pseudo_p ())
> +{
> +  tmp_hi = gen_reg_rtx (DImode);
> +  tmp_lo = gen_reg_rtx (DImode);
> +}
> +  else
> +{
> +  tmp_hi = dest_hi;
> +  tmp_lo = dest_lo;
> +}
> +
> +  emit_insn (gen_mulditi3_adddi3_upper (tmp_hi, op1, op2, 

Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 12:13 -0400, Michael Meissner wrote:
> Add zero_extendditi2.  Improve lxvr*x code generation.
> 


Content here matches what I commented on in the prior email with
subject "Delay splitting addti3...".  






> This pattern adds zero_extendditi2 so that if we are extending DImode
> that
> is in a GPR register to TImode in a vector register, the compiler can
> generate MTVSRDDD.
> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to
> allow
> loading to gpr registers.  This prevents needlessly doing direct
> moves to
> get the value into the vector registers if the gpr register was
> already
> selected.
> 
> In updating the insn counts for two tests due to these changes, I
> noticed
> the tests were done at -O0.  I changed this so that the tests are now
> done
> at the normal -O2 optimization level.
> 
> This patch will be needed for an upcoming patch for PR target/103109.
> 
> I have built this patch on little endian power10, little endian
> power9,
> and big endian power8 systems.  There were no regressions with this
> patch.  Can I install this on the GCC 13 trunk?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading
> to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.
>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> ---
>  gcc/config/rs6000/vsx.md  | 82
> +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-
> ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand"
> "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx,
> lxvrwx, lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating
> lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a
> GPR register.
> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand"
> "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction
> (and LXVRDX
> +;; in the case of power10), we use the machine independent code.  If
> we are
> +;; loading up GPRs, we fall back to the old code.
> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0
> "register_operand" "=r,r, wa,")
> + (zero_extend:TI (match_operand:DI 1
> "register_operand"  "r,wa,r,  wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && (int_reg_operand (operands[0], TImode)
> +   || vsx_register_operand (operands[1], DImode))"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and
> then do
> + zero out the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +{
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +  emit_move_insn (dest_lo, src);
> +  emit_move_insn (dest_hi, const0_rtx);
> +  DONE;
> +}
> +
> +  /* 

Re: [PATCH] Delay splitting addti3/subti3 until first split pass.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 11:08 -0400, Michael Meissner wrote:
> Add zero_extendditi2.  Improve lxvr*x code generation.
> 

Hi,


> Subject: Re: [PATCH] Delay splitting addti3/subti3 until first split pass.

Subject does not seem to match contents?





> This pattern adds zero_extendditi2 so that if we are extending DImode that
> is in a GPR register to TImode in a vector register, the compiler can
> generate MTVSRDDD.

Just "mtvsrdd".   
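
For the GPR destination case, the split amounts to the following, sketched in C
with a made-up ti_pair type standing in for the two GPRs that hold the TImode
value: copy the source into the low half and clear the high half.  This is only
a model of the data movement under that assumption, not the RTL from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in two GPRs.  */
typedef struct { uint64_t lo, hi; } ti_pair;

/* Zero extend a 64-bit value to 128 bits: low half is the source,
   high half is zero.  */
ti_pair zero_extend_di_ti (uint64_t src)
{
  ti_pair dest;
  dest.lo = src;
  dest.hi = 0;
  return dest;
}

/* Usage: the top bit of the source must not leak into the high half.  */
void check_zero_extend (void)
{
  ti_pair d = zero_extend_di_ti (UINT64_MAX);
  assert (d.lo == UINT64_MAX && d.hi == 0);
}
```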


> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow
> loading to gpr registers.  This prevents needlessly doing direct moves to
> get the value into the vector registers if the gpr register was already
> selected.
> 
> In updating the insn counts for two tests due to these changes, I noticed
> the tests were done at -O0.  I changed this so that the tests are now done
> at the normal -O2 optimization level.
> 
s/normal/default/ ?


> This patch will be needed for an upcoming patch for PR target/103109.
> 
ok

> I have built this patch on little endian power10, little endian power9,
> and big endian power8 systems.  There were no regressions with this
> patch.  Can I install this on the GCC 13 trunk?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.

swap the froms and tos ?


>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> ---
>  gcc/config/rs6000/vsx.md  | 82 +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, 
> lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a GPR 
> register.

Perhaps break this into two sentences, with the description of what is
prevented in a separate sentence?


> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and 
> LXVRDX
> +;; in the case of power10), we use the machine independent code.  If we are
> +;; loading up GPRs, we fall back to the old code.

Will 'old code' have meaning to future readers of this lump of code?


> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r, 
> wa,")
> + (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  
> wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && (int_reg_operand (operands[0], TImode)
> +   || vsx_register_operand (operands[1], DImode))"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> + zero out the upper part.  */


s/do//

> +  if 

Re: [PATCH] Replace UNSPEC with RTL code for extendditi2.

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 10:52 -0400, Michael Meissner wrote:
> Replace UNSPEC with RTL code for extendditi2.
> 

Hi,


> When I submitted my patch on March 12th for extendditi2, Segher wished I
> had removed the use of the UNSPEC for the vextsd2q instruction.  This
> patch rewrites extendditi2_vector to use VEC_SELECT rather than UNSPEC.


I'd suggest a paragraph break between the two sentences.   


> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (UNSPEC_EXTENDDITI2): Delete.

>   (extendditi2_vector): Rewrite to use VEC_SELECT as a
>   define_expand.

>   (extendditi2_vector2): New insn.


Ok, so per my interpretation of the patch below, it converts the
define_insn extendditi2_vector into a define_expand, and creates a new
extendditi2_vector2 instruction.  
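
For comparison, the operation the new RTL spells out (sign extend a selected
64-bit element to 128 bits) can be modeled in C as below.  The ti_pair type and
the function name are hypothetical; the sketch only shows the arithmetic
meaning of sign_extend and says nothing about which vector element is
selected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } ti_pair;

/* Sign extend a 64-bit value to 128 bits: the high half is all ones
   for a negative source and zero otherwise.  */
ti_pair sign_extend_di_ti (int64_t src)
{
  ti_pair dest;
  dest.lo = (uint64_t) src;
  dest.hi = src < 0 ? UINT64_MAX : 0;
  return dest;
}

/* Usage: negative and non-negative inputs.  */
void check_sign_extend (void)
{
  ti_pair n = sign_extend_di_ti (-1);
  assert (n.lo == UINT64_MAX && n.hi == UINT64_MAX);
  ti_pair p = sign_extend_di_ti (5);
  assert (p.lo == 5 && p.hi == 0);
}
```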


Content below seems reasonable, though I've not reviewed it extensively.  

Thanks
-Will

> ---
>  gcc/config/rs6000/vsx.md | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index a1a1ce95195..c091e5e2f47 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -358,7 +358,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX
> UNSPEC_XXGENPCV
> UNSPEC_MTVSBM
> -   UNSPEC_EXTENDDITI2
> UNSPEC_VCNTMB
> UNSPEC_VEXPAND
> UNSPEC_VEXTRACT
> @@ -5083,10 +5082,25 @@ (define_insn_and_split "extendditi2"
> (set_attr "type" "shift,load,vecmove,vecperm,load")])
> 
>  ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in
> TI reg
> -(define_insn "extendditi2_vector"
> +(define_expand "extendditi2_vector"
> +  [(use (match_operand:TI 0 "gpc_reg_operand"))
> +   (use (match_operand:TI 1 "gpc_reg_operand"))]
> +  "TARGET_POWER10"
> +{
> +  rtx dest = operands[0];
> +  rtx src_v2di = gen_lowpart (V2DImode, operands[1]);
> +  rtx element = GEN_INT (VECTOR_ELEMENT_SCALAR_64BIT);
> +
> +  emit_insn (gen_extendditi2_vector2 (dest, src_v2di, element));
> +  DONE;
> +})
> +
> +(define_insn "extendditi2_vector2"
>[(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> - (unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> -  UNSPEC_EXTENDDITI2))]
> + (sign_extend:TI
> +  (vec_select:DI
> +   (match_operand:V2DI 1 "gpc_reg_operand" "v")
> +   (parallel [(match_operand 2 "vsx_scalar_64bit" "wD")]]
>"TARGET_POWER10"
>"vextsd2q %0,%1"
>[(set_attr "type" "vecexts")])
> -- 
> 2.35.3
> 
> 



Re: [PATCH] Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293

2022-05-13 Thread will schmidt via Gcc-patches
On Fri, 2022-05-13 at 10:49 -0400, Michael Meissner wrote:
> Optimize vec_splats of constant V2DI/V2DF vec_extract, PR target/99293.
> 
> This patch has been previously posted, but it seemed to get lost:
> 
> > Date: Tue, 29 Mar 2022 23:25:31 -0400
> > Subject: [PATCH, V2] Optimize vec_splats of constant vec_extract for 
> > V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592509.html
> 
> I had originally posted a previous version of this patch here.  There were
> changes asked for, which I did in this patch.
> 
> > Date: Mon, 28 Mar 2022 12:26:02 -0400
> > Subject: [PATCH 1/4] Optimize vec_splats of constant vec_extract for 
> > V2DI/V2DF, PR target 99293.
> > Message-ID: 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592420.html
> 


Hi,
generally lgtm.  A few typos and comment suggestions are called out below.
Thanks.
-Will



> In PR target/99293, it was pointed out that doing:
> 
>   vector long long dest0, dest1, src;
>   /* ... */
>   dest0 = vec_splats (vec_extract (src, 0));
>   dest1 = vec_splats (vec_extract (src, 1));
> 
> would generate slower code.
> 
> It generates the following code on power8:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   xxpermdi 0,34,34,3
>   xxpermdi 34,0,0,0
> 
>   ;; vec_splats (vec_extract (src, 1))
>   xxlor 0,34,34
>   xxpermdi 34,0,0,0
> 
> However on power9 and power10 it generates:
> 
>   ;; vec_splats (vec_extract (src, 0))
>   mfvsrld 3,34
>   mtvsrdd 34,9,9
> 
>   ;; vec_splats (vec_extract (src, 1))
>   mfvsrd 9,34
>   mtvsrdd 34,9,9
> 
> This is due to the power9 having the mfvsrld instruction which can extract
> either 64-bit element into a GPR.  While there are alternatives for both
> vector registers and GPR registers, the register allocator prefers to put
> DImode into GPR registers.
> 
> However in this case, it is better to have a single combiner pattern that
> can generate a single xxpermdi, instead of doing 2 insnsns (the extract

I like the idea of insnsns being the plural of insn. 

> and then the concat).  This is particularly true if the two operations are
> move from vector register and move to vector register.  As Segher pointed
> out in a previous version of the patch, the combiner already tries doing

s/doing//

> creating a (vec_duplicate (vec_select ...)) pattern, but we didn't provide
> one.



> 
> This patch reworks vsx_xxspltd_ for V2DImode and V2DFmode so that it
> no longer uses an UNSPEC.  Instead it uses VEC_DUPLICATE, which the
> combiner checks for.

potentially
s/no longer uses an UNSPEC.  Instead it uses/now uses/
and possibly
s/checks for/can find/
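
To make the (vec_duplicate (vec_select ...)) shape concrete, here is a small
model using the generic GNU C vector extensions rather than altivec.h, so it
builds on any target where GCC or Clang supports 16-byte vectors.  The v2di
typedef and splat_element are hypothetical names for illustration; the real
source-level form is the vec_splats (vec_extract (src, N)) from the PR.

```c
#include <assert.h>

/* Two 64-bit elements in one 16-byte vector (GNU C vector extension).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Select one element of SRC and duplicate it into both lanes.  */
v2di splat_element (v2di src, int index)
{
  assert (index == 0 || index == 1);
  long long x = src[index];	/* The vec_select part.  */
  return (v2di) { x, x };	/* The vec_duplicate part.  */
}

/* Usage: splat each element in turn.  */
void check_splat (void)
{
  v2di src = { 10, 20 };
  v2di r0 = splat_element (src, 0);
  v2di r1 = splat_element (src, 1);
  assert (r0[0] == 10 && r0[1] == 10);
  assert (r1[0] == 20 && r1[1] == 20);
}
```

Expressed this way the whole operation is one select-and-duplicate, which is
the form a single permute instruction can implement without a round trip
through a GPR.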

> 





> I have built Spec 2017 with this patch installed, and the cam4_r benchmark
> is the only benchmark that generated different code (3 mfvsrld/mtvsrdd
> pairs of instructions were replaced with xxpermdi).
> 
> I have built bootstrap versions on the following systems and I have run
> the regression tests.  There were no regressions in the runs:
> 
>   Power9 little endian, --with-cpu=power9
>   Power10 little endian, --with-cpu=power10
>   Power8 big endian, --with-cpu=power8 (both 32-bit & 64-bit tests)
> 
> Can I install this into the trunk?  After a burn-in period, can I backport
> and install this into GCC 12, GCC 11 and GCC 10 branches?
> 
> 2022-05-13   Michael Meissner  
> 
> gcc/
>   PR target/99293
>   * config/rs6000/rs6000-p8swap.cc (rtx_is_swappable_p): Remove
>   UNSPEC_VSX_XXSPLTD case.
>   * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): Delete.
>   (vsx_xxspltd_): Rewrite to use VEC_DUPLICATE.
> 
> gcc/testsuite:
>   PR target/99293
>   * gcc.target/powerpc/builtins-1.c: Update insn count.
>   * gcc.target/powerpc/pr99293.c: New test.
> ---
>  gcc/config/rs6000/rs6000-p8swap.cc|  1 -
>  gcc/config/rs6000/vsx.md  | 19 +-
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c| 36 +++
>  4 files changed, 47 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index d301bc3fe59..1973d9c8245 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -805,7 +805,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
> case UNSPEC_VUPKLU_V4SF:
>   return 0;
> case UNSPEC_VSPLT_DIRECT:
> -   case UNSPEC_VSX_XXSPLTD:
>   *special = SH_SPLAT;
>   return 1;
> case UNSPEC_REDUC_PLUS:

ok

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1b75538f42f..a1a1ce95195 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -296,7 +296,6 @@ (define_c_enum "unspec"
> UNSPEC_VSX_XXPERM
> 
> UNSPEC_VSX_XXSPLTW
> -   

Re: [PATCH, V4] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-04-20 Thread will schmidt via Gcc-patches
On Tue, 2022-04-12 at 21:14 -0400, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059
> 
> This is V4 of the patch.  Compared to V3 of the patch, GCC will just
> ignore -m{,no-}power8-fusion and -m{,no-}power8-fusion-sign.
> 


Hi, 
No comments on code, a few comments about the comments below.



> The splitting of signed halfword and word loads into unsigned load and
> sign extension is now suppressed with -Os, but it is done normally if we
> are not optimizing for space.

I see references to TARGET_P8_FUSION_SIGN in the patch below, and some
removal of old code.  I assume this describes the implementation that
remains.  
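
My understanding of the split being described, as a hedged C model: a signed
halfword load is done as a zero-extending load followed by a separate sign
extension.  The function name is made up, and memcpy simply stands in for the
unsigned load; nothing here is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a signed 16-bit value in two steps: an unsigned load, then an
   explicit sign extension of the 16-bit quantity.  */
int32_t load_s16_split (const void *p)
{
  uint16_t raw;
  memcpy (&raw, p, sizeof raw);		/* Unsigned (zero-extending) load.  */
  return (int32_t) ((raw ^ 0x8000) - 0x8000);	/* Sign extension.  */
}

/* Usage: the result must match a direct signed load.  */
void check_load_s16 (void)
{
  int16_t v = -2;
  assert (load_s16_split (&v) == -2);
  v = 7;
  assert (load_s16_split (&v) == 7);
}
```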

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.

specification.  :-)

> 
> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function are not a subset of the ISA flags of the caller, we won't
> do the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.

inlining. 
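
The subset test described above boils down to a mask comparison.  A simplified
sketch follows, with made-up flag names and bit positions; the real
rs6000_can_inline_p does more than this, so treat it as a model of the failure
mode only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits, for illustration only.  */
#define ISA_P8_VECTOR	(UINT64_C (1) << 0)
#define ISA_P9_VECTOR	(UINT64_C (1) << 1)
#define ISA_P8_FUSION	(UINT64_C (1) << 2)

/* Inlining is allowed only if every ISA bit the callee needs is also
   set in the caller.  */
bool can_inline (uint64_t caller_isa, uint64_t callee_isa)
{
  return (caller_isa & callee_isa) == callee_isa;
}

/* Usage: a fusion bit set only in the power8 callee blocks inlining
   into a power9 caller; dropping the bit allows it.  */
void check_can_inline (void)
{
  uint64_t p9_caller = ISA_P8_VECTOR | ISA_P9_VECTOR;
  uint64_t p8_callee = ISA_P8_VECTOR | ISA_P8_FUSION;
  assert (!can_inline (p9_caller, p8_callee));
  assert (can_inline (p9_caller, p8_callee & ~ISA_P8_FUSION));
}
```

With fusion no longer carried as an ISA bit, the callee's flags become a subset
of the caller's and the mismatch goes away.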


> 
> This code makes the -mpower8-fusion option a nop.  It is accepted without
> warning, but it does nothing.  Power8 fusion is only enabled if we are tuning
> for a power8.
> 
> The undocumented -mpower8-fusion-sign option is also made into a nop.
> 
> I left in the pragma target and attribute target support for power8-fusion,
> but using it doesn't do anything now.  This is because I told the customer who
> encountered this problem that one solution was to add an explicit
> no-power8-fusion option in their target pragma or attribute to work around the
> problem.
> 
> I have tested this patch on a little endian power10 system.  I have tested
> previous versions on little endian power9 and big endian power8 systems.
> Can I apply this patch to the master branch?
> 
> If it is accepted, I will produce a similar patch for back porting to GCC 11
> and GCC 10.
> 
> 2022-04-12   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target
>   power8-fusion option for backwards compatibility.
>   (rs6000_print_options_internal): Skip printing backward
>   compatibility options that are just ignored.
>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.
>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore it completely.
>   (-mpower8-fusion-sign): Likewise.
>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete
>   -mpower8-fusion.
> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def | 18 +++
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 13 -
>  gcc/config/rs6000/rs6000.opt  |  8 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 62 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..d913a3d6b73 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -54,19 +54,14 @@
>| OPTION_MASK_QUAD_MEMORY  \
>| OPTION_MASK_QUAD_MEMORY_ATOMIC)
> 
> -/* ISA masks setting fusion options.  */
> -#define OTHER_FUSION_MASKS   (OPTION_MASK_P8_FUSION  \
> -  | OPTION_MASK_P8_FUSION_SIGN)
> -
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not 

Re: [PATCH, rs6000] Correct match pattern in pr56605.c

2022-04-08 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 11:17 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>   This patch corrects the match pattern in pr56605.c. The former pattern
> is wrong and test case fails with GCC11. It should match following insn on
> each subtarget after mode promotion is disabled. The patch need to be
> backported to GCC11.
> 

Hi,

I note this patch appears to (partially?) address the P1 [11 regression] PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146


The issue makes reference to a different proposed patch in issue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103197
titled "ppc inline expansion of memcpy/memmove should not use lxsibzx/stxsibx
for a single byte".  The proposed patch is named
"rs6000: Disparage lfiwzx and similar".

I can't address any of the background or history there.  :-)


> //gimple
> _17 = (unsigned int) _20;
>  prolog_loop_niters.4_23 = _17 & 3;
> 
> //rtl
> (insn 19 18 20 2 (parallel [
> (set (reg:CC 208)
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3]))
> (const_int 0 [0])))
> (set (reg:SI 129 [ prolog_loop_niters.5 ])
> (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3])))
> ]) 197 {*andsi3_imm_mask_dot2}
> 
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and AIX with no 
> regressions.
> Is this okay for trunk and GCC11? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-02-28 Haochen Gui 
> 
> gcc/testsuite/
>   PR target/102146
>   * gcc.target/powerpc/pr56605.c: Correct match pattern in combine pass.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index fdedbfc573d..231d808aa99 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,5 @@ void foo (short* __restrict sb, int* __restrict ia)
>  ia[i] = (int) sb[i];
>  }
> 
> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
> \((?:and|zero_extend):(?:DI) \((?:sub)?reg:[SD]I} 1 "combine" } } */
> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
> \(reg:DI} 1 "combine" } } */


So with the update (I squint, so this is an approximate handwave), this
drops the zero_extend alternative and narrows the scan-rtl match to an
and:SI of a subreg:SI of a reg:DI.  This appears to match the RTL as
mentioned in the patch comments.


> 



Re: [PATCH] rs6000/test: Adjust p9-vec-length-7 sensitive to unroll [PR103196]

2022-04-07 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 13:37 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR103196 shows, p9-vec-length-full-7.c needs to be adjusted as the
> complete unrolling can happen on some of its loops.  This patch is to
> use pragma "GCC unroll 0" to disable all possible loop unrollings.
> Hope it can help the case not that fragile.

ok

Is the lack of effectiveness of "-fno-unroll-loops" otherwise
understood, or is there further issue behind that option? 

I would expect the effect of the option, versus the pragma, to be roughly
equivalent.  Obviously it is not.  :-)
> 
> There are some other p9-vec-length* cases, I noticed that some of them
> use either bigger or unknown loop iteration counts, and
> "p9-vec-length-3*" have considered the effects of complete unrolling.
> So I just leave them alone for now.
> 
> Tested on powerpc64-linux-gnu P8 and powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR testsuite/103196
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/p9-vec-length-7.h: Add DO_PRAGMA macro.
>   * gcc.target/powerpc/p9-vec-length-epil-7.c: Use unroll pragma to
>   disable any unrollings.
>   * gcc.target/powerpc/p9-vec-length-full-7.c: Remove useless option.
>   * gcc.target/powerpc/p9-vec-length.h: Likewise.

I suggest a slight rearrangement and correction.

The -fno-unroll-loops options are removed from *-epil-7.c and *-full-7.c.

p9-vec-length.h  adds the DO_PRAGMA macro.

p9-vec-length-7.h updates (corrects?) whitespace and adds the PRAGMA call for 
"GCC unroll 0" around the test loop. 




> > ---
> >  .../gcc.target/powerpc/p9-vec-length-7.h| 17 +++--
> >  .../gcc.target/powerpc/p9-vec-length-epil-7.c   |  2 +-
> >  .../gcc.target/powerpc/p9-vec-length-full-7.c   |  2 +-
> >  .../gcc.target/powerpc/p9-vec-length.h  |  2 ++
> >  4 files changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > index 4ef8f974a04..4f338565619 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-7.h
> > @@ -7,14 +7,19 @@
> >  #define START 1
> >  #define END 59
> >  
> > +/* Note that we use pragma unroll to disable any loop unrollings.  */
> > +
> >  #define test(TYPE) 
> > \
> > -  TYPE x_##TYPE[N] __attribute__((aligned(16)));   
> >  \
> > -  void __attribute__((noinline, noclone)) test_npeel_##TYPE() {
> > \
> > +  TYPE x_##TYPE[N] __attribute__ ((aligned (16))); 
> > \
> > +  void __attribute__ ((noinline, noclone)) test_npeel_##TYPE ()
> > \
> > +  {
> > \
> >  TYPE v = 0;
> > \
> > -for (unsigned int i = START; i < END; i++) {   
> > \
> > -  x_##TYPE[i] = v; 
> > \
> > -  v += 1;  
> > \
> > -}  
> > \
> > +DO_PRAGMA (GCC unroll 0)   
> > \
> > +for (unsigned int i = START; i < END; i++) 
> > \
> > +  {
> > \
> > +   x_##TYPE[i] = v;   \
> > +   v += 1;\
> > +  }
> > \
> >}


Some whitespace fix-ups (ok), and the addition of
the "DO_PRAGMA (GCC unroll 0)".

ok.
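
For reference, the idiom in the hunk above can be sketched standalone.  The
DO_PRAGMA definition below is my assumption (the usual _Pragma wrapper); the
real one in p9-vec-length.h is not quoted in this thread.  DEFINE_TEST,
NO_UNROLL and example_no_unroll are names invented for this sketch, and the
pragma is only emitted for GCC proper.

```c
#include <assert.h>

/* Assumed definition: the usual _Pragma wrapper, so that a pragma can
   be produced from inside another macro.  */
#define DO_PRAGMA(x) _Pragma (#x)

#define N 64
#define START 1
#define END 59

/* GCC documents that unroll factors 0 and 1 block unrolling of the
   loop that follows.  Other compilers get no pragma here.  */
#if defined (__GNUC__) && !defined (__clang__)
#define NO_UNROLL DO_PRAGMA (GCC unroll 0)
#else
#define NO_UNROLL
#endif

/* Same shape as the test macro in the hunk, under a different name.  */
#define DEFINE_TEST(TYPE)                                       \
  TYPE x_##TYPE[N] __attribute__ ((aligned (16)));              \
  void __attribute__ ((noinline)) test_npeel_##TYPE (void)      \
  {                                                             \
    TYPE v = 0;                                                 \
    NO_UNROLL                                                   \
    for (unsigned int i = START; i < END; i++)                  \
      {                                                         \
        x_##TYPE[i] = v;                                        \
        v += 1;                                                 \
      }                                                         \
  }

DEFINE_TEST (int)

/* The loop fills elements START..END-1 with 0, 1, 2, ...  */
void
example_no_unroll (void)
{
  test_npeel_int ();
  assert (x_int[START] == 0);
  assert (x_int[END - 1] == END - 1 - START);
}
```

The point of going through _Pragma is that a plain #pragma line cannot
appear inside a macro body, which is why the header needs the helper at all.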


> >  
> >  TEST_ALL (test)
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > index a27ee347ca1..859fedd5679 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-7.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { lp64 && powerpc_p9vector_ok } } } */
> > -/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
> > -fno-vect-cost-model -fno-unroll-loops -ffast-math" } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize 
> > -fno-vect-cost-model -ffast-math" } */

ok

> >  
> >  /* { dg-additional-options "--param=vect-partial-vector-usage=1" } */
> >  
> > diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c 
> > b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> > index 89ff38443e7..5fe542bba20 100644
> > --- 

Re: [PATCH] Disable float128 tests on VxWorks, PR target/104253.

2022-04-07 Thread will schmidt via Gcc-patches
On Thu, 2022-04-07 at 06:00 -0500, Segher Boessenkool wrote:
> On Thu, Apr 07, 2022 at 12:29:45AM -0400, Michael Meissner wrote:
> > In PR target/104253, it was pointed out the that test case added as part
> > of fixing the PR does not work on VxWorks because float128 is not
> > supported on that system.  I have modified the three tests for float128 so
> > that they are manually excluded on VxWorks systems.  In looking at the
> > code, I also added checks in check_effective_target_ppc_ieee128_ok to
> > disable the systems that will never support VSX instructions which are
> > required for float128 support (eabi, eabispe, darwin).
> 
> It's just one extra to the big list here, but, why do we need all these
> manual exclusions anyway?  What is broken about the test itself?



From the PR, it looks like this test noted an error, not actually a
failure.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104253#c17

cc1: warning: The '-mfloat128' option may not be fully supported


which comes out of gcc/config/rs6000/rs6000.cc 
rs6000_option_override_internal() via 

  /* IEEE 128-bit floating point requires VSX support.  */
  if (TARGET_FLOAT128_KEYWORD)
    {
      if (!TARGET_VSX)
        {

        }
      else if (!TARGET_FLOAT128_TYPE)
        {
          TARGET_FLOAT128_TYPE = 1;
          warning (0, "The %<-mfloat128%> option may not be fully supported");
        }
    }
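
To make the path to that warning explicit, here is a small model of the
control flow in the excerpt.  It is only a sketch: struct flags, enum outcome
and override_float128 are hypothetical stand-ins, and the !TARGET_VSX branch
is elided in the excerpt, so it is reported here but not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags named in the excerpt.  */
struct flags
{
  bool float128_keyword;
  bool vsx;
  bool float128_type;
};

enum outcome { NOTHING, VSX_MISSING, WARNED };

/* Mirrors the quoted control flow.  What happens when VSX is missing
   is not shown in the excerpt, so that case is only reported.  */
enum outcome
override_float128 (struct flags *f)
{
  if (!f->float128_keyword)
    return NOTHING;
  if (!f->vsx)
    return VSX_MISSING;
  if (!f->float128_type)
    {
      /* This is the branch that emits the "may not be fully
         supported" warning in the excerpt.  */
      f->float128_type = true;
      return WARNED;
    }
  return NOTHING;
}

void
example_float128_warning (void)
{
  struct flags f = { true, true, false };

  assert (override_float128 (&f) == WARNED);
  assert (f.float128_type);
  /* Once the type flag is set, the warning path is not taken again.  */
  assert (override_float128 (&f) == NOTHING);
}
```

So the warning shows up when the keyword is enabled with VSX available but
the float128 type is not already turned on.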


> 
> It would be so much more useful if the tests would help us, instead of
> producing a lot of extra busy-work.





> 
> 
> Segher



Re: [PATCH v2] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-04-07 Thread will schmidt via Gcc-patches
On Thu, 2022-04-07 at 17:29 +0800, Kewen.Lin wrote:
> Hi,
> 
> As PR103353 shows, we may want to continue to expand a MMA built-in
> function like a normal function, even if we have already emitted
> error messages about some missing required conditions.  As shown in
> that PR, without one explicit mov optab on OOmode provided, it would
> call emit_move_insn recursively.
> 
> So this patch is to allow the mov pattern to be generated when we are
> expanding to RTL and have seen errors even without MMA supported, it's
> expected that the generated pattern would not cause further ICEs as the
> compilation would stop soon after expanding.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591150.html
> 
> v2: Polish some comments and add one test case as Will and Peter suggested.

Thanks.

> 
> Is it ok for trunk or upcoming stage1?
> 
> BR,
> Kewen
> --
> 
>   PR target/103353
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
>   check to preparation statements and add handlings for !TARGET_MMA.
>   (define_expand movxo): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr103353.c: New test.
> ---
>  gcc/config/rs6000/mma.md| 42 ++---
>  gcc/testsuite/gcc.target/powerpc/pr103353.c | 22 +++
>  2 files changed, 58 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103353.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 907c9d6d516..746a77a0957 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4  
> [(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
>  (define_expand "movoo"
>[(set (match_operand:OO 0 "nonimmediate_operand")
>   (match_operand:OO 1 "input_operand"))]
> -  "TARGET_MMA"
> +  ""
>  {
> -  rs6000_emit_move (operands[0], operands[1], OOmode);
> -  DONE;
> +  if (TARGET_MMA) {
> +rs6000_emit_move (operands[0], operands[1], OOmode);
> +DONE;
> +  }
> +  /* Opaque modes are only expected to be available when MMA is supported,
> + but PR103353 shows we may want to continue to expand a MMA built-in
> + function, even if we have already emitted error messages about some
> + missing required conditions.  As shown in that PR, without one
> + explicit mov optab on OOmode provided, it would call emit_move_insn
> + recursively.  So we allow this pattern to be generated when we are
> + expanding to RTL and have seen errors, even though there is no MMA
> + support.  It would not cause further ICEs as the compilation would
> + stop soon after expanding.  */
> +  else if (currently_expanding_to_rtl && seen_error ())
> +;
> +  else
> +gcc_unreachable ();
>  })

ok
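
As I read it, the expander now makes a three-way decision.  A minimal
sketch, with mov_opaque_action and the enum invented purely to spell out the
cases (the real code uses TARGET_MMA, currently_expanding_to_rtl and
seen_error () directly):

```c
#include <assert.h>
#include <stdbool.h>

enum expand_action
{
  EMIT_MOVE,     /* TARGET_MMA: rs6000_emit_move, then DONE.  */
  FALL_THROUGH,  /* Error recovery: no DONE, so the expander's own
                    set template is emitted.  */
  UNREACHABLE    /* Anything else would hit gcc_unreachable.  */
};

/* Hypothetical helper mirroring the if/else chain in the hunk.  */
enum expand_action
mov_opaque_action (bool target_mma, bool expanding_to_rtl, bool seen_error)
{
  if (target_mma)
    return EMIT_MOVE;
  else if (expanding_to_rtl && seen_error)
    return FALL_THROUGH;
  else
    return UNREACHABLE;
}

void
example_mov_opaque (void)
{
  /* The PR103353 situation: no MMA, an error already reported, and
     we are in the middle of expanding to RTL.  */
  assert (mov_opaque_action (false, true, true) == FALL_THROUGH);
}
```

Only the middle case is new behaviour; without MMA and without a prior
error, reaching the expander is still treated as a bug.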

> 
>  (define_insn_and_split "*movoo"
> @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo"
>  (define_expand "movxo"
>[(set (match_operand:XO 0 "nonimmediate_operand")
>   (match_operand:XO 1 "input_operand"))]
> -  "TARGET_MMA"
> +  ""
>  {
> -  rs6000_emit_move (operands[0], operands[1], XOmode);
> -  DONE;
> +  if (TARGET_MMA) {
> +rs6000_emit_move (operands[0], operands[1], XOmode);
> +DONE;
> +  }
> +  /* Opaque modes are only expected to be available when MMA is supported,
> + but PR103353 shows we may want to continue to expand a MMA built-in
> + function, even if we have already emitted error messages about some
> + missing required conditions.  As shown in that PR, without one
> + explicit mov optab on XOmode provided, it would call emit_move_insn
> + recursively.  So we allow this pattern to be generated when we are
> + expanding to RTL and have seen errors, even though there is no MMA
> + support.  It would not cause further ICEs as the compilation would
> + stop soon after expanding.  */
> +  else if (currently_expanding_to_rtl && seen_error ())
> +;
> +  else
> +gcc_unreachable ();
>  })

ok


> 
>  (define_insn_and_split "*movxo"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103353.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103353.c
> new file mode 100644
> index 000..6b0bedbb958
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103353.c
> @@ -0,0 +1,22 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* If the default cpu type is power10 or later, MMA is enabled by default.
> +   To keep the test point available all the time, this case specifies
> +   -mdejagnu-cpu=power6 to make it be tested without MMA.  */
> +/* { dg-options "-maltivec -mdejagnu-cpu=power6" } */
> +
> +/* Verify there is no ICE and don't check the error messages on MMA
> +   requirement since they could be fragile and are not test points
> +   of this case.  */
> +
> +void
> +foo (__vector_pair *dst, double *x)
> +{
> +  dst[0] = 

Re: [PATCH] Add zero_extendditi2. Improve lxvr*x code generation.

2022-04-06 Thread will schmidt via Gcc-patches
On Wed, 2022-04-06 at 14:21 -0400, Michael Meissner wrote:
> From bf51c49f1481001c7b3223474d261dcbf9365eda Mon Sep 17 00:00:00 2001
> From: Michael Meissner 
> Date: Fri, 1 Apr 2022 22:27:13 -0400
> Subject: [PATCH] Add zero_extendditi2.  Improve lxvr*x code generation.
> 

Hi,

> This pattern adds zero_extendditi2 so that if we are extending DImode to
> TImode, and we want the result in a vector register, the compiler can
> generate MTVSRDDD.
> 
> In addition the patterns for generating lxvr{b,h,w,d}x were tuned to allow
> loading to gpr registers.  This prevents needlessly doing direct moves to
> get the value into the vector registers if the gpr register was already
> selected.

ok

> 
> In updating the insn counts for two tests due to these changes, I noticed
> the tests were done at -O0.  I changed this so that the tests are now done
> at the normal -O2 optimization level.

Per the comments (which you fixed up later in the patch), I note they were
deliberately done at -O0 since under higher optimizations gcc would
generate other load instructions during those tests.  Presumably with
these changes that is no longer the case.  :-)
> 
> I have tested this patch with bootstrap builds and running the regression
> testsuite using this patch on:
> 
>   Little endian power10, --with-cpu=power10
>   Little endian power9, --with-cpu=power9
>   Big endian power8, --with-cpu=power8 (both 64/32-bit tests done).
> 
> There were no regressions.  Can I check this into the master branch?
> 
> 2022-04-06   Michael Meissner  
> 
> gcc/
>   * config/rs6000/vsx.md (vsx_lxvrx): Add support for loading to
>   GPR registers.
>   (vsx_stxvrx): Add support for storing from GPR registers.
>   (zero_extendditi2): New insn.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/vsx-load-element-extend-int.c: Use -O2
>   instead of -O0 and update insn counts.
>   * gcc.target/powerpc/vsx-load-element-extend-short.c: Likewise.
>   * gcc.target/powerpc/zero-extend-di-ti.c: New test.
> 
> ---
>  gcc/config/rs6000/vsx.md  | 82 +--
>  .../powerpc/vsx-load-element-extend-int.c | 36 
>  .../powerpc/vsx-load-element-extend-short.c   | 35 
>  .../gcc.target/powerpc/zero-extend-di-ti.c| 62 ++
>  4 files changed, 164 insertions(+), 51 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/zero-extend-di-ti.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index c091e5e2f47..ad971e3a1de 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1315,14 +1315,32 @@ (define_expand "vsx_store_"
>  }
>  })
> 
> -;; Load rightmost element from load_data
> -;; using lxvrbx, lxvrhx, lxvrwx, lxvrdx.
> -(define_insn "vsx_lxvrx"
> -  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
> - (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
> -  "TARGET_POWER10"
> -  "lxvrx %x0,%y1"
> -  [(set_attr "type" "vecload")])
> +;; Load rightmost element from load_data using lxvrbx, lxvrhx, lxvrwx, 
> lxvrdx.
> +;; Support TImode being in a GPR register to prevent generating lvxr{d,w,b}x
> +;; and then two direct moves if we ultimately need the value in a GPR 
> register.
> +(define_insn_and_split "vsx_lxvrx"
> +  [(set (match_operand:TI 0 "register_operand" "=r,wa")
> + (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "m,Z")))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "@
> +   #
> +   lxvrx %x0,%y1"
> +  "&& reload_completed && int_reg_operand (operands[0], TImode)"
> +  [(set (match_dup 2) (match_dup 3))
> +   (set (match_dup 4) (const_int 0))]
> +{
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +
> +  operands[2] = gen_lowpart (DImode, op0);
> +  operands[3] = (mode == DImode
> +  ? op1
> +  : gen_rtx_ZERO_EXTEND (DImode, op1));
> +
> +  operands[4] = gen_highpart (DImode, op0);
> +}
> +  [(set_attr "type" "load,vecload")
> +   (set_attr "num_insns" "2,*")])
> 
>  ;; Store rightmost element into store_data
>  ;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> @@ -5019,6 +5037,54 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> +;; Zero extend DI to TI.  If we don't have the MTVSRDD instruction (and 
> LXVRDX
> +;; in the case of power10), we use the machine independent code.  If we are
> +;; loading up GPRs, we fall back to the old code.

In this context it's not clear what the "old code" is.  The mtvsrdd
instruction is referenced in this code path.  I see no direct reference
to lxvrdx here, though I suppose it's assumed somewhere behind the
emit_ calls.
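
For context, the kind of source that reaches a DImode to TImode zero extend
is simply a widening conversion.  This is my own sketch, not the contents of
the new zero-extend-di-ti.c test (which is not quoted here); it assumes a
64-bit target where unsigned __int128 is available, and split_zero_extend is
a made-up helper modelling the two-GPR result: low half is the value, high
half is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Widening a 64-bit unsigned value to 128 bits is a zero extend
   from DImode to TImode.  */
unsigned __int128
zero_extend_di_ti (uint64_t x)
{
  return x;
}

/* Hypothetical model of the result living in two 64-bit registers:
   the low half receives the value and the high half is cleared.  */
void
split_zero_extend (uint64_t x, uint64_t *lo, uint64_t *hi)
{
  *lo = x;
  *hi = 0;
}

void
example_zero_extend (void)
{
  uint64_t lo, hi;
  unsigned __int128 wide = zero_extend_di_ti (UINT64_MAX);

  split_zero_extend (UINT64_MAX, &lo, &hi);
  assert ((uint64_t) wide == lo);
  assert ((uint64_t) (wide >> 64) == hi);
}
```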


> +(define_insn_and_split "zero_extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r, 
> wa,")
> + (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  
> wa")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "@
> +   #
> +   #
> +   mtvsrdd %x0,0,%1
> +   #"
> +  "&& reload_completed
> +   && 

Re: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-04-01 Thread will schmidt via Gcc-patches
On Thu, 2022-03-03 at 16:38 +0800, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 

Hi

> As PR103353 shows, we may want to continue to expand a MMA built-in
> function like a normal function, even if we have already emitted
> error messages about some missing required conditions.  As shown in
> that PR, without one explicit mov optab on OOmode provided, it would
> call emit_move_insn recursively.
> 
> So this patch is to allow the mov pattern to be generated when we are
> expanding to RTL and have seen errors even without MMA supported, it's
> expected that the generated pattern would not cause further ICEs as the
> compilation would stop soon after expanding.

Is there a testcase, new or existing, that illustrates this error path?

> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> --
> 
>   PR target/103353
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
>   check to preparation statements and add handlings for !TARGET_MMA.
>   (define_expand movxo): Likewise.

> > ---
> >  gcc/config/rs6000/mma.md | 42 ++--
> >  1 file changed, 36 insertions(+), 6 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> > index 907c9d6d516..f76a87b4a21 100644
> > --- a/gcc/config/rs6000/mma.md
> > +++ b/gcc/config/rs6000/mma.md
> > @@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4
> > [(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
> >  (define_expand "movoo"
> >[(set (match_operand:OO 0 "nonimmediate_operand")
> > (match_operand:OO 1 "input_operand"))]
> > -  "TARGET_MMA"
> > +  ""
> >  {
> > -  rs6000_emit_move (operands[0], operands[1], OOmode);
> > -  DONE;
> > +  if (TARGET_MMA) {
> > +rs6000_emit_move (operands[0], operands[1], OOmode);
> > +DONE;
> > +  }
> > +  /* Opaque modes are only expected to be available when MMA is supported,
> > + but PR103353 shows we may want to continue to expand a MMA built-in
> > + function like a normal function, even if we have already emitted
> > + error messages about some missing required conditions.

perhaps drop "like a normal function".  


> > + As shown in that PR, without one explicit mov optab on OOmode 
> > provided,
> > + it would call emit_move_insn recursively.  So we allow this pattern to
> > + be generated when we are expanding to RTL and have seen errors, even
> > + though there is no MMA support.  It would not cause further ICEs as
> > + the compilation would stop soon after expanding.  */

A testcase would be particularly helpful to illustrate this, I think.

Thanks,
-Will

> > +  else if (currently_expanding_to_rtl && seen_error ())
> > +;
> > +  else
> > +gcc_unreachable ();
> >  })
> >  
> >  (define_insn_and_split "*movoo"
> > @@ -300,10 +315,25 @@ (define_insn_and_split "*movoo"
> >  (define_expand "movxo"
> >[(set (match_operand:XO 0 "nonimmediate_operand")
> > (match_operand:XO 1 "input_operand"))]
> > -  "TARGET_MMA"
> > +  ""
> >  {
> > -  rs6000_emit_move (operands[0], operands[1], XOmode);
> > -  DONE;
> > +  if (TARGET_MMA) {
> > +rs6000_emit_move (operands[0], operands[1], XOmode);
> > +DONE;
> > +  }
> > +  /* Opaque modes are only expected to be available when MMA is supported,
> > + but PR103353 shows we may want to continue to expand a MMA built-in
> > + function like a normal function, even if we have already emitted
> > + error messages about some missing required conditions.
> > + As shown in that PR, without one explicit mov optab on OOmode 
> > provided,
> > + it would call emit_move_insn recursively.  So we allow this pattern to
> > + be generated when we are expanding to RTL and have seen errors, even
> > + though there is no MMA support.  It would not cause further ICEs as
> > + the compilation would stop soon after expanding.  */
> > +  else if (currently_expanding_to_rtl && seen_error ())
> > +;
> > +  else
> > +gcc_unreachable ();
> >  })
> >  
> >  (define_insn_and_split "*movxo"
> > -- 
> > 2.25.1
> > 



Re: [PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]

2022-03-30 Thread will schmidt via Gcc-patches
On Fri, 2022-01-28 at 11:50 -0600, Bill Schmidt via Gcc-patches wrote:
> PR104004 caught some misses on my part in converting to the new
> built-in
> function infrastructure.  In particular, I forgot to mark all of the
> "nosoft"
> built-ins, and one of those should also have been marked "no32bit".
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk?
> 
> Thanks,
> Bill
> 
Hi,

The patch here seems reasonable to me.
There are comments/subsequent pings that include commentary about
additional test coverage.

I see all of the builtins referenced here appear to be touched by
the existing test gcc.target/powerpc/test_fpscr_drn_builtin.c.
I could create a variation of that test forcing !hard_dfp in case that
would help, though I'm uncertain of the value there.

Thanks
-Will

> 
> 2022-01-27  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rs6000-builtin.def (MFFSL): Mark nosoft.
>   (MTFSB0): Likewise.
>   (MTFSB1): Likewise.
>   (SET_FPSCR_RN): Likewise.
>   (SET_FPSCR_DRN): Mark nosoft and no32bit.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index c8f0cf332eb..98619a649e3 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -215,7 +215,7 @@
>  ; processors, this builtin automatically falls back to mffs on older
>  ; platforms.  Thus it appears here in the [always] stanza.
>double __builtin_mffsl ();
> -MFFSL rs6000_mffsl {}
> +MFFSL rs6000_mffsl {nosoft}
> 
>  ; This is redundant with __builtin_pack_ibm128, as it requires long
>  ; double to be __ibm128.  Should probably be deprecated.
> @@ -226,10 +226,10 @@
>  MFTB rs6000_mftb_di {32bit}
> 
>void __builtin_mtfsb0 (const int<0,31>);
> -MTFSB0 rs6000_mtfsb0 {}
> +MTFSB0 rs6000_mtfsb0 {nosoft}
> 
>void __builtin_mtfsb1 (const int<0,31>);
> -MTFSB1 rs6000_mtfsb1 {}
> +MTFSB1 rs6000_mtfsb1 {nosoft}
> 
>void __builtin_mtfsf (const int<0,255>, double);
>  MTFSF rs6000_mtfsf {}
> @@ -238,7 +238,7 @@
>  PACK_IF packif {}
> 
>void __builtin_set_fpscr_rn (const int[0,3]);
> -SET_FPSCR_RN rs6000_set_fpscr_rn {}
> +SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
> 
>const double __builtin_unpack_ibm128 (__ibm128, const int<0,1>);
>  UNPACK_IF unpackif {}
> @@ -2969,7 +2969,7 @@
>  PACK_TD packtd {}
> 
>void __builtin_set_fpscr_drn (const int[0,7]);
> -SET_FPSCR_DRN rs6000_set_fpscr_drn {}
> +SET_FPSCR_DRN rs6000_set_fpscr_drn {nosoft,no32bit}
> 
>const unsigned long long __builtin_unpack_dec128 (_Decimal128, \
>  const int<0,1>);



Re: [PATCH v3, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-21 Thread will schmidt via Gcc-patches
On Mon, 2022-03-21 at 09:51 +0800, HAO CHEN GUI wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands.Without the patch, the comparisons between two vector
> __int128 are converted to scalar comparisons with branches. The code is
> suboptimal.The patch fixes the issue. Now all comparisons between two
> vector __int128 generates P10 new comparison instructions. Also the
> relative built-ins generate the same instructions after gimple folding.
> So they're added back to the list.
> 

Hi,
Thanks for reworking the description, this clears up my uncertainty.
:-)
There are a few spots where spaces should be added after periods.  No need
to re-post for just that.  Patch content otherwise seems OK to me, though
I defer to others for any subtleties with the actual VEC_IC related
changes.
Thanks
-Will


>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 

Re: [PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-17 Thread will schmidt via Gcc-patches
On Thu, 2022-03-17 at 13:35 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands.With the patch, both built-ins and direct
> comparison
> could generate P10 new V1TI comparison instructions.

Hi,


-/* We deliberately omit RS6000_BIF_CMPGE_1TI ...
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */


I assume it is the case, but don't see a before/after example to
clarify the situation.   A clear statement that the 'worse code'
situation has been resolved with this addition of TI modes into the
iterators, would be good.

Otherwise lgtm.  :-)

Thanks,
-Will
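
For reference, the result in question can be modelled in plain C.  This
is only a sketch under assumptions: it needs a compiler with __int128
support, and the helper names cmpgt_s128_mask and cmpgt_u128_mask are
made up here, they are not rs6000 built-ins.  A single-element 128-bit
vector compare yields all ones when the relation holds and zero
otherwise, which is the value the folded GT_EXPR has to produce for
RS6000_BIF_VCMPGTST and RS6000_BIF_VCMPGTUT:

```c
/* Scalar model of a one-element 128-bit vector compare.
   Hypothetical helpers for illustration only.  */
typedef __int128 s128;
typedef unsigned __int128 u128;

/* Signed greater-than: all ones if a > b, otherwise zero.  */
static u128
cmpgt_s128_mask (s128 a, s128 b)
{
  return a > b ? ~(u128) 0 : (u128) 0;
}

/* Unsigned greater-than: all ones if a > b, otherwise zero.  */
static u128
cmpgt_u128_mask (u128 a, u128 b)
{
  return a > b ? ~(u128) 0 : (u128) 0;
}
```

A before/after comparison of the generated code would still have to
come from compiling the new pr103316.c test on a Power10 target.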


> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no
> regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and
> RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and
> RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and
> RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI
> "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -(match_operand:VEC_I 3 "vint_operand")]))]
> +  

Re: rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote:
> > For PR100693, we currently provide an addg6s builtin using unsigned
> > int arguments, but we are missing an unsigned long long argument
> > equivalent.  This patch adds an overload to provide the long long
> > version of the builtin.
> > 
> > unsigned long long __builtin_addg6s (unsigned long long, unsigned
> > long long);
> > 
> > RFC/concerns: This patch works, but looking briefly at intermediate
> > stages
> > is not behaving quite as I expected.   Looking at the intermediate
> > dumps, I
> > see in pr100693.original that calls I expect to be routed to the
> > internal
> > __builtin_addg6s_si() that uses (unsigned int) arguments are
> > instead being
> > handled by __builtin_addg6s_di() with casts that convert the
> > arguments to
> > (unsigned long long).
> 
> Did you test with actual 32-bit variables, instead of just function
> arguments?  Function arguments are always passed in (sign-extended)
> registers.
> 
> Like,
> 
> unsigned int f(unsigned int *a, unsigned int *b)
> {
>   return __builtin_addg6s(*a, *b);
> }


I perhaps missed that subtlety.  I'll investigate that further.

> 
> > As a test, I see if I swap the order of the builtins in rs6000-
> > overload.def
> > I end up with code casting the ULL values to UI, which provides
> > truncated
> > results, and is similar to what occurs today without this patch.
> > 
> > All that said, this patch seems to work.  OK for next stage 1?
> > Tested on power8BE as well as LE power8,power9,power10.
> 
> Please ask again when stage 1 has started?
> 
> > gcc/
> > PR target/100693
> > * config/rs6000/rs6000-builtins.def: Remove entry for
> > __builtin_addg6s()
> >   and add entries for __builtin_addg6s_di() and
> > __builtin_addg6s_si().
> 
> Indent of second and further lines should be at the "*", not two
> spaces
> after that.
> 
> > -   UNSPEC_ADDG6S
> > +   UNSPEC_ADDG6S_SI
> > +   UNSPEC_ADDG6S_DI
> 
> You do not need multiple unspec numbers.  You can differentiate them
> based on the modes of the arguments, already :-)
> 
> >  ;; Miscellaneous ISA 2.06 (power7) instructions
> > -(define_insn "addg6s"
> > +(define_insn "addg6s_si"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> > (unspec:SI [(match_operand:SI 1 "register_operand" "r")
> > (match_operand:SI 2 "register_operand" "r")]
> > -  UNSPEC_ADDG6S))]
> > +  UNSPEC_ADDG6S_SI))]
> > +  "TARGET_POPCNTD"
> > +  "addg6s %0,%1,%2"
> > +  [(set_attr "type" "integer")])
> > +
> > +(define_insn "addg6s_di"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
> > +   (match_operand:DI 2 "register_operand" "r")]
> > +  UNSPEC_ADDG6S_DI))]
> >"TARGET_POPCNTD"
> >"addg6s %0,%1,%2"
> >[(set_attr "type" "integer")])
> 
> (define_insn "addg6s"
>   [(set (match_operand:GPR 0 "register_operand" "=r")
>         (unspec:GPR [(match_operand:GPR 1 "register_operand" "r")
>                      (match_operand:GPR 2 "register_operand" "r")]
>                     UNSPEC_ADDG6S))]
>   "TARGET_POPCNTD"
>   "addg6s %0,%1,%2"
>   [(set_attr "type" "integer")])
> You do not need multiple unspec numbers.  You can differentiate them
> based on the modes of the arguments, already :-)


Yeah, that's what I thought, which is a big part of why I posted this
with RFC. :-)  When I attempted this there was an issue with multiple
<mode>s (behind the GPR predicate) versus the singular "addg6s"
define_insn.  It's possible I had something else wrong there, but I'll
go back to that attempt and work in that direction.

> 
> We do not want DI (here, and in most places) for -m32!
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do compile { target { powerpc*-*-linux* } } } */
> 
> Why only on Linux?
> 
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> 
> Why not on Darwin?  And why skip it anyway, given the previous line
> :-)
> 
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
> That is the wrong requirement.  You want to test for Power7, not for
> VSX.  I realise you probably copied this from elsewhere :-(  (If from
> another addg6s testcase, just keep it).

Because reasons. :-)   The stanzas are copied from the nearby bcd-1.c
testcase that has a simpler test for addg6s.  Given the input I'll
try to correct the stanzas here and limit how much error I carry along.

Thanks for the feedback and review.   I'll investigate further, and
resubmit at stage1.   

Thanks,
-Will

> 
> 
> Segher



rs6000: RFC/Update support for addg6s instruction. PR100693

2022-03-16 Thread will schmidt via Gcc-patches
Hi,

RFC/Update support for addg6s instruction.  PR100693

For PR100693, we currently provide an addg6s builtin using unsigned
int arguments, but we are missing an unsigned long long argument
equivalent.  This patch adds an overload to provide the long long
version of the builtin.

unsigned long long __builtin_addg6s (unsigned long long, unsigned long long);

RFC/concerns: This patch works, but looking briefly at the intermediate
stages, it is not behaving quite as I expected.  Looking at the
intermediate dumps, I see in pr100693.original that calls I expect to be
routed to the internal __builtin_addg6s_si() that uses (unsigned int)
arguments are instead being handled by __builtin_addg6s_di() with casts
that convert the arguments to (unsigned long long).
i.e.
 return (unsigned int) __builtin_addg6s_di
 ((long long unsigned int) a, (long long unsigned int) b);

As a test, I see that if I swap the order of the builtins in
rs6000-overload.def I end up with code casting the ULL values to UI,
which provides truncated results, and is similar to what occurs today
without this patch.
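
To make the truncation concern concrete, here is a small software model
of the instruction.  It is a sketch under a stated assumption, namely
that addg6s places a decimal six in every 4-bit digit position whose
addition produced no carry out, and a zero where a carry out occurred.
The function name addg6s_model is hypothetical and is not the built-in:

```c
#include <stdint.h>

/* Hypothetical software model of addg6s on 64-bit operands.
   Assumption: digit n of the result is 0x6 when adding a and b gives
   no carry out of digit n, and 0x0 when it does.  */
static uint64_t
addg6s_model (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  unsigned int carry = 0;

  for (int n = 0; n < 16; n++)
    {
      unsigned int da = (a >> (4 * n)) & 0xF;
      unsigned int db = (b >> (4 * n)) & 0xF;
      unsigned int sum = da + db + carry;

      carry = sum >> 4;		/* Carry out of this digit.  */
      if (!carry)
	result |= (uint64_t) 6 << (4 * n);
    }
  return result;
}
```

With operands whose interesting digits sit above bit 31, casting the
arguments to unsigned int before the call discards exactly the digits
that generate the carry, so the model returns a different answer for
the truncated and untruncated calls.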

All that said, this patch seems to work.  OK for next stage 1?
Tested on power8BE as well as LE power8,power9,power10.

2022-03-15  Will Schmidt  

gcc/
PR target/100693
* config/rs6000/rs6000-builtins.def: Remove entry for __builtin_addg6s()
  and add entries for __builtin_addg6s_di() and __builtin_addg6s_si().
* config/rs6000/rs6000-overload.def: Add overloaded entries allowing
  __builtin_addg6s() to map to either of the __builtin_addg6s_{di,si}
  builtins.
* config/rs6000/rs6000.md: Add UNSPEC_ADDG6S_SI and UNSPEC_ADDG6S_DI
  unspecs.   Add define_insn entries for addg6s_si and addg6s_di based
  on those unspecs.
* doc/extend.texi:  Add entry for ULL __builtin_addg6s (ULL, ULL);

testsuite/
PR target/100693
* gcc.target/powerpc/pr100693.c:  New test.

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index ae2760c33389..4c23cac26932 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1993,12 +1993,16 @@
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
 
 
 ; Power7 builtins (ISA 2.06).
 [power7]
-  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
-ADDG6S addg6s {}
+  const unsigned long long __builtin_addg6s_di (unsigned long long, \
+   unsigned long long);
+ADDG6S_DI addg6s_di {}
+
+  const unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI addg6s_si {}
 
   const signed long __builtin_bpermd (signed long, signed long);
 BPERMD bpermd_di {32bit}
 
   const unsigned int __builtin_cbcdtd (unsigned int);
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0e..931f85b738c5 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -76,10 +76,15 @@
 ; Blank lines may be used as desired in this file between the lines as
 ; defined above; that is, you can introduce as many extra newlines as you
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[ADDG6S, __builtin_i_addg6s, __builtin_addg6s]
+  unsigned long long __builtin_addg6s_di (signed long long, unsigned long long);
+ADDG6S_DI
+  unsigned int __builtin_addg6s_si (unsigned int, unsigned int);
+ADDG6S_SI
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
 BCDADD_V1TI
   vuc __builtin_vec_bcdadd (vuc, vuc, const int);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fdfbc6566a5c..d040f127eb55 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -122,11 +122,12 @@ (define_c_enum "unspec"
UNSPEC_P8V_MTVSRWZ
UNSPEC_P8V_RELOAD_FROM_GPR
UNSPEC_P8V_MTVSRD
UNSPEC_P8V_XXPERMDI
UNSPEC_P8V_RELOAD_FROM_VSX
-   UNSPEC_ADDG6S
+   UNSPEC_ADDG6S_SI
+   UNSPEC_ADDG6S_DI
UNSPEC_CDTBCD
UNSPEC_CBCDTD
UNSPEC_DIVE
UNSPEC_DIVEU
UNSPEC_UNPACK_128BIT
@@ -14495,15 +14496,24 @@ (define_peephole2
   operands[5] = change_address (mem, mode, new_addr);
 })

 
 ;; Miscellaneous ISA 2.06 (power7) instructions
-(define_insn "addg6s"
+(define_insn "addg6s_si"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "register_operand" "r")]
-  UNSPEC_ADDG6S))]
+  UNSPEC_ADDG6S_SI))]
+  "TARGET_POPCNTD"
+  "addg6s %0,%1,%2"
+  [(set_attr "type" "integer")])
+
+(define_insn "addg6s_di"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "r")
+   (match_operand:DI 2 "register_operand" "r")]
+  

Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Thu, 2022-03-10 at 13:49 -0600, Segher Boessenkool wrote:
> On Thu, Mar 10, 2022 at 10:44:52AM -0600, will schmidt wrote:
> > On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> > > --- a/gcc/config/rs6000/rs6000-cpus.def
> > > +++ b/gcc/config/rs6000/rs6000-cpus.def
> > > @@ -43,9 +43,7 @@
> > >| OPTION_MASK_ALTIVEC  
> > > \
> > >| OPTION_MASK_VSX)
> > > 
> > > -/* For now, don't provide an embedded version of ISA 2.07.  Do
> > > not set power8
> > > -   fusion here, instead set it in rs6000.cc if we are tuning for
> > > a power8
> > > -   system.  */
> > > +/* For now, don't provide an embedded version of ISA 2.07.  */
> > 
> > > ok.  (as far as removing the comment, I'm not clear what the
> > > remaining comment is telling me, but that's outside of the scope
> > > of this patch).
> 
> It is saying there is nothing that implements Book III-E of ISA 2.07
> (nothing in GCC, but no actual CPU either).  Or Category: Embedded
> even
> maybe :-)

Lol, Ok.  The small-e in embedded did not clue me in that this was
referring to the big-E Embedded category.  :-)

> It could be clearer perhaps, or just be removed completely; it might
> have been useful historically, but it isn't anymore really.


Thanks,
-Will

> 
> 
> Segher



Re: [PATCH, V2] Eliminate power8 fusion options, use power8 tuning, PR target/102059

2022-03-10 Thread will schmidt via Gcc-patches
On Wed, 2022-03-09 at 22:49 -0500, Michael Meissner wrote:
> Eliminate power8 fusion options, use power8 tuning, PR target/102059

Hi,

> 
> The power8 fusion support used to be set automatically when -mcpu=power8 or
> -mtune=power8 was used, and it was cleared for other cpu's.  However, if you
> used the target attribute or target #pragma to change the default cpu type or
> tuning, you would get an error that a target specifiction option mismatch
> occurred.
> 
specification. 
(ok :-)


> This occurred because the rs6000_can_inline_p function just compares the ISA
> bits between the called inline function and the caller.  If the ISA flags of
> the called function is not a subset of the ISA flags of the caller, we won't 
> do
> the inlinging.  When a power9 or power10 function inlines a function that is
> explicitly compiled for power8, the power8 function has the power8 fusion bits
> set and the power9 or power10 functions do not have the fusion bits set.
> 

inlining.
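
The subset test described in the quoted paragraph can be sketched as
follows.  This is an illustration under assumptions, not the body of
rs6000_can_inline_p: the mask names and bit values below are invented
for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flag bits, for illustration only.  */
#define ISA_POWER8	(1u << 0)
#define ISA_P8_FUSION	(1u << 1)
#define ISA_POWER9	(1u << 2)

/* Inlining is allowed only if every ISA bit set for the callee is
   also set for the caller, i.e. the callee flags are a subset.  */
static bool
isa_subset_ok (uint32_t caller_isa, uint32_t callee_isa)
{
  return (callee_isa & ~caller_isa) == 0;
}
```

A power9 caller (ISA_POWER8 | ISA_POWER9) and a power8 callee carrying
ISA_P8_FUSION fail this check; once fusion is no longer an ISA bit, the
same pair passes.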


> This code removes the -mpower8-fusion and -mpower8-fusion-sign options, and
> only enables power8 fusion if we are tuning for a power8.  Power8 sign fusion
> is only enabled if we are tuning for a power8 and we have -O3 optimization or
> higher.
> 
> I left the options -mno-power8-fusion and -mno-power8-fusion-sign in 
> rs6000.opt
> and they don't issue a warning.  If the user explicitly used -mpower8-fusion 
> or
> -mpower8-fusion-sign, then they will get a warning that the swtich has been
> removed.
> 

switch


> Similarly, I left in the pragma target and attribute target support for the
> fusion options, but they don't do anything now.  This is because I believe the
> customer who encountered this problem now is explicitly setting the
> no-power8-fusion option in the pragma or attribute to avoid the warning.
> 
> I have tested this on the following systems, and they all bootstraps fine and
> there were no regressions in the test suite:
> 
> big endian power8 (both 64-bit and 32-bit)
> little endian power9
> little endian power10
> 

ok.

> Can I check this patch into the current master branch for GCC and after a
> cooling period check in the patch to the GCC 11 and GCC 10 branches.  The
> customer is currently using GCC 10.
> 
> 2022-03-09   Michael Meissner  
> 
> gcc/
>   PR target/102059
>   * config/rs6000/rs6000-cpus.def (OTHER_FUSION_MASKS): Delete.
>   (ISA_3_0_MASKS_SERVER): Don't clear the fusion masks.
>   (POWERPC_MASKS): Remove OPTION_MASK_P8_FUSION.

ok

>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Delete code that set the power8 fusion options automatically.
>   (rs6000_opt_masks): Allow #pragma target and attribute target to set
>   power8-fusion and power8-fusion-sign, but these no longer represent
>   options that the user can set.
>   (rs6000_print_options_internal): Skip printing nop options.

ok


>   * config/rs6000/rs6000.h (TARGET_P8_FUSION): New macro.
>   (TARGET_P8_FUSION_SIGN): Likewise.
>   (MASK_P8_FUSION): Delete.

ok


>   * config/rs6000/rs6000.opt (-mpower8-fusion): Recognize the option but
>   ignore the no form and warn that the option was removed for the regular
>   form.
>   (-mpower8-fusion-sign): Likewise.

ok

>   * doc/invoke.texi (RS/6000 and PowerPC Options): Delete -mpower8-fusion
>   and -mpower8-fusion-sign.

This change removes the -mpower8-fusion and -mno-power8-fusion options.
There is no direct reference to -mpower8-fusion-sign in the change
here.  It may be an implied removal, but that is not immediately
obvious to me.


> 
> gcc/testsuite/
>   PR target/102059
>   * gcc.dg/lto/pr102059-1_0.c: Remove -mno-power8-fusion.
>   * gcc.dg/lto/pr102059-2_0.c: Likewise.
>   * gcc.target/powerpc/pr102059-3.c: Likewise.
>   * gcc.target/powerpc/pr102059-4.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000-cpus.def | 22 +++--
>  gcc/config/rs6000/rs6000.cc   | 49 +--
>  gcc/config/rs6000/rs6000.h| 14 +-
>  gcc/config/rs6000/rs6000.opt  | 19 +--
>  gcc/doc/invoke.texi   | 13 +
>  gcc/testsuite/gcc.dg/lto/pr102059-1_0.c   |  2 +-
>  gcc/testsuite/gcc.dg/lto/pr102059-2_0.c   |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-3.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 23 +
>  9 files changed, 75 insertions(+), 71 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 963947f6939..a05b2d8c41a 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -43,9 +43,7 @@
>| OPTION_MASK_ALTIVEC  \
>| OPTION_MASK_VSX)
> 
> -/* For now, don't provide an embedded version of ISA 2.07.  Do 

Re: [PATCH] Optimize signed DImode -> TImode on power10, PR target/104698

2022-03-01 Thread will schmidt via Gcc-patches
On Mon, 2022-02-28 at 22:21 -0500, Michael Meissner wrote:
> Optimize signed DImode -> TImode on power10, PR target/104698.
> 

Hi,
  Logic seems OK to me; a few suggestions on the comments are intermixed
below.  As always, I defer if there are counter arguments. :-)


> On power10, GCC tries to optimize the signed conversion from DImode to
> TImode by using the vextsd2q instruction.  However to generate this
> instruction, it would have to generate 3 direct moves (1 from the GPR
> registers to the altivec registers, and 2 from the altivec registers to
> the GPR register).
> 
> This patch adds code back in to use the shift right immediate instruction
> to do the conversion if the target/source is GPR registers.


Perhaps drop "back in".   If it's necessary to call out a previous
commit that removed the code for whatever reason, certainly do so. 
It's not clear from context if that was the case.


> 
> 2022-02-28   Michael Meissner  
> 
> gcc/
>   PR target/104698
>   * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
>   (extendditi2): Replace with code to deal with both GPR registers
>   and with altivec registers.

Perhaps enhance with 
(extendditi2):  Convert from define_expand to
define_insn_and_split.  Replace with code ...


> ---
>  gcc/config/rs6000/vsx.md | 73 
>  1 file changed, 52 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index b53de103872..62464f67f4d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5023,15 +5023,58 @@ (define_expand "vsignextend_si_v2di"
>DONE;
>  })
> 
> -;; ISA 3.1 vector sign extend
> -;; Move DI value from GPR to TI mode in VSX register, word 1.
> -(define_insn "mtvsrdd_diti_w1"
> -  [(set (match_operand:TI 0 "register_operand" "=wa")
> - (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> -  UNSPEC_MTVSRD_DITI_W1))]
> -  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> -  "mtvsrdd %x0,0,%1"
> -  [(set_attr "type" "vecmove")])
> +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets.  
> If
> +;; the register allocator prefers the GPRs, we won't have to move the value 
> to
> +;; the altivec registers, do the vextsd2q instruction and move it back.  If 
> we
> +;; aren't compiling for 64-bit power10, don't provide the service and let the
> +;; machine independent code handle the extension.

So, the ".. we won't have to ..." applies to the altivec target path
here?   Describing the code in terms of what it doesn't do doesn't
seem right.
If so, and perhaps even if not, I suggest rearranging the
comment slightly so it can be read as an either/or.
If the register allocator prefers the GPRs, ...
Otherwise, for altivec registers we do the vextsd2q ...


> +(define_insn_and_split "extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> + (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"
> +  "#"
> +  "&& reload_completed"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> + a sign extension to the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +{
> +  rtx dest_hi = gen_highpart (DImode, dest);
> +  rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +  emit_move_insn (dest_lo, src);
> +  emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> +  DONE;
> +}
ok
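
As a plain C illustration of what the GPR path above computes, the
sketch below copies the 64-bit source into the low half and fills the
high half with copies of the sign bit by shifting right by 63.  The
struct and function names are hypothetical.  Note the assumption that
right-shifting a negative int64_t is an arithmetic shift, which GCC
documents but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical two-register view of a 128-bit value.  */
struct ti_pair
{
  uint64_t hi;
  uint64_t lo;
};

/* Sign extend a 64-bit value to 128 bits: low half is the source,
   high half is the source shifted right arithmetically by 63.  */
static struct ti_pair
extend_di_to_ti (int64_t src)
{
  struct ti_pair r;

  r.lo = (uint64_t) src;
  r.hi = (uint64_t) (src >> 63);	/* 0 or all ones.  */
  return r;
}
```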

> +
> +  /* For conversion to Altivec register, generate either a splat operation or
> + a load rightmost double word instruction.  Both instructions gets the
> + DImode value into the lower 64 bits, and then do the vextsd2q
> + instruction.  */

consider   s/instruction. Both instructions gets/to get/

> +  else if (ALTIVEC_REGNO_P (dest_regno))
> +{
> +  if (MEM_P (src))
> + emit_insn (gen_vsx_lxvrdx (dest, src));
> +  else
> + {
> +   rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +   emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> + }
> +
> +  emit_insn (gen_extendditi2_vector (dest, dest));
> +  DONE;
> +}

ok

lgtm, thanks
-Will

> +
> +  else
> +gcc_unreachable ();
> +}
> +  [(set_attr "length" "8")])
> 
>  ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
>  (define_insn "extendditi2_vector"
> @@ -5042,18 +5085,6 @@ (define_insn "extendditi2_vector"
>"vextsd2q %0,%1"
>[(set_attr "type" "vecexts")])
> 
> -(define_expand "extendditi2"
> -  [(set (match_operand:TI 0 "gpc_reg_operand")
> - (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> -  "TARGET_POWER10"
> -  {
> -/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.  
> */
> -rtx temp = gen_reg_rtx (TImode);
> -

Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-11 Thread Bill Schmidt via Gcc-patches
Fine.  I withdraw the patch request, and will remove my name from
the bugzilla.  Somebody else can deal with it.  I have more important
things to worry about.

Bill

On 2/11/22 1:31 AM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Feb 10, 2022 at 04:28:02PM -0600, Bill Schmidt wrote:
>> On 2/10/22 4:11 PM, Segher Boessenkool wrote:
>>>> No, trunk has this, for example:
>>>>
>>>>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
>>>>     VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
>>> I see this on trunk:
>>>
>>>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
>>> VCLZLSBB_V16QI vclzlsbb_v16qi {}
>>>
>>> Oh, you changed it?  Please fix it, then.
>> In a patch you approved, yes.
> Yes, I missed it.  That is not an argument that it would be good or
> should not be changed.
>
>> I don't really understand why you want
>> it changed now.
> Because it is wrong.
>
>> You must not be looking at the most recent trunk revision.
> Indeed I haven't been able to update master for a week or so, it does
> not bootstrap, as we have talked about.
>
>>>> Throughout the new builtin infrastructure, the defaults are set for
>>>> little-endian, and the "endian" flag changes behavior for big-endian.
>>> That is a big mistake.  There are many machine instructions  that are
>>> *always* big-endian (most even!), and none that are always
>>> little-endian.  So this should be fixed, sooner rather than later :-(
>> That does not seem like a good idea in stage 4 to me.  That requires
>> yet another patch to reverse a bunch of other things unnecessarily.
> Things that were added in stage 4, a few days ago even.  Things that are
> broken and wrong.  Things I do not want to have to release with and deal
> with all the pain of having broken released versions.
>
>> This is a purely arbitrary choice.
> No, it is not.  It flies in the face of consistency.
>
>> The endian flag is only used when
>> a built-in function must have one behavior for big-endian, and another
>> behavior for little-endian.  Which one is chosen as the default is
>> absolutely arbitrary.
> The one that corresponds to the name should be the default.  I don't see
> how you can argue otherwise.
>
>> When we expand the built-in we will either
>> accept the default or change to the other.  The existence of machine
>> instructions that are only big-endian has nothing to do with the case;
>> what matters is the existence of built-in functions that have two
>> behaviors.
> Everything in our backend is BE by default, just like everything in the
> architecture is.  Yes, LE works almost as well (or just as well) in most
> places, but everything is named assuming BE.  This consistency is hugely
> important, without it the reader will not understand things as well and
> as easily.
>
>>>> That's something that should be fixed, I guess, but it's orthogonal
>>>> to this patch.
>>> Fixing it later is more work :-(
>>>
>>> Please at least open a bug report for it.
>> I can do that.
> Thanks!
>
>>> The other things need fixing before the patch is okay.
>> I'd ask you to reconsider, as explained above.
> It is purely an implementation thing, and it is completely trivial to
> do.  If you truly are afraid of breaking things (you should not be), it
> is marginally acceptable to do this as the very first thing in stage 1.
>
> Consistency matters.  Naming matters.  These shape how we think about
> things.
>
>
> Segher


Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread Bill Schmidt via Gcc-patches
Hi!

On 2/10/22 4:11 PM, Segher Boessenkool wrote:
> On Thu, Feb 10, 2022 at 03:17:05PM -0600, Bill Schmidt wrote:
>>>>  /* 1 argument vector functions added in ISA 3.0 (power9). */
>>>> -BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",CONST,  vclzlsbb_v16qi)
>>>> -BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",  CONST,  vclzlsbb_v8hi)
>>>> -BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",  CONST,  vclzlsbb_v4si)
>>>> -BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",CONST,  vctzlsbb_v16qi)
>>>> -BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",  CONST,  vctzlsbb_v8hi)
>>>> -BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",  CONST,  vctzlsbb_v4si)
>>>> +BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",CONST,  vctzlsbb_v16qi)
>>>> +BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",  CONST,  vctzlsbb_v8hi)
>>>> +BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",  CONST,  vctzlsbb_v4si)
>>>> +BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",CONST,  vclzlsbb_v16qi)
>>>> +BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",  CONST,  vclzlsbb_v8hi)
>>>> +BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",  CONST,  vclzlsbb_v4si)
>>> Please change the default to be equal to the builtin name, so, the BE
>>> version.  We do that everywhere else as well, and it makes a lot more
>>> sense (since everything in Power has BE numbering).
>>>
>>> The trunk version has this correct afaics?
>> No, trunk has this, for example:
>>
>>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
>>     VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
> I see this on trunk:
>
>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
> VCLZLSBB_V16QI vclzlsbb_v16qi {}
>
> Oh, you changed it?  Please fix it, then.

In a patch you approved, yes.  I don't really understand why you want
it changed now.  You must not be looking at the most recent trunk
revision.

>
>> Throughout the new builtin infrastructure, the defaults are set for
>> little-endian, and the "endian" flag changes behavior for big-endian.
> That is a big mistake.  There are many machine instructions  that are
> *always* big-endian (most even!), and none that are always
> little-endian.  So this should be fixed, sooner rather than later :-(

That does not seem like a good idea in stage 4 to me.  That requires
yet another patch to reverse a bunch of other things unnecessarily.

This is a purely arbitrary choice.  The endian flag is only used when
a built-in function must have one behavior for big-endian, and another
behavior for little-endian.  Which one is chosen as the default is
absolutely arbitrary.  When we expand the built-in we will either
accept the default or change to the other.  The existence of machine
instructions that are only big-endian has nothing to do with the case;
what matters is the existence of built-in functions that have two
behaviors.
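
To illustrate the convention being argued about, here is a minimal
model of the selection step.  Everything in it is an assumption made
for the example: the enum, the struct and select_pattern are invented
names, and the way GCC actually performs the swap is not modelled.  It
encodes the trunk convention described above, where the table holds the
little-endian pattern and the endian flag switches to the other pattern
on big-endian.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two insn patterns.  */
enum insn_pattern { PAT_VCLZLSBB, PAT_VCTZLSBB };

/* Hypothetical table entry: default pattern plus the endian flag.  */
struct bif_entry
{
  enum insn_pattern default_pattern;
  bool endian_flag;
};

static enum insn_pattern
other_pattern (enum insn_pattern p)
{
  return p == PAT_VCLZLSBB ? PAT_VCTZLSBB : PAT_VCLZLSBB;
}

/* Keep the default on little-endian; on big-endian, swap only if the
   entry carries the endian flag.  */
static enum insn_pattern
select_pattern (struct bif_entry e, bool big_endian)
{
  if (e.endian_flag && big_endian)
    return other_pattern (e.default_pattern);
  return e.default_pattern;
}
```

Under the opposite convention the table would hold the big-endian
pattern and the test would be on little-endian instead; the generated
code is the same either way, and only the names in the table differ.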

>>>>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>>>>  /* { dg-options "-mdejagnu-cpu=power9" } */
>>>> +/* { dg-additional-options "-mbig" { target powerpc64le-*-* } } */
>>> You don't need the target clause, if it already is BE by default it does
>>> not do anything to add it redundantly.
>>>
>>> But this is wrong anyway: the name of the target triple does not say
>>> whether we are BE or LE.  Instead you should use the be or le selectors.
>>> But again, just add -mbig always.
>> This was added by David Edelsohn to the trunk version of the patch, because
>> -mbig actually is not supported on all subtargets.  (I found that quite
>> surprising also.)
> Huh.  Yeah I think I encountered that before.
>
> So this is because these options are in sysv4.opt .
>
>> Apparently this doesn't work on AIX, for example.  But 
>> -mlittle works everywhere.  Go figure.
> ... and -mlittle is exactly the same?  Wtw.
>
> I only looked at the .opt files, maybe one of them is handled directly,
> or more likely in specs?  And not symmetrically?
>
>> That's something that should be fixed, I guess, but it's orthogonal
>> to this patch.
> Fixing it later is more work :-(
>
> Please at least open a bug report for it.

I can do that.

>
>
> The other things need fixing before the patch is okay.

I'd ask you to reconsider, as explained above.

Thanks,
Bill

>
>
> Segher


Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread Bill Schmidt via Gcc-patches
Hi!

On 2/10/22 2:50 PM, Segher Boessenkool wrote:
> On Thu, Feb 10, 2022 at 12:22:28PM -0600, Bill Schmidt wrote:
>> This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac.
>> These built-ins were misimplemented as always having big-endian semantics.
>>
>> Because the built-in infrastructure has changed, the modifications to the
>> source are different but achieve the same purpose.  The modifications to
>> the test suite are identical (after fixing the issue with -mbig that David
>> pointed out with the original patch).
>>  /* 1 argument vector functions added in ISA 3.0 (power9). */
>> -BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  vclzlsbb_v16qi)
>> -BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
>> -BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vclzlsbb_v4si)
>> -BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  vctzlsbb_v16qi)
>> -BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
>> -BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vctzlsbb_v4si)
>> +BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  vctzlsbb_v16qi)
>> +BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
>> +BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vctzlsbb_v4si)
>> +BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  vclzlsbb_v16qi)
>> +BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
>> +BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vclzlsbb_v4si)
> Please change the default to be equal to the builtin name, so, the BE
> version.  We do that everywhere else as well, and it makes a lot more
> sense (since everything in Power has BE numbering).
>
> The trunk version has this correct afaics?

No, trunk has this, for example:

  const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
    VCLZLSBB_V16QI vctzlsbb_v16qi {endian}

So the backport matches what is on trunk.  

Throughout the new builtin infrastructure, the defaults are set for
little-endian, and the "endian" flag changes behavior for big-endian.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
>> @@ -1,6 +1,7 @@
>>  /* { dg-do compile { target { powerpc*-*-* } } } */
> (Delete the redundant target clause when modifying any testcase, please).

Okay.
>
>>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>>  /* { dg-options "-mdejagnu-cpu=power9" } */
>> +/* { dg-additional-options "-mbig" { target powerpc64le-*-* } } */
> You don't need the target clause, if it already is BE by default it does
> not do anything to add it redundantly.
>
> But this is wrong anyway: the name of the target triple does not say
> whether we are BE or LE.  Instead you should use the be or le selectors.
> But again, just add -mbig always.

This was added by David Edelsohn to the trunk version of the patch, because
-mbig actually is not supported on all subtargets.  (I found that quite
surprising also.)  Apparently this doesn't work on AIX, for example.  But 
-mlittle works everywhere.  Go figure.

That's something that should be fixed, I guess, but it's orthogonal
to this patch.

Thanks!
Bill

>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power9 -mlittle" } */
> And here you do it correctly :-)
>
> Okay with those fixes (all happen a few times).  Thanks!
>
>
> Segher


Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread Bill Schmidt via Gcc-patches
Hi!

On 2/10/22 2:06 PM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Feb 10, 2022 at 12:22:28PM -0600, Bill Schmidt wrote:
>> This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac.
>> These built-ins were misimplemented as always having big-endian semantics.
> What is different compared to the trunk version?

The infrastructure changed, so:

(1) Instead of changing the default pattern in rs6000-builtins.def, I have
to change it in rs6000-builtin.def.  (Note the missing "s".)

(2) Instead of having the endian change driven by an "endian" flag in the
built-in description in rs6000-builtins.def, I have to add some more ad-hoc
code in rs6000_expand_builtin to handle the change to the big-endian
pattern.

That's all.

Thanks!
Bill

>
>
> Segher


[PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread Bill Schmidt via Gcc-patches
Hi!

This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac.
These built-ins were misimplemented as always having big-endian semantics.

Because the built-in infrastructure has changed, the modifications to the
source are different but achieve the same purpose.  The modifications to
the test suite are identical (after fixing the issue with -mbig that David
pointed out with the original patch).

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for releases/gcc-11?

Thanks!
Bill


2022-02-10  Bill Schmidt  

gcc/
PR target/95082
* config/rs6000/rs6000-builtin.def (VCLZLSBB_V16QI): Change default
pattern.
(VCLZLSBB_V8HI): Likewise.
(VCLZLSBB_V4SI): Likewise.
(VCTZLSBB_V16QI): Likewise.
(VCTZLSBB_V8HI): Likewise.
(VCTZLSBB_V4SI): Likewise.
* config/rs6000/rs6000-call.c (rs6000_expand_builtin): Make big-endian
adjustments to P9V_BUILTIN_VC[LT]ZLSBB_* built-in expansions.

gcc/testsuite/
PR target/95082
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c: Restrict to big-endian.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c: Restrict to big-endian.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c: New.
---
 gcc/config/rs6000/rs6000-builtin.def  | 12 
 gcc/config/rs6000/rs6000-call.c   | 30 +++
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c |  1 +
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c |  1 +
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c | 15 ++
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c | 15 ++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c |  1 +
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c |  1 +
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c | 15 ++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c | 15 ++
 10 files changed, 100 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 6270444ef70..b28ee02070a 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2678,12 +2678,12 @@ BU_P9V_64BIT_AV_X (STXVL,   "stxvl",MISC)
 BU_P9V_64BIT_AV_X (XST_LEN_R,  "xst_len_r",MISC)
 
 /* 1 argument vector functions added in ISA 3.0 (power9). */
-BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi", CONST,  vclzlsbb_v16qi)
-BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",   CONST,  vclzlsbb_v8hi)
-BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",   CONST,  vclzlsbb_v4si)
-BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi", CONST,  vctzlsbb_v16qi)
-BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",   CONST,  vctzlsbb_v8hi)
-BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",   CONST,  vctzlsbb_v4si)
+BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi", CONST,  vctzlsbb_v16qi)
+BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",   CONST,  vctzlsbb_v8hi)
+BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",   CONST,  vctzlsbb_v4si)
+BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi", CONST,  vclzlsbb_v16qi)
+BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",   CONST,  vclzlsbb_v8hi)
+BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",   CONST,  vclzlsbb_v4si)
 
 /* Built-in support for Power9 "VSU option" string operations includes
new awareness of the "vector compare not equal" (vcmpneb, vcmpneb.,
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index ef20cb30388..27bb25fa4d8 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13221,6 +13221,36 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
}
   break;
 
+case P9V_BUILTIN_VCLZLSBB_V16QI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vclzlsbb_v16qi;
+  break;
+
+case P9V_BUILTIN_VCLZLSBB_V8HI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vclzlsbb_v8hi;
+  break;
+
+case P9V_BUILTIN_VCLZLSBB_V4SI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vclzlsbb_v4si;
+  break;
+
+case P9V_BUILTIN_VCTZLSBB_V16QI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vctzlsbb_v16qi;
+  break;
+
+case P9V_BUILTIN_VCTZLSBB_V8HI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vctzlsbb_v8hi;
+  break;
+
+case P9V_BUILTIN_VCTZLSBB_V4SI:
+  if (BYTES_BIG_ENDIAN)
+   icode = CODE_FOR_vctzlsbb_v4si;
+  break;
+
  

[PATCH] rs6000: Rename vec_clrl and vec_clrr to agreed-upon names

2022-02-09 Thread Bill Schmidt via Gcc-patches
Hi!

After vec_clrl and vec_clrr were implemented and during review of the
documentation, it was agreed to change their names to vec_clr_first and
vec_clr_last to more clearly describe their bi-endian semantics.  ("Left"
and "right" are the wrong terms to be using.)  It looks like I neglected
to make that change, so fixing it now.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk, and for backport to gcc 11 after some burn-in?

Thanks!
Bill


2022-02-09  Bill Schmidt  

gcc/
* config/rs6000/rs6000-overload.def (VEC_CLR_FIRST): Rename from
VEC_CLRL.
(VEC_CLR_LAST): Rename from VEC_CLRR.

gcc/testsuite/
* gcc.target/powerpc/vec-clrl-0.c: Adjust to new names.
* gcc.target/powerpc/vec-clrl-1.c: Likewise.
* gcc.target/powerpc/vec-clrl-2.c: Likewise.
* gcc.target/powerpc/vec-clrl-3.c: Likewise.
* gcc.target/powerpc/vec-clrr-0.c: Likewise.
* gcc.target/powerpc/vec-clrr-1.c: Likewise.
* gcc.target/powerpc/vec-clrr-2.c: Likewise.
* gcc.target/powerpc/vec-clrr-3.c: Likewise.
---
 gcc/config/rs6000/rs6000-overload.def | 12 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrl-0.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrl-1.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrl-2.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrl-3.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrr-0.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrr-1.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrr-2.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/vec-clrr-3.c |  4 ++--
 9 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 44e2945aaa0..0b68cc3c3b2 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -557,16 +557,16 @@
   vuc __builtin_vec_vcipherlast_be (vuc, vuc);
 VCIPHERLAST_BE
 
-[VEC_CLRL, vec_clrl, __builtin_vec_clrl]
-  vsc __builtin_vec_clrl (vsc, unsigned int);
+[VEC_CLR_FIRST, vec_clr_first, __builtin_vec_clr_first]
+  vsc __builtin_vec_clr_first (vsc, unsigned int);
 VCLRLB  VCLRLB_S
-  vuc __builtin_vec_clrl (vuc, unsigned int);
+  vuc __builtin_vec_clr_first (vuc, unsigned int);
 VCLRLB  VCLRLB_U
 
-[VEC_CLRR, vec_clrr, __builtin_vec_clrr]
-  vsc __builtin_vec_clrr (vsc, unsigned int);
+[VEC_CLR_LAST, vec_clr_last, __builtin_vec_clr_last]
+  vsc __builtin_vec_clr_last (vsc, unsigned int);
 VCLRRB  VCLRRB_S
-  vuc __builtin_vec_clrr (vuc, unsigned int);
+  vuc __builtin_vec_clr_last (vuc, unsigned int);
 VCLRRB  VCLRRB_U
 
 ; We skip generating a #define because of the C-versus-C++ complexity
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-clrl-0.c b/gcc/testsuite/gcc.target/powerpc/vec-clrl-0.c
index d0b183ebfaf..df055c6535e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-clrl-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-clrl-0.c
@@ -5,11 +5,11 @@
 
 extern void abort (void);
 
-/* Vector string clear left-most bytes of unsigned char.  */
+/* Vector string clear first bytes of unsigned char.  */
 vector unsigned char
 clrl (vector unsigned char arg, int n)
 {
-  return vec_clrl (arg, n);
+  return vec_clr_first (arg, n);
 }
 
 /* { dg-final { scan-assembler {\mvclrlb\M} { target be } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-clrl-1.c b/gcc/testsuite/gcc.target/powerpc/vec-clrl-1.c
index 43ab32c0278..692f83e033b 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-clrl-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-clrl-1.c
@@ -7,11 +7,11 @@
 
 extern void abort (void);
 
-/* Vector string clear left-most bytes of unsigned char.  */
+/* Vector string clear first bytes of unsigned char.  */
 vector unsigned char
 clrl (vector unsigned char arg, int n)
 {
-  return vec_clrl (arg, n);
+  return vec_clr_first (arg, n);
 }
 
 int main (int argc, char *argv [])
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-clrl-2.c b/gcc/testsuite/gcc.target/powerpc/vec-clrl-2.c
index b9676b8b04c..ffecf432736 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-clrl-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-clrl-2.c
@@ -5,11 +5,11 @@
 
 extern void abort (void);
 
-/* Vector string clear left-most bytes of unsigned char.  */
+/* Vector string clear first bytes of unsigned char.  */
 vector signed char
 clrl (vector signed char arg, int n)
 {
-  return vec_clrl (arg, n);
+  return vec_clr_first (arg, n);
 }
 
 /* { dg-final { scan-assembler {\mvclrlb\M} { target be } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-clrl-3.c b/gcc/testsuite/gcc.target/powerpc/vec-clrl-3.c
index 0ae5abcee50..456f655e7aa 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-clrl-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-clrl-3.c
@@ -7,11 +7,11 @@
 
 extern void abort (void);
 
-/* Vector string clear left-most bytes of unsigned char.  */
+/* Vector string clear first bytes of unsigned char.  */
 vector 

[PATCH] rs6000: Correct function prototypes for vec_replace_unaligned

2022-02-08 Thread Bill Schmidt via Gcc-patches
Hi!

Due to a pasto error in the documentation, vec_replace_unaligned was
implemented with the same function prototypes as vec_replace_elt.  It was
intended that vec_replace_unaligned always specify output vectors as having
type vector unsigned char, to emphasize that elements are potentially
misaligned by this built-in function.  This patch corrects the
misimplementation.
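
As a rough illustration of why the result is typed as bytes, here is a portable sketch that emulates the idea on a plain byte array.  The type bytes16 and the helper replace_unaligned_u32 are hypothetical names, and the byte numbering here follows host memory order; this is not a statement of the exact bi-endian numbering used by the built-in.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "vector unsigned char".  */
typedef struct { unsigned char b[16]; } bytes16;

/* Copy four words into a 16-byte result, then overwrite four bytes at an
   arbitrary byte offset.  The replaced bytes may straddle two of the
   original words, so the result is no longer a clean vector of words.  */
static bytes16
replace_unaligned_u32 (const uint32_t src[4], uint32_t val, unsigned byte_off)
{
  bytes16 out;
  memcpy (out.b, src, sizeof out.b);
  if (byte_off <= 12)   /* assumed valid range for a 4-byte element */
    memcpy (out.b + byte_off, &val, sizeof val);
  return out;
}
```

With an offset of 3, the new value covers bytes 3 through 6, which spans the first and second words of the input.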

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?  Eventually I would also like to backport it
to GCC 11, after burn-in.

Thanks!
Bill


2022-02-04  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change
function prototype.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DF): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_UN_V4SF): Likewise.
(VREPLACE_UN_V4SI): Likewise.
* config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all
function prototypes.
* config/rs6000/vsx.md (vreplace_un_): Remove define_expand.
(vreplace_un_): New define_insn.

gcc/testsuite/
* gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected
prototypes for each call to vec_replace_unaligned.
---
 gcc/config/rs6000/rs6000-builtins.def | 16 ++--
 gcc/config/rs6000/rs6000-overload.def | 12 -
 gcc/config/rs6000/vsx.md  | 25 ---
 .../powerpc/vec-replace-word-runnable.c   | 20 ++-
 4 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 5c988cc1152..846c0bafd45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3387,25 +3387,25 @@
   const vull __builtin_altivec_vpextd (vull, vull);
 VPEXTD vpextd {}
 
-  const vull __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
-  const int<4>);
+  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
+ const int<4>);
 VREPLACE_UN_UV2DI vreplace_un_v2di {}
 
-  const vui __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
+  const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
  const int<4>);
 VREPLACE_UN_UV4SI vreplace_un_v4si {}
 
-  const vd __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
 VREPLACE_UN_V2DF vreplace_un_v2df {}
 
-  const vsll __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
- const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
+const int<4>);
 VREPLACE_UN_V2DI vreplace_un_v2di {}
 
-  const vf __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
 VREPLACE_UN_V4SF vreplace_un_v4sf {}
 
-  const vsi __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
 VREPLACE_UN_V4SI vreplace_un_v4si {}
 
   const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 49a6104ddd2..44e2945aaa0 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vui __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vsi __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vsi, signed int, const int);
 VREPLACE_UN_V4SI
-  vull __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vsll __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
 VREPLACE_UN_V2DI
-  vf __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vf, float, const int);
 VREPLACE_UN_V4SF
-  vd __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vd, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2f5a2f7828d..b53de103872 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4197,21 +4197,6 @@ (define_expand "vreplace_elt_"
  }
 [(set_attr "type" "vecsimple")])
 

Re: [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2022-02-08 Thread Bill Schmidt via Gcc-patches
Hi!

From some discussion today, I think we want to limit the scope of
this patch to just the power8-fusion flag that's causing trouble for
now, given stage 4.  We've talked about making power8-fusion a do-
nothing flag, since it doesn't add much benefit now and probably
shouldn't be a separate flag anyway.  Having it as a meaningless
flag makes it more palatable to add an exception for it in the
inlining path.
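
A minimal sketch of what such an exception could look like, assuming the usual rule that the callee's ISA flags must be a subset of the caller's.  The flag names and can_inline are hypothetical and only illustrate the masking; they are not the real MASK_* values or the code in rs6000_can_inline_p.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  */
#define ISA_VSX        (UINT64_C (1) << 0)
#define ISA_P8_FUSION  (UINT64_C (1) << 1)

/* Inlining is allowed when every flag the callee needs is also enabled
   in the caller.  Flags in IGNORED are dropped from the callee's set
   before the subset test, so they can never block inlining.  */
static bool
can_inline (uint64_t caller_isa, uint64_t callee_isa, uint64_t ignored)
{
  uint64_t needed = callee_isa & ~ignored;
  return (needed & ~caller_isa) == 0;
}
```

Passing ISA_P8_FUSION as the ignored mask lets a fusion-enabled callee be inlined into a caller without fusion, while a genuine ISA mismatch such as VSX still blocks it.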

Others, feel free to weigh in.

Thanks,
Bill

On 1/5/22 1:34 AM, Kewen.Lin wrote:
> Hi,
>
> This patch fixes the inconsistent behavior between non-LTO mode
> and LTO mode.  As Martin pointed out, the function
> rs6000_can_inline_p currently allows inlining whenever callee_tree
> is NULL, which is unexpected; we should use the command line options
> from target_option_default_node as the default instead.
>
> It replaces rs6000_isa_flags with target_option_default_node when
> caller_tree is NULL, since that is more straightforward and is not
> affected by bugs that fail to keep rs6000_isa_flags at its default.
>
> It also extends the scope of the check to the case where the callee
> has explicitly set options.  Previously, inlining in test case
> pr102059-5.c could happen unexpectedly; that is fixed accordingly.
>
> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
> can be ignored for always_inline functions, so this patch also
> treats some flags as safe when the callee is marked always_inline.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html
>
> This patch is a re-post of the updated version [1], rebased and
> adjusted on top of the related commit r12-6219.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
>   PR target/102059
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>   target_option_default_node and consider always_inline_safe flags.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/102059
>   * gcc.target/powerpc/pr102059-4.c: New test.
>   * gcc.target/powerpc/pr102059-5.c: New test.
>   * gcc.target/powerpc/pr102059-6.c: New test.
>   * gcc.target/powerpc/pr102059-7.c: New test.
>   * gcc.target/powerpc/pr102059-8.c: New test.
>   * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option.
>
>


Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc

2022-02-08 Thread Bill Schmidt via Gcc-patches


On 2/8/22 9:45 AM, Segher Boessenkool wrote:
> On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote:
>> On 2/7/22 5:05 PM, Segher Boessenkool wrote:
>>> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>>>> I observed recently that a couple of Power10 instructions and built-in
>>>> functions were somehow not implemented.  This patch adds one of them
>>>> (vmsumcud).  Although this isn't normally stage-4 material, this is
>>>> really simple and carries no discernible risk, so I hope it can be
>>>> considered.
>>> But what is the advantage?  That will be very tiny as well, afaics?
>>>
>>> Ah, this implements a builtin as well.  But that builtin is not in the
>>> PVIPR, so no one yet uses it most likely?
>> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
>> currently awaiting public review.  It should have been implemented with
>> the rest of the ISA 3.1 built-ins.  (There are two more that were missed
>> as well, which I haven't yet addressed.)
> Ugh.  Too much process, not enough speed.
>
>>>> +;; vmsumcud
>>>> +(define_insn "vmsumcud"
>>>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>>>> +  (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>>>> +(match_operand:V2DI 2 "register_operand" "v")
>>>> +  (match_operand:V1TI 3 "register_operand" "v")]
>>>> + UNSPEC_VMSUMCUD))]
>>>> +  "TARGET_POWER10"
>>>> +  "vmsumcud %0,%1,%2,%3"
>>>> +  [(set_attr "type" "vecsimple")]
>>>> +)
>>> This can be properly described in RTL instead of using an unspec.  This
>>> is much preferable.  I would say compare to maddhd[u], but those insns
>>> aren't implemented either (maddld is though).
>> Is it?  Note that vmsumcud produces the carry out of the final
>> result, not the result itself.  I couldn't immediately see how
>> to express this in RTL.
> It produces the top 128 bits of the (infinitely precise) result.  But
> yeah that requires an OImode here (for the temp itself), and we do not
> have that in the backend yet.
>
>> The full operation multiplies the corresponding lanes of each
>> doubleword of arguments 1 and 2, adds them together with the
>> 128-bit value in argument 3, and produces the carry out of the
>> result as a 128-bit value in the result.  I think I'd need to
>> have a 256-bit mode to express this properly in RTL, right?
> Not if you actually calculate the carry, instead of computing the
> 256-bit result and truncating it.  But this is very unwieldy (it
> would be fine if adding just two datums, but here there are three).
>
> Should the type be vecsimple?  Don't we have a type for multiplications?
> Hrm it looks like we use veccomplex usually.
>
> Okay for trunk with that taken care of.  Thanks!

Thanks!  Revised as requested and pushed as r12-7110 (943d631abdd7be623c).

Bill

>
>
> Segher


Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc

2022-02-07 Thread Bill Schmidt via Gcc-patches
Hi!

On 2/7/22 5:05 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
>> I observed recently that a couple of Power10 instructions and built-in 
>> functions
>> were somehow not implemented.  This patch adds one of them (vmsumcud).  
>> Although
>> this isn't normally stage-4 material, this is really simple and carries no
>> discernible risk, so I hope it can be considered.
> But what is the advantage?  That will be very tiny as well, afaics?
>
> Ah, this implements a builtin as well.  But that builtin is not in the
> PVIPR, so no one yet uses it most likely?

It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
currently awaiting public review.  It should have been implemented with
the rest of the ISA 3.1 built-ins.  (There are two more that were missed
as well, which I haven't yet addressed.)

>> gcc/
>>  * config/rs6000/rs6000-builtins.def (VMSUMCUD): New.
>>  * config/rs6000/rs6000-overload.def (VEC_MSUMC): New.
>>  * config/rs6000/vsx.md (UNSPEC_VMSUMCUD): New constant.
>>  (vmsumcud): New define_insn.
>>
>> gcc/testsuite/
>>  * gcc.target/powerpc/vec-msumc.c: New test.
>> +;; vmsumcud
>> +(define_insn "vmsumcud"
>> +[(set (match_operand:V1TI 0 "register_operand" "+v")
>> +  (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
>> +(match_operand:V2DI 2 "register_operand" "v")
>> +(match_operand:V1TI 3 "register_operand" "v")]
>> +   UNSPEC_VMSUMCUD))]
>> +  "TARGET_POWER10"
>> +  "vmsumcud %0,%1,%2,%3"
>> +  [(set_attr "type" "vecsimple")]
>> +)
> This can be properly described in RTL instead of using an unspec.  This
> is much preferable.  I would say compare to maddhd[u], but those insns
> aren't implemented either (maddld is though).

Is it?  Note that vmsumcud produces the carry out of the final
result, not the result itself.  I couldn't immediately see how
to express this in RTL.

The full operation multiplies the corresponding lanes of each
doubleword of arguments 1 and 2, adds them together with the
128-bit value in argument 3, and produces the carry out of the
result as a 128-bit value in the result.  I think I'd need to
have a 256-bit mode to express this properly in RTL, right?
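
To pin down what I mean, here is a scalar sketch of the operation as described above, written with GCC's unsigned __int128 extension.  The helper name msumc_carry is hypothetical, and this reflects my reading of the instruction rather than the ISA text itself.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Multiply corresponding doublewords, add both products to the 128-bit
   addend, and return only what overflows past bit 127.  Each wrapped
   addition contributes at most one to the carry, so no 256-bit type is
   needed to compute it.  */
static u128
msumc_carry (const uint64_t a[2], const uint64_t b[2], u128 c)
{
  u128 p0 = (u128) a[0] * b[0];
  u128 p1 = (u128) a[1] * b[1];
  u128 carry = 0;

  u128 sum = p0 + p1;
  carry += (sum < p0);     /* first addition wrapped around */

  u128 total = sum + c;
  carry += (total < sum);  /* second addition wrapped around */

  return carry;
}
```

For the inputs in the new test ({111, 300}, {700, 222}, and an addend of 2^128 - 1), this returns 1.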

Thanks,
Bill

>
>
> Segher


[PATCH] rs6000: Add support for vmsumcud and vec_msumc

2022-02-07 Thread Bill Schmidt via Gcc-patches
Hi!

I observed recently that a couple of Power10 instructions and built-in functions
were somehow not implemented.  This patch adds one of them (vmsumcud).  Although
this isn't normally stage-4 material, this is really simple and carries no
discernible risk, so I hope it can be considered.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2022-02-07  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (VMSUMCUD): New.
* config/rs6000/rs6000-overload.def (VEC_MSUMC): New.
* config/rs6000/vsx.md (UNSPEC_VMSUMCUD): New constant.
(vmsumcud): New define_insn.

gcc/testsuite/
* gcc.target/powerpc/vec-msumc.c: New test.
---
 gcc/config/rs6000/rs6000-builtins.def|  3 ++
 gcc/config/rs6000/rs6000-overload.def|  4 ++
 gcc/config/rs6000/vsx.md | 13 +++
 gcc/testsuite/gcc.target/powerpc/vec-msumc.c | 39 
 4 files changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-msumc.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index d0ea54d77e4..846c0bafd45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3497,6 +3497,9 @@
   const signed int __builtin_altivec_vstrihr_p (vss);
 VSTRIHR_P vstrir_p_v8hi {}
 
+  const vuq __builtin_vsx_vmsumcud (vull, vull, vuq);
+VMSUMCUD vmsumcud {}
+
   const signed int __builtin_vsx_xvtlsbb_all_ones (vsc);
 XVTLSBB_ONES xvtlsbbo {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 5e38d597722..44e2945aaa0 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -2456,6 +2456,10 @@
   vuq __builtin_vec_msum (vull, vull, vuq);
 VMSUMUDM  VMSUMUDM_U
 
+[VEC_MSUMC, vec_msumc, __builtin_vec_msumc]
+  vuq __builtin_vec_msumc (vull, vull, vuq);
+VMSUMCUD
+
 [VEC_MSUMS, vec_msums, __builtin_vec_msums]
   vui __builtin_vec_msums (vus, vus, vui);
 VMSUMUHS
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 88053f11e29..e4904102526 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -372,6 +372,7 @@ (define_c_enum "unspec"
UNSPEC_REPLACE_UN
UNSPEC_VDIVES
UNSPEC_VDIVEU
+   UNSPEC_VMSUMCUD
UNSPEC_XXEVAL
UNSPEC_XXSPLTIW
UNSPEC_XXSPLTIDP
@@ -6615,3 +6616,15 @@ (define_split
   emit_move_insn (operands[0], tmp4);
   DONE;
 })
+
+;; vmsumcud
+(define_insn "vmsumcud"
+[(set (match_operand:V1TI 0 "register_operand" "+v")
+  (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
+(match_operand:V2DI 2 "register_operand" "v")
+   (match_operand:V1TI 3 "register_operand" "v")]
+  UNSPEC_VMSUMCUD))]
+  "TARGET_POWER10"
+  "vmsumcud %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")]
+)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-msumc.c b/gcc/testsuite/gcc.target/powerpc/vec-msumc.c
new file mode 100644
index 000..524a2225c6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-msumc.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main ()
+{
+  vector unsigned long long arg1, arg2;
+  vector unsigned __int128 arg3, result, expected;
+  unsigned __int128 c = (unsigned __int128) (-1); /* 2^128 - 1 */
+
+  arg1 = (vector unsigned long long) { 111ULL, 300ULL };
+  arg2 = (vector unsigned long long) { 700ULL, 222ULL };
+  arg3 = (vector unsigned __int128) { c };
+  expected = (vector unsigned __int128) { 1 };
+
+  result = vec_msumc (arg1, arg2, arg3);
+  if (result[0] != expected[0])
+{
+#if DEBUG
+  printf ("ERROR, expected %d, result %d\n",
+ (unsigned int) expected[0],
+ (unsigned int) result[0]);
+#else
+  abort ();
+#endif
+}
+
+  return 0;
+}
-- 
2.27.0




Re: [PATCH 7/8] rs6000: vec_neg built-ins wrongly require POWER8

2022-02-07 Thread Bill Schmidt via Gcc-patches
Hi Segher,

Thanks for all the reviews for this series!  I'd like to gently ping the last 
two patches.

BR,
Bill

On 1/28/22 11:50 AM, Bill Schmidt via Gcc-patches wrote:
> As the subject states.  Fixing this is accomplished by moving the built-ins
> to the correct stanzas, [altivec] and [vsx].
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk?
>
> Thanks,
> Bill
>
>
> 2022-01-27  Bill Schmidt  
>
> gcc/
>   * config/rs6000/rs6000-builtins.def (NEG_V16QI): Move to [altivec]
>   stanza.
>   (NEG_V4SF): Likewise.
>   (NEG_V4SI): Likewise.
>   (NEG_V8HI): Likewise.
>   (NEG_V2DF): Move to [vsx] stanza.
>   (NEG_V2DI): Likewise.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 36 +--
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 2bb997a5279..c8f0cf332eb 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -410,6 +410,18 @@
>const vss __builtin_altivec_nabs_v8hi (vss);
>  NABS_V8HI nabsv8hi2 {}
>
> +  const vsc __builtin_altivec_neg_v16qi (vsc);
> +NEG_V16QI negv16qi2 {}
> +
> +  const vf __builtin_altivec_neg_v4sf (vf);
> +NEG_V4SF negv4sf2 {}
> +
> +  const vsi __builtin_altivec_neg_v4si (vsi);
> +NEG_V4SI negv4si2 {}
> +
> +  const vss __builtin_altivec_neg_v8hi (vss);
> +NEG_V8HI negv8hi2 {}
> +
>void __builtin_altivec_stvebx (vsc, signed long, void *);
>  STVEBX altivec_stvebx {stvec}
>
> @@ -1175,6 +1187,12 @@
>const vsll __builtin_altivec_nabs_v2di (vsll);
>  NABS_V2DI nabsv2di2 {}
>
> +  const vd __builtin_altivec_neg_v2df (vd);
> +NEG_V2DF negv2df2 {}
> +
> +  const vsll __builtin_altivec_neg_v2di (vsll);
> +NEG_V2DI negv2di2 {}
> +
>void __builtin_altivec_stvx_v2df (vd, signed long, void *);
>  STVX_V2DF altivec_stvx_v2df {stvec}
>
> @@ -2118,24 +2136,6 @@
>const vus __builtin_altivec_nand_v8hi_uns (vus, vus);
>  NAND_V8HI_UNS nandv8hi3 {}
>
> -  const vsc __builtin_altivec_neg_v16qi (vsc);
> -NEG_V16QI negv16qi2 {}
> -
> -  const vd __builtin_altivec_neg_v2df (vd);
> -NEG_V2DF negv2df2 {}
> -
> -  const vsll __builtin_altivec_neg_v2di (vsll);
> -NEG_V2DI negv2di2 {}
> -
> -  const vf __builtin_altivec_neg_v4sf (vf);
> -NEG_V4SF negv4sf2 {}
> -
> -  const vsi __builtin_altivec_neg_v4si (vsi);
> -NEG_V4SI negv4si2 {}
> -
> -  const vss __builtin_altivec_neg_v8hi (vss);
> -NEG_V8HI negv8hi2 {}
> -
>const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
>  ORC_V16QI orcv16qi3 {}
>


[PATCH, committed] rs6000: Clean up ISA 3.1 documentation [PR100808]

2022-02-04 Thread Bill Schmidt via Gcc-patches
Hi!

PR100808 pointed out some trivial formatting issues with Power documentation
for basic ISA 3.1 built-in functions.  This patch cleans those up.

Tested on powerpc64le-linux-gnu, committed as obvious.

Thanks!
Bill


2022-02-04  Bill Schmidt  

gcc/
PR target/100808
* doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA
3.1): Provide consistent type names.  Remove unnecessary semicolons.
Fix bad line breaks.
---
 gcc/doc/extend.texi | 71 +++--
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a961fc4e0a2..cb1b2b98ca8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18276,74 +18276,89 @@ The following built-in functions are available on Linux 64-bit systems
 that use a future architecture instruction set (@option{-mcpu=power10}):
 
 @smallexample
-@exdent unsigned long long int
-@exdent __builtin_cfuged (unsigned long long int, unsigned long long int)
+@exdent unsigned long long
+@exdent __builtin_cfuged (unsigned long long, unsigned long long)
 @end smallexample
 Perform a 64-bit centrifuge operation, as if implemented by the
 @code{cfuged} instruction.
 @findex __builtin_cfuged
 
 @smallexample
-@exdent unsigned long long int
-@exdent __builtin_cntlzdm (unsigned long long int, unsigned long long int)
+@exdent unsigned long long
+@exdent __builtin_cntlzdm (unsigned long long, unsigned long long)
 @end smallexample
 Perform a 64-bit count leading zeros operation under mask, as if
 implemented by the @code{cntlzdm} instruction.
 @findex __builtin_cntlzdm
 
 @smallexample
-@exdent unsigned long long int
-@exdent __builtin_cnttzdm (unsigned long long int, unsigned long long int)
+@exdent unsigned long long
+@exdent __builtin_cnttzdm (unsigned long long, unsigned long long)
 @end smallexample
 Perform a 64-bit count trailing zeros operation under mask, as if
 implemented by the @code{cnttzdm} instruction.
 @findex __builtin_cnttzdm
 
 @smallexample
-@exdent unsigned long long int
-@exdent __builtin_pdepd (unsigned long long int, unsigned long long int)
+@exdent unsigned long long
+@exdent __builtin_pdepd (unsigned long long, unsigned long long)
 @end smallexample
 Perform a 64-bit parallel bits deposit operation, as if implemented by the
 @code{pdepd} instruction.
 @findex __builtin_pdepd
 
 @smallexample
-@exdent unsigned long long int
-@exdent __builtin_pextd (unsigned long long int, unsigned long long int)
+@exdent unsigned long long
+@exdent __builtin_pextd (unsigned long long, unsigned long long)
 @end smallexample
 Perform a 64-bit parallel bits extract operation, as if implemented by the
 @code{pextd} instruction.
 @findex __builtin_pextd
 
 @smallexample
-@exdent vector signed __int128 vsx_xl_sext (signed long long, signed char *);
-@exdent vector signed __int128 vsx_xl_sext (signed long long, signed short *);
-@exdent vector signed __int128 vsx_xl_sext (signed long long, signed int *);
-@exdent vector signed __int128 vsx_xl_sext (signed long long, signed long long *);
-@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned char *);
-@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned short *);
-@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned int *);
-@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned long long *);
+@exdent vector signed __int128 vsx_xl_sext (signed long long, signed char *)
+
+@exdent vector signed __int128 vsx_xl_sext (signed long long, signed short *)
+
+@exdent vector signed __int128 vsx_xl_sext (signed long long, signed int *)
+
+@exdent vector signed __int128 vsx_xl_sext (signed long long, signed long long *)
+
+@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned char *)
+
+@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned short *)
+
+@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned int *)
+
+@exdent vector unsigned __int128 vsx_xl_zext (signed long long, unsigned long long *)
 @end smallexample
 
 Load (and sign extend) to an __int128 vector, as if implemented by the ISA 3.1
-@code{lxvrbx} @code{lxvrhx} @code{lxvrwx} @code{lxvrdx} instructions.
+@code{lxvrbx}, @code{lxvrhx}, @code{lxvrwx}, and  @code{lxvrdx} instructions.
 @findex vsx_xl_sext
 @findex vsx_xl_zext
 
 @smallexample
-@exdent void vec_xst_trunc (vector signed __int128, signed long long, signed char *);
-@exdent void vec_xst_trunc (vector signed __int128, signed long long, signed short *);
-@exdent void vec_xst_trunc (vector signed __int128, signed long long, signed int *);
-@exdent void vec_xst_trunc (vector signed __int128, signed long long, signed long long *);
-@exdent void vec_xst_trunc (vector unsigned __int128, signed long long, unsigned char *);
-@exdent void vec_xst_trunc (vector unsigned __int128, signed long long, unsigned short *);
-@exdent void vec_xst_trunc (vector 

[PATCH v3 1/8] rs6000: More factoring of overload processing

2022-02-03 Thread Bill Schmidt via Gcc-patches
Hi!

Although the previous patch was correct, the logic around what to do when
the number of arguments is wrong was still hard to understand.  It should
be better now.  I'm now explicitly counting the number of expected arguments
and comparing against that.  The way the argument list is represented ensures
there is always at least one element in the argument chain, by terminating
the chain with an argument type of void, which is why the previous logic was
so convoluted.
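
To illustrate the counting described above, here is a toy sketch in plain C.  It is not the GCC code: the chain is modeled as a simple linked list, and every name in it is hypothetical.  The only idea carried over is that the chain always ends in a void entry, so the expected argument count is the number of entries before that terminator.

```c
#include <stddef.h>

/* Hypothetical stand-ins for argument types.  */
enum toy_type { TOY_VOID, TOY_INT, TOY_VECTOR };

/* One link in a parameter chain; the last link has type TOY_VOID.  */
struct toy_param
{
  enum toy_type type;
  const struct toy_param *next;
};

/* Count parameters up to, but not including, the void terminator.  */
static unsigned
expected_arg_count (const struct toy_param *chain)
{
  unsigned n = 0;
  for (; chain != NULL && chain->type != TOY_VOID; chain = chain->next)
    n++;
  return n;
}

/* Compare the expected count against the number of arguments supplied.  */
static int
arg_count_matches (const struct toy_param *chain, unsigned nargs)
{
  return expected_arg_count (chain) == nargs;
}
```

A chain holding only the terminator counts as zero expected arguments, so the "at least one element" property no longer leaks into the comparison.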

The revisions are in altivec_resolve_overloaded_builtin.  Otherwise the patch
is the same as before.  I hope this is much easier to read!  Bootstrapped and
tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Original changelog message follows:

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.

Thanks,
Bill


2022-02-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.cc | 304 ++
 1 file changed, 127 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 145421ab8f2..15251efc209 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */

Re: [PATCH v2 1/8] rs6000: More factoring of overload processing

2022-02-02 Thread Bill Schmidt via Gcc-patches
Hi!

On 2/1/22 3:48 PM, Segher Boessenkool wrote:
> On Tue, Feb 01, 2022 at 08:49:34AM -0600, Bill Schmidt wrote:
>> I've modified the previous patch to add more explanatory commentary about
>> the number-of-arguments test that was previously confusing, and to convert
>> the switch into an if-then-else chain.  The rest of the patch is unchanged.
>> Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?
>> gcc/
>>  * config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
>>  parameters instead of arglist and nargs.  Simplify accordingly.  Remove
>>  unnecessary test for argument count mismatch.
>>  (resolve_vec_cmpne): Likewise.
>>  (resolve_vec_adde_sube): Likewise.
>>  (resolve_vec_addec_subec): Likewise.
>>  (altivec_resolve_overloaded_builtin): Move overload special handling
>>  after the gathering of arguments into args[] and types[] and the test
>>  for correct number of arguments.  Don't perform the test for correct
>>  number of arguments for certain special cases.  Call the other special
>>  cases with args and types instead of arglist and nargs.
>> +  if (fcode != RS6000_OVLD_VEC_PROMOTE
>> +  && fcode != RS6000_OVLD_VEC_SPLATS
>> +  && fcode != RS6000_OVLD_VEC_EXTRACT
>> +  && fcode != RS6000_OVLD_VEC_INSERT
>> +  && fcode != RS6000_OVLD_VEC_STEP
>> +  && (!VOID_TYPE_P (TREE_VALUE (fnargs)) || n < nargs))
>>  return NULL;
> Please don't do De Morgan manually, let the compiler deal with it?
> Although even with that the logic is as clear as mud.  This matters if
> someone (maybe even you) will have to debug this later, or modify this.
> Maybe adding some suitably named variables can clarify things here?

I can de-deMorgan this.  Do you want to see the patch again, or is it okay
with that change?
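
For reference, a rough sketch of what the named-variable form could look like, written as standalone C so the two forms can be compared.  The enum, the function names, and the meaning I attach to the variables are my own assumptions for illustration; in particular chain_at_void only stands in for VOID_TYPE_P (TREE_VALUE (fnargs)), and "count looks right" is a guess at the intent rather than something the patch states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RS6000_OVLD_* codes.  */
enum toy_ovld
{
  TOY_VEC_MUL,
  TOY_VEC_PROMOTE,
  TOY_VEC_SPLATS,
  TOY_VEC_EXTRACT,
  TOY_VEC_INSERT,
  TOY_VEC_STEP
};

/* The condition in the shape quoted above.  */
static bool
reject_as_quoted (enum toy_ovld fcode, bool chain_at_void,
                  unsigned n, unsigned nargs)
{
  return (fcode != TOY_VEC_PROMOTE
          && fcode != TOY_VEC_SPLATS
          && fcode != TOY_VEC_EXTRACT
          && fcode != TOY_VEC_INSERT
          && fcode != TOY_VEC_STEP
          && (!chain_at_void || n < nargs));
}

/* The same condition with the pieces named.  */
static bool
reject_with_names (enum toy_ovld fcode, bool chain_at_void,
                   unsigned n, unsigned nargs)
{
  bool skips_count_check = (fcode == TOY_VEC_PROMOTE
                            || fcode == TOY_VEC_SPLATS
                            || fcode == TOY_VEC_EXTRACT
                            || fcode == TOY_VEC_INSERT
                            || fcode == TOY_VEC_STEP);
  bool count_looks_right = chain_at_void && n >= nargs;

  return !skips_count_check && !count_looks_right;
}
```

Both functions return the same value for every input; the second just lets the compiler do the negation and gives each half of the test a name.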

Thanks!
Bill

>
>> +  if (fcode == RS6000_OVLD_VEC_MUL)
>> +returned_expr = resolve_vec_mul (&res, args, types, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_CMPNE)
>> +returned_expr = resolve_vec_cmpne (&res, args, types, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_ADDE || fcode == RS6000_OVLD_VEC_SUBE)
>> +returned_expr = resolve_vec_adde_sube (&res, fcode, args, types, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_ADDEC || fcode == RS6000_OVLD_VEC_SUBEC)
>> +returned_expr = resolve_vec_addec_subec (&res, fcode, args, types, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_SPLATS || fcode == RS6000_OVLD_VEC_PROMOTE)
>> +returned_expr = resolve_vec_splats (&res, fcode, arglist, nargs);
>> +  else if (fcode == RS6000_OVLD_VEC_EXTRACT)
>> +returned_expr = resolve_vec_extract (&res, arglist, nargs, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_INSERT)
>> +returned_expr = resolve_vec_insert (&res, arglist, nargs, loc);
>> +  else if (fcode == RS6000_OVLD_VEC_STEP)
>> +returned_expr = resolve_vec_step (&res, arglist, nargs);
>> +
>> +  if (res == resolved)
>> +return returned_expr;
>> +
>> +  if (res == resolved)
>> +return returned_expr;
> This is so convoluted because the functions do two things, and have two
> return values (res and returned_expr).
>
>
> Segher


Re: [PATCH] rs6000: Fix up PCH on powerpc* [PR104323]

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi!

Jakub, thanks for fixing this.  I didn't realize the PCH implications here, clearly...

On 2/1/22 12:33 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Feb 01, 2022 at 04:27:40PM +0100, Jakub Jelinek wrote:
>> +/* PR target/104323 */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-maltivec" } */
>> +
>> +#include <altivec.h>
>> testcase which I'm not including into testsuite because for some reason
>> the test fails on non-powerpc* targets (is done even on those and fails
>> because of missing altivec.h etc.),
> powerpc_altivec_ok returns false if the target isn't Power, you can use
> this in the testsuite fine?  Why does it still fail on other targets,
> the test should be SKIPPED there?
>
> Or wait, proc check_effective_target_powerpc_altivec_ok is broken, and
> does not implement its intention or documentation.  Will fix.
>
>> PCH is broken on powerpc*-*-* since the
>> new builtin generator has been introduced.
>> The generator contains or emits comments like:
>>   /*  Cannot mark this as a GC root because only pointer types can
>>  be marked as GTY((user)) and be GC roots.  All trees in here are
>>  kept alive by other globals, so not a big deal.  Alternatively,
>>  we could change the enum fields to ints and cast them in and out
>>  to avoid requiring a GTY((user)) designation, but that seems
>>  unnecessarily gross.  */
>> Having the fntypes stored in other GC roots can work fine for GC,
>> ggc_collect will then always mark them and so they won't disappear from
>> the tables, but it definitely doesn't work for PCH, which when the
>> arrays with fntype members aren't GTY marked means on PCH write we create
>> copies of those FUNCTION_TYPEs and store in *.gch that the GC roots should
>> be updated, but don't store that rs6000_builtin_info[?].fntype etc. should
>> be updated.  When PCH is read again, the blob is read at some other address,
>> GC roots are updated, rs6000_builtin_info[?].fntype contains garbage
>> pointers (GC freed pointers with random data, or random unrelated types or
>> other trees).
>> The following patch fixes that.  It stops any user markings because that
>> is totally unnecessary, just skips fields we don't need to mark and adds
>> GTY(()) to the 2 array variables.  We can get rid of all those global
>> vars for the fn types, they can be now automatic vars.
>> With the patch we get
>>   {
>> &rs6000_instance_info[0].fntype,
>> 1 * (RS6000_INST_MAX),
>> sizeof (rs6000_instance_info[0]),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &rs6000_builtin_info[0].fntype,
>> 1 * (RS6000_BIF_MAX),
>> sizeof (rs6000_builtin_info[0]),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>> as the new roots which is exactly what we want and significantly more
>> compact than countless
>>   {
>> &uv2di_ftype_pudi_usi,
>> 1,
>> sizeof (uv2di_ftype_pudi_usi),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_lg_puv2di,
>> 1,
>> sizeof (uv2di_ftype_lg_puv2di),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_lg_pudi,
>> 1,
>> sizeof (uv2di_ftype_lg_pudi),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_di_puv2di,
>> 1,
>> sizeof (uv2di_ftype_di_puv2di),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>> cases (822 of these instead of just those 4 shown).
> Bill, can you review the builtin side of this?

Yes, I've just read through it and it looks just fine to me.
It's a big improvement over what I had there, even ignoring
the PCH issues.

Thanks again, Jakub!

Bill

>
>>  PR target/104323
>>  * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h
>>  rather than $(srcdir)/config/rs6000/rs6000-builtins.def.
>>  * config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use
>>  GTY((user)) for struct bifdata and struct ovlddata.  Instead add
>>  GTY((skip(""))) to members with pointer and enum types that don't need
>>  to be tracked.  Add GTY(()) to rs6000_builtin_info and rs6000_instance_info
>>  declarations.  Don't emit gt_ggc_mx and gt_pch_nx declarations.
> Nice :-)
>
>>  (write_extern_fntype, write_fntype): Remove.
>>  (write_fntype_init): Emit the fntype vars as automatic vars instead
>>  of file scope ones.
>>  (write_header_file): Don't iterate with write_extern_fntype.
>>  (write_init_file): Don't iterate with write_fntype.  Don't emit
>>  gt_ggc_mx and gt_pch_nx definitions.
>>if (tf_found)
>> -fprintf (init_file, "  if (float128_type_node)\n  ");
>> +fprintf (init_file,
>> + "  tree %s = NULL_TREE;\n  if (float128_type_node)\n",
>> + buf);
>>else if (dfp_found)
>> -fprintf (init_file, "  if (dfloat64_type_node)\n  ");
>> +fprintf (init_file,
>> + "  tree %s = NULL_TREE;\n  if (dfloat64_type_node)\n",
>> + buf);
> Things are 

[PATCH v2 3/8] rs6000: Unify error messages for built-in constant restrictions

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi!

As discussed, I simplified this patch by just changing how the error
message is produced:

We currently give different error messages for built-in functions that
violate range restrictions on their arguments, depending on whether we
record them as requiring an n-bit literal or a literal between two values.
It's better to be consistent.  Change the error message for the n-bit
literal to look like the other one.
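
The unification works because an n-bit unsigned literal is just the range 0 .. 2^n - 1, so the n-bit case can be reported through the range message.  A minimal sketch of that idea in standalone C follows; the function names are hypothetical and the message wording is only illustrative, not copied from the patch.

```c
#include <stdio.h>

/* Format the range-style message.  The wording here is illustrative.  */
static int
format_range_error (char *buf, size_t len, int argno, int lo, int hi)
{
  return snprintf (buf, len,
                   "argument %d must be a literal between %d and %d, inclusive",
                   argno, lo, hi);
}

/* Report an n-bit restriction through the same message by turning the
   bit count into its range.  Assumes 1 <= nbits <= 30 so the shift
   stays within int.  */
static int
format_bits_error (char *buf, size_t len, int argno, int nbits)
{
  return format_range_error (buf, len, argno, 0, (1 << nbits) - 1);
}
```

So a 5-bit restriction on argument 2 produces the same text as a range restriction of 0 to 31 on that argument.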

Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Thanks!
Bill


2022-01-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.cc (rs6000_expand_builtin): Revise
error message for RES_BITS case.

gcc/testsuite/
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-10.c:
Adjust error messages.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-2.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-3.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-4.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-5.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-9.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-4.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-5.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-6.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-7.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-12.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-14.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-17.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-19.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-22.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-24.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-27.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-29.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-32.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-34.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-37.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-39.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-4.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-42.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-44.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-47.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-49.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-52.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-54.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-57.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-59.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-62.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-64.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-67.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-69.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-7.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-72.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-74.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-77.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-79.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-9.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-1.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-3.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-4.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr82015.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr91903.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/vec-ternarylogic-10.c: Likewise.
---
 gcc/config/rs6000/rs6000-call.cc  |  6 +-
 .../powerpc/bfp/scalar-test-data-class-10.c   |  2 +-
 .../powerpc/bfp/scalar-test-data-class-2.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-3.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-4.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-5.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-9.c|  2 +-
 .../powerpc/bfp/vec-test-data-class-4.c   |  2 +-
 .../powerpc/bfp/vec-test-data-class-5.c   |  2 +-
 .../powerpc/bfp/vec-test-data-class-6.c   |  2 +-
 

[PATCH v2 1/8] rs6000: More factoring of overload processing

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi,

I've modified the previous patch to add more explanatory commentary about
the number-of-arguments test that was previously confusing, and to convert
the switch into an if-then-else chain.  The rest of the patch is unchanged.
Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Remainder of commit message follows:

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.

Thanks!
Bill


2022-01-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.cc | 297 ++
 1 file changed, 120 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 145421ab8f2..4911e5f509c 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = 

Re: [PATCH 4/8] rs6000: Consolidate target built-ins code

2022-01-31 Thread Bill Schmidt via Gcc-patches
Hi Segher,

On 1/31/22 3:32 PM, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jan 28, 2022 at 11:50:22AM -0600, Bill Schmidt wrote:
>> Continuing with the refactoring effort, this patch moves as much of the
>> target-specific built-in support code into a new file, rs6000-builtin.cc.
>> However, we can't easily move the overloading support code out of
>> rs6000-c.cc, because the build machinery understands that as a special file
>> to be included with the C and C++ front ends.
> And the other C-like frontends.
>
>> This patch is just a straightforward move, with one exception.  I found
>> that the builtin_mode_to_type[] array is no longer used, so I also removed
>> all code having to do with it.
> Oh nice, your rewrite removed the need for that array.  Great :-)
>
>> The code in rs6000-builtin.cc is organized in related sections:
>>  - General support functions
>>  - Initialization support
>>  - GIMPLE folding support
>>  - Expansion support
>>
>> Overloading support remains in rs6000-c.cc.
> So, what is needed to move that as well?  Is moving that in the plan?

No, as explained above, that code needs to stay in the "special" file
that the build machinery understands.  It looks very difficult to
tease that apart, so I've given up on that.  Sorry!

>
>>  * config/rs6000/rs6000-builtin.cc: New file, containing code moved
>>  from other files.
> (You're breaking lines early again.)
>
>> -extra_objs="${extra_objs} rs6000-builtins.o"
>> +extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
> It's pretty unfortunate that these files are named alike.  The source
> files exist in different places of course, so the danger of confusion
> is minimal usually.
>
>> +/* Support targetm.vectorize.builtin_mask_for_load.  */
>> +tree altivec_builtin_mask_for_load;
> "Support"?  What does that mean?  Please describe what this tree is.

That comment is just moved.  There's a target hook from the vectorizer
for Altivec-style unaligned load masking.  The target needs to provide
a built-in function for this.  The tree contains its function decl.
I can change the comment.

>
>> +/*  General support functions.  */
> This isn't a sentence so should not have a full stop.  (And otherwise
> it should be followed by two spaces!)
>
>> +bool
>> +rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
>> +{
>> +  switch (rs6000_builtin_info[(size_t) fncode].enable)
>> +    {
>> +    case ENB_ALWAYS:
>> +      return true;
>> +    case ENB_P5:
>> +      return TARGET_POPCNTB;
>> +    case ENB_P6:
>> +      return TARGET_CMPB;
>> +    case ENB_P6_64:
>> +      return TARGET_CMPB && TARGET_POWERPC64;
>> +    case ENB_P7:
>> +      return TARGET_POPCNTD;
>> +    case ENB_P7_64:
>> +      return TARGET_POPCNTD && TARGET_POWERPC64;
>> +    case ENB_P8:
>> +      return TARGET_DIRECT_MOVE;
>> +    case ENB_P8V:
>> +      return TARGET_P8_VECTOR;
>> +    case ENB_P9:
>> +      return TARGET_MODULO;
>> +    case ENB_P9_64:
>> +      return TARGET_MODULO && TARGET_POWERPC64;
>> +    case ENB_P9V:
>> +      return TARGET_P9_VECTOR;
>> +    case ENB_P10:
>> +      return TARGET_POWER10;
>> +    case ENB_P10_64:
>> +      return TARGET_POWER10 && TARGET_POWERPC64;
>> +    case ENB_ALTIVEC:
>> +      return TARGET_ALTIVEC;
>> +    case ENB_VSX:
>> +      return TARGET_VSX;
>> +    case ENB_CELL:
>> +      return TARGET_ALTIVEC && rs6000_cpu == PROCESSOR_CELL;
>> +    case ENB_IEEE128_HW:
>> +      return TARGET_FLOAT128_HW;
>> +    case ENB_DFP:
>> +      return TARGET_DFP;
>> +    case ENB_CRYPTO:
>> +      return TARGET_CRYPTO;
>> +    case ENB_HTM:
>> +      return TARGET_HTM;
>> +    case ENB_MMA:
>> +      return TARGET_MMA;
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +  gcc_unreachable ();
>> +}
> If you rewrite this without switch it is shorter and clearer, and you do
> not need to duplicate the gcc_unreachable (which the broken warning
> forces you to).
>
>> +  if (fcode >= RS6000_OVLD_MAX)
>> +    return error_mark_node;
> This shows that that isn't really the max, it is the number of elts in
> the array, instead (maximum is inclusive).  Maybe fix that some day :-)
>
>> +/* Implement targetm.vectorize.builtin_md_vectorized_function.  */
>> +
>> +tree
>> +rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
>> +   tree type_in)
>> +{
>> +  machine_mode in_mode, out_mode;
>> +  int in_n, out_n;
>> +
>> +  if (TARGET_DEBUG_BUILTIN)
>> +    fprintf (stderr,
>> +             "rs6000_builtin_md_vectorized_function (%s, %s, %s)\n",
>> +             IDENTIFIER_POINTER (DECL_NAME (fndecl)),
>> +             GET_MODE_NAME (TYPE_MODE (type_out)),
>> +             GET_MODE_NAME (TYPE_MODE (type_in)));
>> +
>> +  /* TODO: Should this be gcc_assert?  */
>> +  if (TREE_CODE (type_out) != VECTOR_TYPE
>> +      || TREE_CODE (type_in) != VECTOR_TYPE)
>> +    return NULL_TREE;
> Yes, as target.def says.
>
>> +  enum rs6000_gen_builtins fn
>> += (enum rs6000_gen_builtins) 

Re: [PATCH 3/8] rs6000: Convert <x> built-in constraints to <x,y> form

2022-01-31 Thread Bill Schmidt via Gcc-patches
On 1/31/22 11:28 AM, Segher Boessenkool wrote:
> On Mon, Jan 31, 2022 at 11:21:32AM -0600, Bill Schmidt wrote:
>> On 1/28/22 5:24 PM, Segher Boessenkool wrote:
>>> On Fri, Jan 28, 2022 at 11:50:21AM -0600, Bill Schmidt wrote:
>>>> When introducing the new built-in support, I tried to match as many
>>>> existing error messages as possible.  One common form was "argument X must
>>>> be a Y-bit unsigned literal".  Another was "argument X must be a literal
>>>> between X' and Y', inclusive".  During reviews, Segher requested that I
>>>> eventually convert all messages of the first form into the second form for
>>>> consistency.  That's what this patch does, replacing all <x>-form
>>>> constraints (first form) with <x,y>-form constraints (second form).
>>> Well, I asked for the error messages to be clearer and more consistent
>>> like that.  I don't think changing our source code like this is an
>>> improvement (*we* know what a 5-bit signed number is).  Do you think
>>> after your patch it is clearer and we will make fewer errors?
>> No, I don't think the patch is a particular improvement.  It sounds like
>> I may have misinterpreted what you were looking for here.  Please let me
>> know what I might do differently.
>>
>> For example, if we leave the <x> format in place in the source, I could
>> change the error messages that we produce to calculate the minimum and
>> maximum allowed values.  Then we'd still have the changes to the test
>> cases, but fewer changes to the source.  Thoughts?
> That is exactly what I asked for, and what I still think is the best
> option.  I haven't tried it out though, so there may be arguments
> against this :-)

Thanks for the clarification!  I'll make a run at it.

Bill

>
> Segher


Re: [PATCH 3/8] rs6000: Convert <x> built-in constraints to <x,y> form

2022-01-31 Thread Bill Schmidt via Gcc-patches
On 1/28/22 5:24 PM, Segher Boessenkool wrote:
> On Fri, Jan 28, 2022 at 11:50:21AM -0600, Bill Schmidt wrote:
>> When introducing the new built-in support, I tried to match as many
>> existing error messages as possible.  One common form was "argument X must
>> be a Y-bit unsigned literal".  Another was "argument X must be a literal
>> between X' and Y', inclusive".  During reviews, Segher requested that I
>> eventually convert all messages of the first form into the second form for
>> consistency.  That's what this patch does, replacing all <x>-form
>> constraints (first form) with <x,y>-form constraints (second form).
> Well, I asked for the error messages to be clearer and more consistent
> like that.  I don't think changing our source code like this is an
> improvement (*we* know what a 5-bit signed number is).  Do you think
> after your patch it is clearer and we will make fewer errors?

No, I don't think the patch is a particular improvement.  It sounds like
I may have misinterpreted what you were looking for here.  Please let me
know what I might do differently.

For example, if we leave the <x> format in place in the source, I could
change the error messages that we produce to calculate the minimum and
maximum allowed values.  Then we'd still have the changes to the test
cases, but fewer changes to the source.  Thoughts?
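
To make the idea concrete, here is a minimal sketch of deriving the
inclusive bounds from a bit width and using them in the range-style
message.  All names (demo_range, demo_unsigned_range, demo_check_literal)
are hypothetical and are not taken from the rs6000 sources; the sketch
only handles the unsigned case and assumes the width is between 1 and 62
bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical type: an inclusive range of permitted literal values.  */
struct demo_range
{
  long long min;
  long long max;
};

/* Inclusive range of an unsigned literal that is BITS wide, that is
   [0, 2^BITS - 1].  Only meaningful for 0 < BITS < 63.  */
struct demo_range
demo_unsigned_range (int bits)
{
  assert (bits > 0 && bits < 63);
  struct demo_range r = { 0, (1LL << bits) - 1 };
  return r;
}

/* Return true if VALUE fits in BITS unsigned bits.  Otherwise write a
   range-style diagnostic for argument ARGNO into BUF and return false.  */
bool
demo_check_literal (int argno, int bits, long long value,
                    char *buf, size_t len)
{
  struct demo_range r = demo_unsigned_range (bits);

  buf[0] = '\0';
  if (value >= r.min && value <= r.max)
    return true;

  snprintf (buf, len,
            "argument %d must be a literal between %lld and %lld, inclusive",
            argno, r.min, r.max);
  return false;
}
```

With this shape the source keeps saying "5 bits" while the user sees the
computed bounds 0 and 31 in the message.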

Thanks,
Bill

>
> Segher


Re: [PATCH 2/8] rs6000: Don't #ifdef "short" built-in names

2022-01-28 Thread Bill Schmidt via Gcc-patches


On 1/28/22 2:32 PM, Segher Boessenkool wrote:
> On Fri, Jan 28, 2022 at 11:50:20AM -0600, Bill Schmidt wrote:
>> It was recently pointed out that we get anomalous behavior when using
>> __attribute__((target)) to select a CPU.  As an example, when building for
>> -mcpu=power8 but using __attribute__((target("cpu=power10"))), it is legal
>> to call __builtin_vec_mod, but not vec_mod, even though these are
>> equivalent.  This is because the equivalence is established with a #define
>> that is guarded by #ifdef _ARCH_PWR10.
> Yeah that is bad.
>
>> This goofy behavior occurs with both the old builtins support and the
>> new.  One of the goals of the new builtins support was to make sure all
>> appropriate interfaces are available using __attribute__((target)), so I
>> failed in this respect.  This patch corrects the problem by removing the
>> apply.  For example, #ifdef __PPU__ is still appropriate.
> "By removing the apply"...  What does that mean?

Er, wow.  Meant to say "by removing the #define."  Strange error... will fix.

Thanks for catching that!
Bill

>
> Nice cleanup (and nice bugfix of course).  Okay for trunk (with that
> comment improved a bit perhaps).  Thanks!
>
>
> Segher


Re: [PATCH 1/8] rs6000: More factoring of overload processing

2022-01-28 Thread Bill Schmidt via Gcc-patches


On 1/28/22 1:11 PM, Segher Boessenkool wrote:
> On Fri, Jan 28, 2022 at 11:50:19AM -0600, Bill Schmidt wrote:
>> This patch continues the refactoring started with r12-6014.
> ab3f5b71dc6e
>
>> +     and the generic code will issue the appropriate error message.  Skip
>> +     this test for functions where we don't fully describe all the possible
>> +     overload signatures in rs6000-overload.def (because they aren't relevant
>> +     to the expansion here).  If we don't, we get confusing error messages.  */
>> +  if (fcode != RS6000_OVLD_VEC_PROMOTE
>> +      && fcode != RS6000_OVLD_VEC_SPLATS
>> +      && fcode != RS6000_OVLD_VEC_EXTRACT
>> +      && fcode != RS6000_OVLD_VEC_INSERT
>> +      && fcode != RS6000_OVLD_VEC_STEP
>> +      && (!VOID_TYPE_P (TREE_VALUE (fnargs)) || n < nargs))
>>      return NULL;
> Can you expand a bit on this, give an example for example?  It is very
> hard to understand this code, the way it depends on code following many
> lines later.

Sure, sorry.

This check gives up if the number of arguments doesn't match the prototype.
It gives a fairly generic error message.  That part of it has always been
in here.

Now, I moved this check forward relative to the big switch statement on
fcode, because there are redundant checks for the number of arguments
in each of the resolve_vec_* helper functions.  This allowed me to simplify
those a bit.

Now, it turns out that this doesn't work so well for functions that aren't
fully described in rs6000-overload.def.  For example, for vec_splats we
have:

; There are no actual builtins for vec_splats.  There is special handling for
; this in altivec_resolve_overloaded_builtin in rs6000-c.cc, where the call
; is replaced by a constructor.  The single overload here causes
; __builtin_vec_splats to be registered with the front end so that can happen.
[VEC_SPLATS, vec_splats, __builtin_vec_splats]
  vsi __builtin_vec_splats (vsi);
ABS_V4SI SPLATS_FAKERY

So even though __builtin_vec_splats accepts all vector types, the
infrastructure cheats and just records one prototype.  We end up getting
an error message that refers to this specific prototype even when we are
handling a different argument type.  That is completely confusing to the
user.  So I felt I was starting to get too deep for a simple refactoring
patch, and gave up on early number-of-arguments checking for the special
cases that use the _FAKERY technique.

That's probably still not clear, but maybe clearer?
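
A stripped-down model of that decision may help.  This is only a sketch:
the enum values and function names are hypothetical stand-ins for the
RS6000_OVLD_* codes, and the argument-count comparison is simplified to a
plain inequality rather than the tree walk in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a handful of overload codes.  */
enum demo_ovld
{
  DEMO_OVLD_VEC_ADD,
  DEMO_OVLD_VEC_PROMOTE,
  DEMO_OVLD_VEC_SPLATS,
  DEMO_OVLD_VEC_EXTRACT,
  DEMO_OVLD_VEC_INSERT,
  DEMO_OVLD_VEC_STEP
};

/* True for overloads that record a single placeholder prototype, so a
   diagnostic naming that prototype would mislead the user.  */
bool
demo_has_fake_prototype (enum demo_ovld fcode)
{
  return (fcode == DEMO_OVLD_VEC_PROMOTE
          || fcode == DEMO_OVLD_VEC_SPLATS
          || fcode == DEMO_OVLD_VEC_EXTRACT
          || fcode == DEMO_OVLD_VEC_INSERT
          || fcode == DEMO_OVLD_VEC_STEP);
}

/* True if the generic early "wrong number of arguments" bail-out should
   fire.  The special cases are left to their own resolve helpers.  */
bool
demo_reject_early (enum demo_ovld fcode, int expected_nargs, int actual_nargs)
{
  assert (expected_nargs >= 0 && actual_nargs >= 0);
  if (demo_has_fake_prototype (fcode))
    return false;
  return actual_nargs != expected_nargs;
}
```

An ordinary overload with the wrong argument count is rejected up front,
while the same mismatch on one of the special cases falls through to the
helper that knows how to report it properly.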

>
>> +default:
>> +  ;
> Don't.
>
> I like this better than a BS break statement, but it is just as stupid.
>
> If you need this, you don't want a switch statement, but some number of
> if statements.  You cannot use a switch as a shorthand for this because
> we have a silly warning and -Werror for this use.
>
> You probably get easier to understand code that way, too, you can get
> rid of the above (just do some early returns), etc.

If I understand correctly, you'd like me to resubmit this in if-then-else
form.  That's fine, just want to be sure that's what you want.
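
For reference, a toy comparison of the two shapes under discussion.  The
enum and both functions are hypothetical; the point is only that the
empty default exists to keep the switch warning quiet, while the
if-chain with early returns needs no such filler.

```c
#include <assert.h>

/* Hypothetical codes: two need special handling, the rest do not.  */
enum demo_code
{
  DEMO_CODE_A,
  DEMO_CODE_B,
  DEMO_CODE_OTHER
};

/* Switch form: the empty default is there only to cover the unhandled
   enumerators.  */
int
demo_dispatch_switch (enum demo_code c)
{
  switch (c)
    {
    case DEMO_CODE_A:
      return 10;
    case DEMO_CODE_B:
      return 20;
    default:
      ;
    }
  return 0;
}

/* If form: same behavior, early returns, no filler.  */
int
demo_dispatch_if (enum demo_code c)
{
  if (c == DEMO_CODE_A)
    return 10;
  if (c == DEMO_CODE_B)
    return 20;
  return 0;
}

/* Check that both forms agree on every enumerator.  */
void
demo_self_check (void)
{
  for (int c = DEMO_CODE_A; c <= DEMO_CODE_OTHER; c++)
    assert (demo_dispatch_switch ((enum demo_code) c)
            == demo_dispatch_if ((enum demo_code) c));
}
```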

Thanks for the review!
Bill

>
>
> Segher


[PATCH 8/8] rs6000: Fix some missing built-in attributes [PR104004]

2022-01-28 Thread Bill Schmidt via Gcc-patches
PR104004 caught some misses on my part in converting to the new built-in
function infrastructure.  In particular, I forgot to mark all of the "nosoft"
built-ins, and one of those should also have been marked "no32bit".

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-27  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (MFFSL): Mark nosoft.
(MTFSB0): Likewise.
(MTFSB1): Likewise.
(SET_FPSCR_RN): Likewise.
(SET_FPSCR_DRN): Mark nosoft and no32bit.
---
 gcc/config/rs6000/rs6000-builtins.def | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index c8f0cf332eb..98619a649e3 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -215,7 +215,7 @@
 ; processors, this builtin automatically falls back to mffs on older
 ; platforms.  Thus it appears here in the [always] stanza.
   double __builtin_mffsl ();
-MFFSL rs6000_mffsl {}
+MFFSL rs6000_mffsl {nosoft}
 
 ; This is redundant with __builtin_pack_ibm128, as it requires long
 ; double to be __ibm128.  Should probably be deprecated.
@@ -226,10 +226,10 @@
 MFTB rs6000_mftb_di {32bit}
 
   void __builtin_mtfsb0 (const int<0,31>);
-MTFSB0 rs6000_mtfsb0 {}
+MTFSB0 rs6000_mtfsb0 {nosoft}
 
   void __builtin_mtfsb1 (const int<0,31>);
-MTFSB1 rs6000_mtfsb1 {}
+MTFSB1 rs6000_mtfsb1 {nosoft}
 
   void __builtin_mtfsf (const int<0,255>, double);
 MTFSF rs6000_mtfsf {}
@@ -238,7 +238,7 @@
 PACK_IF packif {}
 
   void __builtin_set_fpscr_rn (const int[0,3]);
-SET_FPSCR_RN rs6000_set_fpscr_rn {}
+SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
 
   const double __builtin_unpack_ibm128 (__ibm128, const int<0,1>);
 UNPACK_IF unpackif {}
@@ -2969,7 +2969,7 @@
 PACK_TD packtd {}
 
   void __builtin_set_fpscr_drn (const int[0,7]);
-SET_FPSCR_DRN rs6000_set_fpscr_drn {}
+SET_FPSCR_DRN rs6000_set_fpscr_drn {nosoft,no32bit}
 
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, \
 const int<0,1>);
-- 
2.27.0



[PATCH 7/8] rs6000: vec_neg built-ins wrongly require POWER8

2022-01-28 Thread Bill Schmidt via Gcc-patches
As the subject states.  Fixing this is accomplished by moving the built-ins
to the correct stanzas, [altivec] and [vsx].

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-27  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (NEG_V16QI): Move to [altivec]
stanza.
(NEG_V4SF): Likewise.
(NEG_V4SI): Likewise.
(NEG_V8HI): Likewise.
(NEG_V2DF): Move to [vsx] stanza.
(NEG_V2DI): Likewise.
---
 gcc/config/rs6000/rs6000-builtins.def | 36 +--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 2bb997a5279..c8f0cf332eb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -410,6 +410,18 @@
   const vss __builtin_altivec_nabs_v8hi (vss);
 NABS_V8HI nabsv8hi2 {}
 
+  const vsc __builtin_altivec_neg_v16qi (vsc);
+NEG_V16QI negv16qi2 {}
+
+  const vf __builtin_altivec_neg_v4sf (vf);
+NEG_V4SF negv4sf2 {}
+
+  const vsi __builtin_altivec_neg_v4si (vsi);
+NEG_V4SI negv4si2 {}
+
+  const vss __builtin_altivec_neg_v8hi (vss);
+NEG_V8HI negv8hi2 {}
+
   void __builtin_altivec_stvebx (vsc, signed long, void *);
 STVEBX altivec_stvebx {stvec}
 
@@ -1175,6 +1187,12 @@
   const vsll __builtin_altivec_nabs_v2di (vsll);
 NABS_V2DI nabsv2di2 {}
 
+  const vd __builtin_altivec_neg_v2df (vd);
+NEG_V2DF negv2df2 {}
+
+  const vsll __builtin_altivec_neg_v2di (vsll);
+NEG_V2DI negv2di2 {}
+
   void __builtin_altivec_stvx_v2df (vd, signed long, void *);
 STVX_V2DF altivec_stvx_v2df {stvec}
 
@@ -2118,24 +2136,6 @@
   const vus __builtin_altivec_nand_v8hi_uns (vus, vus);
 NAND_V8HI_UNS nandv8hi3 {}
 
-  const vsc __builtin_altivec_neg_v16qi (vsc);
-NEG_V16QI negv16qi2 {}
-
-  const vd __builtin_altivec_neg_v2df (vd);
-NEG_V2DF negv2df2 {}
-
-  const vsll __builtin_altivec_neg_v2di (vsll);
-NEG_V2DI negv2di2 {}
-
-  const vf __builtin_altivec_neg_v4sf (vf);
-NEG_V4SF negv4sf2 {}
-
-  const vsi __builtin_altivec_neg_v4si (vsi);
-NEG_V4SI negv4si2 {}
-
-  const vss __builtin_altivec_neg_v8hi (vss);
-NEG_V8HI negv8hi2 {}
-
   const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
 ORC_V16QI orcv16qi3 {}
 
-- 
2.27.0



[PATCH 6/8] rs6000: Remove -m[no-]fold-gimple flag [PR103686]

2022-01-28 Thread Bill Schmidt via Gcc-patches
The -m[no-]fold-gimple flag was really intended primarily for internal
testing while implementing GIMPLE folding for rs6000 vector built-in
functions.  It ended up leaking into other places, causing problems such
as PR103686 identifies.  Let's remove it.

There are a number of tests in the testsuite that require adjustment.
Some specify -mfold-gimple directly, which is the default, so that is
handled by removing the option.  Others unnecessarily specify
-mno-fold-gimple, as the tests work fine without this.  Again that is
handled by removing the option.  There are a couple of extra variants of
tests specifically for -mno-fold-gimple; for those, we can just remove the
whole test.

gcc.target/powerpc/builtins-1.c was more problematic.  It was written in
such a way as to be extremely fragile.  For this one, I rewrote the whole
test in a different style, using individual functions to test each
built-in function.  These same tests are also largely covered by
builtins-1-be-folded.c and builtins-1-le-folded.c, so I chose to
explicitly make this test -mbig for simplicity, and use -O2 for clean code
generation.  I made some slight modifications to the expected instruction
counts as a result, and tested on both 32- and 64-bit.  Most instruction
count tests now use the {\m ... \M} style, but I wasn't able to figure out
how to get this right for vcmpequd. and vcmpgtud.  Using \. didn't do the
trick, and I got tired of messing with it.  I can change those if you
suggest the proper incantation for an opcode ending with a period.

Bootstrapped and tested on powerpc64le-linux-gnu and on
powerpc64-linux-gnu (32- and 64-bit) with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-27  Bill Schmidt  

gcc/
PR target/103686
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
Remove test for !rs6000_fold_gimple.
* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Likewise.
* config/rs6000/rs6000.opt (mfold-gimple): Remove.

gcc/testsuite/
PR target/103686
* gcc.target/powerpc/builtins-1-be-folded.c: Remove -mfold-gimple
option.
* gcc.target/powerpc/builtins-1-le-folded.c: Likewise.
* gcc.target/powerpc/builtins-1.c: Rewrite to use small functions
and restrict to -O2 -mbig for predictability.  Adjust instruction
counts.
* gcc.target/powerpc/builtins-5.c: Remove -mno-fold-gimple
option.
* gcc.target/powerpc/p8-vec-xl-xst.c: Likewise.
* gcc.target/powerpc/pr83926.c: Likewise.
* gcc.target/powerpc/pr86731-nogimplefold-longlong.c: Delete.
* gcc.target/powerpc/pr86731-nogimplefold.c: Delete.
* gcc.target/powerpc/swaps-p8-17.c: Remove -mno-fold-gimple
option.
---
 gcc/config/rs6000/rs6000-builtin.cc   |3 -
 gcc/config/rs6000/rs6000.cc   |4 -
 gcc/config/rs6000/rs6000.opt  |4 -
 .../gcc.target/powerpc/builtins-1-be-folded.c |2 +-
 .../gcc.target/powerpc/builtins-1-le-folded.c |2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-1.c | 1210 +
 gcc/testsuite/gcc.target/powerpc/builtins-5.c |3 +-
 .../gcc.target/powerpc/p8-vec-xl-xst.c|3 +-
 gcc/testsuite/gcc.target/powerpc/pr83926.c|3 +-
 .../powerpc/pr86731-nogimplefold-longlong.c   |   32 -
 .../gcc.target/powerpc/pr86731-nogimplefold.c |   63 -
 .../gcc.target/powerpc/swaps-p8-17.c  |3 +-
 12 files changed, 951 insertions(+), 381 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr86731-nogimplefold-longlong.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr86731-nogimplefold.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 163287f2b67..dc9e3a4df1d 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -1299,9 +1299,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   fprintf (stderr, "rs6000_gimple_fold_builtin %d %s %s\n",
   fn_code, fn_name1, fn_name2);
 
-  if (!rs6000_fold_gimple)
-return false;
-
   /* Prevent gimple folding for code that does not have a LHS, unless it is
  allowed per the rs6000_builtin_valid_without_lhs helper function.  */
   if (!gimple_call_lhs (stmt)
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d27e1ec4a60..a4acb5d1f43 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3851,10 +3851,6 @@ rs6000_option_override_internal (bool global_init_p)
   & OPTION_MASK_DIRECT_MOVE))
 rs6000_isa_flags |= ~rs6000_isa_flags_explicit & OPTION_MASK_STRICT_ALIGN;
 
-  if (!rs6000_fold_gimple)
- fprintf (stderr,
- "gimple folding of rs6000 builtins has been disabled.\n");
-
   /* Add some warnings for VSX.  */
   if (TARGET_VSX)
 {
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c2a77182a9e..68c0cae6e63 

[PATCH 5/8] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-01-28 Thread Bill Schmidt via Gcc-patches
These built-ins were misimplemented as always having big-endian semantics.
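
A small host-side model of why the count-leading and count-trailing
patterns end up swapped for little endian.  This is a sketch under stated
assumptions: the demo_insn_* functions model the instructions as counting
bytes with a clear low bit from register byte 0 or byte 15 respectively,
and little-endian element i is assumed to sit in register byte 15 - i.
None of the names come from the rs6000 sources.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NBYTES 16

/* Model of vclzlsbb: count bytes whose low bit is clear, starting at
   register byte 0.  */
int
demo_insn_vclzlsbb (const unsigned char reg[DEMO_NBYTES])
{
  int n = 0;
  while (n < DEMO_NBYTES && (reg[n] & 1) == 0)
    n++;
  return n;
}

/* Model of vctzlsbb: the same count, starting at register byte 15.  */
int
demo_insn_vctzlsbb (const unsigned char reg[DEMO_NBYTES])
{
  int n = 0;
  while (n < DEMO_NBYTES && (reg[DEMO_NBYTES - 1 - n] & 1) == 0)
    n++;
  return n;
}

/* Place vector elements into the modeled register.  */
void
demo_load (const unsigned char elts[DEMO_NBYTES], bool little_endian,
           unsigned char reg[DEMO_NBYTES])
{
  for (int i = 0; i < DEMO_NBYTES; i++)
    reg[little_endian ? DEMO_NBYTES - 1 - i : i] = elts[i];
}

/* Endian-aware lowering: element-order "leading" means register-order
   "trailing" on little endian.  */
int
demo_vec_cntlz_lsbb (const unsigned char elts[DEMO_NBYTES], bool little_endian)
{
  unsigned char reg[DEMO_NBYTES];
  demo_load (elts, little_endian, reg);
  return little_endian ? demo_insn_vctzlsbb (reg) : demo_insn_vclzlsbb (reg);
}

/* The behavior being fixed: always use the big-endian instruction.  */
int
demo_vec_cntlz_lsbb_always_be (const unsigned char elts[DEMO_NBYTES],
                               bool little_endian)
{
  unsigned char reg[DEMO_NBYTES];
  demo_load (elts, little_endian, reg);
  return demo_insn_vclzlsbb (reg);
}

void
demo_self_check (void)
{
  /* Elements 0..2 have a clear low bit, element 3 has it set.  */
  const unsigned char elts[DEMO_NBYTES] = { 2, 4, 6, 1 };
  assert (demo_vec_cntlz_lsbb (elts, false) == 3);
  assert (demo_vec_cntlz_lsbb (elts, true) == 3);
  assert (demo_vec_cntlz_lsbb_always_be (elts, true) == 12);
}
```

In the model, the endian-aware version returns 3 for both byte orders,
whereas always using the big-endian instruction returns 12 on little
endian for the same input.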

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-18  Bill Schmidt  

gcc/
PR target/95082
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Handle
endianness for vclzlsbb and vctzlsbb.
* config/rs6000/rs6000-builtins.def (VCLZLSBB_V16QI): Change
default pattern and indicate a different pattern will be used for
big endian.
(VCLZLSBB_V4SI): Likewise.
(VCLZLSBB_V8HI): Likewise.
(VCTZLSBB_V16QI): Likewise.
(VCTZLSBB_V4SI): Likewise.
(VCTZLSBB_V8HI): Likewise.

gcc/testsuite/
PR target/95082
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c: Restrict to -mbig.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c: Restrict to -mbig.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c: New.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 12 
 gcc/config/rs6000/rs6000-builtins.def | 12 ++--
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c | 15 +++
 10 files changed, 82 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 191a6108a5e..163287f2b67 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3485,6 +3485,18 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
icode = CODE_FOR_vsx_store_v8hi;
   else if (fcode == RS6000_BIF_ST_ELEMREV_V16QI)
icode = CODE_FOR_vsx_store_v16qi;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V16QI)
+   icode = CODE_FOR_vclzlsbb_v16qi;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V4SI)
+   icode = CODE_FOR_vclzlsbb_v4si;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V8HI)
+   icode = CODE_FOR_vclzlsbb_v8hi;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V16QI)
+   icode = CODE_FOR_vctzlsbb_v16qi;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V4SI)
+   icode = CODE_FOR_vctzlsbb_v4si;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V8HI)
+   icode = CODE_FOR_vctzlsbb_v8hi;
   else
gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index cfe31c2e7de..2bb997a5279 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2551,13 +2551,13 @@
 VBPERMD altivec_vbpermd {}
 
   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
-VCLZLSBB_V16QI vclzlsbb_v16qi {}
+VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
 
   const signed int __builtin_altivec_vclzlsbb_v4si (vsi);
-VCLZLSBB_V4SI vclzlsbb_v4si {}
+VCLZLSBB_V4SI vctzlsbb_v4si {endian}
 
   const signed int __builtin_altivec_vclzlsbb_v8hi (vss);
-VCLZLSBB_V8HI vclzlsbb_v8hi {}
+VCLZLSBB_V8HI vctzlsbb_v8hi {endian}
 
   const vsc __builtin_altivec_vctzb (vsc);
 VCTZB ctzv16qi2 {}
@@ -2572,13 +2572,13 @@
 VCTZW ctzv4si2 {}
 
   const signed int __builtin_altivec_vctzlsbb_v16qi (vsc);
-VCTZLSBB_V16QI vctzlsbb_v16qi {}
+VCTZLSBB_V16QI vclzlsbb_v16qi {endian}
 
   const signed int __builtin_altivec_vctzlsbb_v4si (vsi);
-VCTZLSBB_V4SI vctzlsbb_v4si {}
+VCTZLSBB_V4SI vclzlsbb_v4si {endian}
 
   const signed int __builtin_altivec_vctzlsbb_v8hi (vss);
-VCTZLSBB_V8HI vctzlsbb_v8hi {}
+VCTZLSBB_V8HI vclzlsbb_v8hi {endian}
 
   const signed int __builtin_altivec_vcmpaeb_p (vsc, vsc);
 VCMPAEB_P vector_ae_v16qi_p {}
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
index 0faf233425e..dc92d6fdd65 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* 

[PATCH 3/8] rs6000: Convert <x> built-in constraints to <x,y> form

2022-01-28 Thread Bill Schmidt via Gcc-patches
When introducing the new built-in support, I tried to match as many
existing error messages as possible.  One common form was "argument X must
be a Y-bit unsigned literal".  Another was "argument X must be a literal
between X' and Y', inclusive".  During reviews, Segher requested that I
eventually convert all messages of the first form into the second form for
consistency.  That's what this patch does, replacing all <x>-form
constraints (first form) with <x,y>-form constraints (second form).

For the moment, the parser will still accept <x> arguments, but I've added
a note in rs6000-builtins.def that this form is deprecated in favor of
<x,y>.  I think it's harmless to leave it in, in case a desire for the
distinction comes up in the future.
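
As an illustration of accepting both spellings, here is a minimal sketch
that normalizes either one to an inclusive range.  It assumes the two
forms are written `<x>` (an x-bit unsigned literal) and `<x,y>` (an
inclusive range, as in the `const int<0,31>` prototypes in this series).
The function demo_parse_constraint is hypothetical and is not the parser
used by the built-in generator; it also ignores any text after the
closing bracket.

```c
#include <assert.h>
#include <stdio.h>

/* Parse S as either "<x>" or "<x,y>" and store the inclusive range it
   denotes in *MIN and *MAX.  Return 1 on success, 0 if S matches neither
   form.  The bit-width form is limited to 1..62 bits here.  */
int
demo_parse_constraint (const char *s, long long *min, long long *max)
{
  int a, b;
  char close;

  assert (s && min && max);

  /* Range form: both bounds are given explicitly.  */
  if (sscanf (s, "<%d,%d%c", &a, &b, &close) == 3
      && close == '>' && a <= b)
    {
      *min = a;
      *max = b;
      return 1;
    }

  /* Bit-width form: an unsigned literal of A bits, so [0, 2^A - 1].  */
  if (sscanf (s, "<%d%c", &a, &close) == 2
      && close == '>' && a > 0 && a < 63)
    {
      *min = 0;
      *max = (1LL << a) - 1;
      return 1;
    }

  return 0;
}
```

Under these assumptions `<5>` and `<0,31>` describe the same range, which
is why converting the source from one form to the other does not change
what is accepted.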

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-12  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (MTFSB0): Replace <x>-form
constraints with <x,y>-form constraints.
(MTFSB1): Likewise.
(MTFSF): Likewise.
(UNPACK_IF): Likewise.
(UNPACK_TF): Likewise.
(DSS): Likewise.
(DST): Likewise.
(DSTST): Likewise.
(DSTSTT): Likewise.
(DSTT): Likewise.
(VCFSX): Likewise.
(VCFUX): Likewise.
(VCTSXS): Likewise.
(VCTUXS): Likewise.
(VSLDOI_16QI): Likewise.
(VSLDOI_4SF): Likewise.
(VSLDOI_4SI): Likewise.
(VSLDOI_8HI): Likewise.
(VSPLTB): Likewise.
(VSPLTH): Likewise.
(VSPLTW): Likewise.
(VEC_SET_V16QI): Likewise.
(VEC_SET_V4SF): Likewise.
(VEC_SET_V4SI): Likewise.
(VEC_SET_V8HI): Likewise.
(VSLDOI_2DF): Likewise.
(VSLDOI_2DI): Likewise.
(VEC_SET_V2DF): Likewise.
(VEC_SET_V2DI): Likewise.
(XVCVSXDDP_SCALE): Likewise.
(XVCVUXDDP_SCALE): Likewise.
(XXPERMDI_16QI): Likewise.
(XXPERMDI_1TI): Likewise.
(XXPERMDI_2DF): Likewise.
(XXPERMDI_2DI): Likewise.
(XXPERMDI_4SF): Likewise.
(XXPERMDI_4SI): Likewise.
(XXPERMDI_8HI): Likewise.
(XXSLDWI_16QI): Likewise.
(XXSLDWI_2DF): Likewise.
(XXSLDWI_2DI): Likewise.
(XXSLDWI_4SF): Likewise.
(XXSLDWI_4SI): Likewise.
(XXSLDWI_8HI): Likewise.
(XXSPLTD_V2DF): Likewise.
(XXSPLTD_V2DI): Likewise.
(UNPACK_V1TI): Likewise.
(BCDADD_V1TI): Likewise.
(BCDADD_V16QI): Likewise.
(BCDADD_EQ_V1TI): Likewise.
(BCDADD_EQ_V16QI): Likewise.
(BCDADD_GT_V1TI): Likewise.
(BCDADD_GT_V16QI): Likewise.
(BCDADD_LT_V1TI): Likewise.
(BCDADD_LT_V16QI): Likewise.
(BCDADD_OV_V1TI): Likewise.
(BCDADD_OV_V16QI): Likewise.
(BCDSUB_V1TI): Likewise.
(BCDSUB_V16QI): Likewise.
(BCDSUB_EQ_V1TI): Likewise.
(BCDSUB_EQ_V16QI): Likewise.
(BCDSUB_GT_V1TI): Likewise.
(BCDSUB_GT_V16QI): Likewise.
(BCDSUB_LT_V1TI): Likewise.
(BCDSUB_LT_V16QI): Likewise.
(BCDSUB_OV_V1TI): Likewise.
(BCDSUB_OV_V16QI): Likewise.
(VSTDCDP): Likewise.
(VSTDCSP): Likewise.
(VTDCDP): Likewise.
(VTDCSP): Likewise.
(TSTSFI_EQ_DD): Likewise.
(TSTSFI_EQ_TD): Likewise.
(TSTSFI_GT_DD): Likewise.
(TSTSFI_GT_TD): Likewise.
(TSTSFI_LT_DD): Likewise.
(TSTSFI_LT_TD): Likewise.
(TSTSFI_OV_DD): Likewise.
(TSTSFI_OV_TD): Likewise.
(VSTDCQP): Likewise.
(DDEDPD): Likewise.
(DDEDPDQ): Likewise.
(DENBCD): Likewise.
(DENBCDQ): Likewise.
(DSCLI): Likewise.
(DSCLIQ): Likewise.
(DSCRI): Likewise.
(DSCRIQ): Likewise.
(UNPACK_TD): Likewise.
(VSHASIGMAD): Likewise.
(VSHASIGMAW): Likewise.
(VCNTMBB): Likewise.
(VCNTMBD): Likewise.
(VCNTMBH): Likewise.
(VCNTMBW): Likewise.
(VREPLACE_UN_UV2DI): Likewise.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DF): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_UN_V4SF): Likewise.
(VREPLACE_UN_V4SI): Likewise.
(VREPLACE_ELT_UV2DI): Likewise.
(VREPLACE_ELT_UV4SI): Likewise.
(VREPLACE_ELT_V2DF): Likewise.
(VREPLACE_ELT_V2DI): Likewise.
(VREPLACE_ELT_V4SF): Likewise.
(VREPLACE_ELT_V4SI): Likewise.
(VSLDB_V16QI): Likewise.
(VSLDB_V2DI): Likewise.
(VSLDB_V4SI): Likewise.
(VSLDB_V8HI): Likewise.
(VSRDB_V16QI): Likewise.
(VSRDB_V2DI): Likewise.
(VSRDB_V4SI): Likewise.
(VSRDB_V8HI): Likewise.
(VXXSPLTI32DX_V4SF): Likewise.
(VXXSPLTI32DX_V4SI): Likewise.
(XXEVAL): Likewise.
(XXGENPCVM_V16QI): Likewise.
(XXGENPCVM_V2DI): Likewise.
(XXGENPCVM_V4SI): Likewise.
(XXGENPCVM_V8HI): Likewise.

[PATCH 2/8] rs6000: Don't #ifdef "short" built-in names

2022-01-28 Thread Bill Schmidt via Gcc-patches
It was recently pointed out that we get anomalous behavior when using
__attribute__((target)) to select a CPU.  As an example, when building for
-mcpu=power8 but using __attribute__((target("cpu=power10"))), it is legal
to call __builtin_vec_mod, but not vec_mod, even though these are
equivalent.  This is because the equivalence is established with a #define
that is guarded by #ifdef _ARCH_PWR10.

This goofy behavior occurs with both the old builtins support and the
new.  One of the goals of the new builtins support was to make sure all
appropriate interfaces are available using __attribute__((target)), so I
failed in this respect.  This patch corrects the problem by removing the
apply.  For example, #ifdef __PPU__ is still appropriate.
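
The mechanism can be reproduced in miniature on any host.  In this sketch
DEMO_ARCH_PWR10 stands in for _ARCH_PWR10 and demo_builtin_vec_mod stands
in for __builtin_vec_mod; both names are hypothetical.  The assumption
being illustrated is that the guard is evaluated once, when the header
text is preprocessed, so a later per-function target attribute cannot
bring the short name into existence.

```c
#include <assert.h>

/* Stand-in for the long built-in name, which the compiler always knows.  */
int
demo_builtin_vec_mod (int a, int b)
{
  return a % b;
}

/* Guarded scheme: the short name exists only if the guard macro was
   defined when this text was preprocessed.  DEMO_ARCH_PWR10 is left
   undefined here, as _ARCH_PWR10 would be under -mcpu=power8.  */
#ifdef DEMO_ARCH_PWR10
#define demo_vec_mod demo_builtin_vec_mod
#endif

/* Unguarded scheme: the short name always exists.  */
#define demo_vec_mod_unguarded demo_builtin_vec_mod

/* Report whether the guarded short name survived preprocessing.  */
int
demo_guarded_name_visible (void)
{
#ifdef demo_vec_mod
  return 1;
#else
  return 0;
#endif
}

void
demo_self_check (void)
{
  assert (!demo_guarded_name_visible ());
  assert (demo_vec_mod_unguarded (7, 3) == 1);
}
```

The guarded short name is absent while the long name and the unguarded
short name both work, which mirrors the vec_mod versus __builtin_vec_mod
asymmetry described above.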

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-06  Bill Schmidt  

gcc/
* config/rs6000/rs6000-overload.def (VEC_ABSD): Remove #ifdef token.
(VEC_BLENDV): Likewise.
(VEC_BPERM): Likewise.
(VEC_CFUGE): Likewise.
(VEC_CIPHER_BE): Likewise.
(VEC_CIPHERLAST_BE): Likewise.
(VEC_CLRL): Likewise.
(VEC_CLRR): Likewise.
(VEC_CMPNEZ): Likewise.
(VEC_CNTLZ): Likewise.
(VEC_CNTLZM): Likewise.
(VEC_CNTTZM): Likewise.
(VEC_CNTLZ_LSBB): Likewise.
(VEC_CNTM): Likewise.
(VEC_CNTTZ): Likewise.
(VEC_CNTTZ_LSBB): Likewise.
(VEC_CONVERT_4F32_8F16): Likewise.
(VEC_DIV): Likewise.
(VEC_DIVE): Likewise.
(VEC_EQV): Likewise.
(VEC_EXPANDM): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTH): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTL): Likewise.
(VEC_EXTRACTH): Likewise.
(VEC_EXTRACTL): Likewise.
(VEC_EXTRACTM): Likewise.
(VEC_EXTRACT4B): Likewise.
(VEC_EXTULX): Likewise.
(VEC_EXTURX): Likewise.
(VEC_FIRSTMATCHINDEX): Likewise.
(VEC_FIRSTMATCHOREOSINDEX): Likewise.
(VEC_FIRSTMISMATCHINDEX): Likewise.
(VEC_FIRSTMISMATCHOREOSINDEX): Likewise.
(VEC_GB): Likewise.
(VEC_GENBM): Likewise.
(VEC_GENHM): Likewise.
(VEC_GENWM): Likewise.
(VEC_GENDM): Likewise.
(VEC_GENQM): Likewise.
(VEC_GENPCVM): Likewise.
(VEC_GNB): Likewise.
(VEC_INSERTH): Likewise.
(VEC_INSERTL): Likewise.
(VEC_INSERT4B): Likewise.
(VEC_LXVL): Likewise.
(VEC_MERGEE): Likewise.
(VEC_MERGEO): Likewise.
(VEC_MOD): Likewise.
(VEC_MSUB): Likewise.
(VEC_MULH): Likewise.
(VEC_NAND): Likewise.
(VEC_NCIPHER_BE): Likewise.
(VEC_NCIPHERLAST_BE): Likewise.
(VEC_NEARBYINT): Likewise.
(VEC_NMADD): Likewise.
(VEC_ORC): Likewise.
(VEC_PDEP): Likewise.
(VEC_PERMX): Likewise.
(VEC_PEXT): Likewise.
(VEC_POPCNT): Likewise.
(VEC_PARITY_LSBB): Likewise.
(VEC_REPLACE_ELT): Likewise.
(VEC_REPLACE_UN): Likewise.
(VEC_REVB): Likewise.
(VEC_RINT): Likewise.
(VEC_RLMI): Likewise.
(VEC_RLNM): Likewise.
(VEC_SBOX_BE): Likewise.
(VEC_SIGNEXTI): Likewise.
(VEC_SIGNEXTLL): Likewise.
(VEC_SIGNEXTQ): Likewise.
(VEC_SLDB): Likewise.
(VEC_SLV): Likewise.
(VEC_SPLATI): Likewise.
(VEC_SPLATID): Likewise.
(VEC_SPLATI_INS): Likewise.
(VEC_SQRT): Likewise.
(VEC_SRDB): Likewise.
(VEC_SRV): Likewise.
(VEC_STRIL): Likewise.
(VEC_STRIL_P): Likewise.
(VEC_STRIR): Likewise.
(VEC_STRIR_P): Likewise.
(VEC_STXVL): Likewise.
(VEC_TERNARYLOGIC): Likewise.
(VEC_TEST_LSBB_ALL_ONES): Likewise.
(VEC_TEST_LSBB_ALL_ZEROS): Likewise.
(VEC_VEE): Likewise.
(VEC_VES): Likewise.
(VEC_VIE): Likewise.
(VEC_VPRTYB): Likewise.
(VEC_VSCEEQ): Likewise.
(VEC_VSCEGT): Likewise.
(VEC_VSCELT): Likewise.
(VEC_VSCEUO): Likewise.
(VEC_VSEE): Likewise.
(VEC_VSES): Likewise.
(VEC_VSIE): Likewise.
(VEC_VSTDC): Likewise.
(VEC_VSTDCN): Likewise.
(VEC_VTDC): Likewise.
(VEC_XL): Likewise.
(VEC_XL_BE): Likewise.
(VEC_XL_LEN_R): Likewise.
(VEC_XL_SEXT): Likewise.
(VEC_XL_ZEXT): Likewise.
(VEC_XST): Likewise.
(VEC_XST_BE): Likewise.
(VEC_XST_LEN_R): Likewise.
(VEC_XST_TRUNC): Likewise.
(VEC_XXPERMDI): Likewise.
(VEC_XXSLDWI): Likewise.
(VEC_TSTSFI_EQ_DD): Likewise.
(VEC_TSTSFI_EQ_TD): Likewise.
(VEC_TSTSFI_GT_DD): Likewise.
(VEC_TSTSFI_GT_TD): Likewise.
(VEC_TSTSFI_LT_DD): Likewise.
(VEC_TSTSFI_LT_TD): Likewise.
(VEC_TSTSFI_OV_DD): Likewise.
(VEC_TSTSFI_OV_TD): Likewise.
(VEC_VADDCUQ): Likewise.
(VEC_VADDECUQ): 

[PATCH 1/8] rs6000: More factoring of overload processing

2022-01-28 Thread Bill Schmidt via Gcc-patches
This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.
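
For readers who do not have rs6000-c.cc open, the shape of the refactoring can be sketched outside of GCC.  This is an illustration only, not code from the patch: the types struct value and enum type_kind and the functions resolve_mul_like and resolve_mul_call are hypothetical stand-ins for GCC's tree nodes and resolver routines, and a wrong argument count is reported here as resolved_bad for brevity rather than through the real error path.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_OVLD_ARGS = 3 };

/* Hypothetical stand-ins for GCC's tree nodes and their types.  */
enum type_kind { TY_SCALAR, TY_VECTOR };
struct value { enum type_kind type; int payload; };

enum resolution { unresolved, resolved, resolved_bad };

/* A special-case resolver in the new style: it receives parallel ARGS and
   TYPES arrays and relies on the caller having checked the argument count.  */
static enum resolution
resolve_mul_like (const struct value *args[], const enum type_kind types[])
{
  assert (args[0] != NULL && args[1] != NULL);

  /* Both arguments must be vectors of the same kind.  */
  if (types[0] != TY_VECTOR || types[0] != types[1])
    return resolved_bad;
  return resolved;
}

/* The driver gathers arguments and types once, tests the count once, and
   only then dispatches to the special case.  */
static enum resolution
resolve_mul_call (const struct value *arglist, size_t nargs)
{
  const struct value *args[MAX_OVLD_ARGS];
  enum type_kind types[MAX_OVLD_ARGS];

  if (nargs != 2)
    return resolved_bad;

  for (size_t i = 0; i < nargs; i++)
    {
      args[i] = &arglist[i];
      types[i] = arglist[i].type;
    }
  return resolve_mul_like (args, types);
}
```

The point of the sketch is only that the count test and the type extraction live in one place, so each special-case routine shrinks to its type checks and its expansion.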

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks,
Bill


2022-01-18  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.cc | 304 ++
 1 file changed, 127 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 145421ab8f2..35c1383f059 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 /* Other types are errors.  */
 default:
@@ -1002,37 +990,25 @@ resolve_vec_mul (resolution *res, vec 

[PATCH 0/8] rs6000: Built-in function cleanups and bug fixes

2022-01-28 Thread Bill Schmidt via Gcc-patches
Hi!

This is a resubmission of some patches and a new submission of others.
Patches 1, 3, and 4 finish up the pending clean-up work for the new built-in
infrastructure support.  Patches 2 and 5-8 fix a variety of bugs not specific
to the new infrastructure.  I'm submitting these as a group primarily because
5-8 are dependent on the previous patches, particularly patch 4, which
consolidates much of the built-in code in a new file.

Thanks for your consideration!

Bill


Bill Schmidt (8):
  rs6000: More factoring of overload processing
  rs6000: Don't #ifdef "short" built-in names
  rs6000: Convert <x> built-in constraints to <x,y> form
  rs6000: Consolidate target built-ins code
  rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]
  rs6000: Remove -m[no-]fold-gimple flag [PR103686]
  rs6000: vec_neg built-ins wrongly require POWER8
  rs6000: Fix some missing built-in attributes [PR104004]

 gcc/config.gcc|2 +-
 gcc/config/rs6000/rs6000-builtin.cc   | 3721 +
 gcc/config/rs6000/rs6000-builtins.def |  578 +--
 gcc/config/rs6000/rs6000-c.cc |  304 +-
 gcc/config/rs6000/rs6000-call.cc  | 3524 
 gcc/config/rs6000/rs6000-overload.def |  344 +-
 gcc/config/rs6000/rs6000.cc   |  167 +-
 gcc/config/rs6000/rs6000.h|1 -
 gcc/config/rs6000/rs6000.opt  |4 -
 gcc/config/rs6000/t-rs6000|4 +
 .../powerpc/bfp/scalar-test-data-class-10.c   |2 +-
 .../powerpc/bfp/scalar-test-data-class-2.c|2 +-
 .../powerpc/bfp/scalar-test-data-class-3.c|2 +-
 .../powerpc/bfp/scalar-test-data-class-4.c|2 +-
 .../powerpc/bfp/scalar-test-data-class-5.c|2 +-
 .../powerpc/bfp/scalar-test-data-class-9.c|2 +-
 .../powerpc/bfp/vec-test-data-class-4.c   |2 +-
 .../powerpc/bfp/vec-test-data-class-5.c   |2 +-
 .../powerpc/bfp/vec-test-data-class-6.c   |2 +-
 .../powerpc/bfp/vec-test-data-class-7.c   |2 +-
 .../gcc.target/powerpc/builtins-1-be-folded.c |2 +-
 .../gcc.target/powerpc/builtins-1-le-folded.c |2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-1.c | 1210 --
 gcc/testsuite/gcc.target/powerpc/builtins-5.c |3 +-
 .../gcc.target/powerpc/dfp/dtstsfi-12.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-14.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-17.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-19.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-2.c|2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-22.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-24.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-27.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-29.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-32.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-34.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-37.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-39.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-4.c|2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-42.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-44.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-47.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-49.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-52.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-54.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-57.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-59.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-62.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-64.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-67.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-69.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-7.c|2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-72.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-74.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-77.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-79.c   |2 +-
 .../gcc.target/powerpc/dfp/dtstsfi-9.c|2 +-
 .../gcc.target/powerpc/p8-vec-xl-xst.c|3 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr82015.c|4 +-
 gcc/testsuite/gcc.target/powerpc/pr83926.c|3 +-
 .../powerpc/pr86731-nogimplefold-longlong.c   |   32 -
 .../gcc.target/powerpc/pr86731-nogimplefold.c |   63 -
 gcc/testsuite/gcc.target/powerpc/pr91903.c|   60 +-
 .../gcc.target/powerpc/swaps-p8-17.c  |3 +-
 .../powerpc/test_fpscr_rn_builtin_error.c |8 +-
 .../gcc.target/powerpc/vec-ternarylogic-10.c  |6 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c |2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c |2 +-
 

Re: [PATCH v9] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2022-01-24 Thread Bill Schmidt via Gcc-patches
Adding the patch author for his information.

Thanks,
Bill

On 1/24/22 2:26 PM, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Jan 24, 2022 at 08:55:37AM -0600, Segher Boessenkool wrote:
>> Hi!
>>
>> On Thu, Jan 13, 2022 at 02:08:53PM -0300, Raoni Fassina Firmino wrote:
>>> Changes since v8[8]:
>>>   - Refactored and expanded builtin-feclearexcept-feraiseexcept-2.c
>>> testcase:
>>> + Use a macro to avoid extended repetition of the core test code.
>>> + Expanded the test code to check builtins return code.
>>> + Added more tests to test all valid (standard) exceptions input
>> This is okay for trunk (Jeff already approved the generic parts).
>> Thanks!
> This breaks bootstrap with --enable-checking=rtl, e.g. while compiling
> libquadmath/math/llrintq.c
> #0  internal_error (gmsgid=0x131bb1e0 "RTL check: expected code '%s', have 
> '%s' in %s, at %s:%d") at ../../gcc/diagnostic.cc:1938
> #1  0x113a0e94 in rtl_check_failed_code1 (r=0x3fffaf4a24a8, 
> code=CONST_INT, file=0x13400018 "../../gcc/config/rs6000/rs6000.md", 
> line=7010, 
> func=0x13409298  
> "gen_feraiseexceptsi") at ../../gcc/rtl.cc:918
> #2  0x125154e8 in gen_feraiseexceptsi (operand0=0x3fffaf4a3720, 
> operand1=0x3fffaf4a24a8) at ../../gcc/config/rs6000/rs6000.md:7010
> #3  0x108badf4 in insn_gen_fn::operator() 
> (this=0x138ee440 ) at ../../gcc/recog.h:407
> #4  0x10890b1c in expand_builtin_feclear_feraise_except 
> (exp=0x3fffaf3041a0, target=0x3fffaf4a3720, target_mode=E_SImode, 
> op_optab=feraiseexcept_optab)
> at ../../gcc/builtins.cc:2606
> #5  0x108a6f74 in expand_builtin (exp=0x3fffaf3041a0, 
> target=0x3fffaf100490, subtarget=0x0, mode=E_VOIDmode, ignore=1) at 
> ../../gcc/builtins.cc:7130
> #6  0x10c01770 in expand_expr_real_1 (exp=0x3fffaf3041a0, target=0x0, 
> tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, 
> inner_reference_p=false)
> at ../../gcc/expr.cc:11536
> #7  0x10bf0604 in expand_expr_real (exp=0x3fffaf3041a0, 
> target=0x3fffaf100490, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0, 
> inner_reference_p=false)
> at ../../gcc/expr.cc:8737
> #8  0x108ffa00 in expand_expr (exp=0x3fffaf3041a0, 
> target=0x3fffaf100490, mode=E_VOIDmode, modifier=EXPAND_NORMAL) at 
> ../../gcc/expr.h:301
> #9  0x1090c934 in expand_call_stmt (stmt=0x3fffaf1314d0) at 
> ../../gcc/cfgexpand.cc:2831
> #10 0x10911e18 in expand_gimple_stmt_1 (stmt=0x3fffaf1314d0) at 
> ../../gcc/cfgexpand.cc:3864
> #11 0x10912730 in expand_gimple_stmt (stmt=0x3fffaf1314d0) at 
> ../../gcc/cfgexpand.cc:4028
> #12 0x1091ecb0 in expand_gimple_basic_block (bb=0x3fffaf190c98, 
> disable_tail_calls=false) at ../../gcc/cfgexpand.cc:6069
> #13 0x10921be8 in (anonymous namespace)::pass_expand::execute 
> (this=0x13ab0d40, fun=0x3fffaf0c0c38) at ../../gcc/cfgexpand.cc:6795
> #14 0x11216ea4 in execute_one_pass (pass=0x13ab0d40) at 
> ../../gcc/passes.cc:2637
> #15 0x112173d8 in execute_pass_list_1 (pass=0x13ab0d40) at 
> ../../gcc/passes.cc:2737
> #16 0x112174b0 in execute_pass_list (fn=0x3fffaf0c0c38, 
> pass=0x13aac8c0) at ../../gcc/passes.cc:2748
> #17 0x109b4e4c in cgraph_node::expand (this=0x3fffaf151760) at 
> ../../gcc/cgraphunit.cc:1834
> #18 0x109b5844 in expand_all_functions () at 
> ../../gcc/cgraphunit.cc:1998
> #19 0x109b67d0 in symbol_table::compile (this=0x3fffaf0d) at 
> ../../gcc/cgraphunit.cc:2348
> #20 0x109b6f40 in symbol_table::finalize_compilation_unit 
> (this=0x3fffaf0d) at ../../gcc/cgraphunit.cc:2529
> #21 0x114f10f4 in compile_file () at ../../gcc/toplev.cc:479
> #22 0x114f6204 in do_compile (no_backend=false) at 
> ../../gcc/toplev.cc:2158
> #23 0x114f68d0 in toplev::main (this=0x3fffeb64, argc=45, 
> argv=0x3fffef98) at ../../gcc/toplev.cc:2310
> #24 0x12f97a6c in main (argc=45, argv=0x3fffef98) at 
> ../../gcc/main.cc:39
>
> expand_builtin_feclear_feraise_except doesn't check if op0 matches
> the predicate of operands[1], the backend requires const_int_operand,
> but because the call isn't done with a constant integer:
> feraiseexcept (t == LLONG_MIN ? FE_INEXACT : FE_INVALID);
> op0 is a REG.
> If CONST_INT is what is expected on all targets, then it should punt if
> op0 isn't one, otherwise it should check the predicate.
>
>   Jakub
>


Re: [PATCH] rs6000: Support vector float/double for vec_sldw

2022-01-21 Thread Bill Schmidt via Gcc-patches
Thanks!  Pushed as r12-6806 with the testcase adjusted.

Bill

On 1/21/22 11:47 AM, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jan 21, 2022 at 11:31:34AM -0600, Bill Schmidt wrote:
>> It was recently discovered that Clang supports a couple of variants of
>> vec_sldw that GCC does not.  After some discussion, we decided that these
>> variants are reasonable, and GCC will also support them.  This patch adds
>> that support.
> As we discussed, this is reasonable only because we already allow
> non-integer inputs (and outputs) for all(?) other permute class
> instructions.
>
>> I updated an existing test and discovered it wasn't actually checking for
>> generation of the xxsldwi instruction, so I added that check as well.
> It can always generate vsldoi instead, which is a strict superset (if
> all registers used are VRs).  They will not likely be here, because
> these are such simple functions, but that is a bit fragile.
>
>>  * gcc.target/powerpc/builtins-4.c: Add two test variants.  Adjust
>>  assembler counts.
> Is there any justification for the new counts?
>
> ... Ah, it didn't count the sld's at all before.  Okay.
>
>> @@ -161,6 +175,6 @@ test_sll_vuill_vuill_vuc (vector unsigned long long int x,
>>  /* { dg-final { scan-assembler-times "xvnabssp"  1 } } */
>>  /* { dg-final { scan-assembler-times "xvnabsdp"  1 } } */
>>  /* { dg-final { scan-assembler-times "vslo"  4 } } */
>> -/* { dg-final { scan-assembler-times "xxlor" 30 } } */
>> +/* { dg-final { scan-assembler-times "xxlor" 32 } } */
> This will need modification for the phase of the moon.  It also does not
> test for xxlor insns only (it matches xxlorc insns too, for example).
>
>> +/* { dg-final { scan-assembler-times "xxsldwi"   10 } } */
> Okay if you make this
>   \mxxsldwi\M
> or even
>   \m(?:xxsldwi|vsldoi)\M
>
> Thanks!
>
>
> Segher


[PATCH] rs6000: Support vector float/double for vec_sldw

2022-01-21 Thread Bill Schmidt via Gcc-patches
Hi,

It was recently discovered that Clang supports a couple of variants of
vec_sldw that GCC does not.  After some discussion, we decided that these
variants are reasonable, and GCC will also support them.  This patch adds
that support.

I updated an existing test and discovered it wasn't actually checking for
generation of the xxsldwi instruction, so I added that check as well.
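
As background on why the element type does not matter here, the following is a rough scalar model of the operation, offered as an illustration and not as part of the patch.  It assumes the usual description of xxsldwi: the two inputs are concatenated, shifted left by a whole number of 32-bit words, and the leftmost four words are kept.  The names vec128 and model_sldw are hypothetical, words are numbered in register (left-to-right) order, and the mapping to element numbering on little endian is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit register as four 32-bit words, numbered
   0..3 from the left.  */
typedef struct { uint32_t w[4]; } vec128;

/* Concatenate A and B (A on the left), shift left by SHIFT words, and keep
   the leftmost four words.  Only whole words move, so the same routine
   serves integer, float, and double element types alike.  */
static vec128
model_sldw (vec128 a, vec128 b, unsigned shift)
{
  assert (shift <= 3);

  uint32_t cat[8];
  memcpy (cat, a.w, sizeof a.w);
  memcpy (cat + 4, b.w, sizeof b.w);

  vec128 r;
  memcpy (r.w, cat + shift, sizeof r.w);
  return r;
}
```

With a = {0, 1, 2, 3} and b = {4, 5, 6, 7}, a shift of 3 yields {3, 4, 5, 6}, and a shift of 0 returns a unchanged.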

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2022-01-21  Bill Schmidt  

gcc/
* config/rs6000/rs6000-overload.def (VEC_SLDW): Add instances for
vector float and vector double.

gcc/testsuite/
* gcc.target/powerpc/builtins-4.c: Add two test variants.  Adjust
assembler counts.
---
 gcc/config/rs6000/rs6000-overload.def |  4 +++
 gcc/testsuite/gcc.target/powerpc/builtins-4.c | 34 +--
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index dea6f5d4258..cdc703e9764 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3405,6 +3405,10 @@
 XXSLDWI_2DI  XXSLDWI_VSLL
   vull __builtin_vec_sldw (vull, vull, const int);
 XXSLDWI_2DI  XXSLDWI_VULL
+  vf __builtin_vec_sldw (vf, vf, const int);
+XXSLDWI_4SF  XXSLDWI_VF
+  vd __builtin_vec_sldw (vd, vd, const int);
+XXSLDWI_2DF  XXSLDWI_VD
 
 [VEC_SLL, vec_sll, __builtin_vec_sll]
   vsc __builtin_vec_sll (vsc, vuc);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4.c b/gcc/testsuite/gcc.target/powerpc/builtins-4.c
index 4e3b543f242..df012e9b7d6 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4.c
@@ -119,6 +119,18 @@ test_vul_sldw_vul_vul (vector unsigned long long x,
return vec_sldw (x, y, 3);
 }
 
+vector float
+test_vf_sldw_vf_vf (vector float x, vector float y)
+{
+  return vec_sldw (x, y, 3);
+}
+
+vector double
+test_vd_sldw_vd_vd (vector double x, vector double y)
+{
+  return vec_sldw (x, y, 1);
+}
+
 vector signed int long long
 test_sll_vsill_vsill_vuc (vector signed long long int x,
  vector unsigned char y)
@@ -146,14 +158,16 @@ test_sll_vuill_vuill_vuc (vector unsigned long long int x,
  test_slo_vsll_slo_vsll_vuc1 vslo
  test_slo_vull_slo_vull_vsc1 vslo
  test_slo_vull_slo_vull_vuc1 vslo
- test_vsc_sldw_vsc_vsc 1 xxlor
- test_vuc_sldw_vuc_vuc 1 xxlor
- test_vssi_sldw_vssi_vssi  1 xxlor
- test_vusi_sldw_vusi_vusi  1 xxlor
- test_vsi_sldw_vsi_vsi 1 xxlor
- test_vui_sldw_vui_vui 1 xxlor
- test_vsl_sldw_vsl_vsl 1 xxlor
- test_vul_sldw_vul_vul 1 xxlor
+ test_vsc_sldw_vsc_vsc 1 xxlor, 1 xxsldwi
+ test_vuc_sldw_vuc_vuc 1 xxlor, 1 xxsldwi
+ test_vssi_sldw_vssi_vssi  1 xxlor, 1 xxsldwi
+ test_vusi_sldw_vusi_vusi  1 xxlor, 1 xxsldwi
+ test_vsi_sldw_vsi_vsi 1 xxlor, 1 xxsldwi
+ test_vui_sldw_vui_vui 1 xxlor, 1 xxsldwi
+ test_vsl_sldw_vsl_vsl 1 xxlor, 1 xxsldwi
+ test_vul_sldw_vul_vul 1 xxlor, 1 xxsldwi
+ test_vf_sldw_vf_vf1 xxlor, 1 xxsldwi
+ test_vd_sldw_vd_vd1 xxlor, 1 xxsldwi
  test_sll_vsill_vsill_vuc  1 vsl
  test_sll_vuill_vuill_vuc  1 vsl  */
 
@@ -161,6 +175,6 @@ test_sll_vuill_vuill_vuc (vector unsigned long long int x,
 /* { dg-final { scan-assembler-times "xvnabssp"  1 } } */
 /* { dg-final { scan-assembler-times "xvnabsdp"  1 } } */
 /* { dg-final { scan-assembler-times "vslo"  4 } } */
-/* { dg-final { scan-assembler-times "xxlor" 30 } } */
+/* { dg-final { scan-assembler-times "xxlor" 32 } } */
 /* { dg-final { scan-assembler-times {\mvsl\M}   5 } } */
-
+/* { dg-final { scan-assembler-times "xxsldwi"   10 } } */
-- 
2.27.0




[PATCH v2] rs6000: More factoring of overload processing

2022-01-19 Thread Bill Schmidt via Gcc-patches
Hi!

[I'm resubmitting this because the filename changed with the recent conversion
from .c to .cc.]

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.

Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Thanks,
Bill


2022-01-18  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.cc | 304 ++
 1 file changed, 127 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 145421ab8f2..35c1383f059 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 /* Other types are errors.  */

[PATCH] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-01-19 Thread Bill Schmidt via Gcc-patches
Hi!

https://gcc.gnu.org/PR95082 demonstrates that we don't generate correct code for
vec_cntlz_lsbb and vec_cnttz_lsbb for little-endian targets.  This patch
corrects the problem by marking the built-ins as bif_is_endian and using the
correct target patterns for each endianness.  Note that the default patterns
are for little endian, and the overridden patterns in rs6000-builtin.cc are
for big endian.
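
To make the endianness issue concrete, here is a small scalar model, given as an illustration and not taken from the patch.  It assumes that vclzlsbb counts, from the left end of the register, the bytes whose least-significant bit is zero, that vctzlsbb does the same from the right end, and that on little endian element i of a vector of bytes sits in register byte 15 - i.  All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register-order model of vclzlsbb: byte 0 is the leftmost register byte.  */
static int
insn_vclzlsbb (const uint8_t reg[16])
{
  int n = 0;
  while (n < 16 && (reg[n] & 1) == 0)
    n++;
  return n;
}

/* Register-order model of vctzlsbb: counts from the rightmost byte.  */
static int
insn_vctzlsbb (const uint8_t reg[16])
{
  int n = 0;
  while (n < 16 && (reg[15 - n] & 1) == 0)
    n++;
  return n;
}

/* Build a register image from ELTS.  On little endian, element i is
   assumed to live in register byte 15 - i.  */
static void
load_elements (uint8_t reg[16], const uint8_t elts[16], bool little_endian)
{
  for (int i = 0; i < 16; i++)
    reg[little_endian ? 15 - i : i] = elts[i];
}

/* The built-in counts from element 0, so the instruction has to be chosen
   according to the byte order.  */
static int
model_cntlz_lsbb (const uint8_t elts[16], bool little_endian)
{
  uint8_t reg[16];
  load_elements (reg, elts, little_endian);
  return little_endian ? insn_vctzlsbb (reg) : insn_vclzlsbb (reg);
}

/* Both byte orders must give the same element-order answer.  */
static void
self_check (void)
{
  uint8_t elts[16] = { 2, 4, 6, 1 };	/* Elements 4..15 are zero.  */
  assert (model_cntlz_lsbb (elts, false) == 3);
  assert (model_cntlz_lsbb (elts, true) == 3);
}
```

In this model, using vclzlsbb unconditionally on the little-endian register image of the same input would count the twelve zero bytes at the left of the register instead of the three leading elements, which is the kind of wrong answer the PR describes.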

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk, and eventually for backport to GCC 11?

Thanks!
Bill


2022-01-18  Bill Schmidt  

gcc/
PR target/95082
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Handle
endianness for vclzlsbb and vctzlsbb.
* config/rs6000/rs6000-builtins.def (VCLZLSBB_V16QI): Change
default pattern and indicate a different pattern will be used for
big endian.
(VCLZLSBB_V4SI): Likewise.
(VCLZLSBB_V8HI): Likewise.
(VCTZLSBB_V16QI): Likewise.
(VCTZLSBB_V4SI): Likewise.
(VCTZLSBB_V8HI): Likewise.

gcc/testsuite/
PR target/95082
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c: Restrict to -mbig.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c: Restrict to -mbig.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c: Likewise.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c: New.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c: New.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 12 
 gcc/config/rs6000/rs6000-builtins.def | 12 ++--
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-1.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-0.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-1.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c | 15 +++
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c | 15 +++
 10 files changed, 82 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-4.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 6eca3568c02..421277a0ef0 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3485,6 +3485,18 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
icode = CODE_FOR_vsx_store_v8hi;
   else if (fcode == RS6000_BIF_ST_ELEMREV_V16QI)
icode = CODE_FOR_vsx_store_v16qi;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V16QI)
+   icode = CODE_FOR_vclzlsbb_v16qi;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V4SI)
+   icode = CODE_FOR_vclzlsbb_v4si;
+  else if (fcode == RS6000_BIF_VCLZLSBB_V8HI)
+   icode = CODE_FOR_vclzlsbb_v8hi;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V16QI)
+   icode = CODE_FOR_vctzlsbb_v16qi;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V4SI)
+   icode = CODE_FOR_vctzlsbb_v4si;
+  else if (fcode == RS6000_BIF_VCTZLSBB_V8HI)
+   icode = CODE_FOR_vctzlsbb_v8hi;
   else
gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index cfe31c2e7de..2bb997a5279 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2551,13 +2551,13 @@
 VBPERMD altivec_vbpermd {}
 
   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
-VCLZLSBB_V16QI vclzlsbb_v16qi {}
+VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
 
   const signed int __builtin_altivec_vclzlsbb_v4si (vsi);
-VCLZLSBB_V4SI vclzlsbb_v4si {}
+VCLZLSBB_V4SI vctzlsbb_v4si {endian}
 
   const signed int __builtin_altivec_vclzlsbb_v8hi (vss);
-VCLZLSBB_V8HI vclzlsbb_v8hi {}
+VCLZLSBB_V8HI vctzlsbb_v8hi {endian}
 
   const vsc __builtin_altivec_vctzb (vsc);
 VCTZB ctzv16qi2 {}
@@ -2572,13 +2572,13 @@
 VCTZW ctzv4si2 {}
 
   const signed int __builtin_altivec_vctzlsbb_v16qi (vsc);
-VCTZLSBB_V16QI vctzlsbb_v16qi {}
+VCTZLSBB_V16QI vclzlsbb_v16qi {endian}
 
   const signed int __builtin_altivec_vctzlsbb_v4si (vsi);
-VCTZLSBB_V4SI vctzlsbb_v4si {}
+VCTZLSBB_V4SI vclzlsbb_v4si {endian}
 
   const signed int __builtin_altivec_vctzlsbb_v8hi (vss);
-VCTZLSBB_V8HI vctzlsbb_v8hi {}
+VCTZLSBB_V8HI vclzlsbb_v8hi {endian}
 
   const signed int __builtin_altivec_vcmpaeb_p (vsc, vsc);
 VCMPAEB_P 

[PATCH] rs6000: Convert <x> built-in constraints to <x,y> form

2022-01-12 Thread Bill Schmidt via Gcc-patches
Hi!

When introducing the new built-in support, I tried to match as many
existing error messages as possible.  One common form was "argument X must
be a Y-bit unsigned literal".  Another was "argument X must be a literal
between X' and Y', inclusive".  During reviews, Segher requested that I
eventually convert all messages of the first form into the second form for
consistency.  That's what this patch does, replacing all <x>-form
constraints (first form) with <x,y>-form constraints (second form).

For the moment, the parser will still accept <x> arguments, but I've added
a note in rs6000-builtins.def that this form is deprecated in favor of
<x,y>.  I think it's harmless to leave it in, in case a desire for the
distinction comes up in the future.
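
The conversion itself is mechanical, because a Y-bit unsigned literal is exactly a literal between 0 and 2^Y - 1, inclusive.  The following is a minimal sketch of that mapping, written for illustration and not taken from the patch; struct literal_range, range_from_bits, and format_range_error are hypothetical names, and only the wording of the message follows the second form quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of an inclusive literal-range restriction.  */
struct literal_range { long lo, hi; };

/* A BITS-bit unsigned literal is the range 0 .. 2^BITS - 1, inclusive.  */
static struct literal_range
range_from_bits (unsigned bits)
{
  assert (bits > 0 && bits < 31);
  struct literal_range r = { 0, (1L << bits) - 1 };
  return r;
}

/* Format the second-form diagnostic into BUF and return what snprintf
   returns.  */
static int
format_range_error (char *buf, size_t len, int argno, struct literal_range r)
{
  return snprintf (buf, len,
		   "argument %d must be a literal between %ld and %ld, inclusive",
		   argno, r.lo, r.hi);
}
```

For example, a 2-bit restriction becomes the range 0 to 3, and a 5-bit restriction becomes 0 to 31.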

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?

Thanks!
Bill

2022-01-12  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (MTFSB0): Replace <x>-form
constraints with <x,y>-form constraints.
(MTFSB1): Likewise.
(MTFSF): Likewise.
(UNPACK_IF): Likewise.
(UNPACK_TF): Likewise.
(DSS): Likewise.
(DST): Likewise.
(DSTST): Likewise.
(DSTSTT): Likewise.
(DSTT): Likewise.
(VCFSX): Likewise.
(VCFUX): Likewise.
(VCTSXS): Likewise.
(VCTUXS): Likewise.
(VSLDOI_16QI): Likewise.
(VSLDOI_4SF): Likewise.
(VSLDOI_4SI): Likewise.
(VSLDOI_8HI): Likewise.
(VSPLTB): Likewise.
(VSPLTH): Likewise.
(VSPLTW): Likewise.
(VEC_SET_V16QI): Likewise.
(VEC_SET_V4SF): Likewise.
(VEC_SET_V4SI): Likewise.
(VEC_SET_V8HI): Likewise.
(VSLDOI_2DF): Likewise.
(VSLDOI_2DI): Likewise.
(VEC_SET_V2DF): Likewise.
(VEC_SET_V2DI): Likewise.
(XVCVSXDDP_SCALE): Likewise.
(XVCVUXDDP_SCALE): Likewise.
(XXPERMDI_16QI): Likewise.
(XXPERMDI_1TI): Likewise.
(XXPERMDI_2DF): Likewise.
(XXPERMDI_2DI): Likewise.
(XXPERMDI_4SF): Likewise.
(XXPERMDI_4SI): Likewise.
(XXPERMDI_8HI): Likewise.
(XXSLDWI_16QI): Likewise.
(XXSLDWI_2DF): Likewise.
(XXSLDWI_2DI): Likewise.
(XXSLDWI_4SF): Likewise.
(XXSLDWI_4SI): Likewise.
(XXSLDWI_8HI): Likewise.
(XXSPLTD_V2DF): Likewise.
(XXSPLTD_V2DI): Likewise.
(UNPACK_V1TI): Likewise.
(BCDADD_V1TI): Likewise.
(BCDADD_V16QI): Likewise.
(BCDADD_EQ_V1TI): Likewise.
(BCDADD_EQ_V16QI): Likewise.
(BCDADD_GT_V1TI): Likewise.
(BCDADD_GT_V16QI): Likewise.
(BCDADD_LT_V1TI): Likewise.
(BCDADD_LT_V16QI): Likewise.
(BCDADD_OV_V1TI): Likewise.
(BCDADD_OV_V16QI): Likewise.
(BCDSUB_V1TI): Likewise.
(BCDSUB_V16QI): Likewise.
(BCDSUB_EQ_V1TI): Likewise.
(BCDSUB_EQ_V16QI): Likewise.
(BCDSUB_GT_V1TI): Likewise.
(BCDSUB_GT_V16QI): Likewise.
(BCDSUB_LT_V1TI): Likewise.
(BCDSUB_LT_V16QI): Likewise.
(BCDSUB_OV_V1TI): Likewise.
(BCDSUB_OV_V16QI): Likewise.
(VSTDCDP): Likewise.
(VSTDCSP): Likewise.
(VTDCDP): Likewise.
(VTDCSP): Likewise.
(TSTSFI_EQ_DD): Likewise.
(TSTSFI_EQ_TD): Likewise.
(TSTSFI_GT_DD): Likewise.
(TSTSFI_GT_TD): Likewise.
(TSTSFI_LT_DD): Likewise.
(TSTSFI_LT_TD): Likewise.
(TSTSFI_OV_DD): Likewise.
(TSTSFI_OV_TD): Likewise.
(VSTDCQP): Likewise.
(DDEDPD): Likewise.
(DDEDPDQ): Likewise.
(DENBCD): Likewise.
(DENBCDQ): Likewise.
(DSCLI): Likewise.
(DSCLIQ): Likewise.
(DSCRI): Likewise.
(DSCRIQ): Likewise.
(UNPACK_TD): Likewise.
(VSHASIGMAD): Likewise.
(VSHASIGMAW): Likewise.
(VCNTMBB): Likewise.
(VCNTMBD): Likewise.
(VCNTMBH): Likewise.
(VCNTMBW): Likewise.
(VREPLACE_UN_UV2DI): Likewise.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DF): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_UN_V4SF): Likewise.
(VREPLACE_UN_V4SI): Likewise.
(VREPLACE_ELT_UV2DI): Likewise.
(VREPLACE_ELT_UV4SI): Likewise.
(VREPLACE_ELT_V2DF): Likewise.
(VREPLACE_ELT_V2DI): Likewise.
(VREPLACE_ELT_V4SF): Likewise.
(VREPLACE_ELT_V4SI): Likewise.
(VSLDB_V16QI): Likewise.
(VSLDB_V2DI): Likewise.
(VSLDB_V4SI): Likewise.
(VSLDB_V8HI): Likewise.
(VSRDB_V16QI): Likewise.
(VSRDB_V2DI): Likewise.
(VSRDB_V4SI): Likewise.
(VSRDB_V8HI): Likewise.
(VXXSPLTI32DX_V4SF): Likewise.
(VXXSPLTI32DX_V4SI): Likewise.
(XXEVAL): Likewise.
(XXGENPCVM_V16QI): Likewise.
(XXGENPCVM_V2DI): Likewise.
(XXGENPCVM_V4SI): Likewise.
(XXGENPCVM_V8HI): Likewise.
   

Re: [vect] PR103971, PR103977: Fix epilogue mode selection for autodetect only

2022-01-12 Thread Bill Schmidt via Gcc-patches
I think we need a fix or a revert for this today, please.  Bootstrap has been
broken for a couple of days during the last week of stage 3, which is really
problematic.

Thanks,
Bill

On 1/12/22 6:57 AM, Richard Biener via Gcc-patches wrote:
> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>
>> On 12/01/2022 11:59, Richard Biener wrote:
>>> On Wed, 12 Jan 2022, Andre Vieira (lists) wrote:
>>>
>>>> On 12/01/2022 11:44, Richard Sandiford wrote:
>>>>> Another alternative would be to push autodetected_vector_mode when the
>>>>> length is 1 and keep 1 as the starting point.
>>>>>
>>>>> Richard
>>>> I'm guessing we would still want to skip epilogue vectorization if
>>>> !VECTOR_MODE_P (autodetected_vector_mode) in that case?
>>> Practically we currently only support fixed width word_mode there,
>>> but eventually one could end up with 64bit DImode for the main loop
>>> and 32bit V4QImode in the epilogue ... so not sure if it's worth
>>> special-casing.  But I don't mind adding that skip.
>>>
>>> Richard.
>> I left out the skip, it shouldn't break anything as it would try that same
>> mode before anyway.
>> Just to clarify what I meant though was to skip if autodetected_vector_mode
>> wasn't a vector AND the target didn't define autovectorize_vector_modes, so in
>> that scenario it wouldn't ever try  V4QImode for the epilogue if the mainloop
>> was autodetected DImode, I think...
>> Either way, this is less code, less complicated and doesn't analyze more than
>> it did before the original patch, so I'm happy with that too.
>>
>> Is this what you had in mind?
> -  mode_i = 1;
> +  if (vector_modes.length () == 1)
> +{
> +  /* If we only had VOIDmode then use AUTODETECTED_VECTOR_MODE to see if
> +an epilogue can be created with that mode.  */
> +  vector_modes[0] = autodetected_vector_mode;
> +  mode_i = 0;
> +}
> +  else
> +mode_i = 1;
> +
>
> I would have left out the condition and unconditionally do
>
>   vector_modes[0] = autodetected_vector_mode;
>   mode_i = 0;
>
> but OK if you think it makes sense to special case length == 1.
>
> Richard.
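
The starting-index choice discussed in this thread can be modelled outside the
vectorizer.  What follows is only a minimal sketch in plain C, not the GCC
code: the enum, the function name epilogue_start_index, and the array-based
interface are all hypothetical stand-ins, and only the mode names and the
length == 1 rule come from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC machine modes.  Only the names VOIDmode,
   DImode and V4QImode appear in the thread; V16QImode is illustrative.  */
enum mode { VOIDmode, DImode, V4QImode, V16QImode };

/* Return the index at which epilogue analysis should start.  MODES[0] is
   assumed to be the VOIDmode placeholder that means "autodetect".  */
static size_t
epilogue_start_index (enum mode *modes, size_t length, enum mode autodetected)
{
  assert (length >= 1);
  if (length == 1)
    {
      /* Only the placeholder exists, so retry with the mode that was
         autodetected for the main loop.  */
      modes[0] = autodetected;
      return 0;
    }
  /* Otherwise skip the placeholder and start at the first real mode.  */
  return 1;
}
```

Richard's unconditional variant would correspond to always overwriting
modes[0] and returning 0, regardless of length.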


Re: [PATCH] PR 102935, Fix pr101384-1.c code generation test.

2022-01-11 Thread Bill Schmidt via Gcc-patches
Hi Mike,

This looks fine to me.  Maintainers?

Thanks,
Bill

On 1/7/22 6:33 PM, Michael Meissner wrote:
> Fix pr101384-1.c code generation test.
>
> Add support for the compiler using XXSPLTIB reg,255 to load all 1's into a
> register on power9 and above instead of using VSPLTI{B,H,W} reg,-1.
>
> gcc/testsuite/
> 2022-01-07  Michael Meissner  
>
>   PR testsuite/102935
>   * gcc.target/powerpc/pr101384-1.c: Update insn regexp for power9
>   and power10.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr101384-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> index 627d7d76721..41cf84bf8bc 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> @@ -2,7 +2,7 @@
>  /* { dg-do compile { target le } } */
>  /* { dg-options "-O2 -maltivec" } */
>  /* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } */
> +/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M|\mxxspltib[^\n\r]*,255\M} 9 } } */
>  /* { dg-final { scan-assembler-times {\mvslw\M} 3 } } */
>  /* { dg-final { scan-assembler-times {\mvslh\M} 3 } } */
>  /* { dg-final { scan-assembler-times {\mvslb\M} 3 } } */
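
Why the test may accept either instruction form can be checked with a little
byte arithmetic: splatting the unsigned byte 255 and splatting a sign-extended
-1 into byte, halfword, or word elements all give the same all-ones 16-byte
image.  The sketch below is a portable model of that, not PowerPC code; the
helper names splat_byte and splat_signed_imm are made up for illustration, and
the little-endian layout within each element is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of an XXSPLTIB-style splat: copy one byte into all 16 lanes.  */
static void
splat_byte (uint8_t out[16], uint8_t imm)
{
  memset (out, imm, 16);
}

/* Model of a VSPLTIS{B,H,W}-style splat: sign-extend a 5-bit immediate and
   copy it into every element of ELT_SIZE bytes (1, 2 or 4).  Bytes within an
   element are stored low byte first (an assumption of this model).  */
static void
splat_signed_imm (uint8_t out[16], int imm, size_t elt_size)
{
  assert (imm >= -16 && imm <= 15);
  assert (elt_size == 1 || elt_size == 2 || elt_size == 4);
  uint32_t v = (uint32_t) (int32_t) imm;	/* sign-extended bit pattern */
  for (size_t i = 0; i < 16; i += elt_size)
    for (size_t j = 0; j < elt_size; j++)
      out[i + j] = (uint8_t) (v >> (8 * j));
}
```

With an immediate of -1 every stored byte is 0xff for all three element sizes,
which is exactly what splat_byte produces for 255.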


[PATCH] rs6000: Don't #ifdef "short" built-in names

2022-01-06 Thread Bill Schmidt via Gcc-patches
Hi!

It was recently pointed out that we get anomalous behavior when using
__attribute__((target)) to select a CPU.  As an example, when building for
-mcpu=power8 but using __attribute__((target("cpu=power10"))), it is legal
to call __builtin_vec_mod, but not vec_mod, even though these are
equivalent.  This is because the equivalence is established with a #define
that is guarded by #ifdef _ARCH_PWR10.

This goofy behavior occurs with both the old builtins support and the
new.  One of the goals of the new builtins support was to make sure all
appropriate interfaces are available using __attribute__((target)), so I
failed in this respect.  This patch corrects the problem by removing the
#ifdef guards, except where they still apply.  For example, #ifdef __PPU__
is still appropriate.
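
The mechanism behind the anomaly can be reproduced without any PowerPC
headers.  The following is a portable sketch only: FAKE_ARCH_PWR10,
long_name_mod, and short_mod are hypothetical stand-ins for _ARCH_PWR10,
__builtin_vec_mod, and vec_mod.  It shows that a #define guarded by #ifdef is
settled once during preprocessing, so nothing attached to an individual
function later can bring the short name into existence.

```c
#include <assert.h>

/* Hypothetical stand-in for a "long" built-in name, which is always
   declared.  */
static int
long_name_mod (int a, int b)
{
  assert (b != 0);
  return a % b;
}

/* FAKE_ARCH_PWR10 is a made-up macro modelling _ARCH_PWR10.  It is not
   defined in this translation unit, so the short name is never created.  */
#ifdef FAKE_ARCH_PWR10
# define short_mod long_name_mod
#endif

/* Imagine this function carried a target attribute selecting a newer CPU.
   That cannot retroactively define FAKE_ARCH_PWR10, so only the long name
   is usable here.  */
static int
uses_newer_cpu (int a, int b)
{
#ifdef short_mod
  return short_mod (a, b);
#else
  return long_name_mod (a, b);	/* the short name does not exist */
#endif
}

/* Report whether the short name survived preprocessing.  */
static int
short_name_available (void)
{
#ifdef short_mod
  return 1;
#else
  return 0;
#endif
}
```

Dropping the #ifdef around the #define, which is what the patch does for the
real names, makes the short name available everywhere the long one is.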

Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Thanks!
Bill


2022-01-06  Bill Schmidt  

gcc/
* config/rs6000/rs6000-overload.def (VEC_ABSD): Remove #ifdef token.
(VEC_BLENDV): Likewise.
(VEC_BPERM): Likewise.
(VEC_CFUGE): Likewise.
(VEC_CIPHER_BE): Likewise.
(VEC_CIPHERLAST_BE): Likewise.
(VEC_CLRL): Likewise.
(VEC_CLRR): Likewise.
(VEC_CMPNEZ): Likewise.
(VEC_CNTLZ): Likewise.
(VEC_CNTLZM): Likewise.
(VEC_CNTTZM): Likewise.
(VEC_CNTLZ_LSBB): Likewise.
(VEC_CNTM): Likewise.
(VEC_CNTTZ): Likewise.
(VEC_CNTTZ_LSBB): Likewise.
(VEC_CONVERT_4F32_8F16): Likewise.
(VEC_DIV): Likewise.
(VEC_DIVE): Likewise.
(VEC_EQV): Likewise.
(VEC_EXPANDM): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTH): Likewise.
(VEC_EXTRACT_FP_FROM_SHORTL): Likewise.
(VEC_EXTRACTH): Likewise.
(VEC_EXTRACTL): Likewise.
(VEC_EXTRACTM): Likewise.
(VEC_EXTRACT4B): Likewise.
(VEC_EXTULX): Likewise.
(VEC_EXTURX): Likewise.
(VEC_FIRSTMATCHINDEX): Likewise.
(VEC_FIRSTMATCHOREOSINDEX): Likewise.
(VEC_FIRSTMISMATCHINDEX): Likewise.
(VEC_FIRSTMISMATCHOREOSINDEX): Likewise.
(VEC_GB): Likewise.
(VEC_GENBM): Likewise.
(VEC_GENHM): Likewise.
(VEC_GENWM): Likewise.
(VEC_GENDM): Likewise.
(VEC_GENQM): Likewise.
(VEC_GENPCVM): Likewise.
(VEC_GNB): Likewise.
(VEC_INSERTH): Likewise.
(VEC_INSERTL): Likewise.
(VEC_INSERT4B): Likewise.
(VEC_LXVL): Likewise.
(VEC_MERGEE): Likewise.
(VEC_MERGEO): Likewise.
(VEC_MOD): Likewise.
(VEC_MSUB): Likewise.
(VEC_MULH): Likewise.
(VEC_NAND): Likewise.
(VEC_NCIPHER_BE): Likewise.
(VEC_NCIPHERLAST_BE): Likewise.
(VEC_NEARBYINT): Likewise.
(VEC_NMADD): Likewise.
(VEC_ORC): Likewise.
(VEC_PDEP): Likewise.
(VEC_PERMX): Likewise.
(VEC_PEXT): Likewise.
(VEC_POPCNT): Likewise.
(VEC_PARITY_LSBB): Likewise.
(VEC_REPLACE_ELT): Likewise.
(VEC_REPLACE_UN): Likewise.
(VEC_REVB): Likewise.
(VEC_RINT): Likewise.
(VEC_RLMI): Likewise.
(VEC_RLNM): Likewise.
(VEC_SBOX_BE): Likewise.
(VEC_SIGNEXTI): Likewise.
(VEC_SIGNEXTLL): Likewise.
(VEC_SIGNEXTQ): Likewise.
(VEC_SLDB): Likewise.
(VEC_SLV): Likewise.
(VEC_SPLATI): Likewise.
(VEC_SPLATID): Likewise.
(VEC_SPLATI_INS): Likewise.
(VEC_SQRT): Likewise.
(VEC_SRDB): Likewise.
(VEC_SRV): Likewise.
(VEC_STRIL): Likewise.
(VEC_STRIL_P): Likewise.
(VEC_STRIR): Likewise.
(VEC_STRIR_P): Likewise.
(VEC_STXVL): Likewise.
(VEC_TERNARYLOGIC): Likewise.
(VEC_TEST_LSBB_ALL_ONES): Likewise.
(VEC_TEST_LSBB_ALL_ZEROS): Likewise.
(VEC_VEE): Likewise.
(VEC_VES): Likewise.
(VEC_VIE): Likewise.
(VEC_VPRTYB): Likewise.
(VEC_VSCEEQ): Likewise.
(VEC_VSCEGT): Likewise.
(VEC_VSCELT): Likewise.
(VEC_VSCEUO): Likewise.
(VEC_VSEE): Likewise.
(VEC_VSES): Likewise.
(VEC_VSIE): Likewise.
(VEC_VSTDC): Likewise.
(VEC_VSTDCN): Likewise.
(VEC_VTDC): Likewise.
(VEC_XL): Likewise.
(VEC_XL_BE): Likewise.
(VEC_XL_LEN_R): Likewise.
(VEC_XL_SEXT): Likewise.
(VEC_XL_ZEXT): Likewise.
(VEC_XST): Likewise.
(VEC_XST_BE): Likewise.
(VEC_XST_LEN_R): Likewise.
(VEC_XST_TRUNC): Likewise.
(VEC_XXPERMDI): Likewise.
(VEC_XXSLDWI): Likewise.
(VEC_TSTSFI_EQ_DD): Likewise.
(VEC_TSTSFI_EQ_TD): Likewise.
(VEC_TSTSFI_GT_DD): Likewise.
(VEC_TSTSFI_GT_TD): Likewise.
(VEC_TSTSFI_LT_DD): Likewise.
(VEC_TSTSFI_LT_TD): Likewise.
(VEC_TSTSFI_OV_DD): Likewise.
(VEC_TSTSFI_OV_TD): Likewise.
(VEC_VADDCUQ): Likewise.
(VEC_VADDECUQ): Likewise.
  

[PATCH] rs6000: More factoring of overload processing

2022-01-06 Thread Bill Schmidt via Gcc-patches
Hi!

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.
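
The shape of the refactoring (gather the arguments and their types once, test
the count once, then hand the arrays to each special-case resolver) might look
roughly like the sketch below.  It is a simplified model under stated
assumptions, not the rs6000-c.c code: struct expr, gather_args, resolve_mul,
resolve_call, and MAX_ARGS are hypothetical, and only the resolution enum and
the args/types naming follow the patch.

```c
#include <assert.h>
#include <stddef.h>

enum resolution { unresolved, resolved, resolved_bad };

/* Hypothetical stand-in for a tree node: a type tag and a value.  */
struct expr { int type_id; int value; };

#define MAX_ARGS 3

/* Gather the argument list once into parallel ARGS and TYPES arrays.  */
static size_t
gather_args (const struct expr *list, size_t n,
	     const struct expr **args, int *types)
{
  assert (n <= MAX_ARGS);
  for (size_t i = 0; i < n; i++)
    {
      args[i] = &list[i];
      types[i] = list[i].type_id;
    }
  return n;
}

/* A special-case resolver now takes the arrays and no longer repeats the
   argument-count test.  */
static int
resolve_mul (enum resolution *res, const struct expr **args, const int *types)
{
  if (types[0] != types[1])
    {
      *res = resolved_bad;
      return 0;
    }
  *res = resolved;
  return args[0]->value * args[1]->value;
}

/* Driver: gather, test the count once, then dispatch.  */
static int
resolve_call (enum resolution *res, const struct expr *list, size_t n)
{
  const struct expr *args[MAX_ARGS];
  int types[MAX_ARGS];

  if (n > MAX_ARGS || gather_args (list, n, args, types) != 2)
    {
      *res = resolved_bad;	/* the real code emits a diagnostic here */
      return 0;
    }
  return resolve_mul (res, args, types);
}
```

The overloads that are left alone in this patch would bypass the shared count
test in such a driver and keep their own, more specific error messages.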

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2022-01-06  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.c | 304 +++
 1 file changed, 127 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 24a081ced37..189a70d89bf 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 /* Other types are errors.  */
 default:
@@ -1002,37 +990,25 @@ resolve_vec_mul (resolution *res, vec 

Re: [PATCH] rs6000: Skip overload instances with uninitialized fntype (PR103622)

2022-01-05 Thread Bill Schmidt via Gcc-patches
Hi!  I'd like to ping this patch, now that I'm back from break.

Thanks!
Bill

On 12/13/21 10:15 AM, Bill Schmidt wrote:
> Hi!
>
> For some data types like IEEE-128, we determine whether the type is available
> at built-in function initialization time.  If it's not, then we don't provide
> the function type for function instances that require the data type.  PR103622
> observes that this can cause us to ICE when running the list of instances when
> the target doesn't support the data type.
>
> Ideally, we wouldn't even put such an instance in the list of instances that
> an overload can map to, but to do that is much more complicated.  Instead,
> this patch just ensures we don't dereference a NULL pointer when the situation
> arises.
>
> Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and tested
> on powerpc64le-linux-gnu with no regressions.  Is this okay for trunk?
>
> Thanks!
> Bill
>
>
> 2021-12-13  Bill Schmidt  
>
> gcc/
>   PR target/103622
>   * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
>   Skip over instances with undefined function types.
> ---
>  gcc/config/rs6000/rs6000-c.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index 8e83d97e72f..fc4cc929884 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -2943,6 +2943,12 @@ altivec_resolve_new_overloaded_builtin (location_t loc, tree fndecl,
>  
>   for (; instance != NULL; instance = instance->next)
> {
> + /* It is possible for an instance to require a data type that isn't
> +defined on this target, in which case instance->fntype will be
> +NULL.  */
> + if (!instance->fntype)
> +   continue;
> +
>   bool mismatch = false;
>   tree nextparm = TYPE_ARG_TYPES (instance->fntype);
>  
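
The guard added by this patch follows a common pattern: walk the list of
candidates and step over any whose optional field was never filled in before
dereferencing it.  Here is a minimal, self-contained sketch of that pattern.
The structure layouts and the find_instance helper are hypothetical
reductions; only the instance, fntype, and next names mirror the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced model of a function type.  */
struct fn_type { int nparams; };

/* Hypothetical model of an overload instance.  FNTYPE is NULL when the
   instance needs a data type the target does not provide.  */
struct instance
{
  const struct fn_type *fntype;
  struct instance *next;
};

/* Return the first instance taking NPARAMS parameters, skipping instances
   whose function type was never initialized.  */
static const struct instance *
find_instance (const struct instance *instance, int nparams)
{
  assert (nparams >= 0);
  for (; instance != NULL; instance = instance->next)
    {
      if (!instance->fntype)
	continue;		/* would otherwise dereference NULL below */

      if (instance->fntype->nparams == nparams)
	return instance;
    }
  return NULL;
}
```

As the message notes, keeping such instances out of the list altogether would
be cleaner but considerably more work; the skip is the cheap fix.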


Re: [PATCH 2/2] rs6000: Update darn testcases

2021-12-17 Thread Bill Schmidt via Gcc-patches
Hi!

On 12/17/21 11:36 AM, Segher Boessenkool wrote:
> Make the darn testcases work (and be tested) in 32-bit mode as well.
> They used to ICE, but they no longer do.
>
>
> 2021-12-17  Segher Boessenkool 
>
> gcc/testsuite/
>   PR target/103624
>   * gcc.target/powerpc/darn-0.c: Remove target clause.
>   * gcc.target/powerpc/darn-1.c: Remove target clause. Remove lp64
>   requirement.  Change return type to long.
>   * gcc.target/powerpc/darn-2.c: Ditto.
>   * gcc.target/powerpc/darn-3.c: Remove target clause.

LGTM.

Thanks!
Bill

>
> ---
>  gcc/testsuite/gcc.target/powerpc/darn-0.c | 2 +-
>  gcc/testsuite/gcc.target/powerpc/darn-1.c | 5 ++---
>  gcc/testsuite/gcc.target/powerpc/darn-2.c | 5 ++---
>  gcc/testsuite/gcc.target/powerpc/darn-3.c | 2 +-
>  4 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/darn-0.c b/gcc/testsuite/gcc.target/powerpc/darn-0.c
> index f446f494b06d..64d98f5f91d7 100644
> --- a/gcc/testsuite/gcc.target/powerpc/darn-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/darn-0.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-do compile } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/darn-1.c b/gcc/testsuite/gcc.target/powerpc/darn-1.c
> index 0938718a5ad6..f483a89862d0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/darn-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/darn-1.c
> @@ -1,12 +1,11 @@
> -/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-do compile } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
>  
>  #include <altivec.h>
>  
> -long long get_conditioned_random ()
> +long get_conditioned_random ()
>  {
>return __builtin_darn ();
>  }
> diff --git a/gcc/testsuite/gcc.target/powerpc/darn-2.c b/gcc/testsuite/gcc.target/powerpc/darn-2.c
> index 64e44b244c4b..56a9ffb677b4 100644
> --- a/gcc/testsuite/gcc.target/powerpc/darn-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/darn-2.c
> @@ -1,12 +1,11 @@
> -/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-do compile } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
> -/* { dg-require-effective-target lp64 } */
>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>  /* { dg-options "-mdejagnu-cpu=power9" } */
>  
>  #include <altivec.h>
>  
> -long long get_raw_random ()
> +long get_raw_random ()
>  {
>return __builtin_darn_raw ();
>  }
> diff --git a/gcc/testsuite/gcc.target/powerpc/darn-3.c b/gcc/testsuite/gcc.target/powerpc/darn-3.c
> index 477901fde70d..4c68fad80d5d 100644
> --- a/gcc/testsuite/gcc.target/powerpc/darn-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/darn-3.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-do compile } */
>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
>  


Re: [PATCH 1/2] rs6000: Redo darn (PR103624)

2021-12-17 Thread Bill Schmidt via Gcc-patches
Hi!

On 12/17/21 11:36 AM, Segher Boessenkool wrote:
> The builtins now all return "long".  The patterns have :GPR as the
> output mode, so they can be 32-bit as well (the instruction makes sense
> in 32 bit just fine).  The builtins expand to the DImode version
> normally, but to the SImode if {32bit} is true.
>
> 2021-12-17  Segher Boessenkool 
>
>   PR target/103624
>   * config/rs6000/rs6000-builtins.def (__builtin_darn): Expand to
>   darn_64_di.  Add {32bit} attribute.  Return long.
>   (__builtin_darn_32): Expand to darn_32_di.  Add {32bit} attribute.
>   Return long.
>   (__builtin_darn_raw): Expand to darn_raw_di.  Add {32bit} attribute.
>   Return long.
>   * config/rs6000/rs6000-call.c (rs6000_expand_builtin): Expand the darn
>   builtins to the _si variants for -m32.
>   * config/rs6000/rs6000.md (UNSPECV_DARN_32, UNSPECV_DARN_RAW): Delete.
>   (UNSPECV_DARN): Update comment.
>   (darn_32, darn_raw, darn): Delete.
>   (darn_32_<mode>, darn_64_<mode>, darn_raw_<mode> for GPR): New.
>   (@darn<mode> for GPR): New.

Patch LGTM.  Thanks for doing the legwork on this!

Bill
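
The selection rule in the quoted description (use the DImode pattern by
default, switch to the SImode pattern when compiling 32-bit) can be sketched
as a small lookup.  This is an illustrative model only: the enums and the
functions select_darn_icode and darn_l_operand are hypothetical names, not
GCC's.  The operand values 0, 1, and 2 follow the "darn %0,N" templates in
the patterns the patch deletes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three built-in codes and six patterns.  */
enum darn_builtin { BIF_DARN, BIF_DARN_32, BIF_DARN_RAW };
enum darn_icode
{
  DARN_64_DI, DARN_32_DI, DARN_RAW_DI,
  DARN_64_SI, DARN_32_SI, DARN_RAW_SI
};

/* Pick the DImode pattern normally and the SImode one for 32-bit.  */
static enum darn_icode
select_darn_icode (enum darn_builtin fcode, bool is_32bit)
{
  switch (fcode)
    {
    case BIF_DARN:
      return is_32bit ? DARN_64_SI : DARN_64_DI;
    case BIF_DARN_32:
      return is_32bit ? DARN_32_SI : DARN_32_DI;
    case BIF_DARN_RAW:
      return is_32bit ? DARN_RAW_SI : DARN_RAW_DI;
    }
  assert (!"unknown darn builtin");
  return DARN_64_DI;
}

/* Second operand of the darn instruction for each built-in, matching the
   old templates: darn_32 used 0, darn used 1, darn_raw used 2.  */
static int
darn_l_operand (enum darn_builtin fcode)
{
  switch (fcode)
    {
    case BIF_DARN_32:
      return 0;
    case BIF_DARN:
      return 1;
    case BIF_DARN_RAW:
      return 2;
    }
  assert (!"unknown darn builtin");
  return 1;
}
```

Because the operand alone distinguishes the three flavours, a single unspec
with the operand as a parameter is enough, which fits the patch deleting
UNSPECV_DARN_32 and UNSPECV_DARN_RAW.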

>
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 -
>  gcc/config/rs6000/rs6000-call.c   |  6 +
>  gcc/config/rs6000/rs6000.md   | 47 +--
>  3 files changed, 40 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 45ce160bd421..3ad5a135eaec 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2798,14 +2798,14 @@
>  
>  ; Miscellaneous P9 functions
>  [power9]
> -  signed long long __builtin_darn ();
> -DARN darn {}
> +  signed long __builtin_darn ();
> +DARN darn_64_di {32bit}
>  
> -  signed int __builtin_darn_32 ();
> -DARN_32 darn_32 {}
> +  signed long __builtin_darn_32 ();
> +DARN_32 darn_32_di {32bit}
>  
> -  signed long long __builtin_darn_raw ();
> -DARN_RAW darn_raw {}
> +  signed long __builtin_darn_raw ();
> +DARN_RAW darn_raw_di {32bit}
>  
>const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
>  TSTSFI_EQ_DD dfptstsfi_eq_dd {}
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index b98f4a4c97f7..cc55174c6b72 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5631,6 +5631,12 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* subtarget */,
>   icode = CODE_FOR_rs6000_mftb_si;
>else if (fcode == RS6000_BIF_BPERMD)
>   icode = CODE_FOR_bpermd_si;
> +  else if (fcode == RS6000_BIF_DARN)
> + icode = CODE_FOR_darn_64_si;
> +  else if (fcode == RS6000_BIF_DARN_32)
> + icode = CODE_FOR_darn_32_si;
> +  else if (fcode == RS6000_BIF_DARN_RAW)
> + icode = CODE_FOR_darn_raw_si;
>else
>   gcc_unreachable ();
>  }
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 4122acb98cfd..9be484c7cf83 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -172,9 +172,7 @@ (define_c_enum "unspecv"
> UNSPECV_EH_RR ; eh_reg_restore
> UNSPECV_ISYNC ; isync instruction
> UNSPECV_MFTB  ; move from time base
> -   UNSPECV_DARN  ; darn 1 (deliver a random number)
> -   UNSPECV_DARN_32   ; darn 2
> -   UNSPECV_DARN_RAW  ; darn 0
> +   UNSPECV_DARN  ; darn (deliver a random number)
> UNSPECV_NLGR  ; non-local goto receiver
> UNSPECV_MFFS  ; Move from FPSCR
> UNSPECV_MFFSL ; Move from FPSCR light instruction version
> @@ -15065,25 +15063,36 @@ (define_insn "*cmp<mode>_hw"
>  
>  ;; Miscellaneous ISA 3.0 (power9) instructions
>  
> -(define_insn "darn_32"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> -(unspec_volatile:SI [(const_int 0)] UNSPECV_DARN_32))]
> +(define_expand "darn_32_<mode>"
> +  [(use (match_operand:GPR 0 "register_operand"))]
>"TARGET_P9_MISC"
> -  "darn %0,0"
> -  [(set_attr "type" "integer")])
> +{
> +  emit_insn (gen_darn (<MODE>mode, operands[0], const0_rtx));
> +  DONE;
> +})
>  
> -(define_insn "darn_raw"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> -(unspec_volatile:DI [(const_int 0)] UNSPECV_DARN_RAW))]
> -  "TARGET_P9_MISC && TARGET_64BIT"
> -  "darn %0,2"
> -  [(set_attr "type" "integer")])
> +(define_expand "darn_64_<mode>"
> +  [(use (match_operand:GPR 0 "register_operand"))]
> +  "TARGET_P9_MISC"
> +{
> +  emit_insn (gen_darn (<MODE>mode, operands[0], const1_rtx));
> +  DONE;
> +})
>  
> -(define_insn "darn"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> -(unspec_volatile:DI [(const_int 0)] UNSPECV_DARN))]
> -  "TARGET_P9_MISC && TARGET_64BIT"
> -  "darn %0,1"
> +(define_expand "darn_raw_<mode>"
> +  [(use (match_operand:GPR 0 "register_operand"))]
> +  
