Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-24 Thread Carl Love



On 5/13/24 22:14, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add a new overloaded instance for vec_xxpermdi
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instance.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new
>>  overloaded built-in instance.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def | 2 ++
>>  gcc/doc/extend.texi   | 1 +
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 5912c9452f4..49962e2f2a2 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,8 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
> 
> This actually introduces the signed __int128, considering the other
> existing ones, I think we want both signed and unsigned.

Added unsigned as well.

> 
>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 86b8e536dbe..47cf2f3bc8b 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool 
>> char *);
>>  void vec_vsx_st (vector bool char, int, unsigned char *);
>>  void vec_vsx_st (vector bool char, int, signed char *);
>>  
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
> 
> Nit: Considering the existing ones sorted by element size descending, I guess
> it's better to move the above here (and with the explicit signed and 
> unsigned).

OK, moved the new prototype down below the float prototype and added the 
unsigned prototype.
> 
> And we need a test case for it as well?
Yes, we need a test case for both.  Added a new runnable test file.

   Carl 
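
For readers following the thread, the doubleword selection behind vec_xxpermdi can be sketched in portable C.  This is only a model of the xxpermdi instruction under stated assumptions, not the GCC implementation: the type vreg128 and the function model_xxpermdi are hypothetical names, and dw[0] is taken to be the most-significant doubleword in the ISA's big-endian numbering.  The element numbering that the built-in presents on little-endian targets is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector register.  dw[0] is the
   most-significant doubleword in the ISA's big-endian numbering.  */
typedef struct { uint64_t dw[2]; } vreg128;

/* Model of xxpermdi XT,XA,XB,DM.  The high bit of the 2-bit DM field
   picks which doubleword of a goes to the result's doubleword 0; the low
   bit picks which doubleword of b goes to the result's doubleword 1.  */
vreg128 model_xxpermdi(vreg128 a, vreg128 b, int dm)
{
    assert(dm >= 0 && dm <= 3);   /* DM must fit in two bits */
    vreg128 r;
    r.dw[0] = a.dw[(dm >> 1) & 1];
    r.dw[1] = b.dw[dm & 1];
    return r;
}
```

With a 128-bit integer held in one register, passing the same value as both inputs with dm = 2 swaps its two halves, which is one reason an __int128 overload can be handy.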


Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:



>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> index 01f35dad713..35ea31b2616 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> @@ -2,7 +2,6 @@
>>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>>  /* { dg-require-effective-target powerpc_vsx_ok } */
>>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
>> -/* { dg-final { scan-assembler "vperm" } } */
>>  /* { dg-final { scan-assembler "xvrdpi" } } */
>>  /* { dg-final { scan-assembler "xvrdpic" } } */
>>  /* { dg-final { scan-assembler "xvrdpim" } } */
>> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
>>  extern __vector __bool long bl[][4];
>>  #endif
>>  
>> -int do_perm(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to just replace these __builtin_vsx_vperm with vec_perm,
> OK with this tweaked (also keep the above removed vperm scan), thanks!

OK, sounds good.  Updated the patch to change built-in calls to vec_perm.  
Updated ChangeLog message to match change.
   
 Carl 
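
As background for the vperm scan that is being kept, the byte permute can be modeled in portable C roughly as below.  This is a sketch of the instruction-level behavior only, assuming big-endian byte numbering; model_vperm is a hypothetical name and the code is not taken from the patch or from the test file.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the vperm byte permute.  The low five bits of each control
   byte index into the 32-byte concatenation a || b; the upper three bits
   of the control byte are ignored.  */
void model_vperm(const uint8_t a[16], const uint8_t b[16],
                 const uint8_t ctl[16], uint8_t out[16])
{
    assert(a && b && ctl && out);
    for (int i = 0; i < 16; i++) {
        unsigned idx = ctl[i] & 0x1F;
        out[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
}
```

The same control vector therefore works for every element type, which is consistent with the single overloaded vec_perm replacing the per-mode built-ins in the test.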


Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-24 Thread Carl Love
Kewen:

On 5/21/24 20:05, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/5/22 08:13, Carl Love wrote:
>> Kewen:



>>> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't 
>>> really
>>> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are 
>>> placed
>>> in altivec stanza, so it looks that we should put it under the section
>>> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an 
>>> extension
>>> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
>>> extension of the @code{vec_sel} built-in documented in the PVIPR" and 
>>> omitting
>>> the description to avoid possible slightly different wording.
>>
>> Honestly, at this point in time I don't remember why I put it there.  It has 
>> been too long since I created the patch.  That said, the test case requires 
>> Power 10 due to the comparison check using built-in vec_all_eq but that is 
>> another issue.  
>> The built-in generates the xxsel instruction that is an ISA 2.06 
>> instruction.  So, I would say it should go into the ISA 2.06 section.  I 
>> moved it to the ISA 2.06 section.
> 
> But the underlying implementation is:
> 
>   const vsq __builtin_altivec_vsel_1ti (vsq, vsq, vuq);
> VSEL_1TI vector_select_v1ti {}
> 
>   const vuq __builtin_altivec_vsel_1ti_uns (vuq, vuq, vuq);
> VSEL_1TI_UNS vector_select_v1ti_uns {}
> 
> , it's under altivec stanza and can result with insn vsel (so not xxsel),
> vsel is ISA 2.03, so I think ISA 2.05 better matches the implementation.

OK, moved to ISA 2.05

> 



>>
>> Sounds like there was some issue that you noticed on 
>> r14-10011-g6e62ede7aaccc6.  The new version of
>> print_i128 should be functionally equivalent but perhaps is "safer"?
> 
> Thanks for checking!  Looking into this more closely, I realized you didn't 
> apply the previously
> adopted way for printing (the way used in 
> gcc.target/powerpc/builtins-6-p9-runnable.c), sorry for
> the false alarm!  So your supposed print_i128 is fine to me.

OK, no problem.  Will go with the original print_i128 function.

Carl 
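
To make the 128-bit select discussed above concrete, here is a minimal portable sketch of the bitwise selection that vsel and xxsel perform.  It assumes a compiler that provides unsigned __int128 (GCC and Clang on 64-bit targets); the names u128, make_u128 and model_sel_u128 are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets.  */
typedef unsigned __int128 u128;

/* Build a 128-bit value from two 64-bit halves.  */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* Bitwise select: each result bit comes from a where the mask bit is 0
   and from b where the mask bit is 1.  */
u128 model_sel_u128(u128 a, u128 b, u128 mask)
{
    assert(sizeof(u128) == 16);
    return (a & ~mask) | (b & mask);
}
```

Because the operation is purely bitwise, the signed and unsigned overloads differ only in their declared types, not in the bits produced.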


Re: [PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-24 Thread Carl Love



On 5/13/24 22:26, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
>>
>> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
>> __builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
>> there are no test cases for it.  The patch removes built-in
>> __builtin_vsx_xvcmpeqsp_p.
> As the previous review comments in the v1 (this is actually v2):
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646728.html
> , both __builtin_vsx_xvcmpeqsp_p and __builtin_vsx_xvcmpeqsp can be
> dropped, so please consider __builtin_vsx_xvcmpeqsp as well.

Yes, as you noted, __builtin_vsx_xvcmpeqsp is removed in the next patch.
> 
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT):
>>  Remove case statement.
> 
> It seems you mixed this with some other patch, this line doesn't
> belong to this patch, ...

Took that out of this patch.  Didn't get the changes separated cleanly.

> 
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
>>  Remove built-in definition.
>> ---
>>  gcc/config/rs6000/rs6000-builtin.cc   | 6 --
>>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>>  2 files changed, 12 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
>> b/gcc/config/rs6000/rs6000-builtin.cc
>> index f83d65b06ef..74ed8fc1805 100644
>> --- a/gcc/config/rs6000/rs6000-builtin.cc
>> +++ b/gcc/config/rs6000/rs6000-builtin.cc
>> @@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, 
>> tree type_out,
>>  = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>>switch (fn)
>>  {
>> -case RS6000_BIF_RSQRTF:
>> -  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
>> -  && out_mode == SFmode && out_n == 4
>> -  && in_mode == SFmode && in_n == 4)
>> -return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP];
>> -  break;
> 
> ... and this ...

Ditto

> 
>>  case RS6000_BIF_RSQRT:
>>if (VECTOR_UNIT_VSX_P (V2DFmode)
>>&& out_mode == DFmode && out_n == 2
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index d65c858ac0c..2f6149edd5f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -917,9 +917,6 @@
>>fpmath vf __builtin_altivec_vrsqrtefp (vf);
>>  VRSQRTEFP rsqrtev4sf2 {}
>>  
>> -  fpmath vf __builtin_altivec_vrsqrtfp (vf);
>> -VRSQRTFP rsqrtv4sf2 {}
>> -
> 
> ..., also this.

Ditto

> 
> BR,
> Kewen
> 
>>const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc);
>>  VSEL_16QI vector_select_v16qi {}
>>  
>> @@ -1619,9 +1616,6 @@
>>const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>  XVCMPEQSP vector_eqv4sf {}
>>  
>> -  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
>> -XVCMPEQSP_P vector_eq_v4sf_p {pred}
>> -
>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>  XVCMPGEDP vector_gev2df {}
>>  


Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:55, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove the vec_xxsel built-ins, they are duplicates


>> -int do_sel(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
  ^ changed to ui
>> -  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
  ^ changed to ui
>> -  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
   ^ changed to uc
>> -  f[i][0] = __builtin_vsx_xxsel_4sf (f[i][1], f[i][2], f[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel_2df (d[i][1], d[i][2], d[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], bi[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], bs[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], bc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], bi[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], bl[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], ui[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], us[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], ui[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], ul[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to keep them but just replacing the call with vec_sel.
> 
> OK with the above nits tweaked, thanks.

OK, changed __builtin_vsx_xxsel_4si_* to vec_sel, and changed __builtin_vsx_xxsel to 
vec_sel.
Had to add #include .

Finally, changed the third argument for the first three calls, as noted above, 
to be compatible with the vec_sel built-in specification.

   Carl

> 
> BR,
> Kewen
> 
>>  int do_perm(void)
>>  {
>>int i = 0;
> 


Re: [PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:00, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, fix error in unsigned vector float to unsigned  int built-in 
>> definitions
>>
>> The built-ins __builtin_vsx_vunsigned_v2df and __builtin_vsx_vunsigned_v4sf
>> are supposed to take a vector of floats and return a vector of unsigned
>> long long ints.  The definitions are using the signed version of the
> 
> Sorry for nitpicking, here __builtin_vsx_vunsigned_v2df takes vector of 
> doubles
> and returns vector of unsigned long long ints while 
> __builtin_vsx_vunsigned_v4sf
> takes vector of floats and returns vector of unsigned ints.

That is not nitpicking, the description is wrong.  Changed float to double.
> 
>> instructions not the unsigned version of the instruction.  The results
>> should also be unsigned.  The builtins are used by the overloaded
>> vec_unsigned builtin which has an unsigned result.
>>
>> Similarly the built-ins __builtin_vsx_vunsignede_v2df and
>> __builtin_vsx_vunsignedo_v2df are supposed to retun an unsigned result.
> 
> Nit: s/retun/return/

Fixed.

> 
>> If the floating point argument is negative, the unsigned result is zero.
>> The built-ins are used in the overloaded built-in vec_unsignede and
>> vec_unsignedo respectively.
>>
>> Add test cases with negative floating point arguments for each of the
>> above built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
>>  __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
>>  __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: Add tests for
>>  vec_unsignede and vec_unsignedo with negative arguments.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 12 +-
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 23 ---
>>  2 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index c6d2ea1bc39..bf9a0ae22fc 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1580,16 +1580,16 @@
>>const vsi __builtin_vsx_vsignedo_v2df (vd);
>>  VEC_VSIGNEDO_V2DF vsignedo_v2df {}
>>  
>> -  const vsll __builtin_vsx_vunsigned_v2df (vd);
>> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
>> +  const vull __builtin_vsx_vunsigned_v2df (vd);
>> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
>>  
>> -  const vsi __builtin_vsx_vunsigned_v4sf (vf);
>> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
>> +  const vui __builtin_vsx_vunsigned_v4sf (vf);
>> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
>>  
>> -  const vsi __builtin_vsx_vunsignede_v2df (vd);
>> +  const vui __builtin_vsx_vunsignede_v2df (vd);
>>  VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
>>  
>> -  const vsi __builtin_vsx_vunsignedo_v2df (vd);
>> +  const vui __builtin_vsx_vunsignedo_v2df (vd);
>>  VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
>>  
>>const vf __builtin_vsx_xscvdpsp (double);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> index 0231a1fd086..6d4fe84c8a1 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> @@ -313,6 +313,15 @@ int main()
>>  test_unsigned_int_result (ALL, vec_uns_int_result,
>>vec_uns_int_expected);
>>  
>> +/* Convert single precision float to  unsigned int.  Negative
>> +   arguments
>> + */
>> +vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
>> +vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
>> +vec_uns_int_result = vec_unsigned (vec_flt0);
>> +test_unsigned_int_result (ALL, vec_uns_int_result,
>> +  vec_uns_int_expected);
>> +
>>  /* Convert double precision float to long long unsigned int */
>>  vec_dble0 = (vector double){124.930, 8134.49};
>>  vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
>> @@ -321,9 +330,9 @@ int main()
>>   vec_ll_uns_int_expected);
> 
> Nit: Similar coverage on negative for vector double can be added here.

Added.

  Carl
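
The behavior the new tests check (negative inputs convert to zero) can be illustrated with a small portable model of the float to unsigned int conversion.  This is a sketch under assumptions: model_cvt_sp_to_uw and model_vec_unsigned are hypothetical names, and the NaN and overflow cases follow my understanding of the saturating unsigned convert instructions rather than anything stated in this thread.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Scalar model of a saturating float -> unsigned 32-bit conversion.
   Negative values and NaN give 0, values of 2^32 or more give the
   maximum, everything else truncates toward zero.  */
uint32_t model_cvt_sp_to_uw(float x)
{
    if (isnan(x) || x <= 0.0f)
        return 0;
    if (x >= 4294967296.0f)
        return UINT32_MAX;
    return (uint32_t)x;
}

/* Apply the scalar model to each of the four elements.  */
void model_vec_unsigned(const float in[4], uint32_t out[4])
{
    assert(in && out);
    for (int i = 0; i < 4; i++)
        out[i] = model_cvt_sp_to_uw(in[i]);
}
```

Feeding the model the negative inputs from the new test, {-14.930, -834.49, -3.3, -5.4}, gives four zeros, matching vec_uns_int_expected in the patch.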


Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added
>> overloaded built-ins.
> 
> This part is missing, there are no test case changes in this patch.

Yes, the new tests are missing.  Not sure what happened to them.  Fixed.
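
As a rough picture of what such tests would exercise, the even/odd conversion of a vector of floats to signed long long can be modeled in portable C.  This is only a sketch, not the missing test case: the function names are hypothetical, elements are assumed to be numbered 0 to 3 in vector element order with the even form taking elements 0 and 2, and out-of-range or NaN inputs are deliberately excluded rather than modeled.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Convert one float to int64_t, truncating toward zero.  Inputs outside
   the int64_t range (and NaN) are not modeled and trip the assert.  */
int64_t model_cvt_sp_to_sd(float x)
{
    assert(isfinite(x)
           && x >= -9223372036854775808.0f
           && x < 9223372036854775808.0f);
    return (int64_t)x;
}

/* Even form: convert elements 0 and 2.  */
void model_signede_v4sf(const float in[4], int64_t out[2])
{
    out[0] = model_cvt_sp_to_sd(in[0]);
    out[1] = model_cvt_sp_to_sd(in[2]);
}

/* Odd form: convert elements 1 and 3.  */
void model_signedo_v4sf(const float in[4], int64_t out[2])
{
    out[0] = model_cvt_sp_to_sd(in[1]);
    out[1] = model_cvt_sp_to_sd(in[3]);
}
```

A four-element float vector yields only two 64-bit results per call, which is why the even and odd forms come as a pair.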

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |  6 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 
>>  gcc/config/rs6000/vsx.md  | 23 +++
>>  gcc/doc/extend.texi   | 13 +
>>  4 files changed, 50 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..5b7237a2327 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1709,9 +1709,15 @@
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>>  XVCVSPSXDS vsx_xvcvspsxds {}
>>  
>> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);
>> +XVCVSPSXDSO vsx_xvcvspsxds_low {}
>> +
>>const vsll __builtin_vsx_xvcvspuxds (vf);
>>  XVCVSPUXDS vsx_xvcvspuxds {}
> 
> This existing should return with type vull, ...

Fixed.

> 
>>  
>> +  const vsll __builtin_vsx_xvcvspuxds_low (vf);
>> +XVCVSPUXDSO vsx_xvcvspuxds_low {}
> 
> ... so this copied one should be vull too.

Fixed.

> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed XVCVSPUXDS -> VEC_VUNSIGNEDE_V4SF
 XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
 XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
 XVCVSPSXDSO  -> VEC_VSIGNEDO_V4SF

NEED TO ADDRESS RESPONSE TO QUESTION I ASKED.

> 
>>const vsi __builtin_vsx_xvcvspuxws (vf);
>>  XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 84bd9ae6554..68501c05289 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3307,10 +3307,14 @@
>>  [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>>vsi __builtin_vec_vsignede (vd);
>>  VEC_VSIGNEDE_V2DF
>> +  vsll __builtin_vec_vsignede (vf);
>> +XVCVSPSXDS
>>  
>>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>>vsi __builtin_vec_vsignedo (vd);
>>  VEC_VSIGNEDO_V2DF
>> +  vsll __builtin_vec_vsignedo (vf);
>> +XVCVSPSXDSO
>>  
>>  [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
>>vsi __builtin_vec_signexti (vsc);
>> @@ -4433,10 +4437,14 @@
>>  [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
>>vui __builtin_vec_vunsignede (vd);
>>  VEC_VUNSIGNEDE_V2DF
>> +  vull __builtin_vec_vunsignede (vf);
>> +XVCVSPUXDS
>>  
>>  [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
>>vui __builtin_vec_vunsignedo (vd);
>>  VEC_VUNSIGNEDO_V2DF
>> +  vull __builtin_vec_vunsignedo (vf);
>> +XVCVSPUXDSO
>>  
> As above, the name can be tweaked.

Fixed.

> 
>>  [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
>>vui __builtin_vec_extract_exp (vf);
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index f135fa079bd..3d39ae7995f 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -2704,6

Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 01:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, Remove __builtin_vsx_xvcvspsxws built-in
>>
>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>> built-in is not documented and there are no test cases for it.
>>
>> This patch removes the redundant built-in.
> 
> By revisiting the comments on the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html

The comments from the previous version:
-
   I think we should recommend users to adopt the recommended built-ins in
   PVIPR, by checking the corresponding mnemonic in PVIPR, I got:

   __builtin_vsx_xvcvspsxws -> vec_signed
   __builtin_vsx_xvcvspsxds -> N/A
   __builtin_vsx_xvcvspuxds -> N/A
   __builtin_vsx_xvcvdpsxws -> vec_signed{e,o}
   __builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o}
   __builtin_vsx_xvcvdpuxds_uns -> vec_unsigned
   __builtin_vsx_xvcvspdp   -> vec_double{e,o}
   __builtin_vsx_xvcvdpsp   -> vec_float{e,o}
   __builtin_vsx_xvcvspuxws -> vec_unsigned
   __builtin_vsx_xvcvsxwdp  -> vec_double{e,o}
   __builtin_vsx_xvcvuxddp_uns -> vec_double

   For __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds which don't have
   the according PVIPR built-ins, we can extend the current vec_{un,}signed{e,o}
   to cover them and document them following the section mentioning PVIPR.

are handled by multiple patches in the new series.  The main comment on the 
previous patch series was to remove most of the built-ins as they were 
redundant.  So, basically most of the patches in the previous series were 
thrown out and replaced by the current series, which removes the built-ins.


That all said, I distinctly remember addressing each of the above built-ins.  
The work on the series got
interrupted a couple of times and it looks like some of the patches to address 
the above got lost.  My bad.
The following is a list of which patch takes care of removing the duplicate 
built-ins.

__builtin_vsx_xvcvspsxws         patch 2 removes this built-in
__builtin_vsx_xvcvspsxds -> N/A  patch 4 extends vec_{un,}signede to cover
                                 this built-in.  Built-in used in
                                 rs6000-overload.def.  Built-in now for
                                 internal use only.
__builtin_vsx_xvcvspuxds -> N/A  patch 4 extends vec_{un,}signedo to cover
                                 this built-in.  Built-in used in
                                 rs6000-overload.def.  Built-in now for
                                 internal use only.


__builtin_vsx_xvcvdpsxws -> vec_signed{e,o}   removed in patch 4
__builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o} removed in patch 4

__builtin_vsx_xvcvdpuxds_uns -> vec_unsigned  remove in patch 4
__builtin_vsx_xvcvspuxws -> vec_unsigned  remove in patch 4

The following changes will be put into a new patch when the series is 
reposted.  It appears they got lost in the current series.  My bad.

__builtin_vsx_xvcvspdp      -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvdpsp      -> vec_float{e,o}    remove in new patch number 5

__builtin_vsx_xvcvsxwdp     -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvuxddp_uns -> vec_double        remove in new patch number 5

> 
> I wonder if it's intentional to keep the others, at least bifs
> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and
> __builtin_vsx_xvcvuxddp_uns look removable; users can just use the
> equivalent ones in PVIPR.  And for the others, users can still use
> the PVIPR ones by considering endianness (controlling with endianness
> macros).
> 

Hopefully that makes it clearer where the various changes are.   

The next series will add a new patch 5 in the series.  The remaining patches in 
this series, patches 5, 6, ... will get moved to patch 6, 7, ... in the next 
posting of the built-in cleanup patch series.

Carl 


Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/24/24 03:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/24 02:21, Carl Love wrote:
>>
>>
>> On 5/13/24 22:37, Kewen.Lin wrote:
>>> Hi,
>>>
>>> on 2024/4/20 05:18, Carl Love wrote:
>>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>>>
>>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>>>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>>>> the test cases are removed.
>>>>
>>>> gcc/ChangeLog:
>>>>* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>>>Remove built-in definition.
>>>>
>>>
>>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
>>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
>>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
>>>
>>>
>>>> gcc/testsuite/ChangeLog:
>>>>* vsx-builtin-3.c (do_cmp): Remove test case for
>>>>__builtin_vsx_xvcmpeqsp.
>>>> ---
>>>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>>>  2 files changed, 5 deletions(-)
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>>> b/gcc/config/rs6000/rs6000-builtins.def
>>>> index 2f6149edd5f..19d05b8043a 100644
>>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>>> @@ -1613,9 +1613,6 @@
>>>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>>>  
>>>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>>> -XVCMPEQSP vector_eqv4sf {}
>>>> -
>>>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>>>  XVCMPGEDP vector_gev2df {}
>>>>  
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> index 35ea31b2616..245893dc0e3 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> @@ -27,7 +27,6 @@
>>>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>>>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>>>  
>>>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>>>return i;
>>>
>>> As the other in this patch series, I prefer to change it with
>>> vec_cmpeq here, OK for trunk with this tweaked (also keep the
>>> scan there), thanks!
>>
>> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp 
>> and vec_cmpeq both return a vector where the element is all ones if the 
>> comparison is True and zeros if False.  However, the return type for 
>> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.
>>
> 
> Ah, so they are not equivalent from prototype perspective.
> 
>> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
>> vector element is a 1 if the comparison is equal and 0 otherwise.  However, 
>> the documented result is a vector bool int for the floating point 
>> comparison.  The return value for __builtin_vsx_xvcmpeqsp was vector float.
> 
> IMHO PVIPR prototype (returning vector bool) makes more sense,
> it does match better with what the result holds.

Yes, I tend to agree.  I think the user would likely be using the test so 
they could create a mask to selectively replace vector elements.  A bool type 
makes more sense in that case.
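
The point about identical bits but different types can be sketched for a single element in portable C.  This is an illustrative model, not code from the patch: the function names are hypothetical, and a uint32_t stands in for one element of a vector bool int.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Comparison result as a mask: all ones if equal, all zeros otherwise.  */
uint32_t model_cmpeq_mask(float a, float b)
{
    return (a == b) ? UINT32_MAX : 0u;
}

/* The same 32 bits reinterpreted as a float, which is how a vector float
   return type presents them.  An all-ones pattern is a NaN.  */
float mask_bits_as_float(uint32_t mask)
{
    float f;
    assert(sizeof f == sizeof mask);
    memcpy(&f, &mask, sizeof f);
    return f;
}

/* Use the mask to choose between two floats, bit by bit: a where the
   mask is 0, b where the mask is 1.  */
float model_sel_float(float a, float b, uint32_t mask)
{
    uint32_t ua, ub, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & ~mask) | (ub & mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

Read as a float, the "true" result is a NaN and so has little use as a number, whereas as a mask it feeds straight into a select.  That fits the view that the bool return type is the more natural one.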

> 
>>
>> So, the "bit values" returned are the same but not of the same type. So

Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-23 Thread Carl Love



On 5/13/24 22:37, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>
>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>> the test cases are removed.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>  Remove built-in definition.
>>
> 
> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
> 
> 
>> gcc/testsuite/ChangeLog:
>>  * vsx-builtin-3.c (do_cmp): Remove test case for
>>  __builtin_vsx_xvcmpeqsp.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>  2 files changed, 5 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 2f6149edd5f..19d05b8043a 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1613,9 +1613,6 @@
>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>  
>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>> -XVCMPEQSP vector_eqv4sf {}
>> -
>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>  XVCMPGEDP vector_gev2df {}
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> index 35ea31b2616..245893dc0e3 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> @@ -27,7 +27,6 @@
>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>  
>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>return i;
> 
> As the other in this patch series, I prefer to change it with
> vec_cmpeq here, OK for trunk with this tweaked (also keep the
> scan there), thanks!

When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp and 
vec_cmpeq both return a vector where the element is all ones if the comparison 
is True and zeros if False.  However, the return type for 
__builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.

The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
vector element is a 1 if the comparison is equal and 0 otherwise.  However, the 
documented result is a vector bool int for the floating point comparison.  The 
return value for __builtin_vsx_xvcmpeqsp was vector float.  

So, the "bit values" returned are the same but not of the same type. So 
technically vec_cmpeq is not a drop in replacement for __builtin_vsx_xvcmpeqsp. 
 Given that, perhaps we should not be removing __builtin_vsx_xvcmpeqsp?

The testcase has to be changed from:
 f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
to:
 bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++;

I am thinking we should drop this patch from the series, i.e. don't remove 
__builtin_vsx_xvcmpeqsp.  Thoughts?

 Carl 
 

> 
> BR,
> Kewen
> 


Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-22 Thread Carl Love
Kewen:

On 5/13/24 22:44, Kewen.Lin wrote:
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
> I think we can replace these calls with the equivalent gimple codes
> (early expanding it) and then we can get rid of these instances.

Hmm, I am going to need a little coaching here.  I am not sure how to do this.
Looks like I get to learn something new.

   Carl 


Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-21 Thread Carl Love
Kewen:

On 5/13/24 19:54, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, add overloaded vec_sel with int128 arguments
>>
>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>> and return a signed/unsigned int128 result.
>>
>> Extending the vec_sel built-in makes the existing built-ins
>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>> patch removes these built-ins.
>>
>> The patch adds documentation and test cases for the new overloaded vec_sel
>> built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>  __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>  definitions.
>>  * doc/extend.texi: Add documentation for new vec_sel arguments.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |  6 --
>>  gcc/config/rs6000/rs6000-overload.def |  4 +
>>  gcc/doc/extend.texi   | 14 
>>  .../powerpc/vec-sel-runnable-i128.c   | 84 +++
>>  4 files changed, 102 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index d09e21a9151..46d2ae7b7cb 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1931,12 +1931,6 @@
>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>  
>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>> -XXSEL_1TI vector_select_v1ti {}
>> -
>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>> -
>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>  XXSEL_2DF vector_select_v2df {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 68501c05289..5912c9452f4 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3274,6 +3274,10 @@
>>  VSEL_2DF  VSEL_2DF_B
>>vd __builtin_vec_sel (vd, vd, vull);
>>  VSEL_2DF  VSEL_2DF_U
>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>> +VSEL_1TI  VSEL_1TI_S
>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>> +VSEL_1TI_UNS  VSEL_1TI_U
>>  ; The following variants are deprecated.
>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>  VSEL_2DI_B  VSEL_2DI_S
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 64a43b55e2d..86b8e536dbe 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the 
>> endianness issues involved
>>  with the first argument and the result.
>>  @findex vec_replace_unaligned
>>  
>> +Vector select
>> +
>> +@smallexample
>> +vector signed __int128 vec_sel (vector signed __int128,
>> +   vector signed __int128, vector signed __int128);
>> +vector unsigned __int128 vec_sel (vector unsigned __int128,
>> +   vector unsigned __int128, vector unsigned __int128);
>> +@end smallexample
>> +
>> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128
>> +arguments and returns a vector selecting bits from the two source vectors 
>> based
>> +on the values of the third input vector.  This built-in is an extension of 
>> the
>> +@code{vec_sel} built-in documented in the PVIPR.
>> +
> 
> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't really
> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are placed
> in altivec stanza, so it looks that we should put it under the section
> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an extension
> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
> extension of the @code{vec_sel} built-in documented in the PVIPR" and omitting
> the description to avoid possible slightly different wording.

Honestly, at this point in time I don't remember why I put it there.  It has 
been too long since I created the patch.  That said, the test case requires 
Power 10 due to the compariso

Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-17 Thread Carl Love
Kewen:

I am working through the patches.  I made the changes as requested for this
patch but have a question about one of your comments.

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added



> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed:

  XVCVSPUXDS   -> VEC_VUNSIGNEDE_V4SF
  XVCVSPUXDSO  -> VEC_VUNSIGNEDO_V4SF
  XVCVSPSXDS   -> VEC_VSIGNEDE_V4SF
  XVCVSPSXDSO  -> VEC_VSIGNEDO_V4SF

QUESTION:
I am not sure where you want the v{un,}signed{e,o}_v4sf name used.  The
overloaded instance entry names for vd and vf have to match the first line of
the definition, so the name can't be type specific, i.e. v4sf.

For example, file rs6000-overload.def now looks like:

[VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 




 Carl 


[PING] Re: [PATCH 0/13] rs6000, built-in cleanup patch series

2024-05-11 Thread Carl Love
Ping, just wondering if anyone has had a chance to look at the patch series.

Thanks.

  Carl  

On 4/19/24 14:04, Carl Love wrote:
> GCC maintainers:
> 
> The following patch series removes duplicate built-ins.  There are patches to 
> extend an existing overloaded built-in to cover additional input types.  The 
> final patch removes built-ins to set and initialize vectors.  The code 
> generated by these built-ins with the default optimization is less efficient
> than the code generated by using straight C code.  The assembly code for the
> built-in and straight C code is the same with -O3 optimizations.  In this
> case, the built-ins are removed as they add no additional value.
> 
> The patches have all been tested on Power 10 LE.  The last patch was also 
> tested on Power 8 BE.
> 
> No regressions were seen.
> 
> Please let me know if the patches are acceptable for mainline.  Thanks.
> 
>Carl 
> 


[PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-04-19 Thread Carl Love
rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization the built-in actually generates more instructions than
the inline C code; with -O3 the generated code is identical.

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.
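
As a rough illustration of the plain C replacements, the sketch below uses the
generic GCC/Clang vector extension instead of the PowerPC vector types, so it
builds on any target.  The helper names init_v4si and set_v4si are hypothetical
and exist only for this example.

```c
#include <assert.h>

/* Generic 4 x int vector, an assumed stand-in for "vector signed int".  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain C initialization, the replacement for
   __builtin_vec_init_v4si (a, b, c, d).  */
static v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Plain C element assignment, the replacement for
   __builtin_vec_set_v4si (v, val, index).  */
static v4si
set_v4si (v4si v, int val, unsigned int index)
{
  assert (index < 4);	/* Four int lanes in a 16-byte vector.  */
  v[index] = val;
  return v;
}
```

Callers that used the built-ins can write the braces and the subscript
directly; the wrappers are only there to keep the two forms side by side.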

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v8hi, __builtin_vec_init_v4si,
__builtin_vec_init_v4sf, __builtin_vec_init_v2di,
__builtin_vec_init_v2df, __builtin_vec_init_v1ti,
__builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
__builtin_vec_set_v4si, __builtin_vec_set_v4sf): Remove built-in
definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 42 ++-
 1 file changed, 2 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 19d05b8043a..d04ad4ce7e5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1115,37 +1115,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1292,15 +1261,8 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.44.0



[PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-04-19 Thread Carl Love
rs6000, remove __builtin_vsx_xvcmpeqsp built-in

The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
the test cases are removed.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
Remove built-in definition.

gcc/testsuite/ChangeLog:
* vsx-builtin-3.c (do_cmp): Remove test case for
__builtin_vsx_xvcmpeqsp.
---
 gcc/config/rs6000/rs6000-builtins.def| 3 ---
 gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
 2 files changed, 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 2f6149edd5f..19d05b8043a 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1613,9 +1613,6 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}
 
-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index 35ea31b2616..245893dc0e3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -27,7 +27,6 @@
 /* { dg-final { scan-assembler "xvcmpeqdp" } } */
 /* { dg-final { scan-assembler "xvcmpgtdp" } } */
 /* { dg-final { scan-assembler "xvcmpgedp" } } */
-/* { dg-final { scan-assembler "xvcmpeqsp" } } */
 /* { dg-final { scan-assembler "xvcmpgtsp" } } */
 /* { dg-final { scan-assembler "xvcmpgesp" } } */
 /* { dg-final { scan-assembler "xxsldwi" } } */
@@ -112,7 +111,6 @@ int do_cmp (void)
   d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
   d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
 
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
   f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
   f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
   return i;
-- 
2.44.0



[PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-04-19 Thread Carl Love
rs6000, remove __builtin_vsx_vperm_* built-ins

The undocumented built-ins:
  __builtin_vsx_vperm_16qi_uns,
  __builtin_vsx_vperm_1ti,
  __builtin_vsx_vperm_1ti_uns,
  __builtin_vsx_vperm_2df,
  __builtin_vsx_vperm_2di,
  __builtin_vsx_vperm_2di_uns,
  __builtin_vsx_vperm_4sf,
  __builtin_vsx_vperm_4si,
  __builtin_vsx_vperm_4si_uns

are duplicates of the __builtin_altivec_* built-ins that are used by
the overloaded vec_perm built-in that is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
built-in definitions and comments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
 __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
test cases.
---
 gcc/config/rs6000/rs6000-builtins.def | 33 ---
 .../gcc.target/powerpc/vsx-builtin-3.c| 20 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3c409d729ea..f33564d3d9c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1529,39 +1529,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}
 
-; These are duplicates of __builtin_altivec_* counterparts, and are being
-; kept for backwards compatibility.  The reason for their existence is
-; unclear.  TODO: Consider deprecation/removal at some point.
-  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
-VPERM_16QI_X altivec_vperm_v16qi {}
-
-  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
-VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
-
-  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
-VPERM_1TI_X altivec_vperm_v1ti {}
-
-  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
-VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
-
-  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
-VPERM_2DF_X altivec_vperm_v2df {}
-
-  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
-VPERM_2DI_X altivec_vperm_v2di {}
-
-  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
-VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
-
-  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
-VPERM_4SF_X altivec_vperm_v4sf {}
-
-  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
-VPERM_4SI_X altivec_vperm_v4si {}
-
-  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
-VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
-
   const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
 VPERM_8HI_X altivec_vperm_v8hi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index 01f35dad713..35ea31b2616 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -2,7 +2,6 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
-/* { dg-final { scan-assembler "vperm" } } */
 /* { dg-final { scan-assembler "xvrdpi" } } */
 /* { dg-final { scan-assembler "xvrdpic" } } */
 /* { dg-final { scan-assembler "xvrdpim" } } */
@@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
 extern __vector __bool long bl[][4];
 #endif
 
-int do_perm(void)
-{
-  int i = 0;
-
-  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
-
-  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
-
-  return i;
-}
-
 int do_xxperm (void)
 {
   int i = 0;
-- 
2.44.0



[PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-04-19 Thread Carl Love
rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded
vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so users will only
use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
16qi, 4sf, 2df] tests are also removed.
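
For reference, the bitwise select that vec_sel performs can be modelled in
portable C as below.  This is a sketch using the generic GCC/Clang vector
extension, not the rs6000 expansion; the type v4su and the function sel_v4su
are illustrative names only.

```c
/* Generic 4 x unsigned int vector, an assumed stand-in for
   "vector unsigned int".  */
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Bitwise select: each result bit is taken from B where the
   corresponding MASK bit is 1, and from A where it is 0.  */
static v4su
sel_v4su (v4su a, v4su b, v4su mask)
{
  return (a & ~mask) | (b & mask);
}
```

The operation is purely bitwise, which is why one overloaded built-in can cover
every element type.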

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrglw_4si,
__builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns,
__builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di,
__builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf,
__builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Remove
built-in definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
cases for removed built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 30 ---
 .../gcc.target/powerpc/vsx-builtin-3.c| 26 
 2 files changed, 56 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 46d2ae7b7cb..3c409d729ea 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1925,36 +1925,6 @@
   const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
 XXPERMDI_8HI vsx_xxpermdi_v8hi {}
 
-  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
-XXSEL_16QI vector_select_v16qi {}
-
-  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
-XXSEL_16QI_UNS vector_select_v16qi_uns {}
-
-  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
-XXSEL_2DF vector_select_v2df {}
-
-  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
-XXSEL_2DI vector_select_v2di {}
-
-  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
-XXSEL_2DI_UNS vector_select_v2di_uns {}
-
-  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
-XXSEL_4SF vector_select_v4sf {}
-
-  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
-XXSEL_4SI vector_select_v4si {}
-
-  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
-XXSEL_4SI_UNS vector_select_v4si_uns {}
-
-  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
-XXSEL_8HI vector_select_v8hi {}
-
-  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
-XXSEL_8HI_UNS vector_select_v8hi_uns {}
-
   const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
 XXSLDWI_16QI vsx_xxsldwi_v16qi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index ff875c55304..01f35dad713 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -2,7 +2,6 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
-/* { dg-final { scan-assembler "xxsel" } } */
 /* { dg-final { scan-assembler "vperm" } } */
 /* { dg-final { scan-assembler "xvrdpi" } } */
 /* { dg-final { scan-assembler "xvrdpic" } } */
@@ -57,31 +56,6 @@ extern __vector unsigned long long ull[][4];
 extern __vector __bool long bl[][4];
 #endif
 
-int do_sel(void)
-{
-  int i = 0;
-
-  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
-  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
-  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
-  f[i][0] = __builtin_vsx_xxsel_4sf (f[i][1], f[i][2], 

[PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-04-19 Thread Carl Love
rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
__builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
there are no test cases for it.  The patch removes built-in
__builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT):
Remove case statement.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 6 --
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 2 files changed, 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index f83d65b06ef..74ed8fc1805 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, tree 
type_out,
 = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
   switch (fn)
 {
-case RS6000_BIF_RSQRTF:
-  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
- && out_mode == SFmode && out_n == 4
- && in_mode == SFmode && in_n == 4)
-   return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP];
-  break;
 case RS6000_BIF_RSQRT:
   if (VECTOR_UNIT_VSX_P (V2DFmode)
  && out_mode == DFmode && out_n == 2
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index d65c858ac0c..2f6149edd5f 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -917,9 +917,6 @@
   fpmath vf __builtin_altivec_vrsqrtefp (vf);
 VRSQRTEFP rsqrtev4sf2 {}
 
-  fpmath vf __builtin_altivec_vrsqrtfp (vf);
-VRSQRTFP rsqrtv4sf2 {}
-
   const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc);
 VSEL_16QI vector_select_v16qi {}
 
@@ -1619,9 +1616,6 @@
   const vf __builtin_vsx_xvcmpeqsp (vf, vf);
 XVCMPEQSP vector_eqv4sf {}
 
-  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
-XVCMPEQSP_P vector_eq_v4sf_p {pred}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}
 
-- 
2.44.0



[PATCH 5/13] rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

2024-04-19 Thread Carl Love
rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
patch removes the now unused define_expands for vsx_xxmrghw_ and
vsx_xxmrglw_.
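
The element selection done by the merge-high and merge-low operations can be
sketched in portable C as follows.  The indices follow the (0 4 1 5) and
(2 6 3 7) selections on the concatenated inputs in the removed define_expands,
with elements numbered in array order.  The generic vector type and the
function names mergeh_v4si and mergel_v4si are assumptions for illustration.

```c
/* Generic 4 x int vector, an assumed stand-in for "vector signed int".  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Interleave the first two elements of A and B: {a0, b0, a1, b1}.  */
static v4si
mergeh_v4si (v4si a, v4si b)
{
  v4si r = {a[0], b[0], a[1], b[1]};
  return r;
}

/* Interleave the last two elements of A and B: {a2, b2, a3, b3}.  */
static v4si
mergel_v4si (v4si a, v4si b)
{
  v4si r = {a[2], b[2], a[3], b[3]};
  return r;
}
```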

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
Remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.
* config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_):
Remove unused define_expands.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 gcc/config/rs6000/vsx.md  | 41 ---
 3 files changed, 57 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index ac9f16fe51a..f83d65b06ef 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 5b7237a2327..d09e21a9151 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1904,18 +1904,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 
-  const vf __builtin_vsx_xxmrghw (vf, vf);
-XXMRGHW_4SF vsx_xxmrghw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
-XXMRGHW_4SI vsx_xxmrghw_v4si {}
-
-  const vf __builtin_vsx_xxmrglw (vf, vf);
-XXMRGLW_4SF vsx_xxmrglw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
-XXMRGLW_4SI vsx_xxmrglw_v4si {}
-
   const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
 XXPERMDI_16QI vsx_xxpermdi_v16qi {}
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 3d39ae7995f..26560ecc38a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4810,47 +4810,6 @@
 }
   [(set_attr "type" "vecperm")])
 
-;; V4SF/V4SI interleave
-(define_expand "vsx_xxmrghw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-(vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 0) (const_int 4)
-(const_int 1) (const_int 5)])))]
-  "VECTOR_MEM_VSX_P (mode)"
-{
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
-: gen_altivec_vmrglw_direct_;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  [(set_attr "type" "vecperm")])
-
-(define_expand "vsx_xxmrglw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-   (vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 2) (const_int 6)
-(const_int 3) (const_int 7)])))]
-  "VECTOR_MEM_VSX_P (mode)"
-{
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_
-: gen_altivec_vmrghw_direct_;
-  if (!BYTES_BIG_ENDIAN)
-std::swap 

[PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-04-19 Thread Carl Love
rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to accept a vector float argument
and return the even/odd elements converted to signed/unsigned long long ints.

Add testcases and update documentation.
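
A rough portable C model of the new float overloads is shown below.  It assumes
that "even" means array indices 0 and 2 and "odd" means indices 1 and 3, and it
only covers values that fit in a long long; the generic vector types and the
names signede_v4sf and signedo_v4sf are hypothetical and are not part of the
patch.

```c
/* Generic vector types, assumed stand-ins for "vector float" and
   "vector signed long long".  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Convert the even-indexed floats (0 and 2) to signed long long.
   The C cast truncates toward zero.  */
static v2di
signede_v4sf (v4sf a)
{
  v2di r = {(long long) a[0], (long long) a[2]};
  return r;
}

/* Convert the odd-indexed floats (1 and 3) to signed long long.  */
static v2di
signedo_v4sf (v4sf a)
{
  v2di r = {(long long) a[1], (long long) a[3]};
  return r;
}
```

The unsigned variants would follow the same pattern with unsigned long long
lanes.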

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
__builtin_vsx_xvcvspuxds_low): New built-in definitions.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
Add new overloaded specifications.
* config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable: New tests for the added
overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def |  6 ++
 gcc/config/rs6000/rs6000-overload.def |  8 
 gcc/config/rs6000/vsx.md  | 23 +++
 gcc/doc/extend.texi   | 13 +
 4 files changed, 50 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index bf9a0ae22fc..5b7237a2327 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1709,9 +1709,15 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
+  const vsll __builtin_vsx_xvcvspsxds_low (vf);
+XVCVSPSXDSO vsx_xvcvspsxds_low {}
+
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
+  const vsll __builtin_vsx_xvcvspuxds_low (vf);
+XVCVSPUXDSO vsx_xvcvspuxds_low {}
+
   const vsi __builtin_vsx_xvcvspuxws (vf);
 XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..68501c05289 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+XVCVSPSXDS
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+XVCVSPSXDSO
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+XVCVSPUXDS
 
 [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
   vui __builtin_vec_vunsignedo (vd);
 VEC_VUNSIGNEDO_V2DF
+  vull __builtin_vec_vunsignedo (vf);
+XVCVSPUXDSO
 
 [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
   vui __builtin_vec_extract_exp (vf);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..3d39ae7995f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2704,6 +2704,29 @@
   DONE;
 })
 
+;; Convert low vector elements of 32-bit floating point numbers to vector of
+;; 64-bit signed/unsigned integers.
+(define_expand "vsx_xvcvspxds_low"
+  [(match_operand:V2DI 0 "vsx_register_operand")
+   (match_operand:V4SF 1 "vsx_register_operand")
+   (any_fix (pc))]
+  "VECTOR_UNIT_VSX_P (V2DFmode)"
+{
+  /* Shift left one word to put even word in correct location */
+  rtx rtx_tmp;
+  rtx rtx_val = GEN_INT (4);
+  rtx_tmp = gen_reg_rtx (V4SFmode);
+  emit_insn (gen_altivec_vsldoi_v4sf (rtx_tmp, operands[1], operands[1],
+  rtx_val));
+
+  if (BYTES_BIG_ENDIAN)
+emit_insn (gen_vsx_xvcvspxds_be (operands[0], rtx_tmp));
+  else
+emit_insn (gen_vsx_xvcvspxds_le (operands[0], rtx_tmp));
+
+  DONE;
+})
+
 ;; Generate float2 double
 ;; convert two double to float
 (define_expand "float2_v2df"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7b54a241a7b..64a43b55e2d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22552,6 +22552,19 @@ can use @var{vector long} instead of @var{vector long 
long},
 @var{vector bool long} instead of @var{vector bool long long}, and
 @var{vector unsigned long} instead of @var{vector unsigned long long}.
 
+@smallexample
+vector signed long long vec_signedo (vector float);
+vector signed long long vec_signede (vector float);
+vector unsigned long long vec_unsignedo (vector float);
+vector unsigned long long vec_unsignede (vector float);
+@end smallexample
+
+The overloaded built-ins @code{vec_signedo} and @code{vec_signede} convert the
+even/odd input vector elements to signed/unsigned long long integer values in
+addition to the supported arguments and return types documented in the PVIPR.
+Negative input values are returned as zero for the 

[PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-04-19 Thread Carl Love
rs6000, extend vec_xxpermdi built-in for __int128 args

Add a new overloaded instance for vec_xxpermdi

   __int128 vec_xxpermdi (__int128, __int128, const int);

Update the documentation to include a reference to the new built-in
instance.
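
As a rough illustration of what the doubleword permute does with a 128-bit
value, here is a scalar sketch in plain C.  It models the xxpermdi
instruction as I understand it from the ISA, using big-endian doubleword
numbering; the type v128_model and the function model_xxpermdi are
hypothetical names, and how the built-in maps its arguments on little
endian is not modeled here.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two doublewords.
   dw[0] is doubleword 0 in big-endian (ISA) numbering.  */
typedef struct
{
  uint64_t dw[2];
} v128_model;

/* Model of the doubleword permute: the high bit of the 2-bit selector
   picks a doubleword of A for result doubleword 0, the low bit picks a
   doubleword of B for result doubleword 1.  */
v128_model
model_xxpermdi (v128_model a, v128_model b, int dm)
{
  v128_model r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}
```

For an __int128 argument the two doublewords are simply the two halves of
the single element, so selector 0 pairs the first half of each input and
selector 3 pairs the second half of each input.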

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instance.
---
 gcc/config/rs6000/rs6000-overload.def | 2 ++
 gcc/doc/extend.texi   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 5912c9452f4..49962e2f2a2 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4932,6 +4932,8 @@
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
 XXPERMDI_2DF  XXPERMDI_VD
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1TI
 
 [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
   vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 86b8e536dbe..47cf2f3bc8b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool 
char *);
 void vec_vsx_st (vector bool char, int, unsigned char *);
 void vec_vsx_st (vector bool char, int, signed char *);
 
+vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
-- 
2.44.0



[PATCH 9/13] rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

2024-04-19 Thread Carl Love
rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
redundant.  The overloaded vec_neg built-in provides the same
functionality.  The two built-ins are not documented, nor are there any
test cases for them.

Remove the definitions so users will use the overloaded vec_neg built-in
which is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
__builtin_vsx_xvnegsp): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f33564d3d9c..d65c858ac0c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1763,12 +1763,6 @@
   const vf __builtin_vsx_xvnabssp (vf);
 XVNABSSP vsx_nabsv4sf2 {}
 
-  const vd __builtin_vsx_xvnegdp (vd);
-XVNEGDP negv2df2 {}
-
-  const vf __builtin_vsx_xvnegsp (vf);
-XVNEGSP negv4sf2 {}
-
   const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
 XVNMADDDP nfmav2df4 {}
 
-- 
2.44.0



[PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions

2024-04-19 Thread Carl Love
rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-ins __builtin_vsx_vunsigned_v2df and __builtin_vsx_vunsigned_v4sf
are supposed to take a vector of floats and return a vector of unsigned
long long ints.  The definitions use the signed versions of the
instructions, not the unsigned versions.  The results
should also be unsigned.  The built-ins are used by the overloaded
vec_unsigned built-in, which has an unsigned result.

Similarly the built-ins __builtin_vsx_vunsignede_v2df and
__builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
If the floating point argument is negative, the unsigned result is zero.
The built-ins are used in the overloaded built-in vec_unsignede and
vec_unsignedo respectively.
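
To make the difference concrete, here is a scalar sketch in plain C of the
two behaviors for one element.  The function names are made up for
illustration and this is only a model of the conversion as I understand it
(negative and NaN inputs give zero, large inputs saturate), not the
hardware definition.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned conversion of one float element:
   negative values (and NaN) become 0, values too large saturate.  */
uint32_t
model_cvt_unsigned (float x)
{
  if (!(x > 0.0f))
    return 0;
  if (x >= 4294967296.0f)
    return UINT32_MAX;
  return (uint32_t) x;
}

/* Hypothetical model of what the old definition produced: a signed
   conversion whose bits are then read as an unsigned value.  */
uint32_t
model_cvt_signed_as_bits (float x)
{
  if (x >= 2147483648.0f)
    return (uint32_t) INT32_MAX;
  if (x <= -2147483648.0f)
    return (uint32_t) INT32_MIN;
  return (uint32_t) (int32_t) x;
}
```

For the input -3.3 the unsigned model gives 0, while the signed conversion
gives -3, which reads as 0xFFFFFFFD when treated as unsigned.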

Add test cases for negative floating point arguments for each of the
above built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
__builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
__builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: Add tests for
vec_unsignede and vec_unsignedo with negative arguments.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 +-
 .../gcc.target/powerpc/builtins-3-runnable.c  | 23 ---
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index c6d2ea1bc39..bf9a0ae22fc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1580,16 +1580,16 @@
   const vsi __builtin_vsx_vsignedo_v2df (vd);
 VEC_VSIGNEDO_V2DF vsignedo_v2df {}
 
-  const vsll __builtin_vsx_vunsigned_v2df (vd);
-VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
+  const vull __builtin_vsx_vunsigned_v2df (vd);
+VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
 
-  const vsi __builtin_vsx_vunsigned_v4sf (vf);
-VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
+  const vui __builtin_vsx_vunsigned_v4sf (vf);
+VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
 
-  const vsi __builtin_vsx_vunsignede_v2df (vd);
+  const vui __builtin_vsx_vunsignede_v2df (vd);
 VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
 
-  const vsi __builtin_vsx_vunsignedo_v2df (vd);
+  const vui __builtin_vsx_vunsignedo_v2df (vd);
 VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
 
   const vf __builtin_vsx_xscvdpsp (double);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 0231a1fd086..6d4fe84c8a1 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -313,6 +313,15 @@ int main()
test_unsigned_int_result (ALL, vec_uns_int_result,
  vec_uns_int_expected);
 
+   /* Convert single precision float to  unsigned int.  Negative
+  arguments
+*/
+   vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
+   vec_uns_int_result = vec_unsigned (vec_flt0);
+   test_unsigned_int_result (ALL, vec_uns_int_result,
+ vec_uns_int_expected);
+
/* Convert double precision float to long long unsigned int */
vec_dble0 = (vector double){124.930, 8134.49};
vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
@@ -321,9 +330,9 @@ int main()
 vec_ll_uns_int_expected);
 
/* Convert double precision vector float to vector unsigned int,
-  even words */
-   vec_dble0 = (vector double){3124.930, 8234.49};
-   vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
+  even words.  Negative arguments */
+   vec_dble0 = (vector double){-124.930, -234.49};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
vec_uns_int_result = vec_unsignede (vec_dble0);
test_unsigned_int_result (EVEN, vec_uns_int_result,
  vec_uns_int_expected);
@@ -335,5 +344,13 @@ int main()
vec_uns_int_result = vec_unsignedo (vec_dble0);
test_unsigned_int_result (ODD, vec_uns_int_result,
  vec_uns_int_expected);
+
+   /* Convert double precision vector float to vector unsigned int,
+  odd words.  Negative arguments.  */
+   vec_dble0 = (vector double){-924.930, -1234.49};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
+   vec_uns_int_result = vec_unsignedo (vec_dble0);
+   test_unsigned_int_result (ODD, vec_uns_int_result,
+ vec_uns_int_expected);
 }
 
-- 
2.44.0



[PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-04-19 Thread Carl Love
rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned int128 arguments
and return a signed/unsigned int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded vec_sel
built-ins.
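
For reference, the bitwise select can be sketched in plain C as below.
This is a scalar model only: the type u128_model and the function
model_vec_sel are hypothetical names, and the 128-bit value is split into
two 64-bit halves so the sketch does not depend on __int128 support.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Model of the select: for each bit, take the bit from A where the
   mask bit is 0 and from B where the mask bit is 1.  */
u128_model
model_vec_sel (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;
  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}
```

An all-zero mask returns the first argument unchanged and an all-ones mask
returns the second.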

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
definitions.
* doc/extend.texi: Add documentation for new vec_sel arguments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |  6 --
 gcc/config/rs6000/rs6000-overload.def |  4 +
 gcc/doc/extend.texi   | 14 
 .../powerpc/vec-sel-runnable-i128.c   | 84 +++
 4 files changed, 102 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index d09e21a9151..46d2ae7b7cb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1931,12 +1931,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 68501c05289..5912c9452f4 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,10 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vsq);
+VSEL_1TI  VSEL_1TI_S
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_U
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 64a43b55e2d..86b8e536dbe 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the 
endianness issues involved
 with the first argument and the result.
 @findex vec_replace_unaligned
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector signed __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+The overloaded built-in @code{vec_sel} takes vector signed/unsigned __int128
+arguments and returns a vector selecting bits from the two source vectors based
+on the values of the third input vector.  This built-in is an extension of the
+@code{vec_sel} built-in documented in the PVIPR.
+
 Vector Shift Left Double Bit Immediate
 @smallexample
 @exdent vector signed char vec_sldb (vector signed char, vector signed char,
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
new file mode 100644
index 000..58eb383e8c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
@@ -0,0 +1,84 @@
+/* { dg-do run  { target power10_hw }} */
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+
+#include 
+
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 src_vc_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vc_s128 = (vector signed __int128) {0x};
+  expected_vresult_s128 = (vector signed __int128) {0x32147658ba9cfed0};
+
+  /* Signed arguments.  

[PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-04-19 Thread Carl Love
rs6000, Remove __builtin_vsx_xvcvspsxws built-in

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

This patch removes the redundant built-in.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..c6d2ea1bc39 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1709,9 +1709,6 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-- 
2.44.0



[PATCH 1/13] rs6000, Remove __builtin_vsx_cmple* builtins

2024-04-19 Thread Carl Love


rs6000, Remove __builtin_vsx_cmple* builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  The current definitions
take signed arguments and return signed results which is incorrect.
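
A small scalar sketch in plain C shows why the signedness of the arguments
matters for a less-than-or-equal compare.  The function names are made up
for illustration, and the all-ones/all-zeros result is my assumption about
how a per-element vector compare reports true and false.

```c
#include <stdint.h>

/* Hypothetical model of one byte element compared as signed.
   Returns all ones for true and all zeros for false.  */
uint8_t
model_cmple_s8 (int8_t a, int8_t b)
{
  return a <= b ? 0xFF : 0x00;
}

/* Hypothetical model of the same element compared as unsigned.  */
uint8_t
model_cmple_u8 (uint8_t a, uint8_t b)
{
  return a <= b ? 0xFF : 0x00;
}
```

The bit pattern 0xFF is -1 when read as signed and 255 when read as
unsigned, so comparing it against 1 is true in the signed model and false
in the unsigned one.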

The signed and unsigned versions of __builtin_vsx_cmple* are not
documented in extend.texi.  Also there are no test cases for the
built-ins.

Users can use the existing vec_cmple built-in, as defined in the PVIPR,
instead of __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si, __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
__builtin_vsx_cmple_4si, __builtin_vsx_cmple_8hi,
__builtin_altivec_cmple_1ti and __builtin_altivec_cmple_u1ti.

Hence these built-ins are redundant and are removed by this patch.

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
RS6000_BIF_CMPLE_U1TI): Remove case statements.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 13 
 gcc/config/rs6000/rs6000-builtins.def | 30 ---
 2 files changed, 43 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 320affd79e3..ac9f16fe51a 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   fold_compare_helper (gsi, GT_EXPR, stmt);
   return true;
 
-case RS6000_BIF_CMPLE_16QI:
-case RS6000_BIF_CMPLE_U16QI:
-case RS6000_BIF_CMPLE_8HI:
-case RS6000_BIF_CMPLE_U8HI:
-case RS6000_BIF_CMPLE_4SI:
-case RS6000_BIF_CMPLE_U4SI:
-case RS6000_BIF_CMPLE_2DI:
-case RS6000_BIF_CMPLE_U2DI:
-case RS6000_BIF_CMPLE_1TI:
-case RS6000_BIF_CMPLE_U1TI:
-  fold_compare_helper (gsi, LE_EXPR, stmt);
-  return true;
-
 /* flavors of vec_splat_[us]{8,16,32}.  */
 case RS6000_BIF_VSPLTISB:
 case RS6000_BIF_VSPLTISH:
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..7c36976a089 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1337,30 +1337,6 @@
   const vss __builtin_vsx_cmpge_u8hi (vus, vus);
 CMPGE_U8HI vector_nltuv8hi {}
 
-  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
-CMPLE_16QI vector_ngtv16qi {}
-
-  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
-CMPLE_2DI vector_ngtv2di {}
-
-  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
-CMPLE_4SI vector_ngtv4si {}
-
-  const vss __builtin_vsx_cmple_8hi (vss, vss);
-CMPLE_8HI vector_ngtv8hi {}
-
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
-CMPLE_U16QI vector_ngtuv16qi {}
-
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
-CMPLE_U2DI vector_ngtuv2di {}
-
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
-CMPLE_U4SI vector_ngtuv4si {}
-
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
-CMPLE_U8HI vector_ngtuv8hi {}
-
   const vd __builtin_vsx_concat_2df (double, double);
 CONCAT_2DF vsx_concat_v2df {}
 
@@ -3117,12 +3093,6 @@
   const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
 CMPGE_U1TI vector_nltuv1ti {}
 
-  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
-CMPLE_1TI vector_ngtv1ti {}
-
-  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
-CMPLE_U1TI vector_ngtuv1ti {}
-
   const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
 VCNTMBB vec_cntmb_v16qi {}
 
-- 
2.44.0



[PATCH 0/13] rs6000, built-in cleanup patch series

2024-04-19 Thread Carl Love
GCC maintainers:

The following patch series removes duplicate built-ins.  There are patches to 
extend an existing overloaded built-in to cover additional input types.  The 
final patch removes built-ins to set and initialize vectors.  The code 
generated by these built-ins with the default optimization is efficient than 
the code generated by using straight C code.  The assembly code for the 
built-in and straight C code is the same with -O3
optimizations.  In this case, the built-ins are removed as they add no 
additional value.

The patches have all been tested on Power 10 LE.  The last patch was also 
tested on Power 8 BE.

No regressions were seen.

Please let me know if the patches are acceptable for mainline.  Thanks.

   Carl 



Re: [PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-28 Thread Carl Love
Kewen:

Thanks for the review.  From the review, it looks like a few of the built-ins
just need to be replaced with an overloaded version of an existing PVIPR
documented built-in.  Most of the rest can just be removed.  I will work on
redoing the patch set accordingly.  We can then look at the new patch set after
stage 4 is over.

   Carl 

On 2/20/24 09:55, Carl Love wrote:
> 
> GCC maintainers:
> 
> This patch fixes the arguments and return type for the various 
> __builtin_vsx_cmple* built-ins.  They were defined as signed but should have 
> been defined as unsigned.
> 
> The patch has been tested on Power 10 with no regressions.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> -
> 
> rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins
> 
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result.  This patch changes
> the arguments and return type from signed to unsigned.
> 
> The documentation for the signed and unsigned versions of
> __builtin_vsx_cmple is missing from extend.texi.  This patch adds the
> missing documentation.
> 
> Test cases are added for each of the signed and unsigned built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
>   __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si): Change
>   arguments and return from signed to unsigned.
>   * doc/extend.texi (__builtin_vsx_cmple_16qi,
>   __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
>   __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
>   __builtin_vsx_cmple_u4si): Add documentation.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-cmple.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def|  10 +-
>  gcc/doc/extend.texi  |  23 
>  gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++
>  3 files changed, 155 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..d66a53a0fab 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1349,16 +1349,16 @@
>const vss __builtin_vsx_cmple_8hi (vss, vss);
>  CMPLE_8HI vector_ngtv8hi {}
>  
> -  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
> +  const vuc __builtin_vsx_cmple_u16qi (vuc, vuc);
>  CMPLE_U16QI vector_ngtuv16qi {}
>  
> -  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
> +  const vull __builtin_vsx_cmple_u2di (vull, vull);
>  CMPLE_U2DI vector_ngtuv2di {}
>  
> -  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
> +  const vui __builtin_vsx_cmple_u4si (vui, vui);
>  CMPLE_U4SI vector_ngtuv4si {}
>  
> -  const vss __builtin_vsx_cmple_u8hi (vss, vss);
> +  const vus __builtin_vsx_cmple_u8hi (vus, vus);
>  CMPLE_U8HI vector_ngtuv8hi {}
>  
>const vd __builtin_vsx_concat_2df (double, double);
> @@ -1769,7 +1769,7 @@
>const vf __builtin_vsx_xvcvuxdsp (vull);
>  XVCVUXDSP vsx_xvcvuxdsp {}
>  
> -  const vd __builtin_vsx_xvcvuxwdp (vsi);
> +  const vd __builtin_vsx_xvcvuxwdp (vui);
>  XVCVUXWDP vsx_xvcvuxwdp {}
>  
>const vf __builtin_vsx_xvcvuxwsp (vsi);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 2b8ba1949bf..4d8610f6aa8 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22522,6 +22522,29 @@ if the VSX instruction set is available.  The 
> @samp{vec_vsx_ld} and
>  @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
>  @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
>  
> +
> +@smallexample
> +vector signed char __builtin_vsx_cmple_16qi (vector signed char,
> + vector signed char);
> +vector signed short __builtin_vsx_cmple_8hi (vector signed short,
> + vector signed short);
> +vector signed int __builtin_vsx_cmple_4si (vector signed int,
> + vector signed int);
> +vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char,
> +vector unsigned char);
> +vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short,
> +vector unsigned short);
> +vector unsigned i

[PATCH 11/11] rs6000, make test vec-cmpne.c a runnable test

2024-02-20 Thread Carl Love
GCC maintainers:

The patch changes vec-cmpne.c from a compile-only test to a runnable test.
The macros to create the functions needed to test the built-ins and verify the
results are all there in the include file.  The .c file just needed to have
the macro definitions inserted and the header changed from compile to run.  The
test can now do functional verification of the results in addition to verifying
that the expected instructions are generated.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, make test vec-cmpne.c a runnable test

The macros in vec-cmpne.h define test functions.  They also set up
test value functions, verification functions and execute test functions.
The test is set up as a compile-only test, so none of the verification and
execute functions are being used.

The patch adds the macro definitions to create the initialization,
verify and execute functions to a main program so the test can not only
verify that the correct instructions are generated but also run the
tests and verify the results.  The test is then changed from a compile
to a run test.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmpne.c (main): Add main function with
macro calls to define the test functions, create the verify
functions and execute functions.
Update scan-assembler-times (vcmpequ): Updated count to include
instructions used to generate expected test results.
* gcc.target/powerpc/vec-cmpne.h (vector_tests_##NAME): Remove
line continuation after closing bracket.  Remove extra blank line.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 41 +++-
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h |  3 +-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c 
b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
index b57e0ac8638..2c369976a44 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
@@ -1,20 +1,41 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -O2" } */
+/* { dg-options "-maltivec -O2 -save-temps" } */
 
 /* Test that the vec_cmpne builtin generates the expected Altivec
instructions.  */
 
 #include "vec-cmpne.h"
 
-define_test_functions (int, signed int, signed int, si);
-define_test_functions (int, unsigned int, unsigned int, ui);
-define_test_functions (short, signed short, signed short, ss);
-define_test_functions (short, unsigned short, unsigned short, us);
-define_test_functions (char, signed char, signed char, sc);
-define_test_functions (char, unsigned char, unsigned char, uc);
-define_test_functions (int, signed int, float, ff);
+int main ()
+{
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+  define_test_functions (int, signed int, float, ff);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+  define_init_verify_functions (int, signed int, float, ff);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  execute_test_functions (int, signed int, float, ff);
+
+  return 0;
+}
 
 /* { dg-final { scan-assembler-times {\mvcmpequb\M}  2 } } */
 /* { dg-final { scan-assembler-times {\mvcmpequh\M}  2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequw\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequw\M}  32 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h 
b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
index a304de01d86..374cca360b3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
@@ -33,7 +33,7 @@ __attribute__((noinline)) void vector_tests_##NAME () \
   tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
 

[PATCH 10/11] rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the __builtin_vec_init* and __builtin_vec_set* 
built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

Add test cases for the following built-ins:

__builtin_vec_init_v1ti
__builtin_vec_init_v2df
__builtin_vec_init_v2di
__builtin_vec_set_v1ti
__builtin_vec_set_v2df
__builtin_vec_set_v2di

Note, the above built-ins are documented in extend.texi.
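
For readers unfamiliar with these built-ins, the following is a rough scalar
model in portable C of what the init and set operations are expected to do for
the two-element V2DI case.  The type v2di_model and the model_* helper names
are illustrative assumptions, not part of GCC; the real built-ins operate on
"vector signed long long" and need a VSX target.

```c
#include <assert.h>

/* Hypothetical stand-in for "vector signed long long" (two 64-bit lanes).  */
typedef struct { long long e[2]; } v2di_model;

/* Models __builtin_vec_init_v2di (a, b): lane 0 gets a, lane 1 gets b.  */
static v2di_model model_vec_init_v2di (long long a, long long b)
{
  v2di_model r = { { a, b } };
  return r;
}

/* Models __builtin_vec_set_v2di (v, x, idx): copy v, replace lane idx.  */
static v2di_model model_vec_set_v2di (v2di_model v, long long x, int idx)
{
  assert (idx >= 0 && idx < 2);
  v.e[idx] = x;
  return v;
}
```

The V2DF and V1TI variants would follow the same shape with double lanes or a
single 128-bit lane respectively.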

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-21.c: New test file.
---
 .../gcc.target/powerpc/vsx-builtin-21.c   | 181 ++
 1 file changed, 181 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
new file mode 100644
index 000..b7e1201f37e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
@@ -0,0 +1,181 @@
+/* { dg-do run { target int128 } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx" } */
+
+/* This test should run the same on any target that supports vsx
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#define DEBUG 0
+
+#include 
+
+#if DEBUG
+#include 
+#include 
+
+void print_i128 (__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+(signed long long)(val >> 64),
+(unsigned long long)(val & 0x),
+(unsigned long long)(val >> 64),
+(unsigned long long)(val & 0x));
+}
+#endif
+
+void abort (void);
+
+void test_vec_init_v1ti (__int128_t ti_arg,
+vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_init_v1ti (ti_arg);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_init_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_init_v2df (double d_arg1, double d_arg2,
+vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_init_v2df (d_arg1, d_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2df: v2df_result[%d] = %f, 
v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_init_v2di (signed long long sl_arg1, signed long long sl_arg2,
+vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_init_v2di (sl_arg1, sl_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2di: v2di_result[%d] = %lld, 
v2df_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, v2di_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v1ti (vector __int128_t v1ti_arg, __int128_t ti_arg,
+   vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_set_v1ti (v1ti_arg, ti_arg, 0);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_set_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_set_v2df (vector double v2df_arg, double d_arg,
+   vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_set_v2df (v2df_arg, d_arg, 0);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2df: v2df_result[%d] = %f, 
v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v2di (vector signed long long v2di_arg, signed long long 
sl_arg,
+   vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_set_v2di (v2di_arg, sl_arg, 1);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2di: v2di_result[%d] = %lld, 
v2df_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, 

[PATCH 09/11] rs6000, add test cases for the vec_cmpne built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the vec_cmpne built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, add test cases for the vec_cmpne built-ins

Add test cases for the signed int, unsigned int, signed short, unsigned
short, signed char and unsigned char built-ins.

Note, the built-ins are documented in the Power Vector Intrinsic
Programming Reference manual.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmple.c: New test case.
* gcc.target/powerpc/vec-cmple.h: New test case include file.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmple.c | 35 
 gcc/testsuite/gcc.target/powerpc/vec-cmple.h | 84 
 2 files changed, 119 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.h

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.c 
b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
new file mode 100644
index 000..766a1c770e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+/* Test that the vec_cmpne builtin generates the expected Altivec
+   instructions.  */
+
+#include "vec-cmple.h"
+
+int main ()
+{
+  /* Note macro expansions for "signed long long int" and
+ "unsigned long long int" do not work for the vec_vsx_ld builtin.  */
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.h 
b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
new file mode 100644
index 000..4126706b99a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
@@ -0,0 +1,84 @@
+#include "altivec.h"
+
+#define N 4096
+
+#include 
+void abort ();
+
+#define PRAGMA(X) _Pragma (#X)
+#define UNROLL0 PRAGMA (GCC unroll 0)
+
+#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME)  \
+\
+RTYPE result_le_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
+RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
+\
+__attribute__((noinline)) void vector_tests_##NAME () \
+{ \
+  vector STYPE v1_##NAME, v2_##NAME; \
+  vector bool VBTYPE tmp_##NAME; \
+  int i; \
+  UNROLL0 \
+  for (i = 0; i < N; i+=16/sizeof (STYPE)) \
+{ \
+  /* result_le = operand1!=operand2.  */ \
+  v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)_##NAME[i]); \
+  v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)_##NAME[i]); \
+\
+  tmp_##NAME = vec_cmple (v1_##NAME, v2_##NAME); \
+  vec_vsx_st (tmp_##NAME, 0, _le_##NAME[i]); \
+} \
+}
+
+#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME)   \
+__attribute__((noinline)) void init_##NAME () \
+{ \
+  int i; \
+  for (i = 0; i < N; ++i) \
+{ \
+  result_le_##NAME[i] = 7; \
+  if (i%3 == 0) \
+   { \
+ /* op1 < op2.  */ \
+ operand1_##NAME[i] = 1; \
+ operand2_##NAME[i] = 2; \
+   } \
+  else if (i%3 == 1) \
+   { \
+ /* op1 > op2.  */ \
+ operand1_##NAME[i] = 2; \
+ operand2_##NAME[i] = 1; \
+   } \
+  else if (i%3 == 2) \
+   { \
+ /* op1 == op2.  */ \
+ operand1_##NAME[i] = 3; \
+ operand2_##NAME[i] = 3; \
+   } \
+  /* For vector comparisons: "For each element of the result_le, the \
+ value of each bit is 1 if the corresponding elements of ARG1 and \
+ ARG2 are equal." {or whatever the comparison is} 

[PATCH 07/11] rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

2024-02-20 Thread Carl Love


 GCC maintainers:

The patch adds documentation and a test case for the __builtin_vsx_xvcmpeq[sp,
dp, sp_p] built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

Add a test case for the __builtin_vsx_xvcmpeqsp_p built-in.

Add documentation for the __builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, and __builtin_vsx_xvcmpeqsp builtins.
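
As an illustration of the element-wise behaviour the new documentation
describes (all ones for equal lanes, all zeros otherwise), here is a minimal
scalar sketch in portable C.  The name model_xvcmpeqsp and the use of uint32_t
bit masks for the result lanes are assumptions made for the sketch; the real
built-in returns a vector and requires a VSX target.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Element-wise equality as described for __builtin_vsx_xvcmpeqsp: each
   result lane is all ones when the inputs compare equal, else all zeros.  */
static void model_xvcmpeqsp (const float a[LANES], const float b[LANES],
                             uint32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] == b[i]) ? UINT32_MAX : 0u;
}
```

With the inputs used in the new test case, {1.0, 2.0, 3.0, 4.0} and
{1.0, 3.0, 2.0, 8.0}, only lane 0 of the model's result is all ones.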

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpeqsp): Add
documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-4.c: New test case.
---
 gcc/doc/extend.texi   |  23 +++
 .../powerpc/vsx-builtin-runnable-4.c  | 135 ++
 2 files changed, 158 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 22f67ebab31..87fd30bfa9e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22700,6 +22700,18 @@ vectors of their defined type.  The corresponding 
result element is set to
 all ones if the two argument elements are less than or equal and all zeros
 otherwise.
 
+@smallexample
+const vf __builtin_vsx_xvcmpeqsp (vf, vf);
+const vd __builtin_vsx_xvcmpeqdp (vd, vd);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_xvcmpeqsp} and
+@code{__builtin_vsx_xvcmpeqdp} compare two floating point vectors and return
+a vector.  If the corresponding elements are equal then the corresponding
+vector element of the result is set to all ones; otherwise it is set to all
+zeros.
+
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
@@ -23989,6 +24001,17 @@ is larger than 128 bits, the result is undefined.
 The result is the modulo result of dividing the first input  by the second
 input.
 
+@smallexample
+const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
+@end smallexample
+
+The first argument of the built-in @code{__builtin_vsx_xvcmpeqdp_p} is an
+integer in the range of 0 to 1.  The second and third arguments are floating
+point vectors to be compared.  The result is 1 if the first argument is a 1
+and one or more of the corresponding vector elements are equal.  The result is
+1 if the first argument is 0 and all of the corresponding vector elements are
+not equal.  The result is zero otherwise.
+
 The following builtins perform 128-bit vector comparisons.  The
 @code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
 one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
new file mode 100644
index 000..8ac07c7c807
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
@@ -0,0 +1,135 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_ok } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  int result;
+  vector float vf_arg1, vf_arg2;
+  vector double d_arg1, d_arg2;
+
+  /* Compare vectors with one equal element, check
+ for all elements unequal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (1, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 1)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 1: arg 1 = 1, varg3[%d] = %f, 
varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+  /* Compare vectors with one equal element, check
+ for all elements unequal, i.e. first arg is 0.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (0, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 0)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 2: arg 1 = 0, varg3[%d] = %f, 
varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+
+  /* Compare vectors with all unequal elements, check
+ for all elements unequal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {8.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p 

[PATCH 03/11] rs6000, remove duplicated built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

There are a number of undocumented built-ins that are duplicates of other 
documented built-ins.  This patch removes the duplicates so users will only use 
the documented built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, remove duplicated built-ins

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin that are no longer needed are also removed.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Removed built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
cases for removed built-ins.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 --
 gcc/config/rs6000/rs6000-builtins.def | 42 ---
 .../gcc.target/powerpc/vsx-builtin-3.c|  6 ---
 3 files changed, 52 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 6698274031b..e436cbe4935 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2110,20 +2110,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index fd316f629e5..96d095da2cb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1925,18 +1925,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 

[PATCH 08/11] rs6000, add tests and documentation for various built-ins

2024-02-20 Thread Carl Love
 
 GCC maintainers:

The patch adds documentation and a test case for a number of built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

 rs6000, add tests and documentation for various built-ins

This patch adds a test case and documentation in extend.texi for the
following built-ins:

__builtin_altivec_fix_sfsi
__builtin_altivec_fixuns_sfsi
__builtin_altivec_float_sisf
__builtin_altivec_uns_float_sisf
__builtin_altivec_vrsqrtfp
__builtin_altivec_mask_for_load
__builtin_altivec_vsel_1ti
__builtin_altivec_vsel_1ti_uns
__builtin_vec_init_v16qi
__builtin_vec_init_v4sf
__builtin_vec_init_v4si
__builtin_vec_init_v8hi
__builtin_vec_set_v16qi
__builtin_vec_set_v4sf
__builtin_vec_set_v4si
__builtin_vec_set_v8hi
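
Two of the behaviours documented by this patch lend themselves to a short
scalar sketch in portable C: the bitwise select used by the vsel_1ti
built-ins, and the byte sequence that the mask_for_load text describes.  The
model_* names are hypothetical, the select is shown on a 64-bit value rather
than a full 128-bit lane, and the mask follows the wording of the new
documentation only; none of this has been checked against hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select as documented for __builtin_altivec_vsel_1ti:
   result = (src1 & ~mask) | (src2 & mask), shown on one 64-bit half.  */
static uint64_t model_vsel64 (uint64_t src1, uint64_t src2, uint64_t mask)
{
  return (src1 & ~mask) | (src2 & mask);
}

/* Permute-control bytes as worded for __builtin_altivec_mask_for_load:
   bytes sh .. sh+15 of the sequence 0x00 .. 0x1F, with sh taken from the
   low four bits of the address.  */
static void model_mask_for_load (uintptr_t addr, uint8_t out[16])
{
  unsigned sh = (unsigned) (addr & 0xF);

  for (unsigned i = 0; i < 16; i++)
    out[i] = (uint8_t) (sh + i);
}
```

In the select model, bits set in the mask pick from src2 and clear bits pick
from src1, so a mask of 0x00FF applied to 0xAAAA and 0x5555 gives 0xAA55.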

gcc/ChangeLog:
* doc/extend.texi (__builtin_altivec_fix_sfsi,
__builtin_altivec_fixuns_sfsi, __builtin_altivec_float_sisf,
__builtin_altivec_uns_float_sisf, __builtin_altivec_vrsqrtfp,
__builtin_altivec_mask_for_load, __builtin_altivec_vsel_1ti,
__builtin_altivec_vsel_1ti_uns, __builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_set_v16qi,
__builtin_vec_set_v4sf, __builtin_vec_set_v4si,
__builtin_vec_set_v8hi): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-38.c: New test case.
---
 gcc/doc/extend.texi   |  98 
 gcc/testsuite/gcc.target/powerpc/altivec-38.c | 503 ++
 2 files changed, 601 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-38.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 87fd30bfa9e..89d0a1f77b0 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22678,6 +22678,104 @@ if the VSX instruction set is available.  The 
@samp{vec_vsx_ld} and
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
 
+@smallexample
+vector signed int __builtin_altivec_fix_sfsi (vector float);
+vector signed int __builtin_altivec_fixuns_sfsi (vector float);
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector int);
+vector float __builtin_altivec_vrsqrtfp (vector float);
+@end smallexample
+
+The @code{__builtin_altivec_fix_sfsi} converts a vector of single precision
+floating point values to a vector of signed integers with round to zero.
+
+The @code{__builtin_altivec_fixuns_sfsi} converts a vector of single precision
+floating point values to a vector of unsigned integers with round to zero.  If
+the rounded floating point value is less than 0 the result is 0 and VXCVI
+is set to 1.
+
+The @code{__builtin_altivec_float_sisf} converts a vector of single precision
+signed integers to a vector of floating point values using the rounding mode
+specified by RN.
+
+The @code{__builtin_altivec_uns_float_sisf} converts a vector of single
+precision unsigned integers to a vector of floating point values using the
+rounding mode specified by RN.
+
+The @code{__builtin_altivec_vrsqrtfp} returns a vector of floating point
+estimates of the reciprocal square root of each floating point source vector
+element.
+
+@smallexample
+vector signed char __builtin_altivec_mask_for_load (const void *);
+@end smallexample
+
+The @code{__builtin_altivec_mask_for_load} returns a vector mask based on the
+bottom four bits of the argument.  Let X be the 32-byte value:
+0x00 || 0x01 || 0x02 || ... || 0x1D || 0x1E || 0x1F.
+Bytes sh to sh+15 are returned where sh is given by the least significant 4
+bits of the argument.  See description of lvsl, lvsr instructions.
+
+@smallexample
+vector signed __int128 __builtin_altivec_vsel_1ti (vector signed __int128,
+   vector signed __int128,
+   vector unsigned __int128);
+vector unsigned __int128
+  __builtin_altivec_vsel_1ti_uns (vector unsigned __int128,
+  vector unsigned __int128,
+  vector unsigned __int128)
+@end smallexample
+
+Let the arguments of @code{__builtin_altivec_vsel_1ti} and
+@code{__builtin_altivec_vsel_1ti_uns} be src1, src2, mask.  The result is
+given by (src1 & ~mask) | (src2 & mask).
+
+@smallexample
+vector signed char
+__builtin_vec_init_v16qi (signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char);
+
+vector short int __builtin_vec_init_v8hi (short int, short int, short int,
+  short int, short int, short int,
+  short int, short int);

[PATCH 04/11] rs6000, Update comment for the __builtin_vsx_vper* built-ins.

2024-02-20 Thread Carl Love
GCC maintainers:

The patch expands an existing comment to document that the duplicates are 
covered by an overloaded built-in.  I am wondering if we should just go ahead 
and remove the duplicates?

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, Update comment for the __builtin_vsx_vper* built-ins.

There is a comment about the __builtin_vsx_vper* built-ins being
duplicates of the __builtin_altivec_* built-ins.  The note says we
should consider deprecation/removal of the __builtin_vsx_vper*.  Add a
note that the __builtin_vsx_vper* built-ins are covered by the overloaded
vec_perm built-ins which use the __builtin_altivec_* built-in definitions.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def ( __builtin_vsx_vperm_*):
Add comment to existing comment about the built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 96d095da2cb..4c95429f137 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1556,6 +1556,14 @@
 ; These are duplicates of __builtin_altivec_* counterparts, and are being
 ; kept for backwards compatibility.  The reason for their existence is
 ; unclear.  TODO: Consider deprecation/removal at some point.
+; Note, __builtin_vsx_vperm_16qi, __builtin_vsx_vperm_16qi_uns,
+; __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_v1ti_uns,
+; __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, __builtin_vsx_vperm_2di,
+; __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
+; __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
+; __builtin_vsx_vperm_8hi, __builtin_altivec_vperm_8hi_uns
+; are all covered by the overloaded vec_perm built-in which uses the
+; __builtin_altivec_* built-in definitions.
   const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
 VPERM_16QI_X altivec_vperm_v16qi {}
 
-- 
2.43.0



[PATCH 06/11] rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test case for the __builtin_vsx_xxpermdi_1ti 
built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case

Add documentation to the extend.texi file for the
__builtin_vsx_xxpermdi_1ti built-in.

Add test cases for the __builtin_vsx_xxpermdi_1ti built-in.
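
The doubleword selection that the new documentation text describes can be
sketched in portable C as below.  This is only a transcription of the wording
added to extend.texi by this patch, using a hypothetical two-field struct in
place of a 128-bit vector; it has not been verified against the xxpermdi
instruction definition or against either endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into bits [127:64] and [63:0].  */
typedef struct { uint64_t hi, lo; } u128_model;

/* Follows the documentation wording in this patch:
   result[127:64] = sel[1] ? srcB[63:0] : srcB[127:64]
   result[63:0]   = sel[0] ? srcA[63:0] : srcA[127:64]  */
static u128_model model_xxpermdi_1ti (u128_model src_a, u128_model src_b,
                                      int sel)
{
  u128_model r;

  assert (sel >= 0 && sel <= 3);
  r.hi = (sel & 2) ? src_b.lo : src_b.hi;
  r.lo = (sel & 1) ? src_a.lo : src_a.hi;
  return r;
}
```

Under this reading, the selector value 2 used by the new test case takes the
low doubleword of the second argument and the high doubleword of the first.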

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xxpermdi_1ti): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-3.c: New test case.
---
 gcc/doc/extend.texi   |  7 +++
 .../powerpc/vsx-builtin-runnable-3.c  | 48 +++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 83eed9e334b..22f67ebab31 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21508,6 +21508,13 @@ vector __int128  __builtin_vsx_xxpermdi_1ti (vector 
__int128, vector __int128,
 const int);
 
 @end smallexample
+
+For @code{__builtin_vsx_xxpermdi_1ti}, let srcA[127:0] be the 128-bit first
+argument and srcB[127:0] be the 128-bit second argument.  Let sel[1:0] be the
+least significant bits of the const int argument (third input argument).  The
+result bits [127:64] are srcB[127:64] if sel[1] = 0, srcB[63:0] otherwise.  The
+result bits [63:0] are srcA[127:64] if sel[0] = 0, srcA[63:0] otherwise.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
new file mode 100644
index 000..ba287597cec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed __int128 vsq_arg1, vsq_arg2, vsq_result, vsq_expected_result;
+
+  vsq_arg1[0] = (__int128) 0x;
+  vsq_arg1[0] = vsq_arg1[0] << 64 | (__int128) 0x;
+  vsq_arg2[0] = (__int128) 0x1100110011001100;
+  vsq_arg2[0] = (vsq_arg2[0]  << 64) | (__int128) 0x;
+
+  vsq_expected_result[0] = (__int128) 0x;
+  vsq_expected_result[0] = (vsq_expected_result[0] << 64)
+| (__int128) 0x;
+
+  vsq_result = __builtin_vsx_xxpermdi_1ti (vsq_arg1, vsq_arg2, 2);
+
+  if (vsq_result[0] != vsq_expected_result[0])
+{
+#if DEBUG
+   printf("ERROR, __builtin_vsx_xxpermdi_1ti: vsq_result = 0x%016llx 
%016llx\n",
+ (unsigned long long) (vsq_result[0] >> 64),
+ (unsigned long long) vsq_result[0]);
+   printf(" vsq_expected_resultd = 0x%016llx 
%016llx\n",
+ (unsigned long long)(vsq_expected_result[0] >> 64),
+ (unsigned long long) vsq_expected_result[0]);
+#else
+  abort();
+#endif
+ }
+
+  return 0;
+}
-- 
2.43.0



[PATCH 05/11] rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test cases for the __builtin_vsx_xvnegsp, 
__builtin_vsx_xvnegdp built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases

Add documentation to the extend.texi file for the two built-ins
__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp.

Add test cases for the two built-ins.

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp):
Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-2.c: New test case.
---
 gcc/doc/extend.texi   | 13 +
 .../powerpc/vsx-builtin-runnable-2.c  | 51 +++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 583b1d890bf..83eed9e334b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21495,6 +21495,19 @@ The @code{__builtin_vsx_xvcvuxwdp} converts single 
precision unsigned integer
 value to a double precision floating point value.  Input element at index 2*i
 is stored in the destination element i.
 
+@smallexample
+vector float __builtin_vsx_xvnegsp (vector float);
+vector double __builtin_vsx_xvnegdp (vector double);
+@end smallexample
+
+The @code{__builtin_vsx_xvnegsp} and @code{__builtin_vsx_xvnegdp} negate each
+vector element.
+
+@smallexample
+vector __int128  __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128,
+const int);
+
+@end smallexample
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
new file mode 100644
index 000..7906a8e01d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  vector double vd_arg1, vd_result, vd_expected_result;
+  vector float vf_arg1, vf_result, vf_expected_result;
+
+  /* VSX Vector Negate Single-Precision.  */
+
+  vf_arg1 = (vector float) {-1.0, 12345.98, -2.1234, 238.9};
+  vf_result = __builtin_vsx_xvnegsp (vf_arg1);
+  vf_expected_result = (vector float) {1.0, -12345.98, 2.1234, -238.9};
+
+  for (i = 0; i < 4; i++)
+if (vf_result[i] != vf_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegsp: vf_result[%d] = %f, 
vf_expected_result[%d] = %f\n",
+i, vf_result[i], i, vf_expected_result[i]);
+#else
+  abort();
+#endif
+
+  /* VSX Vector Negate Double-Precision.  */
+
+  vd_arg1 = (vector double) {12345.98, -2.1234};
+  vd_result = __builtin_vsx_xvnegdp (vd_arg1);
+  vd_expected_result = (vector double) {-12345.98, 2.1234};
+
+  for (i = 0; i < 2; i++)
+if (vd_result[i] != vd_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegdp: vd_result[%d] = %f, 
vd_expected_result[%d] = %f\n",
+i, vd_result[i], i, vd_expected_result[i]);
+#else
+  abort();
+#endif
+
+  return 0;
+}
-- 
2.43.0



[PATCH 02/11] rs6000, fix arguments, add documentation for vector element conversions

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the return type for the __builtin_vsx_xvcvdpuxws and
__builtin_vsx_xvcvspuxds built-ins.  They were defined as signed but should 
have been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, fix arguments, add documentation for vector element conversions

The return type for the __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvspuxws built-ins should be unsigned.  This patch changes
the return values from signed to unsigned.
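
To illustrate why the signedness of the return type matters, here is a
minimal scalar sketch in portable C.  It is not the built-in itself:
model_cvt_sp_to_uxw and model_view_as_signed are hypothetical helpers that
model one element of a float to unsigned word conversion with truncation and
saturation, and the NaN result of 0 is an assumption rather than something
stated in this patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one element of a float -> unsigned word
   conversion: truncate toward zero and saturate to [0, 2^32 - 1].
   Assumption: NaN converts to 0.  */
static uint32_t
model_cvt_sp_to_uxw (float x)
{
  if (isnan (x) || x <= 0.0f)
    return 0;
  if (x >= 4294967296.0f)	/* 2^32 and above saturates.  */
    return UINT32_MAX;
  return (uint32_t) x;		/* In range: the cast truncates.  */
}

/* Reinterpret the same 32 bits as a signed word, which is what a caller
   effectively sees when the result is declared as signed.  */
static int32_t
model_view_as_signed (uint32_t bits)
{
  int32_t s;
  memcpy (&s, &bits, sizeof s);
  return s;
}

/* Small self-check showing the difference between the two views.  */
static void
model_cvt_check (void)
{
  uint32_t u = model_cvt_sp_to_uxw (3000000000.0f);
  assert (u == 3000000000u);
  assert (model_view_as_signed (u) < 0);
}
```

With the result typed as signed, any converted value of 2^31 or more would
read back as negative, which is the kind of surprise the unsigned return
type avoids.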

The documentation for the vector element conversion built-ins:

__builtin_vsx_xvcvspsxws
__builtin_vsx_xvcvspsxds
__builtin_vsx_xvcvspuxds
__builtin_vsx_xvcvdpsxws
__builtin_vsx_xvcvdpuxws
__builtin_vsx_xvcvdpuxds_uns
__builtin_vsx_xvcvspdp
__builtin_vsx_xvcvdpsp
__builtin_vsx_xvcvspuxws
__builtin_vsx_xvcvsxwdp
__builtin_vsx_xvcvuxddp_uns
__builtin_vsx_xvcvuxwdp

is missing from extend.texi.  This patch adds the missing documentation.

This patch also adds runnable test cases for each of the built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvspuxws): Change
return type from signed to unsigned.
* doc/extend.texi (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvspuxws,
__builtin_vsx_xvcvsxwdp, __builtin_vsx_xvcvuxddp_uns,
__builtin_vsx_xvcvuxwdp): Add documentation for builtins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-1.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   | 135 ++
 .../powerpc/vsx-builtin-runnable-1.c  | 233 ++
 3 files changed, 371 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-1.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index d66a53a0fab..fd316f629e5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1724,7 +1724,7 @@
   const vull __builtin_vsx_xvcvdpuxds_uns (vd);
 XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
+  const vui __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
   const vd __builtin_vsx_xvcvspdp (vf);
@@ -1736,10 +1736,10 @@
   const vsi __builtin_vsx_xvcvspsxws (vf);
 XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
+  const vull __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
+  const vui __builtin_vsx_xvcvspuxws (vf);
 XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4d8610f6aa8..583b1d890bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21360,6 +21360,141 @@ __float128 __builtin_sqrtf128 (__float128);
 __float128 __builtin_fmaf128 (__float128, __float128, __float128);
 @end smallexample
 
+@smallexample
+vector int __builtin_vsx_xvcvspsxws (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxws} converts the single precision floating
+point vector element i to a signed word integer value using round to zero,
+storing the result in element i.  If the source element is NaN the result
+is set to 0x80000000 and VXCVI is set to 1.  If the source element is SNaN
+then VXSNAN is also set to 1.  If the rounded value is greater than
+2^31 - 1 the result is 0x7FFFFFFF and VXCVI is set to 1.  If the
+rounded value is less than -2^31, the result is set to 0x80000000 and
+VXCVI is set to 1.  If the rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector signed long long int __builtin_vsx_xvcvspsxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxds} converts the single precision floating
+point vector element to a signed doubleword integer value using the
+round to zero rounding mode.  If the source element is NaN the result
+is set to 0x8000000000000000 and VXCVI is set to 1.  If the source element is
+SNaN then VXSNAN is also set to 1.  If the rounded value is greater than
+2^63 - 1 the result is 0x7FFFFFFFFFFFFFFF and VXCVI is set to 1.  If the
+rounded value is less than -2^63, the result is set to 0x8000000000000000 and
+VXCVI is set to 1.  If the rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector unsigned long long __builtin_vsx_xvcvspuxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspuxds} 

[PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the arguments and return type for the various 
__builtin_vsx_cmple* built-ins.  They were defined as signed but should have 
been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  This patch changes
the arguments and return type from signed to unsigned.
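
As a rough illustration of why the operand types matter for these
compares, the following portable C sketch models a single byte element.
The helper names are hypothetical and this is a scalar model, not the
vector built-ins: each helper returns all ones when the first element is
less than or equal to the second, and all zeros otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one element of a signed byte compare.  */
static uint8_t
model_cmple_s8 (int8_t a, int8_t b)
{
  return a <= b ? 0xFF : 0x00;
}

/* Hypothetical model of one element of an unsigned byte compare.  */
static uint8_t
model_cmple_u8 (uint8_t a, uint8_t b)
{
  return a <= b ? 0xFF : 0x00;
}

/* The bit pattern 0xFF is -1 when signed and 255 when unsigned, so the
   two compares disagree on it.  */
static void
model_cmple_check (void)
{
  assert (model_cmple_s8 (-1, 1) == 0xFF);	/* -1 <= 1 holds.  */
  assert (model_cmple_u8 (0xFF, 1) == 0x00);	/* 255 <= 1 does not.  */
}
```

The same bit pattern gives opposite answers depending on the declared
type, so an unsigned compare declared with signed arguments is at best
misleading to the caller.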

The documentation for the signed and unsigned versions of
__builtin_vsx_cmple is missing from extend.texi.  This patch adds the
missing documentation.

Test cases are added for each of the signed and unsigned built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Change arguments and return from signed
to unsigned.
* doc/extend.texi (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_u4si): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-cmple.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def|  10 +-
 gcc/doc/extend.texi  |  23 
 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++
 3 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..d66a53a0fab 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1349,16 +1349,16 @@
   const vss __builtin_vsx_cmple_8hi (vss, vss);
 CMPLE_8HI vector_ngtv8hi {}
 
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
+  const vuc __builtin_vsx_cmple_u16qi (vuc, vuc);
 CMPLE_U16QI vector_ngtuv16qi {}
 
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
+  const vull __builtin_vsx_cmple_u2di (vull, vull);
 CMPLE_U2DI vector_ngtuv2di {}
 
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
+  const vui __builtin_vsx_cmple_u4si (vui, vui);
 CMPLE_U4SI vector_ngtuv4si {}
 
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
+  const vus __builtin_vsx_cmple_u8hi (vus, vus);
 CMPLE_U8HI vector_ngtuv8hi {}
 
   const vd __builtin_vsx_concat_2df (double, double);
@@ -1769,7 +1769,7 @@
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-  const vd __builtin_vsx_xvcvuxwdp (vsi);
+  const vd __builtin_vsx_xvcvuxwdp (vui);
 XVCVUXWDP vsx_xvcvuxwdp {}
 
   const vf __builtin_vsx_xvcvuxwsp (vsi);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2b8ba1949bf..4d8610f6aa8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22522,6 +22522,29 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
 @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+
+@smallexample
+vector signed char __builtin_vsx_cmple_16qi (vector signed char,
+ vector signed char);
+vector signed short __builtin_vsx_cmple_8hi (vector signed short,
+ vector signed short);
+vector signed int __builtin_vsx_cmple_4si (vector signed int,
+ vector signed int);
+vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char,
+vector unsigned char);
+vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short,
+vector unsigned short);
+vector unsigned int __builtin_vsx_cmple_u4si (vector unsigned int,
+  vector unsigned int);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_cmple_16qi}, @code{__builtin_vsx_cmple_8hi},
+@code{__builtin_vsx_cmple_4si}, @code{__builtin_vsx_cmple_u16qi},
+@code{__builtin_vsx_cmple_u8hi} and @code{__builtin_vsx_cmple_u4si} compare
+vectors of their defined type.  The corresponding result element is set to
+all ones if the first argument element is less than or equal to the second
+argument element, and to all zeros otherwise.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
new file mode 100644
index 000..081817b4ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
@@ -0,0 +1,127 @@
+/* { 

rs6000, built-in cleanup patch series

2024-02-20 Thread Carl Love
GCC maintainers:

The following series of patches clean up some of the rs6000 built-in support.  
Some of the first patches fix errors in the definition of a few of the 
built-ins.  The built-ins are supposed to have unsigned arguments but are 
listed as signed.  Some of the built-ins are supposed to return unsigned values 
but were defined to return a signed value.

There are a number of built-ins that are not documented but are duplicates of 
other documented built-ins.  The duplicate definitions are removed so users 
will only use the supported documented built-ins.

There are a number of the built-ins that are not documented in either the Power 
Vector Intrinsic Reference manual or in the gcc/doc/extend.texi file.  The 
patch adds the missing documentation as needed.  

Also most of the built-ins do not have test cases.  The patch adds test cases 
for the various built-ins.

Carl 


Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
Segher:

On Tue, 2023-10-31 at 11:17 -0500, Segher Boessenkool wrote:


> 
> You could use gcov to see which rs6000 builtins are not exercised by
> anything in the testsuite, maybe.  This probably can be automated
> pretty
> nicely.

I will take a look at gcov.  I just wrote some relatively simple scripts
to go look for test cases.  For the non-overloaded built-ins, the
scripts had to exclude built-ins referenced by the overloaded built-ins.

This patch is just the first of a series of patches that I am working
on to try and clean up the built-in stuff per some comments in a PR. 
The internal LTC issue is
 
https://github.ibm.com/ltc-toolchain/power-gcc/issues/1288

The goal is to make sure there are test cases and documentation for all
of the overloaded and non-overloaded built-in definitions.  Just a low
priority project to fill any spare cycles.  :-)

  Carl 




Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
On Tue, 2023-10-31 at 10:34 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/10/31 08:08, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch adds tests for two of the rs6000 overloaded built-
> > ins that do not have tests.  Additionally the GCC documentation file
> 
> I just found that actually they have the test coverage, because we
> have
> 
> #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
> #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
> #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
> #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
> #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
> 
> in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> these

OK, my simple scripts are not going to pick up the stuff in altivec.h.
They were just grepping for the built-in name in the test file
directory.

> __builtin_bcdcmp* ...
> 
> > doc/extend.texi is updated to include the built-in definitions as
> > they
> > were missing.
> 
> ... since we already document __builtin_vec_bcdsub_{eq,gt,lt}, I
> think
> it's still good to supplement the documentation and add the explicit
> testing cases.
> 
> > The patch has been tested on a Power 10 system with no
> > regressions. 
> > Please let me know if this patch is acceptable for mainline.
> > 
> >  Carl
> > 
> > ---
> > rs6000, Add missing overloaded bcd builtin tests
> > 
> > The two BCD overloaded built-ins __builtin_bcdsub_ge and
> > __builtin_bcdsub_le
> > do not have a corresponding test.  Add tests to existing test file
> > and update
> > the documentation with the built-in definitions.
> 
> As above, this commit log doesn't describe the actuality well, please
> update
> it with something like:
> 
> Currently we have the documentation for
> __builtin_vec_bcdsub_{eq,gt,lt} but
> not for __builtin_bcdsub_[gl]e, this patch is to supplement the
> descriptions
> for them.  Although they are mainly for __builtin_bcdcmp{ge,le}, we
> already
> have some testing coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this
> patch
> adds the corresponding explicit test cases as well.
> 

OK, replaced the commit log with the suggestion.

> > gcc/ChangeLog:
> > * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge):
> > Add
> > documentation for the built-ins.
> > 
> > gcc/testsuite/ChangeLog:
> > * bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins
> > __builtin_bcdsub_ge and __builtin_bcdsub_le).
> 
> 1) Unexpected ")" at the end.
> 
> 2) I supposed git gcc-verify would complain on this changelog entry.
> 
> Should be starting with:
> 
>   * gcc.target/powerpc/bcd-3.c (
> 
> , no?
> 

Yes, I meant to run the commit check but obviously got distracted and
didn't.  Sorry about that.

> OK for trunk with the above comments addressed, thanks!
> 
OK, thanks.

Carl 

> BR,
> Kewen
> 
> > ---
> >  gcc/doc/extend.texi  |  4 
> >  gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22
> > +-
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index cf0d0c63cce..fa7402813e7 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned
> > char, vector unsigned char, const int);
> >  vector __int128 __builtin_bcdsub (vector __int128, vector
> > __int128, const int);
> >  vector unsigned char __builtin_bcdsub (vector unsigned char,
> > vector unsigned char,
> > const int);
> > +int __builtin_bcdsub_le (vector __int128, vector __int128, const
> > int);
> > +int __builtin_bcdsub_le (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_lt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_lt (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_eq (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_eq (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_gt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_gt (vector unsigned char, vector unsigned
> > char, const int);
> > +int __builtin_bcdsub_ge (vector __int128, ve

[PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-30 Thread Carl Love
GCC maintainers:

The following patch adds tests for two of the rs6000 overloaded built-
ins that do not have tests.  Additionally the GCC documentation file
doc/extend.texi is updated to include the built-in definitions as they
were missing.

The patch has been tested on a Power 10 system with no regressions. 
Please let me know if this patch is acceptable for mainline.

 Carl

---
rs6000, Add missing overloaded bcd builtin tests

The two BCD overloaded built-ins __builtin_bcdsub_ge and __builtin_bcdsub_le
do not have a corresponding test.  Add tests to existing test file and update
the documentation with the built-in definitions.
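
For readers unfamiliar with the comparison forms, the sketch below models
the intent of the _ge and _le variants in portable C: subtract b from a and
test the sign of the result.  Everything here is an illustrative
assumption.  The helpers are hypothetical, the operands are 15-digit signed
packed BCD values held in 64 bits rather than the 128-bit vectors the real
built-ins take, and invalid digits and overflow are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a signed packed BCD value: 15 digits in the upper nibbles and a
   sign code in the lowest nibble (0xB and 0xD negative, others positive).
   Simplified 64-bit model; digits are assumed to be valid.  */
static int64_t
model_bcd_decode (uint64_t packed)
{
  uint64_t sign = packed & 0xF;
  int64_t value = 0;
  for (int shift = 60; shift >= 4; shift -= 4)
    value = value * 10 + (int64_t) ((packed >> shift) & 0xF);
  return (sign == 0xB || sign == 0xD) ? -value : value;
}

/* True when a - b is greater than or equal to zero.  */
static bool
model_bcdsub_ge (uint64_t a, uint64_t b)
{
  return model_bcd_decode (a) - model_bcd_decode (b) >= 0;
}

/* True when a - b is less than or equal to zero.  */
static bool
model_bcdsub_le (uint64_t a, uint64_t b)
{
  return model_bcd_decode (a) - model_bcd_decode (b) <= 0;
}

/* 0x123C is +123, 0x45C is +45, 0x5D is -5.  */
static void
model_bcdsub_check (void)
{
  assert (model_bcdsub_ge (0x123C, 0x45C));
  assert (!model_bcdsub_le (0x123C, 0x45C));
  assert (model_bcdsub_le (0x5D, 0x45C));
}
```

In this model _ge is simply "_gt or _eq" and _le is "_lt or _eq", which is
why equal operands satisfy both.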

gcc/ChangeLog:
* doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
documentation for the built-ins.

gcc/testsuite/ChangeLog:
* bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins
__builtin_bcdsub_ge and __builtin_bcdsub_le).
---
 gcc/doc/extend.texi  |  4 
 gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cf0d0c63cce..fa7402813e7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
 vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
 vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
                                        const int);
+int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
 @end smallexample
diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
index 7948a0c95e2..9891f4ff08e 100644
--- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 /* { dg-final { scan-assembler-times "bcdadd\[.\] " 4 } } */
-/* { dg-final { scan-assembler-times "bcdsub\[.\] " 4 } } */
+/* { dg-final { scan-assembler-times "bcdsub\[.\] " 6 } } */
 /* { dg-final { scan-assembler-not   "bl __builtin"   } } */
 /* { dg-final { scan-assembler-not   "mtvsr" } } */
 /* { dg-final { scan-assembler-not   "mfvsr" } } */
@@ -93,6 +93,26 @@ do_sub_gt (vector_128_t a, vector_128_t b, int *p)
   return ret;
 }
 
+vector_128_t
+do_sub_ge (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_ge (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
+vector_128_t
+do_sub_le (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_le (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
 vector_128_t
 do_sub_ov (vector_128_t a, vector_128_t b, int *p)
 {
-- 
2.37.2




Re: [PATCH ver 4] rs6000, add overloaded DFP quantize support

2023-08-29 Thread Carl Love via Gcc-patches
Kewen:

On Tue, 2023-08-29 at 16:54 +0800, Kewen.Lin wrote:
> >   The following functions require @option{-mhard-float},
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > b/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > new file mode 100644
> > index 000..f9c388585d7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > @@ -0,0 +1,200 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target  dfp_hw} */
> > +/* { dg-require-effective-target  has_arch_pwr6} */
> 
> Sorry, I didn't catch this in the previous reviews.
> "dfp_hw" and "has_arch_pwr6" don't have the expected
> space after, without the space, the checkings would
> be useless and this case can fail.  So they should be:
> 
> /* { dg-require-effective-target dfp_hw } */
> /* { dg-require-effective-target has_arch_pwr6 } */
> 
> Okay for trunk with this fixed, thanks!

OK, I take it the parsing of the lines by the test scripts will fail
without the space since it can't parse it correctly.  Thanks for
letting me know.  Here is the fixed up code.  Note, I added the space
before the "}" and removed the extra space before dfp_hw and
has_arch_pwr6. 

get/powerpc/pr93448.c   
new file mode 100644
index 000..6b800f8d63d  
--- /dev/null   
+++ b/gcc/testsuite/gcc.target/powerpc/pr93448.c
@@ -0,0 +1,200 @@   
+/* { dg-do run } */
+/* { dg-require-effective-target dfp_hw } */   
+/* { dg-require-effective-target has_arch_pwr6 } */
+/* { dg-options "-mhard-float -O2 -save-temps" } */
+   
+/* Test the decimal floating point quantize built-ins.  */ 

I will go ahead and commit the patch.  Thanks for all your help.

  Carl 



[PATCH ver 4] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches


GCC maintainers:

Version 4, additional define_insn name fix.  Change Log fix for the
UNSPEC_DQUAN.  Retested patch on Power 10 LE.

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

     Carl Love



rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.
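
To make the RM and TE arguments more concrete, here is a small portable C
model of what quantizing to a target exponent means.  It is only a sketch
under stated assumptions: the struct and helper names are hypothetical, a
decimal value is held as an integer coefficient times a power of ten, and
the mapping of RM values (0 = nearest, ties to even; 1 = toward zero;
2 = nearest, ties away from zero; 3 = use the current rounding mode, not
modelled) is my reading of the instruction encoding rather than something
stated in this patch.  The range checks follow the 2-bit and signed 5-bit
operand predicates used in dfp.md.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decimal model: value = coeff * 10^exp.  */
struct model_dec { int64_t coeff; int exp; };

/* RM is a 2-bit constant; TE is a signed 5-bit constant.  */
static bool model_rm_in_range (int rm) { return rm >= 0 && rm <= 3; }
static bool model_te_in_range (int te) { return te >= -16 && te <= 15; }

/* Rescale X so that its exponent becomes TARGET_EXP, rounding per RM.
   Assumed RM meanings: 0 nearest-even, 1 toward zero, 2 nearest-away.
   RM 3 (current rounding mode) is not modelled.  Overflow is ignored.  */
static struct model_dec
model_quantize (struct model_dec x, int target_exp, int rm)
{
  assert (model_rm_in_range (rm) && rm != 3);
  struct model_dec r = { x.coeff, target_exp };
  int diff = target_exp - x.exp;

  /* Finer target exponent: scale the coefficient up, which is exact.  */
  for (; diff < 0; diff++)
    r.coeff *= 10;

  /* Coarser target exponent: drop digits and round.  */
  if (diff > 0)
    {
      int64_t div = 1;
      for (int i = 0; i < diff; i++)
	div *= 10;
      int64_t q = x.coeff / div;		/* Truncates toward zero.  */
      int64_t rem = x.coeff % div;
      int64_t twice = (rem < 0 ? -rem : rem) * 2;
      bool up = false;
      if (rm == 0)
	up = twice > div || (twice == div && q % 2 != 0);
      else if (rm == 2)
	up = twice >= div;
      if (up)
	q += x.coeff < 0 ? -1 : 1;
      r.coeff = q;
    }
  return r;
}
```

For example, with this model 123.45 (coefficient 12345, exponent -2)
quantized to exponent -1 gives coefficient 1234 under RM 0 or RM 1 and 1235
under RM 2, while quantizing to exponent -3 gives 123450 exactly.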

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_, dfp_dquai_): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def  |  15 ++
 gcc/config/rs6000/rs6000-overload.def  |  10 ++
 gcc/doc/extend.texi|  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..bf4a227b0eb 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dquai_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins

Re: [PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches
On Mon, 2023-08-28 at 10:21 +0800, Kewen.Lin wrote:
> Hi Carl,



> > 
> > A testcase is added for the new built-in definitions.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPEC_DQUAN.
> 
> Nit: (UNSPEC_DQUAN): New unspec.

Fixed.

> 



> > +(define_insn "dfp_dqua_"
> > +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
> > +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
> > + (match_operand:DDTD 2 "gpc_reg_operand" "d")
> > + (match_operand:SI 3 "const_0_to_3_operand" "n")]
> > + UNSPEC_DQUAN))]
> > +  "TARGET_DFP"
> > +  "dqua %0,%1,%2,%3"
> > +  [(set_attr "type" "dfp")
> > +   (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dqua_i"
> 
> Sorry for nitpicking, but what I suggested previously was
> "dfp_dquai_"
> instead of "dfp_dqua_i", "dquai" matches the according mnemonic so
> it's
> read better, i expands to "idd" and "itd" that look odd to me.
> Do you agree "dquai" is better?  If yes, the changelog and the
> related
> expanders need to be updated as well.
> 
> The others look good to me, thanks!

We need to get it right, so don't be sorry for nitpicking.  My bad for
not getting it right the first time.

Fixed.


Carl 



[PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

     Carl Love


---
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPEC_DQUAN.
(dfp_dqua_, dfp_dqua_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def  |  15 ++
 gcc/config/rs6000/rs6000-overload.def  |  10 ++
 gcc/doc/extend.texi|  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..052dc0946d3 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dqua_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..81a0de88b9c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/confi

Re: [PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches
Kewen, Peter:

> on 2023/8/17 08:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 2, renamed the built-in instances.  Changed the name of the
> > overloaded built-in.  Added the missing documentation for the new
> > built-ins.  Fixed typos.  Changed name of the test.  Updated the
> > effective target for the test.  Retested the patch on Power 10LE and
> > Power 8 and Power 9.
> > 
> > The following patch adds four built-ins for the decimal floating point
> > (DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
> > and 128-bit DFP operands.
> > 
> > The patch also adds a test case for the new builtins.
> > 
> > The Patch has been tested on Power 10LE and Power 9 LE/BE.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >  Carl Love
> > 
> > 
> > 
> > --
> > [PATCH] rs6000, add overloaded DFP quantize support
> > 
> > Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
> > and 128-bit DFP operands.  In each case, there is an immediate version and a
> > variable version of the built-in.  The RM value is a 2-bit constant int
> > which specifies the rounding mode to use.  For the immediate versions of
> > the built-in, the TE field is a 5-bit constant that specifies the value of
> > the ideal exponent for the result.  The built-in specifications are:
> > 
> >   __Decimal64 builtin_dfp_quantize (_Decimal64, _Decimal64,
> > const int RM)
> >   __Decimal64 builtin_dfp_quantize (const int TE, _Decimal64,
> > const int)
> >   __Decimal128 builtin_dfp_quantize (_Decimal128, _Decimal128,
> >  const int RM)
> >   __Decimal128 builtin_dfp_quantize (const int TE, _Decimal128,
> >  const int)
> 
> Nit: Add the parameter name "RM" for all instances, otherwise the
> readers
> might feel confused what do the other two without RM mean. :)

Yes, they all should have the parameter name RM.  Fixed.

> 
> > A testcase is added for the new built-in definitions.
> 
> Nit: A PR marker line like:
> 
>   PR target/93448
> 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPEC_DQUAN.
> > (dfp_quan_, dfp_quan_i): New define_insn.
> > * config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
> > __builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
> > __builtin_dfp_quantize_128i): New built-in definitions.
> > * config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
> > __builtin_dfpq_quantize): New overloaded definitions.
> 
> These entries need updates with this new revision, also miss one
> entry for documentation update.

Fixed with the new names, added the documentation entry.

> 
> > gcc/testsuite/
> >  * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
> > case.
> 
> Ditto, inconsistent name.

Fixed with the new name of the file, pr93448.c.

> 
> > ---
> >  gcc/config/rs6000/dfp.md  |  25 ++-
> >  gcc/config/rs6000/rs6000-builtins.def |  15 ++
> >  gcc/config/rs6000/rs6000-overload.def |  10 +
> >  gcc/doc/extend.texi   |  15 ++
> >  .../gcc.target/powerpc/pr93448-dfp-quantize.c | 199 ++
> >  5 files changed, 263 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448-dfp-quantize.c
> > diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> > index 5ed8a73ac51..abd21c5db75 100644
> > --- a/gcc/config/rs6000/dfp.md
> > +++ b/gcc/config/rs6000/dfp.md
> > @@ -271,7 +271,8 @@
> > UNSPEC_DIEX
> > UNSPEC_DSCLI
> > UNSPEC_DTSTSFI
> > -   UNSPEC_DSCRI])
> > +   UNSPEC_DSCRI
> > +   UNSPEC_DQUAN])
> >  
> >  (define_code_iterator DFP_TEST [eq lt gt unordered])
> >  
> > @@ -395,3 +396,25 @@
> >"dscri %0,%1,%2"
> >[(set_attr "type" "dfp")
> > (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dquan_"
> 
> I guess I mentioned this previously, I prefer "dfp_dqua_"
> which aligns with the most others ...

Yes, I missed that I had the extra "n" and didn't fix that part of the
name.  Sorry about that.  Updated both define_insn definitions.

> 
> &

[PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-16 Thread Carl Love via Gcc-patches


GCC maintainers:

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The Patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love



--
[PATCH] rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and
a variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int)
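
For readers unfamiliar with quantize, a minimal portable sketch of the
operation follows.  It is a model only, not the built-in and not the
hardware behaviour: a decimal value is represented as a hypothetical
(coefficient, exponent) pair, and the names quantize_coeff, round_mode,
ROUND_HALF_EVEN and ROUND_TOWARD_ZERO are assumptions made for this
illustration.  The enum values are not claimed to match the RM encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rounding selector for this sketch only.  The numeric
   values are NOT claimed to match the 2-bit RM field of the built-in.  */
typedef enum { ROUND_HALF_EVEN, ROUND_TOWARD_ZERO } round_mode;

/* Model a decimal value as coeff * 10^exp and return the coefficient
   it has after being re-expressed with exponent target_exp.  */
static int64_t
quantize_coeff (int64_t coeff, int exp, int target_exp, round_mode rm)
{
  /* Keep the power-of-ten scale factor within int64_t range.  */
  assert (target_exp - exp <= 18 && exp - target_exp <= 18);

  if (exp >= target_exp)
    {
      /* Finer target exponent: scale the coefficient up, no rounding.
         Overflow of the product is not checked in this sketch.  */
      for (int e = exp; e > target_exp; e--)
        coeff *= 10;
      return coeff;
    }

  /* Coarser target exponent: divide by 10^(target_exp - exp).  */
  int64_t div = 1;
  for (int e = exp; e < target_exp; e++)
    div *= 10;

  int64_t q = coeff / div;   /* C division truncates toward zero.  */
  int64_t r = coeff % div;   /* Remainder has the sign of coeff.  */

  if (rm == ROUND_HALF_EVEN)
    {
      int64_t twice = 2 * (r < 0 ? -r : r);
      /* Round away from zero above the midpoint, or at the midpoint
         when the truncated quotient is odd.  */
      if (twice > div || (twice == div && q % 2 != 0))
        q += (coeff < 0) ? -1 : 1;
    }
  return q;
}
```

For example, quantize_coeff (12355, -3, -2, ROUND_HALF_EVEN) models
rounding 12.355 to two fractional digits and returns the coefficient
1236, while ROUND_TOWARD_ZERO returns 1235.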

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPEC_DQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  10 +
 gcc/doc/extend.texi   |  15 ++
 .../gcc.target/powerpc/pr93448-dfp-quantize.c | 199 ++
 5 files changed, 263 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448-dfp-quantize.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..abd21c5db75 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dquan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dquan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..a7ab90771f9 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+DFPQUAN_64 dfp_dquan_dd {}
+
+  const _Decimal64 __builtin_dfp_dquai (const int<5>, _Decimal64, \
+   const int<2>

[PATCH] rs6000, add overloaded DFP quantize support

2023-08-09 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The Patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


--
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and
a variable version of the built-in.  The RM value is a 2-bit const int which
specifies the rounding mode to use.  For the immediate versions of the
built-in, the TE field is a 5-bit constant that specifies the value of the
ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int)
  _Decimal128 __builtin_dfpq_quantize (_Decimal128, _Decimal128,
                                       const int RM)
  _Decimal128 __builtin_dfpq_quantize (const int TE, _Decimal128,
                                       const int)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPEC_DQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 .../powerpc/builtin-dfp-quantize-runnable.c   | 198 ++
 4 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-dfp-quantize-runnable.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..254c22a5c20 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_quan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_quan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 35c4cdf74c5..36a56311643 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_quantize_64 (_Decimal64, _Decimal64, \
+ const int<2>);
+DFPQUAN_64 dfp_quan_dd {}
+
+  const _Decimal64 __builtin_dfp_quantize_64i (const int<5>, _Decimal64, \
+const int<2>);
+DFPQUAN_64i dfp_quan_idd {}
+
+  const _Decimal128 __builtin_dfp_quantize_128 (_Decimal128, _Decimal128, \
+ const int<2>);
+DFPQUAN_128 dfp_quan_td {}
+
+  const _Decimal128 __builtin_dfp_quantize_128i (const int<5>, _Decimal128, \
+

Re: [PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-09 Thread Carl Love via Gcc-patches
Kewen:

On Wed, 2023-08-09 at 16:47 +0800, Kewen.Lin wrote:


> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE with no regressions.
> 
> Okay for trunk with two nits below fixed, thanks!

Thanks for all the help with the patch.  Fixed the nits below, compiled
and reran the test cases to make sure everything was OK.  Will go ahead
and commit the patch.
> 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> 
>   s/ moved/: Move/ => "... execute_test_functions): Move "
>   
> s/Added/Add/

Fixed both issues.

> 



> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_"
> > +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
> > +   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
> > + (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
> > +   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
> > +(not:VSX_EXTRACT_I (match_dup 3)))]
> > +  "TARGET_ALTIVEC"
> > +  {
> > +operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > +  });
> 
> Nit: Useless ";".

removed semicolon.

   Carl 



[PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-07 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 3: Updated description to make it clear the patch fixes the
confusion on the availability of the builtins.  Fixed the
dg-require-effective-target on the test cases and the dg-options.
Changed the test case so the for loop for the test will not be
unrolled.  Fixed a spelling error in a vec-cmpne.c comment.  Retested
on Power 10 LE.

Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test
to verify the instruction generation and a runnable test to verify the
built-in functionality.  Retested the patch on Power 8 LE/BE, Power
9 LE/BE and Power 10 LE with no regressions.

The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza.  However the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 



rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.  This implies they are only
supported on Power 9 and above when in fact they are defined and work with
Altivec as well with the appropriate Altivec instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.
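
The relationship between the two instruction sequences can be sketched
in portable C with the generic GCC vector extension.  This is an
illustration only, not code from the patch: the type v4si and the helper
names are assumptions, and no Altivec built-in is called.  It shows that
a lane-wise not-equal gives the same mask as a lane-wise equal followed
by a bitwise NOT, which is the shape of the define_expand added to
altivec.md.

```c
#include <assert.h>

/* Generic GCC/Clang vector of four 32-bit ints (hypothetical name,
   not an Altivec-specific type).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Not-equal written directly; each lane becomes -1 (all ones) or 0.  */
static v4si
cmpne_direct (v4si a, v4si b)
{
  return a != b;
}

/* Not-equal written as compare-equal followed by bitwise NOT.  */
static v4si
cmpne_via_eq (v4si a, v4si b)
{
  return ~(a == b);
}

/* Return 1 when both forms agree on every lane and match the mask
   expected for the sample inputs, 0 otherwise.  */
static int
cmpne_forms_agree (void)
{
  assert (sizeof (v4si) == 16);

  v4si a = { 1, 2, 3, 4 };
  v4si b = { 1, 0, 3, 5 };
  v4si direct = cmpne_direct (a, b);
  v4si via_eq = cmpne_via_eq (a, b);
  const int expected[4] = { 0, -1, 0, -1 };

  for (int i = 0; i < 4; i++)
    if (direct[i] != via_eq[i] || direct[i] != expected[i])
      return 0;
  return 1;
}
```

Which instructions a compiler selects for either form depends on the
target and options; the sketch only demonstrates that the two forms
compute the same lane masks.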

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
removes the confusion as to which processors support the vcmpequ{b,h,w}
instructions.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.

Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h.  Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
 gcc/config/rs6000/altivec.md  |  12 ++
 gcc/config/rs6000/rs6000-builtins.def |  18 +--
 .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 112 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  90 ++
 5 files changed, 156 insertions(+), 112 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 

Re: [PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-07 Thread Carl Love via Gcc-patches
On Mon, 2023-08-07 at 17:18 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Sorry for the late review.
> 
> on 2023/8/2 02:29, Carl Love wrote:
> > GCC maintainers:
> > 
> > Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test
> > to verify the instruction generation and a runnable test to verify
> > the built-in functionality.  Retested the patch on Power 8 LE/BE,
> > Power 9 LE/BE and Power 10 LE with no regressions.
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
> > that the built-in is only supported on Power 9 since it is defined
> > under the Power 9 stanza.  However the built-in has no ISA
> > restrictions as stated in the Power Vector Intrinsic Programming
> > Reference document.  The current built-in works because the built-in
> > gets replaced during GIMPLE folding by a simple not-equal operator so
> > it doesn't get expanded and checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-in
> > definition file to make it clear the built-ins are valid for Power 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10 LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for mainline.  Thanks.
> > 
> >   Carl 
> > 
> > 
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined under the Power 9 section of rs6000-builtins.  This implies
> > they are only supported on Power 9 and above when in fact they are
> > defined and work with Altivec as well with the appropriate Altivec
> > instruction generation.
> > 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction
> > with Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and
> > newer processors.
> > 
> > This patch moves the definitions to the Altivec stanza to make it
> > clear the built-ins are supported for all Altivec processors.  The
> > patch enables the vcmpequ{b,h,w} instruction to be generated on
> > Altivec and the vcmpne{b,h,w} instruction to be generated on Power 9
> > and beyond.
> 
> But as you noted above, the current built-ins work as expected, that
> is to generate with vcmpequ{b,h,w} on altivec but not Power9 while
> generate with vcmpne{b,h,w} on Power9.  So I think we shouldn't say
> it's enabled by this patch.  Instead it's to remove the confusion.

OK, changed.
> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> > 
> > Test vec-cmpne.c is updated to check the generation of the
> > vcmpequ{b,h,w} instructions for Altivec.  A new test
> > vec-cmpne-runnable.c is added to verify the built-ins work as
> > expected.
> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> > * gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
> > and vec-cmpne-runnable.c. Split define_test_functions definition
> > into define_test_functions and define_init_verify_functions.
> > ---
> >  gcc/config/rs6000/altivec.md  |  12 ++
> >  gcc/config/rs6000/rs6000-builtins.def |  18 +--
> >  .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
> >  gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++--
> > --
> >  gcc/testsuite/gcc.targe

[PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test
to verify the instruction generation and a runnable test to verify the
built-in functionality.  Retested the patch on Power 8 LE/BE, Power
9 LE/BE and Power 10 LE with no regressions.

The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza.  However the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 


rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.  This implies they are only
supported on Power 9 and above when in fact they are defined and work with
Altivec as well with the appropriate Altivec instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Altivec and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.

Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h.  Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
 gcc/config/rs6000/altivec.md  |  12 ++
 gcc/config/rs6000/rs6000-builtins.def |  18 +--
 .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  86 ++
 5 files changed, 151 insertions(+), 111 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int 

Re: [PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-07-31 at 14:53 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/28 23:00, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpnet.  The current implementation implies that the
> 
> s/__builtin_altivec_vcmpnet/__builtin_altivec_vcmpne[bhw]/

OK, updated in email for version 2. 

> 
> > built-in is only supported on Power 9 since it is defined under the
> > Power 9 stanza.  However the built-in has no ISA restrictions as stated
> > in the Power Vector Intrinsic Programming Reference document. The
> > current built-in works because the built-in gets replaced during GIMPLE
> > folding by a simple not-equal operator so it doesn't get expanded and
> > checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-in
> > definition file to make it clear the built-ins are valid for Power 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for mainline.  Thanks.
> > 
> >   Carl 
> > 
> > --
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
> > under the Power 9 section of rs6000-builtins.  This implies they are only
> > supported on Power 9 and above when in fact they are defined and work on
> > Power 8 as well with the appropriate Power 8 instruction generation.
> 
> Nit: It's confusing to say Power8 only, it's actually supported once
> altivec is enabled, so I think it's more clear to replace Power8 with
> altivec here.

OK, replaced Power 8 with Altivec here and for additional instances of
Power 8 below.

> 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction on
> > Power 8 and generate the vcmpne{b,h,w} on Power 9 an newer processors.
> 
> 
> Ditto for Power8 and "an" -> "and"?

Fixed, fixed.

> 
> > This patch moves the definitions to the Altivec stanza to make it clear
> > the built-ins are supported for all Altivec processors.  The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Power 8 and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.
> 
> Ditto for Power8.

fixed

> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> 
> So there is no coverage with the basic altivec support.  I noticed
> we have one test case "gcc/testsuite/gcc.target/powerpc/vec-cmpne.c"
> which is a test case for running but with vsx_ok, I think we can
> rewrite it with altivec (vmx), either separating to compiling and
> running case, or adding -save-temp and check expected insns.

I looked at just adding -save-temp and scan-assembler-times for the
instructions.  I noticed that vcmpequw occurs 30 times in the functions
to initialize and test the results.  So, I opted to create a separate
compile/check instructions test and a runnable test to verify the
functionality.  This way any changes in the code to calculate and
verify the results will not break the instruction generation checks.

> 
> > Coverage for unsigned long long int and long long int
> > for Power 10 in int_128bit-runnable.c.

Removed comment about Power 10, long long int testing.

> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew.
> > vcmpnet): Move definitions to Altivec stanza.
> 
> vcmpnet which isn't handled in this patch should be removed.

Removed.
 
 Carl 



[PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-07-28 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch cleans up the definition for the
__builtin_altivec_vcmpnet.  The current implementation implies that the
built-in is only supported on Power 9 since it is defined under the
Power 9 stanza.  However the built-in has no ISA restrictions as stated
in the Power Vector Intrinsic Programming Reference document. The
current built-in works because the built-in gets replaced during GIMPLE
folding by a simple not-equal operator so it doesn't get expanded and
checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 

--
rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.  This implies they are only
supported on Power 9 and above when in fact they are defined and work on
Power 8 as well with the appropriate Power 8 instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction on
Power 8 and generate the vcmpne{b,h,w} on Power 9 an newer processors.

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Power 8 and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.
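
As a rough illustration of the Power 8 strategy described above, the
following scalar C model computes a per-byte not-equal mask by taking an
equal mask and complementing it, which is what the new define_expand
expresses with eq followed by not.  This is not GCC code; the names
cmpeq_bytes, cmpne_bytes and NELT are made up for this sketch, and it only
models the byte (vcmpequb) case.

```c
#include <stdint.h>

#define NELT 16   /* bytes in one 128-bit vector */

/* Model of vcmpequb: each result byte is 0xFF where the inputs are
   equal and 0x00 where they differ.  */
static void
cmpeq_bytes (const uint8_t *a, const uint8_t *b, uint8_t *res)
{
  for (int i = 0; i < NELT; i++)
    res[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Model of the two-step sequence: compare equal, then complement
   every byte of the mask.  */
static void
cmpne_bytes (const uint8_t *a, const uint8_t *b, uint8_t *res)
{
  uint8_t eq[NELT];

  cmpeq_bytes (a, b, eq);
  for (int i = 0; i < NELT; i++)
    res[i] = (uint8_t) ~eq[i];
}
```

On Power 9 and newer the same result would come from a single vcmpne{b,h,w}
instruction, so the temporary mask in cmpne_bytes has no counterpart there.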

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.  Coverage for unsigned long long int and long long int
for Power 10 is in int_128bit-runnable.c.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew,
vcmpnet): Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.
---
 gcc/config/rs6000/altivec.md  | 12 
 gcc/config/rs6000/rs6000-builtins.def | 18 +-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
 VCMPGTUW_P vector_gtu_v4si_p {pred}
 
+  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+VCMPNEB altivec_vcmpne_v16qi {}
+
+  const vss __builtin_altivec_vcmpneh (vss, vss);
+VCMPNEH altivec_vcmpne_v8hi {}
+
+  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
+VCMPNEW altivec_vcmpne_v4si {}
+
   const vsi __builtin_altivec_vctsxs (vf, const int<5>);
 VCTSXS altivec_vctsxs {}
 
@@ -2599,9 +2608,6 @@
   const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
 VCMPAEW_P vector_ae_v4si_p {}
 
-  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
-VCMPNEB vcmpneb {}
-
   const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
 VCMPNEB_P vector_ne_v16qi_p {}
 
@@ -2614,15 +2620,9 @@
   const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
 VCMPNEFP_P vector_ne_v4sf_p {}
 
-  const vss __builtin_altivec_vcmpneh (vss, vss);
-VCMPNEH vcmpneh {}
-
   const signed int __builtin_altivec_vcmpneh_p (vss, vss);
 VCMPNEH_P vector_ne_v8hi_p {}
 
-  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
-VCMPNEW vcmpnew {}
-
   const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
 VCMPNEW_P vector_ne_v4si_p {}
 
-- 

Re: [PATCH 2/2 ver 5] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 5, fixed the patch description to say the first argument should
be of type vector unsigned char.  Fixed the comment in vsx.md to say
"Vector and scalar extract_elt iterator/attr ".  Removed a few of the
changes made in version 4.  Specifically, reverted the names of
REPLACE_ELT_V_sh back to REPLACE_ELT_sh and REPLACE_ELT_V_max back to
REPLACE_ELT_max.  Combined the REPLACE_ELT_char and REPLACE_ELT_V_char
mode attributes into REPLACE_ELT_char.  Put the "dg-do link" directive
back into the vec-replace-word-runnable_1.c test file.  The patch was
tested with the updated patch 1 in the series on Power 8 LE/BE, Power 9
LE/BE and Power 10 with no regressions.

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in
vec-replace-word-runnable_1.c to be consistent with the other tests.
Removed the "dg-do link" from both tests.  Put in an explicit cast in
test vec-replace-word-runnable_2.c to eliminate the need for the
-flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument.
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The names of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command line option.  Since the other tests
do not require this option, I felt that the new test needed to be in a
separate file.  Finally, some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that
   elements are potentially misaligned by this built-in function.
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 






rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type vector unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.
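
To make the reason for the vector unsigned char type more concrete, here
is a small C model of replacing a 32-bit word at an arbitrary byte offset
within a 16-byte vector.  It is only a sketch: the function name
replace_unaligned_word and the error return are made up, and byte
positions follow plain memory order, which ignores the endian-specific
numbering of the real instructions.

```c
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16   /* bytes in one 128-bit vector */

/* Copy SRC to DST and overwrite the four bytes starting at byte
   offset INDEX with VALUE.  Returns 0 on success, or -1 if the value
   would not fit inside the vector.  */
static int
replace_unaligned_word (const uint8_t *src, uint32_t value, int index,
                        uint8_t *dst)
{
  if (index < 0 || index > VEC_BYTES - (int) sizeof value)
    return -1;
  memcpy (dst, src, VEC_BYTES);
  memcpy (dst + index, &value, sizeof value);
  return 0;
}
```

With an offset such as 3 the inserted word straddles two 32-bit lanes, so
the result can no longer be described as a vector of aligned words, only
as a vector of bytes.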

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI 

[PATCH 1/2 ver 2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2:  Updated a number of formatting and spacing issues.   Added
the NARGS description to the header comment for function find_instance.
This patch was tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

The rs6000 function find_instance assumes that it is called for
built-ins with only two arguments.  There is no checking for the actual
number of arguments used in the built-in.  This patch adds an
additional parameter to the function call containing the number of
arguments in the built-in.  The function will now do the needed checks
for all of the arguments.
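
The idea can be sketched outside of GCC with a plain linked list standing
in for the parameter type chain.  Everything here is hypothetical (struct
parm_node, args_match, and integer type ids instead of trees); the NULL
check on the chain is an extra safety measure in this sketch and is not
part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain of parameter types (the role
   played by TYPE_ARG_TYPES / TREE_CHAIN in GCC).  */
struct parm_node
{
  int type_id;
  const struct parm_node *next;
};

/* Return true if the first NARGS entries of TYPES match the first
   NARGS parameter types in the chain PARMS.  */
static bool
args_match (const int *types, const struct parm_node *parms, int nargs)
{
  for (int i = 0; i < nargs; i++)
    {
      /* The chain ran out before NARGS parameters were checked.  */
      if (parms == NULL)
        return false;
      if (types[i] != parms->type_id)
        return false;
      parms = parms->next;
    }
  return true;
}
```

Calling args_match with nargs fixed at 2 would accept a three-argument
call whose third type is wrong, which is the kind of gap the new
parameter is meant to close.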

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 




-
rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with
only two arguments.  This patch extends the function by adding a parameter
specifying the number of built-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in arguments to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 40 +++
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..de35490de42 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1668,18 +1668,20 @@ resolve_vec_step (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs)
 /* Look for a matching instance in a chain of instances.  INSTANCE points to
the chain of instances; INSTANCE_CODE is the code identifying the specific
built-in being searched for; FCODE is the overloaded function code; TYPES
-   contains an array of two types that must match the types of the instance's
-   parameters; and ARGS contains an array of two arguments to be passed to
-   the instance.  If found, resolve the built-in and return it, unless the
-   built-in is not supported in context.  In that case, set
-   UNSUPPORTED_BUILTIN to true.  If we don't match, return error_mark_node
-   and leave UNSUPPORTED_BUILTIN alone.  */
+   contains an array of NARGS types that must match the types of the
+   instance's parameters; ARGS contains an array of NARGS arguments to be
+   passed to the instance; and NARGS is the number of built-in arguments to
+   check.  If found, resolve the built-in and return it, unless the built-in
+   is not supported in context.  In that case, set UNSUPPORTED_BUILTIN to
+   true.  If we don't match, return error_mark_node and leave
+   UNSUPPORTED_BUILTIN alone.
+*/
 
 tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1693,27 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  bool args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i < nargs; i++)
+{
+  tree parmtype = TREE_VALUE (argtype);
+  if (!rs6000_builtin_type_compatible (types[i], parmtype))
+   {
+ args_compatible = false;
+ break;
+   }
+  argtype = TREE_CHAIN (argtype);
+}
+
+  if (args_compatible)
 {
   if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1933,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
 tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1970,7 @@ 

[PATCH 0/2 ver 2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2.  Both patches have been updated.  The first patch was approved
with minor issues to be fixed.  I will post the updated version as
version 2 for completeness of the series.  There were a few changes
to the second patch as well.  The second patch has not been approved
yet.  The updated version of the second patch is version 5 with the
requested changes made.  The two patches were tested together on Power
8 LE/BE, Power 9 LE/BE and Power 10 LE with no regressions.

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in
rs6000-c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

the fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl



Re: [PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches
On Fri, 2023-07-21 at 13:04 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:20, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case
> > statement
> > rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was
> > changed
> > to REPLACE_ELT_V along with the associated
> > define_mode_attr.  Renamed
> > VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
> > REPLACE_ELT_char.  Fixed the double test in vec-replace-word-
> > runnable_1.c to be consistent with the other tests.  Removed the
> > "dg-do 
> > link" from both tests.  Put in an explicit cast in test vec-
> > replace-word-runnable_2.c to eliminate the need for the -flax-
> > vector-conversions dg-option.
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  
> >It was intended that vec_replace_unaligned always specify
> > output
> >vectors as having type vector unsigned char, to emphasize
> > that 
> >elements are potentially misaligned by this built-in
> > function.  
> >This patch corrects the misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > 
> > 
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > of type unsigned char, as specified in gcc/doc/extend.texi.
> 
> Shouldn't be "vector unsigned char" instead of "unsigned char"?
> 
> Or do I miss something?

Nope, I missed saying "vector".  Fixed.

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> >

Re: [PATCH 1/2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches
On Fri, 2023-07-21 at 10:19 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > The rs6000 function find_instance assumes that it is called for
> > built-
> > ins with only two arguments.  There is no checking for the actual
> > number of aruguments used in the built-in.  This patch adds an
> > additional parameter to the function call containing the number of
> > aruguments in the built-in.  The function will now do the needed
> > checks
> > for all of the arguments.
> > 
> > This fix is needed for the next patch in the series that fixes the
> > vec_replace_unaligned built-in.c test.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl 
> > 
> > 
> > 
> > rs6000, add argument to function find_instance
> > 
> > The function find_instance assumes it is called to check a built-
> > in  with   

Fixed
> >   ~~ two spaces.
> > only two arguments.  Ths patch extends the function by adding a
> > parameter
>s/Ths/This/
> > specifying the number of buit-in arguments to check.
>   s/bult-in/built-in/
> 
Fixed both typos.

> > gcc/ChangeLog:
> > * config/rs6000/rs6000-c.cc (find_instance): Add new parameter
> > that
> > specifies the number of built-in arguments to check.
> > (altivec_resolve_overloaded_builtin): Update calls to
> > find_instance
> > to pass the number of built-in argument to be checked.
> 
> s/argument/arguments/
fixed
> 
> > ---
> >  gcc/config/rs6000/rs6000-c.cc | 27 +++
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index a353bca19ef..350987b851b 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1679,7 +1679,7 @@ tree
> 
> There is one function comment here describing the meaning of each
> parameter,
> I think we should add a corresponding for NARGS, may be something
> like:
> 
> "; and NARGS specifies the number of built-in arguments."
> 
Added NARGS description.

> Also we need to update the below "two"s with "NARGS".
> 
> "TYPES contains an array of two types..." and "ARGS contains an array
> of two arguments..."
> 

Replaced multiple "two" occurrences with NARGS.

> since we already extend this to handle NARGS instead of two.
> 
> >  find_instance (bool *unsupported_builtin, ovlddata **instance,
> >rs6000_gen_builtins instance_code,
> >rs6000_gen_builtins fcode,
> > -  tree *types, tree *args)
> > +  tree *types, tree *args, int nargs)
> >  {
> >while (*instance && (*instance)->bifid != instance_code)
> >  *instance = (*instance)->next;
> > @@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin,
> > ovlddata **instance,
> >if (!inst->fntype)
> >  return error_mark_node;
> >tree fntype = rs6000_builtin_info[inst->bifid].fntype;
> > -  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> > -  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES
> > (fntype)));
> > +  tree argtype = TYPE_ARG_TYPES (fntype);
> > +  tree parmtype;
> 
> Nit: We can move "tree parmtype" into the loop (close to its only
> use).

Moved and combined declaration with assignment as you noted below.

> 
> > +  int args_compatible = true;
> 
> s/int/bool/
Changed.

> 
> >  
> > -  if (rs6000_builtin_type_compatible (types[0], parmtype0)
> > -  && rs6000_builtin_type_compatible (types[1], parmtype1))
> > +  for (int i = 0; i <nargs; i++)
> Nit: formatting issue, space before nargs.
> 
> >  {
> > +  parmtype = TREE_VALUE (argtype);
> 
>  tree parmtype = TREE_VALUE (argtype);

Changed

> 
> > +  if (! rs6000_builtin_type_compatible (types[i], parmtype))
> 
> Nit: One unexpected(?) space after "!".

Removed extra space after "!".
> 
> > +   {
> > + args_compatible = false;
> > + break;
> > +   }
> > +  argtype = TREE_CHAIN (argtype);
> > +}
> > +
> > +  if (args_compatible)
> > +  {
> 
> Nit: indent issue for "{".
Fixed indent.

> 
> Ok for trunk with these nits fixed.  Btw, the description doesn't say
> how this was tested, I'm not sure if it's only tested together with
> "patch 2/2", but please ensure it's bootstrapped and regress-tested
> on BE and LE when committing.  Thanks!
> 

Yes, it was tested with patch 2/2 on Power 10 LE.  I did do a test on
Power 9 as well but don't recall if I tested for both BE and LE.  Will
retest on Power 8 LE/BE, Power 9 LE/BE and Power 10.

 Carl



Re: rs6000: Fix expected counts powerpc/p9-vec-length-full

2023-07-18 Thread Carl Love via Gcc-patches
Ping

On Thu, 2023-06-01 at 16:11 -0700, Carl Love wrote:
> GCC maintainers:
> 
> The following patch updates the expected instruction counts in four
> tests.  The counts in all of the tests changed with commit
> f574e2dfae79055f16d0c63cc12df24815d8ead6.  
> 
> The updated counts have been verified on both Power 9 and Power 10.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> 
> rs6000: Fix expected counts powerpc/p9-vec-length-full tests
> 
> The counts for instructions lxvl and stxvl in tests:
> 
>   p9-vec-length-full-1.c
>   p9-vec-length-full-2.c
>   p9-vec-length-full-6.c
>   p9-vec-length-full-7.c
> 
> changed with commit:
> 
>commit f574e2dfae79055f16d0c63cc12df24815d8ead6
>Author: Ju-Zhe Zhong 
>Date:   Thu May 25 22:42:35 2023 +0800
> 
>  VECT: Add decrement IV iteration loop control by variable amount
> support
> 
>  This patch is supporting decrement IV by following the flow
> designed by
>  Richard:
>...
> 
> The expected counts for lxvl changed from 20 to 40 and the counts for
> stxvl
> changed from 10 to 20 in the first three tests.  The number of stxvl
> instructions changed from 12 to 20 in p9-vec-length-full-7.c.  This
> patch updates the number of expected instructions in the four tests.
> 
> The counts have been verified on Power 9 and Power 10.
> ---
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> index f01f1c54fa5..5e4f34421d3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> @@ -12,5 +12,5 @@
>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> index f546e97fa7d..c7d927382c3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> @@ -12,5 +12,5 @@
>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index 65ddf2b098a..f3be3842c62 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -11,5 +11,5 @@
>  /* It can use normal vector load for constant vector load.  */
>  /* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
>  /* 64bit/32bit pairs won't use partial vectors.  */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> index e0e51d9a972..da086f1826a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> @@ -12,4 +12,4 @@
> 
>  /* Each type has one stxvl excepting for int8 and uint8, that have
> two due to
> rtl pass bbro duplicating the block which has one stxvl.  */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
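
The scan-assembler-times directives in the patch above count whole-word
occurrences of a mnemonic in the generated assembly.  A rough C model of
that counting is shown below.  It is only an approximation of the Tcl
{\m...\M} word-boundary match, and the names is_word_char and count_word
are made up for this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters treated as part of a word for boundary purposes.  */
static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count occurrences of WORD in TEXT that are not part of a longer
   word, similar in spirit to the Tcl pattern {\mWORD\M}.  */
static int
count_word (const char *text, const char *word)
{
  size_t len = strlen (word);
  int count = 0;

  if (len == 0)
    return 0;
  for (const char *p = strstr (text, word); p != NULL;
       p = strstr (p + 1, word))
    {
      int starts = (p == text) || !is_word_char (p[-1]);
      int ends = !is_word_char (p[len]);
      if (starts && ends)
        count++;
    }
  return count;
}
```

The boundary checks are what keep a search for one mnemonic from also
matching inside a longer one that happens to contain it.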



[PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in
vec-replace-word-runnable_1.c to be consistent with the other tests.
Removed the "dg-do link" from both tests.  Put in an explicit cast in
test vec-replace-word-runnable_2.c to eliminate the need for the
-flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument.
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The names of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command line option.  Since the other tests
do not require this option, I felt that the new test needed to be in a
separate file.  Finally, some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that 
   elements are potentially misaligned by this built-in function.  
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 



rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
* config/rs6000/rs6000-c.cc (find_instance): Add case
RS6000_OVLD_VEC_REPLACE_UN.
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.  Rename VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
VREPLACE_UN_DI, 

[PATCH 1/2] rs6000, add argument to function find_instance

2023-07-17 Thread Carl Love via Gcc-patches


GCC maintainers:

The rs6000 function find_instance assumes that it is called for
built-ins with only two arguments.  There is no checking for the actual
number of arguments used in the built-in.  This patch adds an
additional parameter to the function call containing the number of
arguments in the built-in.  The function will now do the needed checks
for all of the arguments.
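
As a rough illustration of the change described above (not the actual GCC
code), the two hard-coded comparisons can be replaced by a loop over all
nargs arguments.  The sketch below is self-contained C; type_id,
types_compatible and all_args_compatible are hypothetical stand-ins for
GCC's tree nodes and rs6000_builtin_type_compatible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC type node; the real code uses `tree`.  */
typedef int type_id;

/* Hypothetical compatibility test.  In GCC the corresponding helper is
   rs6000_builtin_type_compatible; here two types match if they are equal.  */
static bool
types_compatible (type_id actual, type_id formal)
{
  return actual == formal;
}

/* Return true only if each of the NARGS actual argument types is
   compatible with the corresponding formal parameter type.  */
static bool
all_args_compatible (const type_id *actual, const type_id *formal,
                     size_t nargs)
{
  for (size_t i = 0; i < nargs; i++)
    if (!types_compatible (actual[i], formal[i]))
      return false;
  return true;
}
```

With a three-argument built-in, checking only the first two arguments
would accept a call whose third argument has the wrong type; passing the
real argument count closes that gap.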

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 



rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with
only two arguments.  This patch extends the function by adding a parameter
specifying the number of built-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in arguments to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..350987b851b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1679,7 +1679,7 @@ tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin, ovlddata 
**instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  tree parmtype;
+  int args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i bifid, false) != error_mark_node
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1932,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
 tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1969,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  }
 
 tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
-- 
2.37.2




[PATCH 0/2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches


GCC maintainers:

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in
rs6000-c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

the fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl



Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches
On Thu, 2023-07-13 at 17:41 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/8 04:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> 
> Nice, I have some comments inlined below.
> 
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned
> > was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > unsigned char, as specified in gcc/doc/extend.texi.
> 
> Maybe "be with type vector unsigned char"?

Changed to 

  The first argument of the vec_replace_unaligned built-in should
always be of type unsigned char, 

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> > __builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > __builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > __builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > __builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > __builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > __builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > VREP

[PATCH ver4] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-11 Thread Carl Love via Gcc-patches
GCC maintainers:

Ver 4, Removed extra space in subject line.  Added comment to commit
log comments about new __SET_FPSCR_RN_RETURNS_FPSCR__ define.  Changed
Added to Add and Renamed to Rename in ChangeLog.  Updated define_expand
"rs6000_set_fpscr_rn" per Peter's comments to use new temporary
register for output value.  Also, comments from Kewen about moving rtx
tmp_di1 close to use.  Renamed tmp_di2 as orig_df_in_di.  Additionally,
changed the name of tmp_di3 to tmp_di2 so the numbering is
sequential.  Moved the new rtx tmp_di2 = gen_reg_rtx (DImode); right
before its use to be consistent with previous move request.  Fixed tabs
in comment.  Removed -std=c99 from test_fpscr_rn_builtin_1.c.  Cleaned up
comment and removed abort from test_fpscr_rn_builtin_2.c.

Fixed a couple of additional issues with the ChangeLog per feedback
from git gcc-verify.

Retested updated patch on Power 8, 9 and 10 to verify changes.

Ver 3, Renamed the patch per comments on ver 2.  Previous subject line
was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".  
Fixed spelling mistakes and formatting.  Updated define_expand
"rs6000_set_fpscr_rn" to have the rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define expands inlined.  Optimized the
code and fixed use of temporary register values. Updated the test file
dg-do run arguments and dg-options.  Removed the check for
__SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
overloaded built-in with double argument.  Fixed up the documentation
file.  Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
10 LE.

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.
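
For illustration only, user code could key off the macro roughly as
follows.  This is a sketch, assuming the macro is defined exactly when the
builtin returns a double; set_rn_get_old is a hypothetical helper name, and
in this sketch the fallback branch deliberately does nothing so the code
also compiles on compilers and targets without the new behavior.

```c
/* If the compiler announces that __builtin_set_fpscr_rn returns the FPSCR
   fields, set the rounding mode to RN, store the previous fields in *OLD
   and return 1.  Otherwise leave the rounding mode alone, store 0.0 in
   *OLD and return 0.  */
static int
set_rn_get_old (int rn, double *old)
{
#if defined (__SET_FPSCR_RN_RETURNS_FPSCR__)
  *old = __builtin_set_fpscr_rn (rn);
  return 1;
#else
  (void) rn;
  *old = 0.0;
  return 0;
#endif
}
```

A real caller would likely fall back to the void form of the builtin or to
inline asm in the #else branch; that part is omitted here.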

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
__builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
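
To make the returned value concrete, here is a small model in plain C.  It
is only a sketch, assuming the bit positions given in the patch's own
comment (bits 29:31 for DRN and bits 56:63 for VE, OE, UE, ZE, XE, NI, RN,
in IBM numbering where bit 0 is the most significant bit of the 64-bit
register).  The helper names are hypothetical, and the model works on a
uint64_t rather than on the double the builtin actually returns.

```c
#include <stdint.h>

/* Build a mask covering bits FIRST..LAST (inclusive) of a 64-bit register
   in IBM numbering, where bit 0 is the most significant bit.  Requires
   FIRST <= LAST <= 63 and a field narrower than 64 bits.  */
static uint64_t
ibm_bits (unsigned first, unsigned last)
{
  unsigned width = last - first + 1;
  return ((UINT64_C (1) << width) - 1) << (63 - last);
}

/* Mask of the fields the builtin is described as returning:
   DRN (29:31) plus VE, OE, UE, ZE, XE, NI, RN (56:63).  */
static uint64_t
fpscr_return_mask (void)
{
  return ibm_bits (29, 31) | ibm_bits (56, 63);
}

/* Keep only the returned fields of a raw 64-bit FPSCR image.  */
static uint64_t
fpscr_returned_fields (uint64_t fpscr)
{
  return fpscr & fpscr_return_mask ();
}
```

Under those assumptions the mask works out to 0x00000007000000FF, so every
other FPSCR bit reads back as zero in the returned value.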

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 

-
rs6000, Add return value to __builtin_set_fpscr_rn

Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
RN bit positions.  A new test file, powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value for the built-in.

The value 

Re: [PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-11 Thread Carl Love via Gcc-patches
On Tue, 2023-07-11 at 13:54 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Excepting for Peter's review comments, some nits are inline below.
> 
> on 2023/7/11 03:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > 
> > 




> > -
> > rs6000, Add return value  to __builtin_set_fpscr_rn
> 
> Nit: One more unexpected space.

OK, removed

> 
> > Change the return value from void to double for
> > __builtin_set_fpscr_rn.
> > The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE,
> > XE, NI,
> > RN bit positions.  A new test file, test
> > powerpc/test_fpscr_rn_builtin_2.c,
> > is added to test the new return value for the built-in.
> 
> Nit: It would be better to note the newly added
> __SET_FPSCR_RN_RETURNS_FPSCR__
> in commit log as well.

Added a comment as requested.

> 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn):
> > Update
> > built-in definition return type.
> > * config/rs6000-c.cc (rs6000_target_modify_macros): Add check,
> > define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return
> 
> Nit: s/Added/Add/

Changed.

> 
> > argument to return FPSCR fields.
> > * doc/extend.texi (__builtin_set_fpscr_rn): Update description
> > for
> > the return value.  Add description for
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > 
> > gcc/testsuite/ChangeLog:
> > gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
> > test_fpscr_rn_builtin_1.c.  Added comment.
> 
> Nit: s/Added/Add/ and s/Renamed/Rename/.

Changed.

> 
> > gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
> > return value of __builtin_set_fpscr_rn builtin.
> > ---
> > 



> > -  if (CONST_INT_P (operands[0]))
> > +  /* Emulate the behavior of the mffscrni, mffscrn instructions
> > for earlier
> > + ISAs.  Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE,
> > ZE, XE, NI,
> > + RN) from the FPSCR.  Set the RN field based on the value in
> > operands[1].
> > +  */
> > +
> > +  /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63
> > (VE, OE, UE,
> > +  ZE, XE, NI, RN) from the FPSCR and return them.  */
> > +  rtx tmp_di1 = gen_reg_rtx (DImode);
> 
> Nit: This line is preferred to be move to below (a), close to its
> use.

OK, moved the statement. 
> 
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  rtx tmp_di2 = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> 
> Nit: May be good to rename this tmp_di2 as orig_df_in_di, hope it can
> offer better readablity when people read the code below with its use.

OK, changed the name.  Then changed the name of tmp_di3 to tmp_di2 so
the numbering is sequential.  Moved the new rtx tmp_di2 = gen_reg_rtx
(DImode); right before its use.

> 
> ... (a)
> 
> > +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT
> > (0x00000007000000FFULL)));
> > +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0);
> > +  emit_move_insn (operands[0], tmp_rtn);
> > +
> > +  if (CONST_INT_P (operands[1]))
> >  {
> > -  if ((INTVAL (operands[0]) & 0x1) == 0x1)
> > +  if ((INTVAL (operands[1]) & 0x1) == 0x1)
> > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31)));
> >else
> > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31)));
> >  
> > -  if ((INTVAL (operands[0]) & 0x2) == 0x2)
> > +  if ((INTVAL (operands[1]) & 0x2) == 0x2)
> > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30)));
> >else
> > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30)));
> > @@ -6476,23 +6493,20 @@
> >else
> >  {
> >rtx tmp_rn = gen_reg_rtx (DImode);
> > -  rtx tmp_di = gen_reg_rtx (DImode);
> >  
> >/* Extract new RN mode from operand.  */
> > -  rtx op0 = convert_to_mode (DImode, operands[0], false);
> > -  emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
> > +  rtx op1 = convert_to_mode (DImode, operands[1], false);
> > +  emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));
> >  
> > -  /* Insert new RN mode into FSCPR.  */
> > -  emit_insn (gen_rs6000_mffs (tmp_df));
> > -  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > -  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> > -  emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> > +  /* Insert the new RN value from tmp_rn into FPSCR bit

Re: [PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-10 Thread Carl Love via Gcc-patches
Peter:


On Mon, 2023-07-10 at 16:57 -0500, Peter Bergner wrote:
> On 7/10/23 2:18 PM, Carl Love wrote:
> > +  /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63
> > (VE, OE, UE,
> > +  ZE, XE, NI, RN) from the FPSCR and return them.  */
> 
> The 'Z' above should line up directly under the 'G' in Get.

Yup.  Fixed.

> 
> 
> > -  /* Insert new RN mode into FSCPR.  */
> > -  emit_insn (gen_rs6000_mffs (tmp_df));
> > -  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > -  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> > -  emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> > +  /* Insert the new RN value from tmp_rn into FPSCR bit
> > [62:63].  */
> > +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
> > +  emit_insn (gen_iordi3 (tmp_di1, tmp_di1, tmp_rn));
> 
> This is an expander, so you shouldn't reuse temporaries as multiple
> destination pseudos, since that limits the register allocator's
> freedom.
> I know the old code did it, but since you're changing the line, you
> might as well use a new temp.

OK, wasn't aware that reusing temps was an issue for the register
allocator.  Thanks for letting me know.  So, I think you want something
like:
   
  rtx tmp_rn = gen_reg_rtx (DImode);
  rtx tmp_di3 = gen_reg_rtx (DImode);

  /* Extract new RN mode from operand.  */
  rtx op1 = convert_to_mode (DImode, operands[1], false);
  emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));

  /* Insert the new RN value from tmp_rn into FPSCR bit [62:63].  */
  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
  emit_insn (gen_iordi3 (tmp_di3, tmp_di1, tmp_rn));

  /* Need to write to field k=15.  The fields are [0:15].  Hence with
 L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is an
 8-bit field[0:7]. Need to set the bit that corresponds to the
 value of i that you want [0:7].  */
  tmp_df = simplify_gen_subreg (DFmode, tmp_di3, DImode, 0);

where each destination is a unique register.  Then the register
allocator can decide if it wants to use the same register or not at
code generation time.
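
Setting the RTL details aside, the mask-and-insert step in the snippet
above amounts to the following plain C.  This is only a sketch of the
arithmetic; insert_rn is a hypothetical name, and each intermediate value
gets its own variable to mirror the one-destination-per-temporary style.

```c
#include <stdint.h>

/* Replace the two low-order bits (the RN field, bits 62:63 in IBM
   numbering) of a 64-bit FPSCR image with the low two bits of OPERAND.  */
static uint64_t
insert_rn (uint64_t fpscr, uint64_t operand)
{
  uint64_t rn = operand & UINT64_C (3);      /* like AND with 3   */
  uint64_t cleared = fpscr & ~UINT64_C (3);  /* like AND with -4  */
  uint64_t updated = cleared | rn;           /* like the final OR */
  return updated;
}
```

Only the RN bits change; all other bits of the image pass through
untouched.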

I made the change and did a quick check compiling on Power 10 with
mcpu=power[8,9,10] and it worked fine. I will run the full regression
on each of the processor types just to be sure.

  Carl 



[PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-10 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 3, Renamed the patch per comments on ver 2.  Previous subject line
was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".  
Fixed spelling mistakes and formatting.  Updated define_expand
"rs6000_set_fpscr_rn" to have the rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define expands inlined.  Optimized the
code and fixed use of temporary register values. Updated the test file
dg-do run arguments and dg-options.  Removed the check for
__SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
overloaded built-in with double argument.  Fixed up the documentation
file.  Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
10 LE.

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
__builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 



-
rs6000, Add return value  to __builtin_set_fpscr_rn

Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
RN bit positions.  A new test file, powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value for the built-in.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
built-in definition return type.
* config/rs6000-c.cc (rs6000_target_modify_macros): Add check,
define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return
argument to return FPSCR fields.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value.  Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.

gcc/testsuite/ChangeLog:
gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
test_fpscr_rn_builtin_1.c.  Added comment.
gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   2 +-
 gcc/config/rs6000/rs6000-c.cc 

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-10 Thread Carl Love via Gcc-patches
On Fri, 2023-07-07 at 12:06 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Some more minor comments are inline below on top of Peter's
> insightful
> review comments.
> 
> on 2023/7/1 08:58, Carl Love wrote:
> > GCC maintainers:
> > 
> > Ver 2,  Went back thru the requirements and emails.  Not sure where
> > I
> > came up with the requirement for an overloaded version with double
> > argument.  Removed the overloaded version with the double
> > argument. 
> > Added the macro to announce if the __builtin_set_fpscr_rn returns a
> > void or a double with the FPSCR bits.  Updated the documentation
> > file. 
> > Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the
> > test
> > file.  Per request, the original test file functionality was not
> > changed.  Just changed the name from test_fpscr_rn_builtin.c to 
> > test_fpscr_rn_builtin_1.c.  Put new tests for the return values
> > into a
> > new test file, test_fpscr_rn_builtin_2.c.
> > 
> > The GLibC team requested a builtin to replace the mffscrn and
> > mffscrniinline asm instructions in the GLibC code.  Previously
> > there
> > was discussion on adding builtins for the mffscrn instructions.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> > 
> > In the end, it was felt that it would be to extend the existing
> > __builtin_set_fpscr_rn builtin to return a double instead of a void
> > type.  The desire is that we could have the functionality of the
> > mffscrn and mffscrni instructions on older ISAs.  The two
> > instructions
> > were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has
> > the
> > needed functionality to set the RN field using the mffscrn and
> > mffscrni
> > instructions if ISA 3.0 is supported or fall back to using logical
> > instructions to mask and set the bits for earlier ISAs.  The
> > instructions return the current value of the FPSCR fields DRN, VE,
> > OE,
> > UE, ZE, XE, NI, RN bit positions then update the RN bit positions
> > with
> > the new RN value provided.
> > 
> > The current __builtin_set_fpscr_rn builtin has a return type of
> > void. 
> > So, changing the return type to double and returning the  FPSCR
> > fields
> > DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> > functionally equivalent of the mffscrn and mffscrni
> > instructions.  Any
> > current uses of the builtin would just ignore the return value yet
> > any
> > new uses could use the return value.  So the requirement is for the
> > change to the __builtin_set_fpscr_rn builtin to be backwardly
> > compatible and work for all ISAs.
> > 
> > The following patch changes the return type of the
> >  __builtin_set_fpscr_rn builtin from void to double.  The return
> > value
> > is the current value of the various FPSCR fields DRN, VE, OE, UE,
> > ZE,
> > XE, NI, RN bit positions when the builtin is called.  The builtin
> > then
> > updated the RN field with the new value provided as an argument to
> > the
> > builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c
> > to
> > check that the builtin returns the current value of the FPSCR
> > fields
> > and then updates the RN field.
> > 
> > The GLibC team has reviewed the patch to make sure it met their
> > needs
> > as a drop in replacement for the inline asm mffscr and mffscrni
> > statements in the GLibC code.  T
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >Carl 
> > 
> > 
> > --
> > rs6000, __builtin_set_fpscr_rn add retrun value
> > 
> > Change the return value from void to double.  The return value
> > consists of
> > the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit
> > positions.  Add an
> > overloaded version which accepts a double argument.
> > 
> > The test powerpc/test_fpscr_rn_builtin.c is updated to add tests
> > for the
> > double reterun value and the new double argument.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn):
> > Update
> > builtin definition return type.
> > * config/rs6000-c.cc(rs6000_target_modify_macros): Add check,
> > define
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > * config/rs6000/rs6000.m

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-10 Thread Carl Love via Gcc-patches
On Thu, 2023-07-06 at 17:54 -0500, Peter Bergner wrote:
> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
> > rs6000, __builtin_set_fpscr_rn add retrun value
> 
> s/retrun/return/
> 
> Maybe better written as:
> 
> rs6000: Add return value to __builtin_set_fpscr_rn

Changed subject, fixed misspelling.
> 
> 
> > Change the return value from void to double.  The return value
> > consists of
> > the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit
> > positions.  Add an
> > overloaded version which accepts a double argument.
> 
> You're not adding an overloaded version anymore, so I think you can
> just
> remove the last sentence.

Yup, didn't get that removed when removing the overloaded instance. 
fixed.

> 
> 
> 
> > The test powerpc/test_fpscr_rn_builtin.c is updated to add tests
> > for the
> > double reterun value and the new double argument.
> 
> s/reterun/return/   ...and there is no double argument anymore, so
> that
> part can be removed.

Fixed.  Note, the new return value tests were moved to new test file.
> 
> 
> 
> > * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New
> > define_expand.
> 
> Too many '('.

fixed.

> 
> 
> 
> > (rs6000_set_fpscr_rn): Addedreturn argument.  Updated
> > to use new
> 
> Looks like a  after Added instead of a space.
> 
> 
> > rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
> >  _expands.
> 
> Don't split define_expand across two lines.

Fixed.

> 
> 
> 
> > * doc/extend.texi (__builtin_set_fpscr_rn): Update description
> > for
> > the return value and new double argument.  Add descripton for
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> 
> s/descripton/description/

Fixed.

> 
> 
> 
> 
> 
> 
> > +  /* Tell the user the __builtin_set_fpscr_rn now returns the
> > FPSCR fields
> > + in a double.  Originally the builtin returned void.  */
> 
> Either:
>   1) s/Tell the user the __builtin_set_fpscr_rn/Tell the user
> __builtin_set_fpscr_rn/ 
>   2) s/the __builtin_set_fpscr_rn now/the __builtin_set_fpscr_rn
> built-in now/ 
> 
> 
> > +  if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> > +  rs6000_define_or_undefine_macro (define_p,
> > "__SET_FPSCR_RN_RETURNS_FPSCR__");
> 
> This doesn't look like it's indented correctly.
> 
> 

Fixed indentation.

> 
> 
> > +(define_expand "rs6000_get_fpscr_fields"
> > + [(match_operand:DF 0 "gpc_reg_operand")]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +  /* Extract fields bits 29:31 (DRN) and bits 56:63 (VE, OE, UE,
> > ZE, XE, NI,
> > + RN) from the FPSCR and return them.  */
> > +  rtx tmp_df = gen_reg_rtx (DFmode);
> > +  rtx tmp_di = gen_reg_rtx (DImode);
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > +  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT
> > (0x000700FFULL)));
> > +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> > +  emit_move_insn (operands[0], tmp_rtn);
> > +  DONE;
> > +})
> 
> This doesn't look correct.  You first set tmp_di to a new reg rtx but
> then
> throw that away with the return value of simplify_gen_subreg().  I'm
> guessing
> you want that tmp_di as a gen_reg_rtx for the destination of the
> gen_anddi3, so
> you probably want a different rtx for the subreg that feeds the
> gen_anddi3.

OK, fixed the use of the tmp values.  Note the define_expand was
inlined into define_expand "rs6000_set_fpscr_rn" per comments from
Kewen.  Inlining allows the reuse of some of the tmp values.

> 
> 
> 
> > +(define_expand "rs6000_update_fpscr_rn_field"
> > + [(match_operand:DI 0 "gpc_reg_operand")]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +  /* Insert the new RN value from operands[0] into FPSCR bit
> > [62:63].  */
> > +  rtx tmp_di = gen_reg_rtx (DImode);
> > +  rtx tmp_df = gen_reg_rtx (DFmode);
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> 
> Ditto.

Fixed.

> 
> 
> 
> 
> > +The @code{__builtin_set_fpscr_rn} builtin allows changing both of
> > the floating
> > +point rounding mode bits and returning the various FPSCR fields
> > before the RN
> > +field is updated.  The builtin returns a double consisting of the
> > initial value
> > +of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit
> > positions with all
> > +oth

[PATCH v5] rs6000: Update the vsx-vector-6.* tests.

2023-07-07 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 5. Removed -compile from the names of the compile only tests. Fixed
up the reference to the compile file names in the .h file headers. 
Replaced powerpc_vsx_ok with vsx_hw in the run test files.  Removed the
-save-temps from all files.  Retested on all of the various platforms
with no regressions.

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Changed the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  
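
The approach described in Ver 2 and Ver 3 (a macro that generates a
separate test function, marked noipa so the result is not folded at
compile time) can be sketched roughly as follows.  This is only an
illustration under stated assumptions: op_abs and op_neg are hypothetical
scalar stand-ins for the vec_* built-ins so the sketch builds on any
target, and the macro body is not the one in the patch.

```c
#include <assert.h>  /* Only needed by callers that check the results.  */

/* noipa is a GCC attribute; fall back to noinline on other compilers
   that understand GNU attributes, and to nothing otherwise.  */
#if defined (__GNUC__) && !defined (__clang__)
#define NOIPA __attribute__ ((noipa))
#elif defined (__GNUC__)
#define NOIPA __attribute__ ((noinline))
#else
#define NOIPA
#endif

/* Hypothetical stand-ins for the built-ins under test.  */
static float op_abs (float x) { return x < 0.0f ? -x : x; }
static float op_neg (float x) { return -x; }

/* Generate a global result array f_NAME_result and a function
   test_NAME that applies op_NAME to each of the four elements.
   Because the function is not inlined into its caller, the compiler
   cannot replace the call with a statically computed result.  */
#define FLOAT_TEST(NAME)                                  \
  float f_##NAME##_result[4];                             \
  NOIPA void test_##NAME (const float *src)               \
  {                                                       \
    for (int i = 0; i < 4; i++)                           \
      f_##NAME##_result[i] = op_##NAME (src[i]);          \
  }

FLOAT_TEST (abs)
FLOAT_TEST (neg)
```

A main function would then call test_abs and test_neg on known inputs
and compare the f_*_result arrays against the expected values.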

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector built-in tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in.  There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 
 

Re: [PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-07 Thread Carl Love via Gcc-patches
On Fri, 2023-07-07 at 10:15 +0800, Kewen.Lin wrote:



> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-compile.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-compile.c
> > new file mode 100644
> > index 000..6b7d73ed66c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-
> > compile.c
> 
> Nit: Maybe remove "-compile" from the name as when there is "-run"
> variant people
> are easy to realize this is for compilation, the name without "-
> compile" seems
> more neat.  With this name change, you have to update the comment
> referring it in
> its related header file accordingly.  ("sed -i 's/-compile//g' vsx-
> vector-6-func-*.h"
> recommended, similar patterns could be used for the two other
> comments below.)

Changed the compile only file names as requested.  Updated the file
names in the .h files.  Updated the Change Log file names.
 
> 
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile { target lp64 } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -save-temps -mvsx" } */
> 
> Nit: We don't need "-save-temps" any more for all the test cases in
> this patch.
> 
Yup, -save-temps is on automatically for compile only and we are not
checking instructions in the run file.  Removed all of the -save-temps
directives.

> > +
> > +/* This file just generates calls to the various builtins and
> > verifies the
> > +   expected number of instructions for each builtin were
> > generated.  */
> > +
> > +#include "vsx-vector-6-func-1op.h"
> > +
> > +/* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspip\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspim\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspi\M} 1 } } */ 
> > +/* { dg-final { scan-assembler-times {\mxvrspic\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspiz\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvabsdp\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpip\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpim\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpi\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpic\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpiz\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvsqrtdp\M} 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-run.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-
> > run.c
> > new file mode 100644
> > index 000..150e372e428
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-run.c
> > @@ -0,0 +1,98 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
> We need vsx_hw for those *-run.c cases instead, as powerpc_vsx_ok
> doesn't guarantee the test env can support vsx instructions, it just
> ensures it can be compiled.
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> All "*-run.c" cases need changes.

Updated the run cases to use vsx_hw, removed powerpc_vsx_ok.

 Carl 



[PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-07 Thread Carl Love via Gcc-patches


GCC maintainers:

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions was changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the -flax-vector-
conversions gcc command line option.  Since the other tests do not
require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned
was
   implemented with the same function prototypes as
vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output
vectors as having
   type vector unsigned char, to emphasize that elements are
potentially
   misaligned by this built-in function.  This patch corrects the
   misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
* config/rs6000/rs6000-c.cc (find_instance): Add new argument
nargs.  Add nargs check.  Extend function to handle three arguments.
(altivec_resolve_overloaded_builtin): Add new argument nargs to
function calls.  Add case RS6000_OVLD_VEC_REPLACE_UN.
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.  Rename VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
VREPLACE_UN_V2DF as VREPLACE_UN_DF.
* config/rs6000/vsx.md (VEC_RU): New mode iterator.
(VEC_RU_char): New mode attribute.
(vreplace_un_): Change iterator and mode attribute.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
vec-replace-word-runnable_1.c.
* 

Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-07-07 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-06-19 at 11:50 +0800, Kewen.Lin wrote:
> > generated the vinsd instruction for the two calls with the first
> > argument of unsigned long long int.  When the first argument of the
> > builtin is changed to the correct type, vector unsigned char the
> > builtin generates the vinsw instruction instead.  The change occurs
> > in
> > two places resulting in reducing the counts for vinsd by two and
> > increasing the counts for vinsw by two.  The other calls to the
> > builtin
> > are either vector ints or vector floats which generate the vinsw
> > instruction.  Changing the first argument in those calls to vector
> > unsigned char still generate the vinsw instruction.
> 
> But it did expose something odd and needed to be handled in this
> change.
> I had a further check, for the below test case:
> 
> #include "altivec.h"
> 
> #ifdef ORIG
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned (v, val, 7);
> }
> #else
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned ((vector unsigned char)v, val, 7);
> }
> #endif
> 
> Without this patch (-DORIG required to match the previous prototype),
> it would generate vinsd; while with this proposed patch, it would
> generate vinsw.  I think it's unexpected since users can still have
> the need to replace a doubleword size of chunk but give a constant
> which can be represented by int.  The previous way can support it,
> while the new way can't.  So we should have some way to distinguish
> it, we have some special-casing in function
> altivec_resolve_overloaded_builtin, could you have a check and try
> there?  Thanks!

I added the needed handling in altivec_resolve_overloaded_builtin to
address the issue with the built-in generating the correct instruction
for the unsigned long long cases in the test file.  I added an
additional test file with the above test case.  It was put into a new
test file as it requires the -flax-vector-conversions argument.  I felt
that it was best to separate the tests that need/do not need the -flax-
vector-conversions argument.
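
As a rough illustration of the selection rule (the instruction follows
the size of the second argument, not the type of the first), here is a
small C11 model.  It is not the code in
altivec_resolve_overloaded_builtin; REPLACE_UN_INSN is a hypothetical
macro, and the long long and double entries are an assumption based on
their doubleword size.

```c
#include <assert.h>
#include <string.h>

/* Illustrative model only: map the C type of the scalar (second)
   argument of vec_replace_unaligned to the insert instruction one
   would expect.  Doubleword-sized types map to vinsd, word-sized
   types map to vinsw.  */
#define REPLACE_UN_INSN(val)                     \
  _Generic ((val),                               \
            unsigned long long: "vinsd",         \
            long long:          "vinsd",         \
            double:             "vinsd",         \
            unsigned int:       "vinsw",         \
            int:                "vinsw",         \
            float:              "vinsw")

/* Convenience wrapper so callers can compare mnemonics.  */
static int insn_is (const char *insn, const char *expected)
{
  assert (insn != NULL && expected != NULL);
  return strcmp (insn, expected) == 0;
}
```

With this model, the test case quoted above (val declared as unsigned
long long with the value 678ull) maps to vinsd even though the constant
itself would fit in an int.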

Note, adding the additional case statement RS6000_OVLD_VEC_REPLACE_UN
to handle the three argument built-in vec_replace_unaligned in
altivec_resolve_overloaded_builtin exposed an issue with function
find_instance.  Function find_instance assumes there are only two
arguments in the builtin.  There are no checks on the actual number of
arguments used by the built-in. This leads to an error in
tree_operand_check_failed() when using find_instance.  The find_instance
function was extended to handle 2 or 3 arguments with a check to make
sure the number of arguments is either 2 or 3.

FYI, I also noticed in the current patch the names in rs6000-
builtins.def and rs6000-overload.def for builtin_altivec_vreplace_un
still reflect the type of the first argument.  The current patch
changes the first argument to vuc, but the naming didn't all get
updated.  I think the names should be changed to reflect the type of
the second argument since the first arguments are all identical.  For
example:
 
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3388,29 +3388,29 @@
   const vull __builtin_altivec_vpextd (vull, vull);
 VPEXTD vpextd {}
 
   -  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
   - const int<4>);
   -VREPLACE_UN_UV2DI vreplace_un_v2di {}
   +  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long long, \
   +   const int<4>);
   +VREPLACE_UN_UDI vreplace_un_di {}
 
 The name changes will ripple thru files rs6000-builtins.def, rs6000-
 overload.def and vsx.md.

I did all the renaming as well in the new version 3 of the patch.

 Carl 



[PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
GCC maintainers:

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Changed the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the builtin argument
types and the functional correctness of each builtin.  There is also a
compile only test that verifies the builtins generate the expected number
of instructions for the various builtin tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 
 22 files changed, 1267 insertions(+), 282 deletions(-)
 create mode 100644 

Re: [PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches
Kewen:

On Tue, 2023-07-04 at 10:49 +0800, Kewen.Lin wrote:
> 



> > 
> > The tests are broken up into a seriers of files for related
> > tests.  The
> 
> s/seriers/series/

Fixed

> 
> > new tests are runnable tests to verify the builtin argument types
> > and the
> > functional correctness of each test rather then verifying the type
> > and
> > number of instructions generated.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> 
> Missing "func-" in the names ...

Fixed.

> 
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> 
> should be vsx-vector-6-p{7,8,9}.c, "git gcc-verify" should catch
> these.

Fixed, ran git gcc-verify which found a couple more little file name
typos.
> 
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 217
> > +++
> >  .../powerpc/vsx-vector-6-func-2op.c   | 133 +
> >  .../powerpc/vsx-vector-6-func-3op.c   | 257
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
> >  10 files changed, 1080 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..52c7ae3e983
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > @@ -0,0 +1,141 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-options "-O2 -save-temps" } */
> 
> I just noticed that we missed an effective target check here to
> ensure the
> support of those bifs during the test run, and since it's a runnable
> test
> case, also need to ensure the generated hw insn supported, it's
> "vsx_hw"
> like:
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> And adding "-mvsx" to the dg-options.

Added the effective-target and -mvsx to all of the tests.

> 
> This is also applied for the other test cases.
> 
> But as the discussion on xxlor and the different effective target
> requirements
> on compilation part and run part, I think we can separate each of
> these cases into
> two files, one for compilation and the other for run, for example,
> for this
> case, update FLOAT_TEST by adding one more global variable like
> 
> #define FLOAT_TEST(NAME)
>   vector float f_##NAME##_result; \
>   void ... \
>   f_##NAME##_result = vec_##NAME(f_src);\
>   }
>   // moving the checking code to its main.
> 
> move #include , FLOAT_TEST(NAME), DOUBLE_TEST(NAME)
> defines
> and their uses into vsx-vector-6-func-1op.h.
> 
> 
> **For compilation file vsx-vector-6-func-1op.c**:
> 
> Include this header file into vsx-vector-6-func-1op.c, which has the
> 
> /* { dg-do compile { target lp64 } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O2 -mvsx" } */
> 
> #include "vsx-vector-6-func-1op.h"
> 
> Then put the expected insn check here, like 
> 
> /* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> ...
> 
> By organizing it like this, these scan-assembler-times would only
> focus on what
> are generated for bifs (excluding possible noises from main function
> for running).
> 
> 
> **For runnable file 

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-07-03 Thread Carl Love via Gcc-patches
Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> Segher never liked the above way of looking at the assembly.  He
> prefers:
>   gcc -S -g -mcpu=power8 -o vsx-vector-6-func-2lop.s vsx-vector-6-
> func-
> 2lop.c
> 
>   grep xxlor vsx-vector-6-func-2lop.s | wc
>  34  68 516
> 
> So, again, I get the same count of 34 on both makalu and genoa.  But
> again, that doesn't agree with what make script/scan-assembler thinks
> the counts should be.
> 
> When I looked at the vsx-vector-6-func-2lop.s I see on BE:
> 
>  
> lxvd2x 0,10,9
> xxlor 0,12,0
> xxlnor 0,0,0
>  ...
> 
> I was guessing that it was adjusting the data layout from the load. 
> But looking again more carefully versus LE:
> 
> 
> lxvd2x 0,31,9 
>xxpermdi 0,0,0,2 
>xxlor 0,12,0  
>xxlnor 0,0,0  
>xxpermdi 0,0,0,2 
> 
> 
> the xxpermdi is probably what is really doing the data layout change.
> 
> So, we have the issue that looking at the assembly gives different
> instruction counts then what 
> 
>dg-final { scan-assembler-times {\mxxlor\M} }
> 
> comes up with???  Now I am really confused.  I don't know how the
> scan-
> assembler-times works but I will go see if I can find it and see if I
> can figure out what the issue is.  I would expect that the scan-
> assembler is working off the --save-temp files, which get deleted as
> part of the run.  I would guess that scan-assembler does a grep to
> find
> the instructions and then maybe uses wc to count them??? I will go
> see
> if I can figure out how scan-assembler-times works.

OK, I figured out why I was getting 34 xxlor instructions instead of
the 22 that the scan-assembler-times was getting.  The difference was
when I compiled the program I forgot to use -O2.  So with -O2 I get the
same number of xxlor instructions as scan-assembler-times.  I get
34 if I do not specify optimization.

So, I think the scan-assembler-times are all correct.

As Peter says, counting xxlor is a bit problematic in general.  We
could just drop counting xxlor or have the LE/BE count qualifier for
the instructions.  Your call.
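
For reference, the counting that a directive such as
scan-assembler-times {\mxxlor\M} performs can be modelled roughly in C
as below.  This is only a sketch: the real check is a Tcl regexp run
over the assembler output by the testsuite harness, and count_insn is a
hypothetical helper.  The point is that the \m and \M word anchors stop
a mnemonic such as xvrspi from also matching xvrspip or xvrspim.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word characters for the purpose of \m and \M: letters, digits and
   underscore.  */
static int is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count the occurrences of MNEMONIC in ASM_TEXT that start and end on
   a word boundary, i.e. that are not part of a longer word.  */
static int count_insn (const char *asm_text, const char *mnemonic)
{
  assert (asm_text != NULL && mnemonic != NULL);

  size_t len = strlen (mnemonic);
  int count = 0;

  if (len == 0)
    return 0;

  for (const char *p = strstr (asm_text, mnemonic); p != NULL;
       p = strstr (p + 1, mnemonic))
    {
      int starts_word = (p == asm_text) || !is_word_char (p[-1]);
      int ends_word = !is_word_char (p[len]);
      if (starts_word && ends_word)
        count++;
    }
  return count;
}
```

Running this over the assembly produced with and without -O2 is one way
to see the kind of difference in counts described above.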

 Carl 



[PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-06-30 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be best to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value, yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
__builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
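
A caller that has to build with both old and new compilers could key off
the __SET_FPSCR_RN_RETURNS_FPSCR__ macro roughly as follows.  This is
only a sketch: the wrapper name set_rn_get_old_fields is made up for
illustration, the GCC-only guard is an assumption of the sketch, and on
targets other than PowerPC the code simply reports that no old value is
available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: set the rounding mode to RN (0..3).  Returns 1
   and stores the previous FPSCR field bits in *old_bits when the
   compiler's builtin returns them; otherwise returns 0 and leaves
   *old_bits untouched.  */
static int
set_rn_get_old_fields (int rn, uint64_t *old_bits)
{
  assert (rn >= 0 && rn <= 3);
#if defined (__powerpc__) && defined (__GNUC__) && !defined (__clang__)
# if defined (__SET_FPSCR_RN_RETURNS_FPSCR__)
  /* New behaviour: the builtin returns the old fields as a double.  */
  double old = __builtin_set_fpscr_rn (rn);
  memcpy (old_bits, &old, sizeof (old));
  return 1;
# else
  /* Old behaviour: the builtin returns void.  */
  __builtin_set_fpscr_rn (rn);
  (void) old_bits;
  return 0;
# endif
#else
  /* Not a PowerPC GCC build: nothing to set in this sketch.  */
  (void) old_bits;
  return 0;
#endif
}
```

The bits are copied out with memcpy because the double is only a
container for the FPSCR field bits, not a numeric value.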

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


--
rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
builtin definition return type.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check,
define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn): Added return argument.  Updated to use new
rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field
define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.  Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
test_fpscr_rn_builtin_1.c.  Added comment.
* gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   2 +-
 gcc/config/rs6000/rs6000-c.cc |   4 +
 gcc/config/rs6000/rs6000.md   |  87 +++---
 gcc/doc/extend.texi   |  26 ++-
 ...rn_builtin.c => test_fpscr_rn_builtin_1.c} |   6 +
 .../powerpc/test_fpscr_rn_builtin_2.c | 153 ++
 6 files changed, 246 insertions(+), 32 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => 
test_fpscr_rn_builtin_1.c} (92%)
 create mode 100644 

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches
Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> So, went to look at the assembly to verify my comment on the
> difference
> being related to the loads. I decided to actually count the
> instructions just to verify the number in the assembly files. 
> Before,
> I just looked at the assembly briefly but didn't dig in very deep.
> 
> If I compile the tests and dump the assembly with:
>   gcc -g -mcpu=power8 -o vsx-vector-6-func-2lop vsx-vector-6-func-
> 2lop.c
> 
>   objdump -S -d vsx-vector-6-func-2lop > vsx-vector-6-func-2lop.dump
>   
>   grep xxlor vsx-vector-6-func-2lop.dump | wc
>   4  28 192
> 
> So we see 4 xxlor instructions not 32 as expected for BE or 22 as
> expected for LE as the test claims.  I get the same count of 4 on
> both
> makalu and on genoa. 

With a little help from Peter and Julian Wang, I found that objdump
decodes some of the xxlor instructions as xxmr instructions.  The xxmr
is a new mnemonic which will be out in the next ISA, but objdump already
produces it.  So if you add the counts for grep xxlor and grep xxmr you
get a total of 34, which agrees with the count of xxlor in the gcc -S
generated assembly.
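
The same tally can be sketched in C.  The helper below counts whole-word
occurrences of a mnemonic in a text buffer, so a caller can add the xxlor
and xxmr counts together.  The function names are made up for
illustration and are not part of any test in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if C can be part of a mnemonic token.  */
static int
is_word_char (int c)
{
  return isalnum (c) || c == '_';
}

/* Count occurrences of MNEMONIC in TEXT that are not embedded in a
   longer word, e.g. "xxlor" does not match inside "xxlorc".  */
static size_t
count_mnemonic (const char *text, const char *mnemonic)
{
  size_t len = strlen (mnemonic);
  size_t count = 0;
  const char *p = text;

  assert (len > 0);
  while ((p = strstr (p, mnemonic)) != NULL)
    {
      int start_ok = (p == text) || !is_word_char ((unsigned char) p[-1]);
      int end_ok = !is_word_char ((unsigned char) p[len]);

      if (start_ok && end_ok)
        count++;
      p++;
    }
  return count;
}
```

Calling count_mnemonic (dump, "xxlor") + count_mnemonic (dump, "xxmr")
on the objdump text gives the combined figure to compare against the
xxlor count in the gcc -S output.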

  Carl 



Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches
Kewen:

On Fri, 2023-06-30 at 11:37 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/30 05:36, Carl Love wrote:
> > Kewen:
> > 
> > On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > > > Yea, I was going with a runnable test and didn't include the
> > > > instruction counts.  Added back in.  Rather than doing by
> > > > processor
> > > > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > > > instruction
> > > > counts were the same for LE across processor versions but
> > > > there
> > > > are a
> > > > few instruction counts that vary with BE and LE.
> > > 
> > > But the original test case only checks for cpu-types (processor
> > > version)
> > > but not for endianness, it means for the bif usages, there should
> > > not
> > > be
> > > different for endianness.  Why does this changes with your new
> > > test
> > > case?
> > > Could you have a further look and make it consistent with some
> > > adjustment
> > > if possible?  As we know, checking insn counts sometimes are
> > > fragile,
> > > so
> > > I think we should try our best to make it as robust as possible
> > > in
> > > the
> > > first place.
> > > 
> > > Besides, the original case also have some differences between
> > > p7/p8
> > > and
> > > p9.
> > >   
> > 
> > There are differences on P8 LE versus BE.  I did a diff between the
> > P8
> > and P9 tests:
> > 
> >  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> > 3,4c3,4
> > < /* { dg-require-effective-target powerpc_p8vector_ok } */
> > < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> > ---
> > > /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > 12c12
> > < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } }
> > > */
> > 23d22
> > < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 37c36
> > < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 
> > So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> > xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are
> > different
> > between the two architectures.  I then wrote a script to compile
> > the
> > CPU specific test on Power 8, Power 9 and Power 10 architectures
> > and
> > then grep for the above list of instructions.  If I run the script
> > on P8
> > BE  and LE I get
> > 
> >               Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
> >               (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
> > instruction   count         count       count       count       count
> > vperm         1             1           0           0           0
> > vpermr        0             0           0           0           0
> > xxpermr       0             0           1           0           1
> > xvmsubadp     1             0           1           1           1
> > xvmsubmdp     0             1           0           0           0
> > xvsubdp       1             1           1           1           1
> > 
> 
> Thanks for looking into this and making this statistics.
> 
> Is there a typo for column nilram?   Otherwise, the below insn check
> 
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 
> would fail there.

Yes, there is a typo in the nilram column.  The test generates a vperm
instruction.

#if defined (__BIG_ENDIAN__) || defined (_ARCH_PWR9)
  dst[8].d = vec_perm (src0[8].d, src1[8].d, src2[8].uc);
 f74:   e9 3f 00 78     ld      r9,120(r31)
 f78:   39 29 07 00     addi    r9,r9,1792
 f7c:   f5 89 00 01     lxv     vs12,0(r9)
 f80:   e9 3f 00 80     ld      r9,128(r31)
 f84:   39 29 07 00     addi    r9,r9,1792
 f88:   f4 09 00 01     lxv     vs0,0(r9)
 f8c:   e9 3f 00 88     ld      r9,136(r31)
 f90:   39 29 07 00     addi    r9,r9,1792
 f94:   f4 09 00 89     lxv     vs32,128(r9)
 f98:   e9 3f 00 70     ld      r9,112(r31)
 f9c:   39 29 07 00     addi    r9,r9,1792
 fa0:   f0 2c

[PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches
GCC maintainers:

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Changed the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h,
vsx-vector-6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series
of smaller test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 217 +++
 .../powerpc/vsx-vector-6-func-2op.c   | 133 +
 .../powerpc/vsx-vector-6-func-3op.c   | 257 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
 .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1080 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..52c7ae3e983
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,141 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+/* Macro to check the results for the various floating point argument tests.
+ */
+#define FLOAT_TEST(NAME)  \
+  void __attribute__ ((noipa))\
+  float_##NAME (vector float f_src, vector float f_##NAME##_expected) \
+  {  \
+vector float f_result = vec_##NAME(f_src);   \
+  \
+if 

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches
Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather than doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > instruction
> > counts were the same for LE across processor versions but there
> > are a
> > few instruction counts that vary with BE and LE.
> 
> But the original test case only checks for cpu-types (processor
> version)
> but not for endianness, it means for the bif usages, there should not
> be
> different for endianness.  Why does this changes with your new test
> case?
> Could you have a further look and make it consistent with some
> adjustment
> if possible?  As we know, checking insn counts sometimes are fragile,
> so
> I think we should try our best to make it as robust as possible in
> the
> first place.
> 
> Besides, the original case also have some differences between p7/p8
> and
> p9.
>   

There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

 diff vsx-vector-6.p8.c vsx-vector-6.p9.c
3,4c3,4
< /* { dg-require-effective-target powerpc_p8vector_ok } */
< /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
---
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
12c12
< /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
23d22
< /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
37c36
< /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
between the two architectures.  I then wrote a script to compile the
CPU specific test on Power 8, Power 9 and Power 10 architectures and
then grep for the above list of instructions.  If I run the script on P8
BE  and LE I get


              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
instruction   count         count       count       count       count
vperm         1             1           0           0           0
vpermr        0             0           0           0           0
xxpermr       0             0           1           0           1
xvmsubadp     1             0           1           1           1
xvmsubmdp     0             1           0           0           0
xvsubdp       1             1           1           1           1


From the diff we see 

  { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE
so this "masks" the LE/BE difference.  I changed the check in
vsx-vector-6-func-3op.c to match.  This eliminates the LE and BE checks
and reduces the number of specific checks.

In vsx-vector-6-func-3op.c, the new test checks the counts for
xxpermdi, which the original test does not check.  The checks for
xxpermdi are not needed.  They are not directly related to the builtin
tests.  I removed them.

Looking at the LE/BE checks in the other test file,
vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
not checked in the original test.  The functions where these
instructions are used get inlined.  On LE, the binary instructions show
up in the inlined code as well as what appears to be the binary for the
original, non-inlined function.  Best I can see, the binary for the
original function is dead code.  I don't see any calls to it.  Seems
like it shouldn't be there, as removing it would make the binary
smaller.  On BE, I don't see the binary for the original non-inlined
function.  

I had played with putting -Wno-inline on the command line but that
didn't seem to make any difference.  However, your suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up.  Preventing the inlining
eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
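
A minimal sketch of the pattern, assuming the GCC or Clang vector
extensions are available: the test body is generated by a macro as a
separate function marked noipa, so the compiler cannot inline it or fold
the result at compile time.  The op_neg and op_dbl helpers are stand-ins
for the real vec_* builtins so that the sketch builds on any target; they
and the NO_IPA macro are not part of the patch.

```c
#include <assert.h>

/* noipa is a GCC attribute; fall back to noinline elsewhere.  */
#if defined (__GNUC__) && !defined (__clang__)
#define NO_IPA __attribute__ ((noipa))
#else
#define NO_IPA __attribute__ ((noinline))
#endif

typedef float v4sf __attribute__ ((vector_size (16)));

/* Stand-ins for the builtins under test.  */
static inline v4sf op_neg (v4sf a) { return -a; }
static inline v4sf op_dbl (v4sf a) { return a + a; }

/* Generate float_NAME, which applies op_NAME and returns the number of
   elements that differ from the expected vector.  */
#define FLOAT_TEST(NAME)                                \
  int NO_IPA                                            \
  float_##NAME (v4sf src, v4sf expected)                \
  {                                                     \
    v4sf result = op_##NAME (src);                      \
    int errors = 0;                                     \
                                                        \
    for (int i = 0; i < 4; i++)                         \
      if (result[i] != expected[i])                     \
        errors++;                                       \
    return errors;                                      \
  }

FLOAT_TEST (neg)
FLOAT_TEST (dbl)

/* Run the generated checks; a mismatch trips the assert.  */
static void
run_float_tests (void)
{
  v4sf src = { 1.0f, -2.0f, 3.0f, -4.0f };

  assert (float_neg (src, (v4sf) { -1.0f, 2.0f, -3.0f, 4.0f }) == 0);
  assert (float_dbl (src, (v4sf) { 2.0f, -4.0f, 6.0f, -8.0f }) == 0);
}
```

Because the source and expected vectors arrive as arguments to a
function the optimizer must treat as opaque, the operation has to be
emitted in that function's body exactly once.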

The instruction count test for xxlor in vsx-vector-6-func-2lop.c
differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
instruction is used with loads to reorder the data.  I don't see any way
to get around the extra xxlor instructions and verify the vec_or
builtin test generates the instruction.  

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsub*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.  We could go back to checking for
specific instructions being generated on Power 8, Power 9, Power 10 if
you prefer not using checks that cover multiple flavors of a given

[PATCH ver 2] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h,
vsx-vector-6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series
of smaller test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 156 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 223 ++
 .../powerpc/vsx-vector-6-func-2op.c   | 142 +
 .../powerpc/vsx-vector-6-func-3op.c   | 273 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 205 +
 .../powerpc/vsx-vector-6-func-cmp.c   | 130 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 --
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1129 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..0d4e237673b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,156 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+  /* Macro to check the results for the various floating point argument tests.
+   */
+#define FLOAT_CHECK(NAME)  \
+  f_result = vec_##NAME(f_src);\
+   \
+  if ((f_result[0] != f_##NAME##_expected[0]) ||   \
+  (f_result[1] != f_##NAME##_expected[1]) ||   \
+  (f_result[2] != f_##NAME##_expected[2]) ||   \
+  (f_result[3] != f_##NAME##_expected[3])) \
+{  \
+  if (DEBUG) { \
+printf("ERROR: vec_%s (float) expected value does not match\n",\
+   

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches
On Mon, 2023-06-19 at 15:17 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:46, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch takes the tests in vsx-vector-6-p7.h,  vsx-
> > vector-
> > 6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of
> > smaller
> > test files by functionality rather than processor version.
> > 
> > The patch has been tested on Power 10 with no regressions.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> >Carl
> > 
> > --
> > rs6000: Update the vsx-vector-6.* tests.
> > 
> > The vsx-vector-6.h file is included into the processor specific
> > test files
> > vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The
> > .h file
> > contains a large number of vsx vector builtin tests.  The processor
> > specific files contain the number of instructions that the tests
> > are
> > expected to generate for that processor.  The tests are compile
> > only.
> > 
> > The tests are broken up into a series of files for related
> > tests.  The
> > new tests are runnable tests to verify the builtin argument types
> > and the
> 
> But the newly added test cases are all with "dg-do compile", it
> doesn't
> match what you said here.

Ah, yea, that is wrong.  Fixed.

> 
> > functional correctness of each test rather than verifying the type
> > and
> > number of instructions generated.
> 
> It's good to have more coverage with runnable case, but we miss some
> test
> coverages on the expected insn counts which cases p{7,8,9}.c can
> provide
> originally.  Unless we can ensure it's already tested somewhere else
> (do
> we? it wasn't stated in this patch), I think we still need those
> checks.

Yea, I was going with a runnable test and didn't include the
instruction counts.  Added back in.  Rather than doing by processor
version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
counts were the same for LE across processor versions but there are a
few instruction counts that vary with BE and LE.  

I did notice in one of the tests that the compiler computed the
answers at compile time and thus didn't actually generate the builtin
code.  After digging a little more I found a few more tests where the
compiler was doing the calculations and just inserting the answers.

So, I moved all of the tests to functions so the compiler would
actually generate the desired builtin code.  

> 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 319 +
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 305 +
> >  .../powerpc/vsx-vector-6-func-2op.c   | 278 
> >  .../powerpc/vsx-vector-6-func-3op.c   | 229 ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 429
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 237 ++
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 --
> >  10 files changed, 1797 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  

[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches


Kewen, GCC maintainers:

Version 6, Fixed missing change log entry.  Changed builtin id names as
requested.  Missed making the change on the last version.  Fixed
comment in the three test cases.  Reran regression suite on Power 10,
no regressions.

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.
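
For readers less familiar with the IEEE 128-bit layout, the field
handling behind these builtins can be modelled in portable C.  This is a
sketch under stated assumptions, not the patch's implementation: the
value is held as two 64-bit halves in a made-up f128_bits struct, vector
element order and endianness are ignored, and the implicit bit is
assumed to be set only for normal values.

```c
#include <assert.h>
#include <stdint.h>

/* An IEEE binary128 value as two 64-bit halves: sign (1 bit), exponent
   (15 bits) and the top 48 fraction bits live in HI, the rest in LO.  */
typedef struct { uint64_t hi, lo; } f128_bits;

#define EXP_MASK 0x7FFFULL
#define FRAC_HI  0x0000FFFFFFFFFFFFULL
#define IMPLICIT 0x0001000000000000ULL

/* Model of the exponent extraction: the biased 15-bit exponent.  */
static uint64_t
model_extract_exp (f128_bits x)
{
  return (x.hi >> 48) & EXP_MASK;
}

/* Model of the significand extraction: the fraction bits, plus the
   implicit leading 1 for normal values (assumed clear for zero,
   denormal, Inf and NaN).  */
static f128_bits
model_extract_sig (f128_bits x)
{
  uint64_t exp = model_extract_exp (x);
  f128_bits sig = { x.hi & FRAC_HI, x.lo };

  if (exp != 0 && exp != EXP_MASK)
    sig.hi |= IMPLICIT;
  return sig;
}

/* Model of the exponent insertion: keep the sign and fraction of SRC
   and replace the exponent field with the low 15 bits of EXP.  */
static f128_bits
model_insert_exp (f128_bits src, uint64_t exp)
{
  f128_bits r = src;

  r.hi = (src.hi & ~(EXP_MASK << 48)) | ((exp & EXP_MASK) << 48);
  return r;
}

/* Example: 1.0 has biased exponent 0x3FFF; bumping it by one gives 2.0.  */
static void
model_self_check (void)
{
  f128_bits one = { 0x3FFF000000000000ULL, 0 };
  f128_bits two = model_insert_exp (one, model_extract_exp (one) + 1);

  assert (model_extract_exp (one) == 0x3FFF);
  assert (two.hi == 0x4000000000000000ULL && two.lo == 0);
}
```

The model keeps every result in the same two-word container, mirroring
the idea that the real results stay in vector registers instead of being
moved to a scalar type.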

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
__builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_ to xsxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 

Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> > 



> Hi Carl,
> 
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
> 
> Missing an entry for DI_to_TI.

Oops, missed that.  Sorry, fixed.

> > 



> 
> >  
> >const signed long long __builtin_vsx_scalar_extract_expq
> > (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > +  vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
> 
> As I pointed out previously, the related id is VSEEQP, since both of
> them

Oops, I guess I forgot to change that.  Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF?  It's also consistent with
> what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.
> 
> >  
> >const signed __int128 __builtin_vsx_scalar_extract_sigq
> > (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > +  vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
> 
> Similar to the above, s/VSESIGKF/VSESQPV/
 
Changed to VSESQPV.
> 
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned
> > __int128, \
> >   unsigned long
> > long);
> > -VSIEQP xsiexpqp_kf {}
> > +VSIEQP xsiexpqp_kf_di {}
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
> >unsigned
> > long long);
> >  VSIEQPF xsiexpqpf_kf {}
> >  
> > +  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> > +VSIEQPV xsiexpqp_kf_v2di {}
> > +
> >const signed int __builtin_vsx_scalar_test_data_class_qp
> > (_Float128, \
> >  const
> > int<7>);
> >  VSTDCQP xststdcqp_kf {}
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..11060f697db 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >128-bit variant of built-in function.  */
> > if (GET_MODE_PRECISION (arg1_mode) > 64)
> >   {
> > -   /* If first argument is of float variety, choose variant
> > -  that expects __ieee128 argument.  Otherwise, expect
> > -  __int128 argument.  */
> > +   /* If first argument is of float variety, choose the
> > variant that
> > +  expects __ieee128 argument.  If the first argument is
> > vector
> > +  int, choose the variant that expects vector unsigned
> > +  __int128 argument.  Otherwise, expect scalar __int128
> > argument.
> > +   */
> > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> >   instance_code = RS6000_BIF_VSIEQPF;
> > +   else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT)
> > + instance_code = RS6000_BIF_VSIEQPV;
> > else
> >   instance_code = RS6000_BIF_VSIEQP;
> >   }
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..05a5ca6a04d 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -4515,6 +4515,18 @@
> >  VSIEQP
> >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> > long long);
> >  VSIEQPF
> > +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
> > +VSIEQPV
> > +
> > +[VEC_VSEEV, scalar_extract_exp_to_vec, \
> > +__builtin_vec_scalar_extract_exp_to_vector]
> > +  vull __builtin_vec_scalar_extract_exp_to_vector (_Float128);
> > +VSEEXPKF
> > +
> 
> Need to update if the above changes.

changed 
> 
> > +[VEC_VSESV, scalar_extract_sig_to_vec, \
> > +__builtin_vec_scalar_extract_sig_to_vector]
> > +  vuq __builtin_vec_scalar_extract_sig_to_v

[PATCH] rs6000, __builtin_set_fpscr_rn add return value

2023-06-19 Thread Carl Love via Gcc-patches
GCC maintainers:


The GLibC team requested a builtin to replace the mffscrn and mffscrni inline
asm instructions in the GLibC code.  Previously there was discussion on adding
builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.
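
The "return the current fields, then update RN" behaviour described above can be modelled in portable C.  This is only a sketch under stated assumptions: it works on a simulated 64-bit FPSCR image rather than the real register, the name sim_set_fpscr_rn and both mask constants are hypothetical, and the bit layout (RN in the two low-order bits, NI through VE in the next six, DRN in three bits starting at bit 32, counting from the least significant bit) is my reading of the Power ISA.  The real builtin returns the fields in a double; the sketch returns the raw 64-bit image instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, counting from the least significant bit:
   RN = bits 1:0, NI..VE = bits 7:2, DRN = bits 34:32.  */
#define SIM_FPSCR_RN_MASK     UINT64_C(0x0000000000000003)
#define SIM_FPSCR_FIELDS_MASK UINT64_C(0x00000007000000FF)

/* Model of "return the current control fields, then set RN".
   fpscr points at a simulated register image, not the hardware FPSCR.  */
uint64_t
sim_set_fpscr_rn (uint64_t *fpscr, unsigned int new_rn)
{
  /* The integer form of the builtin only accepts values 0 to 3.  */
  assert (new_rn <= 3);

  /* Capture DRN, VE, OE, UE, ZE, XE, NI and RN before any change.  */
  uint64_t old_fields = *fpscr & SIM_FPSCR_FIELDS_MASK;

  /* Clear the RN bits and insert the new rounding mode; every other
     bit of the image is left untouched.  */
  *fpscr = (*fpscr & ~SIM_FPSCR_RN_MASK)
           | ((uint64_t) new_rn & SIM_FPSCR_RN_MASK);

  return old_fields;
}
```

A caller that ignores the result sees the same effect as the old void builtin, which is the backward compatibility argument made in this mail.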

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value, yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
__builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.

The GLibC team has reviewed the patch to make sure it met their needs
as a drop-in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Delete.
(__builtin_set_fpscr_rn_i): New builtin definition.
(__builtin_set_fpscr_rn_d): New builtin definition.
* config/rs6000/rs6000-overload.def (__builtin_set_fpscr_rn): New
overloaded definition.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn_d): New define_expand.
(rs6000_set_fpscr_rn_i): Renamed from rs6000_set_fpscr_rn, Added
return argument.  Updated to use new rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Add new tests to check
double return value.  Add tests for overloaded double argument.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   6 +
 gcc/config/rs6000/rs6000.md   | 122 ---
 gcc/doc/extend.texi   |  25 ++-
 .../powerpc/test_fpscr_rn_builtin.c   | 143 +-
 5 files changed, 262 insertions(+), 41 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 289a37998b1..30e0b0bb06d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -237,8 +237,11 @@
   const __ibm128 __builtin_pack_ibm128 (double, double);
 PACK_IF packif {ibm128}
 
-  void __builtin_set_fpscr_rn (const int[0,3]);
-SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
+  double __builtin_set_fpscr_rn_i (const int[0,3]);
+SET_FPSCR_RN_I rs6000_set_fpscr_rn_i {nosoft}
+
+  double __builtin_set_fpscr_rn_d (double);
+SET_FPSCR_RN_D rs6000_set_fpscr_rn_d {nosoft}
 
   const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
 UNPACK_IF unpackif {ibm128}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 

[PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.
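
What the extract and insert operations compute can be sketched in portable C on the raw bit image of an IEEE binary128 value.  This is a sketch under stated assumptions, not the patch's implementation: the type sim_f128_bits and the sim_* helpers are hypothetical names, the value is held as two 64-bit halves instead of a vector register, and the handling of the implicit bit (set only for normal numbers) reflects my understanding of the underlying instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Raw image of an IEEE binary128 value: hi holds the sign, the 15-bit
   biased exponent and the top 48 fraction bits; lo holds the low 64
   fraction bits.  */
typedef struct { uint64_t hi, lo; } sim_f128_bits;

#define SIM_EXP_SHIFT    48
#define SIM_EXP_MASK     UINT64_C(0x7FFF)
#define SIM_FRAC_HI_MASK UINT64_C(0x0000FFFFFFFFFFFF)
#define SIM_SIGN_MASK    UINT64_C(0x8000000000000000)

/* Return the biased exponent field.  */
uint64_t
sim_extract_exp (sim_f128_bits v)
{
  return (v.hi >> SIM_EXP_SHIFT) & SIM_EXP_MASK;
}

/* Return the significand, with the implicit leading bit made explicit
   for normal numbers (assumed behaviour).  */
sim_f128_bits
sim_extract_sig (sim_f128_bits v)
{
  uint64_t exp = sim_extract_exp (v);
  sim_f128_bits sig = { v.hi & SIM_FRAC_HI_MASK, v.lo };

  if (exp != 0 && exp != SIM_EXP_MASK)
    sig.hi |= UINT64_C(1) << SIM_EXP_SHIFT;
  return sig;
}

/* Build a value from the sign and fraction bits of sig and the biased
   exponent exp; the explicit leading bit of sig is dropped.  */
sim_f128_bits
sim_insert_exp (sim_f128_bits sig, uint64_t exp)
{
  sim_f128_bits r;

  r.hi = (sig.hi & (SIM_SIGN_MASK | SIM_FRAC_HI_MASK))
         | ((exp & SIM_EXP_MASK) << SIM_EXP_SHIFT);
  r.lo = sig.lo;
  return r;
}

/* Usage: take 1.0 apart and rebuild it as 2.0 by bumping the exponent.  */
void
sim_roundtrip_demo (void)
{
  sim_f128_bits one = { UINT64_C(0x3FFF000000000000), 0 };
  sim_f128_bits sig = sim_extract_sig (one);
  sim_f128_bits two = sim_insert_exp (sig, sim_extract_exp (one) + 1);

  assert (two.hi == UINT64_C(0x4000000000000000) && two.lo == 0);
}
```

In the sketch the intermediate results never leave the two-halves representation, which mirrors the motivation given above for keeping the results in vector form.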

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (V2DI_DI): New mode iterator.
Rename xsxexpqp_ to xsxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 .../powerpc/bfp/scalar-extract-exp-8.c|  58 ++
 .../powerpc/bfp/scalar-extract-sig-8.c|  65 +++
 .../powerpc/bfp/scalar-insert-exp-16.c| 103 ++
 9 files changed, 307 insertions(+), 26 deletions(-)
 create mode 100644 

Re: [PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches
On Thu, 2023-06-15 at 14:23 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/15 04:37, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 4, added missing cases for new xxexpqp, xsxexpdp and
> > xsxsigqp
> > cases to rs6000_expand_builtin.  Merged the new define_insn
> > definitions
> > with the existing definitions.  Renamed the builtins by removing
> > the
> > __builtin_ prefix from the names.  Fixed the documentation for the
> > builtins.  Updated the test files to check the desired instructions
> > were generated.  Retested patch on Power 10 with no regressions.
> > 
> > Version 3, was able to get the overloaded version of
> > scalar_insert_exp
> > to work and the change to xsxexpqp_f128_ define instruction
> > to
> > work with the suggestions from Kewen.  
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitiions.
> > 
> > The patch add three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> 
> I'd suggest you to test this on P9 BE as well to ensure the test case
> to work well on BE too.

Tested on P9 BE.  Updated test cases for the correct expected BE and LE
results.

> 
> >Carl
> > 
> > 
> > 
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int scalar_extract_exp_to_vec
> > (__ieee128);
> >  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);
> > 
> > These builtins were requesed since there is no clean and performant
> > way to
> 
> s/requesed/requested/

Fixed.

> 
> > transfer a value from a vector type and scalar type, despite the
> > fact
> 
> Describe it oppositely?  As the related existing bifs returns scalar
> type,
> the users want them in vector type, so it's "from scalar type to
> vector
> type"?

Updated the description.

> 
> > that they both reside in vector registers.
> 
> the fact is the related hardware insns have vsx registers
> destination.
> 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > Rename CCDE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
> > CODE_FOR_xsiexpqp_kf_v2di   ): Add case statements.
> 
> unnecessary tab.

Fixed.

> 
> > * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf, xsxsigqp_kf, xxsiexpqp_kf to xsexpqp_kf_di,
> 
> typo, xxsiexpqp_kf => xsiexpqp_kf
> 
> > xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> 
> May be better with "Update RS6000_OVLD_VEC_VSIE handling for
> MODE_VECTOR_INT
> which is used for newly added overloaded instance"?

Changed.

> 
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > odverloaded definitions.
> 
> s/odverloaded/overloaded/

Fixed.


Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches
On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:41, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and past error.  The documentation was fixed
> > in:
> > 
> > 
> >commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:26:44 2022 -0600
> > 
> >rs6000: Clean up ISA 3.1 documentation [PR100808]
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> >2022-02-04  Bill Schmidt  
> > 
> >gcc/
> >PR target/100808
> >* doc/extend.texi (Basic PowerPC Built-in Functions
> > Available on ISA
> >3.1): Provide consistent type names.  Remove
> > unnecessary semicolons.
> >Fix bad line breaks.
> > 
> 
> Wrong referred commit, should be
> ed3fea09b18f67e757b5768b42cb6e816626f1db.
> The above commit used the wrong commit log.

Fixed the commit reference as noted.

> 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned builtin arguments
> > 
> > The first argument of the vec_replace_unaligned builtin should
> > always be
> > unsinged char, as specified in gcc/doc/extend.texi.
> 
> s/unsinged/unsigned/

Fixed.

> 
> > This patch fixes the buitin definitions and updates the testcases
> > to use
> 
> s/buitin/builtin/

Fixed.

> 
> > the correct arguments.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> > Fix first argument type.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/powerpc/ver-replace-word-runnable.c
> > (vec_replace_unaligned) Fix first argument type.
> > (vresult_uchar): Fix expected   results.
> 
> Nit: unexpected tab.

Fixed.

> 
> > (vec_replace_unaligned): Update for loop to check uchar
> > results.
> > Remove extra spaces in if statements.
> > Insert missing spaces in for statements.
> > (dg-final): Update expected instruction counts.
> > ---
> >  gcc/config/rs6000/rs6000-overload.def |  12 +-
> >  .../powerpc/vec-replace-word-runnable.c   | 157 ++--
> > --
> >  2 files changed, 92 insertions(+), 77 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..26dc662b8fb 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -3059,17 +3059,17 @@
> >  VREPLACE_ELT_V2DF
> >  
> >  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> > -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> >  VREPLACE_UN_UV4SI
> > -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> >  VREPLACE_UN_V4SI
> > -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
> > int);
> >  VREPLACE_UN_UV2DI
> > -  vuc __builtin_vec_replace_un (vsll, signed long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> >  VREPLACE_UN_V2DI
> > -  vuc __builtin_vec_replace_un (vf, float, const int);
> > +  vuc __builtin_vec_replace_un (vuc, float, const int);
> >  VREPLACE_UN_V4SF
> &g

[PATCH ver 2] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.
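
The reason the first argument and the result are byte vectors can be illustrated with a portable C model.  This is a sketch under stated assumptions: sim_vuc and sim_replace_unaligned_u32 are hypothetical names, the 16-byte vector is modelled as a plain array, and the byte numbering simply follows host memory order, whereas the real builtin's numbering depends on endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a vector unsigned char: 16 bytes with no element structure.  */
typedef struct { uint8_t b[16]; } sim_vuc;

/* Overwrite four bytes of v starting at an arbitrary byte offset.
   The offset need not be a multiple of four, so the result can no
   longer be viewed as four aligned words; it is returned as bytes.  */
sim_vuc
sim_replace_unaligned_u32 (sim_vuc v, uint32_t value, unsigned int offset)
{
  /* The inserted word must fit entirely inside the vector.  */
  assert (offset <= sizeof v.b - sizeof value);

  memcpy (&v.b[offset], &value, sizeof value);
  return v;
}
```

With offset 3, for example, the new word straddles what would be elements 0 and 1 of a word vector, which is why typing the operand as a word or doubleword vector would be misleading.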

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output vectors as having
   type vector unsigned char, to emphasize that elements are potentially
   misaligned by this built-in function.  This patch corrects the
   misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned builtin arguments

The first argument of the vec_replace_unaligned builtin should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the testcases to use
the correct arguments.  The expected instruction counts for the testcase
are updated.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c
(vec_replace_unaligned): Fix first argument type.
(vresult_uchar): Fix expected results.
(vec_replace_unaligned): Update for loop to check uchar results.
Remove extra spaces in if statements.
Insert missing spaces in for statements.
(dg-final): Update expected instruction counts.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 .../powerpc/vec-replace-word-runnable.c   | 157 ++
 2 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..26dc662b8fb 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
 VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
 VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vuc, float, const int);
 VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vuc, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
index 27318822871..66b0ef58996 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -20,6 +20,9 @@ main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@ main (int argc, char *argv [])
 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
 printf("ERROR, 

[PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl



rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

These builtins were requested since there is no clean and performant way to
transfer a value between a vector type and a scalar type, despite the fact
that they both reside in vector registers.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add else if for MODE_VECTOR_INT. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (VSEEQP_DI, VSESQP_TI): New mode iterators.
(VSEEQP_DI_base): New mode attribute definition.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
(xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn for
new builtins.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-1.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-1.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-1.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 21 +++--
 gcc/config/rs6000/rs6000-builtins.def | 15 ++-
 gcc/config/rs6000/rs6000-c.cc | 10 +-
 gcc/config/rs6000/rs6000-overload.def | 10 ++
 gcc/config/rs6000/vsx.md  | 26 +++--
 gcc/doc/extend.texi   | 21 -
 .../gcc.target/powerpc/bfp/extract-exp-1.c| 53 +++
 .../gcc.target/powerpc/bfp/extract-sig-1.c| 60 
 .../gcc.target/powerpc/bfp/insert-exp-1.c | 94 +++
 9 files changed, 284 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-1.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 534698e7d3e..a8f291c6a72 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ 

Re: [PATCH ver 3] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches
On Tue, 2023-06-13 at 11:10 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/8 23:21, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 3, was able to get the overloaded version of
> > scalar_insert_exp
> > to work and the change to xsxexpqp_f128_ define instruction
> > to
> > work with the suggestions from Kewen.  
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitiions.
> > 
> > The patch add three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> > 
> >Carl
> > 
> > ---
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int
> >  __builtin_scalar_extract_exp_to_vec (__ieee128);
> >  __vector unsigned __int128
> >  __builtin_scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);

Fixed commit log, removed __builtin_ from the names per comments from
Kewen below.
> > 
> > > These builtins were requested since there is no clean and performant
> > > way to
> > > transfer a value between a vector type and a scalar type, despite the
> > fact
> > that they both reside in vector registers.
> > 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > > Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > > * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf to xsxexpqp_kf_di.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > * config/vsx.md (VSEEQP_DI): New mode iterator.
> > Rename define_insn xsxexpqp_ to
> > sxexpqp__.
> > (xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn
> > for
> > new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig): Add documentation for new builtins.
> > (scalar_insert_exp): Add new overloaded builtin definition.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
> > ---
> >  gcc/config/rs6000/rs6000-builtin.cc   |  4 +-
> >  gcc/config/rs6000/rs6000-builtins.def | 11 ++-
> >  gcc/config/rs6000/rs6000-c.cc | 10 +-
> >  gcc/config/rs6000/rs6000-overload.def |  2 +
> >  gcc/config/rs6000/vsx.md  | 28 +-
> >  gcc/doc/extend.texi   |  9 ++
> >  .../powerpc/bfp/extract-exp-ieee128.c | 50 ++
> >  .../powerpc/bfp/extract-sig-ieee128.c | 57 
> >  .../powerpc/bfp/insert-exp-ieee128.c  | 91
> > +++
> >  9 files changed, 253 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
> > exp-ieee128.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bf

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-08 Thread Carl Love via Gcc-patches


Kewen:
On Wed, 2023-06-07 at 17:36 +0800, Kewen.Lin wrote:
> Hi,
> 
> on 2023/6/7 03:54, Carl Love wrote:
> > On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
> > > Hi Carl,
> > > 
> > > on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> > > > GCC maintainers:
> > > > 
> > > > > The following patch adds three builtins for inserting and
> > > > extracting
> > > > the
> > > > > exponent and significand of an IEEE 128-bit floating point
> > > > > value.
> > > > The builtins are valid for Power 9 and Power 10.  
> > > 
> > > We already have:
> > > 
> > > unsigned long long int scalar_extract_exp (__ieee128 source);
> > > unsigned __int128 scalar_extract_sig (__ieee128 source);
> > > ieee_128 scalar_insert_exp (unsigned __int128 significand,
> > > unsigned long long int exponent);
> > > ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long
> > > long
> > > int exponent);
> > > 
> > > you need to say something about the requirements or the
> > > justification
> > > for
> > > adding more, for this patch itself, some comments are inline
> > > below,
> > > thanks!
> > 
> > I implemented the patch based on a request for the builtins.  It
> > didn't
> > include any justification so I reached out to Steve Monroe who
> > requested the builtins to understand why he wanted them.  Here is
> > his
> > reply:
> > 
> >Basically there is no clean and performant way to transfer
> > between a
> >vector type and the ieee128 scalar, despite the fact that both
> >reside in vector registers. Also a union transfer does not work
> >correctly on most GCC versions (and will likely break again in
> > the
> >next release). I offer the long sad history of the IBM long
> > double
> >float runtime.
> 
> Thanks for clarifying this.  Given the proposed changes, I think he
> meant
> to say "Basically there is no clean and performant way to transfer
> between
> a vector type and the scalar **types**". :) Because the proposed
> changes
> are:
>   scalar_extract_exp: unsigned long long => vector unsigned long long
>   scalar_extract_sig: unsigned __int128  => vector unsigned __int128
>   scalar_insert_exp: unsigned __int128 => vector unsigned __int128
>  unsigned long long => vector unsigned long long.
> 
> >Also there are __ieee128 operations that are provided by
> > builtins
> >for POWER9 but are not provided by libgcc (for POWER8).
> > 
> >Finally I can prove that a softfloat __ieee128 implementation
> > using
> >VMX integer operations, out-performs the current libgcc
> >implementation using DW GPRs.
> > 
> >The details are in the PVECLIB documentation
> >pveclib/vec__f128__ppc.h
> > 
> > 
> > > > The patch has been tested on both Power 9 and Power 10.
> > > > 
> > > > Please let me know if this patch is acceptable for
> > > > mainline.  Thanks.
> > > > 
> > > > Carl 
> > > > 
> > > > 
> > > > --
> > > > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17
> > > > 00:00:00
> > > > 2001
> > > > From: Carl Love 
> > > > Date: Wed, 12 Apr 2023 17:46:37 -0400
> > > > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
> > > > point values
> > > > 
> > > > Add support for the following builtins:
> > > > 
> > > >  __vector unsigned long long int __builtin_extractf128_exp
> > > > (__ieee128);
> > > 
> > > Could you make the name similar to the existing one?  The
> > > existing
> > > one
> > >   
> > >   unsigned long long int scalar_extract_exp (__ieee128 source);
> > > 
> > > has nothing like f128 on its name, this variant is just to change
> > > the
> > > return type to vector type, how about scalar_extract_exp_to_vec?
> > 
> > I changed the name  __builtin_extractf128_exp  to
> > __builtin_scalar_extract_exp_to_vec.
> > 
> > > >  __vector unsigned __int128 __builtin_extractf128_sig
> > > > (__ieee128);
> > > 
> > > Ditto.
> > 
> > I changed the name  __builtin_extractf128_sig to
> > __builtin_scalar_extract_sig_to_vec.
> > 
> > > > >

[PATCH ver 3] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-08 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.
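
For readers unfamiliar with the fields involved, the following is a minimal portable sketch of what "exponent" and "significand" refer to in the IEEE binary128 layout (1 sign bit, 15 exponent bits, 112 stored fraction bits).  It is not the builtin implementation and does not use any Power instruction; the names make_bits, model_extract_exp and model_extract_frac are hypothetical, and the sketch deliberately does not model how the hardware treats the implicit leading bit.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

static_assert(sizeof(u128) == 16, "model assumes a 128-bit integer type");

/* Assemble a 128-bit pattern from its high and low 64-bit halves. */
static u128 make_bits(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* Biased exponent: the 15 bits immediately below the sign bit. */
static uint64_t model_extract_exp(u128 bits)
{
    return (uint64_t)(bits >> 112) & 0x7FFF;
}

/* Stored fraction: the low 112 bits.  The implicit leading bit of a
   normal number is NOT added here (assumption: field layout only). */
static u128 model_extract_frac(u128 bits)
{
    return bits & (((u128)1 << 112) - 1);
}
```

With this layout, the bit pattern of 1.0 (high half 0x3FFF000000000000, low half 0) yields a biased exponent of 0x3FFF and a zero fraction.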

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl

---
rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int
 __builtin_scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128
 __builtin_scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

These builtins were requested since there is no clean and performant way to
transfer a value between a vector type and a scalar type, despite the fact
that they both reside in vector registers.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf to xsxexpqp_kf_di.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add else if for MODE_VECTOR_INT. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
* config/vsx.md (VSEEQP_DI): New mode iterator.
Rename define_insn xsxexpqp_ to
sxexpqp__.
(xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn for
new builtins.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 +-
 gcc/config/rs6000/rs6000-builtins.def | 11 ++-
 gcc/config/rs6000/rs6000-c.cc | 10 +-
 gcc/config/rs6000/rs6000-overload.def |  2 +
 gcc/config/rs6000/vsx.md  | 28 +-
 gcc/doc/extend.texi   |  9 ++
 .../powerpc/bfp/extract-exp-ieee128.c | 50 ++
 .../powerpc/bfp/extract-sig-ieee128.c | 57 
 .../powerpc/bfp/insert-exp-ieee128.c  | 91 +++
 9 files changed, 253 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 534698e7d3e..d99f0ae5dda 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3326,8 +3326,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
subtarget */,
   case CODE_FOR_fmakf4_odd:
icode = CODE_FOR_fmatf4_odd;
break;
-  case CODE_FOR_xsxexpqp_kf:
-   icode = CODE_FOR_xsxexpqp_tf;
+  case CODE_FOR_xsxexpqp_kf_di:
+   icode = CODE_FOR_xsxexpqp_tf_di;
break;
   case CODE_FOR_xsxsigqp_kf:
icode = CODE_FOR_xsxsigqp_tf;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..dcd4a393906 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2901,8 +2901,14 @@
   fpmath double __builtin_truncf128_round_to_odd (_Float128);
 TRUNCF128_ODD trunckfdf2_odd {}
 
+  vull __builtin_scalar_extract_exp_to_vec 

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-06 Thread Carl Love via Gcc-patches
On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> > GCC maintainers:
> > 
> > The following patch adds three builtins for inserting and extracting
> > the
> > exponent and significand of an IEEE 128-bit floating point
> > value.
> > The builtins are valid for Power 9 and Power 10.  
> 
> We already have:
> 
> unsigned long long int scalar_extract_exp (__ieee128 source);
> unsigned __int128 scalar_extract_sig (__ieee128 source);
> ieee_128 scalar_insert_exp (unsigned __int128 significand,
> unsigned long long int exponent);
> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long
> int exponent);
> 
> you need to say something about the requirements or the justification
> for
> adding more, for this patch itself, some comments are inline below,
> thanks!

I implemented the patch based on a request for the builtins.  It didn't
include any justification so I reached out to Steve Monroe who
requested the builtins to understand why he wanted them.  Here is his
reply:

   Basically there is no clean and performant way to transfer between a
   vector type and the ieee128 scalar, despite the fact that both
   reside in vector registers. Also a union transfer does not work
   correctly on most GCC versions (and will likely break again in the
   next release). I offer the long sad history of the IBM long double
   float runtime.

   Also there are __ieee128 operations that are provided by builtins
   for POWER9 but are not provided by libgcc (for POWER8).

   Finally I can prove that a softfloat __ieee128 implementation using
   VMX integer operations, out-performs the current libgcc
   implementation using DW GPRs.

   The details are in the PVECLIB documentation
   pveclib/vec__f128__ppc.h
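
As an illustration of the kind of transfer being discussed, here is a minimal portable sketch that moves 128 bits between a scalar and a two-element aggregate with memcpy instead of a union.  It is only a model: vec2x64 is a hypothetical stand-in for a vector type, u128 stands in for the 128-bit scalar, and nothing here says what code GCC generates for it on Power.  The reply above argues that no such generic approach was clean and performant there, which is the motivation for the builtins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

/* Hypothetical stand-in for a vector of two 64-bit elements. */
typedef struct { uint64_t dw[2]; } vec2x64;

static_assert(sizeof(vec2x64) == sizeof(u128), "both views must be 16 bytes");

/* Copy the object representation of the scalar into the aggregate. */
static vec2x64 scalar_to_vec(u128 s)
{
    vec2x64 v;
    memcpy(&v, &s, sizeof v);
    return v;
}

/* Copy the object representation of the aggregate back into a scalar. */
static u128 vec_to_scalar(vec2x64 v)
{
    u128 s;
    memcpy(&s, &v, sizeof s);
    return s;
}
```

Which element of dw receives the high half depends on the byte order of the target, so only the round trip is guaranteed to be the identity.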


> 
> > The patch has been tested on both Power 9 and Power 10.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl 
> > 
> > 
> > --
> > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00
> > 2001
> > From: Carl Love 
> > Date: Wed, 12 Apr 2023 17:46:37 -0400
> > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
> > point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int __builtin_extractf128_exp
> > (__ieee128);
> 
> Could you make the name similar to the existing one?  The existing
> one
>   
>   unsigned long long int scalar_extract_exp (__ieee128 source);
> 
> has nothing like f128 on its name, this variant is just to change the
> return type to vector type, how about scalar_extract_exp_to_vec?

I changed the name  __builtin_extractf128_exp  to
__builtin_scalar_extract_exp_to_vec.

> 
> >  __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
> 
> Ditto.

I changed the name  __builtin_extractf128_sig to
__builtin_scalar_extract_sig_to_vec.

> 
> >  __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
> >  __vector unsigned long long);
> 
> This one can just overload the existing scalar_insert_exp?


I tried making this one an overloaded version of
scalar_insert_exp.  However, the overload with the vector arguments
isn't recognized when I put the overload definition at the end of the
list of overloads.  When I tried putting the vector version as the
first overloaded definition, I get an internal error
on __builtin_vsx_scalar_insert_exp_q, which has the same argument
types but as scalars, not vectors.  Best I can tell, there is an issue
with mixing scalar and vector arguments in an overloaded builtin.  

I renamed __builtin_insertf128_exp as
__builtin_vsx_scalar_insert_exp_vqp which is just the vector version of
  the existing __builtin_vsx_scalar_insert_exp_qp builtin.
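
To make the intent of a single overloaded name concrete, the following is a portable sketch that dispatches on the type of the first argument using C11 _Generic.  It is not how the rs6000 overload infrastructure works and does not reproduce the failure described above.  The types vec_u128 and vec_u64x2, the macro model_insert_exp, and the choice of dw[0] as the element carrying the exponent are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
typedef unsigned __int128 u128;

static_assert(sizeof(u128) == 16, "model assumes a 128-bit integer type");

/* Hypothetical stand-ins for the vector argument types. */
typedef struct { u128 q; } vec_u128;
typedef struct { uint64_t dw[2]; } vec_u64x2;

/* Replace the 15-bit exponent field (bits 112..126) with the low 15
   bits of exp, keeping the sign bit and the 112 fraction bits. */
static u128 insert_exp_bits(u128 sig, uint64_t exp)
{
    const u128 exp_mask = (u128)0x7FFF << 112;
    return (sig & ~exp_mask) | ((u128)(exp & 0x7FFF) << 112);
}

/* Scalar form: both arguments are plain integers. */
static u128 insert_exp_scalar(u128 sig, uint64_t exp)
{
    return insert_exp_bits(sig, exp);
}

/* Vector form: assumption, the exponent is taken from element 0. */
static u128 insert_exp_vector(vec_u128 sig, vec_u64x2 exp)
{
    return insert_exp_bits(sig.q, exp.dw[0]);
}

/* One name, two forms, selected by the type of the first argument. */
#define model_insert_exp(sig, exp)                         \
    _Generic((sig), vec_u128: insert_exp_vector,           \
                    default:  insert_exp_scalar)((sig), (exp))
```

In this model the scalar and vector forms share one helper, so they can only differ in how the arguments are unpacked.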
> 
> gcc/
> > * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > * config/rs6000.md (extractf128_exp_,
> > insertf128_exp_,
> > extractf128_sig_): Add define_expand for new builtins.
> > (xsxexpqp_f128_, xsxsigqp_f128_,
> > siexpqpf_f128_):
> > Add define_insn for new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig,
> > __builtin_insertf128_exp): Add documentation for new builtins.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
> > * gcc.target/powe

[PATCH ver 2] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-06 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl

---
rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int
 __builtin_scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128
 __builtin_scalar_extract_sig_to_vec (__ieee128);
 __ieee128 __builtin_vsx_scalar_insert_exp_vqp (__vector unsigned __int128,
 __vector unsigned long long);

These builtins were requested since there is no clean and performant way to
transfer between a vector type and the ieee128 scalar, despite the fact
that both reside in vector registers. Also a union transfer does not work
correctly on most GCC versions.

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
* config/rs6000.md (extractf128_exp_, insertf128_exp_,
extractf128_sig_): Add define_expand for new builtins.
(xsxexpqp_f128_, xsxsigqp_f128_, siexpqpf_f128_):
Add define_insn for new builtins.
* doc/extend.texi (__builtin_extractf128_exp, __builtin_extractf128_sig,
__builtin_insertf128_exp): Add documentation for new builtins.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 +++
 gcc/config/rs6000/rs6000-overload.def |  2 +
 gcc/config/rs6000/vsx.md  | 31 +-
 gcc/doc/extend.texi   | 10 
 .../powerpc/bfp/extract-exp-ieee128.c | 50 
 .../powerpc/bfp/extract-sig-ieee128.c | 57 ++
 .../powerpc/bfp/insert-exp-ieee128.c  | 58 +++
 7 files changed, 216 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..92f22481687 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2901,6 +2901,12 @@
   fpmath double __builtin_truncf128_round_to_odd (_Float128);
 TRUNCF128_ODD trunckfdf2_odd {}
 
+  vull __builtin_scalar_extract_exp_to_vec (_Float128);
+EEXPKF xsxexpqp_f128_kf {}
+
+  vuq __builtin_scalar_extract_sig_to_vec (_Float128);
+ESIGKF xsxsigqp_f128_kf {}
+
   const signed long long __builtin_vsx_scalar_extract_expq (_Float128);
 VSEEQP xsxexpqp_kf {}
 
@@ -2915,6 +2921,9 @@
   unsigned long long);
 VSIEQPF xsiexpqpf_kf {}
 
+  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
+VSIEDP_VULL xsiexpqpf_f128_kf {}
+
   const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \
 const int<7>);
 VSTDCQP xststdcqp_kf {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..102ead9f80b 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4515,6 +4515,8 @@
 VSIEQP
   _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long);
 VSIEQPF
+  _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
+VSIEDP_VULL
 
 [VEC_VSTDC, scalar_test_data_class, __builtin_vec_scalar_test_data_class]
   unsigned int __builtin_vec_scalar_test_data_class (float, const int);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..0f6df4bbcf5 100644
