Re: Setting insn mnemonic partly automagically

2024-06-22 Thread Stefan Schulze Frielinghaus via Gcc
On Sat, Jun 22, 2024 at 01:00:54PM +0200, Georg-Johann Lay wrote:
> Am 22.06.24 um 10:46 schrieb Stefan Schulze Frielinghaus:
> > On Fri, Jun 21, 2024 at 09:50:43PM +0200, Georg-Johann Lay wrote:
> > > 
> > > 
> > > Am 17.06.24 um 21:13 schrieb Stefan Schulze Frielinghaus via Gcc:
> > > > Hi all,
> > > > 
> > > > I'm trying to add an alternative to an existing insn foobar:
> > > > 
> > > > (define_insn "foobar"
> > > > [(set (match_operand ...)
> > > >   (match_operand ...))]
> > > > ""
> > > > "@
> > > >  foo
> > > >  bar
> > > >  #")
> > > > 
> > > > Since the asm output depends on the operands in a non-trivial way which
> > > > isn't easily solved via iterators, I went for a general C function and
> > > > came up with:
> > > > 
> > > > (define_insn "foobar"
> > > > [(set (match_operand ...)
> > > >   (match_operand ...))]
> > > > ""
> > > > "@
> > > >  foo
> > > >  * return foobar_helper (operands[0], operands[1]);
> > > >  bar
> > > >  #"
> > > > [(set_attr_alternative "mnemonic" [(const_string "foo")
> > > >                                    (const_string "specialcase")
> > > >                                    (const_string "bar")
> > > >                                    (const_string "unknown")])])
> > > > 
> > > > If there exist a lot of alternatives, then setting the mnemonic
> > > > attribute like this feels repetitive and is error prone.  Furthermore,
> > > > if there exists no other insn with an output template containing
> > > > foo/bar, then I would have to declare foo/bar via
> > > > 
> > > > (define_attr "mnemonic" "...,foo,bar,..." (const_string "unknown"))
> > > > 
> > > > which again is repetitive.  Thus, I'm wondering if there exists a more
> > > > elegant way to achieve this?  Ultimately, I would like to set the mnemonic
> > > > attribute only manually for the alternative which is implemented via C
> > > > code and let the mnemonic attribute for the remaining alternatives be
> > > > set automagically.  Not sure whether this is supported?
> > > > 
> > > > If all fails, I have another idea how to solve this by utilizing 
> > > > PRINT_OPERAND.
> > > > However, now I'm curious whether my current attempt is feasible or not.
> > > > 
> > > > Cheers,
> > > > Stefan
> > > 
> > > It's a bit unclear to me what you are trying to do, as you are not only
> > > adding an insn alternative, but also are adding insn attribute
> > > "mnemonic", which the original insn did not have.
> > 
> > My take so far is that every insn has a mnemonic attribute which is set
> > either explicitly or implicitly (assuming that the target requested this
> > via define_attr "mnemonic" "...").  This is done in function
> > gen_mnemonic_attr() from gensupport.cc.  Thus, something like
> > 
> > (define_insn "foobar"
> > [(set (match_operand ...)
> >   (match_operand ...))]
> > ""
> > "@
> >  foo
> >  bar
> >  #")
> > 
> > and
> > 
> > (define_insn "foobar"
> > [(set (match_operand ...)
> >   (match_operand ...))]
> > ""
> > "@
> >  foo
> >  bar
> >  #"
> > [(set_attr_alternative "mnemonic" [(const_string "foo")
> >                                    (const_string "bar")
> >                                    (const_string "unknown")])])
> > 
> > should be equivalent.
> > 
> > Of course, the implicit method fails if the pattern is generated via C
> > statements, which is why I set it manually in the initial example.  The
> > initial example contained 3 alternatives plus 1 for the generated one.
> > Setting it manually there might be feasible, however, for my actual
> > problem I have an insn with 27 alternatives where I do 

Re: Setting insn mnemonic partly automagically

2024-06-22 Thread Stefan Schulze Frielinghaus via Gcc
On Fri, Jun 21, 2024 at 09:50:43PM +0200, Georg-Johann Lay wrote:
> 
> 
> Am 17.06.24 um 21:13 schrieb Stefan Schulze Frielinghaus via Gcc:
> > Hi all,
> > 
> > I'm trying to add an alternative to an existing insn foobar:
> > 
> > (define_insn "foobar"
> >   [(set (match_operand ...)
> >         (match_operand ...))]
> >   ""
> >   "@
> >    foo
> >    bar
> >    #")
> > 
> > Since the asm output depends on the operands in a non-trivial way which
> > isn't easily solved via iterators, I went for a general C function and came
> > up with:
> > 
> > (define_insn "foobar"
> >   [(set (match_operand ...)
> >         (match_operand ...))]
> >   ""
> >   "@
> >    foo
> >    * return foobar_helper (operands[0], operands[1]);
> >    bar
> >    #"
> >   [(set_attr_alternative "mnemonic" [(const_string "foo")
> >                                      (const_string "specialcase")
> >                                      (const_string "bar")
> >                                      (const_string "unknown")])])
> > 
> > If there exist a lot of alternatives, then setting the mnemonic attribute
> > like this feels repetitive and is error prone.  Furthermore, if there exists
> > no other insn with an output template containing foo/bar, then I would have
> > to declare foo/bar via
> > 
> > (define_attr "mnemonic" "...,foo,bar,..." (const_string "unknown"))
> > 
> > which again is repetitive.  Thus, I'm wondering if there exists a more
> > elegant way to achieve this?  Ultimately, I would like to set the mnemonic
> > attribute only manually for the alternative which is implemented via C
> > code and let the mnemonic attribute for the remaining alternatives be
> > set automagically.  Not sure whether this is supported?
> > 
> > If all fails, I have another idea how to solve this by utilizing
> > PRINT_OPERAND.  However, now I'm curious whether my current attempt is
> > feasible or not.
> > 
> > Cheers,
> > Stefan
> 
> It's a bit unclear to me what you are trying to do, as you are not only
> adding an insn alternative, but also are adding insn attribute
> "mnemonic", which the original insn did not have.

My take so far is that every insn has a mnemonic attribute which is set
either explicitly or implicitly (assuming that the target requested this
via define_attr "mnemonic" "...").  This is done in function
gen_mnemonic_attr() from gensupport.cc.  Thus, something like

(define_insn "foobar"
  [(set (match_operand ...)
        (match_operand ...))]
  ""
  "@
   foo
   bar
   #")

and

(define_insn "foobar"
  [(set (match_operand ...)
        (match_operand ...))]
  ""
  "@
   foo
   bar
   #"
  [(set_attr_alternative "mnemonic" [(const_string "foo")
                                     (const_string "bar")
                                     (const_string "unknown")])])

should be equivalent.

Of course, the implicit method fails if the pattern is generated via C
statements, which is why I set it manually in the initial example.  The
initial example contained 3 alternatives plus 1 for the generated one.
Setting it manually there might be feasible; however, for my actual
problem I have an insn with 27 alternatives where I do not want to set
and maintain it manually.  A side effect of setting the attribute
implicitly is that each mnemonic is added automatically to the mnemonic
hash table.  With explicit attributes I would have to do that manually
for my 27 alternatives, which I would like to avoid, too.
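
As an illustration only, the following standalone C sketch models what
"implicitly" means above: one mnemonic is derived from each alternative of
an output template.  It is not the code from gensupport.cc; the name
derive_mnemonic and the rule of mapping "#" and "*" alternatives to
"unknown" are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not gensupport.cc: copy the first word of one
   template alternative into BUF (of size N).  Alternatives that are a
   split marker ("#") or C code ("* ...") carry no literal mnemonic, so
   the model reports "unknown" for them.  */
static const char *
derive_mnemonic (const char *alt, char *buf, size_t n)
{
  assert (n >= sizeof "unknown");

  /* Skip the indentation in front of the alternative.  */
  while (*alt == ' ' || *alt == '\t')
    alt++;

  if (*alt == '\0' || *alt == '#' || *alt == '*')
    {
      strcpy (buf, "unknown");
      return buf;
    }

  /* The mnemonic ends at the first whitespace character.  */
  size_t len = 0;
  while (alt[len] != '\0' && !isspace ((unsigned char) alt[len])
         && len + 1 < n)
    {
      buf[len] = alt[len];
      len++;
    }
  buf[len] = '\0';
  return buf;
}
```

In this model the literal alternatives of foobar yield "foo" and "bar",
while both "#" and the C-code alternative end up as "unknown", so only the
latter would need a value supplied by hand.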

> 
> Also, it's unclear how PRINT_OPERAND would help with setting the attribute.

For my particular problem I think one can also utilize PRINT_OPERAND.  I
should have elaborated on this a bit more but feared making the example
unnecessarily complicated.  The C code

  foobar_helper (operands[0], operands[1])

actually emits an extended mnemonic "specialcase$VAR\t%0,%1" where $VAR
can be either A, B, or C.  The extended mnemonic is just syntactic sugar
for the base mnemonic "specialcase\t%0,%1,$IMM", which is why we can lie
and hard-code the mnemonic attribute to specialcase since this won't
affect scheduling.  Since the choice of which extended mnemonic should be
used depends only on operands[1], I thought about rewriting all this into
(define_insn "foobar"
  [(set (match_operand ...)
        (match_operand ...))]
  ""
  "@
   foo
   specialcase\t%0,%1,%X1
   bar
   #")

Obviously we have to sacrifice the usage of an extended mnemonic, but
more problematic is that we have to allocate one of those very few
operand codes (X here) just for this insn.  So this doesn't scale either
if one has to come up with many different codes.  Furthermore, this only
works in my very particular case since I can split the extended mnemonic
into a base mnemonic and an immediate which depends on only one operand,
i.e., it would fail if it depended on both operands[0] and operands[1].
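
The two approaches can be contrasted in a small standalone C model.  All of
it is hypothetical: the selection rule in pick_variant, the immediates 1, 2
and 4, and the function names are invented for illustration, and a plain
long stands in for the rtx operand that real GCC code would inspect.

```c
#include <assert.h>
#include <stddef.h>

/* The three variants named in the text.  */
enum variant { VAR_A, VAR_B, VAR_C };

/* Hypothetical selection rule: the real choice depends on operands[1]
   in a target-specific way that the message does not spell out.  */
static enum variant
pick_variant (long op1)
{
  if (op1 < 0)
    return VAR_A;
  return op1 == 0 ? VAR_B : VAR_C;
}

/* Approach 1: a helper returns the whole extended-mnemonic template,
   so the mnemonic is invisible to anything that scans the .md text.  */
static const char *
foobar_helper_model (long op1)
{
  static const char *const templates[] = {
    "specialcaseA\t%0,%1",
    "specialcaseB\t%0,%1",
    "specialcaseC\t%0,%1",
  };
  enum variant v = pick_variant (op1);
  assert ((size_t) v < sizeof templates / sizeof templates[0]);
  return templates[v];
}

/* Approach 2: the template stays "specialcase\t%0,%1,%X1" and an
   operand code prints only the immediate (values invented here).  */
static int
print_operand_X_model (long op1)
{
  static const int immediates[] = { 1, 2, 4 };
  return immediates[pick_variant (op1)];
}
```

In the second approach the base mnemonic remains a literal in the template;
the price is one operand code per such mapping.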

I hope this makes it a bit clearer; if not, just let me know.

Cheers,
Stefan

> 
> Johann


Re: [PATCH] s390: define single step vector casts

2024-06-20 Thread Stefan Schulze Frielinghaus
On Thu, Jun 20, 2024 at 09:06:11AM +0200, Juergen Christ wrote:
> Some casts were missing leading to missed or bad vectorizations where
> casting was done scalar followed by a vector creation from the
> individual elements.
> 
> gcc/ChangeLog:
> 
>   * config/s390/vector.md (VEC_HALF_NARROWED): New mode iterator.
>   (vec_half_narrowed): ditto.
>   (trunc2): New pattern.
>   (vec_pack_ufix_trunc_v2df): ditto.
>   (vec_pack_sfix_trunc_v2df): ditto.
>   (vec_unpack_sfix_trunc_lo_v4sf): ditto.
>   (vec_unpack_sfix_trunc_hi_v4sf): ditto.
>   (vec_unpack_ufix_trunc_lo_v4sf): ditto.
>   (vec_unpack_ufix_trunc_hi_v4sf): ditto.
>   (floatv2siv2sf2): ditto.
>   (floatunsv2siv2sf2): ditto.
>   (vec_unpacks_float_hi_v4si): ditto.
>   (vec_unpacks_float_lo_v4si): ditto.
>   (vec_unpacku_float_hi_v4si): ditto.
>   (vec_unpacku_float_lo_v4si): ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/vec-cast-single.c: New test.
>   * gcc.target/s390/vector/vec_pack_ufix_trunc_v2df.c: New test.
> 
> Bootstrapped and regtested on s390x.  Ok for trunk?
> 
> Signed-off-by: Juergen Christ 
> ---
>  gcc/config/s390/vector.md | 170 ++-
>  .../gcc.target/s390/vector/vec-cast-single.c  | 271 ++
>  .../s390/vector/vec_pack_ufix_trunc_v2df.c|  30 ++
>  3 files changed, 463 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cast-single.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec_pack_ufix_trunc_v2df.c
> 
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index 40de0c75a7cf..356f25d26deb 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -89,6 +89,8 @@
>  
>  (define_mode_iterator VI_EXTEND [V2QI V2HI V2SI V4QI V4HI])
>  
> +(define_mode_iterator VI_TRUNC [V2HI V2SI V2DI V4HI V4SI])
> +
>  ; Empty string for all but TImode.  This is used to hide the TImode
>  ; expander name in case it is defined already.  See addti3 for an
>  ; example.
> @@ -211,6 +213,14 @@
>  (V1SF "v1df") (V2SF "v2df") (V4SF "v4df")
>  (V1DF "v1tf") (V2DF "v2tf")])
>  
> +; Vector with narrowed element size and the same number of elements.
> +(define_mode_attr VEC_HALF_NARROWED [(V1HI "V1QI") (V2HI "V2QI") (V4HI "V4QI") (V8HI "V8QI")
> +   (V1SI "V1HI") (V2SI "V2HI") (V4SI "V4HI")
> +(V1DI "V1DI") (V2DI "V2SI")])
> +(define_mode_attr vec_half_narrowed [(V1HI "v1qi") (V2HI "v2qi") (V4HI "v4qi") (V8HI "v8qi")
> +   (V1SI "v1hi") (V2SI "v2hi") (V4SI "v4hi")
> +(V1DI "v1di") (V2DI "v2si")])
> +
>  ; Vector with half the element size AND half the number of elements.
>  (define_mode_attr vec_halfhalf
>[(V2HI "V2QI") (V4HI "V4QI") (V8HI "V8QI")
> @@ -2422,6 +2432,17 @@
>operands[2] = gen_reg_rtx (V4SFmode);
>  })
>  
> +;; vector truncate
> +
> +; downcasts
> +
> +(define_insn "trunc2"
> +  [(set (match_operand: 0 "register_operand" "=v")
> +(truncate: (match_operand:VI_TRUNC 1 
> "register_operand" "v")))]
> +  "TARGET_VX"
> +  "vpk\t %0,%1,%1"
  ^
whitespace

> +  [(set_attr "op_type" "VRR")])
> +
>  ;; vector unpack v16qi
>  
>  ; signed
> @@ -3177,17 +3198,150 @@
>emit_move_insn (len, gen_rtx_ZERO_EXTEND (SImode, operands[2]));
>emit_insn (gen_vstlv16qi (operands[1], len, mem));
>DONE;
> -});;
> +})
> +
> +(define_expand "vec_pack_ufix_trunc_v2df"
> +  [(match_operand:V4SI 0 "register_operand")
> +   (match_operand:V2DF 1 "register_operand")
> +   (match_operand:V2DF 2 "register_operand")]
> +  "TARGET_VX"
> +{
> +  rtx r1 = gen_reg_rtx (V2DImode);
> +  rtx r2 = gen_reg_rtx (V2DImode);
> +
> +  emit_insn (gen_fixuns_truncv2dfv2di2 (r1, operands[1]));
> +  emit_insn (gen_fixuns_truncv2dfv2di2 (r2, operands[2]));
> +  emit_insn (gen_vec_pack_trunc_v2di (operands[0], r1, r2));
> +  DONE;
> +})

I haven't really wrapped my head around this; however, this two-step
conversion could miss an IEEE-inexact-exception if a double fits into a
64-bit integer but not into a 32-bit integer.  What does the IL/vectorizer
say about exceptions?  Is it OK to miss some, or do we have to guard this
by -fno-trapping-math et al.?
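
To make the class of inputs concrete, here is a standalone C model of the
two-step sequence for a single signed element.  It is a sketch under
assumptions: the function names are invented, the pack step is modelled as
keeping the low 32 bits, and the code makes no claim about which exception
flag the hardware would raise for a direct conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True if D lies in the range of a signed 64-bit integer.  */
static int
fits_int64 (double d)
{
  return d >= -9223372036854775808.0 && d < 9223372036854775808.0;
}

/* True if D, truncated toward zero, lies in the range of a signed
   32-bit integer.  */
static int
fits_int32 (double d)
{
  return d > -2147483649.0 && d < 2147483648.0;
}

/* Model of the expander's sequence: convert to 64 bits first, then
   keep the low 32 bits as a truncating pack would.  */
static int32_t
two_step_convert (double d)
{
  assert (fits_int64 (d));          /* otherwise the cast is undefined */
  int64_t wide = (int64_t) d;       /* floating-point step */
  uint32_t low = (uint32_t) wide;   /* integer step: modulo 2^32 */
  int32_t result;
  memcpy (&result, &low, sizeof result);
  return result;
}
```

For 4294967301.0 (2^32 + 5) the first step is exact and in range, and the
second step is a pure integer operation, so the model yields 5 without any
floating-point event although the value does not fit into 32 bits.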

> +
> +(define_expand "vec_pack_sfix_trunc_v2df"
> +  [(match_operand:V4SI 0 "register_operand")
> +   (match_operand:V2DF 1 "register_operand")
> +   (match_operand:V2DF 2 "register_operand")]
> +  "TARGET_VX"
> +{
> +  rtx r1 = gen_reg_rtx (V2DImode);
> +  rtx r2 = gen_reg_rtx (V2DImode);
> +
> +  emit_insn (gen_fix_truncv2dfv2di2 (r1, operands[1]));
> +  emit_insn (gen_fix_truncv2dfv2di2 (r2, operands[2]));
> +  emit_insn (gen_vec_pack_trunc_v2di (operands[0], r1, r2));
> +  DONE;
> +})

same as above

> +
> +; v4sf -> v2di
> +(define_expand "vec_unpack_sfix_trunc_lo_v4sf"
> +  [(match_operand:V2DI 0 "register_operand")
> +   

Setting insn mnemonic partly automagically

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc
Hi all,

I'm trying to add an alternative to an existing insn foobar:

(define_insn "foobar"
  [(set (match_operand ...)
(match_operand ...))]
  ""
  "@
   foo
   bar
   #")

Since the asm output depends on the operands in a non-trivial way which isn't
easily solved via iterators, I went for a general C function and came up with:

(define_insn "foobar"
  [(set (match_operand ...)
(match_operand ...))]
  ""
  "@
   foo
   * return foobar_helper (operands[0], operands[1]);
   bar
   #"
  [(set_attr_alternative "mnemonic" [(const_string "foo")
 (const_string "specialcase")
 (const_string "bar")
 (const_string "unknown")])])

If there exist a lot of alternatives, then setting the mnemonic attribute like
this feels repetitive and is error prone.  Furthermore, if there exists no
other insn with an output template containing foo/bar, then I would have to
declare foo/bar via

(define_attr "mnemonic" "...,foo,bar,..." (const_string "unknown"))

which again is repetitive.  Thus, I'm wondering if there exists a more elegant
way to achieve this?  Ultimately, I would like to set the mnemonic
attribute only manually for the alternative which is implemented via C
code and let the mnemonic attribute for the remaining alternatives be
set automagically.  Not sure whether this is supported?

If all fails, I have another idea how to solve this by utilizing PRINT_OPERAND.
However, now I'm curious whether my current attempt is feasible or not.

Cheers,
Stefan


Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Stefan Schulze Frielinghaus
On Mon, Jun 17, 2024 at 08:16:34AM +0200, Richard Biener wrote:
> On Mon, 17 Jun 2024, Kewen.Lin wrote:
> 
> > Hi Richi,
> > 
> > on 2024/6/14 18:31, Richard Biener wrote:
> > > The following retires vcond{,u,eq} optabs by stopping to use them
> > > from the middle-end.  Targets instead (should) implement vcond_mask
> > > and vec_cmp{,u,eq} optabs.  The PR this change refers to lists
> > > possibly affected targets - those implementing these patterns,
> > > and in particular it lists mips, sparc and ia64 as targets that
> > > most definitely will regress while others might simply remove
> > > their vcond{,u,eq} patterns.
> > > 
> > > I'd appreciate testing, I do not expect fallout for x86 or arm/aarch64.
> > > I know riscv doesn't implement any of the legacy optabs.  But less
> > > maintained vector targets might need adjustments.
> > 
> > Thanks for making this change, this patch can be bootstrapped on ppc64{,le}
> > but both have one failure on gcc/testsuite/gcc.target/powerpc/pr66144-3.c,
> > by looking into it, I found it just exposed one oversight in the current
> > rs6000 vcond_mask support (the condition mask location is wrong), so I think
> > this change is fine for rs6000 port, I'll also test SPEC2017 for this (with
> > rs6000 vcond_mask change) soon.
> 
> Btw, for those targets where the patch works out fine it would be nice
> to delete their vcond{,u,eq} expanders (and double-check that doesn't
> cause issues on its own).
> 
> Can target maintainers note whether their targets support all condition
> codes for their vector comparisons (including FP variants)?  And 
> whether they choose to implement all condition codes in vec_cmp
> and adjust with inversion / operand swapping for not supported cases?

On s390 we support all comparison operations, using inversion / operand
swapping where needed, via s390_expand_vec_compare.  However, we still
have some failures for which I opened PR115519.  Currently it is unclear
to me what precisely is missing; I will have a further look.  The
vcond_mask expander is also implemented for all modes.
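
The idea of covering all condition codes by inversion / operand swapping can
be sketched in standalone C.  This is a model for integer lanes only and not
the code of s390_expand_vec_compare; the names and the assumption that only
EQ and GT exist natively are made up for the example.  For floating point
the inversions below would be wrong in the presence of NaNs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE };

/* The two primitives this model assumes to exist natively.
   A true lane is all ones, a false lane is zero.  */
static void
native_eq (const int32_t *a, const int32_t *b, int32_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = a[i] == b[i] ? -1 : 0;
}

static void
native_gt (const int32_t *a, const int32_t *b, int32_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = a[i] > b[i] ? -1 : 0;
}

/* Build every comparison code from the two primitives by swapping
   the operands and/or inverting the resulting mask.  */
static void
vec_compare_model (enum cmp_code code, const int32_t *a, const int32_t *b,
                   int32_t *mask, size_t n)
{
  int invert = 0;
  switch (code)
    {
    case CMP_EQ: native_eq (a, b, mask, n); break;
    case CMP_NE: native_eq (a, b, mask, n); invert = 1; break;
    case CMP_GT: native_gt (a, b, mask, n); break;
    case CMP_LT: native_gt (b, a, mask, n); break;             /* swap */
    case CMP_LE: native_gt (a, b, mask, n); invert = 1; break; /* invert */
    case CMP_GE: native_gt (b, a, mask, n); invert = 1; break; /* both */
    default: assert (0);
    }
  if (invert)
    for (size_t i = 0; i < n; i++)
      mask[i] = ~mask[i];
}

/* Model of a mask-based select: take T where the mask is set, else F.  */
static void
vcond_mask_model (const int32_t *t, const int32_t *f, const int32_t *mask,
                  int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (t[i] & mask[i]) | (f[i] & ~mask[i]);
}
```

Chaining vec_compare_model with vcond_mask_model gives the effect of the
retired combined compare-and-select operation in two separate steps.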

Cheers,
Stefan

> 
> Thanks,
> Richard.
> 
> > BR,
> > Kewen
> > 
> > > 
> > > I want to get rid of those optabs for GCC 15.  If I don't hear from
> > > you I will assume your target is fine.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > >   PR middle-end/114189
> > >   * optabs-query.h (get_vcond_icode): Always return CODE_FOR_nothing.
> > >   (get_vcond_eq_icode): Likewise.
> > > ---
> > >  gcc/optabs-query.h | 13 -
> > >  1 file changed, 4 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> > > index 0cb2c21ba85..31fbce80175 100644
> > > --- a/gcc/optabs-query.h
> > > +++ b/gcc/optabs-query.h
> > > @@ -112,14 +112,9 @@ get_vec_cmp_eq_icode (machine_mode vmode, 
> > > machine_mode mask_mode)
> > > mode CMODE, unsigned if UNS is true, resulting in a value of mode 
> > > VMODE.  */
> > >  
> > >  inline enum insn_code
> > > -get_vcond_icode (machine_mode vmode, machine_mode cmode, bool uns)
> > > +get_vcond_icode (machine_mode, machine_mode, bool)
> > >  {
> > > -  enum insn_code icode = CODE_FOR_nothing;
> > > -  if (uns)
> > > -icode = convert_optab_handler (vcondu_optab, vmode, cmode);
> > > -  else
> > > -icode = convert_optab_handler (vcond_optab, vmode, cmode);
> > > -  return icode;
> > > +  return CODE_FOR_nothing;
> > >  }
> > >  
> > >  /* Return insn code for a conditional operator with a mask mode
> > > @@ -135,9 +130,9 @@ get_vcond_mask_icode (machine_mode vmode, 
> > > machine_mode mmode)
> > > mode CMODE (only EQ/NE), resulting in a value of mode VMODE.  */
> > >  
> > >  inline enum insn_code
> > > -get_vcond_eq_icode (machine_mode vmode, machine_mode cmode)
> > > +get_vcond_eq_icode (machine_mode, machine_mode)
> > >  {
> > > -  return convert_optab_handler (vcondeq_optab, vmode, cmode);
> > > +  return CODE_FOR_nothing;
> > >  }
> > >  
> > >  /* Enumerates the possible extraction_insn operations.  */
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[gcc r14-10317] s390: testsuite: Fix ifcvt-one-insn-bool.c

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:0ed63e3791345a9933cbbf28594ab5549d336bd4

commit r14-10317-g0ed63e3791345a9933cbbf28594ab5549d336bd4
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:52:28 2024 +0200

s390: testsuite: Fix ifcvt-one-insn-bool.c

With the change of r15-787-g57e04879389f9c I forgot to also update this
test.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ifcvt-one-insn-bool.c: Fix loc.

(cherry picked from commit ac66736bf2f8a10d2f43e83ed6377e4179027a39)

Diff:
---
 gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
index 0c8c2f879a69..4ae29dbd6b61 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
@@ -3,7 +3,7 @@
 /* { dg-do compile { target { s390*-*-* } } } */
 /* { dg-options "-O2 -march=z13 -mzarch" } */
 
-/* { dg-final { scan-assembler "lochinh\t%r.?,1" } } */
+/* { dg-final { scan-assembler "lochile\t%r.?,1" } } */
 #include 
 
 int foo (int *a, unsigned int n)


[gcc r14-10316] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:8f124e6b79daa43618dbb1e67c09629676d07396

commit r14-10316-g8f124e6b79daa43618dbb1e67c09629676d07396
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:52:20 2024 +0200

s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

PR target/109549
* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads are reversed, now, as a
consequence the condition has to be reversed.

(cherry picked from commit 57e04879389f9c0d5d53f316b468ce1bddbab350)

Diff:
---
 gcc/config/s390/s390.cc  | 32 
 gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5968808fcb6e..fa517bd3e77a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "builtins.h"
+#include "ifcvt.h"
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
@@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
   return vectorize_vec_perm_const_1 (d);
 }
 
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+{
+  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+   {
+ rtx set = single_set (insn);
+ if (set == NULL)
+   continue;
+ if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+   continue;
+ rtx src = SET_SRC (set);
+ machine_mode mode = GET_MODE (src);
+ if (GET_MODE_CLASS (mode) != MODE_INT
+ && GET_MODE_CLASS (mode) != MODE_FLOAT)
+   continue;
+ if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   continue;
+ return true;
+   }
+}
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
 #undef TARGET_VECTORIZE_VEC_PERM_CONST
 #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
 
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
diff --git a/gcc/testsuite/gcc.target/s390/ccor.c 
b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314e..36a3c3a999a9 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
 
 GENFUN1(3)
 
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
 
 GENFUN2(0,1)
 
@@ -58,7 +58,7 @@ GENFUN2(0,3)
 
 GENFUN2(1,2)
 
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
 
 GENFUN2(1,3)


[gcc r15-1367] s390: Delete mistakenly added tests

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:e86d4e4ac7d7438f2f1b2437508cfd394a0a34d9

commit r15-1367-ge86d4e4ac7d7438f2f1b2437508cfd394a0a34d9
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:46:38 2024 +0200

s390: Delete mistakenly added tests

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vgm-df-1.c: Removed.
* gcc.target/s390/vector/vgm-di-1.c: Removed.
* gcc.target/s390/vector/vgm-hi-1.c: Removed.
* gcc.target/s390/vector/vgm-int128-1.c: Removed.
* gcc.target/s390/vector/vgm-longdouble-1.c: Removed.
* gcc.target/s390/vector/vgm-qi-1.c: Removed.
* gcc.target/s390/vector/vgm-sf-1.c: Removed.
* gcc.target/s390/vector/vgm-si-1.c: Removed.
* gcc.target/s390/vector/vgm-ti-1.c: Removed.

Diff:
---
 gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c|  30 ---
 gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c| 102 --
 gcc/testsuite/gcc.target/s390/vector/vgm-hi-1.c| 212 
 .../gcc.target/s390/vector/vgm-int128-1.c  |  64 ---
 .../gcc.target/s390/vector/vgm-longdouble-1.c  |  55 --
 gcc/testsuite/gcc.target/s390/vector/vgm-qi-1.c| 213 -
 gcc/testsuite/gcc.target/s390/vector/vgm-sf-1.c|  43 -
 gcc/testsuite/gcc.target/s390/vector/vgm-si-1.c| 146 --
 gcc/testsuite/gcc.target/s390/vector/vgm-ti-1.c|  63 --
 9 files changed, 928 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c 
b/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c
deleted file mode 100644
index 07aa6b9deece..
--- a/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -march=z13 -mzarch" } */
-/* { dg-final { check-function-bodies "**" "" "" } } */
-
-typedef double v1df __attribute__ ((vector_size (8)));
-typedef double v2df __attribute__ ((vector_size (16)));
-
-/*
-** test_v1df_via_vgmb:
-** vgmb%v24,0,1
-** br  %r14
-*/
-
-v1df
-test_v1df_via_vgmb (void)
-{
-  return (v1df){-8577.505882352939806878566741943359375};
-}
-
-/*
-** test_v2df_via_vgmb:
-** vgmb%v24,0,1
-** br  %r14
-*/
-
-v2df
-test_v2df_via_vgmb (void)
-{
-  return (v2df){-8577.505882352939806878566741943359375, -8577.505882352939806878566741943359375};
-}
diff --git a/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c 
b/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c
deleted file mode 100644
index fa608f2b5ae8..
--- a/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c
+++ /dev/null
@@ -1,102 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -march=z13 -mzarch" } */
-/* { dg-final { check-function-bodies "**" "" "" } } */
-
-typedef long long v1di __attribute__ ((vector_size (8)));
-typedef long long v2di __attribute__ ((vector_size (16)));
-
-/*
-** test_v1di_via_vgmb:
-** vgmb%v24,0,2
-** br  %r14
-*/
-
-v1di
-test_v1di_via_vgmb (void)
-{
-  return (v1di){0xe0e0e0e0e0e0e0e0};
-}
-
-/*
-** test_v2di_via_vgmb:
-** vgmb%v24,0,2
-** br  %r14
-*/
-
-v2di
-test_v2di_via_vgmb (void)
-{
-  return (v2di){0xe0e0e0e0e0e0e0e0, 0xe0e0e0e0e0e0e0e0};
-}
-
-/*
-** test_v1di_via_vgmb_wrap:
-** vgmb%v24,5,2
-** br  %r14
-*/
-
-v1di
-test_v1di_via_vgmb_wrap (void)
-{
-  return (v1di){0xe7e7e7e7e7e7e7e7};
-}
-
-/*
-** test_v2di_via_vgmb_wrap:
-** vgmb%v24,5,2
-** br  %r14
-*/
-
-v2di
-test_v2di_via_vgmb_wrap (void)
-{
-  return (v2di){0xe7e7e7e7e7e7e7e7, 0xe7e7e7e7e7e7e7e7};
-}
-
-/*
-** test_v1di_via_vgmh:
-** vgmh%v24,5,10
-** br  %r14
-*/
-
-v1di
-test_v1di_via_vgmh (void)
-{
-  return (v1di){0x7e007e007e007e0};
-}
-
-/*
-** test_v2di_via_vgmh:
-** vgmh%v24,5,10
-** br  %r14
-*/
-
-v2di
-test_v2di_via_vgmh (void)
-{
-  return (v2di){0x7e007e007e007e0, 0x7e007e007e007e0};
-}
-
-/*
-** test_v1di_via_vgmg:
-** vgmg%v24,17,46
-** br  %r14
-*/
-
-v1di
-test_v1di_via_vgmg (void)
-{
-  return (v1di){0x7ffe};
-}
-
-/*
-** test_v2di_via_vgmg:
-** vgmg%v24,17,46
-** br  %r14
-*/
-
-v2di
-test_v2di_via_vgmg (void)
-{
-  return (v2di){0x7ffe, 0x7ffe};
-}
diff --git a/gcc/testsuite/gcc.target/s390/vector/vgm-hi-1.c 
b/gcc/testsuite/gcc.target/s390/vector/vgm-hi-1.c
deleted file mode 100644
index da064792cfc9..
--- a/gcc/testsuite/gcc.target/s390/vector/vgm-hi-1.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -march=z13 -mzarch" } */
-/* { dg-final { check-function-bodies "**" "" "" } } */
-
-typedef short  v1hi __attribute__ ((vector_size (2)));
-typedef short  v2hi __attribute__ ((vector_size (4)));
-typedef short  v4hi __attribute__ ((vector_size (8)));
-typedef short  v8hi __attribute__ ((vector_size (16)));
-
-/*
-** tes

[gcc r15-1366] s390: Extend two element float vector

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:9965acb77cbd686283a9d0a867c80b1e710f46b9

commit r15-1366-g9965acb77cbd686283a9d0a867c80b1e710f46b9
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:37:11 2024 +0200

s390: Extend two element float vector

This implements a V2SF -> V2DF extend.

gcc/ChangeLog:

* config/s390/vector.md (*vmrhf_half): New.
(extendv2sfv2df2): New.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extend-3.c: New test.

Diff:
---
 gcc/config/s390/vector.md  |  28 +++
 .../gcc.target/s390/vector/vec-extend-3.c  |  18 ++
 gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c|  30 +++
 gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c| 102 ++
 gcc/testsuite/gcc.target/s390/vector/vgm-hi-1.c| 212 
 .../gcc.target/s390/vector/vgm-int128-1.c  |  64 +++
 .../gcc.target/s390/vector/vgm-longdouble-1.c  |  55 ++
 gcc/testsuite/gcc.target/s390/vector/vgm-qi-1.c| 213 +
 gcc/testsuite/gcc.target/s390/vector/vgm-sf-1.c|  43 +
 gcc/testsuite/gcc.target/s390/vector/vgm-si-1.c| 146 ++
 gcc/testsuite/gcc.target/s390/vector/vgm-ti-1.c|  63 ++
 11 files changed, 974 insertions(+)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a931a4b1b17e..40de0c75a7cf 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -895,6 +895,17 @@
   "vmrhf\t%0,%1,%2";
   [(set_attr "op_type" "VRR")])
 
+(define_insn "*vmrhf_half"
+  [(set (match_operand:V_HW_4 0 "register_operand" "=v")
+   (vec_select:V_HW_4
+(vec_concat:V_HW_4 (match_operand: 1 
"register_operand"  "v")
+   (match_operand: 2 
"register_operand"  "v"))
+(parallel [(const_int 0) (const_int 2)
+   (const_int 1) (const_int 3)])))]
+  "TARGET_VX"
+  "vmrhf\t%0,%1,%2";
+  [(set_attr "op_type" "VRR")])
+
 (define_insn "*vmrlf"
   [(set (match_operand:V_HW_4  0 
"register_operand" "=v")
 (vec_select:V_HW_4
@@ -2394,6 +2405,23 @@
   "vuph\t%0,%1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "extendv2sfv2df2"
+  [(set (match_dup 2)
+   (vec_select:V4SF
+(vec_concat:V4SF (match_operand:V2SF 1 "register_operand")
+ (match_dup 1))
+(parallel [(const_int 0) (const_int 2)
+   (const_int 1) (const_int 3)])))
+   (set (match_operand:V2DF 0 "register_operand")
+   (float_extend:V2DF
+(vec_select:V2SF
+ (match_dup 2)
+ (parallel [(const_int 0) (const_int 2)]))))]
+  "TARGET_VX"
+{
+  operands[2] = gen_reg_rtx (V4SFmode);
+})
+
 ;; vector unpack v16qi
 
 ; signed
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c 
b/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c
new file mode 100644
index ..2b02e7bf9f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z13 -mzarch" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+/*
+** extendv2sfv2df2:
+** vmrhf   %v24,%v24,%v24
+** vldeb   %v24,%v24
+** br  %r14
+*/
+
+v2df extendv2sfv2df2 (v2sf x)
+{
+  return __builtin_convertvector (x, v2df);
+}
diff --git a/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c 
b/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c
new file mode 100644
index ..07aa6b9deece
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vgm-df-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z13 -mzarch" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef double v1df __attribute__ ((vector_size (8)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+/*
+** test_v1df_via_vgmb:
+** vgmb%v24,0,1
+** br  %r14
+*/
+
+v1df
+test_v1df_via_vgmb (void)
+{
+  return (v1df){-8577.505882352939806878566741943359375};
+}
+
+/*
+** test_v2df_via_vgmb:
+** vgmb%v24,0,1
+** br  %r14
+*/
+
+v2df
+test_v2df_via_vgmb (void)
+{
+  return (v2df){-8577.505882352939806878566741943359375, -8577.505882352939806878566741943359375};
+}
diff --git a/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c b/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c
new file mode 100644
index ..fa608f2b5ae8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vgm-di-1.c
@@ -0,0 +1,102 @@
+/* { dg-do co

[gcc r15-1365] s390: Extend two/four element integer vectors

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:2ab143df110a40bd41b5368ef84819953bf971b1

commit r15-1365-g2ab143df110a40bd41b5368ef84819953bf971b1
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:36:11 2024 +0200

s390: Extend two/four element integer vectors

For the moment I deliberately left out one-element QHS vectors since it
is unclear whether these are pathological cases or whether they are
really used.  If we ever get an extend for V1DI -> V1TI we should
reconsider this.

As a side-effect this fixes PR115261.

gcc/ChangeLog:

PR target/115261
* config/s390/s390.md (any_extend,extend_insn,zero_extend):
New code attributes and code iterator.
* config/s390/vector.md (VI_EXTEND): New mode iterator.
(2): New insn.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extend-1.c: New test.
* gcc.target/s390/vector/vec-extend-2.c: New test.

Diff:
---
 gcc/config/s390/s390.md|  4 ++
 gcc/config/s390/vector.md  | 29 ++--
 .../gcc.target/s390/vector/vec-extend-1.c  | 79 ++
 .../gcc.target/s390/vector/vec-extend-2.c  | 55 +++
 4 files changed, 162 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c607dce3cf0f..1311a5f01cf3 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -602,6 +602,10 @@
 
 (define_attr "relative_long" "no,yes" (const_string "no"))
 
+(define_code_attr extend_insn [(sign_extend "extend") (zero_extend "zero_extend")])
+(define_code_attr zero_extend [(sign_extend "") (zero_extend "l")])
+(define_code_iterator any_extend [sign_extend zero_extend])
+
 ;; Pipeline description for z900.
 (include "2064.md")
 
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index ed4742d93c91..a931a4b1b17e 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -87,6 +87,8 @@
 ; 32 bit int<->fp vector conversion instructions are available since VXE2 (z15).
 (define_mode_iterator VX_VEC_CONV_BFP [V2DF (V4SF "TARGET_VXE2")])
 
+(define_mode_iterator VI_EXTEND [V2QI V2HI V2SI V4QI V4HI])
+
 ; Empty string for all but TImode.  This is used to hide the TImode
 ; expander name in case it is defined already.  See addti3 for an
 ; example.
@@ -195,13 +197,20 @@
(V1DF "V2DF") (V2DF "V4DF")])
 
 ; Vector with widened element size and the same number of elements.
-(define_mode_attr vec_2x_wide [(V1QI "V1HI") (V2QI "V2HI") (V4QI "V4HI") (V8QI "V8HI") (V16QI "V16HI")
+(define_mode_attr VEC_2X_WIDE [(V1QI "V1HI") (V2QI "V2HI") (V4QI "V4HI") (V8QI "V8HI") (V16QI "V16HI")
   (V1HI "V1SI") (V2HI "V2SI") (V4HI "V4SI") (V8HI "V8SI")
   (V1SI "V1DI") (V2SI "V2DI") (V4SI "V4DI")
   (V1DI "V1TI") (V2DI "V2TI")
   (V1SF "V1DF") (V2SF "V2DF") (V4SF "V4DF")
   (V1DF "V1TF") (V2DF "V2TF")])
 
+(define_mode_attr vec_2x_wide [(V1QI "v1hi") (V2QI "v2hi") (V4QI "v4hi") (V8QI "v8hi") (V16QI "v16hi")
+  (V1HI "v1si") (V2HI "v2si") (V4HI "v4si") (V8HI "v8si")
+  (V1SI "v1di") (V2SI "v2di") (V4SI "v4di")
+  (V1DI "v1ti") (V2DI "v2ti")
+  (V1SF "v1df") (V2SF "v2df") (V4SF "v4df")
+  (V1DF "v1tf") (V2DF "v2tf")])
+
 ; Vector with half the element size AND half the number of elements.
 (define_mode_attr vec_halfhalf
   [(V2HI "V2QI") (V4HI "V4QI") (V8HI "V8QI")
@@ -1604,7 +1613,7 @@
 UNSPEC_VEC_UMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -1623,7 +1632,7 @@
 UNSPEC_VEC_UMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -1642,7 

[gcc r15-1364] s390: testsuite: Fix nobp-table-jump-*.c

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:0bf3f14e0d79f3258d4e5570216b5d81af6d60ef

commit r15-1364-g0bf3f14e0d79f3258d4e5570216b5d81af6d60ef
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:35:27 2024 +0200

s390: testsuite: Fix nobp-table-jump-*.c

Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong
enough to render these tests useless.  Fixed by disabling IPA for the
affected functions via the noipa attribute.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform
IPA.
* gcc.target/s390/nobp-table-jump-inline-z900.c: Ditto.
* gcc.target/s390/nobp-table-jump-z10.c: Ditto.
* gcc.target/s390/nobp-table-jump-z900.c: Ditto.

Diff:
---
 .../gcc.target/s390/nobp-table-jump-inline-z10.c   | 42 +++---
 .../gcc.target/s390/nobp-table-jump-inline-z900.c  | 42 +++---
 .../gcc.target/s390/nobp-table-jump-z10.c  | 42 +++---
 .../gcc.target/s390/nobp-table-jump-z900.c | 42 +++---
 4 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
index 8dfd7e4c7861..121751166d0a 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 (void) { return 13; }
-int __attribute__((noinline,noclone)) foo14 (void) { return 14; }
-int __attribute__((noinline,noclone)) foo15 (void) { return 15; }
-int __attribute__((noinline,noclone)) foo16 (void) { return 16; }
-int __attribute__((noinline,noclone)) foo17 (void) { return 17; }
-int __attribute__((noinline,noclone)) foo18 (void) { return 18; }
-int __attribute__((noinline,noclone)) foo19 (void) { return 19; }
-int __attribute__((noinline,noclone)) foo20 (void) { return 20; }
+int __attribute__((noipa)) foo1 (void) { return 1; }
+int __attribute__((noipa)) foo2 (void) { return 2; }
+int __attribute__((noipa)) foo3 (void) { return 3; }
+int __attribute__((noipa)) foo4 (void) { return 4; }
+int __attribute__((noipa)) foo5 (void) { return 5; }
+int __attribute__((noipa)) foo6 (void) { return 6; }
+int __attribute__((noipa)) foo7 (void) { return 7; }
+int __attribute__((noipa)) foo8 (void) { return 8; }
+int __attribute__((noipa)) foo9 (void) { return 9; }
+int __attribute__((noipa)) foo10 (void) { return 10; }
+int __attribute__((noipa)) foo11 (void) { return 11; }
+int __attribute__((noipa)) foo12 (void) { return 12; }
+int __attribute__((noipa)) foo13 (void) { return 13; }
+int __attribute__((noipa)) foo14 (void) { return 14; }
+int __attribute__((noipa)) foo15 (void) { return 15; }
+int __attribute__((noipa)) foo16 (void) { return 16; }
+int __attribute__((noipa)) foo17 (void) { return 17; }
+int __attribute__((noipa)) foo18 (void) { return 18; }
+int __attribute__((noipa)) foo19 (void) { return 19; }
+int __attribute__((noipa)) foo20 (void) { return 20; }
 
 
-int __attribute__((noinline,noclone))
+int __attribute__((noipa))
 bar (int a)
 {
   int ret = 0;
diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
index 46d2c54bcff1..5ad0c72afc36 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8

[gcc r15-1363] s390: testsuite: Fix ifcvt-one-insn-bool.c

2024-06-17 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:ac66736bf2f8a10d2f43e83ed6377e4179027a39

commit r15-1363-gac66736bf2f8a10d2f43e83ed6377e4179027a39
Author: Stefan Schulze Frielinghaus 
Date:   Mon Jun 17 08:34:34 2024 +0200

s390: testsuite: Fix ifcvt-one-insn-bool.c

With the change of r15-787-g57e04879389f9c I forgot to also update this
test.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ifcvt-one-insn-bool.c: Fix loc.

Diff:
---
 gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
index 0c8c2f879a69..4ae29dbd6b61 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
@@ -3,7 +3,7 @@
 /* { dg-do compile { target { s390*-*-* } } } */
 /* { dg-options "-O2 -march=z13 -mzarch" } */
 
-/* { dg-final { scan-assembler "lochinh\t%r.?,1" } } */
+/* { dg-final { scan-assembler "lochile\t%r.?,1" } } */
 #include 
 
 int foo (int *a, unsigned int n)


Re: [PATCH] s390: testsuite: Fix ifcvt-one-insn-bool.c

2024-06-13 Thread Stefan Schulze Frielinghaus
Ping.

On Wed, Jun 05, 2024 at 08:00:15AM +0200, Stefan Schulze Frielinghaus wrote:
> With the change of r15-787-g57e04879389f9c I forgot to also update this
> test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/ifcvt-one-insn-bool.c: Fix loc.
> ---
>  Ok for mainline?  Ok for GCC 14 if the corresponding backport is also
>  approved?
> 
>  gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
> index 0c8c2f879a6..4ae29dbd6b6 100644
> --- a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
> +++ b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
> @@ -3,7 +3,7 @@
>  /* { dg-do compile { target { s390*-*-* } } } */
>  /* { dg-options "-O2 -march=z13 -mzarch" } */
>  
> -/* { dg-final { scan-assembler "lochinh\t%r.?,1" } } */
> +/* { dg-final { scan-assembler "lochile\t%r.?,1" } } */
>  #include 
>  
>  int foo (int *a, unsigned int n)
> -- 
> 2.45.1
> 


Re: [PATCH v2] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-06-13 Thread Stefan Schulze Frielinghaus
Ping.

On Sun, Jun 02, 2024 at 02:07:24PM +0200, Stefan Schulze Frielinghaus wrote:
> Since the patch works fine so far for mainline, ok to backport to GCC 14?
> 
> On Fri, May 17, 2024 at 08:59:05AM +0200, Stefan Schulze Frielinghaus wrote:
> > I've adapted the patch as follows and will push.
> > 
> > Thanks,
> > Stefan
> > 
> > --
> > 
> > Consider a NOCE conversion as profitable if there is at least one
> > conditional move.
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
> > Define.
> > (s390_noce_conversion_profitable_p): Implement.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/ccor.c: The order of loads is now reversed; as a
> > consequence the condition has to be reversed, too.
> > ---
> >  gcc/config/s390/s390.cc  | 32 
> >  gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
> >  2 files changed, 34 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> > index bf46eab2d63..7f8f1681c2a 100644
> > --- a/gcc/config/s390/s390.cc
> > +++ b/gcc/config/s390/s390.cc
> > @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-pass.h"
> >  #include "context.h"
> >  #include "builtins.h"
> > +#include "ifcvt.h"
> >  #include "rtl-iter.h"
> >  #include "intl.h"
> >  #include "tm-constrs.h"
> > @@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
> >return vectorize_vec_perm_const_1 (d);
> >  }
> >  
> > +/* Consider a NOCE conversion as profitable if there is at least one
> > +   conditional move.  */
> > +
> > +static bool
> > +s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
> > +{
> > +  if (if_info->speed_p)
> > +{
> > +  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
> > +   {
> > + rtx set = single_set (insn);
> > + if (set == NULL)
> > +   continue;
> > + if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
> > +   continue;
> > + rtx src = SET_SRC (set);
> > + machine_mode mode = GET_MODE (src);
> > + if (GET_MODE_CLASS (mode) != MODE_INT
> > + && GET_MODE_CLASS (mode) != MODE_FLOAT)
> > +   continue;
> > + if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> > +   continue;
> > + return true;
> > +   }
> > +}
> > +  return default_noce_conversion_profitable_p (seq, if_info);
> > +}
> > +
> >  /* Initialize GCC target structure.  */
> >  
> >  #undef  TARGET_ASM_ALIGNED_HI_OP
> > @@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
> >  #undef TARGET_VECTORIZE_VEC_PERM_CONST
> >  #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
> >  
> > +#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
> > +#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
> > +
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >  
> >  #include "gt-s390.h"
> > diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
> > index 31f30f60314..36a3c3a999a 100644
> > --- a/gcc/testsuite/gcc.target/s390/ccor.c
> > +++ b/gcc/testsuite/gcc.target/s390/ccor.c
> > @@ -42,7 +42,7 @@ GENFUN1(2)
> >  
> >  GENFUN1(3)
> >  
> > -/* { dg-final { scan-assembler {locrno} } } */
> > +/* { dg-final { scan-assembler {locro} } } */
> >  
> >  GENFUN2(0,1)
> >  
> > @@ -58,7 +58,7 @@ GENFUN2(0,3)
> >  
> >  GENFUN2(1,2)
> >  
> > -/* { dg-final { scan-assembler {locrnlh} } } */
> > +/* { dg-final { scan-assembler {locrlh} } } */
> >  
> >  GENFUN2(1,3)
> >  
> > -- 
> > 2.45.0
> > 
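
The profitability rule in the patch above can be modelled outside of GCC.
The following is only a minimal sketch in plain C: the toy_* types,
TOY_UNITS_PER_WORD and the fallback heuristic are hypothetical stand-ins
for RTL insns, UNITS_PER_WORD and default_noce_conversion_profitable_p;
they are not GCC APIs.  It mirrors the control flow only: when optimizing
for speed, the first word-sized integer or float conditional move in the
sequence makes the conversion profitable, otherwise the default decides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's RTL concepts.  */
enum toy_code { TOY_PLUS, TOY_IF_THEN_ELSE };
enum toy_class { TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_VECTOR };

struct toy_insn
{
  bool is_single_set;         /* models single_set (insn) != NULL */
  enum toy_code src_code;     /* models GET_CODE (SET_SRC (set)) */
  enum toy_class mode_class;  /* models GET_MODE_CLASS (mode) */
  unsigned mode_size;         /* models GET_MODE_SIZE (mode), in bytes */
};

/* Assumed word size of the toy target, in bytes.  */
#define TOY_UNITS_PER_WORD 8u

/* Placeholder for the generic cost-based default; the threshold here is
   arbitrary and only exists to make the fallback path observable.  */
static bool
toy_default_profitable_p (const struct toy_insn *seq, size_t n)
{
  (void) seq;
  return n <= 2;
}

/* Same shape as the patch: when optimizing for speed, one word-sized
   integer or float conditional move is enough to accept the sequence.  */
static bool
toy_noce_conversion_profitable_p (const struct toy_insn *seq, size_t n,
                                  bool speed_p)
{
  if (speed_p)
    for (size_t i = 0; i < n; i++)
      {
        const struct toy_insn *insn = &seq[i];
        if (!insn->is_single_set)
          continue;
        if (insn->src_code != TOY_IF_THEN_ELSE)
          continue;
        if (insn->mode_class != TOY_MODE_INT
            && insn->mode_class != TOY_MODE_FLOAT)
          continue;
        if (insn->mode_size > TOY_UNITS_PER_WORD)
          continue;
        return true;
      }
  return toy_default_profitable_p (seq, n);
}
```

Note that a conditional move wider than a word, or a vector one, does not
short-circuit the decision; such sequences still go through the default.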


Re: [PATCH] s390: testsuite: Fix nobp-table-jump-*.c

2024-06-13 Thread Stefan Schulze Frielinghaus
Ping.

On Mon, Jun 03, 2024 at 03:43:39PM +0200, Stefan Schulze Frielinghaus wrote:
> Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong
> enough to render these tests useless.  Fixed by disabling IPA for the
> affected functions via the noipa attribute.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform
>   IPA.
>   * gcc.target/s390/nobp-table-jump-inline-z900.c: Ditto.
>   * gcc.target/s390/nobp-table-jump-z10.c: Ditto.
>   * gcc.target/s390/nobp-table-jump-z900.c: Ditto.
> ---
>  Ok for mainline?
> 
>  .../s390/nobp-table-jump-inline-z10.c | 42 +--
>  .../s390/nobp-table-jump-inline-z900.c| 42 +--
>  .../gcc.target/s390/nobp-table-jump-z10.c | 42 +--
>  .../gcc.target/s390/nobp-table-jump-z900.c| 42 +--
>  4 files changed, 84 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
> index 8dfd7e4c786..121751166d0 100644
> --- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
> +++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
> @@ -4,29 +4,29 @@
>  /* case-values-threshold will be set to 20 by the back-end when jump
> thunk are requested.  */
>  
> -int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
> -int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
> -int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
> -int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
> -int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
> -int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
> -int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
> -int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
> -int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
> -int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
> -int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
> -int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
> -int __attribute__((noinline,noclone)) foo13 (void) { return 13; }
> -int __attribute__((noinline,noclone)) foo14 (void) { return 14; }
> -int __attribute__((noinline,noclone)) foo15 (void) { return 15; }
> -int __attribute__((noinline,noclone)) foo16 (void) { return 16; }
> -int __attribute__((noinline,noclone)) foo17 (void) { return 17; }
> -int __attribute__((noinline,noclone)) foo18 (void) { return 18; }
> -int __attribute__((noinline,noclone)) foo19 (void) { return 19; }
> -int __attribute__((noinline,noclone)) foo20 (void) { return 20; }
> +int __attribute__((noipa)) foo1 (void) { return 1; }
> +int __attribute__((noipa)) foo2 (void) { return 2; }
> +int __attribute__((noipa)) foo3 (void) { return 3; }
> +int __attribute__((noipa)) foo4 (void) { return 4; }
> +int __attribute__((noipa)) foo5 (void) { return 5; }
> +int __attribute__((noipa)) foo6 (void) { return 6; }
> +int __attribute__((noipa)) foo7 (void) { return 7; }
> +int __attribute__((noipa)) foo8 (void) { return 8; }
> +int __attribute__((noipa)) foo9 (void) { return 9; }
> +int __attribute__((noipa)) foo10 (void) { return 10; }
> +int __attribute__((noipa)) foo11 (void) { return 11; }
> +int __attribute__((noipa)) foo12 (void) { return 12; }
> +int __attribute__((noipa)) foo13 (void) { return 13; }
> +int __attribute__((noipa)) foo14 (void) { return 14; }
> +int __attribute__((noipa)) foo15 (void) { return 15; }
> +int __attribute__((noipa)) foo16 (void) { return 16; }
> +int __attribute__((noipa)) foo17 (void) { return 17; }
> +int __attribute__((noipa)) foo18 (void) { return 18; }
> +int __attribute__((noipa)) foo19 (void) { return 19; }
> +int __attribute__((noipa)) foo20 (void) { return 20; }
>  
>  
> -int __attribute__((noinline,noclone))
> +int __attribute__((noipa))
>  bar (int a)
>  {
>int ret = 0;
> diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
> index 46d2c54bcff..5ad0c72afc3 100644
> --- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
> +++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
> @@ -4,29 +4,29 @@
>  /* case-values-threshold will be set to 20 by the back-end when jump
> thunk are requested.  */
>  
> -int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
> -int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
> -int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
> -int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
> -int __attribute__((noinline,noclone)) foo5 (void)
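
For readers unfamiliar with the attribute used in the patch above: noipa
is documented as implying noinline, noclone and no_icf, and additionally
hides the function body from interprocedural analyses, which is why it
helps where noinline,noclone alone no longer did.  Below is a reduced
sketch of the test shape, assuming a GCC-compatible compiler.  The
TOY_NOIPA macro and the toy_* names are made up for this illustration,
and with only three cases the switch is far below the
case-values-threshold of 20 mentioned in the tests, so no jump table
should be expected from it.

```c
/* Use noipa where the compiler knows it; fall back to noinline so the
   sketch still builds elsewhere.  TOY_NOIPA is a hypothetical helper.  */
#if defined(__has_attribute)
# if __has_attribute(noipa)
#  define TOY_NOIPA __attribute__((noipa))
# endif
#endif
#ifndef TOY_NOIPA
# define TOY_NOIPA __attribute__((noinline))
#endif

/* With noipa the caller may not assume anything about the return values,
   so value-range propagation cannot fold the calls below to constants.  */
TOY_NOIPA int toy_foo1 (void) { return 1; }
TOY_NOIPA int toy_foo2 (void) { return 2; }
TOY_NOIPA int toy_foo3 (void) { return 3; }

TOY_NOIPA int
toy_bar (int a)
{
  int ret = 0;

  switch (a)
    {
    case 1: ret = toy_foo1 (); break;
    case 2: ret = toy_foo2 (); break;
    case 3: ret = toy_foo3 (); break;
    default: break;
    }
  return ret;
}
```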

Re: [PATCH] s390: Extend two element float vector

2024-06-11 Thread Stefan Schulze Frielinghaus
On Tue, Jun 11, 2024 at 10:42:26AM +0200, Andreas Krebbel wrote:
> On 6/11/24 10:26, Stefan Schulze Frielinghaus wrote:
> > This implements a V2SF -> V2DF extend.
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/vector.md (*vmrhf): New.
> > (extendv2sfv2df2): New.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/vector/vec-extend-3.c: New test.
> 
> Since we already have a *vmrhf pattern, should we perhaps add something to
> the name to make it easier to distinguish in the rtl dumps? You have added
> the mode already, but perhaps something like *vmrhf_half or something
> like this?

I like the one with _half added which I will push soon.

Thanks,
Stefan

> 
> Ok with or without that change. Thanks!
> 
> 
> Andreas
> 
> 


[PATCH] s390: Extend two element float vector

2024-06-11 Thread Stefan Schulze Frielinghaus
This implements a V2SF -> V2DF extend.

gcc/ChangeLog:

* config/s390/vector.md (*vmrhf): New.
(extendv2sfv2df2): New.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extend-3.c: New test.
---
 Bootstrapped and regtested on s390.  Ok for mainline?

 gcc/config/s390/vector.md | 28 +++
 .../gcc.target/s390/vector/vec-extend-3.c | 18 
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a931a4b1b17..d8657fae56d 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -895,6 +895,17 @@
   "vmrhf\t%0,%1,%2";
   [(set_attr "op_type" "VRR")])
 
+(define_insn "*vmrhf"
+  [(set (match_operand:V_HW_4 0 "register_operand" "=v")
+   (vec_select:V_HW_4
+(vec_concat:V_HW_4 (match_operand: 1 "register_operand"  "v")
+   (match_operand: 2 "register_operand"  "v"))
+(parallel [(const_int 0) (const_int 2)
+   (const_int 1) (const_int 3)])))]
+  "TARGET_VX"
+  "vmrhf\t%0,%1,%2";
+  [(set_attr "op_type" "VRR")])
+
 (define_insn "*vmrlf"
   [(set (match_operand:V_HW_4  0 "register_operand" "=v")
 (vec_select:V_HW_4
@@ -2394,6 +2405,23 @@
   "vuph\t%0,%1"
   [(set_attr "op_type" "VRR")])
 
+(define_expand "extendv2sfv2df2"
+  [(set (match_dup 2)
+   (vec_select:V4SF
+(vec_concat:V4SF (match_operand:V2SF 1 "register_operand")
+ (match_dup 1))
+(parallel [(const_int 0) (const_int 2)
+   (const_int 1) (const_int 3)])))
+   (set (match_operand:V2DF 0 "register_operand")
+   (float_extend:V2DF
+(vec_select:V2SF
+ (match_dup 2)
+ (parallel [(const_int 0) (const_int 2)]))))]
+  "TARGET_VX"
+{
+  operands[2] = gen_reg_rtx (V4SFmode);
+})
+
 ;; vector unpack v16qi
 
 ; signed
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c b/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c
new file mode 100644
index 000..2b02e7bf9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-extend-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z13 -mzarch" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+/*
+** extendv2sfv2df2:
+** vmrhf   %v24,%v24,%v24
+** vldeb   %v24,%v24
+** br  %r14
+*/
+
+v2df extendv2sfv2df2 (v2sf x)
+{
+  return __builtin_convertvector (x, v2df);
+}
-- 
2.45.1



[PATCH] s390: Extend two/four element integer vectors

2024-06-11 Thread Stefan Schulze Frielinghaus
For the moment I deliberately left out one-element QHS vectors since it
is unclear whether these are pathological cases or whether they are
really used.  If we ever get an extend for V1DI -> V1TI we should
reconsider this.

As a side-effect this fixes PR115261.

gcc/ChangeLog:

PR target/115261
* config/s390/s390.md (any_extend,extend_insn,zero_extend):
New code attributes and code iterator.
* config/s390/vector.md (VI_EXTEND): New mode iterator.
(2): New insn.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extend-1.c: New test.
* gcc.target/s390/vector/vec-extend-2.c: New test.
---
 Bootstrapped and regtested on s390.  Ok for mainline?

 gcc/config/s390/s390.md   |  4 +
 gcc/config/s390/vector.md | 29 +--
 .../gcc.target/s390/vector/vec-extend-1.c | 79 +++
 .../gcc.target/s390/vector/vec-extend-2.c | 55 +
 4 files changed, 162 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extend-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extend-2.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index c607dce3cf0..1311a5f01cf 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -602,6 +602,10 @@
 
 (define_attr "relative_long" "no,yes" (const_string "no"))
 
+(define_code_attr extend_insn [(sign_extend "extend") (zero_extend "zero_extend")])
+(define_code_attr zero_extend [(sign_extend "") (zero_extend "l")])
+(define_code_iterator any_extend [sign_extend zero_extend])
+
 ;; Pipeline description for z900.
 (include "2064.md")
 
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index ed4742d93c9..a931a4b1b17 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -87,6 +87,8 @@
 ; 32 bit int<->fp vector conversion instructions are available since VXE2 (z15).
 (define_mode_iterator VX_VEC_CONV_BFP [V2DF (V4SF "TARGET_VXE2")])
 
+(define_mode_iterator VI_EXTEND [V2QI V2HI V2SI V4QI V4HI])
+
 ; Empty string for all but TImode.  This is used to hide the TImode
 ; expander name in case it is defined already.  See addti3 for an
 ; example.
@@ -195,13 +197,20 @@
(V1DF "V2DF") (V2DF "V4DF")])
 
 ; Vector with widened element size and the same number of elements.
-(define_mode_attr vec_2x_wide [(V1QI "V1HI") (V2QI "V2HI") (V4QI "V4HI") (V8QI "V8HI") (V16QI "V16HI")
+(define_mode_attr VEC_2X_WIDE [(V1QI "V1HI") (V2QI "V2HI") (V4QI "V4HI") (V8QI "V8HI") (V16QI "V16HI")
   (V1HI "V1SI") (V2HI "V2SI") (V4HI "V4SI") (V8HI "V8SI")
   (V1SI "V1DI") (V2SI "V2DI") (V4SI "V4DI")
   (V1DI "V1TI") (V2DI "V2TI")
   (V1SF "V1DF") (V2SF "V2DF") (V4SF "V4DF")
   (V1DF "V1TF") (V2DF "V2TF")])
 
+(define_mode_attr vec_2x_wide [(V1QI "v1hi") (V2QI "v2hi") (V4QI "v4hi") (V8QI "v8hi") (V16QI "v16hi")
+  (V1HI "v1si") (V2HI "v2si") (V4HI "v4si") (V8HI "v8si")
+  (V1SI "v1di") (V2SI "v2di") (V4SI "v4di")
+  (V1DI "v1ti") (V2DI "v2ti")
+  (V1SF "v1df") (V2SF "v2df") (V4SF "v4df")
+  (V1DF "v1tf") (V2DF "v2tf")])
+
 ; Vector with half the element size AND half the number of elements.
 (define_mode_attr vec_halfhalf
   [(V2HI "V2QI") (V4HI "V4QI") (V8HI "V8QI")
@@ -1604,7 +1613,7 @@
 UNSPEC_VEC_UMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -1623,7 +1632,7 @@
 UNSPEC_VEC_UMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -1642,7 +1651,7 @@
 UNSPEC_VEC_SMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -1661,7 +1670,7 @@
 UNSPEC_VEC_SMULT_ODD))
(set (match_operand: 0 "register_operand" "")
 (vec_select:
-(vec_concat: (match_dup 3) (match_dup 4))
+(vec_concat: (match_dup 3) (match_dup 4))
 (match_dup 5)))]
   "TARGET_VX"
  {
@@ -2375,6 +2384,16 @@
   "vpkls\t%0,%1,%2"
   [(set_attr "op_type" "VRR")])
 
+;; vector unpack / extend
+
+(define_insn "2"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (any_extend:
+ 

Re: [PATCH] Hard register asm constraint

2024-06-09 Thread Stefan Schulze Frielinghaus
Ping.

On Fri, May 24, 2024 at 11:13:12AM +0200, Stefan Schulze Frielinghaus wrote:
> This implements hard register constraints for inline asm.  A hard register
> constraint is of the form {regname} where regname is any valid register.  This
> basically renders register asm superfluous.  For example, the snippet
> 
> int test (int x, int y)
> {
>   register int r4 asm ("r4") = x;
>   register int r5 asm ("r5") = y;
>   unsigned int copy = y;
>   asm ("foo %0,%1,%2" : "+d" (r4) : "d" (r5), "d" (copy));
>   return r4;
> }
> 
> could be rewritten into
> 
> int test (int x, int y)
> {
>   asm ("foo %0,%1,%2" : "+{r4}" (x) : "{r5}" (y), "d" (y));
>   return x;
> }
> 
> As a side-effect this also solves the problem of call-clobbered registers.
> That being said, I was wondering whether we could utilize this feature in
> order to get rid of local register asm automatically?  For example, converting
> 
> // Result will be in r2 on s390
> extern int bar (void);
> 
> void test (void)
> {
>   register int x asm ("r2") = 42;
>   bar ();
>   asm ("foo %0\n" :: "r" (x));
> }
> 
> into
> 
> void test (void)
> {
>   int x = 42;
>   bar ();
>   asm ("foo %0\n" :: "{r2}" (x));
> }
> 
> in order to get rid of the limitation of call-clobbered registers which may
> lead to subtle bugs---especially if you think of non-obvious calls e.g.
> introduced by sanitizer/tracer/whatever.  Since such a transformation has the
> potential to break existing code do you see any edge cases where this might be
> problematic or even show stoppers?  Currently, even
> 
> int test (void)
> {
>   register int x asm ("r2") = 42;
>   register int y asm ("r2") = 24;
>   asm ("foo %0,%1\n" :: "r" (x), "r" (y));
> }
> 
> is allowed which seems error prone to me.  Thus, if 100% backwards
> compatibility would be required, then automatically converting every register
> asm to the new mechanism isn't viable.  Still, quite a lot could be
> transformed.  Any thoughts?
> 
> Currently I allow multiple alternatives as demonstrated by
> gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c.  However, since a hard
> register constraint is pretty specific, I could also think of erroring out in
> case of alternatives.  Are there any real use cases out there for multiple
> alternatives where one would like to use hard register constraints?
> 
> With the current implementation we have a "user visible change" in the sense
> that for
> 
> void test (void)
> {
>   register int x asm ("r2") = 42;
>   register int y asm ("r2") = 24;
>   asm ("foo   %0,%1\n" : "=r" (x), "=r" (y));
> }
> 
> we do not get the error
> 
>   "invalid hard register usage between output operands"
> 
> anymore but rather
> 
>   "multiple outputs to hard register: %r2"
> 
> This is due to the error handling in gimplify_asm_expr ().  Speaking of
> errors, I also error out earlier than before, which means that e.g. in
> pr87600-2.c only the first error is reported and processing stops afterwards,
> so the subsequent tests fail.
> 
> I've been skimming through all targets and it looks to me as if none is using
> curly brackets for their constraints.  Of course, I may have missed something.
> 
> Cheers,
> Stefan
> 
> PS: Current state for Clang: https://reviews.llvm.org/D105142
> 
> ---
>  gcc/cfgexpand.cc  |  42 ---
>  gcc/genpreds.cc   |   4 +-
>  gcc/gimplify.cc   | 115 +-
>  gcc/lra-constraints.cc|  17 +++
>  gcc/recog.cc  |  14 ++-
>  gcc/stmt.cc   | 102 +++-
>  gcc/stmt.h|  10 +-
>  .../gcc.target/s390/asm-hard-reg-1.c  | 103 
>  .../gcc.target/s390/asm-hard-reg-2.c  |  29 +
>  .../gcc.target/s390/asm-hard-reg-3.c  |  24 
>  gcc/testsuite/lib/scanasm.exp |   4 +
>  11 files changed, 407 insertions(+), 57 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 557cb28733b..47f71a2e803 100644
> --- a/gcc/cfgexpand.

[PATCH] s390: testsuite: Fix ifcvt-one-insn-bool.c

2024-06-05 Thread Stefan Schulze Frielinghaus
With the change of r15-787-g57e04879389f9c I forgot to also update this
test.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ifcvt-one-insn-bool.c: Fix loc.
---
 Ok for mainline?  Ok for GCC 14 if the corresponding backport is also
 approved?

 gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
index 0c8c2f879a6..4ae29dbd6b6 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-one-insn-bool.c
@@ -3,7 +3,7 @@
 /* { dg-do compile { target { s390*-*-* } } } */
 /* { dg-options "-O2 -march=z13 -mzarch" } */
 
-/* { dg-final { scan-assembler "lochinh\t%r.?,1" } } */
+/* { dg-final { scan-assembler "lochile\t%r.?,1" } } */
 #include 
 
 int foo (int *a, unsigned int n)
-- 
2.45.1



Re: Partial vector

2024-06-04 Thread Stefan Schulze Frielinghaus via Gcc
On Tue, Jun 04, 2024 at 09:50:04AM +0200, Richard Biener wrote:
> On Tue, Jun 4, 2024 at 8:52 AM Stefan Schulze Frielinghaus via Gcc
>  wrote:
> >
> > Hi all,
> >
> > Is there some sort of guarantee that the unused part of a partial vector has
> > all bits set to zero?
> >
> > The question came up while implementing an insn for mode V2SF on s390
> > where only half of the hard register would be utilized.  The final
> > machine instruction, however, would make use of the full register
> > (V4SF).  Therefore, if the other half is not guaranteed to be zero, then
> > a floating-point exception might occur in this particular case.  Of
> > course, if such a guarantee exists, then one would have to maintain that
> > for all insn implementations.
> >
> > This all sounds a bit fragile and probably better solved by having some
> > sort of masking support by the hardware but I'm still keen to know.
> 
> There is no guarantee by the middle-end (like having PROMOTE_MODE
> for vectors).  You may want to look how x86 implements MMX-with-SSE
> (aka 8 byte vectors within 16 byte SSE regs).
> 
> In particular there's no generic middle-end support for "lowering" V2SFmode
> to V4SFmode during RTL expansion, your machine description expanders
> have to do that.

Thanks for clarification.  I will also have a look at the MMX-with-SSE
feature.

Cheers,
Stefan

> 
> Richard.
> 
> > Cheers,
> > Stefan
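
To illustrate the point that nothing zeroes the unused half automatically, here is a minimal source-level sketch using the GCC vector extensions.  It is not s390 machine-description code, and the names v2sf, v4sf, widen_v2sf_zero, narrow_v4sf and add_v2sf_via_v4sf are made up for illustration.  The widening step zeroes the upper lanes explicitly, which is the kind of work a target expander would have to arrange itself.

```c
#include <assert.h>

/* Hypothetical type names; vector_size is a GCC/Clang extension.  */
typedef float v2sf __attribute__ ((vector_size (8)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Widen a two-element vector to four elements and explicitly zero the
   upper two lanes.  Nothing does this implicitly.  */
static v4sf
widen_v2sf_zero (v2sf lo)
{
  v4sf wide = { lo[0], lo[1], 0.0f, 0.0f };
  return wide;
}

/* Keep only the lower two lanes of a four-element vector.  */
static v2sf
narrow_v4sf (v4sf wide)
{
  v2sf lo = { wide[0], wide[1] };
  return lo;
}

/* Emulate a V2SF addition with a full-width V4SF addition.  The upper
   lanes compute 0.0f + 0.0f, which cannot raise an exception.  */
static v2sf
add_v2sf_via_v4sf (v2sf a, v2sf b)
{
  v4sf wa = widen_v2sf_zero (a);
  v4sf wb = widen_v2sf_zero (b);
  assert (wa[2] == 0.0f && wb[3] == 0.0f);
  return narrow_v4sf (wa + wb);
}
```

Note that zero is only a safe filler for some operations: for a division the upper lanes would compute 0/0, so a different padding value would presumably be needed there.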


Partial vector

2024-06-04 Thread Stefan Schulze Frielinghaus via Gcc
Hi all,

Is there some sort of guarantee that the unused part of a partial vector has
all bits set to zero?

The question came up while implementing an insn for mode V2SF on s390
where only half of the hard register would be utilized.  The final
machine instruction, however, would make use of the full register
(V4SF).  Therefore, if the other half is not guaranteed to be zero, then
a floating-point exception might occur in this particular case.  Of
course, if such a guarantee exists, then one would have to maintain that
for all insn implementations.

This all sounds a bit fragile and probably better solved by having some
sort of masking support by the hardware but I'm still keen to know.

Cheers,
Stefan


[PATCH] s390: testsuite: Fix nobp-table-jump-*.c

2024-06-03 Thread Stefan Schulze Frielinghaus
Starting with r14-5628-g53ba8d669550d3 interprocedural VRP became strong
enough in order to render these tests useless.  Fixed by disabling IPA.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-table-jump-inline-z10.c: Do not perform
IPA.
* gcc.target/s390/nobp-table-jump-inline-z900.c: Ditto.
* gcc.target/s390/nobp-table-jump-z10.c: Ditto.
* gcc.target/s390/nobp-table-jump-z900.c: Ditto.
---
 Ok for mainline?

 .../s390/nobp-table-jump-inline-z10.c | 42 +--
 .../s390/nobp-table-jump-inline-z900.c| 42 +--
 .../gcc.target/s390/nobp-table-jump-z10.c | 42 +--
 .../gcc.target/s390/nobp-table-jump-z900.c| 42 +--
 4 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
index 8dfd7e4c786..121751166d0 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z10.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 (void) { return 13; }
-int __attribute__((noinline,noclone)) foo14 (void) { return 14; }
-int __attribute__((noinline,noclone)) foo15 (void) { return 15; }
-int __attribute__((noinline,noclone)) foo16 (void) { return 16; }
-int __attribute__((noinline,noclone)) foo17 (void) { return 17; }
-int __attribute__((noinline,noclone)) foo18 (void) { return 18; }
-int __attribute__((noinline,noclone)) foo19 (void) { return 19; }
-int __attribute__((noinline,noclone)) foo20 (void) { return 20; }
+int __attribute__((noipa)) foo1 (void) { return 1; }
+int __attribute__((noipa)) foo2 (void) { return 2; }
+int __attribute__((noipa)) foo3 (void) { return 3; }
+int __attribute__((noipa)) foo4 (void) { return 4; }
+int __attribute__((noipa)) foo5 (void) { return 5; }
+int __attribute__((noipa)) foo6 (void) { return 6; }
+int __attribute__((noipa)) foo7 (void) { return 7; }
+int __attribute__((noipa)) foo8 (void) { return 8; }
+int __attribute__((noipa)) foo9 (void) { return 9; }
+int __attribute__((noipa)) foo10 (void) { return 10; }
+int __attribute__((noipa)) foo11 (void) { return 11; }
+int __attribute__((noipa)) foo12 (void) { return 12; }
+int __attribute__((noipa)) foo13 (void) { return 13; }
+int __attribute__((noipa)) foo14 (void) { return 14; }
+int __attribute__((noipa)) foo15 (void) { return 15; }
+int __attribute__((noipa)) foo16 (void) { return 16; }
+int __attribute__((noipa)) foo17 (void) { return 17; }
+int __attribute__((noipa)) foo18 (void) { return 18; }
+int __attribute__((noipa)) foo19 (void) { return 19; }
+int __attribute__((noipa)) foo20 (void) { return 20; }
 
 
-int __attribute__((noinline,noclone))
+int __attribute__((noipa))
 bar (int a)
 {
   int ret = 0;
diff --git a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
index 46d2c54bcff..5ad0c72afc3 100644
--- a/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
+++ b/gcc/testsuite/gcc.target/s390/nobp-table-jump-inline-z900.c
@@ -4,29 +4,29 @@
 /* case-values-threshold will be set to 20 by the back-end when jump
thunk are requested.  */
 
-int __attribute__((noinline,noclone)) foo1 (void) { return 1; }
-int __attribute__((noinline,noclone)) foo2 (void) { return 2; }
-int __attribute__((noinline,noclone)) foo3 (void) { return 3; }
-int __attribute__((noinline,noclone)) foo4 (void) { return 4; }
-int __attribute__((noinline,noclone)) foo5 (void) { return 5; }
-int __attribute__((noinline,noclone)) foo6 (void) { return 6; }
-int __attribute__((noinline,noclone)) foo7 (void) { return 7; }
-int __attribute__((noinline,noclone)) foo8 (void) { return 8; }
-int __attribute__((noinline,noclone)) foo9 (void) { return 9; }
-int __attribute__((noinline,noclone)) foo10 (void) { return 10; }
-int __attribute__((noinline,noclone)) foo11 (void) { return 11; }
-int __attribute__((noinline,noclone)) foo12 (void) { return 12; }
-int __attribute__((noinline,noclone)) foo13 

Re: [PATCH v2] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-06-02 Thread Stefan Schulze Frielinghaus
Since the patch works fine so far for mainline, ok to backport to GCC 14?

On Fri, May 17, 2024 at 08:59:05AM +0200, Stefan Schulze Frielinghaus wrote:
> I've adapted the patch as follows and will push.
> 
> Thanks,
> Stefan
> 
> --
> 
> Consider a NOCE conversion as profitable if there is at least one
> conditional move.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
>   Define.
>   (s390_noce_conversion_profitable_p): Implement.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/ccor.c: Order of loads are reversed, now, as a
>   consequence the condition has to be reversed.
> ---
>  gcc/config/s390/s390.cc  | 32 
>  gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index bf46eab2d63..7f8f1681c2a 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "context.h"
>  #include "builtins.h"
> +#include "ifcvt.h"
>  #include "rtl-iter.h"
>  #include "intl.h"
>  #include "tm-constrs.h"
> @@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
>return vectorize_vec_perm_const_1 (d);
>  }
>  
> +/* Consider a NOCE conversion as profitable if there is at least one
> +   conditional move.  */
> +
> +static bool
> +s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
> +{
> +  if (if_info->speed_p)
> +    {
> +      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
> +        {
> +          rtx set = single_set (insn);
> +          if (set == NULL)
> +            continue;
> +          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
> +            continue;
> +          rtx src = SET_SRC (set);
> +          machine_mode mode = GET_MODE (src);
> +          if (GET_MODE_CLASS (mode) != MODE_INT
> +              && GET_MODE_CLASS (mode) != MODE_FLOAT)
> +            continue;
> +          if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +            continue;
> +          return true;
> +        }
> +    }
> +  return default_noce_conversion_profitable_p (seq, if_info);
> +}
> +
>  /* Initialize GCC target structure.  */
>  
>  #undef  TARGET_ASM_ALIGNED_HI_OP
> @@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
>  #undef TARGET_VECTORIZE_VEC_PERM_CONST
>  #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
>  
> +#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
> +#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  #include "gt-s390.h"
> diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
> index 31f30f60314..36a3c3a999a 100644
> --- a/gcc/testsuite/gcc.target/s390/ccor.c
> +++ b/gcc/testsuite/gcc.target/s390/ccor.c
> @@ -42,7 +42,7 @@ GENFUN1(2)
>  
>  GENFUN1(3)
>  
> -/* { dg-final { scan-assembler {locrno} } } */
> +/* { dg-final { scan-assembler {locro} } } */
>  
>  GENFUN2(0,1)
>  
> @@ -58,7 +58,7 @@ GENFUN2(0,3)
>  
>  GENFUN2(1,2)
>  
> -/* { dg-final { scan-assembler {locrnlh} } } */
> +/* { dg-final { scan-assembler {locrlh} } } */
>  
>  GENFUN2(1,3)
>  
> -- 
> 2.45.0
> 


Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-02 Thread Stefan Schulze Frielinghaus
On Fri, May 31, 2024 at 10:05:55PM -0600, Jeff Law wrote:
> 
> 
> On 5/31/24 9:03 AM, Robin Dapp wrote:
> > Hi,
> > 
> > before noce_find_if_block processes a block it sets up an if_info
> > structure that holds the original costs.  At that point the costs of
> > the then/else blocks have not been added so we only care about the
> > "if" cost.
> > 
> > The code originally used BRANCH_COST for that but was then changed
> > to COST_N_INSNS (2) - a compare and a jump.
> > This patch computes the jump costs via
> >insn_cost (if_info.jump, ...)
> > which is supposed to incorporate the branch costs and, in case of a CC
> > comparison,
> >pattern_cost (if_info.cond, ...)
> > which is supposed to account for the CC creation.
> > 
> > For compare_and_jump patterns insn_cost should have already computed
> > the right cost.
> > 
> > Does this "split" make sense, generally?
> > 
> > Bootstrapped and regtested on x86, aarch64 and power10.  Regtested
> > on riscv.
> > 
> > Regards
> >   Robin
> > 
> > gcc/ChangeLog:
> > 
> > * ifcvt.cc (noce_process_if_block): Subtract condition pattern
> > cost if applicable.
> > (noce_find_if_block): Use insn_cost and pattern_cost for
> > original cost.
> OK.  Obviously we'll need to be on the lookout for regressions.  My bet is
> on s390 since you already tested the x86, aarch64 & p10 targets :-)

I just gave it a try on s390 where bootstrap and regtest were successful.

Cheers,
Stefan

> 
> 
> jeff
> 


[PATCH] Hard register asm constraint

2024-05-24 Thread Stefan Schulze Frielinghaus
This implements hard register constraints for inline asm.  A hard register
constraint is of the form {regname} where regname is any valid register.  This
basically renders register asm superfluous.  For example, the snippet

int test (int x, int y)
{
  register int r4 asm ("r4") = x;
  register int r5 asm ("r5") = y;
  unsigned int copy = y;
  asm ("foo %0,%1,%2" : "+d" (r4) : "d" (r5), "d" (copy));
  return r4;
}

could be rewritten into

int test (int x, int y)
{
  asm ("foo %0,%1,%2" : "+{r4}" (x) : "{r5}" (y), "d" (y));
  return x;
}

As a side-effect this also solves the problem of call-clobbered registers.
That being said, I was wondering whether we could utilize this feature in order
to get rid of local register asm automatically?  For example, converting

// Result will be in r2 on s390
extern int bar (void);

void test (void)
{
  register int x asm ("r2") = 42;
  bar ();
  asm ("foo %0\n" :: "r" (x));
}

into

void test (void)
{
  int x = 42;
  bar ();
  asm ("foo %0\n" :: "{r2}" (x));
}

in order to get rid of the limitation of call-clobbered registers, which may
lead to subtle bugs, especially if you think of non-obvious calls, e.g. those
introduced by sanitizers, tracers, and the like.  Since such a transformation
has the potential to break existing code, do you see any edge cases where it
might be problematic, or even show stoppers?  Currently, even

int test (void)
{
  register int x asm ("r2") = 42;
  register int y asm ("r2") = 24;
  asm ("foo %0,%1\n" :: "r" (x), "r" (y));
}

is allowed, which seems error prone to me.  Thus, if 100% backwards
compatibility were required, then automatically converting every register
asm to the new mechanism would not be viable.  Still, quite a lot could be
transformed.  Any thoughts?

Currently I allow multiple alternatives as demonstrated by
gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c.  However, since a hard register
constraint is pretty specific I could also think of erroring out in case of
alternatives.  Are there any real use cases out there for multiple
alternatives where one would like to use hard register constraints?
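
As background for the question about alternatives: in a constraint string, alternatives are separated by commas, so their number is the comma count plus one.  The sketch below mirrors the n_occurrences helper that appears among the lines removed from cfgexpand.cc in this patch; the names count_char and num_alternatives are made up for illustration and are not GCC functions.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of times character C occurs in string S.  */
static int
count_char (int c, const char *s)
{
  int n = 0;
  while (*s)
    n += (*s++ == c);
  return n;
}

/* Hypothetical helper: number of alternatives in one constraint
   string, i.e. the number of commas plus one.  */
static int
num_alternatives (const char *constraint)
{
  assert (constraint != NULL);
  return count_char (',', constraint) + 1;
}
```

For example, num_alternatives ("r,m") yields 2, and all operands of one asm statement are expected to have the same number of alternatives.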

With the current implementation we have a "user visible change" in the sense
that for

void test (void)
{
  register int x asm ("r2") = 42;
  register int y asm ("r2") = 24;
  asm ("foo %0,%1\n" : "=r" (x), "=r" (y));
}

we do not get the error

  "invalid hard register usage between output operands"

anymore but rather

  "multiple outputs to hard register: %r2"

This is due to the error handling in gimplify_asm_expr ().  Speaking of errors,
I also error out earlier than before, which means that e.g. in pr87600-2.c only
the first error is reported and processing stops afterwards, so the
subsequent tests fail.

I've been skimming through all targets and it looks to me as if none is using
curly brackets for their constraints.  Of course, I may have missed something.

Cheers,
Stefan

PS: Current state for Clang: https://reviews.llvm.org/D105142

---
 gcc/cfgexpand.cc  |  42 ---
 gcc/genpreds.cc   |   4 +-
 gcc/gimplify.cc   | 115 +-
 gcc/lra-constraints.cc|  17 +++
 gcc/recog.cc  |  14 ++-
 gcc/stmt.cc   | 102 +++-
 gcc/stmt.h|  10 +-
 .../gcc.target/s390/asm-hard-reg-1.c  | 103 
 .../gcc.target/s390/asm-hard-reg-2.c  |  29 +
 .../gcc.target/s390/asm-hard-reg-3.c  |  24 
 gcc/testsuite/lib/scanasm.exp |   4 +
 11 files changed, 407 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 557cb28733b..47f71a2e803 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2955,44 +2955,6 @@ expand_asm_loc (tree string, int vol, location_t locus)
   emit_insn (body);
 }
 
-/* Return the number of times character C occurs in string S.  */
-static int
-n_occurrences (int c, const char *s)
-{
-  int n = 0;
-  while (*s)
-    n += (*s++ == c);
-  return n;
-}
-
-/* A subroutine of expand_asm_operands.  Check that all operands have
-   the same number of alternatives.  Return true if so.  */
-
-static bool
-check_operand_nalternatives (const vec<const char *> &constraints)
-{
-  unsigned len = constraints.length();
-  if (len > 0)
-    {
-      int nalternatives = n_occurrences (',', constraints[0]);
-
-      if (nalternatives + 1 > MAX_RECOG_ALTERNATIVES)
-        {
- 

[gcc r15-787] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-05-23 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:57e04879389f9c0d5d53f316b468ce1bddbab350

commit r15-787-g57e04879389f9c0d5d53f316b468ce1bddbab350
Author: Stefan Schulze Frielinghaus 
Date:   Thu May 23 08:43:35 2024 +0200

s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

PR target/109549
* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads are reversed, now, as a
consequence the condition has to be reversed.

Diff:
---
 gcc/config/s390/s390.cc  | 32 
 gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5968808fcb6..fa517bd3e77 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "builtins.h"
+#include "ifcvt.h"
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
@@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   return vectorize_vec_perm_const_1 (d);
 }
 
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+    {
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+        {
+          rtx set = single_set (insn);
+          if (set == NULL)
+            continue;
+          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+            continue;
+          rtx src = SET_SRC (set);
+          machine_mode mode = GET_MODE (src);
+          if (GET_MODE_CLASS (mode) != MODE_INT
+              && GET_MODE_CLASS (mode) != MODE_FLOAT)
+            continue;
+          if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+            continue;
+          return true;
+        }
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
 #undef TARGET_VECTORIZE_VEC_PERM_CONST
 #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
 
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314..36a3c3a999a 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
 
 GENFUN1(3)
 
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
 
 GENFUN2(0,1)
 
@@ -58,7 +58,7 @@ GENFUN2(0,3)
 
 GENFUN2(1,2)
 
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
 
 GENFUN2(1,3)
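
The decision logic of the hook above can be modelled outside of GCC as follows.  This is only a toy sketch: struct toy_insn, enum toy_mode_class, TOY_UNITS_PER_WORD and toy_has_word_sized_cond_move are hypothetical stand-ins for rtx_insn, the mode classes, UNITS_PER_WORD and the real hook, and the final fallback to default_noce_conversion_profitable_p is reduced to returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's mode classes.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_VECTOR };

/* Hypothetical, heavily simplified view of one insn in the sequence.  */
struct toy_insn
{
  bool is_single_set;        /* single_set (insn) != NULL  */
  bool src_is_if_then_else;  /* SET_SRC is an IF_THEN_ELSE  */
  enum toy_mode_class mode_class;
  unsigned mode_size;        /* mode size in bytes  */
};

/* Assumed word size in bytes for this toy model.  */
#define TOY_UNITS_PER_WORD 8

/* Return true if the sequence contains at least one conditional move
   in a scalar integer or float mode that fits into a word.  The real
   hook would defer to the default cost check instead of returning
   false.  */
static bool
toy_has_word_sized_cond_move (const struct toy_insn *seq, size_t n)
{
  assert (seq != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (!seq[i].is_single_set || !seq[i].src_is_if_then_else)
        continue;
      if (seq[i].mode_class != TOY_MODE_INT
          && seq[i].mode_class != TOY_MODE_FLOAT)
        continue;
      if (seq[i].mode_size > TOY_UNITS_PER_WORD)
        continue;
      return true;
    }
  return false;
}
```

The early return on the first qualifying insn is what "at least one conditional move" in the commit message amounts to.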


[PATCH v2] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-05-17 Thread Stefan Schulze Frielinghaus
I've adapted the patch as follows and will push.

Thanks,
Stefan

--

Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads are reversed, now, as a
consequence the condition has to be reversed.
---
 gcc/config/s390/s390.cc  | 32 
 gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index bf46eab2d63..7f8f1681c2a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "builtins.h"
+#include "ifcvt.h"
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
@@ -18037,6 +18038,34 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   return vectorize_vec_perm_const_1 (d);
 }
 
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+    {
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+        {
+          rtx set = single_set (insn);
+          if (set == NULL)
+            continue;
+          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+            continue;
+          rtx src = SET_SRC (set);
+          machine_mode mode = GET_MODE (src);
+          if (GET_MODE_CLASS (mode) != MODE_INT
+              && GET_MODE_CLASS (mode) != MODE_FLOAT)
+            continue;
+          if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+            continue;
+          return true;
+        }
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -18350,6 +18379,9 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
 #undef TARGET_VECTORIZE_VEC_PERM_CONST
 #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
 
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314..36a3c3a999a 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
 
 GENFUN1(3)
 
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
 
 GENFUN2(0,1)
 
@@ -58,7 +58,7 @@ GENFUN2(0,3)
 
 GENFUN2(1,2)
 
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
 
 GENFUN2(1,3)
 
-- 
2.45.0



[gcc r15-319] tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

2024-05-08 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:e755f478c24c3e99409936af545ac83d35d27ad9

commit r15-319-ge755f478c24c3e99409936af545ac83d35d27ad9
Author: Stefan Schulze Frielinghaus 
Date:   Wed May 8 10:48:45 2024 +0200

tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

This fixes a couple of tests (gcc.dg/vect/pr109011-*.c) on s390 where
loops are unrolled although -fno-unroll-loops is specified.

gcc/ChangeLog:

* tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
-fno-unroll-loops.

Diff:
---
 gcc/tree-ssa-loop-prefetch.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
index 70073cc4fe46..bb5d5dec7795 100644
--- a/gcc/tree-ssa-loop-prefetch.cc
+++ b/gcc/tree-ssa-loop-prefetch.cc
@@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct mem_ref_group *refs,
   struct mem_ref_group *agp;
   struct mem_ref *ref;
 
+  /* Bail out early in case we must not unroll loops.  */
+  if (!flag_unroll_loops)
+    return 1;
+
   /* First check whether the loop is not too large to unroll.  We ignore
  PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
  from unrolling them enough to make exactly one cache line covered by each


[PATCH] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-05-08 Thread Stefan Schulze Frielinghaus
Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads are reversed, now, as a
consequence the condition has to be reversed.
---
 Bootstrapped and regtested on s390.  Ok for mainline?

 gcc/config/s390/s390.cc  | 32 
 gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index bf46eab2d63..23b18b5c506 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "builtins.h"
+#include "ifcvt.h"
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
@@ -18037,6 +18038,37 @@ s390_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   return vectorize_vec_perm_const_1 (d);
 }
 
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+    {
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+        {
+          rtx set = single_set (insn);
+          if (set == NULL)
+            continue;
+          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+            continue;
+          rtx src = SET_SRC (set);
+          machine_mode mode = GET_MODE (src);
+          if (GET_MODE_CLASS (mode) != MODE_INT
+              && GET_MODE_CLASS (mode) != MODE_FLOAT)
+            continue;
+          if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (Pmode))
+            continue;
+          return true;
+        }
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314..36a3c3a999a 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
 
 GENFUN1(3)
 
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
 
 GENFUN2(0,1)
 
@@ -58,7 +58,7 @@ GENFUN2(0,3)
 
 GENFUN2(1,2)
 
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
 
 GENFUN2(1,3)
 
-- 
2.44.0



[PATCH] tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

2024-05-08 Thread Stefan Schulze Frielinghaus
On s390 the following tests fail

FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CLZ (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .CLZ (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1

because aprefetch unrolls loops even if -fno-unroll-loops is used.
Accordingly, the scan patterns match more often than expected.

This could also be fixed by using -fno-prefetch-loop-arrays for the tests.
However, I would prefer aprefetch to honour -fno-unroll-loops.  Any
preferences?

Bootstrapped and regtested on x86_64 and s390.  Ok for mainline?

gcc/ChangeLog:

* tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
-fno-unroll-loops.
---
 gcc/tree-ssa-loop-prefetch.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
index 70073cc4fe4..bb5d5dec779 100644
--- a/gcc/tree-ssa-loop-prefetch.cc
+++ b/gcc/tree-ssa-loop-prefetch.cc
@@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct mem_ref_group *refs,
   struct mem_ref_group *agp;
   struct mem_ref *ref;
 
+  /* Bail out early in case we must not unroll loops.  */
+  if (!flag_unroll_loops)
+    return 1;
+
   /* First check whether the loop is not too large to unroll.  We ignore
  PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
  from unrolling them enough to make exactly one cache line covered by each
-- 
2.44.0



[gcc r15-274] tree-optimization/110490 - bitcount for narrow modes

2024-05-07 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:e1f56c67a82172730c377a96a46e8d75445e6a48

commit r15-274-ge1f56c67a82172730c377a96a46e8d75445e6a48
Author: Stefan Schulze Frielinghaus 
Date:   Tue May 7 14:12:55 2024 +0200

tree-optimization/110490 - bitcount for narrow modes

Bitcount operations popcount, clz, and ctz are emulated for narrow modes
in case an operation is only supported for wider modes.  Besides that, ctz
may be emulated via clz in expand_ctz.  Reflect this in
expression_expensive_p.

I considered the emulation of ctz via clz as not expensive since this
basically reduces to ctz (x) = c - clz (x & -x), where c is the mode
precision minus 1, which should be faster than a loop.
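
The identity behind the ctz-via-clz emulation can be checked with a few lines of C.  This is a sketch using the GCC/Clang builtin __builtin_clz on unsigned int; the function name ctz_via_clz is made up for illustration.  The expression x & -x isolates the lowest set bit, so counting its leading zeros and subtracting from the precision minus 1 gives the number of trailing zeros.

```c
#include <assert.h>
#include <limits.h>

/* Count trailing zeros of a nonzero X using only a count-leading-zeros
   primitive.  X & -X keeps just the lowest set bit of X.  */
static int
ctz_via_clz (unsigned int x)
{
  assert (x != 0);  /* __builtin_clz is undefined for 0.  */
  const int c = (int) (sizeof (unsigned int) * CHAR_BIT) - 1;
  return c - __builtin_clz (x & -x);
}
```

With a 32-bit unsigned int, x = 12 (binary 1100) gives x & -x = 4, clz = 29, and 31 - 29 = 2.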

gcc/ChangeLog:

PR tree-optimization/110490
* tree-scalar-evolution.cc (expression_expensive_p): Also
consider mode widening for popcount, clz, and ctz.

Diff:
---
 gcc/tree-scalar-evolution.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index b0a5e09a77c..622c7246c1b 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3458,6 +3458,28 @@ bitcount_call:
  && (optab_handler (optab, word_mode)
  != CODE_FOR_nothing))
  break;
+ /* If popcount is available for a wider mode, we emulate the
+operation for a narrow mode by first zero-extending the value
+and then computing popcount in the wider mode.  Analogue for
+ctz.  For clz we do the same except that we additionally have
+to subtract the difference of the mode precisions from the
+result.  */
+ if (is_a <scalar_int_mode> (mode, &int_mode))
+   {
+ machine_mode wider_mode_iter;
+ FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
+   if (optab_handler (optab, wider_mode_iter)
+   != CODE_FOR_nothing)
+ goto check_call_args;
+ /* Operation ctz may be emulated via clz in expand_ctz.  */
+ if (optab == ctz_optab)
+   {
+ FOR_EACH_WIDER_MODE_FROM (wider_mode_iter, mode)
+   if (optab_handler (clz_optab, wider_mode_iter)
+   != CODE_FOR_nothing)
+ goto check_call_args;
+   }
+   }
  return true;
}
  break;
@@ -3469,6 +3491,7 @@ bitcount_call:
  break;
}
 
+check_call_args:
   FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
  return true;


Re: [PATCH] tree-optimization/110490 - bitcount for narrow modes

2024-05-07 Thread Stefan Schulze Frielinghaus
Ping.  Ok for mainline?

On Thu, Apr 25, 2024 at 09:26:45AM +0200, Stefan Schulze Frielinghaus wrote:
> Bitcount operations popcount, clz, and ctz are emulated for narrow modes
> in case an operation is only supported for wider modes.  Besides that, ctz
> may be emulated via clz in expand_ctz.  Reflect this in
> expression_expensive_p.
> 
> I considered the emulation of ctz via clz as not expensive since this
> basically reduces to ctz (x) = c - clz (x & -x), where c is the mode
> precision minus 1, which should be faster than a loop.
> 
> Bootstrapped and regtested on x86_64 and s390.  Though, this is probably
> stage1 material?
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/110490
>   * tree-scalar-evolution.cc (expression_expensive_p): Also
>   consider mode widening for popcount, clz, and ctz.
> ---
>  gcc/tree-scalar-evolution.cc | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index b0a5e09a77c..622c7246c1b 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -3458,6 +3458,28 @@ bitcount_call:
> && (optab_handler (optab, word_mode)
> != CODE_FOR_nothing))
> break;
> +   /* If popcount is available for a wider mode, we emulate the
> +  operation for a narrow mode by first zero-extending the value
> +  and then computing popcount in the wider mode.  Analogue for
> +  ctz.  For clz we do the same except that we additionally have
> +  to subtract the difference of the mode precisions from the
> +  result.  */
> +   if (is_a <scalar_int_mode> (mode, &int_mode))
> + {
> +   machine_mode wider_mode_iter;
> +   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
> + if (optab_handler (optab, wider_mode_iter)
> + != CODE_FOR_nothing)
> +   goto check_call_args;
> +   /* Operation ctz may be emulated via clz in expand_ctz.  */
> +   if (optab == ctz_optab)
> + {
> +   FOR_EACH_WIDER_MODE_FROM (wider_mode_iter, mode)
> + if (optab_handler (clz_optab, wider_mode_iter)
> + != CODE_FOR_nothing)
> +   goto check_call_args;
> + }
> + }
> return true;
>   }
> break;
> @@ -3469,6 +3491,7 @@ bitcount_call:
> break;
>   }
>  
> +check_call_args:
>FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
>   if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
> return true;
> -- 
> 2.44.0
> 


[gcc r15-102] s390: testsuite: Fix risbg-ll-2.c

2024-05-02 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:66f49ccd409c7a3f6eb89dd78e275ab57c983c79

commit r15-102-g66f49ccd409c7a3f6eb89dd78e275ab57c983c79
Author: Stefan Schulze Frielinghaus 
Date:   Thu May 2 08:43:50 2024 +0200

s390: testsuite: Fix risbg-ll-2.c

Starting with r14-2047-gd0e891406b16dc we see through subregs which
means for f10 in risbg-ll-2.c we do not end up with rosbg_si_noshift but
rather rosbg_di_noshift, which results in a slightly different start
index.  This saves us an extend.

gcc/testsuite/ChangeLog:

* gcc.target/s390/risbg-ll-2.c: Fix start offset for rosbg of
f10.

Diff:
---
 gcc/testsuite/gcc.target/s390/risbg-ll-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-2.c b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
index 8bf1a0ff88b..ca80602a83f 100644
--- a/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
@@ -113,7 +113,7 @@ i32 f9 (i64 v_x, i32 v_y)
 // ands with incompatible masks.
 i32 f10 (i64 v_x, i32 v_y)
 {
-  /* { dg-final { scan-assembler "f10:\n\tsrlg\t%r2,%r2,48\n\trosbg\t%r2,%r3,32,39,0" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f10:\n\tsrlg\t%r2,%r2,48\n\trosbg\t%r2,%r3,0,39,0" { target { lp64 } } } } */
   /* { dg-final { scan-assembler "f10:\n\tnilf\t%r4,4278190080\n\trosbg\t%r4,%r2,48,63,48" { target { ! lp64 } } } } */
   i64 v_shr6 = ((ui64)v_x) >> 48;
   i32 v_conv = (ui32)v_shr6;


[gcc r15-100] s390: testsuite: Fix zero_bits_compound-1.c

2024-05-02 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:6c4a745c6910659a75d1881cf3c4128f24b5666f

commit r15-100-g6c4a745c6910659a75d1881cf3c4128f24b5666f
Author: Stefan Schulze Frielinghaus 
Date:   Thu May 2 08:39:32 2024 +0200

s390: testsuite: Fix zero_bits_compound-1.c

Starting with r12-2731-g96146e61cd7aee we do not generate code like

_5 = (unsigned int) c_2(D);
i_6 = _5 << 8;
_7 = _5 << 20;
i_8 = i_6 | _7;

anymore but instead

_5 = (unsigned int) c_2(D);
_3 = _5 * 1048832;

which leads finally to slightly different assembly code where we
previously ended up for z10 or newer with

lr  %r1,%r2
sll %r1,8
rosbg   %r1,%r2,32,43,20
llgfr   %r2,%r1
br  %r14

and now

lr  %r1,%r2
sll %r1,12
ar  %r2,%r1
risbg   %r2,%r2,35,128+55,8
br  %r14

The zero-extend materializes via risbg, for which the pattern contains an
"and", which is why the test fails.  Thus, for s390, scan for assembler
instructions instead of RTL expressions.
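
As a side note, the two GIMPLE forms quoted in this message compute the
same value, which a small C sketch can confirm.  It assumes unsigned int
is at least 32 bits wide; the function names are illustrative and do not
come from the test case.

```c
#include <assert.h>

/* Older form: two shifts combined with a bitwise or.  */
static unsigned
shift_or_form (unsigned char c)
{
  unsigned v = (unsigned) c;
  return (v << 8) | (v << 20);
}

/* Newer form: one multiplication.  1048832 == (1 << 20) + (1 << 8).
   Because c fits in 8 bits, the two shifted copies occupy bits 8..15
   and 20..27 and never overlap, so "or" and "add" coincide.  */
static unsigned
multiply_form (unsigned char c)
{
  unsigned v = (unsigned) c;
  return v * 1048832u;
}

/* Returns 1 if both forms agree for every possible input byte.  */
static int
forms_agree (void)
{
  for (unsigned c = 0; c < 256; c++)
    if (shift_or_form ((unsigned char) c) != multiply_form ((unsigned char) c))
      return 0;
  return 1;
}
```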

gcc/testsuite/ChangeLog:

* gcc.dg/zero_bits_compound-1.c: Fix for s390.

Diff:
---
 gcc/testsuite/gcc.dg/zero_bits_compound-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/zero_bits_compound-1.c b/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
index e71594911b2..f1e267e0fb0 100644
--- a/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
+++ b/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
@@ -39,4 +39,5 @@ unsigned long bar (unsigned char c)
 }
 
 /* Check that no pattern containing an AND expression was used.  */
-/* { dg-final { scan-assembler-not "\\(and:" } } */
+/* { dg-final { scan-assembler-not "\\(and:" { target { ! { s390*-*-* } } } } } */
+/* { dg-final { scan-assembler-not "\\tng?rk?\\t" { target { s390*-*-* } } } } */


[PATCH] s390: testsuite: Fix risbg-ll-2.c

2024-04-30 Thread Stefan Schulze Frielinghaus
Starting with r14-2047-gd0e891406b16dc we see through subregs which
means for f10 in risbg-ll-2.c we do not end up with rosbg_si_noshift but
rather rosbg_di_noshift, which results in a slightly different start
index.  This saves us an extend.

gcc/testsuite/ChangeLog:

* gcc.target/s390/risbg-ll-2.c: Fix start offset for rosbg of
f10.
---
 Ok for mainline?

 gcc/testsuite/gcc.target/s390/risbg-ll-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-2.c b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
index 8bf1a0ff88b..ca80602a83f 100644
--- a/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
@@ -113,7 +113,7 @@ i32 f9 (i64 v_x, i32 v_y)
 // ands with incompatible masks.
 i32 f10 (i64 v_x, i32 v_y)
 {
-  /* { dg-final { scan-assembler "f10:\n\tsrlg\t%r2,%r2,48\n\trosbg\t%r2,%r3,32,39,0" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f10:\n\tsrlg\t%r2,%r2,48\n\trosbg\t%r2,%r3,0,39,0" { target { lp64 } } } } */
   /* { dg-final { scan-assembler "f10:\n\tnilf\t%r4,4278190080\n\trosbg\t%r4,%r2,48,63,48" { target { ! lp64 } } } } */
   i64 v_shr6 = ((ui64)v_x) >> 48;
   i32 v_conv = (ui32)v_shr6;
-- 
2.44.0



[PATCH] s390: testsuite: Fix zero_bits_compound-1.c

2024-04-30 Thread Stefan Schulze Frielinghaus
Starting with r12-2731-g96146e61cd7aee we do not generate code like

_5 = (unsigned int) c_2(D);
i_6 = _5 << 8;
_7 = _5 << 20;
i_8 = i_6 | _7;

anymore but instead

_5 = (unsigned int) c_2(D);
_3 = _5 * 1048832;

which leads finally to slightly different assembly code where we
previously ended up for z10 or newer with

lr  %r1,%r2
sll %r1,8
rosbg   %r1,%r2,32,43,20
llgfr   %r2,%r1
br  %r14

and now

lr  %r1,%r2
sll %r1,12
ar  %r2,%r1
risbg   %r2,%r2,35,128+55,8
br  %r14

The zero-extend materializes via risbg, for which the pattern contains an
"and", which is why the test fails.  Thus, for s390, scan for assembler
instructions instead of RTL expressions.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/zero_bits_compound-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/zero_bits_compound-1.c b/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
index e71594911b2..f1e267e0fb0 100644
--- a/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
+++ b/gcc/testsuite/gcc.dg/zero_bits_compound-1.c
@@ -39,4 +39,5 @@ unsigned long bar (unsigned char c)
 }
 
 /* Check that no pattern containing an AND expression was used.  */
-/* { dg-final { scan-assembler-not "\\(and:" } } */
+/* { dg-final { scan-assembler-not "\\(and:" { target { ! { s390*-*-* } } } } } */
+/* { dg-final { scan-assembler-not "\\tng?rk?\\t" { target { s390*-*-* } } } } */
-- 
2.44.0



Build errors for older versions

2024-04-25 Thread Stefan Schulze Frielinghaus via Gcc
Hi all,

while bisecting I recently ran into build errors like

In file included from /devel/gcc/libgcc/../gcc/tsystem.h:101,
 from /devel/gcc/libgcc/libgcov.h:42,
 from /devel/gcc/libgcc/libgcov-interface.c:26:
/usr/include/stdlib.h:931:6: error: wrong number of arguments specified for 'malloc' attribute
  931 |  __attr_dealloc_free __wur;
  |  ^~~
/usr/include/stdlib.h:931:6: note: expected between 0 and 0, found 2

My host system is Fedora 39 on x86_64 while trying to build
r11-3896-g61a43de58cb6de.  The error does not appear if I'm using e.g.
Fedora 34.  Is this known and if so does there exist a workaround such
that building older versions on a recent OS works?

Cheers,
Stefan


[PATCH] tree-optimization/110490 - bitcount for narrow modes

2024-04-25 Thread Stefan Schulze Frielinghaus
Bitcount operations popcount, clz, and ctz are emulated for narrow modes
in case an operation is only supported for wider modes.  Besides that, ctz
may be emulated via clz in expand_ctz.  Reflect this in
expression_expensive_p.
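
To illustrate the narrow-mode emulation, here is a minimal C sketch that
computes 8-bit bitcounts through 32-bit operations, assuming the wider
operation is the cheap one.  All function names are illustrative; the
loop-based 32-bit helpers only stand in for hardware instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the wide-mode instructions (hypothetical helpers).  */
static unsigned
popcount32 (uint32_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)	/* clear the lowest set bit */
    n++;
  return n;
}

static unsigned
clz32 (uint32_t x)		/* x must be nonzero */
{
  unsigned n = 0;
  for (uint32_t bit = UINT32_C (1) << 31; (x & bit) == 0; bit >>= 1)
    n++;
  return n;
}

static unsigned
ctz32 (uint32_t x)		/* x must be nonzero */
{
  unsigned n = 0;
  for (; (x & 1) == 0; x >>= 1)
    n++;
  return n;
}

/* Narrow operations: zero-extend, then use the wide operation.  */
static unsigned
popcount8 (uint8_t x)
{
  return popcount32 ((uint32_t) x);
}

static unsigned
ctz8 (uint8_t x)
{
  return ctz32 ((uint32_t) x);
}

/* clz additionally subtracts the difference of the precisions, since
   zero-extension adds 32 - 8 leading zero bits.  */
static unsigned
clz8 (uint8_t x)
{
  return clz32 ((uint32_t) x) - (32 - 8);
}
```

Only clz needs a correction because zero-extension changes the number of
leading zeros but neither the set bits nor the trailing zeros.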

I considered the emulation of ctz via clz as not expensive since this
basically reduces to ctz (x) = c - clz (x & -x), where c is the mode
precision minus 1, which should be faster than a loop.

Bootstrapped and regtested on x86_64 and s390.  Though, this is probably
stage1 material?

gcc/ChangeLog:

PR tree-optimization/110490
* tree-scalar-evolution.cc (expression_expensive_p): Also
consider mode widening for popcount, clz, and ctz.
---
 gcc/tree-scalar-evolution.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index b0a5e09a77c..622c7246c1b 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3458,6 +3458,28 @@ bitcount_call:
  && (optab_handler (optab, word_mode)
  != CODE_FOR_nothing))
  break;
+ /* If popcount is available for a wider mode, we emulate the
+operation for a narrow mode by first zero-extending the value
+and then computing popcount in the wider mode.  Analogue for
+ctz.  For clz we do the same except that we additionally have
+to subtract the difference of the mode precisions from the
+result.  */
+ if (is_a <scalar_int_mode> (mode, &int_mode))
+   {
+ machine_mode wider_mode_iter;
+ FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
+   if (optab_handler (optab, wider_mode_iter)
+   != CODE_FOR_nothing)
+ goto check_call_args;
+ /* Operation ctz may be emulated via clz in expand_ctz.  */
+ if (optab == ctz_optab)
+   {
+ FOR_EACH_WIDER_MODE_FROM (wider_mode_iter, mode)
+   if (optab_handler (clz_optab, wider_mode_iter)
+   != CODE_FOR_nothing)
+ goto check_call_args;
+   }
+   }
  return true;
}
  break;
@@ -3469,6 +3491,7 @@ bitcount_call:
  break;
}
 
+check_call_args:
   FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
  return true;
-- 
2.44.0



[gcc r14-10090] s390: testsuite: Xfail forwprop-4{0,1}.c

2024-04-23 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:3d5699930fe6cfc595e5a920ab36a1bc065be534

commit r14-10090-g3d5699930fe6cfc595e5a920ab36a1bc065be534
Author: Stefan Schulze Frielinghaus 
Date:   Tue Apr 23 13:29:10 2024 +0200

s390: testsuite: Xfail forwprop-4{0,1}.c

The tests fail on s390 since can_vec_perm_const_p fails and therefore
the bit insert/ref survive which r14-3381-g27de9aa152141e aims for.
Strictly speaking, the tests only fail if the target supports
vectors; for targets prior to z13, or with -mesa, the emulated
vector operations are optimized out.

Set to xfail and tracked by PR114802.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/forwprop-40.c: Xfail for s390.
* gcc.dg/tree-ssa/forwprop-41.c: Xfail for s390.
* lib/target-supports.exp: Add target check s390_mvx.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c |  4 ++--
 gcc/testsuite/lib/target-supports.exp   | 14 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
index 7513497f552..0c5233a68f4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
@@ -10,5 +10,5 @@ vector int g(vector int a)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" { xfail s390_mvx } } } Xfail: PR114802 */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } Xfail: PR114802 */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
index b1e75797a90..a1f08289dd6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
@@ -11,6 +11,6 @@ vector int g(vector int a, int c)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" { xfail s390_mvx } } } Xfail PR114802 */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } Xfail PR114802 */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3a5713d9869..3a55b2a4159 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12392,6 +12392,20 @@ proc check_effective_target_profile_update_atomic {} {
 } "-fprofile-update=atomic -fprofile-generate"]
 }
 
+# Return 1 if the target has a vector facility.
+proc check_effective_target_s390_mvx { } {
+if ![istarget s390*-*-*] then {
+   return 0;
+}
+
+return [check_no_compiler_messages_nocache s390_mvx assembly {
+   #if !defined __VX__
+   #error no vector facility.
+   #endif
+   int dummy;
+} [current_compiler_flags]]
+}
+
 # Return 1 if vector (va - vector add) instructions are understood by
 # the assembler and can be executed.  This also covers checking for
 # the VX kernel feature.  A kernel without that feature does not


[PATCH] s390: testsuite: Xfail forwprop-4{0,1}.c

2024-04-22 Thread Stefan Schulze Frielinghaus
Hi Andreas,

Ok then I will proceed with the patch as is.  Opened PR114802.

Cheers,
Stefan

--

The tests fail on s390 since can_vec_perm_const_p fails and therefore
the bit insert/ref survive which r14-3381-g27de9aa152141e aims for.
Strictly speaking, the tests only fail if the target supports
vectors; for targets prior to z13, or with -mesa, the emulated
vector operations are optimized out.

Set to xfail and tracked by PR114802.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c |  4 ++--
 gcc/testsuite/lib/target-supports.exp   | 14 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
index 7513497f552..0c5233a68f4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
@@ -10,5 +10,5 @@ vector int g(vector int a)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" { xfail s390_mvx } } } Xfail: PR114802 */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } Xfail: PR114802 */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
index b1e75797a90..a1f08289dd6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
@@ -11,6 +11,6 @@ vector int g(vector int a, int c)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" { xfail s390_mvx } } } Xfail PR114802 */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } Xfail PR114802 */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3a5713d9869..3a55b2a4159 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12392,6 +12392,20 @@ proc check_effective_target_profile_update_atomic {} {
 } "-fprofile-update=atomic -fprofile-generate"]
 }
 
+# Return 1 if the target has a vector facility.
+proc check_effective_target_s390_mvx { } {
+if ![istarget s390*-*-*] then {
+   return 0;
+}
+
+return [check_no_compiler_messages_nocache s390_mvx assembly {
+   #if !defined __VX__
+   #error no vector facility.
+   #endif
+   int dummy;
+} [current_compiler_flags]]
+}
+
 # Return 1 if vector (va - vector add) instructions are understood by
 # the assembler and can be executed.  This also covers checking for
 # the VX kernel feature.  A kernel without that feature does not
-- 
2.44.0



[gcc r14-10066] s390: testsuite: Remove xfail for vpopct{b,h}

2024-04-22 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:16aea8c584ea2784a4f5a39352f867506d3441f6

commit r14-10066-g16aea8c584ea2784a4f5a39352f867506d3441f6
Author: Stefan Schulze Frielinghaus 
Date:   Mon Apr 15 15:28:43 2024 +0200

s390: testsuite: Remove xfail for vpopct{b,h}

Starting with r14-9316-g7890836de20912 patterns for vpopct{b,h} are also
detected.  Thus, remove xfails.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vxe/popcount-1.c: Remove xfail.

Diff:
---
 gcc/testsuite/gcc.target/s390/vxe/popcount-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c b/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
index 9ea835a1cf0..25ef354f963 100644
--- a/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
+++ b/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
@@ -21,7 +21,7 @@ vpopctb (uv16qi a)
 
   return r;
 }
-/* { dg-final { scan-assembler "vpopctb\t%v24,%v24" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "vpopctb\t%v24,%v24" } } */
 
 uv8hi __attribute__((noinline))
 vpopcth (uv8hi a)
@@ -34,7 +34,7 @@ vpopcth (uv8hi a)
 
   return r;
 }
-/* { dg-final { scan-assembler "vpopcth\t%v24,%v24" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "vpopcth\t%v24,%v24" } } */
 
 uv4si __attribute__((noinline))
 vpopctf (uv4si a)


[PATCH] s390: testsuite: Fix forwprop-4{0,1}.c

2024-04-22 Thread Stefan Schulze Frielinghaus
The tests fail on s390 since can_vec_perm_const_p fails and therefore
the bit insert/ref survive which r14-3381-g27de9aa152141e aims for.
Strictly speaking, the tests only fail if the target supports
vectors; for targets prior to z13, or with -mesa, the emulated
vector operations are optimized out.

Easiest would be to skip the entire test for s390.  Another solution
would be to xfail in case of vector support hoping that eventually we
end up with an xpass for a future machine generation or if gcc advances.
That is implemented by this patch.  In order to do so I implemented a
new target test s390_mvx which tests whether vector support is available
or not.  Maybe this is already over-engineered for a simple test?  Any
thoughts?
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c |  4 ++--
 gcc/testsuite/lib/target-supports.exp   | 14 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
index 7513497f552..b67e3e93a7f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
@@ -10,5 +10,5 @@ vector int g(vector int a)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" { xfail s390_mvx } } } */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
index b1e75797a90..0f119675207 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
@@ -11,6 +11,6 @@ vector int g(vector int a, int c)
   return a;
 }
 
-/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 1 "optimized" { xfail s390_mvx } } } */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" { xfail s390_mvx } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index edce672c0e2..5a692baa8ef 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12380,6 +12380,20 @@ proc check_effective_target_profile_update_atomic {} {
 } "-fprofile-update=atomic -fprofile-generate"]
 }
 
+# Return 1 if the target has a vector facility.
+proc check_effective_target_s390_mvx { } {
+if ![istarget s390*-*-*] then {
+   return 0;
+}
+
+return [check_no_compiler_messages_nocache s390_mvx assembly {
+   #if !defined __VX__
+   #error no vector facility.
+   #endif
+   int dummy;
+} [current_compiler_flags]]
+}
+
 # Return 1 if vector (va - vector add) instructions are understood by
 # the assembler and can be executed.  This also covers checking for
 # the VX kernel feature.  A kernel without that feature does not
-- 
2.44.0



[PATCH] s390: testsuite: Remove xfail for vpopct{b,h}

2024-04-22 Thread Stefan Schulze Frielinghaus
Starting with r14-9316-g7890836de20912 patterns for vpopct{b,h} are also
detected.  Thus, remove xfails.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vxe/popcount-1.c: Remove xfail.
---
 Ok for mainline?

 gcc/testsuite/gcc.target/s390/vxe/popcount-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c b/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
index 9ea835a1cf0..25ef354f963 100644
--- a/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
+++ b/gcc/testsuite/gcc.target/s390/vxe/popcount-1.c
@@ -21,7 +21,7 @@ vpopctb (uv16qi a)
 
   return r;
 }
-/* { dg-final { scan-assembler "vpopctb\t%v24,%v24" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "vpopctb\t%v24,%v24" } } */
 
 uv8hi __attribute__((noinline))
 vpopcth (uv8hi a)
@@ -34,7 +34,7 @@ vpopcth (uv8hi a)
 
   return r;
 }
-/* { dg-final { scan-assembler "vpopcth\t%v24,%v24" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler "vpopcth\t%v24,%v24" } } */
 
 uv4si __attribute__((noinline))
 vpopctf (uv4si a)
-- 
2.44.0



[gcc r14-9939] s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c

2024-04-12 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:a76f236e084cbd02e4e3711cdfc3191dc7eeb460

commit r14-9939-ga76f236e084cbd02e4e3711cdfc3191dc7eeb460
Author: Stefan Schulze Frielinghaus 
Date:   Fri Apr 12 16:54:38 2024 +0200

s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c

As mentioned in PR114678 those failures will be fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
For GCC 14 just xfail them which should be reverted once the patch is
applied.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/range-sincos.c: Xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Ditto.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
index 337f9cda02f..35b38c3c914 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
@@ -40,4 +40,4 @@ stool (double x)
 link_error ();
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } } } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } xfail s390*-*-* } } } xfail: PR114678
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
index 4b7b75833e0..a814a973963 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
@@ -14,4 +14,4 @@ foo (double x, double y)
 }
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { xfail s390*-*-* } } } xfail: PR114678


[gcc r14-9935] analyzer: Bail out on function pointer for -Wanalyzer-allocation-size

2024-04-12 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:67e1433a94f8ca82e2c36b79af44256430c73c38

commit r14-9935-g67e1433a94f8ca82e2c36b79af44256430c73c38
Author: Stefan Schulze Frielinghaus 
Date:   Fri Apr 12 11:06:24 2024 +0200

analyzer: Bail out on function pointer for -Wanalyzer-allocation-size

On s390 pr94688.c is failing due to excess error

pr94688.c:6:5: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]

This is because on s390 functions are by default aligned to an 8-byte
boundary and during function type construction size is set to function
boundary.  Thus, for the assignment

a.0_1 = (void (*) ()) 

the right-hand side points to a 4-byte memory region whereas the size of
the function pointer is 8 bytes, and a warning is emitted.

Since -Wanalyzer-allocation-size is not about pointers to code, bail out
early.

gcc/analyzer/ChangeLog:

* region-model.cc (region_model::check_region_size): Bail out
early on function pointers.

Diff:
---
 gcc/analyzer/region-model.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 665873dbe94..bebe2ed3cd6 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3514,6 +3514,10 @@ region_model::check_region_size (const region *lhs_reg, const svalue *rhs_sval,
   || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
 return;
 
+  /* Bail out early on function pointers.  */
+  if (TREE_CODE (pointee_type) == FUNCTION_TYPE)
+return;
+
   /* Bail out early on pointers to structs where we can
  not deduce whether the buffer size is compatible.  */
   bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);


[PATCH] s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c

2024-04-12 Thread Stefan Schulze Frielinghaus
As mentioned in PR114678 those failures will be fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
For GCC 14 just xfail them which should be reverted once the patch is
applied.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/range-sincos.c: Xfail for s390.
* gcc.dg/tree-ssa/vrp-float-abs-1.c: Ditto.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
index 337f9cda02f..35b38c3c914 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
@@ -40,4 +40,4 @@ stool (double x)
 link_error ();
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } } } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { target { { *-*-linux* } && { glibc } } xfail s390*-*-* } } } xfail: PR114678
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
index 4b7b75833e0..a814a973963 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
@@ -14,4 +14,4 @@ foo (double x, double y)
 }
 }
 
-// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
+// { dg-final { scan-tree-dump-not "link_error" "evrp" { xfail s390*-*-* } } } xfail: PR114678
-- 
2.43.0



[gcc r14-9931] testsuite: Fix loop-interchange-16.c

2024-04-12 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:b6c8259076a336e8082853ed6dda083c25a465d0

commit r14-9931-gb6c8259076a336e8082853ed6dda083c25a465d0
Author: Stefan Schulze Frielinghaus 
Date:   Fri Apr 12 09:20:53 2024 +0200

testsuite: Fix loop-interchange-16.c

Prevent loop unrolling of the innermost loop because otherwise we are
left with no loop interchange for targets like s390 which have a more
aggressive loop unrolling strategy.
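
A minimal, self-contained C sketch of the idea, assuming GCC as the
compiler (other compilers may ignore or reject the pragma).  The array
setup and the function name column_recurrence are illustrative; only
the loop nest shape and the pragma follow the test case.

```c
#include <assert.h>

#define LEN_2D 32

static double aa[LEN_2D][LEN_2D];
static double bb[LEN_2D][LEN_2D];

/* Runs the column-wise recurrence once and returns the last element of
   column 0.  Row 0 of aa is never written and stays zero.  */
static double
column_recurrence (void)
{
  for (int j = 0; j < LEN_2D; j++)
    for (int i = 0; i < LEN_2D; i++)
      bb[j][i] = 1.0;

  for (int i = 0; i < LEN_2D; ++i)
    {
      /* A factor of 0 (or 1) blocks unrolling of the loop that follows,
	 so the inner loop survives as a loop and the loop nest remains
	 a candidate for interchange.  */
#pragma GCC unroll 0
      for (int j = 1; j < LEN_2D; j++)
	aa[j][i] = aa[j - 1][i] + bb[j][i];
    }

  return aa[LEN_2D - 1][0];
}
```

The pragma applies only to the loop immediately following it, so the
outer loops keep their default treatment.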

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/loop-interchange-16.c: Prevent loop unrolling
of the innermost loop.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
index 781555e085d..bbcb14f9c6c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
@@ -11,6 +11,7 @@ double s231(int iterations)
 //loop with data dependency
 for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
 for (int i = 0; i < LEN_2D; ++i) {
+#pragma GCC unroll 0
 for (int j = 1; j < LEN_2D; j++) {
 aa[j][i] = aa[j - 1][i] + bb[j][i];
 }


[PATCH] testsuite: Fix loop-interchange-16.c

2024-04-11 Thread Stefan Schulze Frielinghaus
Yes, that works, too.  Will commit.

Thanks,
Stefan

--

Prevent loop unrolling of the innermost loop because otherwise we are
left with no loop interchange for targets like s390 which have a more
aggressive loop unrolling strategy.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/loop-interchange-16.c: Prevent loop unrolling
of the innermost loop.
---
 gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
index 781555e085d..bbcb14f9c6c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
@@ -11,6 +11,7 @@ double s231(int iterations)
 //loop with data dependency
 for (int nl = 0; nl < 100*(iterations/LEN_2D); nl++) {
 for (int i = 0; i < LEN_2D; ++i) {
+#pragma GCC unroll 0
 for (int j = 1; j < LEN_2D; j++) {
 aa[j][i] = aa[j - 1][i] + bb[j][i];
 }
-- 
2.43.0



[PATCH] s390: testsuite: Fix loop-interchange-16.c

2024-04-11 Thread Stefan Schulze Frielinghaus
Revert parameter max-completely-peel-times to 16; otherwise, the
innermost loop is removed and we are left with no loop interchange,
which is what this test is all about.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/loop-interchange-16.c: Revert parameter
max-completely-peel-times for s390.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
index 781555e085d..2530ec84bc0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
@@ -1,6 +1,7 @@
 /* PR/101280 */
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-linterchange-details" } */
+/* { dg-additional-options "--param max-completely-peel-times=16" { target s390*-*-* } } */
 
 void dummy (double *, double *);
 #define LEN_2D 32
-- 
2.43.0



Re: [PATCH] s390x: Optimize vector permute with constant indexes

2024-04-09 Thread Stefan Schulze Frielinghaus
On Tue, Apr 02, 2024 at 09:56:01AM +0200, Juergen Christ wrote:
> Loop vectorizer can generate vector permutes with constant indexes
> where all indexes are equal.  Optimize this case to use vector
> replicate instead of vector permute.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (expand_perm_as_replicate): Implement.
>   (vectorize_vec_perm_const_1): Call new function.
>   * config/s390/vx-builtins.md (vec_splat): Change to...
>   (@vec_splat): ...this.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/vec-expand-replicate.c: New test.
> 
> Bootstrapped and regtested on s390x.  Ok for trunk?
> 
> Signed-off-by: Juergen Christ 
> ---
>  gcc/config/s390/s390.cc   | 32 +++
>  gcc/config/s390/vx-builtins.md|  2 +-
>  .../s390/vector/vec-expand-replicate.c| 30 +
>  3 files changed, 63 insertions(+), 1 deletion(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index 372a23244032..4b4014ebe444 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -17923,6 +17923,35 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct 
> expand_vec_perm_d &d)
>return false;
>  }
>  
> +static bool expand_perm_as_replicate (const struct expand_vec_perm_d &d)
   ^~~~
Function names start on a new line.

> +{
> +  unsigned char i;
> +  unsigned char elem;
> +  rtx base = d.op0;
> +  rtx insn;
> +  /* Needed to silence maybe-uninitialized warning.  */
> +  gcc_assert(d.nelt > 0);
 ~~^~~~
Whitespace is missing between the function name and the opening parenthesis.

Curiously enough, the warning is about d, which is a reference and cannot
be null.  If you are eager, you could reduce this and open a PR.

s390.cc:17935:8: warning: ‘d’ may be used uninitialized [-Wmaybe-uninitialized]
17935 |   elem = d.perm[0];
  |   ~^~~

> +  elem = d.perm[0];
> +  for (i = 1; i < d.nelt; ++i)
> +if (d.perm[i] != elem)
> +  return false;
> +  if (!d.testing_p)
> +{
> +  if (elem >= d.nelt)
> + {
> +   base = d.op1;
> +   elem -= d.nelt;
> + }
> +  insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
> +  if (insn == NULL_RTX)
> + return false;
> +  emit_insn (insn);
> +  return true;
> +}
> +  else
> +return maybe_code_for_vec_splat (d.vmode) != CODE_FOR_nothing;
> +}
> +
>  /* Try to find the best sequence for the vector permute operation
> described by D.  Return true if the operation could be
> expanded.  */
> @@ -17941,6 +17970,9 @@ vectorize_vec_perm_const_1 (const struct 
> expand_vec_perm_d &d)
>if (expand_perm_as_a_vlbr_vstbr_candidate (d))
>  return true;
>  
> +  if (expand_perm_as_replicate(d))
 ^~~
Whitespace is missing between the function name and the opening parenthesis.

> +return true;
> +
>return false;
>  }
>  
> diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
> index 432d81a719fc..93c0d408a43e 100644
> --- a/gcc/config/s390/vx-builtins.md
> +++ b/gcc/config/s390/vx-builtins.md
> @@ -424,7 +424,7 @@
>  
>  
>  ; Replicate from vector element
> -(define_expand "vec_splat"
> +(define_expand "@vec_splat"
>[(set (match_operand:V_HW  0 "register_operand"  "")
>   (vec_duplicate:V_HW (vec_select:
>(match_operand:V_HW 1 "register_operand"  "")
> diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c 
> b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> new file mode 100644
> index ..27563a00f22b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vector/vec-expand-replicate.c
> @@ -0,0 +1,30 @@
> +/* Check that the vectorize_vec_perm_const expander correctly deals with
> +   replication.  Extracted from spec "nab".  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mzarch -march=z13 -fvect-cost-model=unlimited" } */
> +
> +
> +#define REAL_T  double
> +typedef REAL_T  MATRIX_T[ 4 ][ 4 ];
> +
> +int concat_mat_i, concat_mat_j;
> +static void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3);
> +MATRIX_T *rot4p() {
> +  MATRIX_T mat3, mat4;
> +  static MATRIX_T mat5;
> +  concat_mat(mat4, mat3, mat5);
> +}
> +void concat_mat(MATRIX_T m1, MATRIX_T, MATRIX_T m3) {
> +  int k;
> +  for (;; concat_mat_i++) {
> +concat_mat_j = 0;
> +for (; 4; concat_mat_j++) {
> +  k = 0;
> +  for (; k < 4; k++)
> +m3[concat_mat_i][concat_mat_j] += m1[concat_mat_i][k];
> +}

Just nitpicking: if we could come up with a test case that does not
involve integer overflows due to non-terminating loops, I would prefer
that.

Cheers,
Stefan

> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> -- 
> 2.39.3
> 
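
The check that the quoted expand_perm_as_replicate performs can be sketched outside of GCC as follows.  This is only an illustration under stated assumptions: struct perm_sketch and perm_is_replicate are hypothetical stand-ins, not GCC types or functions, and the sketch stops at deciding which operand and which element would be replicated instead of emitting an insn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the permute descriptor: NELT
   elements per input vector and NELT selector indexes, each in the
   range [0, 2 * NELT).  */
struct perm_sketch
{
  unsigned char nelt;
  const unsigned char *perm;
};

/* Return true if all selector indexes are equal, i.e. the permute is a
   replicate of a single element.  On success, *FROM_SECOND tells whether
   the element comes from the second operand and *ELEM is its index
   within that operand.  */
bool
perm_is_replicate (const struct perm_sketch *d, bool *from_second,
                   unsigned char *elem)
{
  assert (d->nelt > 0);
  unsigned char first = d->perm[0];
  for (unsigned char i = 1; i < d->nelt; ++i)
    if (d->perm[i] != first)
      return false;
  /* Indexes >= NELT select from the second operand.  */
  *from_second = first >= d->nelt;
  *elem = *from_second ? first - d->nelt : first;
  return true;
}
```

With four elements per vector, the selector {5, 5, 5, 5} would be recognized as replicating element 1 of the second operand, whereas {0, 1, 2, 3} is rejected.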


[gcc r14-9683] testsuite: Fix copy-headers-8.c

2024-03-27 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:291c46a3f0d0355680f94280e955f4faf1cae6f9

commit r14-9683-g291c46a3f0d0355680f94280e955f4faf1cae6f9
Author: Stefan Schulze Frielinghaus 
Date:   Wed Mar 27 08:50:47 2024 +0100

testsuite: Fix copy-headers-8.c

For targets where LOGICAL_OP_NON_SHORT_CIRCUIT evaluates to false, two
conditional jumps are emitted instead of a combined conditional which
this test is all about.  Thus, set it to true.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-headers-8.c: Set
LOGICAL_OP_NON_SHORT_CIRCUIT to true.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
index 8b4b5e7ea81..e35aaf93da8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
@@ -1,5 +1,8 @@
+/* For targets where LOGICAL_OP_NON_SHORT_CIRCUIT evaluates to false, two
+   conditional jumps are emitted instead of a combined conditional which this
+   test is all about.  Thus, set it to true.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-ch2-details" } */
+/* { dg-options "-O2 -fdump-tree-ch2-details --param logical-op-non-short-circuit=1" } */
 
 int is_sorted(int *a, int n, int m, int k)
 {


[PATCH] testsuite: Fix copy-headers-8.c

2024-03-26 Thread Stefan Schulze Frielinghaus
This fixes the test on s390x.  I'm also seeing test failures for
riscv64-suse-linux-gnu, m68k-unknown-linux-gnu, pru-unknown-elf, and
powerpc64le-unknown-linux-gnu.  However, I didn't check them so this
might or might not fix those, too.

OK for mainline?

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-headers-8.c: Set
LOGICAL_OP_NON_SHORT_CIRCUIT to true.
---
 gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
index 8b4b5e7ea81..28b4d15d87f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-8.c
@@ -1,5 +1,8 @@
+/* For targets where LOGICAL_OP_NON_SHORT_CIRCUIT evaluates to false, two
+   conditional jumps are emitted instead of a combined conditional which this
+   test is all about.  Thus, set it to true.  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-ch2-details" } */
+/* { dg-options "-O2 -fdump-tree-ch2-details --param logical-op-non-short-circuit=1" } */
 
 int is_sorted(int *a, int n, int m, int k)
 {
-- 
2.43.0
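
As a rough source-level picture of what the parameter changes, the two functions below compute the same value for side-effect-free operands.  The first corresponds to the short-circuit form, which conceptually needs two conditional jumps; the second evaluates both comparisons and combines them, which is the shape a single combined conditional has.  The function names are invented for this sketch, and which form GCC actually emits for a given target depends on LOGICAL_OP_NON_SHORT_CIRCUIT and the --param above, not on how the source is spelled.

```c
#include <assert.h>

/* Short-circuit form: the second comparison is only evaluated if the
   first one holds.  */
int
both_short_circuit (int a, int b, int n)
{
  return a < n && b < n;
}

/* Non-short-circuit form: both comparisons are evaluated and their
   results are combined with a bitwise AND.  */
int
both_combined (int a, int b, int n)
{
  return (a < n) & (b < n);
}

/* The two forms agree whenever the operands have no side effects.  */
void
check_equivalent (int a, int b, int n)
{
  assert (both_short_circuit (a, b, n) == both_combined (a, b, n));
}
```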



[gcc r14-9615] s390: testsuite: Fix backprop-6.c

2024-03-22 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:e0a7233e1d2e617e1913b9873599e7a50bfe1c8f

commit r14-9615-ge0a7233e1d2e617e1913b9873599e7a50bfe1c8f
Author: Stefan Schulze Frielinghaus 
Date:   Fri Mar 22 11:23:24 2024 +0100

s390: testsuite: Fix backprop-6.c

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/backprop-6.c: On s390 we also have a copysign
optab for long double.  Thus, scan 3 instead of 2 times for it.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 4087ba93018..dbde681e383 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -27,8 +27,9 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" { target ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 3 "backprop" { target { ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" { target { ! ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" { target { ! ifn_copysign } } } } */


[PATCH] s390: testsuite: Fix backprop-6.c

2024-03-22 Thread Stefan Schulze Frielinghaus
gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/backprop-6.c: On s390 we also have a copysign
optab for long double.  Thus, scan 3 instead of 2 times for it.
---
 OK for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
index 4087ba93018..dbde681e383 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
@@ -27,8 +27,9 @@ TEST_FUNCTION (float, f)
 TEST_FUNCTION (double, )
 TEST_FUNCTION (long double, l)
 
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" { target ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 3 "backprop" { target { ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" { target { ! ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" { target { ! ifn_copysign } } } } */
-- 
2.43.0



[gcc r14-9608] s390: testsuite: Fix abs-4.c

2024-03-22 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:d4ad99b0355bce23524aa0ecb5100b987279de96

commit r14-9608-gd4ad99b0355bce23524aa0ecb5100b987279de96
Author: Stefan Schulze Frielinghaus 
Date:   Fri Mar 22 08:41:39 2024 +0100

s390: testsuite: Fix abs-4.c

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-4.c: On s390 we also have a copysign optab
for long double.  Thus, scan 3 instead of 2 times for it.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/abs-4.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
index 80fa448df12..4144d1cd954 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
@@ -10,8 +10,9 @@ long double abs_ld(long double x) { return 
__builtin_signbit(x) ? x : -x; }
 
 /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP */
 /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
-/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 3 "optimized" { target { ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized" { target { ! ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times "= -" 3 "optimized" { target { ! ifn_copysign } } } } */


[PATCH] s390: testsuite: Fix abs-4.c

2024-03-21 Thread Stefan Schulze Frielinghaus
gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/abs-4.c: On s390 we also have a copysign optab
for long double.  Thus, scan 3 instead of 2 times for it.
---
 Ok for mainline?

 gcc/testsuite/gcc.dg/tree-ssa/abs-4.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
index 80fa448df12..4144d1cd954 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
@@ -10,8 +10,9 @@ long double abs_ld(long double x) { return 
__builtin_signbit(x) ? x : -x; }
 
 /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP */
 /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
-/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target ifn_copysign } } } */
-/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target ifn_copysign } } } */
+/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= -" 1 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized" { target { ifn_copysign && { ! { s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 3 "optimized" { target { ifn_copysign && s390*-*-* } } } } */
 /* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized" { target { ! ifn_copysign } } } } */
 /* { dg-final { scan-tree-dump-times "= -" 3 "optimized" { target { ! ifn_copysign } } } } */
-- 
2.43.0



Re: [PATCH] analyzer: Bail out on function pointer for -Wanalyzer-allocation-size

2024-03-21 Thread Stefan Schulze Frielinghaus
On Tue, Mar 19, 2024 at 12:38:34PM -0400, David Malcolm wrote:
> On Tue, 2024-03-19 at 16:10 +0100, Stefan Schulze Frielinghaus wrote:
> > On s390 pr94688.c is failing due to excess error
> > 
> > pr94688.c:6:5: warning: allocated buffer size is not a multiple of
> > the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > 
> > This is because on s390 functions are by default aligned to an 8-byte
> > boundary and during function type construction size is set to
> > function
> > boundary.  Thus, for the assignment
> > 
> > a.0_1 = (void (*) ()) 
> > 
> > we have that the right-hand side is pointing to a 4-byte memory
> > region
> > whereas the size of the function pointer is 8 byte and a warning is
> > emitted.
> 
> FWIW the test case in question is a regression test for an ICE seen in
> the GCC 10 implementation of the analyzer, which was fixed by the big
> rewrite in r11-2694-g808f4dfeb3a95f.
> 
> So the code in the test doesn't make a great deal of sense.
> 
> > 
> > I could follow and skip this test as done in PR112705, or we could
> > bail
> > out early in the analyzer for function pointers.  My intuition so far
> > is that -Wanalyzer-allocation-size shouldn't care about function
> > pointer.  Therefore, I went for bailing out early.  If you believe
> > this
> > is wrong I can still go by skipping this test on s390.  Any thoughts?
> 
> I tried imagining a situation where we're analyzing a function
> generated at run-time, but it strikes me that the buffer allocated for
> such a function can be of arbitrary size.  So -Wanalyzer-allocation-
> size is meaningless for functions.
> 
> There's probably a case for checking for mismatches between pointers to
> code vs pointers to data (e.g. alignments, Harvard architecture
> machines, etc), but -Wanalyzer-allocation-size doesn't do that.
> 
> So I think your patch is correct.
> 
> OK to push it if it passes bootstrap testing.

Bootstrapped and regtested on x64 and s390x.

Thanks,
Stefan

> 
> Thanks
> Dave
> 
> > ---
> >  gcc/analyzer/region-model.cc | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> > model.cc
> > index f079d1fb37e..1b43443d168 100644
> > --- a/gcc/analyzer/region-model.cc
> > +++ b/gcc/analyzer/region-model.cc
> > @@ -3514,6 +3514,10 @@ region_model::check_region_size (const region
> > *lhs_reg, const svalue *rhs_sval,
> >    || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
> >  return;
> >  
> > +  /* Bail out early on function pointers.  */
> > +  if (TREE_CODE (pointee_type) == FUNCTION_TYPE)
> > +    return;
> > +
> >    /* Bail out early on pointers to structs where we can
> >   not deduce whether the buffer size is compatible.  */
> >    bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);
> 


[PATCH] analyzer: Bail out on function pointer for -Wanalyzer-allocation-size

2024-03-19 Thread Stefan Schulze Frielinghaus
On s390 pr94688.c is failing due to excess error

pr94688.c:6:5: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]

This is because on s390 functions are by default aligned to an 8-byte
boundary and during function type construction size is set to function
boundary.  Thus, for the assignment

a.0_1 = (void (*) ()) 

we have that the right-hand side is pointing to a 4-byte memory region
whereas the size of the function pointer is 8 bytes, and a warning is
emitted.

I could follow and skip this test as done in PR112705, or we could bail
out early in the analyzer for function pointers.  My intuition so far
is that -Wanalyzer-allocation-size shouldn't care about function
pointers.  Therefore, I went for bailing out early.  If you believe this
is wrong, I can still skip this test on s390 instead.  Any thoughts?
---
 gcc/analyzer/region-model.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index f079d1fb37e..1b43443d168 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3514,6 +3514,10 @@ region_model::check_region_size (const region *lhs_reg, 
const svalue *rhs_sval,
   || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
 return;
 
+  /* Bail out early on function pointers.  */
+  if (TREE_CODE (pointee_type) == FUNCTION_TYPE)
+return;
+
   /* Bail out early on pointers to structs where we can
  not deduce whether the buffer size is compatible.  */
   bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);
-- 
2.43.0
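
To make the effect of the early bail-out concrete, here is a toy model of the decision, assuming the check boils down to "warn if the buffer size is not a multiple of the pointee's size".  Everything in it is hypothetical: enum pointee_kind and would_warn_allocation_size are invented for this sketch and do not mirror the analyzer's real data structures, which work on regions and svalues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of what a pointer points to.  */
enum pointee_kind
{
  POINTEE_OBJECT,
  POINTEE_FUNCTION
};

/* Toy model of the size check.  Return true if a warning would be
   issued for a buffer of BUFFER_SIZE bytes accessed through a pointer
   whose pointee has size POINTEE_SIZE.  With BAIL_OUT_ON_FUNCTIONS set,
   function pointees are never diagnosed.  */
bool
would_warn_allocation_size (enum pointee_kind kind, size_t pointee_size,
                            size_t buffer_size, bool bail_out_on_functions)
{
  assert (pointee_size > 0);
  if (bail_out_on_functions && kind == POINTEE_FUNCTION)
    return false;
  return buffer_size % pointee_size != 0;
}
```

Plugging in the sizes described above (4-byte region, size 8 for the function type), the model warns without the bail-out and stays silent with it, while the behaviour for ordinary object pointers is unchanged.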



Re: RFC: New mechanism for hard reg operands to inline asm

2024-03-15 Thread Stefan Schulze Frielinghaus
On Fri, Jun 04, 2021 at 06:02:27PM +, Andreas Krebbel via Gcc wrote:
> Hi,
> 
> I wonder if we could replace the register asm construct for
> inline assemblies with something a bit nicer and more obvious.
> E.g. turning this (real world example from IBM Z kernel code):
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
> register unsigned long reg2 asm ("2") = (addr_t) cpcmd_buf;
> register unsigned long reg3 asm ("3") = (addr_t) response;
> register unsigned long reg4 asm ("4") = cmdlen | 0x4000L;
> register unsigned long reg5 asm ("5") = *rlen; /* <-- */
> asm volatile(
> "   diag%2,%0,0x8\n"
> "   brc 8,1f\n"
> "   agr %1,%4\n"
> "1:\n"
> : "+d" (reg4), "+d" (reg5)
> : "d" (reg2), "d" (reg3), "d" (*rlen): "cc");
> *rlen = reg5;
> return reg4;
> }
> 
> into this:
> 
> int diag8_response(int cmdlen, char *response, int *rlen)
> {
> unsigned long len = cmdlen | 0x4000L;
> 
> asm volatile(
> "   diag%2,%0,0x8\n"
> "   brc 8,1f\n"
> "   agr %1,%4\n"
> "1:\n"
> : "+{r4}" (len), "+{r5}" (*rlen)
> : "{r2}" ((addr_t)cpcmd_buf), "{r3}" ((addr_t)response), "d" 
> (*rlen): "cc");
> return len;
> }
> 
> Apart from being much easier to read because the hard regs become part
> of the inline assembly it solves also a couple of other issues:
> 
> - function calls might clobber register asm variables see BZ100908
> - the constraints for the register asm operands are superfluous
> - one register asm variable cannot be used for 2 different inline
>   assemblies if the value is expected in different hard regs
> 
> I've started with a hackish implementation for IBM Z using the
> TARGET_MD_ASM_ADJUST hook and let all the places parsing constraints
> skip over the {} parts.  But perhaps it would be useful to make this a
> generic mechanism for all targets?!
> 
> Andreas

Hi all,

I would like to resurrect this topic
https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html and have come
up with a first implementation in order to discuss this further.

Basically, I see two ways to implement this.  The first is to let LRA
assign the registers; the second is to introduce extra moves just
before/after asm statements.  Currently I went for the latter and emit
extra moves during expand into hard regs as specified by the
input/output constraints.

Before going forward I would like to get some feedback whether this approach
makes sense to you at all or whether you see some show stoppers.  I was
wondering whether my current approach is robust enough in the sense that no
other pass could potentially remove the extra moves I introduced before.
In particular I was first worried about code motion.  Initially I thought I
have to make use not only of hard regs but hard regs which are flagged as
register-asms in order to prevent optimizations from fiddling around with those
moves.  However, after some more investigation I tend to conclude that this is
not necessary.  Any thoughts about this approach?

With the current approach I can at least handle cases like:

int __attribute__ ((noipa))
foo (int x) { return x; }

int test (int x)
{
  asm ("foo %0,%1\n" :: "{r3}" (foo (x + 1)), "{r2}" (x));
  return x;
}

Note, this is written with the s390 ABI in mind where the first int argument
and return value are passed in register r2.  The point here is that r2 needs to
be altered and restored multiple times until we reach } of function test().
Luckily, during expand we get all this basically for free.

This brings me to the general question what should be allowed and what not?
Evaluation order of input expressions is probably unspecified similar to
function arguments.  However, what about this one:

int test (int x)
{
  register int y asm ("r5") = x + 1;
  asm ("foo %0,%1\n" : "={r4}" (y) : "{r1}" (y));
  return y;
}

IMHO the input is just fine but the output constraint is misleading and it is
not obvious in which register variable y resides after the asm statement.
With my current implementation, where I don't bail out, it is register r4
contrary to the decl.  Interestingly, the other way around where one register
is "aliased" by multiple variables is accepted by vanilla GCC:

int foo (int x, int y)
{
  register int a asm ("r1") = x;
  register int b asm ("r1") = y;
  return a + b;
}

Though, probably not intentionally.

Cheers,
Stefan


[gcc r14-9451] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-03-13 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:4d049fadc25585e336c06e6b60b592f40ddbcc12

commit r14-9451-g4d049fadc25585e336c06e6b60b592f40ddbcc12
Author: Stefan Schulze Frielinghaus 
Date:   Wed Mar 13 11:07:03 2024 +0100

s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

RTX X need not necessarily be a SYMBOL_REF and may e.g. be an
UNSPEC_GOTENT for which SYMBOL_FLAG_NOTALIGN2_P fails.

gcc/ChangeLog:

* config/s390/s390.cc (s390_secondary_reload): Guard
SYMBOL_FLAG_NOTALIGN2_P.

Diff:
---
 gcc/config/s390/s390.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c857b2028f2..e63965578f1 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -4779,7 +4779,7 @@ s390_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass_i,
   if (in_p
   && s390_loadrelative_operand_p (x, &symref, &offset)
  && mode == Pmode
- && !SYMBOL_FLAG_NOTALIGN2_P (symref)
+ && (!SYMBOL_REF_P (symref) || !SYMBOL_FLAG_NOTALIGN2_P (symref))
  && (offset & 1) == 1)
sri->icode = ((mode == DImode) ? CODE_FOR_reloaddi_larl_odd_addend_z10
  : CODE_FOR_reloadsi_larl_odd_addend_z10);


[gcc r14-9450] s390: Fix tests rosbg_si_srl and rxsbg_si_srl

2024-03-13 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:a63fb786f8564880c91a30b99fda6d8a44adf81d

commit r14-9450-ga63fb786f8564880c91a30b99fda6d8a44adf81d
Author: Stefan Schulze Frielinghaus 
Date:   Wed Mar 13 11:05:08 2024 +0100

s390: Fix tests rosbg_si_srl and rxsbg_si_srl

Starting with r14-2047-gd0e891406b16dc two SI mode tests are optimized
into DI mode.  Thus, the scan-assembler directives fail.  For example
RTL expression

(ior:SI (subreg:SI (lshiftrt:DI (reg:DI 69)
(const_int 2 [0x2])) 4)
(subreg:SI (reg:DI 68) 4))

is optimized into

(ior:DI (lshiftrt:DI (reg:DI 69)
(const_int 2 [0x2]))
(reg:DI 68))

Fixed by moving operands into memory in order to enforce SI mode
computation.

Furthermore, in r9-6056-g290dfd9bc7bea2 the starting bit position of the
scan-assembler directive for rosbg was incorrectly set to 32 which
actually should be 32+SHIFT_AMOUNT, i.e., in this particular case 34.
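
The reasoning behind 34 instead of 32 can be checked with a small software model.  This is a sketch under assumptions: model_rosbg follows the usual description of ROSBG (rotate the second operand left, then OR the selected bit range into the first operand, with bit 0 being the most significant bit of the 64-bit register), it ignores the wrap-around case where the start bit is greater than the end bit, and the helper names and operand values are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits (N is taken modulo 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return (x << n) | (x >> ((64 - n) & 63));
}

/* Mask covering bits START..END, where bit 0 is the most significant
   bit.  Only the non-wrapping case START <= END is modelled.  */
static uint64_t
bit_range_mask (unsigned start, unsigned end)
{
  assert (start <= end && end <= 63);
  return (~UINT64_C (0) >> start) & (~UINT64_C (0) << (63 - end));
}

/* Simplified model of ROSBG: rotate R2 left by ROTATE and OR the
   selected bits into R1.  */
uint64_t
model_rosbg (uint64_t r1, uint64_t r2, unsigned start, unsigned end,
             unsigned rotate)
{
  return r1 | (rotl64 (r2, rotate) & bit_range_mask (start, end));
}
```

Rotating left by 62 is a rotate right by 2, so the two lowest bits of the upper register half land in bit positions 32 and 33.  Selecting 34..63 keeps exactly the 30 bits that make up b >> 2 in SI mode, while selecting 32..63 would also pick up whatever happens to be in the upper half of the source register.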

gcc/testsuite/ChangeLog:

* gcc.target/s390/md/rXsbg_mode_sXl.c: Fix tests rosbg_si_srl
and rxsbg_si_srl.

Diff:
---
 gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
index ede813818ff..cf454d2783c 100644
--- a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
+++ b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
@@ -22,6 +22,8 @@
 { dg-skip-if "" { *-*-* } { "*" } { "-march=*" } }
 */
 
+unsigned int a, b;
+
 __attribute__ ((noinline)) unsigned int
 si_sll (unsigned int x)
 {
@@ -42,11 +44,11 @@ rosbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rosbg_si_srl (unsigned int a, unsigned int b)
+rosbg_si_srl (void)
 {
   return a | (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
 rxsbg_si_sll (unsigned int a, unsigned int b)
@@ -56,11 +58,11 @@ rxsbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rxsbg_si_srl (unsigned int a, unsigned int b)
+rxsbg_si_srl (void)
 {
   return a ^ (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned long long
 di_sll (unsigned long long x)
@@ -108,21 +110,21 @@ main (void)
   /* SIMode */
   {
 unsigned int r;
-unsigned int a = 0x12488421u;
-unsigned int b = 0xu;
+a = 0x12488421u;
+b = 0xu;
 unsigned int csll = si_sll (b);
 unsigned int csrl = si_srl (b);
 
 r = rosbg_si_sll (a, b);
 if (r != (a | csll))
   __builtin_abort ();
-r = rosbg_si_srl (a, b);
+r = rosbg_si_srl ();
 if (r != (a | csrl))
   __builtin_abort ();
 r = rxsbg_si_sll (a, b);
 if (r != (a ^ csll))
   __builtin_abort ();
-r = rxsbg_si_srl (a, b);
+r = rxsbg_si_srl ();
 if (r != (a ^ csrl))
   __builtin_abort ();
   }


[gcc r14-9449] s390: Streamline vector builtins with LLVM

2024-03-13 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:9f2b16ce1efef0648a6d52c1d744735c46e2eec1

commit r14-9449-g9f2b16ce1efef0648a6d52c1d744735c46e2eec1
Author: Stefan Schulze Frielinghaus 
Date:   Wed Mar 13 11:03:02 2024 +0100

s390: Streamline vector builtins with LLVM

Similar to s390_lcbb, s390_vll, s390_vstl, et al. make use of a
signed vector type for vlbb.  Furthermore, a const void pointer seems
more common and an integer for the mask.

For s390_vfi(s,d)b make use of integers for masks, too.

Use unsigned integers for all s390_vlbr/vstbr variants.

Make use of type UV16QI for the length operand of s390_vstrs(,z)(h,f).

Following the Principles of Operation, change from signed to unsigned
type for s390_va(c,cc,ccc)q and s390_vs(,c,bc)biq and s390_vmslg.

Make use of scalar type UINT128 instead of UV16QI for s390_vgfm(,a)g,
and s390_vsumq(f,g).

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Streamline vector builtins with
LLVM.

Diff:
---
 gcc/config/s390/s390-builtin-types.def | 23 +---
 gcc/config/s390/s390-builtins.def  | 48 +-
 2 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def b/gcc/config/s390/s390-builtin-types.def
index 556104e0e23..d70eaade8ea 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -69,6 +69,7 @@ DEF_TYPE (BT_SHORTCONST, short_integer_type_node, 1)
 DEF_TYPE (BT_UCHAR, unsigned_char_type_node, 0)
 DEF_TYPE (BT_UCHARCONST, unsigned_char_type_node, 1)
 DEF_TYPE (BT_UINT, unsigned_type_node, 0)
+DEF_TYPE (BT_UINT128, unsigned_intTI_type_node, 0)
 DEF_TYPE (BT_UINT64, c_uint64_type_node, 0)
 DEF_TYPE (BT_UINTCONST, unsigned_type_node, 1)
 DEF_TYPE (BT_ULONG, long_unsigned_type_node, 0)
@@ -83,7 +84,6 @@ DEF_VECTOR_TYPE (BT_UV2DI, BT_ULONGLONG, 2)
 DEF_VECTOR_TYPE (BT_UV4SI, BT_UINT, 4)
 DEF_VECTOR_TYPE (BT_UV8HI, BT_USHORT, 8)
 DEF_VECTOR_TYPE (BT_V16QI, BT_SCHAR, 16)
-DEF_VECTOR_TYPE (BT_V1TI, BT_INT128, 1)
 DEF_VECTOR_TYPE (BT_V2DF, BT_DBL, 2)
 DEF_VECTOR_TYPE (BT_V2DI, BT_LONGLONG, 2)
 DEF_VECTOR_TYPE (BT_V4SF, BT_FLT, 4)
@@ -114,9 +114,11 @@ DEF_POINTER_TYPE (BT_VOIDCONSTPTR, BT_VOIDCONST)
 DEF_POINTER_TYPE (BT_VOIDPTR, BT_VOID)
 DEF_DISTINCT_TYPE (BT_BCHAR, BT_UCHAR)
 DEF_DISTINCT_TYPE (BT_BINT, BT_UINT)
+DEF_DISTINCT_TYPE (BT_BINT128, BT_UINT128)
 DEF_DISTINCT_TYPE (BT_BLONGLONG, BT_ULONGLONG)
 DEF_DISTINCT_TYPE (BT_BSHORT, BT_USHORT)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV16QI, BT_BCHAR, 16)
+DEF_OPAQUE_VECTOR_TYPE (BT_BV1TI, BT_BINT128, 1)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV2DI, BT_BLONGLONG, 2)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV4SI, BT_BINT, 4)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV8HI, BT_BSHORT, 8)
@@ -131,6 +133,7 @@ DEF_FN_TYPE_1 (BT_FN_INT_VOIDPTR, BT_INT, BT_VOIDPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INT, BT_OV4SI, BT_INT)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INTCONSTPTR, BT_OV4SI, BT_INTCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI)
+DEF_FN_TYPE_1 (BT_FN_UINT128_UINT128, BT_UINT128, BT_UINT128)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHAR, BT_UV16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHARCONSTPTR, BT_UV16QI, BT_UCHARCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_USHORT, BT_UV16QI, BT_USHORT)
@@ -154,7 +157,6 @@ DEF_FN_TYPE_1 (BT_FN_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_1 (BT_FN_V16QI_SCHAR, BT_V16QI, BT_SCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_UCHAR, BT_V16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_V16QI, BT_V16QI, BT_V16QI)
-DEF_FN_TYPE_1 (BT_FN_V1TI_V1TI, BT_V1TI, BT_V1TI)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBL, BT_V2DF, BT_DBL)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBLCONSTPTR, BT_V2DF, BT_DBLCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_V2DF_FLTCONSTPTR, BT_V2DF, BT_FLTCONSTPTR)
@@ -207,18 +209,18 @@ DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI, BT_OV4SI)
 DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_UCHAR, BT_OV4SI, BT_OV4SI, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_ULONG, BT_OV4SI, BT_OV4SI, BT_ULONG)
 DEF_FN_TYPE_2 (BT_FN_UCHAR_UV16QI_INT, BT_UCHAR, BT_UV16QI, BT_INT)
+DEF_FN_TYPE_2 (BT_FN_UINT128_UINT128_UINT128, BT_UINT128, BT_UINT128, BT_UINT128)
+DEF_FN_TYPE_2 (BT_FN_UINT128_UV2DI_UV2DI, BT_UINT128, BT_UV2DI, BT_UV2DI)
+DEF_FN_TYPE_2 (BT_FN_UINT128_UV4SI_UV4SI, BT_UINT128, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UINT_UV4SI_INT, BT_UINT, BT_UV4SI, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UINT_VOIDCONSTPTR_INT, BT_UINT, BT_VOIDCONSTPTR, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_ULONGLONG_UV2DI_INT, BT_ULONGLONG, BT_UV2DI, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_USHORT_UV8HI_INT, BT_USHORT, BT_UV8HI, BT_INT)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHARCONSTPTR_USHORT, BT_UV16QI, BT_UCHARCONSTPTR, BT_USHORT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_INT, BT_UV16QI, BT_UCHAR, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_UCHAR, BT_UV16QI, BT_UCHAR, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_INTPTR, BT_UV16QI, BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_2

[gcc r14-9448] s390: Deprecate some vector builtins

2024-03-13 Thread Stefan Schulze Frielinghaus via Gcc-cvs
https://gcc.gnu.org/g:b59f0c9c5a4838658dd2a1db58ac09d9f3be0f51

commit r14-9448-gb59f0c9c5a4838658dd2a1db58ac09d9f3be0f51
Author: Stefan Schulze Frielinghaus 
Date:   Wed Mar 13 10:59:02 2024 +0100

s390: Deprecate some vector builtins

According to IBM Open XL C/C++ for z/OS version 1.1 builtins

- vec_permi
- vec_ctd
- vec_ctsl
- vec_ctul
- vec_ld2f
- vec_st2f

are deprecated.  Also deprecate helper builtins vec_ctd_s64 and
vec_ctd_u64.

Furthermore, the overloads of vec_insert which make use of a bool vector
are deprecated, too.

gcc/ChangeLog:

* config/s390/s390-builtins.def (vec_permi): Deprecate.
(vec_ctd): Deprecate.
(vec_ctd_s64): Deprecate.
(vec_ctd_u64): Deprecate.
(vec_ctsl): Deprecate.
(vec_ctul): Deprecate.
(vec_ld2f): Deprecate.
(vec_st2f): Deprecate.
(vec_insert): Deprecate overloads with bool vectors.

Diff:
---
 gcc/config/s390/s390-builtins.def | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def
index 680a038fa4b..54f400ceb5a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -416,16 +416,16 @@ B_DEF  (s390_vec_splat_s64, vec_splatsv2di,   
  0,
 OB_DEF (s390_vec_insert,s390_vec_insert_s8, 
s390_vec_insert_dbl,B_VX,   BT_FN_OV4SI_INT_OV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_V16QI_SCHAR_V16QI_INT)
 OB_DEF_VAR (s390_vec_insert_u8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_UV16QI_INT)
-OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
+OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, B_DEP, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
 OB_DEF_VAR (s390_vec_insert_s16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_V8HI_SHORT_V8HI_INT)
 OB_DEF_VAR (s390_vec_insert_u16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_UV8HI_USHORT_UV8HI_INT)
-OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
+OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, B_DEP, 
 O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
 OB_DEF_VAR (s390_vec_insert_s32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_V4SI_INT_V4SI_INT)
 OB_DEF_VAR (s390_vec_insert_u32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_UV4SI_UINT_UV4SI_INT)
-OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
+OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, B_DEP, 
 O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_V2DI_LONGLONG_V2DI_INT)
 OB_DEF_VAR (s390_vec_insert_u64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_UV2DI_INT)
-OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
+OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, B_DEP, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
 OB_DEF_VAR (s390_vec_insert_flt,s390_vlvgf_flt, B_VXE, 
 O3_ELEM,BT_OV_V4SF_FLT_V4SF_INT) /* vlvgf */
 OB_DEF_VAR (s390_vec_insert_dbl,s390_vlvgg_dbl, 0, 
 O3_ELEM,BT_OV_V2DF_DBL_V2DF_INT) /* vlvgg */
 
@@ -658,7 +658,7 @@ OB_DEF_VAR (s390_vec_perm_dbl,  s390_vperm, 
0,
 
 B_DEF  (s390_vperm, vec_permv16qi,  0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI_UV16QI_UV16QI)
 
-OB_DEF (s390_vec_permi, s390_vec_permi_s64, 
s390_vec_permi_dbl, B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
+OB_DEF (s390_vec_permi, s390_vec_permi_s64, 
s390_vec_permi_dbl, B_DEP | B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
 OB_DEF_VAR (s390_vec_permi_s64, s390_vpdi,  0, 
 O3_U2,  BT_OV_V2DI_V2DI_V2DI_INT)
 OB_DEF_VAR (s390_vec_permi_b64, s390_vpdi,  0, 
 O3_U2,  BT_OV_BV2DI_BV2DI_BV2DI_INT)
 OB_DEF_VAR (s390_vec_permi_u64, s390_vpdi,  0, 
 O3_U2,  BT_OV_UV2DI_UV2DI_UV2DI_INT)
@@ -2806,7 +2806,7 @@ OB_DEF (s390_vec_any_ngt,   
s390_vec_any_ngt_flt,s390_vec_any_ngt_db
 OB_DEF_VAR

Re: [PATCH] s390: Fix test vector/long-double-to-i64.c

2024-03-12 Thread Stefan Schulze Frielinghaus
On Mon, Mar 11, 2024 at 11:14:04AM +0100, Andreas Krebbel wrote:
> On 2/29/24 13:15, Stefan Schulze Frielinghaus wrote:
> > Starting with r14-8319-g86de9b66480b71 fwprop improved so that vpdi is
> > no longer required.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/vector/long-double-to-i64.c: Fix scan
> > assembler directive.
> 
> Should we perhaps rather turn the scan-assembler directives into something
> which checks for the absence of vpdi then? In order to get notified once
> this really useful optimization breaks?

I thought about checking for the optimal code, which would be just two
loads and a convert instruction.  Thus, if this check fails, we have a
regression.  Speaking of regressions, the old behaviour was restored by
r14-9412-g3e3e4156a5f93e, which means we are back to using vpdi.  Thus, I
will leave this patch on hold and have a second look.
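
A minimal sketch of the alternative raised in the quoted question, assuming
the DejaGnu directive scan-assembler-not is used to check for the absence of
vpdi.  The regular expression, the noinline attribute, and the helper
check_long_double_to_i64 are illustrative assumptions, not part of the patch:

```c
/* { dg-do compile } */
/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */

#include <assert.h>
#include <stdint.h>

/* Convert a long double to a signed 64-bit integer.  The C conversion
   truncates toward zero.  */
__attribute__ ((noinline)) int64_t
long_double_to_i64 (long double x)
{
  return x;
}

/* Hypothetical run-time sanity check, independent of the assembly scan.  */
void
check_long_double_to_i64 (void)
{
  assert (long_double_to_i64 (42.75L) == 42);
  assert (long_double_to_i64 (-3.5L) == -3);
}

/* Assumed directive: fail if any vpdi instruction appears in the output.  */
/* { dg-final { scan-assembler-not {\tvpdi\t} } } */
```

Such a check only reports the reappearance of vpdi; unlike
check-function-bodies it would not notice other changes in the generated
sequence.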

Cheers,
Stefan

> 
> Andreas
> 
> > ---
> >  .../gcc.target/s390/vector/long-double-to-i64.c | 13 +
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
> > index 2dbbb5d1c03..ed89878e6ee 100644
> > --- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
> > +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
> > @@ -1,19 +1,24 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
> >  /* { dg-do run { target { s390_z14_hw } } } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { lp64 } } } } */
> > +
> >  #include 
> >  #include 
> >  
> > +/*
> > +** long_double_to_i64:
> > +** ld  %f0,0\(%r2\)
> > +** ld  %f2,8\(%r2\)
> > +** cgxbr   %r2,5,%f0
> > +** br  %r14
> > +*/
> >  __attribute__ ((noipa)) static int64_t
> >  long_double_to_i64 (long double x)
> >  {
> >return x;
> >  }
> >  
> > -/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
> > -/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
> > -/* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
> > -
> >  int
> >  main (void)
> >  {
> 


Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-12 Thread Stefan O'Rear
On Tue, Mar 12, 2024, at 2:15 AM, pan2...@intel.com wrote:
> From: Pan Li 
>
> Update in v3:
> * Add pre-defined __riscv_v_fixed_vlen when zvl.
>
> Update in v2:
> * Cleanup some unused code.
> * Fix some typo of commit log.
>
> Original log:
>
> This patch introduces a new GCC attribute for RVV.  The attribute is
> used to define fixed-length variants of the existing sizeless RVV
> types.
>
> The attribute is valid if and only if -mrvv-vector-bits=zvl is in
> effect.  Its only argument must be an integer constant whose value is
> determined by the LMUL and the vector register bits in zvl*b.  For
> example:
>
> typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));
>
> The above typedef is valid with -march=rv64gc_zve64d_zvl64b
> (i.e. 2 (m2) * 64 = 128 for vint32m2_t), and reports an error with
> -march=rv64gcv_zvl128b similar to the one below.
>
> "error: invalid RVV vector size '128', expected size is '256' based on
> LMUL of type and '-mrvv-vector-bits=zvl'"
>
> Meanwhile, a pre-defined macro __riscv_v_fixed_vlen is introduced to
> represent the fixed vlen of an RVV vector register.
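
The size rule in the quoted description can be sketched in plain C.  This
is only a model of the arithmetic under the assumption of an integer LMUL
(m1, m2, m4, m8); the helper names are hypothetical and are not part of the
patch or of any RISC-V header:

```c
#include <assert.h>

/* Hypothetical helper: the vector size in bits that the attribute
   argument is expected to match, for an integer LMUL and the register
   width implied by zvl*b.  Fractional LMUL and vbool*_t types are not
   modelled here.  */
unsigned
expected_rvv_vector_bits (unsigned lmul, unsigned zvl_bits)
{
  return lmul * zvl_bits;
}

/* Hypothetical self-check using the two -march settings from the text.  */
void
check_expected_rvv_vector_bits (void)
{
  /* vint32m2_t with zvl64b: 2 * 64 = 128, so 128 is accepted.  */
  assert (expected_rvv_vector_bits (2, 64) == 128);
  /* vint32m2_t with zvl128b: 2 * 128 = 256, so 128 is rejected.  */
  assert (expected_rvv_vector_bits (2, 128) == 256);
}
```

The second case corresponds to the quoted error message, where the
expected size is 256 while the attribute argument is 128.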

Shouldn't a major user-facing change like this be discussed in a PR against
https://github.com/riscv-non-isa/riscv-c-api-doc/ or
https://github.com/riscv-non-isa/rvv-intrinsic-doc before or concurrent with
compiler implementation?

-s

> For the vint*m*_t types the operations below are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> For the vfloat*m*_t types the operations below are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, -
>
> For the vbool*_t types only the operations below are allowed; CMP and
> ALU operations are excluded because they are currently not well
> defined on vbool*_t.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
>
> The vint*x*m*_t tuple types are not supported in this patch, which is
> compatible with clang.
>
> This patch passed the testsuites below.
> * The full riscv regression tests.
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
>   macro __riscv_v_fixed_vlen when zvl.
>   * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
>   New static func to take care of the RVV types decorated by
>   the attributes.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   |   3 +
>  gcc/config/riscv/riscv.cc |  87 +-
>  .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-10.c |  53 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-11.c |  76 
>  .../riscv/rvv/base/riscv_rvv_vector_bits-12.c |  14 +++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-13.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-14.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-15.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-16.c |  11 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-17.c |  10 ++
>  .../riscv/rvv/base/riscv_rvv_vector_bits-2.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-3.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-4.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-5.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-6.c  |   6 +
>  .../riscv/rvv/base/riscv_rvv_vector_bits-7.c  |  76 

Re: [PATCH] s390: Streamline NNPA builtins with POP mnemonics

2024-03-06 Thread Stefan Schulze Frielinghaus
Since there is no straightforward way to introduce an overload with
different return types where we would expand differently depending on an
immediate operand, let's drop this patch.

On Fri, Mar 01, 2024 at 04:18:31PM +0100, Stefan Schulze Frielinghaus wrote:
> At the moment there are no extended mnemonics for vclfn(h,l) and vcrnf
> defined in the Principles of Operation.  Thus, remove the suffix "s"
> from the builtins and expanders and introduce a further operand for the
> data type.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def: Update to reflect latest
>   changes.
>   * config/s390/s390-builtins.def: Remove suffix s from
>   s390_vclfn(h,l)s and s390_vcrnfs.
>   * config/s390/s390.md: Similar, remove suffix s from unspec
>   definitions.
>   * config/s390/vecintrin.h (vec_extend_to_fp32_hi): Redefine.
>   (vec_extend_to_fp32_lo): Redefine.
>   (vec_round_from_fp32): Redefine.
>   * config/s390/vx-builtins.md (vclfnhs_v8hi): Remove suffix s.
>   (vclfnh_v8hi): Add with extra operand.
>   (vclfnls_v8hi): Remove suffix s.
>   (vclfnl_v8hi): Add with extra operand.
>   (vcrnfs_v8hi): Remove suffix s.
>   (vcrnf_v8hi): Add with extra operand.
> ---
> OK for mainline?
> 
>  gcc/config/s390/s390-builtin-types.def |  4 ++--
>  gcc/config/s390/s390-builtins.def  |  6 +++---
>  gcc/config/s390/s390.md|  6 +++---
>  gcc/config/s390/vecintrin.h|  6 +++---
>  gcc/config/s390/vx-builtins.md | 27 ++
>  5 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtin-types.def b/gcc/config/s390/s390-builtin-types.def
> index ce51ae8cd3f..c3d09b42835 100644
> --- a/gcc/config/s390/s390-builtin-types.def
> +++ b/gcc/config/s390/s390-builtin-types.def
> @@ -273,7 +273,6 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
> BT_V2DF)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
> -DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
> @@ -324,7 +323,6 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
> BT_UV8HI, BT_USHORT, BT_I
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI)
> -DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
> BT_UV16QI, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_V16QI)
> @@ -340,6 +338,7 @@ DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_INT_INTPTR, BT_V2DI, 
> BT_V2DF, BT_INT, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_V2DF_INTPTR, BT_V2DI, BT_V2DF, BT_V2DF, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V2DI_V2DI_INTPTR, BT_V2DI, BT_V2DI, BT_V2DI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V2DI_V4SI_V4SI_V2DI, BT_V2DI, BT_V4SI, BT_V4SI, BT_V2DI)
> +DEF_FN_TYPE_3 (BT_FN_V4SF_UV8HI_UINT_UINT, BT_V4SF, BT_UV8HI, BT_UINT, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V2DF_INT_INT, BT_V4SF, BT_V2DF, BT_INT, BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_FLT_INT, BT_V4SF, BT_V4SF, BT_FLT, BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_UCHAR_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR, 
> BT_UCHAR)
> @@ -377,6 +376,7 @@ DEF_FN_TYPE_4 
> (BT_FN_UV4SI_UV4SI_UV4SI_UINTCONSTPTR_UCHAR, BT_UV4SI, BT_UV4SI, B
>  DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UV4SI_INT, BT_UV4SI, BT_UV4SI, 
> BT_UV4SI, BT_UV4SI, BT_INT)
>  DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_INT_INTPTR, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI, BT_INT, BT_INTPTR)
>  DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI, BT_UV8HI, BT_INT)
> +DEF_FN_TYPE_4 (BT_FN_UV8HI_V4SF_V4SF_UINT_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT, BT_UINT)
>  DEF_FN_TYPE_4 (BT_FN_VOID_UV2DI_UV2DI_ULONGLONGPTR_ULONGLONG, BT_VOID, 
> BT_UV2DI, BT_UV2DI, BT_ULONGLONGPTR, BT_ULONGLONG)
>  DEF_FN_TYPE_4 (BT_FN_VOID_UV4SI_UV4SI_UINTPTR_ULONGLONG, BT_VOID, BT_UV4SI, 
> BT_UV4SI, BT_UINTPTR, BT_ULONGLONG)
>  DEF_FN_TYPE_4 (BT_FN_VOID_V4SI_V4SI_INTPTR_ULONGLONG, BT_VOID, BT_V4SI, 
> BT_V4SI, BT_I

[PATCH] s390: Deprecate some vector builtins

2024-03-01 Thread Stefan Schulze Frielinghaus
According to IBM Open XL C/C++ for z/OS version 1.1 builtins

- vec_permi
- vec_ctd
- vec_ctsl
- vec_ctul
- vec_ld2f
- vec_st2f

are deprecated.  Also deprecate helper builtins vec_ctd_s64 and
vec_ctd_u64.

Furthermore, the overloads of vec_insert which make use of a bool vector
are deprecated, too.

gcc/ChangeLog:

* config/s390/s390-builtins.def (vec_permi): Deprecate.
(vec_ctd): Deprecate.
(vec_ctd_s64): Deprecate.
(vec_ctd_u64): Deprecate.
(vec_ctsl): Deprecate.
(vec_ctul): Deprecate.
(vec_ld2f): Deprecate.
(vec_st2f): Deprecate.
(vec_insert): Deprecate overloads with bool vectors.
---
 Ok for mainline?

 gcc/config/s390/s390-builtins.def | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def
index 680a038fa4b..54f400ceb5a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -416,16 +416,16 @@ B_DEF  (s390_vec_splat_s64, vec_splatsv2di,   
  0,
 OB_DEF (s390_vec_insert,s390_vec_insert_s8, 
s390_vec_insert_dbl,B_VX,   BT_FN_OV4SI_INT_OV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_V16QI_SCHAR_V16QI_INT)
 OB_DEF_VAR (s390_vec_insert_u8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_UV16QI_INT)
-OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, 0, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
+OB_DEF_VAR (s390_vec_insert_b8, s390_vlvgb, B_DEP, 
 O3_ELEM,BT_OV_UV16QI_UCHAR_BV16QI_INT)
 OB_DEF_VAR (s390_vec_insert_s16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_V8HI_SHORT_V8HI_INT)
 OB_DEF_VAR (s390_vec_insert_u16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_UV8HI_USHORT_UV8HI_INT)
-OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, 0, 
 O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
+OB_DEF_VAR (s390_vec_insert_b16,s390_vlvgh, B_DEP, 
 O3_ELEM,BT_OV_UV8HI_USHORT_BV8HI_INT)
 OB_DEF_VAR (s390_vec_insert_s32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_V4SI_INT_V4SI_INT)
 OB_DEF_VAR (s390_vec_insert_u32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_UV4SI_UINT_UV4SI_INT)
-OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, 0, 
 O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
+OB_DEF_VAR (s390_vec_insert_b32,s390_vlvgf, B_DEP, 
 O3_ELEM,BT_OV_UV4SI_UINT_BV4SI_INT)
 OB_DEF_VAR (s390_vec_insert_s64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_V2DI_LONGLONG_V2DI_INT)
 OB_DEF_VAR (s390_vec_insert_u64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_UV2DI_INT)
-OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, 0, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
+OB_DEF_VAR (s390_vec_insert_b64,s390_vlvgg, B_DEP, 
 O3_ELEM,BT_OV_UV2DI_ULONGLONG_BV2DI_INT)
 OB_DEF_VAR (s390_vec_insert_flt,s390_vlvgf_flt, B_VXE, 
 O3_ELEM,BT_OV_V4SF_FLT_V4SF_INT) /* vlvgf */
 OB_DEF_VAR (s390_vec_insert_dbl,s390_vlvgg_dbl, 0, 
 O3_ELEM,BT_OV_V2DF_DBL_V2DF_INT) /* vlvgg */
 
@@ -658,7 +658,7 @@ OB_DEF_VAR (s390_vec_perm_dbl,  s390_vperm, 
0,
 
 B_DEF  (s390_vperm, vec_permv16qi,  0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI_UV16QI_UV16QI)
 
-OB_DEF (s390_vec_permi, s390_vec_permi_s64, 
s390_vec_permi_dbl, B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
+OB_DEF (s390_vec_permi, s390_vec_permi_s64, 
s390_vec_permi_dbl, B_DEP | B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
 OB_DEF_VAR (s390_vec_permi_s64, s390_vpdi,  0, 
 O3_U2,  BT_OV_V2DI_V2DI_V2DI_INT)
 OB_DEF_VAR (s390_vec_permi_b64, s390_vpdi,  0, 
 O3_U2,  BT_OV_BV2DI_BV2DI_BV2DI_INT)
 OB_DEF_VAR (s390_vec_permi_u64, s390_vpdi,  0, 
 O3_U2,  BT_OV_UV2DI_UV2DI_UV2DI_INT)
@@ -2806,7 +2806,7 @@ OB_DEF (s390_vec_any_ngt,   
s390_vec_any_ngt_flt,s390_vec_any_ngt_db
 OB_DEF_VAR (s390_vec_any_ngt_flt,   vec_any_unlev4sf,   B_VXE, 
 0,  BT_OV_INT_V4SF_V4SF)
 OB_DEF_VAR (s390_vec_any_ngt_dbl,   vec_any_unlev2df,   0, 
 0,  BT_OV_INT_V2DF_V2DF)
 
-OB_DEF (s390_vec_ctd,   s390_vec_ctd_s64,   s390_vec_ctd_u64,  
 B_VX,   

[PATCH] s390: Streamline NNPA builtins with POP mnemonics

2024-03-01 Thread Stefan Schulze Frielinghaus
At the moment there are no extended mnemonics for vclfn(h,l) and vcrnf
defined in the Principles of Operation.  Thus, remove the suffix "s"
from the builtins and expanders and introduce a further operand for the
data type.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Remove suffix s from
s390_vclfn(h,l)s and s390_vcrnfs.
* config/s390/s390.md: Similar, remove suffix s from unspec
definitions.
* config/s390/vecintrin.h (vec_extend_to_fp32_hi): Redefine.
(vec_extend_to_fp32_lo): Redefine.
(vec_round_from_fp32): Redefine.
* config/s390/vx-builtins.md (vclfnhs_v8hi): Remove suffix s.
(vclfnh_v8hi): Add with extra operand.
(vclfnls_v8hi): Remove suffix s.
(vclfnl_v8hi): Add with extra operand.
(vcrnfs_v8hi): Remove suffix s.
(vcrnf_v8hi): Add with extra operand.
---
OK for mainline?

 gcc/config/s390/s390-builtin-types.def |  4 ++--
 gcc/config/s390/s390-builtins.def  |  6 +++---
 gcc/config/s390/s390.md|  6 +++---
 gcc/config/s390/vecintrin.h|  6 +++---
 gcc/config/s390/vx-builtins.md | 27 ++
 5 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def b/gcc/config/s390/s390-builtin-types.def
index ce51ae8cd3f..c3d09b42835 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -273,7 +273,6 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, BT_V2DF)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
 DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
-DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
 DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
@@ -324,7 +323,6 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, BT_UV8HI, BT_USHORT, BT_I
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_INT)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI)
-DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, BT_V16QI)
@@ -340,6 +338,7 @@ DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_INT_INTPTR, BT_V2DI, BT_V2DF, BT_INT, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V2DF_V2DF_INTPTR, BT_V2DI, BT_V2DF, BT_V2DF, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V2DI_V2DI_INTPTR, BT_V2DI, BT_V2DI, BT_V2DI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V2DI_V4SI_V4SI_V2DI, BT_V2DI, BT_V4SI, BT_V4SI, BT_V2DI)
+DEF_FN_TYPE_3 (BT_FN_V4SF_UV8HI_UINT_UINT, BT_V4SF, BT_UV8HI, BT_UINT, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V2DF_INT_INT, BT_V4SF, BT_V2DF, BT_INT, BT_INT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_FLT_INT, BT_V4SF, BT_V4SF, BT_FLT, BT_INT)
 DEF_FN_TYPE_3 (BT_FN_V4SF_V4SF_UCHAR_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR, BT_UCHAR)
@@ -377,6 +376,7 @@ DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UINTCONSTPTR_UCHAR, BT_UV4SI, BT_UV4SI, B
 DEF_FN_TYPE_4 (BT_FN_UV4SI_UV4SI_UV4SI_UV4SI_INT, BT_UV4SI, BT_UV4SI, BT_UV4SI, BT_UV4SI, BT_INT)
 DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_INT_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_INT, BT_INTPTR)
 DEF_FN_TYPE_4 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, BT_INT)
+DEF_FN_TYPE_4 (BT_FN_UV8HI_V4SF_V4SF_UINT_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, BT_UINT, BT_UINT)
 DEF_FN_TYPE_4 (BT_FN_VOID_UV2DI_UV2DI_ULONGLONGPTR_ULONGLONG, BT_VOID, BT_UV2DI, BT_UV2DI, BT_ULONGLONGPTR, BT_ULONGLONG)
 DEF_FN_TYPE_4 (BT_FN_VOID_UV4SI_UV4SI_UINTPTR_ULONGLONG, BT_VOID, BT_UV4SI, BT_UV4SI, BT_UINTPTR, BT_ULONGLONG)
 DEF_FN_TYPE_4 (BT_FN_VOID_V4SI_V4SI_INTPTR_ULONGLONG, BT_VOID, BT_V4SI, BT_V4SI, BT_INTPTR, BT_ULONGLONG)
diff --git a/gcc/config/s390/s390-builtins.def b/gcc/config/s390/s390-builtins.def
index 02ff516c677..0d4e20ea425 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -3025,10 +3025,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,
0,
 
 /* arch 14 builtins */
 
-B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_UV8HI_UINT)
-B_DEF  (s390_vclfnls,vclfnls_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_UV8HI_UINT)
+B_DEF  (s390_vclfnh, vclfnh_v8hi,   0, 
 

[PATCH] s390: Streamline vector builtins with LLVM

2024-03-01 Thread Stefan Schulze Frielinghaus
Similar to s390_lcbb, s390_vll, s390_vstl, et al., make use of a
signed vector type for vlbb.  Furthermore, use a const void pointer,
which seems more common, and an integer for the mask.

For s390_vfi(s,d)b make use of integers for masks, too.

Use unsigned integers for all s390_vlbr/vstbr variants.

Make use of type UV16QI for the length operand of s390_vstrs(,z)(h,f).

Following the Principles of Operation, change from signed to unsigned
type for s390_va(c,cc,ccc)q and s390_vs(,c,bc)biq and s390_vmslg.
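
As a rough illustration of why unsigned operands are a natural fit for
128-bit add-with-carry style arithmetic, here is a portable C model.  It is
a sketch under the assumption that the compiler provides unsigned __int128
(GCC and Clang do on 64-bit targets); the helper names are hypothetical and
this is not how the builtins are implemented:

```c
#include <assert.h>

/* Assumption: unsigned __int128 is available as a compiler extension.  */
typedef unsigned __int128 u128;

/* Sum of two 128-bit values, wrapping modulo 2^128.  Unsigned wraparound
   is well defined in C, whereas signed overflow is not.  */
u128
add_u128 (u128 a, u128 b)
{
  return a + b;
}

/* Carry out of a + b: 1 if the 128-bit sum wrapped around, else 0.  */
u128
add_carry_out_u128 (u128 a, u128 b)
{
  return (u128) (a + b < a);
}

/* Hypothetical self-check for the two helpers above.  */
void
check_add_u128 (void)
{
  u128 max = ~(u128) 0;
  assert (add_u128 (max, 1) == 0);
  assert (add_carry_out_u128 (max, 1) == 1);
  assert (add_carry_out_u128 (1, 2) == 0);
}
```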

Make use of scalar type UINT128 instead of UV16QI for s390_vgfm(,a)g,
and s390_vsumq(f,g).

Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Update to reflect latest
changes.
* config/s390/s390-builtins.def: Streamline vector builtins with
LLVM.
---
 gcc/config/s390/s390-builtin-types.def | 29 +++-
 gcc/config/s390/s390-builtins.def  | 48 +-
 2 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def b/gcc/config/s390/s390-builtin-types.def
index 556104e0e23..ce51ae8cd3f 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -58,6 +58,7 @@ DEF_TYPE (BT_FLT, float_type_node, 0)
 DEF_TYPE (BT_FLTCONST, float_type_node, 1)
 DEF_TYPE (BT_INT, integer_type_node, 0)
 DEF_TYPE (BT_INT128, intTI_type_node, 0)
+DEF_TYPE (BT_INT128CONST, intTI_type_node, 1)
 DEF_TYPE (BT_INTCONST, integer_type_node, 1)
 DEF_TYPE (BT_LONG, long_integer_type_node, 0)
 DEF_TYPE (BT_LONGLONG, long_long_integer_type_node, 0)
@@ -69,6 +70,8 @@ DEF_TYPE (BT_SHORTCONST, short_integer_type_node, 1)
 DEF_TYPE (BT_UCHAR, unsigned_char_type_node, 0)
 DEF_TYPE (BT_UCHARCONST, unsigned_char_type_node, 1)
 DEF_TYPE (BT_UINT, unsigned_type_node, 0)
+DEF_TYPE (BT_UINT128, unsigned_intTI_type_node, 0)
+DEF_TYPE (BT_UINT128CONST, unsigned_intTI_type_node, 1)
 DEF_TYPE (BT_UINT64, c_uint64_type_node, 0)
 DEF_TYPE (BT_UINTCONST, unsigned_type_node, 1)
 DEF_TYPE (BT_ULONG, long_unsigned_type_node, 0)
@@ -79,6 +82,7 @@ DEF_TYPE (BT_USHORTCONST, short_unsigned_type_node, 1)
 DEF_TYPE (BT_VOID, void_type_node, 0)
 DEF_TYPE (BT_VOIDCONST, void_type_node, 1)
 DEF_VECTOR_TYPE (BT_UV16QI, BT_UCHAR, 16)
+DEF_VECTOR_TYPE (BT_UV1TI, BT_UINT128, 1)
 DEF_VECTOR_TYPE (BT_UV2DI, BT_ULONGLONG, 2)
 DEF_VECTOR_TYPE (BT_UV4SI, BT_UINT, 4)
 DEF_VECTOR_TYPE (BT_UV8HI, BT_USHORT, 8)
@@ -93,6 +97,8 @@ DEF_POINTER_TYPE (BT_DBLCONSTPTR, BT_DBLCONST)
 DEF_POINTER_TYPE (BT_DBLPTR, BT_DBL)
 DEF_POINTER_TYPE (BT_FLTCONSTPTR, BT_FLTCONST)
 DEF_POINTER_TYPE (BT_FLTPTR, BT_FLT)
+DEF_POINTER_TYPE (BT_INT128CONSTPTR, BT_INT128CONST)
+DEF_POINTER_TYPE (BT_INT128PTR, BT_INT128)
 DEF_POINTER_TYPE (BT_INTCONSTPTR, BT_INTCONST)
 DEF_POINTER_TYPE (BT_INTPTR, BT_INT)
 DEF_POINTER_TYPE (BT_LONGLONGCONSTPTR, BT_LONGLONGCONST)
@@ -103,6 +109,8 @@ DEF_POINTER_TYPE (BT_SHORTCONSTPTR, BT_SHORTCONST)
 DEF_POINTER_TYPE (BT_SHORTPTR, BT_SHORT)
 DEF_POINTER_TYPE (BT_UCHARCONSTPTR, BT_UCHARCONST)
 DEF_POINTER_TYPE (BT_UCHARPTR, BT_UCHAR)
+DEF_POINTER_TYPE (BT_UINT128CONSTPTR, BT_UINT128CONST)
+DEF_POINTER_TYPE (BT_UINT128PTR, BT_UINT128)
 DEF_POINTER_TYPE (BT_UINT64PTR, BT_UINT64)
 DEF_POINTER_TYPE (BT_UINTCONSTPTR, BT_UINTCONST)
 DEF_POINTER_TYPE (BT_UINTPTR, BT_UINT)
@@ -114,9 +122,11 @@ DEF_POINTER_TYPE (BT_VOIDCONSTPTR, BT_VOIDCONST)
 DEF_POINTER_TYPE (BT_VOIDPTR, BT_VOID)
 DEF_DISTINCT_TYPE (BT_BCHAR, BT_UCHAR)
 DEF_DISTINCT_TYPE (BT_BINT, BT_UINT)
+DEF_DISTINCT_TYPE (BT_BINT128, BT_UINT128)
 DEF_DISTINCT_TYPE (BT_BLONGLONG, BT_ULONGLONG)
 DEF_DISTINCT_TYPE (BT_BSHORT, BT_USHORT)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV16QI, BT_BCHAR, 16)
+DEF_OPAQUE_VECTOR_TYPE (BT_BV1TI, BT_BINT128, 1)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV2DI, BT_BLONGLONG, 2)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV4SI, BT_BINT, 4)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV8HI, BT_BSHORT, 8)
@@ -131,6 +141,7 @@ DEF_FN_TYPE_1 (BT_FN_INT_VOIDPTR, BT_INT, BT_VOIDPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INT, BT_OV4SI, BT_INT)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_INTCONSTPTR, BT_OV4SI, BT_INTCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI)
+DEF_FN_TYPE_1 (BT_FN_UINT128_UINT128, BT_UINT128, BT_UINT128)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHAR, BT_UV16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_UCHARCONSTPTR, BT_UV16QI, BT_UCHARCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_UV16QI_USHORT, BT_UV16QI, BT_USHORT)
@@ -154,7 +165,6 @@ DEF_FN_TYPE_1 (BT_FN_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_1 (BT_FN_V16QI_SCHAR, BT_V16QI, BT_SCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_UCHAR, BT_V16QI, BT_UCHAR)
 DEF_FN_TYPE_1 (BT_FN_V16QI_V16QI, BT_V16QI, BT_V16QI)
-DEF_FN_TYPE_1 (BT_FN_V1TI_V1TI, BT_V1TI, BT_V1TI)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBL, BT_V2DF, BT_DBL)
 DEF_FN_TYPE_1 (BT_FN_V2DF_DBLCONSTPTR, BT_V2DF, BT_DBLCONSTPTR)
 DEF_FN_TYPE_1 (BT_FN_V2DF_FLTCONSTPTR, BT_V2DF, BT_FLTCONSTPTR)
@@ -207,18 +217,18 @@ DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_OV4SI, BT_OV4SI, BT_OV4SI, BT_OV4SI)
 DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI_UCHAR, BT_OV4SI, BT_OV4SI, 

Re: [PATCH] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-02-29 Thread Stefan Schulze Frielinghaus
On Thu, Feb 29, 2024 at 01:26:54PM +0100, Andreas Schwab wrote:
> On Feb 29 2024, Stefan Schulze Frielinghaus wrote:
> 
> > RTX X must not necessarily be a SYMBOL_REF and may e.g. be an
> 
> False friend: s/must not/need not/

Argh I always fall for this ;-) Thanks for pointing this out.  Changed
for the final commit.

Cheers,
Stefan


[PATCH] s390: Fix test vector/long-double-to-i64.c

2024-02-29 Thread Stefan Schulze Frielinghaus
Starting with r14-8319-g86de9b66480b71, fwprop improved so that vpdi is
no longer required.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-to-i64.c: Fix scan
assembler directive.
---
 .../gcc.target/s390/vector/long-double-to-i64.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c 
b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
index 2dbbb5d1c03..ed89878e6ee 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
@@ -1,19 +1,24 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 /* { dg-do run { target { s390_z14_hw } } } */
+/* { dg-final { check-function-bodies "**" "" "" { target { lp64 } } } } */
+
 #include 
 #include 
 
+/*
+** long_double_to_i64:
+** ld  %f0,0\(%r2\)
+** ld  %f2,8\(%r2\)
+** cgxbr   %r2,5,%f0
+** br  %r14
+*/
 __attribute__ ((noipa)) static int64_t
 long_double_to_i64 (long double x)
 {
   return x;
 }
 
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
-
 int
 main (void)
 {
-- 
2.43.0



[PATCH] s390: Fix tests rosbg_si_srl and rxsbg_si_srl

2024-02-29 Thread Stefan Schulze Frielinghaus
Starting with r14-2047-gd0e891406b16dc, two SI mode tests are optimized
into DI mode.  Thus, the scan-assembler directives fail.  For example,
the RTL expression

(ior:SI (subreg:SI (lshiftrt:DI (reg:DI 69)
(const_int 2 [0x2])) 4)
(subreg:SI (reg:DI 68) 4))

is optimized into

(ior:DI (lshiftrt:DI (reg:DI 69)
(const_int 2 [0x2]))
(reg:DI 68))

Fixed by moving operands into memory in order to enforce SI mode
computation.

Furthermore, in r9-6056-g290dfd9bc7bea2 the starting bit position of the
scan-assembler directive for rosbg was incorrectly set to 32, whereas it
should be 32+SHIFT_AMOUNT, i.e., in this particular case 34.

gcc/testsuite/ChangeLog:

* gcc.target/s390/md/rXsbg_mode_sXl.c: Fix tests rosbg_si_srl
and rxsbg_si_srl.
---
 .../gcc.target/s390/md/rXsbg_mode_sXl.c| 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c 
b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
index ede813818ff..cf454d2783c 100644
--- a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
+++ b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
@@ -22,6 +22,8 @@
 { dg-skip-if "" { *-*-* } { "*" } { "-march=*" } }
 */
 
+unsigned int a, b;
+
 __attribute__ ((noinline)) unsigned int
 si_sll (unsigned int x)
 {
@@ -42,11 +44,11 @@ rosbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rosbg_si_srl (unsigned int a, unsigned int b)
+rosbg_si_srl (void)
 {
   return a | (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
 rxsbg_si_sll (unsigned int a, unsigned int b)
@@ -56,11 +58,11 @@ rxsbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rxsbg_si_srl (unsigned int a, unsigned int b)
+rxsbg_si_srl (void)
 {
   return a ^ (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned long long
 di_sll (unsigned long long x)
@@ -108,21 +110,21 @@ main (void)
   /* SIMode */
   {
 unsigned int r;
-unsigned int a = 0x12488421u;
-unsigned int b = 0xu;
+a = 0x12488421u;
+b = 0xu;
 unsigned int csll = si_sll (b);
 unsigned int csrl = si_srl (b);
 
 r = rosbg_si_sll (a, b);
 if (r != (a | csll))
   __builtin_abort ();
-r = rosbg_si_srl (a, b);
+r = rosbg_si_srl ();
 if (r != (a | csrl))
   __builtin_abort ();
 r = rxsbg_si_sll (a, b);
 if (r != (a ^ csll))
   __builtin_abort ();
-r = rxsbg_si_srl (a, b);
+r = rxsbg_si_srl ();
 if (r != (a ^ csrl))
   __builtin_abort ();
   }
-- 
2.43.0
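
As a rough illustration of the bit-position fix in the patch above, the
sketch below models rosbg as "rotate the second operand left, then OR the
bits in positions start..end into the first operand", with bit 0 being the
most significant of 64 bits.  This is a simplified model only (no condition
code, no wrap-around ranges), and all helper names are made up for the
example.  It suggests why the range for a | (b >> 2) in SI mode should
start at 34: starting at 32 would also pick up two bits from the upper half
of the source register.

```c
#include <stdint.h>

/* Mask selecting bits start..end, where bit 0 is the most significant
   bit of a 64-bit value.  Assumes start <= end <= 63.  */
static uint64_t
bit_range_mask (unsigned start, unsigned end)
{
  uint64_t from_start = ~UINT64_C (0) >> start;      /* bits start..63 */
  uint64_t up_to_end = ~UINT64_C (0) << (63 - end);  /* bits 0..end */
  return from_start & up_to_end;
}

/* Rotate a 64-bit value left by n bits.  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Simplified, hypothetical model of "rosbg r1,r2,start,end,rot".  */
uint64_t
rosbg_model (uint64_t r1, uint64_t r2,
             unsigned start, unsigned end, unsigned rot)
{
  return r1 | (rotl64 (r2, rot) & bit_range_mask (start, end));
}
```

With a source register whose upper half holds unrelated bits, the range
34..63 reproduces the 32-bit result, whereas 32..63 does not.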



[PATCH] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-02-29 Thread Stefan Schulze Frielinghaus
RTX X need not be a SYMBOL_REF and may, e.g., be an
UNSPEC_GOTENT for which SYMBOL_FLAG_NOTALIGN2_P fails.

gcc/ChangeLog:

* config/s390/s390.cc (s390_secondary_reload): Guard
SYMBOL_FLAG_NOTALIGN2_P.
---
 gcc/config/s390/s390.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 943fc9bfd72..12430d77786 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -4778,7 +4778,7 @@ s390_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass_i,
   if (in_p
  && s390_loadrelative_operand_p (x, &symref, &offset)
  && mode == Pmode
- && !SYMBOL_FLAG_NOTALIGN2_P (symref)
+ && (!SYMBOL_REF_P (symref) || !SYMBOL_FLAG_NOTALIGN2_P (symref))
  && (offset & 1) == 1)
sri->icode = ((mode == DImode) ? CODE_FOR_reloaddi_larl_odd_addend_z10
  : CODE_FOR_reloadsi_larl_odd_addend_z10);
-- 
2.43.0
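
The guard added by the patch above follows a common pattern: check the kind
of a node before using an accessor that is only valid for that kind.  A
minimal stand-alone sketch with mock types (none of these names exist in
GCC; the assert merely stands in for "the accessor fails on other kinds"):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RTL codes and symbol flags.  */
enum node_kind { NODE_SYMBOL, NODE_UNSPEC };

struct node
{
  enum node_kind kind;
  unsigned symbol_flags;   /* Only meaningful for NODE_SYMBOL.  */
};

#define FLAG_NOTALIGN2 1u

static bool
is_symbol (const struct node *n)
{
  return n->kind == NODE_SYMBOL;
}

/* Accessor that is only valid for symbols.  */
static bool
symbol_not_aligned2 (const struct node *n)
{
  assert (n->kind == NODE_SYMBOL);
  return (n->symbol_flags & FLAG_NOTALIGN2) != 0;
}

/* Mirrors the shape of the fixed condition: the flag is consulted only
   when the node really is a symbol.  */
bool
needs_odd_addend_reload (const struct node *n, long offset)
{
  return (!is_symbol (n) || !symbol_not_aligned2 (n))
         && (offset & 1) == 1;
}
```

Thanks to short-circuit evaluation, the accessor is never reached for a
non-symbol node.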



[Bug tree-optimization/113664] False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)

2024-01-30 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664

--- Comment #6 from Stefan Krah  ---
Sometimes you hear "code should be rewritten" because squashing the warnings
makes it better.

I disagree. I've seen many segfaults introduced in projects that rush
to squash warnings.

Sometimes, analyzers just cannot cope with established idioms. clang-analyzer
for instance hates Knuth's algorithm D (long division). It would be strange to
change that for an analyzer.

[Bug tree-optimization/113664] False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)

2024-01-30 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664

--- Comment #5 from Stefan Krah  ---
> So the diagnostic messages leave a lot to be desired but in the end
> they point to a problem in your code which is a guard against a NULL 's'.

Hmm, the real code is used to print floating point numbers and integers.
Integers get dot==NULL. It is fine (and desired!) in that case to optimize
away the if clause.

As far as I can see, it is compliant with the C standard.


Even with -fno-strict-overflow one could make the case that the warning
is strange. If "s" wraps around, the allocated output string is too small,
and you have bigger problems.

It is impossible for gcc to detect whether the string size is sufficient,
so IMHO it should not warn.


In essence, since gcc-10 (12?) idioms that were warning-free for 10 years
tend to receive false positive warnings now.

This also applies to -Warray-bounds. I think the Linux kernel disables at
least -Warray-bounds and -Wmaybe-uninitialized.

I think this is becoming a problem, because most projects do not report
false positives but just silently disable the warnings.

[Bug tree-optimization/113664] False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)

2024-01-29 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664

--- Comment #2 from Stefan Krah  ---
Thanks for the explanation!  I agree that one should not rely on
-fno-strict-overflow. In this case, my project is "vendored" in CPython and
they compile everything with -fno-strict-overflow, so it's out of my control:

https://github.com/python/cpython/issues/108562


mpdecimal itself does not need -fno-strict-overflow.

[Bug c/113664] New: False positive warnings with -fno-strict-overflow (-Warray-bounds, -Wstringop-overflow)

2024-01-29 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113664

Bug ID: 113664
   Summary: False positive warnings with -fno-strict-overflow
(-Warray-bounds, -Wstringop-overflow)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stefan at bytereef dot org
  Target Milestone: ---

These false positives only occur in combination with -fno-strict-overflow:


 -Warray-bounds


foo.c
=
#include <stddef.h>

static char *
f(char *s, int n, char *dot)
{
  switch(n) {
  case 1:
    if (s == dot) {
      *s++ = '.';
    }
    *s++ = '0'; /* fall-through (yes, really!) */
  default:
    if (s == dot) {
      *s++ = '.';
    }
  }

  *s = '\0';
  return s;
}

char *
g(char *s)
{
  return f(s, 1, NULL);
}
=


$ /home/skrah/gcc/bin/gcc -Wall -O3 -c foo.c
$ /home/skrah/gcc/bin/gcc -Wall -O3 -fno-strict-overflow -c foo.c
In function ‘f’,
inlined from ‘g’ at foo.c:25:10:
foo.c:11:10: warning: array subscript 0 is outside array bounds of ‘char[0]’
[-Warray-bounds=]
   11 | *s++ = '0'; /* fall-through (yes, really!) */
  | ~^
In function ‘g’:
cc1: note: source object is likely at address zero



=
 -Wstringop-overflow 
=

bar.c
=
#include <stddef.h>

static char *
f(char *s, int n, char *dot)
{
  switch(n) {
  case 1:
    if (s == dot) {
      *s++ = '.';
    }
    *s++ = '0'; /* fall-through (yes, really!) */
  default:
    if (s == dot) {
      *s++ = '.';
    }
  }

  *s = '\0';
  return s;
}

char *
g(char *s)
{
    char sign = '+';
    *s++ = sign;

    return f(s, 1, NULL);
}
=


$ /home/skrah/gcc/bin/gcc -Wall -O3 -c bar.c
$ /home/skrah/gcc/bin/gcc -Wall -O3 -fno-strict-overflow -c bar.c
In function ‘f’,
inlined from ‘g’ at bar.c:28:12:
bar.c:11:10: warning: writing 1 byte into a region of size 0
[-Wstringop-overflow=]
   11 | *s++ = '0'; /* fall-through (yes, really!) */
  | ~^
In function ‘g’:
cc1: note: destination object is likely at address zero




Note that a very small change gives a very different warning.

Re: [RFA] [V3] new pass for sign/zero extension elimination

2024-01-04 Thread Stefan Schulze Frielinghaus
I have successfully bootstrapped and regtested the patch on s390.  Out
of curiosity I also ran some benchmarks, which didn't show many changes
except in one case which I will have to analyze further.  If there is
anything interesting I will get back to you.

Cheers,
Stefan

On Mon, Jan 01, 2024 at 02:04:42PM -0700, Jeff Law wrote:
> I know we're deep into stage3 and about to transition to stage4.  So if the
> consensus is for this to wait, I'll understand.
> 
> This is the V3 of the ext-dce patch based on Joern's work from last year.
> 
> Changes since V2:
>   Handle MINUS
>   Minor logic cleanup for SUBREGs in ext_dce_process_sets
>   Includes Joern's carry_backpropagate work
>   Cleaned up and removed some use handling code for STRICT_LOW_PART
>   Moved non-local goto special case out of main use handling, similar to
>   how we handle CALL_INSN_FUSAGE
>   Use df_simple_dataflow rather than custom dataflow handling
> 
> There are more cleanups we could be doing here, but the question is: do we
> stop, commit what we've got, and iterate on the trunk, or do we defer until
> gcc-15, in which case we iterate on a branch or something?
> 
> 
> 
> This still is enabled at -O1 or above, but that's to get as much testing as
> possible.  Assuming the rest is ACK'd for the trunk we'll put it into the
> list of optimizations enabled by -O2.

>   PR target/95650
>   PR rtl-optimization/96031
>   PR rtl-optimization/104387
>   PR rtl-optimization/111384
> 
> gcc/
>   * Makefile.in (OBJS): Add ext-dce.o.
>   * common.opt (ext-dce): Add new option.
>   * df-scan.cc (df_get_exit_block_use_set): No longer static.
>   * df.h (df_get_exit_block_use_set): Prototype.
>   * ext-dce.cc: New file.
>   * passes.def: Add ext-dce before combine.
>   * tree-pass.h (make_pass_ext_dce): Prototype.
> 
> gcc/testsuite
>   * gcc.target/riscv/core_bench_list.c: New test.
>   * gcc.target/riscv/core_init_matrix.c: New test.
>   * gcc.target/riscv/core_list_init.c: New test.
>   * gcc.target/riscv/matrix_add_const.c: New test.
>   * gcc.target/riscv/mem-extend.c: New test.
>   * gcc.target/riscv/pr111384.c: New test.
> 
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 754eceb23bb..3450eb860c6 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1451,6 +1451,7 @@ OBJS = \
>   explow.o \
>   expmed.o \
>   expr.o \
> + ext-dce.o \
>   fibonacci_heap.o \
>   file-prefix-map.o \
>   final.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d263a959df3..8bbcaad2ec4 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3846,4 +3846,8 @@ fipa-ra
>  Common Var(flag_ipa_ra) Optimization
>  Use caller save register across calls if possible.
>  
> +fext-dce
> +Common Var(flag_ext_dce, 1) Optimization Init(0)
> +Perform dead code elimination on zero and sign extensions with special 
> dataflow analysis.
> +
>  ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/df-scan.cc b/gcc/df-scan.cc
> index 934c9ca2d81..93c0ba4e15c 100644
> --- a/gcc/df-scan.cc
> +++ b/gcc/df-scan.cc
> @@ -78,7 +78,6 @@ static void df_get_eh_block_artificial_uses (bitmap);
>  
>  static void df_record_entry_block_defs (bitmap);
>  static void df_record_exit_block_uses (bitmap);
> -static void df_get_exit_block_use_set (bitmap);
>  static void df_get_entry_block_def_set (bitmap);
>  static void df_grow_ref_info (struct df_ref_info *, unsigned int);
>  static void df_ref_chain_delete_du_chain (df_ref);
> @@ -3642,7 +3641,7 @@ df_epilogue_uses_p (unsigned int regno)
>  
>  /* Set the bit for regs that are considered being used at the exit. */
>  
> -static void
> +void
>  df_get_exit_block_use_set (bitmap exit_block_uses)
>  {
>unsigned int i;
> diff --git a/gcc/df.h b/gcc/df.h
> index 402657a7076..abcbb097734 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -1091,6 +1091,7 @@ extern bool df_epilogue_uses_p (unsigned int);
>  extern void df_set_regs_ever_live (unsigned int, bool);
>  extern void df_compute_regs_ever_live (bool);
>  extern void df_scan_verify (void);
> +extern void df_get_exit_block_use_set (bitmap);
>  
>  
>  
> /*
> diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
> new file mode 100644
> index 000..379264e0bca
> --- /dev/null
> +++ b/gcc/ext-dce.cc
> @@ -0,0 +1,964 @@
> +/* RTL dead zero/sign extension (code) elimination.
> +   Copyright (C) 2000-2022 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify

[Bug ipa/113203] __attribute__ ((always_inline)) fails with C99/LTO/-Og.

2024-01-03 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113203

--- Comment #4 from Stefan Krah  ---
> Or, if the intention is that all calls to the function within its TU
> are inlined and not the other ones, split the function into two, one
> always_inline which is used from within the TU and another one which
> just calls it and is used from other TUs.

Yes, that's the intention. The real project has more than 100 functions in
mpdecimal.c. I'm using C99 inline both to inline functions automatically
within that TU and to generate regular functions for the other TUs
and libmpdec.so.

C99 saves the work of creating the wrappers manually.


Do note that this issue started with gcc-12, same as in:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107931


So it's a behavior change. I agree that the combination of "-flto -Og" is not
particularly important. But is it guaranteed that the above C99 scheme will
always work with -O{1,2,3}? Or are there other loopholes that might show up
in the future?

I guess that in order to be safe I'll remove always_inline and use your wrapper
suggestion some time in the future.
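
The wrapper scheme quoted at the top of this comment could look roughly
like the sketch below.  The name f_inline is hypothetical, and this is only
one way to arrange the split: a static always_inline helper that is used
inside the TU, plus a plain external function for every other TU.

```c
#include <stdbool.h>

/* Used only inside this translation unit; always inlined here.  */
static inline __attribute__ ((always_inline)) bool
f_inline (int x)
{
  return x > 2;
}

/* Ordinary external definition for callers in other TUs.  It carries no
   always_inline attribute, so cross-TU callers never require inlining.  */
bool
f (int x)
{
  return f_inline (x);
}

/* A caller within the same TU uses the helper directly.  */
int
count_above_two (const int *values, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (f_inline (values[i]))
      count++;
  return count;
}
```

The cost compared with the C99 inline scheme is one hand-written wrapper
per exported function.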

[Bug c/113203] New: __attribute__ ((always_inline)) fails with C99/LTO/-Og.

2024-01-02 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113203

Bug ID: 113203
   Summary: __attribute__ ((always_inline)) fails with
C99/LTO/-Og.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: stefan at bytereef dot org
  Target Milestone: ---

This is similar to #107931. I'm opening a new issue because there are no
indirect function calls and the problem occurs with -std=c99 -flto -Og.


foo.c

#include <stdbool.h>
#include "foo.h"

inline __attribute__ ((always_inline)) bool
f(int x)
{
  return (x > 2);
}


foo.h

#include <stdbool.h>
bool f(int);


main.c

#include <stdio.h>
#include "foo.h"

int
main(int argc, char *argv[])
{
   (void)argv;

   if (f(argc)) {
 puts("yes");
   }
   else {
 puts("no");
   }

   return 0;
}




$ gcc -Wall -Wextra -std=c99 -flto -Og -o main foo.c main.c

foo.c: In function ‘main’:
foo.c:5:1: error: inlining failed in call to ‘always_inline’ ‘f’: function not
considered for inlining
5 | f(int x)
  | ^
main.c:9:8: note: called from here
9 |if (f(argc)) {
  |^
lto-wrapper: fatal error: /home/skrah/gcc/bin/gcc returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed




This is extracted from the mpdecimal project that has used C99 and
always_inline for a decade without problems. The code was written before the
amendment to the always_inline documentation in 2014 and always_inline has
consistently produced a speedup of 1-2.5% even with -O3.


My questions:

1) Since this is C99, should always_inline work without errors when -std=c99 is
active? If not, should -std=c99 reject always_inline?

2) There is a clear demand for something like "really_inline" that ignores the
heuristics and just inlines whenever possible without errors or warnings. In
practice that is how MSVC __forceinline or clang always_inline behave. Could
that be added?

[Bug middle-end/98753] -Wfree-nonheap-object on unreachable code with -O0

2024-01-02 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98753

--- Comment #16 from Stefan Krah  ---
I have encountered the same issue (gcc emits a false positive warning when
free() is called conditionally) in the mpdecimal project when compiled with
-flto.

Worse, mpdecimal itself as well as a large test suite compile without warnings,
so distributions will think everything is fine.

Until a user uses the static libmpdec.a for a trivial program:


wget https://www.bytereef.org/software/mpdecimal/releases/mpdecimal-2.5.1.tar.gz
tar xvf mpdecimal-2.5.1.tar.gz
cd mpdecimal-2.5.1
./configure CFLAGS="-flto=auto" CXXFLAGS="-flto=auto" LDFLAGS="-flto=auto" LDXXFLAGS="-flto=auto"
make

# The trivial program:
$ cd libmpdec
$ make bench
gcc -Wall -Wextra -Wno-unknown-pragmas -std=c99 -pedantic -DNDEBUG -O2
-flto=auto -o bench bench.c libmpdec.a -lm
In function ‘mpd_del’,
inlined from ‘_mpd_qaddsub.constprop’ at mpdecimal.c:3471:5:
mpdecimal.c:470:9: warning: attempt to free a non-heap object ‘big_aligned’
[-Wfree-nonheap-object]
 mpd_free(dec);


Here, the user will get a static library to which he may not even have the
source code readily available and gets a false positive warning for code in
that library.

Like others, I think this warning should be under a category -Wmaybe and be
less decisive in its message.

For the next mpdecimal release the Makefile will filter out -flto for the
static library build.
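
For readers unfamiliar with the idiom, here is a minimal sketch of a
"conditional free" of the kind described above.  It is not the actual
mpdecimal code; the type and function names are invented, and whether a
given gcc version warns on it is not claimed here.  The point is only that
the free() call is unreachable for objects flagged as static.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical number object that records how it was allocated.  */
typedef struct
{
  bool is_static;   /* true: lives on the stack or in static storage */
  int value;
} num_t;

/* Release the object only if it came from the heap.  */
void
num_del (num_t *n)
{
  if (!n->is_static)
    free (n);
}

/* Uses one stack object and one heap object through the same API.  */
int
use_both (void)
{
  num_t on_stack = { .is_static = true, .value = 1 };
  num_t *on_heap = malloc (sizeof *on_heap);
  if (on_heap == NULL)
    return -1;
  on_heap->is_static = false;
  on_heap->value = 2;

  int sum = on_stack.value + on_heap->value;
  num_del (&on_stack);   /* no-op: flagged as static */
  num_del (on_heap);     /* really frees */
  return sum;
}
```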

[Bug target/98390] AIX: exceptions in threads: IOT/Abort trap(coredump)

2024-01-02 Thread stefan at bytereef dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98390

--- Comment #1 from Stefan Krah  ---

The issue can still be reproduced with a gcc-14 snapshot. ibm-clang++ does not
have this problem. The LLVM unwinder has been reworked for AIX:

https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg275024.html


Would it be possible for gcc to use the LLVM libunwind on AIX? It would be
great if IBM unlocked some funding for this.

gcc often produces faster binaries than clang and this is literally the only
gcc issue on AIX that I've come across after many long running tests. It would
be worth fixing --- both AIX and gcc (except for this issue) are very stable.

[PATCH] s390: Fix expansion of vec_step

2023-12-04 Thread Stefan Schulze Frielinghaus
Add missing "s390" while expanding vec_step to __builtin_s390_vec_step.

gcc/ChangeLog:

* config/s390/vecintrin.h (vec_step): Expand vec_step to
__builtin_s390_vec_step.
---
 gcc/config/s390/vecintrin.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/vecintrin.h b/gcc/config/s390/vecintrin.h
index 133492c5b19..7cd1db57aec 100644
--- a/gcc/config/s390/vecintrin.h
+++ b/gcc/config/s390/vecintrin.h
@@ -59,8 +59,8 @@ along with GCC; see the file COPYING3.  If not see
| __VEC_CLASS_FP_INFINITY)
 
 /* This also accepts a type for its parameter, so it is not enough
-   to #define vec_step to __builtin_vec_step.  */
-#define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
+   to #define vec_step to __builtin_s390_vec_step.  */
+#define vec_step(x) __builtin_s390_vec_step (* (__typeof__ (x) *) 0)
 
 static inline int
 __lcbb(const void *ptr, int bndry)
-- 
2.43.0
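
The comment in the patch above notes that vec_step also accepts a type as
its argument.  The construct (* (__typeof__ (x) *) 0) is what makes that
work: it forms an lvalue expression of the right type from either a type
name or an expression.  A small sketch of the same trick using sizeof
instead of the s390 builtin (ELEMENT_SIZE is a made-up macro; __typeof__ is
a GNU extension, and the null pointer is never actually dereferenced
because the operand of sizeof is not evaluated here):

```c
#include <stddef.h>

/* Accepts either a type name or an expression, like vec_step does.  */
#define ELEMENT_SIZE(x) (sizeof (* (__typeof__ (x) *) 0))

size_t
size_from_type (void)
{
  return ELEMENT_SIZE (double);   /* argument is a type */
}

size_t
size_from_expr (void)
{
  float f = 0.0f;
  return ELEMENT_SIZE (f);        /* argument is an expression */
}
```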



[PATCH] s390: Add missing builtin type

2023-11-27 Thread Stefan Schulze Frielinghaus
One builtin type slipped through the cracks of the last commits.

Bootstrapped on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def (BT_FN_UV8HI_UV8HI_UINT):
Add missing builtin type.
---
 gcc/config/s390/s390-builtin-types.def | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 6d2a3f912b8..5057f342f0b 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -242,6 +242,7 @@ DEF_FN_TYPE_2 (BT_FN_UV8HI_UV16QI_UV16QI, BT_UV8HI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV4SI_UV4SI, BT_UV8HI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UCHAR, BT_UV8HI, BT_UV8HI, BT_UCHAR)
+DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UINT, BT_UV8HI, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_BV16QI_V16QI, BT_V16QI, BT_BV16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_UINT_VOIDCONSTPTR, BT_V16QI, BT_UINT, 
BT_VOIDCONSTPTR)
-- 
2.41.0



[PATCH] s390: Fixup builtins vec_rli and verll

2023-11-27 Thread Stefan Schulze Frielinghaus
Commit 248df13b966f46649e16dc3c8c92b263790ef503 restricted the rotate
count to immediates.  Although the documentation of vec_rli (Vector
Element Rotate Left Immediate) can be read as if it were restricted to
immediates, this is not the case.  Thus, revert this commit.

In order to finally allow register operands, the rotate count must be of
type unsigned char since the expander expects it to be of mode QI.  The
previously used type unsigned integer worked out for immediates since
those are of VOID mode anyway.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Remove types.
* config/s390/s390-builtins.def (O_U64): Remove 64-bit literal support.
Don't restrict s390_vec_rli and s390_verll[bhfg] to immediates.
* config/s390/s390.cc (s390_const_operand_ok): Remove 64-bit
literal support.
---
 gcc/config/s390/s390-builtin-types.def |  4 --
 gcc/config/s390/s390-builtins.def  | 60 +++---
 gcc/config/s390/s390.cc|  6 +--
 3 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 6799b883e29..6d2a3f912b8 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -216,7 +216,6 @@ DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_INT, BT_UV16QI, BT_UCHAR, 
BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR_UCHAR, BT_UV16QI, BT_UCHAR, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_INTPTR, BT_UV16QI, BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UCHAR, BT_UV16QI, BT_UV16QI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UINT, BT_UV16QI, BT_UV16QI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI_UV16QI, BT_UV16QI, BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV2DI_UV2DI, BT_UV16QI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV16QI_UV4SI_UV4SI, BT_UV16QI, BT_UV4SI, BT_UV4SI)
@@ -225,7 +224,6 @@ DEF_FN_TYPE_2 (BT_FN_UV2DI_UCHAR_UCHAR, BT_UV2DI, BT_UCHAR, 
BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONG_INT, BT_UV2DI, BT_ULONGLONG, BT_INT)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV16QI_UV16QI, BT_UV2DI, BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UCHAR, BT_UV2DI, BT_UV2DI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UINT, BT_UV2DI, BT_UV2DI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI_UV2DI, BT_UV2DI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV4SI_UV4SI, BT_UV2DI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV2DI_UV8HI_UV8HI, BT_UV2DI, BT_UV8HI, BT_UV8HI)
@@ -236,7 +234,6 @@ DEF_FN_TYPE_2 (BT_FN_UV4SI_UV16QI_UV16QI, BT_UV4SI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV2DI_UV2DI, BT_UV4SI, BT_UV2DI, BT_UV2DI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_INTPTR, BT_UV4SI, BT_UV4SI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UCHAR, BT_UV4SI, BT_UV4SI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UINT, BT_UV4SI, BT_UV4SI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI_UV4SI, BT_UV4SI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV4SI_UV8HI_UV8HI, BT_UV4SI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UCHAR_UCHAR, BT_UV8HI, BT_UCHAR, BT_UCHAR)
@@ -245,7 +242,6 @@ DEF_FN_TYPE_2 (BT_FN_UV8HI_UV16QI_UV16QI, BT_UV8HI, 
BT_UV16QI, BT_UV16QI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV4SI_UV4SI, BT_UV8HI, BT_UV4SI, BT_UV4SI)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_INTPTR)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UCHAR, BT_UV8HI, BT_UV8HI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UINT, BT_UV8HI, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_BV16QI_V16QI, BT_V16QI, BT_BV16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V16QI_UINT_VOIDCONSTPTR, BT_V16QI, BT_UINT, 
BT_VOIDCONSTPTR)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index f5540106adc..b09c303adc0 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -28,7 +28,6 @@
 #undef O_U12
 #undef O_U16
 #undef O_U32
-#undef O_U64
 
 #undef O_M12
 
@@ -89,11 +88,6 @@
 #undef O3_U32
 #undef O4_U32
 
-#undef O1_U64
-#undef O2_U64
-#undef O3_U64
-#undef O4_U64
-
 #undef O1_M12
 #undef O2_M12
 #undef O3_M12
@@ -163,21 +157,20 @@
 #define O_U12    7 /* unsigned 16 bit literal */
 #define O_U16    8 /* unsigned 16 bit literal */
 #define O_U32    9 /* unsigned 32 bit literal */
-#define O_U64   10 /* unsigned 64 bit literal */
 
-#define O_M12   11 /* matches bitmask of 12 */
+#define O_M12   10 /* matches bitmask of 12 */
 
-#define O_S2    12 /* signed  2 bit literal */
-#define O_S3    13 /* signed  3 bit literal */
-#define O_S4    14 /* signed  4 bit literal */
-#define O_S5    15 /* signed  5 bit literal */
-#define O_S8    16 /* signed  8 bit literal */
-#define O_S12   17 /* signed 12 bit literal */
-#define O_S16   18 /* signed 16 bit literal */
-#define O_S32   19 /* signed 32 bit literal */
+#define O_S2    11 /* signed  2 bit literal */
+#define O_S3    12 /* signed  3 bit literal 

Re: [PATCH] s390: Streamline NNPA builtins with their LLVM counterparts

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Thu, Nov 16, 2023 at 01:07:30PM +0100, Stefan Schulze Frielinghaus wrote:
> For the opaque NNP-data type prefer unsigned over signed integer types.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtin-types.def: Add/remove types.
>   * config/s390/s390-builtins.def
>   (s390_vclfnhs,s390_vclfnls,s390_vcrnfs,s390_vcfn,s390_vcnf):
>   Replace type V8HI with UV8HI.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/zvector/vec-nnpa-fp16-convert.c: Replace V8HI
>   types with UV8HI.
>   * gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c: Ditto.
>   * gcc.target/s390/zvector/vec_convert_from_fp16.c: Ditto.
>   * gcc.target/s390/zvector/vec_convert_to_fp16.c: Ditto.
>   * gcc.target/s390/zvector/vec_extend_to_fp32_hi.c: Ditto.
>   * gcc.target/s390/zvector/vec_extend_to_fp32_lo.c: Ditto.
>   * gcc.target/s390/zvector/vec_round_from_fp32.c: Ditto.
> ---
>  gcc/config/s390/s390-builtin-types.def |  5 ++---
>  gcc/config/s390/s390-builtins.def  | 10 +-
>  .../gcc.target/s390/zvector/vec-nnpa-fp16-convert.c|  6 +++---
>  .../gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c  |  2 +-
>  .../gcc.target/s390/zvector/vec_convert_from_fp16.c|  4 ++--
>  .../gcc.target/s390/zvector/vec_convert_to_fp16.c  |  4 ++--
>  .../gcc.target/s390/zvector/vec_extend_to_fp32_hi.c|  2 +-
>  .../gcc.target/s390/zvector/vec_extend_to_fp32_lo.c|  2 +-
>  .../gcc.target/s390/zvector/vec_round_from_fp32.c  |  2 +-
>  9 files changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtin-types.def 
> b/gcc/config/s390/s390-builtin-types.def
> index 3d8b30cdcc8..0bf759bd77a 100644
> --- a/gcc/config/s390/s390-builtin-types.def
> +++ b/gcc/config/s390/s390-builtin-types.def
> @@ -265,9 +265,9 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
> BT_V2DF)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
>  DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
> +DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
>  DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
> -DEF_FN_TYPE_2 (BT_FN_V4SF_V8HI_UINT, BT_V4SF, BT_V8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_INT_VOIDCONSTPTR, BT_V4SI, BT_INT, BT_VOIDCONSTPTR)
>  DEF_FN_TYPE_2 (BT_FN_V4SI_UV4SI_UV4SI, BT_V4SI, BT_UV4SI, BT_UV4SI)
> @@ -279,7 +279,6 @@ DEF_FN_TYPE_2 (BT_FN_V8HI_BV8HI_V8HI, BT_V8HI, BT_BV8HI, 
> BT_V8HI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_UV8HI_UV8HI, BT_V8HI, BT_UV8HI, BT_UV8HI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V16QI_V16QI, BT_V8HI, BT_V16QI, BT_V16QI)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V4SI_V4SI, BT_V8HI, BT_V4SI, BT_V4SI)
> -DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_UINT, BT_V8HI, BT_V8HI, BT_UINT)
>  DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
>  DEF_FN_TYPE_2 (BT_FN_VOID_UINT64PTR_UINT64, BT_VOID, BT_UINT64PTR, BT_UINT64)
>  DEF_FN_TYPE_2 (BT_FN_VOID_V2DF_FLTPTR, BT_VOID, BT_V2DF, BT_FLTPTR)
> @@ -317,6 +316,7 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
> BT_UV8HI, BT_USHORT, BT_I
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INT)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
> BT_UV8HI)
> +DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, 
> BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
> BT_UV16QI, BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
> BT_V16QI)
> @@ -347,7 +347,6 @@ DEF_FN_TYPE_3 (BT_FN_V4SI_V4SI_V4SI_V4SI, BT_V4SI, 
> BT_V4SI, BT_V4SI, BT_V4SI)
>  DEF_FN_TYPE_3 (BT_FN_V4SI_V8HI_V8HI_V4SI, BT_V4SI, BT_V8HI, BT_V8HI, BT_V4SI)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_UV8HI_UV8HI_INTPTR, BT_V8HI, BT_UV8HI, BT_UV8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V16QI_V16QI_V8HI, BT_V8HI, BT_V16QI, BT_V16QI, 
> BT_V8HI)
> -DEF_FN_TYPE_3 (BT_FN_V8HI_V4SF_V4SF_UINT, BT_V8HI, BT_V4SF, BT_V4SF, BT_UINT)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V4SI_V4SI_INTPTR, BT_V8HI, BT_V4SI, BT_V4SI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_INTPTR, BT_V8HI, BT_V8HI, BT_V8HI, 
> BT_INTPTR)
>  DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtin

Re: [PATCH] s390: Fix constraint for insn *cmphi_ccu

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Wed, Oct 25, 2023 at 11:27:33AM +0200, Stefan Schulze Frielinghaus wrote:
> Currently for an unsigned 16-bit comparison between memory and an
> immediate where the high bit is set, a clc is emitted.  This is because
> the constant is created for mode HI and therefore sign extended.  This
> means constraint D does not hold anymore.  Since the mode already
> restricts the immediate to 16 bit, it is enough to make use of
> constraint n and chop off the high bits in the output template.
> 
> Bootstrapped and regtested on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
>   use of constraint n instead of D and chop off high bits in the
>   output template.
> ---
>  gcc/config/s390/s390.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 3f29ba21442..777a20f8e77 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -1355,13 +1355,13 @@
>  (define_insn "*cmphi_ccu"
>[(set (reg CC_REGNUM)
>  (compare (match_operand:HI 0 "nonimmediate_operand" "d,d,Q,Q,BQ")
> - (match_operand:HI 1 "general_operand"  "Q,S,D,BQ,Q")))]
> + (match_operand:HI 1 "general_operand"  "Q,S,n,BQ,Q")))]
>"s390_match_ccmode (insn, CCUmode)
> && !register_operand (operands[1], HImode)"
>"@
> clm\t%0,3,%S1
> clmy\t%0,3,%S1
> -   clhhsi\t%0,%1
> +   clhhsi\t%0,%x1
> #
> #"
>[(set_attr "op_type" "RS,RSY,SIL,SS,SS")
> -- 
> 2.41.0
> 
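
For readers following the reasoning in the quoted description, here is a
minimal sketch in plain C of the sign-extension effect.  It assumes, as the
description suggests, that constraint D only accepts values in the unsigned
16-bit range; the function names are made up for illustration and are not
GCC internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 16-bit constant created for a 16-bit mode: the bit
   pattern is sign extended into a wide host integer.  */
int64_t
sign_extend_hi (uint16_t bits)
{
  return bits >= 0x8000 ? (int64_t) bits - 0x10000 : (int64_t) bits;
}

/* Assumed reading of constraint D: value must lie in [0, 65535].  */
bool
fits_unsigned_16 (int64_t value)
{
  return value >= 0 && value <= 0xffff;
}

/* Chopping off the high bits recovers the intended immediate.  */
uint16_t
low_halfword (int64_t value)
{
  return (uint16_t) (value & 0xffff);
}
```

With the bit pattern 0x8000 the sign-extended value is -32768, which fails
the range check, while masking with 0xffff gives back 0x8000.  Patterns
with the high bit clear pass the range check unchanged.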


Re: [PATCH] s390: Fix builtins floating-point convert to/from fixed

2023-11-27 Thread Stefan Schulze Frielinghaus
Ping.

On Tue, Nov 14, 2023 at 04:19:59PM +0100, Stefan Schulze Frielinghaus wrote:
> Remove flags for non-existing operands 2 and 3.
> 
> Bootstrapped on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtins.def
>   (s390_vcefb,s390_vcdgb,s390_vcelfb,s390_vcdlgb,s390_vcfeb,s390_vcgdb,
>   s390_vclfeb,s390_vclgdb): Remove flags for non-existing operands
>   2 and 3.
> ---
>  gcc/config/s390/s390-builtins.def | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index 964d86c74a0..5bcf0d16ba3 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -2840,10 +2840,10 @@ OB_DEF (s390_vec_double,
> s390_vec_double_s64,s390_vec_double_u64,
>  OB_DEF_VAR (s390_vec_double_s64,s390_vcdgb, 0,   
>0,  BT_OV_V2DF_V2DI)
>  OB_DEF_VAR (s390_vec_double_u64,s390_vcdlgb,0,   
>0,  BT_OV_V2DF_UV2DI)
>  
> -B_DEF  (s390_vcefb, floatv4siv4sf2, 0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_V4SI)
> -B_DEF  (s390_vcdgb, floatv2div2df2, 0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DI)
> -B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_UV4SI)
> -B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_UV2DI)
> +B_DEF  (s390_vcefb, floatv4siv4sf2, 0,   
>B_VXE2, 0,  BT_FN_V4SF_V4SI)
> +B_DEF  (s390_vcdgb, floatv2div2df2, 0,   
>B_VX,   0,  BT_FN_V2DF_V2DI)
> +B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0,   
>B_VXE2, 0,  BT_FN_V4SF_UV4SI)
> +B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0,   
>B_VX,   0,  BT_FN_V2DF_UV2DI)
>  
>  OB_DEF (s390_vec_signed,
> s390_vec_signed_flt,s390_vec_signed_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
>  OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, B_VXE2,  
>0,  BT_OV_V4SI_V4SF)
> @@ -2853,10 +2853,10 @@ OB_DEF (s390_vec_unsigned,  
> s390_vec_unsigned_flt,s390_vec_unsigned_
>  OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,B_VXE2,  
>0,  BT_OV_UV4SI_V4SF)
>  OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0,   
>0,  BT_OV_UV2DI_V2DF)
>  
> -B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SI_V4SF)
> -B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DI_V2DF)
> -B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,
>B_VXE2, O2_U4 | O3_U3,  BT_FN_UV4SI_V4SF)
> -B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,
>B_VX,   O2_U4 | O3_U3,  BT_FN_UV2DI_V2DF)
> +B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0,   
>B_VXE2, 0,  BT_FN_V4SI_V4SF)
> +B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0,   
>B_VX,   0,  BT_FN_V2DI_V2DF)
> +B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,
>B_VXE2, 0,  BT_FN_UV4SI_V4SF)
> +B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,
>B_VX,   0,  BT_FN_UV2DI_V2DF)
>  
>  B_DEF  (s390_vfisb, vec_fpintv4sf,  0,   
>B_VXE,  O2_U4 | O3_U3,  BT_FN_V4SF_V4SF_UCHAR_UCHAR)
>  B_DEF  (s390_vfidb, vec_fpintv2df,  0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DF_UCHAR_UCHAR)
> -- 
> 2.41.0
> 


[PATCH] s390: Streamline NNPA builtins with their LLVM counterparts

2023-11-16 Thread Stefan Schulze Frielinghaus
For the opaque NNP-data type prefer unsigned over signed integer types.

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Add/remove types.
* config/s390/s390-builtins.def
(s390_vclfnhs,s390_vclfnls,s390_vcrnfs,s390_vcfn,s390_vcnf):
Replace type V8HI with UV8HI.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-nnpa-fp16-convert.c: Replace V8HI
types with UV8HI.
* gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c: Ditto.
* gcc.target/s390/zvector/vec_convert_from_fp16.c: Ditto.
* gcc.target/s390/zvector/vec_convert_to_fp16.c: Ditto.
* gcc.target/s390/zvector/vec_extend_to_fp32_hi.c: Ditto.
* gcc.target/s390/zvector/vec_extend_to_fp32_lo.c: Ditto.
* gcc.target/s390/zvector/vec_round_from_fp32.c: Ditto.
---
 gcc/config/s390/s390-builtin-types.def |  5 ++---
 gcc/config/s390/s390-builtins.def  | 10 +-
 .../gcc.target/s390/zvector/vec-nnpa-fp16-convert.c|  6 +++---
 .../gcc.target/s390/zvector/vec-nnpa-fp32-convert-1.c  |  2 +-
 .../gcc.target/s390/zvector/vec_convert_from_fp16.c|  4 ++--
 .../gcc.target/s390/zvector/vec_convert_to_fp16.c  |  4 ++--
 .../gcc.target/s390/zvector/vec_extend_to_fp32_hi.c|  2 +-
 .../gcc.target/s390/zvector/vec_extend_to_fp32_lo.c|  2 +-
 .../gcc.target/s390/zvector/vec_round_from_fp32.c  |  2 +-
 9 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 3d8b30cdcc8..0bf759bd77a 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -265,9 +265,9 @@ DEF_FN_TYPE_2 (BT_FN_V2DI_V2DF_V2DF, BT_V2DI, BT_V2DF, 
BT_V2DF)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V2DI_V2DI, BT_V2DI, BT_V2DI, BT_V2DI)
 DEF_FN_TYPE_2 (BT_FN_V2DI_V4SI_V4SI, BT_V2DI, BT_V4SI, BT_V4SI)
 DEF_FN_TYPE_2 (BT_FN_V4SF_FLT_INT, BT_V4SF, BT_FLT, BT_INT)
+DEF_FN_TYPE_2 (BT_FN_V4SF_UV8HI_UINT, BT_V4SF, BT_UV8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_UCHAR, BT_V4SF, BT_V4SF, BT_UCHAR)
 DEF_FN_TYPE_2 (BT_FN_V4SF_V4SF_V4SF, BT_V4SF, BT_V4SF, BT_V4SF)
-DEF_FN_TYPE_2 (BT_FN_V4SF_V8HI_UINT, BT_V4SF, BT_V8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_V4SI_BV4SI_V4SI, BT_V4SI, BT_BV4SI, BT_V4SI)
 DEF_FN_TYPE_2 (BT_FN_V4SI_INT_VOIDCONSTPTR, BT_V4SI, BT_INT, BT_VOIDCONSTPTR)
 DEF_FN_TYPE_2 (BT_FN_V4SI_UV4SI_UV4SI, BT_V4SI, BT_UV4SI, BT_UV4SI)
@@ -279,7 +279,6 @@ DEF_FN_TYPE_2 (BT_FN_V8HI_BV8HI_V8HI, BT_V8HI, BT_BV8HI, 
BT_V8HI)
 DEF_FN_TYPE_2 (BT_FN_V8HI_UV8HI_UV8HI, BT_V8HI, BT_UV8HI, BT_UV8HI)
 DEF_FN_TYPE_2 (BT_FN_V8HI_V16QI_V16QI, BT_V8HI, BT_V16QI, BT_V16QI)
 DEF_FN_TYPE_2 (BT_FN_V8HI_V4SI_V4SI, BT_V8HI, BT_V4SI, BT_V4SI)
-DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_UINT, BT_V8HI, BT_V8HI, BT_UINT)
 DEF_FN_TYPE_2 (BT_FN_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
 DEF_FN_TYPE_2 (BT_FN_VOID_UINT64PTR_UINT64, BT_VOID, BT_UINT64PTR, BT_UINT64)
 DEF_FN_TYPE_2 (BT_FN_VOID_V2DF_FLTPTR, BT_VOID, BT_V2DF, BT_FLTPTR)
@@ -317,6 +316,7 @@ DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_USHORT_INT, BT_UV8HI, 
BT_UV8HI, BT_USHORT, BT_I
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INT, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_INT)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_INTPTR, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_UV8HI_UV8HI_UV8HI_UV8HI, BT_UV8HI, BT_UV8HI, BT_UV8HI, 
BT_UV8HI)
+DEF_FN_TYPE_3 (BT_FN_UV8HI_V4SF_V4SF_UINT, BT_UV8HI, BT_V4SF, BT_V4SF, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V16QI_UV16QI_UV16QI_INTPTR, BT_V16QI, BT_UV16QI, 
BT_UV16QI, BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_INTPTR, BT_V16QI, BT_V16QI, BT_V16QI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V16QI_V16QI_V16QI_V16QI, BT_V16QI, BT_V16QI, BT_V16QI, 
BT_V16QI)
@@ -347,7 +347,6 @@ DEF_FN_TYPE_3 (BT_FN_V4SI_V4SI_V4SI_V4SI, BT_V4SI, BT_V4SI, 
BT_V4SI, BT_V4SI)
 DEF_FN_TYPE_3 (BT_FN_V4SI_V8HI_V8HI_V4SI, BT_V4SI, BT_V8HI, BT_V8HI, BT_V4SI)
 DEF_FN_TYPE_3 (BT_FN_V8HI_UV8HI_UV8HI_INTPTR, BT_V8HI, BT_UV8HI, BT_UV8HI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V8HI_V16QI_V16QI_V8HI, BT_V8HI, BT_V16QI, BT_V16QI, 
BT_V8HI)
-DEF_FN_TYPE_3 (BT_FN_V8HI_V4SF_V4SF_UINT, BT_V8HI, BT_V4SF, BT_V4SF, BT_UINT)
 DEF_FN_TYPE_3 (BT_FN_V8HI_V4SI_V4SI_INTPTR, BT_V8HI, BT_V4SI, BT_V4SI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_INTPTR, BT_V8HI, BT_V8HI, BT_V8HI, 
BT_INTPTR)
 DEF_FN_TYPE_3 (BT_FN_V8HI_V8HI_V8HI_V8HI, BT_V8HI, BT_V8HI, BT_V8HI, BT_V8HI)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 964d86c74a0..f331eba100a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -3037,10 +3037,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,
0,
 
 /* arch 14 builtins */
 
-B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_V8HI_UINT)
-B_DEF  (s390_vclfnls,vclfnls_v8hi,  0, 
 B_NNPA, O2_U4,  

[PATCH] s390: Fix generation of s390-gen-builtins.h

2023-11-15 Thread Stefan Schulze Frielinghaus
By default the preprocessed output includes linemarkers.  This leads to
an error if -pedantic is used, as is the case e.g. during bootstrap:

s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension 
[-Werror]

Fixed by omitting linemarkers while generating s390-gen-builtins.h.

gcc/ChangeLog:

* config/s390/t-s390: Generate s390-gen-builtins.h without
linemarkers.
---
 gcc/config/s390/t-s390 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/t-s390 b/gcc/config/s390/t-s390
index 4ab9718f6e2..2e884c367de 100644
--- a/gcc/config/s390/t-s390
+++ b/gcc/config/s390/t-s390
@@ -33,4 +33,4 @@ s390-d.o: $(srcdir)/config/s390/s390-d.cc
$(POSTCOMPILE)
 
 s390-gen-builtins.h: $(srcdir)/config/s390/s390-builtins.h
-   $(COMPILER) -E $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
+   $(COMPILER) -E -P $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > 
$@
-- 
2.41.0



[PATCH] s390: Fix builtins floating-point convert to/from fixed

2023-11-14 Thread Stefan Schulze Frielinghaus
Remove flags for non-existing operands 2 and 3.

Bootstrapped on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtins.def
(s390_vcefb,s390_vcdgb,s390_vcelfb,s390_vcdlgb,s390_vcfeb,s390_vcgdb,
s390_vclfeb,s390_vclgdb): Remove flags for non-existing operands
2 and 3.
---
 gcc/config/s390/s390-builtins.def | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 964d86c74a0..5bcf0d16ba3 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -2840,10 +2840,10 @@ OB_DEF (s390_vec_double,
s390_vec_double_s64,s390_vec_double_u64,
 OB_DEF_VAR (s390_vec_double_s64,s390_vcdgb, 0, 
 0,  BT_OV_V2DF_V2DI)
 OB_DEF_VAR (s390_vec_double_u64,s390_vcdlgb,0, 
 0,  BT_OV_V2DF_UV2DI)
 
-B_DEF  (s390_vcefb, floatv4siv4sf2, 0, 
 B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_V4SI)
-B_DEF  (s390_vcdgb, floatv2div2df2, 0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DI)
-B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0, 
 B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SF_UV4SI)
-B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_UV2DI)
+B_DEF  (s390_vcefb, floatv4siv4sf2, 0, 
 B_VXE2, 0,  BT_FN_V4SF_V4SI)
+B_DEF  (s390_vcdgb, floatv2div2df2, 0, 
 B_VX,   0,  BT_FN_V2DF_V2DI)
+B_DEF  (s390_vcelfb,floatunsv4siv4sf2,  0, 
 B_VXE2, 0,  BT_FN_V4SF_UV4SI)
+B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0, 
 B_VX,   0,  BT_FN_V2DF_UV2DI)
 
 OB_DEF (s390_vec_signed,
s390_vec_signed_flt,s390_vec_signed_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, B_VXE2,
 0,  BT_OV_V4SI_V4SF)
@@ -2853,10 +2853,10 @@ OB_DEF (s390_vec_unsigned,  
s390_vec_unsigned_flt,s390_vec_unsigned_
 OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,B_VXE2,
 0,  BT_OV_UV4SI_V4SF)
 OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0, 
 0,  BT_OV_UV2DI_V2DF)
 
-B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0, 
 B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SI_V4SF)
-B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DI_V2DF)
-B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,  
 B_VXE2, O2_U4 | O3_U3,  BT_FN_UV4SI_V4SF)
-B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,  
 B_VX,   O2_U4 | O3_U3,  BT_FN_UV2DI_V2DF)
+B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0, 
 B_VXE2, 0,  BT_FN_V4SI_V4SF)
+B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0, 
 B_VX,   0,  BT_FN_V2DI_V2DF)
+B_DEF  (s390_vclfeb,fixuns_truncv4sfv4si2, 0,  
 B_VXE2, 0,  BT_FN_UV4SI_V4SF)
+B_DEF  (s390_vclgdb,fixuns_truncv2dfv2di2, 0,  
 B_VX,   0,  BT_FN_UV2DI_V2DF)
 
 B_DEF  (s390_vfisb, vec_fpintv4sf,  0, 
 B_VXE,  O2_U4 | O3_U3,  BT_FN_V4SF_V4SF_UCHAR_UCHAR)
 B_DEF  (s390_vfidb, vec_fpintv2df,  0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_V2DF_UCHAR_UCHAR)
-- 
2.41.0



[PATCH] s390: Fix vec_scatter_element for vectors of floats

2023-11-14 Thread Stefan Schulze Frielinghaus
The offset for vec_scatter_element of floats should be a vector of type
UV4SI instead of V4SF.  Note that this is an incompatible change.
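
To illustrate why an integer offset vector is the natural type here, a
scalar sketch of the operation follows.  It assumes the semantics are
"store element index of the data vector at the base pointer plus the byte
offset found in the same element of the offset vector"; the function name
and the array-based signature are hypothetical and not the builtin itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: store element `index` of `data` at
   `base` plus the byte offset held in element `index` of `offsets`.  */
void
scatter_element_f32 (const float data[4], const uint32_t offsets[4],
                     float *base, unsigned index)
{
  /* memcpy avoids assumptions about the alignment of the target.  */
  memcpy ((unsigned char *) base + offsets[index], &data[index],
          sizeof (float));
}
```

Under this model the offsets are only ever used as addresses, so a vector
of unsigned 32-bit integers fits while a vector of floats does not.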

Bootstrapped on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtin-types.def: Add/remove types.
* config/s390/s390-builtins.def (s390_vec_scatter_element_flt):
The type for the offset should be UV4SI instead of V4SF.
---
 gcc/config/s390/s390-builtin-types.def | 2 +-
 gcc/config/s390/s390-builtins.def  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 3d8b30cdcc8..22ee348dbbb 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -856,7 +856,7 @@ DEF_OV_TYPE (BT_OV_VOID_V2DI_LONG_LONGLONGPTR, BT_VOID, 
BT_V2DI, BT_LONG, BT_LON
 DEF_OV_TYPE (BT_OV_VOID_V2DI_UV2DI_LONGLONGPTR_ULONGLONG, BT_VOID, BT_V2DI, 
BT_UV2DI, BT_LONGLONGPTR, BT_ULONGLONG)
 DEF_OV_TYPE (BT_OV_VOID_V4SF_FLTPTR_UINT, BT_VOID, BT_V4SF, BT_FLTPTR, BT_UINT)
 DEF_OV_TYPE (BT_OV_VOID_V4SF_LONG_FLTPTR, BT_VOID, BT_V4SF, BT_LONG, BT_FLTPTR)
-DEF_OV_TYPE (BT_OV_VOID_V4SF_V4SF_FLTPTR_ULONGLONG, BT_VOID, BT_V4SF, BT_V4SF, 
BT_FLTPTR, BT_ULONGLONG)
+DEF_OV_TYPE (BT_OV_VOID_V4SF_UV4SI_FLTPTR_ULONGLONG, BT_VOID, BT_V4SF, 
BT_UV4SI, BT_FLTPTR, BT_ULONGLONG)
 DEF_OV_TYPE (BT_OV_VOID_V4SI_INTPTR_UINT, BT_VOID, BT_V4SI, BT_INTPTR, BT_UINT)
 DEF_OV_TYPE (BT_OV_VOID_V4SI_LONG_INTPTR, BT_VOID, BT_V4SI, BT_LONG, BT_INTPTR)
 DEF_OV_TYPE (BT_OV_VOID_V4SI_UV4SI_INTPTR_ULONGLONG, BT_VOID, BT_V4SI, 
BT_UV4SI, BT_INTPTR, BT_ULONGLONG)
diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 964d86c74a0..b59fa09fe07 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -708,7 +708,7 @@ OB_DEF_VAR (s390_vec_scatter_element_u32,s390_vscef,
0,
 OB_DEF_VAR (s390_vec_scatter_element_s64,s390_vsceg,0, 
 O4_U1,  BT_OV_VOID_V2DI_UV2DI_LONGLONGPTR_ULONGLONG)
 OB_DEF_VAR (s390_vec_scatter_element_b64,s390_vsceg,0, 
 O4_U1,  BT_OV_VOID_BV2DI_UV2DI_ULONGLONGPTR_ULONGLONG)
 OB_DEF_VAR (s390_vec_scatter_element_u64,s390_vsceg,0, 
 O4_U1,  BT_OV_VOID_UV2DI_UV2DI_ULONGLONGPTR_ULONGLONG)
-OB_DEF_VAR (s390_vec_scatter_element_flt,s390_vscef,B_VXE, 
 O4_U2,  BT_OV_VOID_V4SF_V4SF_FLTPTR_ULONGLONG)
+OB_DEF_VAR (s390_vec_scatter_element_flt,s390_vscef,B_VXE, 
 O4_U2,  BT_OV_VOID_V4SF_UV4SI_FLTPTR_ULONGLONG)
 OB_DEF_VAR (s390_vec_scatter_element_dbl,s390_vsceg,0, 
 O4_U1,  BT_OV_VOID_V2DF_UV2DI_DBLPTR_ULONGLONG)
 
 B_DEF  (s390_vscef, vec_scatter_elementv4si,0, 
 B_VX,   O4_U2,  
BT_FN_VOID_UV4SI_UV4SI_UINTPTR_ULONGLONG)
-- 
2.41.0



[PATCH 2/3] s390: Add expand_perm_reverse_elements

2023-11-09 Thread Stefan Schulze Frielinghaus
Replace expand_perm_with_rot, expand_perm_with_vster, and
expand_perm_with_vstbrq with a general implementation
expand_perm_reverse_elements.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_rot): Remove.
(expand_perm_reverse_elements): New.
(expand_perm_with_vster): Remove.
(expand_perm_with_vstbrq): Remove.
(vectorize_vec_perm_const_1): Replace removed functions with new
one.
---
 gcc/config/s390/s390.cc | 88 -
 1 file changed, 16 insertions(+), 72 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 185eb59f8b8..e36efec8ddc 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17693,78 +17693,28 @@ is_reverse_perm_mask (const struct expand_vec_perm_d 
)
   return true;
 }
 
-/* The case of reversing a four-element vector [0, 1, 2, 3]
-   can be handled by first permuting the doublewords
-   [2, 3, 0, 1] and subsequently rotating them by 32 bits.  */
 static bool
-expand_perm_with_rot (const struct expand_vec_perm_d )
+expand_perm_reverse_elements (const struct expand_vec_perm_d )
 {
-  if (d.nelt != 4)
+  if (d.op0 != d.op1 || !is_reverse_perm_mask (d))
 return false;
 
-  if (d.op0 == d.op1 && is_reverse_perm_mask (d))
-{
-  if (d.testing_p)
-   return true;
-
-  rtx tmp = gen_reg_rtx (d.vmode);
-  rtx op0_reg = force_reg (GET_MODE (d.op0), d.op0);
-
-  emit_insn (gen_vpdi4_2 (d.vmode, tmp, op0_reg, op0_reg));
-  if (d.vmode == V4SImode)
-   emit_insn (gen_rotlv4si3_di (d.target, tmp));
-  else if (d.vmode == V4SFmode)
-   emit_insn (gen_rotlv4sf3_di (d.target, tmp));
-
-  return true;
-}
-
-  return false;
-}
+  if (d.testing_p)
+return true;
 
-/* If we just reverse the elements, emit an eltswap if we have
-   vler/vster.  */
-static bool
-expand_perm_with_vster (const struct expand_vec_perm_d )
-{
-  if (TARGET_VXE2 && d.op0 == d.op1 && is_reverse_perm_mask (d)
-  && (d.vmode == V2DImode || d.vmode == V2DFmode
- || d.vmode == V4SImode || d.vmode == V4SFmode
- || d.vmode == V8HImode))
+  switch (d.vmode)
 {
-  if (d.testing_p)
-   return true;
-
-  if (d.vmode == V2DImode)
-   emit_insn (gen_eltswapv2di (d.target, d.op0));
-  else if (d.vmode == V2DFmode)
-   emit_insn (gen_eltswapv2df (d.target, d.op0));
-  else if (d.vmode == V4SImode)
-   emit_insn (gen_eltswapv4si (d.target, d.op0));
-  else if (d.vmode == V4SFmode)
-   emit_insn (gen_eltswapv4sf (d.target, d.op0));
-  else if (d.vmode == V8HImode)
-   emit_insn (gen_eltswapv8hi (d.target, d.op0));
-  return true;
+case V1TImode: emit_move_insn (d.target, d.op0); break;
+case V2DImode: emit_insn (gen_eltswapv2di (d.target, d.op0)); break;
+case V4SImode: emit_insn (gen_eltswapv4si (d.target, d.op0)); break;
+case V8HImode: emit_insn (gen_eltswapv8hi (d.target, d.op0)); break;
+case V16QImode: emit_insn (gen_eltswapv16qi (d.target, d.op0)); break;
+case V2DFmode: emit_insn (gen_eltswapv2df (d.target, d.op0)); break;
+case V4SFmode: emit_insn (gen_eltswapv4sf (d.target, d.op0)); break;
+default: gcc_unreachable();
 }
-  return false;
-}
 
-/* If we reverse a byte-vector this is the same as
-   byte reversing it which can be done with vstbrq.  */
-static bool
-expand_perm_with_vstbrq (const struct expand_vec_perm_d )
-{
-  if (TARGET_VXE2 && d.op0 == d.op1 && is_reverse_perm_mask (d)
-  && d.vmode == V16QImode)
-{
-  if (d.testing_p)
-   return true;
-
-  emit_insn (gen_eltswapv16qi (d.target, d.op0));
-  return true;
-}
-  return false;
+  return true;
 }
 
 /* Try to emit vlbr/vstbr.  Note, this is only a candidate insn since
@@ -17826,21 +17776,15 @@ expand_perm_as_a_vlbr_vstbr_candidate (const struct 
expand_vec_perm_d )
 static bool
 vectorize_vec_perm_const_1 (const struct expand_vec_perm_d )
 {
-  if (expand_perm_with_merge (d))
-return true;
-
-  if (expand_perm_with_vster (d))
+  if (expand_perm_reverse_elements (d))
 return true;
 
-  if (expand_perm_with_vstbrq (d))
+  if (expand_perm_with_merge (d))
 return true;
 
   if (expand_perm_with_vpdi (d))
 return true;
 
-  if (expand_perm_with_rot (d))
-return true;
-
   if (expand_perm_as_a_vlbr_vstbr_candidate (d))
 return true;
 
-- 
2.41.0
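
The comment removed together with expand_perm_with_rot describes reversing
a four-element vector by first permuting the doublewords and then rotating
them by 32 bits.  A minimal element-level sketch of that idea in plain C
is given below.  The helper names are made up for illustration, and the
mask test is a simplified stand-in, not the actual is_reverse_perm_mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a reverse-mask test: element i must
   select element nelt - 1 - i of the single input vector.  */
bool
is_reverse_mask (const unsigned char *perm, unsigned nelt)
{
  for (unsigned i = 0; i < nelt; ++i)
    if (perm[i] != nelt - 1 - i)
      return false;
  return true;
}

/* Step 1: exchange the two doublewords, [0,1,2,3] -> [2,3,0,1].  */
void
swap_doublewords (uint32_t v[4])
{
  for (unsigned i = 0; i < 2; ++i)
    {
      uint32_t tmp = v[i];
      v[i] = v[i + 2];
      v[i + 2] = tmp;
    }
}

/* Step 2: rotating a doubleword by 32 bits exchanges its two words,
   [2,3,0,1] -> [3,2,1,0].  */
void
rotate_doublewords_by_32 (uint32_t v[4])
{
  for (unsigned i = 0; i < 4; i += 2)
    {
      uint32_t tmp = v[i];
      v[i] = v[i + 1];
      v[i + 1] = tmp;
    }
}
```

Applying both steps to [0, 1, 2, 3] gives [3, 2, 1, 0], which is exactly
the permutation accepted by the mask test.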



[PATCH] s390: Reduce number of patterns where the condition is false anyway

2023-11-09 Thread Stefan Schulze Frielinghaus
For patterns which make use of two modes, do not build the cross product
and then exclude illegal combinations via conditions but rather do not
create those in the first place.  Here we are following the idea of the
attribute TOINTVEC/tointvec and introduce TOINT/toint.
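
As a rough illustration of the difference, the following C sketch counts
the combinations each scheme produces.  It models only the DF/SF and DI/SI
modes with their sizes in bytes; the table layout and function names are
assumptions made for this example and have nothing to do with how the
generator programs work internally.

```c
#include <stddef.h>
#include <string.h>

struct mode { const char *name; unsigned size; };

const struct mode fp_modes[] = { { "DF", 8 }, { "SF", 4 } };
const struct mode int_modes[] = { { "DI", 8 }, { "SI", 4 } };

/* Old scheme: every float mode is paired with every integer mode.  */
unsigned
count_all_pairs (void)
{
  return (sizeof fp_modes / sizeof fp_modes[0])
         * (sizeof int_modes / sizeof int_modes[0]);
}

/* Only the pairs whose sizes agree survive the insn condition.  */
unsigned
count_matching_pairs (void)
{
  unsigned count = 0;
  for (size_t i = 0; i < sizeof fp_modes / sizeof fp_modes[0]; ++i)
    for (size_t j = 0; j < sizeof int_modes / sizeof int_modes[0]; ++j)
      if (fp_modes[i].size == int_modes[j].size)
        ++count;
  return count;
}

/* New scheme: map each float mode directly to its integer mode.  */
const char *
toint (const char *fp_name)
{
  if (strcmp (fp_name, "DF") == 0)
    return "DI";
  if (strcmp (fp_name, "SF") == 0)
    return "SI";
  return NULL;
}
```

In this model the cross product yields four pairs of which two can ever
match, whereas the mapping produces just those two.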

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.md (VX_CONV_INT): Remove iterator.
(gf): Add float mappings.
(TOINT, toint): New attribute.
(*fixuns_trunc2_z13):
Remove.
(*fixuns_trunc2_z13): Add.
(*fix_trunc2_bfp_z13):
Remove.
(*fix_trunc2_bfp_z13): Add.
(*floatuns2_z13): Remove.
(*floatuns2_z13): Add.
* config/s390/vector.md (VX_VEC_CONV_INT): Remove iterator.
(float2): Remove.
(float2): Add.
(floatuns2): Remove.
(floatuns2): Add.
(fix_trunc2):
Remove.
(fix_trunc2): Add.
(fixuns_trunc2):
Remove.
(fixuns_trunc2): Add.
---
 gcc/config/s390/s390.md   | 52 +++
 gcc/config/s390/vector.md | 45 +++--
 2 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..0ea2aaf7627 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -668,7 +668,6 @@
 
 ; 32 bit int<->fp conversion instructions are available since VXE2 (z15).
 (define_mode_iterator VX_CONV_BFP [DF (SF "TARGET_VXE2")])
-(define_mode_iterator VX_CONV_INT [DI (SI "TARGET_VXE2")])
 
 ;; These mode iterators allow 31-bit and 64-bit GPR patterns to be generated
 ;; from the same template.
@@ -838,7 +837,7 @@
 
 ;; In GPR templates, a string like "cdbr" will expand to "cgdbr" in DImode
 ;; and "cfdbr" in SImode.
-(define_mode_attr gf [(DI "g") (SI "f")])
+(define_mode_attr gf [(DI "g") (SI "f") (DF "g") (SF "f")])
 
 ;; In GPR templates, a string like sll will expand to sllg for DI
 ;; and sllk for SI.  This way it is possible to merge the new z196 SI
@@ -897,6 +896,10 @@
 (define_mode_attr asm_fcmp [(CCVEQ "e") (CCVFH "h") (CCVFHE "he")])
 (define_mode_attr insn_cmp [(CCVEQ "eq") (CCVIH "h") (CCVIHU "hl") (CCVFH "h") 
(CCVFHE "he")])
 
+; Analogue to TOINTVEC / tointvec
+(define_mode_attr TOINT [(TF "TI") (DF "DI") (SF "SI")])
+(define_mode_attr toint [(TF "ti") (DF "di") (SF "si")])
+
 ;; Subst pattern definitions
 (include "subst.md")
 
@@ -5266,16 +5269,15 @@
 
 ; df -> unsigned di, vxe2: sf -> unsigned si
 ; clgdbr, clfebr, wclgdb, wclfeb
-(define_insn "*fixuns_trunc2_z13"
-  [(set (match_operand:VX_CONV_INT   0 
"register_operand" "=d,v")
-   (unsigned_fix:VX_CONV_INT (match_operand:VX_CONV_BFP 1 
"register_operand"  "f,v")))
-   (unspec:DI [(match_operand:DI 2 
"immediate_operand" "K,K")] UNSPEC_ROUND)
+(define_insn "*fixuns_trunc2_z13"
+  [(set (match_operand:   0 "register_operand" 
"=d,v")
+   (unsigned_fix: (match_operand:VX_CONV_BFP 1 "register_operand"  
"f,v")))
+   (unspec:DI [(match_operand:DI 2 "immediate_operand" 
"K,K")] UNSPEC_ROUND)
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_VX && TARGET_HARD_FLOAT
-   && GET_MODE_SIZE (mode) == GET_MODE_SIZE 
(mode)"
+  "TARGET_VX && TARGET_HARD_FLOAT"
   "@
-   clbr\t%0,%h2,%1,0
-   wclb\t%v0,%v1,0,%h2"
+   clbr\t%0,%h2,%1,0
+   wclb\t%v0,%v1,0,%h2"
   [(set_attr "op_type" "RRF,VRR")
(set_attr "type""ftoi")])
 
@@ -5305,16 +5307,15 @@
 
 ; df -> signed di, vxe2: sf -> signed si
 ; cgdbr, cfebr, wcgdb, wcfeb
-(define_insn "*fix_trunc2_bfp_z13"
-  [(set (match_operand:VX_CONV_INT  0 "register_operand" 
"=d,v")
-(fix:VX_CONV_INT (match_operand:VX_CONV_BFP 1 "register_operand"  
"f,v")))
-   (unspec:VX_CONV_INT [(match_operand:VX_CONV_INT  2 "immediate_operand" 
"K,K")] UNSPEC_ROUND)
+(define_insn "*fix_trunc2_bfp_z13"
+  [(set (match_operand:  0 "register_operand" "=d,v")
+(fix: (match_operand:VX_CONV_BFP 1 "register_operand"  "f,v")))
+   (unspec: [(match_operand:  2 "immediate_operand" "K,K")] 
UNSPEC_ROUND)
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_VX && TARGET_HARD_FLOAT
-   && GET_MODE_SIZE (mode) == GET_MODE_SIZE 
(mode)"
+  "TARGET_VX && TARGET_HARD_FLOAT"
   "@
-   cbr\t%0,%h2,%1
-   wcb\t%v0,%v1,0,%h2"
+   cbr\t%0,%h2,%1
+   wcb\t%v0,%v1,0,%h2"
   [(set_attr "op_type" "RRE,VRR")
(set_attr "type""ftoi")])
 
@@ -5420,14 +5421,13 @@
 ; floatuns(si|di)(tf|df|sf|td|dd)2 instruction pattern(s).
 ;
 
-(define_insn "*floatuns2_z13"
-  [(set (match_operand:VX_CONV_BFP 0 
"register_operand" "=f,v")
-(unsigned_float:VX_CONV_BFP (match_operand:VX_CONV_INT 1 
"register_operand"  "d,v")))]
-  "TARGET_VX && TARGET_HARD_FLOAT
-   && GET_MODE_SIZE (mode) == GET_MODE_SIZE 
(mode)"
+(define_insn "*floatuns2_z13"
+  [(set (match_operand:VX_CONV_BFP 

[PATCH 3/3] s390: Revise vector reverse elements

2023-11-09 Thread Stefan Schulze Frielinghaus
Replace UNSPEC_VEC_ELTSWAP with a vec_select implementation.

Furthermore, for a vector reverse elements operation between registers
of mode V8HI perform three rotates instead of a vperm operation since
the latter involves loading the permutation vector from the literal
pool.

Prior to z15, instead of
  larl + vl + vl + vperm
prefer
  vl + vpdi (+ verllg (+ verllf))
for a load operation.

Likewise, prior to z15, instead of
  larl + vl + vperm + vst
prefer
  vpdi (+ verllg (+ verllf)) + vst
for a store operation.
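
A small element-level model in plain C may help to see why the three
steps reverse a V8HI vector.  It assumes that vpdi is used to exchange the
two doublewords, that verllg rotates each doubleword by 32 bits and that
verllf rotates each word by 16 bits.  The helper names are hypothetical.

```c
#include <stdint.h>

#define NELT 8

/* Exchange the two halves of every aligned group of `group`
   halfwords.  */
void
swap_halves_in_groups (uint16_t v[NELT], unsigned group)
{
  unsigned half = group / 2;
  for (unsigned start = 0; start < NELT; start += group)
    for (unsigned i = 0; i < half; ++i)
      {
        uint16_t tmp = v[start + i];
        v[start + i] = v[start + half + i];
        v[start + half + i] = tmp;
      }
}

/* Reverse eight halfwords in three steps.  */
void
reverse_v8hi (uint16_t v[NELT])
{
  swap_halves_in_groups (v, 8); /* like vpdi: swap the doublewords */
  swap_halves_in_groups (v, 4); /* like verllg: rotate by 32 bits */
  swap_halves_in_groups (v, 2); /* like verllf: rotate by 16 bits */
}
```

Starting from [0, ..., 7] the intermediate results are
[4, 5, 6, 7, 0, 1, 2, 3] and [6, 7, 4, 5, 2, 3, 0, 1] before the final
[7, 6, 5, 4, 3, 2, 1, 0].  Read this way, the nested parentheses above
would mean that two doublewords need only the first step and four words
only the first two.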

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.md: Remove UNSPEC_VEC_ELTSWAP.
* config/s390/vector.md (eltswapv16qi): New expander.
(*eltswapv16qi): New insn and splitter.
(eltswapv8hi): New insn and splitter.
(eltswap): New insn and splitter for modes V_HW_4 as well
as V_HW_2.
* config/s390/vx-builtins.md (eltswap): Remove.
(*eltswapv16qi): Remove.
(*eltswap): Remove.
(*eltswap_emu): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-reve-load-halfword-z14.c: Remove
vperm and substitute vpdi et al.
* gcc.target/s390/zvector/vec-reve-load-halfword.c: Likewise.
* gcc.target/s390/vector/reverse-elements-1.c: New test.
* gcc.target/s390/vector/reverse-elements-2.c: New test.
* gcc.target/s390/vector/reverse-elements-3.c: New test.
* gcc.target/s390/vector/reverse-elements-4.c: New test.
* gcc.target/s390/vector/reverse-elements-5.c: New test.
* gcc.target/s390/vector/reverse-elements-6.c: New test.
* gcc.target/s390/vector/reverse-elements-7.c: New test.
---
 gcc/config/s390/s390.md   |   2 -
 gcc/config/s390/vector.md | 146 ++
 gcc/config/s390/vx-builtins.md| 143 -
 .../s390/vector/reverse-elements-1.c  |  46 ++
 .../s390/vector/reverse-elements-2.c  |  16 ++
 .../s390/vector/reverse-elements-3.c  |  56 +++
 .../s390/vector/reverse-elements-4.c  |  67 
 .../s390/vector/reverse-elements-5.c  |  56 +++
 .../s390/vector/reverse-elements-6.c  |  67 
 .../s390/vector/reverse-elements-7.c  |  67 
 .../s390/zvector/vec-reve-load-halfword-z14.c |   4 +-
 .../s390/zvector/vec-reve-load-halfword.c |   4 +-
 12 files changed, 527 insertions(+), 147 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..f5e559c1ba4 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,8 +241,6 @@
UNSPEC_VEC_VFMIN
UNSPEC_VEC_VFMAX
 
-   UNSPEC_VEC_ELTSWAP
-
UNSPEC_NNPA_VCLFNHS_V8HI
UNSPEC_NNPA_VCLFNLS_V8HI
UNSPEC_NNPA_VCRNFS_V8HI
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 7d1eb36e844..c478fce09df 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -948,6 +948,152 @@
   operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
 })
 
+;; VECTOR REVERSE ELEMENTS V16QI
+
+(define_expand "eltswapv16qi"
+  [(parallel
+[(set (match_operand:V16QI  0 "nonimmediate_operand")
+ (vec_select:V16QI
+  (match_operand:V16QI 1 "nonimmediate_operand")
+  (match_dup 2)))
+ (use (match_dup 3))])]
+  "TARGET_VX"
+{
+  rtvec vec = rtvec_alloc (16);
+  for (int i = 0; i < 16; ++i)
+RTVEC_ELT (vec, i) = GEN_INT (15 - i);
+  operands[2] = gen_rtx_PARALLEL (VOIDmode, vec);
+  operands[3] = gen_rtx_CONST_VECTOR (V16QImode, vec);
+})
+
+(define_insn_and_split "*eltswapv16qi"
+  [(set (match_operand:V16QI  0 "nonimmediate_operand" "=v,^R,^v")
+   (vec_select:V16QI
+(match_operand:V16QI 1 "nonimmediate_operand"  "v,^v,^R")
+(parallel [(const_int 15)
+   (const_int 14)
+   (const_int 13)
+   (const_int 12)
+   (const_int 11)
+   (const_int 10)
+   (const_int 9)
+   (const_int 8)
+   (const_int 7)
+   (const_int 6)
+   (const_int 5)
+   (const_int 4)
+   (const_int 3)
+   (const_int 2)
+   (const_int 1)
+   (const_int 0)])))
+   (use (match_operand:V16QI 2 "permute_pattern_operand" "v,X,X"))]
+  

[PATCH 1/3] s390: Recognize further vpdi and vmr{l,h} pattern

2023-11-09 Thread Stefan Schulze Frielinghaus
Deal with cases where vpdi and vmr{l,h} are still applicable if the
operands of those instructions are swapped.  For example, currently for

V2DI foo (V2DI x)
{
  return (V2DI) {x[1], x[0]};
}

the assembler sequence

vlgvg   %r1,%v24,1
vzero   %v0
vlvgg   %v0,%r1,0
vmrhg   %v24,%v0,%v24

is emitted.  With this patch a single vpdi is emitted.
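
To make the single-instruction claim concrete, here is a scalar model of
a doubleword permute in plain C.  It assumes the instruction builds its
result from one doubleword of each source operand; the two explicit
selector parameters are a simplification for this sketch and do not
reflect the actual immediate encoding.

```c
#include <stdint.h>

/* Hypothetical model: result[0] comes from `a`, result[1] from `b`;
   sel_a and sel_b choose which doubleword (0 or 1) is taken.  */
void
permute_doublewords (uint64_t result[2], const uint64_t a[2],
                     const uint64_t b[2], unsigned sel_a, unsigned sel_b)
{
  /* Read both inputs first so that result may alias a or b.  */
  uint64_t first = a[sel_a & 1];
  uint64_t second = b[sel_b & 1];
  result[0] = first;
  result[1] = second;
}

/* Model of foo from above: both sources are x, taking x[1] then x[0].  */
void
foo_model (uint64_t result[2], const uint64_t x[2])
{
  permute_doublewords (result, x, x, 1, 0);
}
```

With x = {10, 20} the model returns {20, 10}, i.e. the swap expressed by
foo, without going through a general register.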

Extensive tests are included in a subsequent patch of this series where
more cases are covered.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_merge): Deal with cases
where vmr{l,h} are still applicable if the operands are swapped.
(expand_perm_with_vpdi): Likewise for vpdi.
---
 gcc/config/s390/s390.cc | 118 ++--
 1 file changed, 90 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 64f56d8effa..185eb59f8b8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17532,40 +17532,86 @@ struct expand_vec_perm_d
 static bool
 expand_perm_with_merge (const struct expand_vec_perm_d )
 {
-  bool merge_lo_p = true;
-  bool merge_hi_p = true;
-
-  if (d.nelt % 2)
+  static const unsigned char hi_perm_di[2] = {0, 2};
+  static const unsigned char hi_perm_si[4] = {0, 4, 1, 5};
+  static const unsigned char hi_perm_hi[8] = {0, 8, 1, 9, 2, 10, 3, 11};
+  static const unsigned char hi_perm_qi[16]
+= {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23};
+
+  static const unsigned char hi_perm_di_swap[2] = {2, 0};
+  static const unsigned char hi_perm_si_swap[4] = {4, 0, 6, 2};
+  static const unsigned char hi_perm_hi_swap[8] = {8, 0, 10, 2, 12, 4, 14, 6};
+  static const unsigned char hi_perm_qi_swap[16]
+= {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14};
+
+  static const unsigned char lo_perm_di[2] = {1, 3};
+  static const unsigned char lo_perm_si[4] = {2, 6, 3, 7};
+  static const unsigned char lo_perm_hi[8] = {4, 12, 5, 13, 6, 14, 7, 15};
+  static const unsigned char lo_perm_qi[16]
+= {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31};
+
+  static const unsigned char lo_perm_di_swap[2] = {3, 1};
+  static const unsigned char lo_perm_si_swap[4] = {5, 1, 7, 3};
+  static const unsigned char lo_perm_hi_swap[8] = {9, 1, 11, 3, 13, 5, 15, 7};
+  static const unsigned char lo_perm_qi_swap[16]
+= {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15};
+
+  bool merge_lo_p = false;
+  bool merge_hi_p = false;
+  bool swap_operands_p = false;
+
+  if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di, 2) == 0)
+  || (d.nelt == 4 && memcmp (d.perm, hi_perm_si, 4) == 0)
+  || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi, 8) == 0)
+  || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi, 16) == 0))
+{
+  merge_hi_p = true;
+}
+  else if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di_swap, 2) == 0)
+  || (d.nelt == 4 && memcmp (d.perm, hi_perm_si_swap, 4) == 0)
+  || (d.nelt == 8 && memcmp (d.perm, hi_perm_hi_swap, 8) == 0)
+  || (d.nelt == 16 && memcmp (d.perm, hi_perm_qi_swap, 16) == 0))
+{
+  merge_hi_p = true;
+  swap_operands_p = true;
+}
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di, 2) == 0)
+  || (d.nelt == 4 && memcmp (d.perm, lo_perm_si, 4) == 0)
+  || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi, 8) == 0)
+  || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi, 16) == 0))
+{
+  merge_lo_p = true;
+}
+  else if ((d.nelt == 2 && memcmp (d.perm, lo_perm_di_swap, 2) == 0)
+  || (d.nelt == 4 && memcmp (d.perm, lo_perm_si_swap, 4) == 0)
+  || (d.nelt == 8 && memcmp (d.perm, lo_perm_hi_swap, 8) == 0)
+  || (d.nelt == 16 && memcmp (d.perm, lo_perm_qi_swap, 16) == 0))
+{
+  merge_lo_p = true;
+  swap_operands_p = true;
+}
+
+  if (!merge_lo_p && !merge_hi_p)
 return false;
 
-  // For V4SI this checks for: { 0, 4, 1, 5 }
-  for (int telt = 0; telt < d.nelt; telt++)
-if (d.perm[telt] != telt / 2 + (telt % 2) * d.nelt)
-  {
-   merge_hi_p = false;
-   break;
-  }
+  if (d.testing_p)
+return merge_lo_p || merge_hi_p;
 
-  if (!merge_hi_p)
+  rtx op0, op1;
+  if (swap_operands_p)
 {
-  // For V4SI this checks for: { 2, 6, 3, 7 }
-  for (int telt = 0; telt < d.nelt; telt++)
-   if (d.perm[telt] != (telt + d.nelt) / 2 + (telt % 2) * d.nelt)
- {
-   merge_lo_p = false;
-   break;
- }
+  op0 = d.op1;
+  op1 = d.op0;
 }
   else
-merge_lo_p = false;
-
-  if (d.testing_p)
-return merge_lo_p || merge_hi_p;
+{
+  op0 = d.op0;
+  op1 = d.op1;
+}
 
-  if (merge_lo_p || merge_hi_p)
-s390_expand_merge (d.target, d.op0, d.op1, merge_hi_p);
+  s390_expand_merge (d.target, op0, op1, merge_hi_p);
 
-  return merge_lo_p || merge_hi_p;
+  return true;
 }
 
 /* Try to expand the vector permute operation described by D using the
@@ -17582,6 

[PATCH] s390: Fix constraint for insn *cmphi_ccu

2023-10-25 Thread Stefan Schulze Frielinghaus
Currently for an unsigned 16-bit comparison between memory and an
immediate where the high bit is set, a clc is emitted.  This is because
the constant is created for mode HI and therefore sign extended.  This
means constraint D does not hold anymore.  Since the mode already
restricts the immediate to 16 bit, it is enough to make use of
constraint n and chop off the high bits in the output template.
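
To illustrate the effect described above, here is a small sketch in plain C (not GCC internals; both helper names are hypothetical). It shows how a 16-bit immediate with the high bit set turns negative once it is sign extended, and how masking to the low 16 bits recovers the bit pattern the instruction is meant to encode:

```c
#include <stdint.h>

/* Hypothetical helper: the value a 16-bit constant takes after sign
   extension to a wider host integer, computed without relying on
   implementation-defined narrowing conversions.  */
static int64_t
sext16 (uint16_t bits)
{
  return (int64_t) (bits ^ 0x8000) - 0x8000;
}

/* Hypothetical helper: keep only the low 16 bits, i.e. "chop off" the
   high bits as the commit message puts it.  */
static uint16_t
low16 (int64_t value)
{
  return (uint16_t) (value & 0xffff);
}
```

For example, sext16 (0x8000) is -32768, which no longer looks like an unsigned 16-bit value, while low16 (-32768) gives back 0x8000.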

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390.md (*cmphi_ccu): For immediate operand 1 make
use of constraint n instead of D and chop off high bits in the
output template.
---
 gcc/config/s390/s390.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..777a20f8e77 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1355,13 +1355,13 @@
 (define_insn "*cmphi_ccu"
   [(set (reg CC_REGNUM)
 (compare (match_operand:HI 0 "nonimmediate_operand" "d,d,Q,Q,BQ")
- (match_operand:HI 1 "general_operand"  "Q,S,D,BQ,Q")))]
+ (match_operand:HI 1 "general_operand"  "Q,S,n,BQ,Q")))]
   "s390_match_ccmode (insn, CCUmode)
&& !register_operand (operands[1], HImode)"
   "@
clm\t%0,3,%S1
clmy\t%0,3,%S1
-   clhhsi\t%0,%1
+   clhhsi\t%0,%x1
#
#"
   [(set_attr "op_type" "RS,RSY,SIL,SS,SS")
-- 
2.41.0



[PATCH] testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

2023-10-24 Thread Stefan Schulze Frielinghaus
Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
  237 | _BitInt(32) b32_v;
  | ^~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

Tested on s390x and x86_64.  Ok for mainline?

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.
---
 gcc/testsuite/gcc.misc-tests/godump-1.c | 12 
 gcc/testsuite/gcc.misc-tests/godump-2.c | 18 ++
 2 files changed, 18 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/godump-2.c

diff --git a/gcc/testsuite/gcc.misc-tests/godump-1.c b/gcc/testsuite/gcc.misc-tests/godump-1.c
index f359a657827..b661d04719c 100644
--- a/gcc/testsuite/gcc.misc-tests/godump-1.c
+++ b/gcc/testsuite/gcc.misc-tests/godump-1.c
@@ -234,18 +234,6 @@ const char cc_v1;
 cc_t cc_v2;
 /* { dg-final { scan-file godump-1.out "(?n)^var _cc_v2 _cc_t$" } } */
 
-_BitInt(32) b32_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32_v int32$" } } */
-
-_BitInt(64) b64_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b64_v int64$" } } */
-
-unsigned _BitInt(32) b32u_v;
-/* { dg-final { scan-file godump-1.out "(?n)^var _b32u_v uint32$" } } */
-
-_BitInt(33) b33_v;
-/* { dg-final { scan-file godump-1.out "(?n)^// var _b33_v INVALID-bitint-33$" } } */
-
 /*** pointer and array types ***/
 typedef void *vp_t;
 /* { dg-final { scan-file godump-1.out "(?n)^type _vp_t \\*byte$" } } */
diff --git a/gcc/testsuite/gcc.misc-tests/godump-2.c b/gcc/testsuite/gcc.misc-tests/godump-2.c
new file mode 100644
index 000..ed093c964ac
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/godump-2.c
@@ -0,0 +1,18 @@
+/* { dg-options "-c -fdump-go-spec=godump-2.out" } */
+/* { dg-do compile { target bitint } } */
+/* { dg-skip-if "not supported for target" { ! "alpha*-*-* s390*-*-* i?86-*-* x86_64-*-*" } } */
+/* { dg-skip-if "not supported for target" { ! lp64 } } */
+
+_BitInt(32) b32_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32_v int32$" } } */
+
+_BitInt(64) b64_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b64_v int64$" } } */
+
+unsigned _BitInt(32) b32u_v;
+/* { dg-final { scan-file godump-2.out "(?n)^var _b32u_v uint32$" } } */
+
+_BitInt(33) b33_v;
+/* { dg-final { scan-file godump-2.out "(?n)^// var _b33_v INVALID-bitint-33$" } } */
+
+/* { dg-final { remove-build-file "godump-2.out" } } */
-- 
2.41.0



[PATCH] s390: Fix expander popcountv8hi2_vx

2023-10-16 Thread Stefan Schulze Frielinghaus
The normal form of a CONST_INT which represents an integer of a mode
with fewer bits than in HOST_WIDE_INT is sign extended.  This even holds
for unsigned integers.

This fixes an ICE during cse1 where we bail out at rtl.h:2297 since
INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold.
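
As a rough illustration of the normal form mentioned above, the following plain C sketch mimics the check for a 64-bit host integer. The helper names are hypothetical and only approximate what sext_hwi and the failing assertion do; precision is assumed to be between 1 and 63:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sign extend the low PRECISION bits of VALUE.
   Assumes 1 <= precision <= 63.  */
static int64_t
sign_extend (int64_t value, unsigned precision)
{
  uint64_t mask = (UINT64_C (1) << precision) - 1;
  uint64_t sign = UINT64_C (1) << (precision - 1);
  uint64_t low = (uint64_t) value & mask;
  return (int64_t) (low ^ sign) - (int64_t) sign;
}

/* Hypothetical helper: true if VALUE is already in sign-extended form
   for a mode with PRECISION bits.  */
static bool
is_normal_form (int64_t value, unsigned precision)
{
  return value == sign_extend (value, precision);
}
```

With an 8-bit precision, 255 is not in normal form but -1 is, which is why the mask elements in the diff below change from (const_int 255) to (const_int -1).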

gcc/ChangeLog:

* config/s390/vector.md (popcountv8hi2_vx): Sign extend each
unsigned vector element.
---
 gcc/config/s390/vector.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index f0e9ed3d263..7d1eb36e844 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1154,14 +1154,14 @@
(plus:V16QI (match_dup 2) (match_dup 3)))
; Generate mask for the odd numbered byte elements
(set (match_dup 3)
-   (const_vector:V16QI [(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)]))
+   (const_vector:V16QI [(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)]))
; Zero out the even indexed bytes
(set (match_operand:V8HI 0 "register_operand" "=v")
(and:V8HI (subreg:V8HI (match_dup 2) 0)
-- 
2.41.0


