Re: [PATCH] Simplify & expand c_readstr

2023-09-28 Thread Richard Biener
On Thu, Sep 28, 2023 at 3:37 PM Richard Sandiford wrote:
>
> c_readstr only operated on integer modes.  It worked by reading
> the source string into an array of HOST_WIDE_INTs, converting
> that array into a wide_int, and from there to an rtx.
>
> It's simpler to do this by building a target memory image and
> using native_decode_rtx to convert that memory image into an rtx.
> It avoids all the endianness shenanigans because both the string and
> native_decode_rtx follow target memory order.  It also means that the
> function can handle all fixed-size modes, which simplifies callers
> and allows vector modes to be used more widely.
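(A standalone sketch of why the endianness juggling disappears; hypothetical
helper, not part of the patch.  The buffer is simply filled in target memory
order and only the final decode interprets it, which in the real function is
native_decode_rtx:)

  #include <cstring>
  #include <cstdint>

  /* SImode-sized read as an illustration.  */
  uint32_t
  readstr_si (const char *str, bool nul_terminated)
  {
    unsigned char bytes[4];
    unsigned char ch = 1;
    for (unsigned i = 0; i < 4; ++i)
      {
        if (ch || !nul_terminated)
          ch = (unsigned char) str[i];
        bytes[i] = ch;                  /* memory order, no byte shuffling  */
      }
    uint32_t x;
    std::memcpy (&x, bytes, sizeof x);  /* stand-in for native_decode_rtx   */
    return x;                           /* "abcd" gives 0x64636261 on a
                                           little-endian host               */
  }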
>
> Tested on aarch64-linux-gnu so far.  OK to install?

OK.

Richard.

> Richard
>
>
> gcc/
> * builtins.h (c_readstr): Take a fixed_size_mode rather than a
> scalar_int_mode.
> * builtins.cc (c_readstr): Likewise.  Build a local array of
> bytes and use native_decode_rtx to get the rtx image.
> (builtin_memcpy_read_str): Simplify accordingly.
> (builtin_strncpy_read_str): Likewise.
> (builtin_memset_read_str): Likewise.
> (builtin_memset_gen_str): Likewise.
> * expr.cc (string_cst_read_str): Likewise.
> ---
>  gcc/builtins.cc | 46 +++---
>  gcc/builtins.h  |  2 +-
>  gcc/expr.cc |  5 ++---
>  3 files changed, 14 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 40dfd36a319..cb90bd03b3e 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -743,39 +743,22 @@ c_strlen (tree arg, int only_value, c_strlen_data 
> *data, unsigned eltsize)
> as needed.  */
>
>  rtx
> -c_readstr (const char *str, scalar_int_mode mode,
> +c_readstr (const char *str, fixed_size_mode mode,
>bool null_terminated_p/*=true*/)
>  {
> -  HOST_WIDE_INT ch;
> -  unsigned int i, j;
> -  HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
> +  auto_vec<target_unit, MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT> bytes;
>
> -  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
> -  unsigned int len = (GET_MODE_PRECISION (mode) + HOST_BITS_PER_WIDE_INT - 1)
> -/ HOST_BITS_PER_WIDE_INT;
> +  bytes.reserve (GET_MODE_SIZE (mode));
>
> -  gcc_assert (len <= MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT);
> -  for (i = 0; i < len; i++)
> -tmp[i] = 0;
> -
> -  ch = 1;
> -  for (i = 0; i < GET_MODE_SIZE (mode); i++)
> +  target_unit ch = 1;
> +  for (unsigned int i = 0; i < GET_MODE_SIZE (mode); ++i)
>  {
> -  j = i;
> -  if (WORDS_BIG_ENDIAN)
> -   j = GET_MODE_SIZE (mode) - i - 1;
> -  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
> - && GET_MODE_SIZE (mode) >= UNITS_PER_WORD)
> -   j = j + UNITS_PER_WORD - 2 * (j % UNITS_PER_WORD) - 1;
> -  j *= BITS_PER_UNIT;
> -
>if (ch || !null_terminated_p)
> ch = (unsigned char) str[i];
> -  tmp[j / HOST_BITS_PER_WIDE_INT] |= ch << (j % HOST_BITS_PER_WIDE_INT);
> +  bytes.quick_push (ch);
>  }
>
> -  wide_int c = wide_int::from_array (tmp, len, GET_MODE_PRECISION (mode));
> -  return immed_wide_int_const (c, mode);
> +  return native_decode_rtx (mode, bytes, 0);
>  }
>
>  /* Cast a target constant CST to target CHAR and if that value fits into
> @@ -3530,10 +3513,7 @@ builtin_memcpy_read_str (void *data, void *, 
> HOST_WIDE_INT offset,
>   string but the caller guarantees it's large enough for MODE.  */
>const char *rep = (const char *) data;
>
> -  /* The by-pieces infrastructure does not try to pick a vector mode
> - for memcpy expansion.  */
> -  return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
> -   /*nul_terminated=*/false);
> +  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
>  }
>
>  /* LEN specify length of the block of memcpy/memset operation.
> @@ -3994,9 +3974,7 @@ builtin_strncpy_read_str (void *data, void *, 
> HOST_WIDE_INT offset,
>if ((unsigned HOST_WIDE_INT) offset > strlen (str))
>  return const0_rtx;
>
> -  /* The by-pieces infrastructure does not try to pick a vector mode
> - for strncpy expansion.  */
> -  return c_readstr (str + offset, as_a <scalar_int_mode> (mode));
> +  return c_readstr (str + offset, mode);
>  }
>
>  /* Helper to check the sizes of sequences and the destination of calls
> @@ -4227,8 +4205,7 @@ builtin_memset_read_str (void *data, void *prev,
>
>memset (p, *c, size);
>
> -  /* Vector modes should be handled above.  */
> -  return c_readstr (p, as_a <scalar_int_mode> (mode));
> +  return c_readstr (p, mode);
>  }
>
>  /* Callback routine for store_by_pieces.  Return the RTL of a register
> @@ -4275,8 +4252,7 @@ builtin_memset_gen_str (void *data, void *prev,
>
>p = XALLOCAVEC (char, size);
>memset (p, 1, size);
> -  /* Vector modes should be handled above.  */
> -  coeff = c_readstr (p, as_a <scalar_int_mode> (mode));
> +  coeff = c_readstr (p, mode);
>
>target = convert_to_mode (mode, (rtx) data, 1);
>target = expand_mult (mode, target, coeff, NULL_RTX, 1);
> diff --git a/gcc/builtins.h b/gcc/builtins.h
> index 3b

Re: [PATCH] Remove poly_int_pod

2023-09-28 Thread Richard Biener
On Thu, Sep 28, 2023 at 9:10 PM Jeff Law  wrote:
>
>
>
> On 9/28/23 11:26, Jason Merrill wrote:
> > On 9/28/23 05:55, Richard Sandiford wrote:
> >> poly_int was written before the switch to C++11 and so couldn't
> >> use explicit default constructors.  This led to an awkward split
> >> between poly_int_pod and poly_int.  poly_int simply inherited from
> >> poly_int_pod and added constructors, with the argumentless constructor
> >> having an empty body.  But inheritance meant that poly_int had to
> >> repeat the assignment operators from poly_int_pod (again, no C++11,
> >> so no "using" to inherit base-class implementations).
> >>
> >> All that goes away if we switch to using default constructors.
> >>
> >> The main complication is ensuring that braced initialisation still
> >> gives a constexpr, so that static variables can be initialised without
> >> runtime code.  The two problems here are:
> >>
> >> (1) When initialising a poly_int<N, C> with fewer than N
> >>  coefficients, the other coefficients need to be a zero of
> >>  the same precision as the explicit coefficients.  This was
> >>  previously done in a for loop using wi::ints_for<...>::zero,
> >>  but C++11 constexpr constructors can't have function bodies.
> >>  The patch instead uses a series of delegated initialisers to
> >>  fill in the implicit coefficients.
> >
> > Perhaps it's time to update the bootstrap requirement to C++14 (i.e. GCC
> > 5, from eight years ago).  Not that this would affect this particular
> > patch.
> IIRC the primary reason we settled on gcc-4.8.x was RHEL7/Centos7.  With
> RHEL 7 approaching EOL moving the baseline forward would seem to make sense.
>
> I'd want to know if this affects folks using SuSE's enterprise distro
> before actually making the change, but I'm broadly in favor of moving
> forward if it's not going to have a major impact on users that are using
> enterprise distros.

We're thinking of making GCC 13 the last major release to officially
build for SLE12, which also uses GCC 4.8, so we'd be fine with doing
this for GCC 14.

Richard.

>
> jeff


Re: [PATCH] ggc: do not wipe out unrelated data via gt_ggc_rtab

2023-09-29 Thread Richard Biener
On Thu, 28 Sep 2023, Sergei Trofimovich wrote:

> From: Sergei Trofimovich 
> 
> There are 3 GC root tables:
> 
>gt_ggc_rtab
>gt_ggc_deletable_rtab
>gt_pch_scalar_rtab
> 
> `deletable` and `scalar` tables are both simple: each element always
> contains a pointer to the beginning of the object and its size is the
> full object.
> 
> `rtab` is different: its `base` is a pointer in the middle of the
> struct and `stride` points to the next GC pointer in the array.
> 
> Before the change there were 2 problems:
> 
> 1. We memset()ed not just pointers but data around them.
> 2. We went out of bounds of the last object described by gt_ggc_rtab
>and triggered bootstrap failures in profile and asan bootstraps.
> 
> After the change we handle only pointers themselves like the rest of
> ggc-common.cc code.

OK.

Thanks,
Richard.

> gcc/
>   PR/111505
>   * ggc-common.cc (ggc_zero_rtab_roots): New helper.
>   * ggc-common.cc (ggc_common_finalize): Use helper instead of
>   memset() to wipe out pointers.
> ---
>  gcc/ggc-common.cc | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
> index 95803fa95a1..39e2581affd 100644
> --- a/gcc/ggc-common.cc
> +++ b/gcc/ggc-common.cc
> @@ -75,6 +75,18 @@ ggc_mark_root_tab (const_ggc_root_tab_t rt)
>(*rt->cb) (*(void **) ((char *)rt->base + rt->stride * i));
>  }
>  
> +/* Zero out all the roots in the table RT.  */
> +
> +static void
> +ggc_zero_rtab_roots (const_ggc_root_tab_t rt)
> +{
> +  size_t i;
> +
> +  for ( ; rt->base != NULL; rt++)
> +for (i = 0; i < rt->nelt; i++)
> +  (*(void **) ((char *)rt->base + rt->stride * i)) = (void*)0;
> +}
> +
>  /* Iterate through all registered roots and mark each element.  */
>  
>  void
> @@ -1307,8 +1319,7 @@ ggc_common_finalize ()
>memset (rti->base, 0, rti->stride * rti->nelt);
>  
>for (rt = gt_ggc_rtab; *rt; rt++)
> -for (rti = *rt; rti->base != NULL; rti++)
> -  memset (rti->base, 0, rti->stride * rti->nelt);
> +ggc_zero_rtab_roots (*rt);
>  
>for (rt = gt_pch_scalar_rtab; *rt; rt++)
>  for (rti = *rt; rti->base != NULL; rti++)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] use *_grow_cleared rather than *_grow on vect_unpromoted_value

2023-09-29 Thread Richard Biener
On Fri, 29 Sep 2023, Jakub Jelinek wrote:

> On Wed, Sep 27, 2023 at 11:15:26AM +0000, Richard Biener wrote:
> > > tree-vect-patterns.cc:2947  unprom.quick_grow (nops);
> > > T = vect_unpromoted_value
> > > Go for quick_grow_cleared?  Something else?
> > 
> > The CTOR zero-initializes everything, so maybe it can go.  In theory
> > .set_op could also be changed to .push_op ...
> 
> So, I had a look at this and I think using quick_grow_cleared is the best
> choice here.  nops is 1 or 2 most of the time, worst case 3, so the price of
> extra initialization of 4 pointer-sized-or-less members times 1, 2 or 3
> doesn't seem worth bothering about; it is similar to the bitmap_head case where
> we already pay the price for just one structure anytime we do
>   vect_unpromoted_value unprom_diff;
> (and later set_op on it) or even
>   vect_unpromoted_value unprom0[2];
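(For reference, the difference being discussed, sketched from the vec.h
semantics rather than quoted from it: quick_grow only bumps the length,
leaving the new slots uninitialized, which is why the static_assert wants a
trivially default constructible element type, while quick_grow_cleared
additionally zero-fills the new slots.)

  auto_vec<vect_unpromoted_value, 3> unprom (nops);
  /* unprom.quick_grow (nops);        new slots left uninitialized         */
  unprom.quick_grow_cleared (nops);   /* new slots zero-filled; for nops of
                                         1..3 the extra memset is a few
                                         dozen bytes at most               */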
> 
> With this patch and Richard S's poly_int_pod removal the static_assert can
> be enabled as well and gcc builds.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> The second patch waits for the poly_int_pod removal commit and has been just
> build tested but not bootstrapped yet.
> 
> 2023-09-29  Jakub Jelinek  
> 
>   * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Use
>   quick_grow_cleared method on unprom rather than quick_grow.
> 
> --- gcc/tree-vect-patterns.cc.jj  2023-08-24 15:37:29.321410276 +0200
> +++ gcc/tree-vect-patterns.cc 2023-09-29 09:45:27.980168865 +0200
> @@ -2944,7 +2944,7 @@ vect_recog_over_widening_pattern (vec_in
>/* Check the operands.  */
>unsigned int nops = gimple_num_ops (last_stmt) - first_op;
>   auto_vec<vect_unpromoted_value, 3> unprom (nops);
> -  unprom.quick_grow (nops);
> +  unprom.quick_grow_cleared (nops);
>unsigned int min_precision = 0;
>bool single_use_p = false;
>for (unsigned int i = 0; i < nops; ++i)
> 
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int

2023-09-29 Thread Richard Biener
On Fri, 29 Sep 2023, Jakub Jelinek wrote:

> On Thu, Sep 28, 2023 at 04:03:55PM +0200, Jakub Jelinek wrote:
> > Bet we should make wide_int_storage and widest_int_storage GTY ((user)) and
> > just declare but don't define the handlers or something similar.
> 
> That doesn't catch anything, but the following incremental patch compiles
> just fine, proving we don't have any wide_int in GC memory anymore after
> the wide_int -> rwide_int change in dwarf2out.h.
> And the attached incremental patch on top of it which deletes even
> widest_int from GC shows that we use widest_int in GC in:
[..]

> nb_iter_bound::member
> loop::nb_iterations_upper_bound
> loop::nb_iterations_likely_upper_bound
> loop::nb_iterations_estimate

I think those had better be bound to max-fixed-mode; they were
HWI at some point (even that should be OK, but of course the
non-likely upper_bound needs to be conservative).  Using
widest_int here, especially on non-x86, is quite wasteful.  The functions
setting these need to be careful with overflows then.

> so pretty much everything I spoke about (except I thought loop has
> 2 such members when it has 3).
> 
> --- gcc/wide-int.h  2023-09-28 14:55:40.059632413 +0200
> +++ gcc/wide-int.h  2023-09-29 09:59:58.703931879 +0200
> @@ -85,7 +85,7 @@
>   and it always uses an inline buffer.  offset_int and rwide_int are
>   GC-friendly, wide_int and widest_int are not.
>  
> - 3) widest_int.  This representation is an approximation of
> + 4) widest_int.  This representation is an approximation of
>   infinite precision math.  However, it is not really infinite
>   precision math as in the GMP library.  It is really finite
>   precision math where the precision is WIDEST_INT_MAX_PRECISION.
> @@ -4063,21 +4063,61 @@
>return wi::smod_trunc (x, y);
>  }
>  
> -template
> +void gt_ggc_mx (generic_wide_int  *) = delete;
> +void gt_pch_nx (generic_wide_int  *) = delete;
> +void gt_pch_nx (generic_wide_int  *,
> + gt_pointer_operator, void *) = delete;
> +
> +inline void
> +gt_ggc_mx (generic_wide_int  *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (generic_wide_int  *)
> +{
> +}
> +
> +inline void
> +gt_pch_nx (generic_wide_int  *, gt_pointer_operator, void 
> *)
> +{
> +}
> +
> +template
> +void
> +gt_ggc_mx (generic_wide_int  > *)
> +{
> +}
> +
> +template
> +void
> +gt_pch_nx (generic_wide_int  > *)
> +{
> +}
> +
> +template
> +void
> +gt_pch_nx (generic_wide_int  > *,
> +gt_pointer_operator, void *)
> +{
> +}
> +
> +template
>  void
> -gt_ggc_mx (generic_wide_int  *)
> +gt_ggc_mx (generic_wide_int  > *)
>  {
>  }
>  
> -template
> +template
>  void
> -gt_pch_nx (generic_wide_int  *)
> +gt_pch_nx (generic_wide_int  > *)
>  {
>  }
>  
> -template
> +template
>  void
> -gt_pch_nx (generic_wide_int  *, gt_pointer_operator, void *)
> +gt_pch_nx (generic_wide_int  > *,
> +gt_pointer_operator, void *)
>  {
>  }
>  
> 
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int

2023-09-29 Thread Richard Biener
On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases, widest_int
> > > > is really trying to be poor-mans GMP by limiting the maximum precision.
> > > 
> > > I'd characterise widest_int as "a wide_int that is big enough to hold
> > > all supported integer types, without losing sign information".  It's
> > > not big enough to do arbitrary arithmetic without losing precision
> > > (in the way that GMP is).
> > > 
> > > If the new limit on integer sizes is 65535 bits for all targets,
> > > then I think that means that widest_int needs to become a 65536-bit type.
> > > (But not with all bits represented all the time, of course.)
> > 
> > If the widest_int storage would be dependent on the len rather than
> > precision for how it is stored, then I think we'd need a new method which
> > would be called at the start of filling the limbs where we'd tell how many
> > limbs there would be (i.e. what will set_len be called with later on), and
> > do nothing for all storages but the new widest_int_storage.
> 
> So, I've spent some time on this.  While wide_int is in the patch a 
> fixed/variable
> number of limbs (aka len) storage depending on precision (precision >
> WIDE_INT_MAX_PRECISION means heap allocated limb array, otherwise it is
> inline), widest_int has always very large precision
> (WIDEST_INT_MAX_PRECISION, currently defined to the INTEGER_CST imposed
> limitation of 255 64-bit limbs) but uses inline array for length
> corresponding up to WIDE_INT_MAX_PRECISION bits and for larger one uses
> similarly to wide_int a heap allocated array of limbs.
> These changes make both wide_int and widest_int obviously non-POD, not
> trivially default constructible, nor trivially copy constructible, trivially
> destructible, trivially copyable, so not a good fit for GC and some vec
> operations.
> One common use of wide_int in GC structures was in dwarf2out.{h,cc}; but as
> large _BitInt constants don't appear in RTL, we really don't need such large
> precisions there.
> So, for wide_int the patch introduces rwide_int, restricted wide_int, which
> acts like the old wide_int (except that it is now trivially default
> constructible and has assertions precision isn't set above
> WIDE_INT_MAX_PRECISION).
> For widest_int, the nastiness is that because it always has huge precision
> of 16320 right now,
> a) we need to be told upfront in wide-int.h before calling the large
>value internal functions in wide-int.cc how many elements we'll need for
>the result (some reasonable upper estimate is fine)
> b) various of the wide-int.cc functions were lazy and assumed precision is
>small enough and often used up to that many elements, which is
>    undesirable; so, it now tries to decrease that and use xi.len etc. based
>estimates instead if possible (sometimes only if precision is above
>WIDE_INT_MAX_PRECISION)
> c) with the higher precision, behavior changes for lrshift (-1, 2) etc. or
>unsigned division with dividend having most significant bit set in
>widest_int - while such values were considered to be above or equal to
>1 << (WIDE_INT_MAX_PRECISION - 2), now they are with
>WIDEST_INT_MAX_PRECISION and so much larger; but lrshift on widest_int
>is I think only done in ccp and I'd strongly hope that we treat the
>values as unsigned and so usually much smaller length; so it is just
>when we call wi::lrshift (-1, 2) or similar that results change.
> I've noticed that for wide_int or widest_int references even simple
> operations like eq_p liked to allocate and immediately free huge buffers,
> which was caused by wide_int doing allocation on creation with a particular
> precision and e.g. get_binary_precision running into that.  So, I've
> duplicated that to avoid the allocations when all we need is just a
> precision.
> 
> The patch below doesn't actually build anymore since the vec.h asserts
> (which point to useful stuff though), so temporarily I've applied it also
> with
> --- gcc/vec.h.xx  2023-09-28 12:56:09.055786055 +0200
> +++ gcc/vec.h 2023-09-28 13:15:31.760487111 +0200
> @@ -1197,7 +1197,7 @@ template
>  inline void
>  vec::qsort (int (*cmp) (const void *, const void *))
>  {
> -  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
> +//  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
>if (length () > 1)
>  gcc_qsort (address (), length (), sizeof (T), cmp);
>  }
> @@ -1422,7 +1422,7 @@ template
>  void
>  gt_ggc_mx (vec *v)
>  {
> -  static_assert (std::is_trivially_destructible <T>::value, "");
> +//  static_assert (std::is_trivially_destructible <T>::value, "");
>extern void gt_ggc_mx (T &);
>for (unsigned i = 0; i < v->length (); i++)
>  gt_ggc_mx ((*v)[i]);
> hack.  The two spots that trigger a

[PATCH] tree-optimization/111583 - loop distribution issue

2023-09-29 Thread Richard Biener
The following conservatively fixes loop distribution to only
recognize memset/memcpy and friends when at least one element
is going to be processed.  This avoids having an unconditional
builtin call in the IL that might imply the source and destination
pointers are non-NULL when originally pointers were not always
dereferenced.

With -Os loop header copying is less likely to ensure this.
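The hazard in miniature (a reduced illustration, not the PR testcase):

  void
  clear (char *p, unsigned long n)
  {
    for (unsigned long i = 0; i < n; i++)   /* no dereference when n == 0  */
      p[i] = 0;
  }

Distributed to an unconditional memset (p, 0, n), the call lets later passes
assume p is non-NULL even for the n == 0, p == NULL case, e.g. deleting a
NULL check after inlining into a caller.  At -O2 loop header copying usually
guards the body with an n > 0 check so the recognized call stays conditional;
at -Os that guard is often missing, hence requiring the store to be executed
before any loop exit.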

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111583
* tree-loop-distribution.cc (find_single_drs): Ensure the
load/store are always executed.

* gcc.dg/tree-ssa/pr111583-1.c: New testcase.
* gcc.dg/tree-ssa/pr111583-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr111583-1.c | 30 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr111583-2.c | 36 ++
 gcc/tree-loop-distribution.cc  | 15 +
 3 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111583-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111583-2.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111583-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr111583-1.c
new file mode 100644
index 000..1dd8dbcf1d8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111583-1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+short a, f, i;
+static const int *e;
+short *g;
+long h;
+int main()
+{
+{
+  unsigned j = i;
+  a = 1;
+  for (; a; a++) {
+   {
+ long b = j, d = h;
+ int c = 0;
+ while (d--)
+   *(char *)b++ = c;
+   }
+ if (e)
+   break;
+  }
+  j && (*g)--;
+  const int **k = &e;
+  *k = 0;
+}
+  if (f != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111583-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr111583-2.c
new file mode 100644
index 000..0ee21854552
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111583-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+int b, c, d;
+char e;
+short f;
+const unsigned short **g;
+char h(char k) {
+  if (k)
+return '0';
+  return 0;
+}
+int l() {
+  b = 0;
+  return 1;
+}
+static short m(unsigned k) {
+  const unsigned short *n[65];
+  g = &n[4];
+  k || l();
+  long a = k;
+  char i = 0;
+  unsigned long j = k;
+  while (j--)
+*(char *)a++ = i;
+  c = h(d);
+  f = k;
+  return 0;
+}
+int main() {
+  long o = (e < 0) << 5;
+  m(o);
+  if (f != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index a28470b66ea..39fd4402d25 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -1574,6 +1574,7 @@ find_single_drs (class loop *loop, struct graph *rdg, 
const bitmap &partition_st
 
   basic_block bb_ld = NULL;
   basic_block bb_st = NULL;
+  edge exit = single_exit (loop);
 
   if (single_ld)
 {
@@ -1589,6 +1590,14 @@ find_single_drs (class loop *loop, struct graph *rdg, 
const bitmap &partition_st
   bb_ld = gimple_bb (DR_STMT (single_ld));
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
return false;
+
+  /* The data reference must also be executed before possibly exiting
+the loop as otherwise we'd for example unconditionally execute
+memset (ptr, 0, n) which even with n == 0 implies ptr is non-NULL.  */
+  if (bb_ld != loop->header
+ && (!exit
+ || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_ld)))
+   return false;
 }
 
   if (single_st)
@@ -1604,6 +1613,12 @@ find_single_drs (class loop *loop, struct graph *rdg, 
const bitmap &partition_st
   bb_st = gimple_bb (DR_STMT (single_st));
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
return false;
+
+  /* And before exiting the loop.  */
+  if (bb_st != loop->header
+ && (!exit
+ || !dominated_by_p (CDI_DOMINATORS, exit->src, bb_st)))
+   return false;
 }
 
   if (single_ld && single_st)
-- 
2.35.3


Re: [PATCH] vec.h: Guard most of static assertions for GCC >= 5

2023-09-29 Thread Richard Biener
cmp, data);
>  }
> @@ -1223,7 +1240,9 @@ inline void
>  vec::stablesort (int (*cmp) (const void *, const void *,
>void *), void *data)
>  {
> +#if GCC_VERSION >= 5000
>    static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
> +#endif
>if (length () > 1)
>  gcc_stablesort_r (address (), length (), sizeof (T), cmp, data);
>  }
> @@ -1396,7 +1415,9 @@ inline void
>  vec::quick_grow (unsigned len)
>  {
>gcc_checking_assert (length () <= len && len <= m_vecpfx.m_alloc);
> +#if GCC_VERSION >= 5000
>  //  static_assert (std::is_trivially_default_constructible <T>::value, "");
> +#endif
>m_vecpfx.m_num = len;
>  }
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]

2023-10-03 Thread Richard Biener
On Sat, 30 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> I really can't figure out why one would need to add extra casts.
> type must be an integral type which has BIT_NOT_EXPR applied on it
> which yields all ones and we need a type in which negating 0 or 1
> range will yield 0 or all ones, I think all integral types satisfy
> that.

It seems to work for bool, so indeed.
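A quick standalone check of the identity behind the transform (illustration
only, not part of the patch): for a in {0, 1}, -(T) a is either 0 or all
ones, and xor with all ones is bitwise negation, so a ? ~t : t equals
(-(T) a) ^ t:

  #include <cstdint>
  #include <cassert>

  int
  main ()
  {
    for (unsigned a = 0; a <= 1; a++)
      for (unsigned t = 0; t < 256; t++)
        {
          uint8_t tt = t;
          uint8_t expect = a ? (uint8_t) ~tt : tt;
          uint8_t got = (uint8_t) (-(int) a) ^ tt;
          assert (expect == got);
        }
    return 0;
  }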

> This fixes PR111369, where one of the bitint*.c tests FAILs with
> GCC_TEST_RUN_EXPENSIVE=1.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2023-09-30  Jakub Jelinek  
> 
>   PR middle-end/111369
>   * match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
>   than using build_nonstandard_integer_type.
> 
> --- gcc/match.pd.jj   2023-09-28 11:32:16.122434235 +0200
> +++ gcc/match.pd  2023-09-29 18:05:50.554640268 +0200
> @@ -6742,12 +6742,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>(if (INTEGRAL_TYPE_P (type)
> && bitwise_inverted_equal_p (@1, @2, wascmp)
> && (!wascmp || element_precision (type) == 1))
> -   (with {
> - auto prec = TYPE_PRECISION (type);
> - auto unsign = TYPE_UNSIGNED (type);
> - tree inttype = build_nonstandard_integer_type (prec, unsign);
> -}
> -(convert (bit_xor (negate (convert:inttype @0)) (convert:inttype 
> @2)))
> +   (bit_xor (negate (convert:type @0)) (convert:type @2)
>  #endif
>  
>  /* Simplify pointer equality compares using PTA.  */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] match.pd: Avoid another build_nonstandard_integer_type call [PR111369]

2023-10-03 Thread Richard Biener
On Sat, 30 Sep 2023, Jakub Jelinek wrote:

> On Sat, Sep 30, 2023 at 11:44:59AM +0200, Jakub Jelinek wrote:
> > I really can't figure out why one would need to add extra casts.
> > type must be an integral type which has BIT_NOT_EXPR applied on it
> > which yields all ones and we need a type in which negating 0 or 1
> > range will yield 0 or all ones, I think all integral types satisfy
> > that.
> > This fixes PR111369, where one of the bitint*.c tests FAILs with
> > GCC_TEST_RUN_EXPENSIVE=1.
> 
> Though, I think there is a preexisting issue which the
> build_nonstandard_integer_type didn't help with; if type is signed 1-bit
> precision, then I think a ? ~t : t could be valid, but -(type)a would invoke
> UB if a is 1 - the cast would make it -1 and negation of -1 in signed 1-bit
> invokes UB.
> So perhaps we should guard this optimization on type having element precision > 1
> or being unsigned.  Plus the (convert:type @2) didn't make sense, @2 already
> must have TREE_TYPE type.

Alternatively cast to unsigned:1 in that case?  OTOH when element 
precision is 1 we can also simply elide the negation?

> So untested patch would be then:
> 
> 2023-09-29  Jakub Jelinek  
> 
>   PR middle-end/111369
>   * match.pd (a?~t:t -> (-(a))^t): Always convert to type rather
>   than using build_nonstandard_integer_type.  Punt if type is
>   signed 1-bit precision type.
> 
> --- gcc/match.pd.jj   2023-09-29 18:58:42.724956659 +0200
> +++ gcc/match.pd  2023-09-30 11:54:16.603280666 +0200
> @@ -6741,13 +6741,9 @@ (define_operator_list SYNC_FETCH_AND_AND
>   (with { bool wascmp; }
>(if (INTEGRAL_TYPE_P (type)
> && bitwise_inverted_equal_p (@1, @2, wascmp)
> -   && (!wascmp || element_precision (type) == 1))
> -   (with {
> - auto prec = TYPE_PRECISION (type);
> - auto unsign = TYPE_UNSIGNED (type);
> - tree inttype = build_nonstandard_integer_type (prec, unsign);
> -}
> -(convert (bit_xor (negate (convert:inttype @0)) (convert:inttype 
> @2)))
> +   && (!wascmp || element_precision (type) == 1)
> +   && (!TYPE_OVERFLOW_WRAPS (type) || element_precision (type) > 1))
> +   (bit_xor (negate (convert:type @0)) @2
>  #endif
>  
>  /* Simplify pointer equality compares using PTA.  */
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] match.pd: Avoid other build_nonstandard_integer_type calls [PR111369]

2023-10-03 Thread Richard Biener
On Tue, 3 Oct 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Sat, Sep 30, 2023 at 11:57:38AM +0200, Jakub Jelinek wrote:
> > > This fixes PR111369, where one of the bitint*.c tests FAILs with
> > > GCC_TEST_RUN_EXPENSIVE=1.
> > 
> > Though, I think there is a preexisting issue which the
> > build_nonstandard_integer_type didn't help with; if type is signed 1-bit
> > precision, then I think a ? ~t : t could be valid, but -(type)a would invoke
> > UB if a is 1 - the cast would make it -1 and negation of -1 in signed 1-bit
> > invokes UB.
> > So perhaps we should guard this optimization on type having element 
> > precision > 1
> > or being unsigned.  Plus the (convert:type @2) didn't make sense, @2 already
> > must have TREE_TYPE type.
> 
> In the light of the PR111668 patch which shows that
> build_nonstandard_integer_type is needed (at least for some signed prec > 1
> BOOLEAN_TYPEs if we use e.g. negation), I've reworked this patch and handled
> the last problematic build_nonstandard_integer_type call in there as well.
> 
> In the x == cstN ? cst4 : cst3 optimization it uses
> build_nonstandard_integer_type solely for BOOLEAN_TYPEs (I really don't see
> why ENUMERAL_TYPEs would be a problem, we treat them in GIMPLE as uselessly
> convertible to same precision/sign INTEGER_TYPEs), for INTEGER_TYPEs it is
> really a no-op (might return a different type, but always INTEGER_TYPE
> with same TYPE_PRECISION same TYPE_UNSIGNED) and for BITINT_TYPE with larger
> precisions really harmful (we shouldn't create large precision
> INTEGER_TYPEs).
> 
> The a?~t:t optimization just omits the negation of a in type for 1-bit
> precision types or any BOOLEAN_TYPEs.  I think that is correct, because
> for both signed and unsigned 1-bit precision type, cast to type of a bool
> value yields already 0, -1 or 0, 1 values and for 1-bit precision negation
> of that is still 0, -1 or 0, 1 (except for invoking sometimes UB).
> And for signed larger precision BOOLEAN_TYPEs I think it is correct as well,
> cast of [0, 1] to type yields 0, -1 and those can be xored with 0 or -1
> to yield the proper result, any other values would be UB.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2023-10-03  Jakub Jelinek  
> 
>   PR middle-end/111369
>   * match.pd (x == cstN ? cst4 : cst3): Use
>   build_nonstandard_integer_type only if type1 is BOOLEAN_TYPE.
>   Fix comment typo.  Formatting fix.
>   (a?~t:t -> (-(a))^t): Always convert to type rather
>   than using build_nonstandard_integer_type.  Perform negation
>   only if type has precision > 1 and is not signed BOOLEAN_TYPE.
> 
> --- gcc/match.pd.jj   2023-10-03 10:33:30.817614648 +0200
> +++ gcc/match.pd  2023-10-03 11:29:54.089566764 +0200
> @@ -5178,7 +5178,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>  
>  /* Optimize
> # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
> -   x_5 ? cstN ? cst4 : cst3
> +   x_5 == cstN ? cst4 : cst3
> # op is == or != and N is 1 or 2
> to r_6 = x_5 + (min (cst3, cst4) - cst1) or
> r_6 = (min (cst3, cst4) + cst1) - x_5 depending on op, N and which
> @@ -5214,7 +5214,8 @@ (define_operator_list SYNC_FETCH_AND_AND
>type1 = type;
> auto prec = TYPE_PRECISION (type1);
> auto unsign = TYPE_UNSIGNED (type1);
> -   type1 = build_nonstandard_integer_type (prec, unsign);
> +   if (TREE_CODE (type1) == BOOLEAN_TYPE)
> + type1 = build_nonstandard_integer_type (prec, unsign);
> min = wide_int::from (min, prec,
>TYPE_SIGN (TREE_TYPE (@0)));
> wide_int a = wide_int::from (wi::to_wide (arg0), prec,
> @@ -5253,14 +5254,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>}
>(if (code == PLUS_EXPR)
> (convert (plus (convert:type1 @0) { arg; }))
> -   (convert (minus { arg; } (convert:type1 @0)))
> -  )
> - )
> -)
> -   )
> -  )
> - )
> -)
> +   (convert (minus { arg; } (convert:type1 @0))
>  #endif
>  
>  (simplify
> @@ -6758,13 +6752,11 @@ (define_operator_list SYNC_FETCH_AND_AND
>   (with { bool wascmp; }
>(if (INTEGRAL_TYPE_P (type)
> && bitwise_inverted_equal_p (@1, @2, wascmp)
> -   && (!wascmp || element_precision (type) == 1))
> -   (with {
> - auto prec = TYPE_PRECISION (type);
> - auto unsign = TYPE_UNSIGNED (type);
> - tree inttype = build_nonstandard_integer_type (prec, unsign);
> -}
> -(convert (bit_xor (negate (convert:inttype @0)) (convert:inttype 
> @2)))
> +   && (!wascmp || TYPE_PRECISION (type) == 1))

Re: [PATCH] match.pd: Fix up a ? cst1 : cst2 regression on signed bool [PR111668]

2023-10-04 Thread Richard Biener
tt (bit_xor (convert:boolean_type_node @0)
> + { boolean_true_node; })
> +   (negate (convert:type (bit_xor (convert:boolean_type_node @0)
> +   { boolean_true_node; }))
> +/* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
>  (if (INTEGRAL_TYPE_P (type) && integer_pow2p (@2))
>   (with {
> tree shift = build_int_cst (integer_type_node, tree_log2 (@2));
>}
>(lshift (convert (bit_xor (convert:boolean_type_node @0)
> - { boolean_true_node; })) { shift; })))
> -/* a ? -1 : 0 -> -(!a).  No need to check the TYPE_PRECISION not being 1
> -   here as the powerof2cst case above will handle that case correctly.  
> */
> -(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
> - (negate (convert:type (bit_xor (convert:boolean_type_node @0)
> - { boolean_true_node; }
> + { boolean_true_node; })) { shift; })))
>  
>  /* (a > 1) ? 0 : (cast)a is the same as (cast)(a == 1)
> for unsigned types. */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Makefile.tpl: disable -Werror for feedback stage [PR111663]

2023-10-04 Thread Richard Biener
On Mon, Oct 2, 2023 at 2:06 PM Sergei Trofimovich  wrote:
>
> From: Sergei Trofimovich 
>
> Without the change profiled bootstrap fails for various warnings on
> master branch as:
>
> $ ../gcc/configure
> $ make profiledbootstrap
> ...
> gcc/genmodes.cc: In function ‘int main(int, char**)’:
> gcc/genmodes.cc:2152:1: error: ‘gcc/build/genmodes.gcda’ profile count 
> data file not found [-Werror=missing-profile]
> ...
> gcc/gengtype-parse.cc: In function ‘void parse_error(const char*, ...)’:
> gcc/gengtype-parse.cc:142:21: error: ‘%s’ directive argument is null 
> [-Werror=format-overflow=]
>
> The change removes -Werror just like autofeedback does today.

I think that makes sense, OK if nobody objects.

Richard.

> /
>
> PR bootstrap/111663
> * Makefile.tpl (STAGEfeedback_CONFIGURE_FLAGS): Disable -Werror.
> * Makefile.in: Regenerate.
> ---
>  Makefile.in  | 4 
>  Makefile.tpl | 4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/Makefile.in b/Makefile.in
> index 2f136839c35..e0e3c4c8fe8 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -638,6 +638,10 @@ STAGEtrain_TFLAGS = $(filter-out 
> -fchecking=1,$(STAGE3_TFLAGS))
>
>  STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
> +# Disable warnings as errors for a few reasons:
> +# - sources for gen* binaries do not have .gcda files available
> +# - inlining decisions generate extra warnings
> +STAGEfeedback_CONFIGURE_FLAGS = $(filter-out 
> --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
>
>  STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
>  STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 5872dd03f2c..8b7783bb4f1 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -561,6 +561,10 @@ STAGEtrain_TFLAGS = $(filter-out 
> -fchecking=1,$(STAGE3_TFLAGS))
>
>  STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
> +# Disable warnings as errors for a few reasons:
> +# - sources for gen* binaries do not have .gcda files available
> +# - inlining decisions generate extra warnings
> +STAGEfeedback_CONFIGURE_FLAGS = $(filter-out 
> --enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
>
>  STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
>  STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
> --
> 2.42.0
>


[PATCH] ipa/111643 - clarify flatten attribute documentation

2023-10-04 Thread Richard Biener
The following clarifies the flatten attribute documentation to mention
that inlining also applies to calls formed as part of inlining earlier
calls, but not to recursive calls to the function itself.

Will push this tomorrow or so if there are no better suggestions
on the wording.
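A small invented example of the behaviour the new wording describes: with
flatten on chain, the call to twice is inlined, the add1 calls that this
inlining exposes are inlined as well, but the recursive call to chain is
left alone:

  static int add1 (int x) { return x + 1; }
  static int twice (int x) { return add1 (add1 (x)); }

  __attribute__ ((flatten)) int
  chain (int x)
  {
    if (x > 0)
      return chain (x - 1) + 1;   /* recursive call: not inlined          */
    return twice (x);             /* inlined, together with the add1
                                     calls exposed by that inlining       */
  }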

PR ipa/111643
* doc/extend.texi (attribute flatten): Clarify.
---
 gcc/doc/extend.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b4770f1a149..645c76f23e9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3109,7 +3109,9 @@ file descriptor opened with @code{O_RDONLY}.
 @cindex @code{flatten} function attribute
 @item flatten
 Generally, inlining into a function is limited.  For a function marked with
-this attribute, every call inside this function is inlined, if possible.
+this attribute, every call inside this function is inlined including the
+calls such inlining introduces to the function (but not recursive calls
+to the function itself), if possible.
 Functions declared with attribute @code{noinline} and similar are not
 inlined.  Whether the function itself is considered for inlining depends
 on its size and the current inlining parameters.
-- 
2.35.3


Re: [PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM

2023-10-04 Thread Richard Biener
On Wed, 4 Oct 2023, Andre Vieira (lists) wrote:

> 
> 
> On 30/08/2023 14:04, Richard Biener wrote:
> > On Wed, 30 Aug 2023, Andre Vieira (lists) wrote:
> > 
> >> This patch adds a new target hook to enable us to adapt the types of return
> >> and parameters of simd clones.  We use this in two ways, the first one is
> >> to
> >> make sure we can create valid SVE types, including the SVE type attribute,
> >> when creating a SVE simd clone, even when the target options do not support
> >> SVE.  We are following the same behaviour seen with x86 that creates simd
> >> clones according to the ABI rules when no simdlen is provided, even if that
> >> simdlen is not supported by the current target options.  Note that this
> >> doesn't mean the simd clone will be used in auto-vectorization.
> > 
> > You are not documenting the bool parameter of the new hook.
> > 
> > What's wrong with doing the adjustment in TARGET_SIMD_CLONE_ADJUST?
> 
> simd_clone_adjust_argument_types is called after that hook, so by the time we
> call TARGET_SIMD_CLONE_ADJUST the types are still scalar, not vector.  The
> same is true for the return type one.
> 
> Also the changes to the types need to be taken into consideration in
> 'adjustments' I think.

Nothing in the three existing implementations of TARGET_SIMD_CLONE_ADJUST
relies on this ordering, I think; how about moving the hook invocation
after simd_clone_adjust_argument_types?

Richard.

> PS: I hope the subject line survived, my email client is having a bit of a
> wobble this morning... it's what you get for updating software :(


[PATCH] Avoid left around copies when value-numbering BBs

2023-10-05 Thread Richard Biener
The following makes sure to treat values whose definition we didn't
visit as available since those by definition must dominate the entry
of the region.  That avoids unpropagated copies after if-conversion
and the resulting SLP discovery failures (SLP discovery doesn't handle
plain copies).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-sccvn.cc (rpo_elim::eliminate_avail): Not
visited value numbers are available themselves.
---
 gcc/tree-ssa-sccvn.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index e46498568cb..d2aab38c2d2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7688,7 +7688,11 @@ rpo_elim::eliminate_avail (basic_block bb, tree op)
 {
   if (SSA_NAME_IS_DEFAULT_DEF (valnum))
return valnum;
-  vn_avail *av = VN_INFO (valnum)->avail;
+  vn_ssa_aux_t valnum_info = VN_INFO (valnum);
+  /* See above.  */
+  if (!valnum_info->visited)
+   return valnum;
+  vn_avail *av = valnum_info->avail;
   if (!av)
return NULL_TREE;
   if (av->location == bb->index)
-- 
2.35.3


[PATCH] Fix SIMD call SLP discovery

2023-10-05 Thread Richard Biener
When we do SLP discovery of SIMD calls we run into the issue that
when the call is neither a builtin nor an internal function we have
cfn == CFN_LAST but internal_fn_p of that returns true.  Since
IFN_LAST isn't vectorizable we fail spuriously.

Fixed by checking for cfn != CFN_LAST && internal_fn_p (cfn)
instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_build_slp_tree_1): Do not
ask for internal_fn_p (CFN_LAST).
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4dd899404d9..08e8418b33e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1084,7 +1084,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  ldst_p = true;
  rhs_code = CFN_MASK_STORE;
}
- else if ((internal_fn_p (cfn)
+ else if ((cfn != CFN_LAST
+   && internal_fn_p (cfn)
&& !vectorizable_internal_fn_p (as_internal_fn (cfn)))
   || gimple_call_tail_p (call_stmt)
   || gimple_call_noreturn_p (call_stmt)
-- 
2.35.3


Re: [PATCH] ipa: Remove ipa_bits

2023-10-05 Thread Richard Biener
-   if (!plats->m_value_range.bottom_p ()
> +   if (do_vr
> +   && !plats->m_value_range.bottom_p ()
> && !plats->m_value_range.top_p ()
> && dbg_cnt (ipa_cp_vr))
>   {
> -   ipa_vr vr (plats->m_value_range.m_vr);
> +   if (bits)
> + {
> +   Value_Range tmp = plats->m_value_range.m_vr;
> +   tree type = ipa_get_type (info, i);
> +   irange &r = as_a <irange> (tmp);
> +   irange_bitmask bm (wide_int::from (bits->get_value (),
> +  TYPE_PRECISION (type),
> +  TYPE_SIGN (type)),
> +  wide_int::from (bits->get_mask (),
> +  TYPE_PRECISION (type),
> +  TYPE_SIGN (type)));
> +   r.update_bitmask (bm);
> +   ipa_vr vr (tmp);
> +   ts->m_vr->quick_push (vr);
> + }
> +   else
> + {
> +   ipa_vr vr (plats->m_value_range.m_vr);
> +   ts->m_vr->quick_push (vr);
> + }
> + }
> +   else if (bits)
> + {
> +   tree type = ipa_get_type (info, i);
> +   Value_Range tmp;
> +   tmp.set_varying (type);
> +   irange &r = as_a <irange> (tmp);
> +   irange_bitmask bm (wide_int::from (bits->get_value (),
> +  TYPE_PRECISION (type),
> +  TYPE_SIGN (type)),
> +  wide_int::from (bits->get_mask (),
> +  TYPE_PRECISION (type),
> +  TYPE_SIGN (type)));
> +   r.update_bitmask (bm);
> +   ipa_vr vr (tmp);
> ts->m_vr->quick_push (vr);
>   }
> else
> @@ -6664,6 +6654,21 @@ ipcp_store_vr_results (void)
> ipa_vr vr;
> ts->m_vr->quick_push (vr);
>   }
> +
> +   if (!dump_file || !bits)
> + continue;
> +
> +   if (!dumped_sth)
> + {
> +   fprintf (dump_file, "Propagated bits info for function %s:\n",
> +node->dump_name ());
> +   dumped_sth = true;
> + }
> +   fprintf (dump_file, " param %i: value = ", i);
> +   print_hex (bits->get_value (), dump_file);
> +   fprintf (dump_file, ", mask = ");
> +   print_hex (bits->get_mask (), dump_file);
> +   fprintf (dump_file, "\n");
>   }
>  }
>  }
> @@ -6696,9 +6701,7 @@ ipcp_driver (void)
>ipcp_propagate_stage (&topo);
>/* Decide what constant propagation and cloning should be performed.  */
>ipcp_decision_stage (&topo);
> -  /* Store results of bits propagation.  */
> -  ipcp_store_bits_results ();
> -  /* Store results of value range propagation.  */
> +  /* Store results of value range and bits propagation.  */
>ipcp_store_vr_results ();
>  
>/* Free all IPCP structures.  */
> --- gcc/ipa-sra.cc.jj 2023-10-05 11:32:40.233739151 +0200
> +++ gcc/ipa-sra.cc 2023-10-05 11:36:45.408378045 +0200
> @@ -4134,22 +4134,8 @@ zap_useless_ipcp_results (const isra_fun
>else if (removed_item)
>  ts->m_agg_values->truncate (dst_index);
>  
> -  bool useful_bits = false;
> -  unsigned count = vec_safe_length (ts->bits);
> -  for (unsigned i = 0; i < count; i++)
> -if ((*ts->bits)[i])
> -{
> -  const isra_param_desc *desc = &(*ifs->m_parameters)[i];
> -  if (desc->locally_unused)
> - (*ts->bits)[i] = NULL;
> -  else
> - useful_bits = true;
> -}
> -  if (!useful_bits)
> -ts->bits = NULL;
> -
>bool useful_vr = false;
> -  count = vec_safe_length (ts->m_vr);
> +  unsigned count = vec_safe_length (ts->m_vr);
>for (unsigned i = 0; i < count; i++)
>  if ((*ts->m_vr)[i].known_p ())
>{
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-05 Thread Richard Biener
70c497f0627a61ea89d3b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +**   ...
> +**   ldr q[0-9]+, \[x0\]
> +**   orr v[0-9]+.4s, #128, lsl #24
> +**   str q[0-9]+, \[x0\], 16
> +**   ...
> +*/
> +void f1 (float32_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   a[i] = -fabsf (a[i]);
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   ldr q[0-9]+, \[x0\]
> +**   orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +**   str q[0-9]+, \[x0\], 16
> +**   ...
> +*/
> +void f2 (float64_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   a[i] = -fabs (a[i]);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c 
> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> new file mode 100644
> index 
> ..10879dea74462d34b26160eeb0bd54ead063166b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <string.h>
> +
> +/*
> +** negabs:
> +**   mov x0, -9223372036854775808
> +**   fmovd[0-9]+, x0
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +double negabs (double x)
> +{
> +   unsigned long long y;
> +   memcpy (&y, &x, sizeof(double));
> +   y = y | (1UL << 63);
> +   memcpy (&x, &y, sizeof(double));
> +   return x;
> +}
> +
> +/*
> +** negabsf:
> +**   moviv[0-9]+.2s, 0x80, lsl 24
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +float negabsf (float x)
> +{
> +   unsigned int y;
> +   memcpy (&y, &x, sizeof(float));
> +   y = y | (1U << 31);
> +   memcpy (&x, &y, sizeof(float));
> +   return x;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> new file mode 100644
> index 
> ..0c7664e6de77a497682952653ffd417453854d52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** t1:
> +**   orr v[0-9]+.2s, #128, lsl #24
> +**   ret
> +*/
> +float32x2_t t1 (float32x2_t a)
> +{
> +  return vneg_f32 (vabs_f32 (a));
> +}
> +
> +/*
> +** t2:
> +**   orr v[0-9]+.4s, #128, lsl #24
> +**   ret
> +*/
> +float32x4_t t2 (float32x4_t a)
> +{
> +  return vnegq_f32 (vabsq_f32 (a));
> +}
> +
> +/*
> +** t3:
> +**   adrpx0, .LC[0-9]+
> +**   ldr q[0-9]+, \[x0, #:lo12:.LC0\]
> +**   orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +**   ret
> +*/
> +float64x2_t t3 (float64x2_t a)
> +{
> +  return vnegq_f64 (vabsq_f64 (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> new file mode 100644
> index 
> ..a60cd31b9294af2dac69eed1c93f899bd5c78fca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +**   moviv[0-9]+.2s, 0x80, lsl 24
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +float32_t f1 (float32_t a)
> +{
> +  return -fabsf (a);
> +}
> +
> +/*
> +** f2:
> +**   mov x0, -9223372036854775808
> +**   fmovd[0-9]+, x0
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +float64_t f2 (float64_t a)
> +{
> +  return -fabs (a);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> new file mode 100644
> index 
> ..1bf34328d8841de8e6b0a5458562a9f00e31c275
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +**   ...
> +**   ld1wz[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
> +**   orr z[0-9]+.s, z[0-9]+.s, #0x8000
> +**   st1wz[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
> +**   ...
> +*/
> +void f1 (float32_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   a[i] = -fabsf (a[i]);
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   ld1dz[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
> +**   orr z[0-9]+.d, z[0-9]+.d, #0x8000
> +**   st1dz[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
> +**   ...
> +*/
> +void f2 (float64_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   a[i] = -fabs (a[i]);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> new file mode 100644
> index 
> ..21f2a8da2a5d44e3d01f6604ca7be87e3744d494
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <string.h>
> +
> +/*
> +** negabs:
> +**   mov x0, -9223372036854775808
> +**   fmovd[0-9]+, x0
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +double negabs (double x)
> +{
> +   unsigned long long y;
> +   memcpy (&y, &x, sizeof(double));
> +   y = y | (1UL << 63);
> +   memcpy (&x, &y, sizeof(double));
> +   return x;
> +}
> +
> +/*
> +** negabsf:
> +**   moviv[0-9]+.2s, 0x80, lsl 24
> +**   orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +**   ret
> +*/
> +float negabsf (float x)
> +{
> +   unsigned int y;
> +   memcpy (&y, &x, sizeof(float));
> +   y = y | (1U << 31);
> +   memcpy (&x, &y, sizeof(float));
> +   return x;
> +}
> +
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [committed] contrib: add mdcompact

2023-10-05 Thread Richard Biener
On Thu, Oct 5, 2023 at 5:49 PM Andrea Corallo  wrote:
>
> Hello all,
>
> this patch checks in mdcompact, the tool written in elisp that I used
> to mass-convert all the multi-choice patterns in the aarch64 back-end to
> the new compact syntax.
>
> I tested it on Emacs 29 (it might run on older versions as well, not
> sure); I also verified it runs cleanly on a few other back-ends (arm,
> loongarch).
>
> The tool can be used to convert a single pattern, an open buffer or
> all md files in a directory.
>
> The tool might need further adjustment to run on some specific
> back-end, in case very happy to help.
>
> This patch was pre-approved here [1].

Does the result generate identical insn-*.cc files?

> Best Regards
>
>   Andrea Corallo
>
> [1] 
>
> contrib/ChangeLog
>
> * mdcompact/mdcompact-testsuite.el: New file.
> * mdcompact/mdcompact.el: Likewise.
> * mdcompact/tests/1.md: Likewise.
> * mdcompact/tests/1.md.out: Likewise.
> * mdcompact/tests/2.md: Likewise.
> * mdcompact/tests/2.md.out: Likewise.
> * mdcompact/tests/3.md: Likewise.
> * mdcompact/tests/3.md.out: Likewise.
> * mdcompact/tests/4.md: Likewise.
> * mdcompact/tests/4.md.out: Likewise.
> * mdcompact/tests/5.md: Likewise.
> * mdcompact/tests/5.md.out: Likewise.
> * mdcompact/tests/6.md: Likewise.
> * mdcompact/tests/6.md.out: Likewise.
> * mdcompact/tests/7.md: Likewise.
> * mdcompact/tests/7.md.out: Likewise.
> ---
>  contrib/mdcompact/mdcompact-testsuite.el |  56 +
>  contrib/mdcompact/mdcompact.el   | 296 +++
>  contrib/mdcompact/tests/1.md |  36 +++
>  contrib/mdcompact/tests/1.md.out |  32 +++
>  contrib/mdcompact/tests/2.md |  25 ++
>  contrib/mdcompact/tests/2.md.out |  21 ++
>  contrib/mdcompact/tests/3.md |  16 ++
>  contrib/mdcompact/tests/3.md.out |  17 ++
>  contrib/mdcompact/tests/4.md |  17 ++
>  contrib/mdcompact/tests/4.md.out |  17 ++
>  contrib/mdcompact/tests/5.md |  12 +
>  contrib/mdcompact/tests/5.md.out |  11 +
>  contrib/mdcompact/tests/6.md |  11 +
>  contrib/mdcompact/tests/6.md.out |  11 +
>  contrib/mdcompact/tests/7.md |  11 +
>  contrib/mdcompact/tests/7.md.out |  11 +
>  16 files changed, 600 insertions(+)
>  create mode 100644 contrib/mdcompact/mdcompact-testsuite.el
>  create mode 100644 contrib/mdcompact/mdcompact.el
>  create mode 100644 contrib/mdcompact/tests/1.md
>  create mode 100644 contrib/mdcompact/tests/1.md.out
>  create mode 100644 contrib/mdcompact/tests/2.md
>  create mode 100644 contrib/mdcompact/tests/2.md.out
>  create mode 100644 contrib/mdcompact/tests/3.md
>  create mode 100644 contrib/mdcompact/tests/3.md.out
>  create mode 100644 contrib/mdcompact/tests/4.md
>  create mode 100644 contrib/mdcompact/tests/4.md.out
>  create mode 100644 contrib/mdcompact/tests/5.md
>  create mode 100644 contrib/mdcompact/tests/5.md.out
>  create mode 100644 contrib/mdcompact/tests/6.md
>  create mode 100644 contrib/mdcompact/tests/6.md.out
>  create mode 100644 contrib/mdcompact/tests/7.md
>  create mode 100644 contrib/mdcompact/tests/7.md.out
>
> diff --git a/contrib/mdcompact/mdcompact-testsuite.el 
> b/contrib/mdcompact/mdcompact-testsuite.el
> new file mode 100644
> index 000..494c0b5cd68
> --- /dev/null
> +++ b/contrib/mdcompact/mdcompact-testsuite.el
> @@ -0,0 +1,56 @@
> +;;; -*- lexical-binding: t; -*-
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software: you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation, either version 3 of the License, or
> +;; (at your option) any later version.
> +
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC.  If not, see .
> +
> +;;; Commentary:
> +
> +;;; Usage:
> +;; $ emacs -batch -l mdcompact.el -l mdcompact-testsuite.el -f 
> ert-run-tests-batch-and-exit
> +
> +;;; Code:
> +
> +(require 'mdcompact)
> +(require 'ert)
> +
> +(defconst mdcompat-test-directory (concat (file-name-directory
> +  (or load-file-name
> +   buffer-file-name))
> + "tests/"))
> +
> +(defun mdcompat-test-run (f)
> +  (with-temp-buffer
> +(insert-file-contents f)
> +(mdcomp-run-at-point)
> +(let ((a (buffer-string))
> + (b (with-temp-buffer
> +  (insert

Re: [PATCH]middle-end ifcvt: Allow any const IFN in conditional blocks

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> When ifcvt was initially added masking was not a thing and as such it was
> rather conservative in what it supported.
> 
> For builtins it only allowed C99 builtin functions which it knew it could fold
> away.
> 
> These days the vectorizer is able to deal with needing to mask IFNs itself.
> vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after
> the operation to emulate the masking.
> 
> This is then used by match.pd to convert the IFN into a masked variant if it's
> available.
> 
> For these reasons the restriction in ifconvert is no longer required and we
> needlessly block vectorization when we can effectively handle the operations.
> 
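For illustration, a minimal sketch of the kind of loop this is about (a
hypothetical example, not taken from the series' tests), assuming the target
provides an optab for copysign so the builtin is replaced by the const
IFN_COPYSIGN call that if-conversion used to reject:

  void
  f (double *restrict a, double *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0.0)
        a[i] = __builtin_copysign (a[i], b[i]);
  }
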
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note: This patch is part of a test series and tests for it are added in the
> AArch64 patch that adds support for the optab.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 
> a8c915913aed267edfb3ebd2c530aeca7cf51832..f76e0d8f2e6e0f59073fa8484b0b2c7a6cdc9783
>  100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1129,6 +1129,16 @@ if_convertible_stmt_p (gimple *stmt, 
> vec refs)
>   return true;
> }
> }
> +
> + /* There are some IFN_s that are used to replace builtins but have the
> +same semantics.  Even if MASK_CALL cannot handle them, vectorizable_call
> +will insert the proper selection, so do not block conversion.  */
> + int flags = gimple_call_flags (stmt);
> + if ((flags & ECF_CONST)
> + && !(flags & ECF_LOOPING_CONST_OR_PURE)
> + && gimple_call_combined_fn (stmt) != CFN_LAST)
> +   return true;
> +

Can you instead move the check inside the if (fndecl) right before
it, changing it to check gimple_call_combined_fn?

OK with that change.

Richard.

>   return false;
>}
>  
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end ifcvt: Add support for conditional copysign

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This adds a masked variant of copysign.  Nothing very exciting, just the
> general machinery to define and use a new masked IFN.
> 
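As a rough sketch of the intended element-wise semantics (the argument order
here is an assumption, modeled on the existing IFN_COND_* binary operations
with the mask first and the fallback value last):

  /* COND_COPYSIGN (mask, a, b, fallback), written out per element.  */
  double
  cond_copysign_element (int mask, double a, double b, double fallback)
  {
    return mask ? __builtin_copysign (a, b) : fallback;
  }
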
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note: This patch is part of a test series and tests for it are added in the
> AArch64 patch that adds support for the optab.
> 
> Ok for master?

OK I guess.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * internal-fn.def (COPYSIGN): New.
>   * match.pd (UNCOND_BINARY, COND_BINARY): Map IFN_COPYSIGN to
>   IFN_COND_COPYSIGN.
>   * optabs.def (cond_copysign_optab, cond_len_copysign_optab): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> a2023ab9c3d01c28f51eb8a59e08c59e4c39aa7f..d9e6bdef6977f7ab9c0290bf4f4568aad0380456
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -268,6 +268,7 @@ DEF_INTERNAL_SIGNED_COND_FN (MOD, ECF_CONST, first, smod, 
> umod, binary)
>  DEF_INTERNAL_COND_FN (RDIV, ECF_CONST, sdiv, binary)
>  DEF_INTERNAL_SIGNED_COND_FN (MIN, ECF_CONST, first, smin, umin, binary)
>  DEF_INTERNAL_SIGNED_COND_FN (MAX, ECF_CONST, first, smax, umax, binary)
> +DEF_INTERNAL_COND_FN (COPYSIGN, ECF_CONST, copysign, binary)
>  DEF_INTERNAL_COND_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_COND_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_COND_FN (AND, ECF_CONST | ECF_NOTHROW, and, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> e12b508ce8ced64e62d94d6df82734cb630b8c1c..1e8d406e6c196b10b48d3c30dc29bffc1bc27bf4
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -93,14 +93,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>plus minus
>mult trunc_div trunc_mod rdiv
>min max
> -  IFN_FMIN IFN_FMAX
> +  IFN_FMIN IFN_FMAX IFN_COPYSIGN
>bit_and bit_ior bit_xor
>lshift rshift)
>  (define_operator_list COND_BINARY
>IFN_COND_ADD IFN_COND_SUB
>IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
>IFN_COND_MIN IFN_COND_MAX
> -  IFN_COND_FMIN IFN_COND_FMAX
> +  IFN_COND_FMIN IFN_COND_FMAX IFN_COND_COPYSIGN
>IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>IFN_COND_SHL IFN_COND_SHR)
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 2ccbe4197b7b700dcdb70e2c67cfcf12d7e381b1..93d4c63700cbaa9fea1177b3d6c7a3e12f609361
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -256,6 +256,7 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
>  OPTAB_D (cond_fnma_optab, "cond_fnma$a")
>  OPTAB_D (cond_fnms_optab, "cond_fnms$a")
>  OPTAB_D (cond_neg_optab, "cond_neg$a")
> +OPTAB_D (cond_copysign_optab, "cond_copysign$F$a")
>  OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
>  OPTAB_D (cond_len_add_optab, "cond_len_add$a")
>  OPTAB_D (cond_len_sub_optab, "cond_len_sub$a")
> @@ -281,6 +282,7 @@ OPTAB_D (cond_len_fms_optab, "cond_len_fms$a")
>  OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
> +OPTAB_D (cond_len_copysign_optab, "cond_len_copysign$F$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-06 Thread Richard Biener
On Thu, Oct 5, 2023 at 10:46 PM Tamar Christina  wrote:
>
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Thursday, October 5, 2023 9:26 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > ; Marcus Shawcroft
> > ; Kyrylo Tkachov 
> > Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >
> > Tamar Christina  writes:
> > >> -Original Message-
> > >> From: Richard Sandiford 
> > >> Sent: Thursday, October 5, 2023 8:29 PM
> > >> To: Tamar Christina 
> > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > >> ; Marcus Shawcroft
> > >> ; Kyrylo Tkachov
> > 
> > >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> > >>
> > >> Tamar Christina  writes:
> > >> > Hi All,
> > >> >
> > >> > This adds an implementation for masked copysign along with an
> > >> > optimized pattern for masked copysign (x, -1).
> > >>
> > >> It feels like we're ending up with a lot of AArch64-specific code
> > >> that just hard- codes the observation that changing the sign is
> > >> equivalent to changing the top bit.  We then need to make sure that
> > >> we choose the best way of changing the top bit for any given situation.
> > >>
> > >> Hard-coding the -1/negative case is one instance of that.  But it
> > >> looks like we also fail to use the best sequence for SVE2.  E.g.
> > >> [https://godbolt.org/z/ajh3MM5jv]:
> > >>
> > >> #include 
> > >>
> > >> void f(double *restrict a, double *restrict b) {
> > >> for (int i = 0; i < 100; ++i)
> > >> a[i] = __builtin_copysign(a[i], b[i]); }
> > >>
> > >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> > >> for (int i = 0; i < 100; ++i)
> > >> a[i] = (a[i] & ~c) | (b[i] & c); }
> > >>
> > >> gives:
> > >>
> > >> f:
> > >> mov x2, 0
> > >> mov w3, 100
> > >> whilelo p7.d, wzr, w3
> > >> .L2:
> > >> ld1dz30.d, p7/z, [x0, x2, lsl 3]
> > >> ld1dz31.d, p7/z, [x1, x2, lsl 3]
> > >> and z30.d, z30.d, #0x7fff
> > >> and z31.d, z31.d, #0x8000
> > >> orr z31.d, z31.d, z30.d
> > >> st1dz31.d, p7, [x0, x2, lsl 3]
> > >> incdx2
> > >> whilelo p7.d, w2, w3
> > >> b.any   .L2
> > >> ret
> > >> g:
> > >> mov x3, 0
> > >> mov w4, 100
> > >> mov z29.d, x2
> > >> whilelo p7.d, wzr, w4
> > >> .L6:
> > >> ld1dz30.d, p7/z, [x0, x3, lsl 3]
> > >> ld1dz31.d, p7/z, [x1, x3, lsl 3]
> > >> bsl z31.d, z31.d, z30.d, z29.d
> > >> st1dz31.d, p7, [x0, x3, lsl 3]
> > >> incdx3
> > >> whilelo p7.d, w3, w4
> > >> b.any   .L6
> > >> ret
> > >>
> > >> I saw that you originally tried to do this in match.pd and that the
> > >> decision was to fold to copysign instead.  But perhaps there's a
> > >> compromise where isel does something with the (new) copysign canonical
> > form?
> > >> I.e. could we go with your new version of the match.pd patch, and add
> > >> some isel stuff as a follow-on?
> > >>
> > >
> > > Sure, if that's what's desired.  But...
> > >
> > > The example you posted above is for instance worse for x86
> > > https://godbolt.org/z/x9ccqxW6T where the first operation has a
> > > dependency chain of 2 and the latter of 3.  It's likely any open coding 
> > > of this
> > operation is going to hurt a target.
> > >
> > > So I'm unsure what isel should transform this into...
> >
> > I didn't mean that we should go straight to using isel for the general 
> > case, just
> > for the new case.  The example above was instead trying to show the general
> > point that hiding the logic ops in target code is a double-edged sword.
>
> I see... but the problem here is that transforming copysign (x, -1) into
> (x | 0x800) would require an integer operation on an FP value.  I'm happy
> to do it but it seems like it'll be an AArch64-only thing anyway.
>
> If we want to do this we need to check can_change_mode_class or a hook.
> Most targets including x86 reject the conversion.  So it'll just be 
> effectively an AArch64
> thing.
>
> You're right that the actual equivalent transformation is this 
> https://godbolt.org/z/KesfrMv5z
> But the target won't allow it.
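For reference, the scalar equivalence being discussed, written out as a
sketch: copysign (x, -1.0) is -|x|, i.e. x with its sign bit forced on, and
the bit-level version needs the value reinterpreted as an integer, which is
exactly the mode change in question:

  #include <stdint.h>
  #include <string.h>

  double
  copysign_neg1 (double x)
  {
    uint64_t bits;
    memcpy (&bits, &x, sizeof bits);
    bits |= (uint64_t) 1 << 63;   /* force the sign bit on: -|x| */
    memcpy (&x, &bits, sizeof bits);
    return x;
  }
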
>
> >
> > The x86_64 example for the -1 case would be
> > https://godbolt.org/z/b9s6MaKs8 where the isel change would be an
> > improvement.  Without that, I guess
> > x86_64 will need to have a similar patch to the AArch64 one.
> >
>
> I think that's to be expected.  I think it's logical that every target just 
> needs to implement
> their optabs optimally.
>
> > That said, https://godbolt.org/z/e6nqoqbMh suggests that powerpc64 is
> > probably relying on the current copysign -> neg/abs transform.
> > (Not sure why the second function uses different IVs from the first.)
> >
> > Personally, I wouldn't be against a target hook that indicated wheth

Re: [PATCH] MATCH: Fix infinite loop between `vec_cond(vec_cond(a, b, 0), c, d)` and `a & b`

2023-10-06 Thread Richard Biener
On Fri, Oct 6, 2023 at 1:15 AM Andrew Pinski  wrote:
>
> Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
> into `vec_cond(a & b, c, d)`, but since in this case `a` is a comparison,
> fold will change `a & b` back into `vec_cond(a,b,0)`, which causes an
> infinite loop.
> The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*)
> only for GIMPLE so we don't get an infinite loop for fold any more.
>
> Note this is a latent bug: these patterns were added in
> r11-2577-g229752afe3156a
> and it was exposed by r14-3350-g47b833a9abe1, where we are now able to remove a
> VIEW_CONVERT_EXPR.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK (also for branches if you like)

Richard.

> PR middle-end/111699
>
> gcc/ChangeLog:
>
> * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr111699-1.c: New test.
> ---
>  gcc/match.pd | 5 +
>  gcc/testsuite/gcc.c-torture/compile/pr111699-1.c | 7 +++
>  2 files changed, 12 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4bdd83e6e06..31bfd8b6b68 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5045,6 +5045,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* (v ? w : 0) ? a : b is just (v & w) ? a : b
> Currently disabled after pass lvec because ARM understands
> VEC_COND_EXPR but not a plain v==w fed to BIT_IOR_EXPR.  */
> +#if GIMPLE
> +/* These can only be done in gimple as fold likes to convert:
> +   (CMP) & N into (CMP) ? N : 0
> +   and we try to match the same pattern again and again. */
>  (simplify
>   (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
> @@ -5079,6 +5083,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (vec_cond @0 @3 (vec_cond:s @1 @2 @3))
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
> +#endif
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
> new file mode 100644
> index 000..87b127ed199
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
> @@ -0,0 +1,7 @@
> +typedef unsigned char __attribute__((__vector_size__ (8))) V;
> +
> +void
> +foo (V *v)
> +{
> +  *v =  (V) 0x107B9A7FF >= (*v <= 0);
> +}
> --
> 2.39.3
>


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
reduc_info);
> -  gcc_assert (code.is_tree_code ());
> +  gcc_assert (code.is_tree_code () || cond_fn_p);
>return vectorize_fold_left_reduction
> (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -lens);
> +code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +reduc_index, masks, lens);
>  }
>  
>bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8254,14 +8334,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>  
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> + definition.  */
>vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>single_defuse_cycle && reduc_index == 0
>? NULL_TREE : op.ops[0], &vec_oprnds0,
>single_defuse_cycle && reduc_index == 1
>? NULL_TREE : op.ops[1], &vec_oprnds1,
> -  op.num_ops == 3
> -  && !(single_defuse_cycle && reduc_index == 2)
> +  op.num_ops == 4
> +  || (op.num_ops == 3
> +  && !(single_defuse_cycle && reduc_index == 2))
>? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> + definition.  */
>if (single_defuse_cycle)
>  {
>gcc_assert (!slp_node);
> @@ -8301,7 +8387,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>   }
>else
>   {
> -   if (op.num_ops == 3)
> +   if (op.num_ops >= 3)
>   vop[2] = vec_oprnds2[i];
>  
> if (masked_loop_p && mask_by_cond_expr)
> @@ -8314,10 +8400,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> if (emulated_mixed_dot_prod)
>   new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>   vec_dest, vop);
> -   else if (code.is_internal_fn ())
> +
> +   else if (code.is_internal_fn () && !cond_fn_p)
>   new_stmt = gimple_build_call_internal (internal_fn (code),
>  op.num_ops,
>  vop[0], vop[1], vop[2]);
> +   else if (code.is_internal_fn () && cond_fn_p)
> + new_stmt = gimple_build_call_internal (internal_fn (code),
> +op.num_ops,
> +vop[0], vop[1], vop[2],
> +vop[1]);
> else
>   new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>   vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961..e22067400af 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2319,7 +2319,7 @@ extern tree vect_create_addr_base_for_vector_ref 
> (vec_info *,
> tree);
>  
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info 
> loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-10-06 Thread Richard Biener
On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
 wrote:
>
> This is a new version of the patch on "nested FMA".
> Sorry for updating this after so long, I've been studying and
> writing micro cases to sort out the cause of the regression.

Sorry for taking so long to reply.

> First, following previous discussion:
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
>
> 1. From testing more altered cases, I don't think the
> problem is that reassociation works locally. In that:
>
>   1) On the example with multiplications:
>
> tmp1 = a + c * c + d * d + x * y;
> tmp2 = x * tmp1;
> result += (a + c + d + tmp2);
>
>   Given "result" rewritten by width=2, the performance is
>   worse if we rewrite "tmp1" with width=2. In contrast, if we
>   remove the multiplications from the example (and make "tmp1"
>   not single-used), and still rewrite "result" by width=2, then
>   rewriting "tmp1" with width=2 is better. (Make sense because
>   the tree's depth at "result" is still smaller if we rewrite
>   "tmp1".)
>
>   2) I tried to modify the assembly code of the example without
>   FMA, so the width of "result" is 4. On Ampere1 there's no
>   obvious improvement. So although this is an interesting
>   problem, it doesn't seem like the cause of the regression.

OK, I see.

> 2. From assembly code of the case with FMA, one problem is
> that, rewriting "tmp1" to parallel didn't decrease the
> minimum CPU cycles (taking MULT_EXPRs into account), but
> increased code size, so the overhead is increased.
>
>a) When "tmp1" is not re-written to parallel:
> fmadd d31, d2, d2, d30
> fmadd d31, d3, d3, d31
> fmadd d31, d4, d5, d31  //"tmp1"
> fmadd d31, d31, d4, d3
>
>b) When "tmp1" is re-written to parallel:
> fmul  d31, d4, d5
> fmadd d27, d2, d2, d30
> fmadd d31, d3, d3, d31
> fadd  d31, d31, d27 //"tmp1"
> fmadd d31, d31, d4, d3
>
> For version a), there are 3 dependent FMAs to calculate "tmp1".
> For version b), there are also 3 dependent instructions in the
> longer path: the 1st, 3rd and 4th.

Yes, it doesn't really change anything.  The patch has

+  /* If there's code like "acc = a * b + c * d + acc" in a tight loop, some
+ uarchs can execute results like:
+
+   _1 = a * b;
+   _2 = .FMA (c, d, _1);
+   acc_1 = acc_0 + _2;
+
+ in parallel, while turning it into
+
+   _1 = .FMA(a, b, acc_0);
+   acc_1 = .FMA(c, d, _1);
+
+ hinders that, because then the first FMA depends on the result
of preceding
+ iteration.  */

I can't see what can be run in parallel for the first case.  The .FMA
depends on the multiplication a * b.  Iff the uarch somehow decomposes
.FMA into multiply + add then the c * d multiply could run in parallel
with the a * b multiply which _might_ be able to hide some of the
latency of the full .FMA.  Like on x86 Zen FMA has a latency of 4
cycles but a multiply only 3.  But I never got confirmation from any
of the CPU designers that .FMAs are issued when the multiply
operands are ready and the add operand can be forwarded.

I also wonder why the multiplications of the two-FMA sequence
then cannot be executed at the same time?  So I have some doubt
of the theory above.

Iff this really is the reason for the sequence to execute with lower
overall latency and we want to attack this on GIMPLE then I think
we need a target hook telling us this fact (I also wonder if such
behavior can be modeled in the scheduler pipeline description at all?)
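As a concrete version of such a toy model (illustrative only; the latencies
are assumptions taken from the Zen numbers quoted above, FMA = 4 and
FADD = 3, not measurements), counting only the loop-carried dependency chain
of the two rewrites:

  #include <stdio.h>

  #define FMA_LAT 4   /* assumed FMA latency */
  #define ADD_LAT 3   /* assumed FADD latency */

  int
  main (void)
  {
    /* _1 = a * b;  _2 = .FMA (c, d, _1);  acc_1 = acc_0 + _2;
       only the final add sits on the acc-to-acc (loop-carried) path.  */
    int separate_add = ADD_LAT;

    /* _1 = .FMA (a, b, acc_0);  acc_1 = .FMA (c, d, _1);
       both FMAs sit on the acc-to-acc path.  */
    int chained_fmas = 2 * FMA_LAT;

    printf ("separate add: %d cycles/iter, chained FMAs: %d cycles/iter\n",
            separate_add, chained_fmas);
    return 0;
  }
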

> So it seems to me the current get_reassociation_width algorithm
> isn't optimal in the presence of FMA. So I modified the patch to
> improve get_reassociation_width, rather than check for code
> patterns. (Although there could be some other complicated
> factors so the regression is more obvious when there's "nested
> FMA". But with this patch that should be avoided or reduced.)
>
> With this patch 508.namd_r 1-copy run has 7% improvement on
> Ampere1, on Intel Xeon there's about 3%. While I'm still
> collecting data on other CPUs, I'd like to know how do you
> think of this.
>
> About changes in the patch:
>
> 1. When the op list forms a complete FMA chain, try to search
> for a smaller width considering the benefit of using FMA. With
> a smaller width, the increment of code size is smaller when
> breaking the chain.

But this is all highly target specific (code size even more so).

How I understand your approach to fixing the issue leads me to
the suggestion to prioritize parallel rewriting, thus alter rank_ops_for_fma,
taking the reassoc width into account (the computed width should be
unchanged from rank_ops_for_fma) instead of "fixing up" the parallel
rewriting of FMAs (well, they are not yet formed of course).
get_reassociation_width has 'get_required_cycles', the above theory
could be verified with a very simple toy pipeline model.  We'd have
to ask the target for the reassoc width for MULT_EXPRs as well (or ma

Re: [PATCH 01/22] Add condition coverage profiling

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Jan Hubicka wrote:

[...]
> Richi, can you please look at the gimple matching part?

What did you have in mind?  I couldn't find anything obvious in the
patch counting as gimple matching - do you have a pointer?

Thanks,
Richard.


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
On Fri, 6 Oct 2023, Robin Dapp wrote:

> > We might need a similar assert
> > 
> >   gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
> >   && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?
> 
> erm, obviously not that exact assert but more something like
> 
> if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
>   {
> if (dump)
>   ...
> return false;
>   }
> 
> or so.

Yeah, of course the whole point of a fold-left reduction is to
_not_ give up without -ffast-math which is why I added the above.
I obviously didn't fully verify what happens for an original
MINUS_EXPR.  I think it's required to give up for -frounding-math,
but I think I might have put the code to do that in a generic
enough place.

For x86 you need --param vect-partial-vector-usage=2 and an
AVX512 enabled arch like -march=skylake-avx512 or -march=znver4.

I think transforming - x to + (-x) works for signed zeros.
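
A quick scalar check of that claim (sketch only; with default rounding
x - y and x + (-y) are the same operation, including for an initial -0.0):

  #include <stdio.h>

  int
  main (void)
  {
    double init = -0.0;
    double sub = init - 0.0;      /* neutral op for MINUS_EXPR */
    double add = init + (-0.0);   /* negated-operand form */
    printf ("%g %g (signbits: %d %d)\n", sub, add,
            __builtin_signbit (sub) != 0, __builtin_signbit (add) != 0);
    return 0;
  }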

So if you think you got everything correct the patch is OK as-is,
I just wasn't sure - maybe the neutral_element change deserves
a comment as to how MINUS_EXPR is handled.

Richard.


Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-07 Thread Richard Biener



> On 07.10.2023 at 11:23, Richard Sandiford  wrote:
> 
> Richard Biener  writes:
>> On Thu, 5 Oct 2023, Tamar Christina wrote:
>> 
>>>> I suppose the idea is that -abs(x) might be easier to optimize with other
>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>>>> 
>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>>>> canonical than copysign.
>>>> 
>>>>> Should I try removing this?
>>>> 
>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
>>>> 
>>> 
>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>>> canonical and allows a target to expand this sequence efficiently.  Such
>>> sequences are common in scientific code working with gradients.
>>> 
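For reference, a minimal example of the source pattern being targeted
(hypothetical, not one of the new tests): negating an absolute value, which
after this patch is canonicalized to copysign (x, -1):

  double
  neg_abs (double x)
  {
    return -__builtin_fabs (x);
  }
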
>>> various optimizations in match.pd only happened on COPYSIGN but not 
>>> COPYSIGN_ALL
>>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to 
>>> only
>> 
>> That's not true:
>> 
>> (define_operator_list COPYSIGN
>>BUILT_IN_COPYSIGNF
>>BUILT_IN_COPYSIGN
>>BUILT_IN_COPYSIGNL
>>IFN_COPYSIGN)
>> 
>> but they miss the extended float builtin variants like
>> __builtin_copysignf16.  Also see below
>> 
>>> the C99 builtins and so doesn't work for vectors.
>>> 
>>> The patch expands these optimizations to work on COPYSIGN_ALL.
>>> 
>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>>> which I remove since this is a less efficient form.  The testsuite is also
>>> updated in light of this.
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>> 
>>> Ok for master?
>>> 
>>> Thanks,
>>> Tamar
>>> 
>>> gcc/ChangeLog:
>>> 
>>>PR tree-optimization/109154
>>>* match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>>expand existing copysign optimizations.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>PR tree-optimization/109154
>>>* gcc.dg/fold-copysign-1.c: Updated.
>>>* gcc.dg/pr55152-2.c: Updated.
>>>* gcc.dg/tree-ssa/abs-4.c: Updated.
>>>* gcc.dg/tree-ssa/backprop-6.c: Updated.
>>>* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>>* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>>* gcc.target/aarch64/fneg-abs_1.c: New test.
>>>* gcc.target/aarch64/fneg-abs_2.c: New test.
>>>* gcc.target/aarch64/fneg-abs_3.c: New test.
>>>* gcc.target/aarch64/fneg-abs_4.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>>* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>>> 
>>> --- inline copy of patch ---
>>> 
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 
>>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
>>>  100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>> 
>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>> (for coss (COS COSH)
>>> - copysigns (COPYSIGN)
>>> - (simplify
>>> -  (coss (copysigns @0 @1))
>>> -   (coss @0)))
>>> + (for copysigns (COPYSIGN_ALL)
>> 
>> So this ends up generating for example the match
>> (cosf (copysignl ...)) which doesn't make much sense.
>> 
>> The lock-step iteration did
>> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
>> which is leaner but misses the case of
>> (cosf (ifn_copysign ..)) - that's probably what you are
>> after with this change.
>> 
>> That said, there isn't a nice solution (without altering the match.pd
>> IL).  There's the explicit solution, spelling out all combinations.
>> 
>> So if we want to go with your pragmatic solution changing this
>> to use COPYSIGN_ALL isn't necessary, only changing the lock-step
>> for iteration to a cross product for iteration is.
>> 
>> Changing just this pattern to
>> 
>> (for coss (COS COSH)
>> (for copysigns (COPYSIGN)
>>  (simplify
>>   (coss (copysigns @0 @1))
>>   (coss @0
>> 
>> inc

Re: [PATCH] TEST: Fix XPASS of TSVC testsuites for RVV

2023-10-08 Thread Richard Biener
{ { ! 
> aarch64_sve } && { ! riscv_v } }  } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s3111.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s3111.c
> index c7b2d614f10..4163dd8e422 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s3111.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s3111.c
> @@ -41,4 +41,4 @@ int main (int argc, char **argv)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { { ! 
> aarch64_sve } && { ! riscv_v } }  } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> index 58898583c26..98ba7522471 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
> @@ -44,4 +44,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } 
> } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> riscv_v } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s441.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s441.c
> index e73f782ba01..480e5975a36 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s441.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s441.c
> @@ -42,4 +42,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { { ! 
> aarch64_sve } && { ! riscv_v } }  } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s443.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s443.c
> index a07800b7c95..709413fa6f8 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s443.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s443.c
> @@ -47,4 +47,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { { ! 
> aarch64_sve } && { ! riscv_v } }  } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vif.c 
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vif.c
> index 48e1c141977..6eba46403b4 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vif.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vif.c
> @@ -38,4 +38,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { { ! 
> aarch64_sve } && { ! riscv_v } }  } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-09 Thread Richard Biener
On Sat, 7 Oct 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> >> On 07.10.2023 at 11:23, Richard Sandiford  wrote:
> >> >> Richard Biener  writes:
> >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> >>> 
> >>>>> I suppose the idea is that -abs(x) might be easier to optimize with 
> >>>>> other
> >>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> >>>>> 
> >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) is 
> >>>>> less
> >>>>> canonical than copysign.
> >>>>> 
> >>>>>> Should I try removing this?
> >>>>> 
> >>>>> I'd say yes (and put the reverse canonicalization next to this pattern).
> >>>>> 
> >>>> 
> >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> >>>> canonical and allows a target to expand this sequence efficiently.  Such
> >>>> sequences are common in scientific code working with gradients.
> >>>> 
> >>>> various optimizations in match.pd only happened on COPYSIGN but not 
> >>>> COPYSIGN_ALL
> >>>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted 
> >>>> to only
> >>> 
> >>> That's not true:
> >>> 
> >>> (define_operator_list COPYSIGN
> >>>BUILT_IN_COPYSIGNF
> >>>BUILT_IN_COPYSIGN
> >>>BUILT_IN_COPYSIGNL
> >>>IFN_COPYSIGN)
> >>> 
> >>> but they miss the extended float builtin variants like
> >>> __builtin_copysignf16.  Also see below
> >>> 
> >>>> the C99 builtins and so doesn't work for vectors.
> >>>> 
> >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> >>>> 
> >>>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs 
> >>>> (x))
> >>>> which I remove since this is a less efficient form.  The testsuite is 
> >>>> also
> >>>> updated in light of this.
> >>>> 
> >>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>>> 
> >>>> Ok for master?
> >>>> 
> >>>> Thanks,
> >>>> Tamar
> >>>> 
> >>>> gcc/ChangeLog:
> >>>> 
> >>>>PR tree-optimization/109154
> >>>>* match.pd: Add new neg+abs rule, remove inverse copysign rule and
> >>>>expand existing copysign optimizations.
> >>>> 
> >>>> gcc/testsuite/ChangeLog:
> >>>> 
> >>>>PR tree-optimization/109154
> >>>>* gcc.dg/fold-copysign-1.c: Updated.
> >>>>* gcc.dg/pr55152-2.c: Updated.
> >>>>* gcc.dg/tree-ssa/abs-4.c: Updated.
> >>>>* gcc.dg/tree-ssa/backprop-6.c: Updated.
> >>>>* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> >>>>* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> >>>>* gcc.target/aarch64/fneg-abs_1.c: New test.
> >>>>* gcc.target/aarch64/fneg-abs_2.c: New test.
> >>>>* gcc.target/aarch64/fneg-abs_3.c: New test.
> >>>>* gcc.target/aarch64/fneg-abs_4.c: New test.
> >>>>* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> >>>>* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> >>>>* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> >>>>* gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> >>>> 
> >>>> --- inline copy of patch ---
> >>>> 
> >>>> diff --git a/gcc/match.pd b/gcc/match.pd
> >>>> index 
> >>>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
> >>>>  100644
> >>>> --- a/gcc/match.pd
> >>>> +++ b/gcc/match.pd
> >>>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>>> 
> >>>> /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> >>>> (for coss (COS COSH)
> >>>> - copysigns (COPYSIGN)
> >>>> - (simplify
> >>>> -  (coss (copysigns @0 @1))
> >>>> -   (coss @0)))
> >>>> + (for copysigns (COPYSIGN_ALL)
> >>> 
> >&

Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread Richard Biener
gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> index 38994ea82a5..3832a660023 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c
> @@ -41,5 +41,5 @@ neg_xi (double *x)
>return res_3;
> }
> -/* { dg-final { scan-tree-dump { = \.COND_ADD} "vect" { target { 
> vect_double_cond_arith && vect_fully_masked } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
> vect_double_cond_arith && vect_fully_masked } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "vect" { target { 
> vect_double_cond_arith && vect_fully_masked } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target 
> { vect_double_cond_arith && vect_fully_masked } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> index 1af0fe642a0..5bb75206a68 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> @@ -52,8 +52,8 @@ main (void)
>return 0;
> }
> -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?MUL} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?RDIV} "optimized" { target 
> vect_double_cond_arith } } } */
> /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
> vect_double_cond_arith } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> index ec3d9db4202..8a168081197 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> @@ -54,8 +54,8 @@ main (void)
>return 0;
> }
> -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?ADD} "optimized" { target 
> { vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?SUB} "optimized" { target 
> { vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?MUL} "optimized" { target 
> { vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_L?E?N?_?RDIV} "optimized" { target 
> { vect_double_cond_arith && vect_masked_store } } } } */
> /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> index 2aeebd44f83..c3257890735 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> @@ -56,8 +56,8 @@ main (void)
> }
> /* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { 
> target vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?ADD} 1 "optimized" { 
> target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?SUB} 1 "optimized" { 
> target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?MUL} 1 "optimized" { 
> target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_L?E?N?_?RDIV} 1 "optimized" 
> { target vect_double_cond_arith } } } */
> /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
> vect_double_cond_arith } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-09 Thread Richard Biener
On Sun, 8 Oct 2023, Jeff Law wrote:

> 
> 
> On 10/8/23 05:35, Juzhe-Zhong wrote:
> > RVV (RISC-V Vector) doesn't enable vect_unpack, but we still vectorize this
> > case well.
> > So, adjust dump check for RVV.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.dg/vect/vect-multitypes-16.c: Fix dump FAIL of RVV.
> I'd hoped to avoid a bunch of risc-v special casing in the generic part of the
> testsuite.  Basically the more we have target specific conditionals rather
> than conditionals using properties, the more likely we are to keep revisiting
> this stuff over time and possibly for other architectures as well.
> 
> What is it about risc-v's vector support that allows it to optimize this case?
> Is it the same property that allows us to handle the outer loop vectorization
> tests that you changed in another patch?

I suspect for VLA vectorization we can use direct conversion from
char to long long here?  I also notice the testcase uses 'char',
not specifying its sign.  So either of [sz]extVxyzDIVxyzQI is possibly
provided by RISCV?  (or possibly via some intermediate types in a
multi-step conversion)
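
A sketch of the kind of widening loop being discussed (hypothetical; not the
actual vect-multitypes-16.c source), where each char element is converted to
a 64-bit value and a VLA target could use a direct sign/zero-extension:

  void
  widen (char *x, long long *y, int n)
  {
    for (int i = 0; i < n; i++)
      y[i] = x[i];
  }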

For non-VLA and with the single vector size restriction we'd need
unpacking.

So it might be better

 { target { vect_unpack || { vect_vla && vect_sext_char_longlong } } }

where I think neither vect_vla nor vect_sext_char_longlong exists.

Richard - didn't you run into similar things with SVE?

Richard.


> Neither an ACK nor NAK right now.
> 
> Jeff
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] TEST: Fix dump FAIL of vect-multitypes-16.c for RVV

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Robin Dapp wrote:

> > Maybe I should pretend RVV support vect_pack/vect_unpack and enable
> > all the tests in target-supports.exp?
> 
> The problem is that vect_pack/unpack is an overloaded term in the
> moment meaning "vector conversion" (promotion/demotion) or so.  This
> test does not require pack/unpack for successful vectorization but our
> method of keeping the number of elements the same works as well.  The
> naming probably precedes vectorizer support for that.
> I can't imagine cases where vectorization would fail because of this
> as we can always work around it some way.  So from that point of view
> "pretending" to support it would work.  However in case somebody wants
> to really write a specific test case that relies on pack/unpack
> (maybe there are already some?) "pretending" would fail.

I suspect that for VLS you need to provide the respective patterns
because of the single vector-size restriction.

> I lean towards "pretending" at the moment ;)  The other option would be
> to rename that and audit all test cases.
> 
> Note there are also vect_intfloat_cvt as well as others that don't have
> pack/unpack in the name (that we also probably still need to enable).

Yeah well - the dejagnu "targets" are mostly too broad, but as usual
time is spent elsewhere instead of on cleaning up the mess ;)

It might be more useful to provide vect_.. dg targets
because then it's at least obvious what is meant.  Or group
things as vect_float vect_int.

Richard.

> Regards
>  Robin
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Support movmisalign of RVV VLA modes

2023-10-09 Thread Richard Biener
On Sun, Oct 8, 2023 at 9:22 AM Juzhe-Zhong  wrote:
>
> Previously, I removed the movmisalign pattern to fix the execution FAILs in 
> this commit:
> https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
>
> At the beginning I was thinking that RVV doesn't allow misaligned accesses, so I
> removed that pattern.
> However, after deep investigation && reading the RVV ISA again and experimenting on
> SPIKE,
> I realized I was wrong.
>
> RVV ISA reference: 
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
>
> "If an element accessed by a vector memory instruction is not naturally 
> aligned to the size of the element,
>  either the element is transferred successfully or an address misaligned 
> exception is raised on that element."

But you gobble the "or .." into an existing -mstrict-align flag - are
you sure all implementations are
self-consistent with handling non-vector memory instructions and
vector memory instructions here?
At least the above wording doesn't seem to impose such a requirement.

> It's obvious that RVV ISA does allow misaligned vector load/store.
>
> And experiment and confirm on SPIKE:
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
> z   ra 00010158 sp 003ffb40 gp 
> 00012c48
> tp  t0 000110da t1 000f t2 
> 
> s0 00013460 s1  a0 00012ef5 a1 
> 00012018
> a2 00012a71 a3 000d a4 0004 a5 
> 00012a71
> a6 00012a71 a7 00012018 s2  s3 
> 
> s4  s5  s6  s7 
> 
> s8  s9  sA  sB 
> 
> t3  t4  t5  t6 
> 
> pc 00010258 va/inst 020660a7 sr 80026620
> Store/AMO access fault!
>
> [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike
>  --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 
> ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64
>   a.out
> bbl loader
>
> We can see SPIKE can pass previous *FAILED* execution tests with specifying 
> --misaligned to SPIKE.
>
> So, to honor the RVV ISA spec, we should add the movmisalign pattern back based on the
> investigations I have done, since
> it can improve multiple vectorization tests and fix dump FAILs.
>
> This patch fixes the following dump FAILs:
>
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized 
> "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  
> scan-tree-dump-not optimized "Invalid sum"
> FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized 
> "Invalid sum"
>
> Consider this following case:
>
> struct s {
> unsigned i : 31;
> char a : 4;
> };
>
> #define N 32
> #define ELT0 {0x7FFFUL, 0}
> #define ELT1 {0x7FFFUL, 1}
> #define ELT2 {0x7FFFUL, 2}
> #define ELT3 {0x7FFFUL, 3}
> #define RES 48
> struct s A[N]
>   = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
>   ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
>   ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
>   ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
>
> int __attribute__ ((noipa))
> f(struct s *ptr, unsigned n) {
> int res = 0;
> for (int i = 0; i < n; ++i)
>   res += ptr[i].a;
> return res;
> }
>
> -O3 -S -fno-vect-cost-model (default strict-align):
>
> f:
> mv  a4,a0
> beq a1,zero,.L9
> addiw   a5,a1,-1
> li  a3,14
> vsetivlizero,16,e64,m8,ta,ma
> bleua5,a3,.L3
> andia5,a0,127
> bne a5,zero,.L3
> srliw   a3,a1,4
> sllia3,a3,7
> li  a0,15
> sllia0,a0,32
> add a3,a3,a4
> mv  a5,a4
>   

Re: [PATCH] TEST: Fix dump FAIL for RVV (RISCV-V vector)

2023-10-09 Thread Richard Biener
On Sun, 8 Oct 2023, Juzhe-Zhong wrote:

> As this showed: https://godbolt.org/z/3K9oK7fx3
> 
> ARM SVE shows FOLD_EXTRACT_LAST 2 times whereas RVV shows it 4 times.
> 
> This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and
> fold_extract_last fail at the first analysis attempt.
> Then we succeed the second time.
> 
> So RVV shows "FOLD_EXTRACT_LAST" 4 times.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> index 8820075b1dc..8ea8c538713 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
> @@ -42,6 +42,7 @@ main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
> +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && 
> vect_pack_trunc } } } } */
> +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
> FOLD_EXTRACT_LAST" 4 "vect" { target { { vect_fold_extract_last } && { ! 
> vect_pack_trunc } } } } } */
>  /* { dg-final { scan-tree-dump-times "condition expression based on integer 
> induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] TEST: Fix dump FAIL for RVV

2023-10-09 Thread Richard Biener
On Sun, 8 Oct 2023, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:

OK

> 
>   * gcc.dg/vect/bb-slp-cond-1.c: Fix dump FAIL for RVV.
>   * gcc.dg/vect/pr57705.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 4 ++--
>  gcc/testsuite/gcc.dg/vect/pr57705.c   | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> index c8024429e9c..e1ebc23505f 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> @@ -47,6 +47,6 @@ int main ()
>  }
>  
>  /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 
> 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { 
> target vect_element_align } } } */
> -/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { 
> vect_element_align && { ! amdgcn-*-* } } } } } */
> -/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target 
> amdgcn-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { 
> vect_element_align && { { ! amdgcn-*-* } && { ! riscv_v } } } } } } */
> +/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target { 
> amdgcn-*-* || riscv_v } } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c 
> b/gcc/testsuite/gcc.dg/vect/pr57705.c
> index 39c32946d74..2dacea0a7a7 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr57705.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
> @@ -64,5 +64,5 @@ main ()
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target 
> vect_pack_trunc } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { 
> ! vect_pack_trunc } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 3 "vect" { target { 
> vect_pack_trunc || riscv_v } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 2 "vect" { target { 
> { ! vect_pack_trunc } && { ! riscv_v } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] TEST: Fix XPASS of outer loop vectorization tests for RVV

2023-10-09 Thread Richard Biener
On Sun, 8 Oct 2023, Juzhe-Zhong wrote:

> Even though RVV doesn't enable vec_unpack/vec_pack, it succeeds on outer loop
> vectorization.

How so?  I think this maybe goes with the other similar change.

That is, when we already have specific target checks, adding riscv-*-*
looks sensible, but when we don't we should figure out whether there's a
capability we can (add and) test instead.

> Fix these following XPASS FAILs:
> 
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/no-scevccp-outer-16.c: Fix XPASS for RVV.
>   * gcc.dg/vect/no-scevccp-outer-17.c: Ditto.
>   * gcc.dg/vect/no-scevccp-outer-19.c: Ditto.
>   * gcc.dg/vect/no-scevccp-outer-21.c: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> index c7c2fa8a504..12179949e00 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> index ba904a6c03e..86554a98169 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> @@ -65,4 +65,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> index 5cd4049d08c..624b54accf4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> @@ -49,4 +49,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! {vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_unpack } } && { ! {riscv_v } } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> index 72e53c2bfb0..b30a5d78819 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> @@ -59,4 +59,4 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! { vect_pack_trunc } } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { { ! {vect_pack_trunc } } && { ! {riscv_v } } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Richard Biener
On Fri, 6 Oct 2023, Robin Dapp wrote:

> > So if you think you got everything correct the patch is OK as-is,
> > I just wasn't sure - maybe the neutral_element change deserves
> > a comment as to how MINUS_EXPR is handled.
> 
> Heh, I never think I got everything correct ;)
> 
> Added this now:
> 
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> +  /* We support MINUS_EXPR by negating the operand.  This also preserves an
> + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
> + (-0.0) = -0.0.  */
> 
> What I still found is that aarch64 ICEs at the assertion you added
> with -frounding-math.  Therefore I changed it to:
> 
> - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
> + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot vectorize fold-left reduction 
> because"
> +" signed zeros cannot be preserved.\n");
> + return false;
> +   }
> 
> No code changes apart from that.  Will leave it until Monday and push then
> barring any objections.

Hmm, the function is called at transform time so this shouldn't help
avoid the ICE.  I expected we'd refuse to vectorize _any_ reduction
when sign-dependent rounding is in effect?  OTOH maybe sign-dependent
rounding is OK but only when we use an unconditional fold-left
(so a loop mask from fully masking is OK but not an original COND_ADD?).

Still the check should be done in vectorizable_reduction, not only
during transform (there the assert is proper, if we can distinguish
the loop mask vs. the COND_ADD here, otherwise just remove it).

Richard.


> Thanks for the pointers.
> 
> Regards
>  Robin
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Andrew Pinski wrote:

> On Mon, Oct 9, 2023 at 12:20?AM Richard Biener  wrote:
> >
> > On Sat, 7 Oct 2023, Richard Sandiford wrote:
> >
> > > Richard Biener  writes:
> > > >> On 07.10.2023 at 11:23, Richard Sandiford wrote: 
> > > >> >> Richard Biener  
> > > >> writes:
> > > >>> On Thu, 5 Oct 2023, Tamar Christina wrote:
> > > >>>
> > > >>>>> I suppose the idea is that -abs(x) might be easier to optimize with 
> > > >>>>> other
> > > >>>>> patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > > >>>>>
> > > >>>>> For abs vs copysign it's a canonicalization, but (negate (abs @0)) 
> > > >>>>> is less
> > > >>>>> canonical than copysign.
> > > >>>>>
> > > >>>>>> Should I try removing this?
> > > >>>>>
> > > >>>>> I'd say yes (and put the reverse canonicalization next to this 
> > > >>>>> pattern).
> > > >>>>>
> > > >>>>
> > > >>>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is 
> > > >>>> more
> > > >>>> canonical and allows a target to expand this sequence efficiently.  
> > > >>>> Such
> > > >>>> sequences are common in scientific code working with gradients.
> > > >>>>
> > > >>>> various optimizations in match.pd only happened on COPYSIGN but not 
> > > >>>> COPYSIGN_ALL
> > > >>>> which means they exclude IFN_COPYSIGN.  COPYSIGN however is 
> > > >>>> restricted to only
> > > >>>
> > > >>> That's not true:
> > > >>>
> > > >>> (define_operator_list COPYSIGN
> > > >>>BUILT_IN_COPYSIGNF
> > > >>>BUILT_IN_COPYSIGN
> > > >>>BUILT_IN_COPYSIGNL
> > > >>>IFN_COPYSIGN)
> > > >>>
> > > >>> but they miss the extended float builtin variants like
> > > >>> __builtin_copysignf16.  Also see below
> > > >>>
> > > >>>> the C99 builtins and so doesn't work for vectors.
> > > >>>>
> > > >>>> The patch expands these optimizations to work on COPYSIGN_ALL.
> > > >>>>
> > > >>>> There is an existing canonicalization of copysign (x, -1) to fneg 
> > > >>>> (fabs (x))
> > > >>>> which I remove since this is a less efficient form.  The testsuite 
> > > >>>> is also
> > > >>>> updated in light of this.
> > > >>>>
> > > >>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >>>>
> > > >>>> Ok for master?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Tamar
> > > >>>>
> > > >>>> gcc/ChangeLog:
> > > >>>>
> > > >>>>PR tree-optimization/109154
> > > >>>>* match.pd: Add new neg+abs rule, remove inverse copysign rule and
> > > >>>>expand existing copysign optimizations.
> > > >>>>
> > > >>>> gcc/testsuite/ChangeLog:
> > > >>>>
> > > >>>>PR tree-optimization/109154
> > > >>>>* gcc.dg/fold-copysign-1.c: Updated.
> > > >>>>* gcc.dg/pr55152-2.c: Updated.
> > > >>>>* gcc.dg/tree-ssa/abs-4.c: Updated.
> > > >>>>* gcc.dg/tree-ssa/backprop-6.c: Updated.
> > > >>>>* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> > > >>>>* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> > > >>>>* gcc.target/aarch64/fneg-abs_1.c: New test.
> > > >>>>* gcc.target/aarch64/fneg-abs_2.c: New test.
> > > >>>>* gcc.target/aarch64/fneg-abs_3.c: New test.
> > > >>>>* gcc.target/aarch64/fneg-abs_4.c: New test.
> > > >>>>* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > > >>>>* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > > >>>>*

Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-09 Thread Richard Biener
On Mon, Oct 9, 2023 at 11:39 AM Tamar Christina  wrote:
>
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Saturday, October 7, 2023 10:58 AM
> > To: Richard Biener 
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> > nd ; Richard Earnshaw ;
> > Marcus Shawcroft ; Kyrylo Tkachov
> > 
> > Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >
> > Richard Biener  writes:
> > > On Thu, Oct 5, 2023 at 10:46 PM Tamar Christina
> >  wrote:
> > >>
> > >> > -Original Message-
> > >> > From: Richard Sandiford 
> > >> > Sent: Thursday, October 5, 2023 9:26 PM
> > >> > To: Tamar Christina 
> > >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > >> > ; Marcus Shawcroft
> > >> > ; Kyrylo Tkachov
> > 
> > >> > Subject: Re: [PATCH]AArch64 Add SVE implementation for
> > cond_copysign.
> > >> >
> > >> > Tamar Christina  writes:
> > >> > >> -Original Message-
> > >> > >> From: Richard Sandiford 
> > >> > >> Sent: Thursday, October 5, 2023 8:29 PM
> > >> > >> To: Tamar Christina 
> > >> > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > >> > >> ; Marcus Shawcroft
> > >> > >> ; Kyrylo Tkachov
> > >> > 
> > >> > >> Subject: Re: [PATCH]AArch64 Add SVE implementation for
> > cond_copysign.
> > >> > >>
> > >> > >> Tamar Christina  writes:
> > >> > >> > Hi All,
> > >> > >> >
> > >> > >> > This adds an implementation for masked copysign along with an
> > >> > >> > optimized pattern for masked copysign (x, -1).
> > >> > >>
> > >> > >> It feels like we're ending up with a lot of AArch64-specific
> > >> > >> code that just hard- codes the observation that changing the
> > >> > >> sign is equivalent to changing the top bit.  We then need to
> > >> > >> make sure that we choose the best way of changing the top bit for 
> > >> > >> any
> > given situation.
> > >> > >>
> > >> > >> Hard-coding the -1/negative case is one instance of that.  But
> > >> > >> it looks like we also fail to use the best sequence for SVE2.  E.g.
> > >> > >> [https://godbolt.org/z/ajh3MM5jv]:
> > >> > >>
> > >> > >> #include 
> > >> > >>
> > >> > >> void f(double *restrict a, double *restrict b) {
> > >> > >> for (int i = 0; i < 100; ++i)
> > >> > >> a[i] = __builtin_copysign(a[i], b[i]); }
> > >> > >>
> > >> > >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> > >> > >> for (int i = 0; i < 100; ++i)
> > >> > >> a[i] = (a[i] & ~c) | (b[i] & c); }
> > >> > >>
> > >> > >> gives:
> > >> > >>
> > >> > >> f:
> > >> > >> mov x2, 0
> > >> > >> mov w3, 100
> > >> > >> whilelo p7.d, wzr, w3
> > >> > >> .L2:
> > >> > >> ld1d z30.d, p7/z, [x0, x2, lsl 3]
> > >> > >> ld1d z31.d, p7/z, [x1, x2, lsl 3]
> > >> > >> and z30.d, z30.d, #0x7fff
> > >> > >> and z31.d, z31.d, #0x8000
> > >> > >> orr z31.d, z31.d, z30.d
> > >> > >> st1d z31.d, p7, [x0, x2, lsl 3]
> > >> > >> incd x2
> > >> > >> whilelo p7.d, w2, w3
> > >> > >> b.any   .L2
> > >> > >> ret
> > >> > >> g:
> > >> > >> mov x3, 0
> > >> > >> mov w4, 100
> > >> > >> mov z29.d, x2
> > >> > >> whilelo p7.d, wzr, w4
> > >> > >> .L6:
> > >> > >> ld1d z30.d, p7/z, [x0, x3, lsl 3]
> > >> > >> ld1d z31.d, p7/z, [x1, x3, lsl 3]
> > >>

[PATCH] tree-optimization/111715 - improve TBAA for access paths with pun

2023-10-09 Thread Richard Biener
The following improves basic TBAA for access paths formed by
C++ abstraction where we are able to combine a path from an
address-taking operation with a path based on that access using
a pun to avoid memory access semantics on the address-taking part.

The trick is to identify the point the semantic memory access path
starts which allows us to use the alias set of the outermost access
instead of only that of the base of this path.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages
with a slightly different variant, re-bootstrapping/testing now
(with doing the extra walk just for AGGREGATE_TYPE_P).

PR tree-optimization/111715
* alias.cc (reference_alias_ptr_type_1): When we have
a type-punning ref at the base search for the access
path part that's still semantically valid.

* gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.
---
 gcc/alias.cc| 20 -
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c | 32 +
 2 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 7c1af1fe96e..4060ff72949 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -774,7 +774,25 @@ reference_alias_ptr_type_1 (tree *t)
   && (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
  != TYPE_MAIN_VARIANT
   (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1))
-return TREE_TYPE (TREE_OPERAND (inner, 1));
+{
+  tree alias_ptrtype = TREE_TYPE (TREE_OPERAND (inner, 1));
+  /* Unless we have the (aggregate) effective type of the access
+somewhere on the access path.  If we have for example
+(&a->elts[i])->l.len exposed by abstraction we'd see
+MEM  [(B *)a].elts[i].l.len and we can use the alias set
+of 'len' when typeof (MEM  [(B *)a].elts[i]) == B for
+example.  See PR111715.  */
+  if (AGGREGATE_TYPE_P (TREE_TYPE (alias_ptrtype)))
+   {
+ tree inner = *t;
+ while (handled_component_p (inner)
+&& (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
+!= TYPE_MAIN_VARIANT (TREE_TYPE (alias_ptrtype
+   inner = TREE_OPERAND (inner, 0);
+   }
+  if (TREE_CODE (inner) == MEM_REF)
+   return alias_ptrtype;
+}
 
   /* Otherwise, pick up the outermost object that we could have
  a pointer to.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
new file mode 100644
index 000..afd48050819
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
@@ -0,0 +1,32 @@
+/* PR/111715 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct B {
+   struct { int len; } l;
+   long n;
+};
+struct A {
+   struct B elts[8];
+};
+
+static void
+set_len (struct B *b, int len)
+{
+  b->l.len = len;
+}
+
+static int
+get_len (struct B *b)
+{
+  return b->l.len;
+}
+
+int foo (struct A *a, int i, long *q)
+{
+  set_len (&a->elts[i], 1);
+  *q = 2;
+  return get_len (&a->elts[i]);
+}
+
+/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
-- 
2.35.3


Re: [PATCH V2] TEST: Fix vect_cond_arith_* dump checks for RVV

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Robin Dapp wrote:

> On 10/9/23 09:32, Andreas Schwab wrote:
> > On Okt 09 2023, juzhe.zh...@rivai.ai wrote:
> > 
> >> Turns out COND(_LEN)?_ADD can't work.
> > 
> > It should work though.  Tcl regexps are a superset of POSIX EREs.
> > 
> 
> The problem is that COND(_LEN)?_ADD matches two times against
> COND_LEN_ADD and a scan-tree-dump-times 1 will fail.  So for those
> checks in vect-cond-arith-6.c we either need to switch to
> scan-tree-dump or change the pattern to "\.(?:COND|COND_LEN)_ADD".
> 
> Juzhe, something like the attached works for me.

LGTM.

Richard.

> Regards
>  Robin
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> index 1af0fe642a0..7d26dbedc5e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> @@ -52,8 +52,8 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
> vect_double_cond_arith } } } */
>  /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target 
> vect_double_cond_arith } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> index ec3d9db4202..f7daa13685c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> @@ -54,8 +54,8 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump { = \.COND_ADD} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_SUB} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_MUL} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> -/* { dg-final { scan-tree-dump { = \.COND_RDIV} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?ADD} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?SUB} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?MUL} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?RDIV} "optimized" { target 
> { vect_double_cond_arith && vect_masked_store } } } } */
>  /* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target { 
> vect_double_cond_arith && vect_masked_store } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> index 2aeebd44f83..a80c30a50b2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> @@ -56,8 +56,8 @@ main (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" 
> { target vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> -/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target 
> vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump { = \.COND_(LEN_)?

Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-09 Thread Richard Biener
On Mon, Oct 9, 2023 at 12:17 PM Richard Sandiford
 wrote:
>
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Monday, October 9, 2023 10:56 AM
> >> To: Tamar Christina 
> >> Cc: Richard Biener ; gcc-patches@gcc.gnu.org;
> >> nd ; Richard Earnshaw ;
> >> Marcus Shawcroft ; Kyrylo Tkachov
> >> 
> >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Saturday, October 7, 2023 10:58 AM
> >> >> To: Richard Biener 
> >> >> Cc: Tamar Christina ;
> >> >> gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> >> ; Marcus Shawcroft
> >> >> ; Kyrylo Tkachov
> >> 
> >> >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >> >>
> >> >> Richard Biener  writes:
> >> >> > On Thu, Oct 5, 2023 at 10:46 PM Tamar Christina
> >> >>  wrote:
> >> >> >>
> >> >> >> > -Original Message-
> >> >> >> > From: Richard Sandiford 
> >> >> >> > Sent: Thursday, October 5, 2023 9:26 PM
> >> >> >> > To: Tamar Christina 
> >> >> >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> >> >> > ; Marcus Shawcroft
> >> >> >> > ; Kyrylo Tkachov
> >> >> 
> >> >> >> > Subject: Re: [PATCH]AArch64 Add SVE implementation for
> >> >> cond_copysign.
> >> >> >> >
> >> >> >> > Tamar Christina  writes:
> >> >> >> > >> -Original Message-
> >> >> >> > >> From: Richard Sandiford 
> >> >> >> > >> Sent: Thursday, October 5, 2023 8:29 PM
> >> >> >> > >> To: Tamar Christina 
> >> >> >> > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard
> >> >> >> > >> Earnshaw ; Marcus Shawcroft
> >> >> >> > >> ; Kyrylo Tkachov
> >> >> >> > 
> >> >> >> > >> Subject: Re: [PATCH]AArch64 Add SVE implementation for
> >> >> cond_copysign.
> >> >> >> > >>
> >> >> >> > >> Tamar Christina  writes:
> >> >> >> > >> > Hi All,
> >> >> >> > >> >
> >> >> >> > >> > This adds an implementation for masked copysign along with
> >> >> >> > >> > an optimized pattern for masked copysign (x, -1).
> >> >> >> > >>
> >> >> >> > >> It feels like we're ending up with a lot of AArch64-specific
> >> >> >> > >> code that just hard- codes the observation that changing the
> >> >> >> > >> sign is equivalent to changing the top bit.  We then need to
> >> >> >> > >> make sure that we choose the best way of changing the top bit
> >> >> >> > >> for any
> >> >> given situation.
> >> >> >> > >>
> >> >> >> > >> Hard-coding the -1/negative case is one instance of that.
> >> >> >> > >> But it looks like we also fail to use the best sequence for 
> >> >> >> > >> SVE2.  E.g.
> >> >> >> > >> [https://godbolt.org/z/ajh3MM5jv]:
> >> >> >> > >>
> >> >> >> > >> #include 
> >> >> >> > >>
> >> >> >> > >> void f(double *restrict a, double *restrict b) {
> >> >> >> > >> for (int i = 0; i < 100; ++i)
> >> >> >> > >> a[i] = __builtin_copysign(a[i], b[i]); }
> >> >> >> > >>
> >> >> >> > >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> >> >> >> > >> for (int i = 0; i < 100; ++i)
> >> >> >> > >> a[i] = (a[i] & ~c) | (b[i] & c); }
> >> >> >> > >>
> >> >> >> > >> gives:
> >> >&g

Re: [PATCH] RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Juzhe-Zhong wrote:

> Reference: https://godbolt.org/z/G9jzf5Grh
> 
> RVV is able to vectorize this case using SLP. However, with 
> -fno-vect-cost-model, RVV vectorizes it by vec_load_lanes with stride 6.

OK.  Note load/store-lanes is specifically pre-empting SLP if all
loads/stores of a SLP instance can support that.  Not sure if this
heuristic is good for load/store lanes with high stride?

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c 
> b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> index 7c7acd5bab6..96751faae7f 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-slp-38.c
> @@ -18,4 +18,4 @@ foo (void)
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { ! vect_strided6 } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Robin Dapp wrote:

> > Hmm, the function is called at transform time so this shouldn't help
> > avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
> > when sign dependent rounding is in effect?  OTOH maybe sign-dependent
> > rounding is OK but only when we use a unconditional fold-left
> > (so a loop mask from fully masking is OK but not an original COND_ADD?).
> 
> So we currently only disable the use of partial vectors
> 
>   else if (reduction_type == FOLD_LEFT_REDUCTION
>  && reduc_fn == IFN_LAST

aarch64 probably chokes because reduc_fn is not IFN_LAST.

>  && FLOAT_TYPE_P (vectype_in)
>  && HONOR_SIGNED_ZEROS (vectype_in)

so with your change we'd support signed zeros correctly.

>  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"can't operate on partial vectors because"
>" signed zeros cannot be preserved.\n");
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> 
> which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block.
> 
> For the fully masked case we continue (and then fail the assertion
> on aarch64 at transform time).
> 
> I didn't get why that case is ok, though?  We still merge the initial
> definition with the identity/neutral op (i.e. possibly -0.0) based on
> the loop mask.  Is that different to partial masking?

I think the main point with my earlier change is that without
native support for a fold-left reduction (like on x86) we get

 ops = mask ? ops : neutral;
 acc += ops[0];
 acc += ops[1];
 ...

so we wouldn't use a COND_ADD but add neutral elements for masked
elements.  That's OK for signed zeros after your change (great)
but not OK for sign dependent rounding (because we can't decide on
the sign of the neutral zero then).
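
The rounding-mode dependence is easy to see with a small standalone
example (not GCC source, just an illustration; build it at -O0 or with
-frounding-math so the sum is not folded at compile time):

  #include <cfenv>
  #include <cmath>
  #include <cstdio>

  int main ()
  {
    volatile double a = 0.0, b = -0.0;
    std::fesetround (FE_TONEAREST);
    std::printf ("%d\n", std::signbit (a + b));  /* 0: the exact zero sum is +0.0  */
    std::fesetround (FE_DOWNWARD);
    std::printf ("%d\n", std::signbit (a + b));  /* 1: ... but -0.0 towards -inf  */
    return 0;
  }

So there is no single zero that acts as a neutral element under every
rounding mode.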

For the case of using an internal function, thus direct target support,
it should be OK to have sign-dependent rounding if we can use
the masked-fold-left reduction op.  As we do

  /* On the first iteration the input is simply the scalar phi
 result, and for subsequent iterations it is the output of
 the preceding operation.  */
  if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
  if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, 
reduc_var,
   def0, mask, len, bias);
  else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, 
reduc_var,
   def0, mask);
  else
new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var,
   def0);

the last case should be able to assert that 
!HONOR_SIGN_DEPENDENT_ROUNDING (also the reduc_fn == IFN_LAST case).

The quoted condition above should change to drop the HONOR_SIGNED_ZEROS
condition and the reduc_fn == IFN_LAST should change, maybe to
internal_fn_mask_index (reduc_fn) == -1?
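
I.e. roughly (a sketch only, untested, and the dump message would need
adjusting since the remaining reason is the rounding mode, not signed
zeros):

  else if (reduction_type == FOLD_LEFT_REDUCTION
           && internal_fn_mask_index (reduc_fn) == -1
           && FLOAT_TYPE_P (vectype_in)
           && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "can't operate on partial vectors because"
                         " sign dependent rounding cannot be preserved.\n");
      LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
    }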

Richard.


Re: [PATCH] RISC-V Regression test: Fix FAIL of pr45752.c for RVV

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Juzhe-Zhong wrote:

> RVV uses load_lanes with stride = 5 to vectorize this case with 
> -fno-vect-cost-model
> instead of SLP.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/pr45752.c: Adapt dump check for targets that support 
> load_lanes with stride = 5.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/pr45752.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c 
> b/gcc/testsuite/gcc.dg/vect/pr45752.c
> index e8b364f29eb..3c87d9b04fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr45752.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
> @@ -159,4 +159,4 @@ int main (int argc, const char* argv[])
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "gaps requires scalar epilogue loop" 0 
> "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> {target { ! { vect_load_lanes && vect_strided5 } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111715 - improve TBAA for access paths with pun

2023-10-09 Thread Richard Biener
On Mon, 9 Oct 2023, Richard Biener wrote:

> The following improves basic TBAA for access paths formed by
> C++ abstraction where we are able to combine a path from an
> address-taking operation with a path based on that access using
> a pun to avoid memory access semantics on the address-taking part.
> 
> The trick is to identify the point the semantic memory access path
> starts which allows us to use the alias set of the outermost access
> instead of only that of the base of this path.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages
> with a slightly different variant, re-bootstrapping/testing now
> (with doing the extra walk just for AGGREGATE_TYPE_P).

I ended up pushing the original version below after botching the
AGGREGATE_TYPE_P variant, improperly hiding the local variable.  It's
a micro-optimization not worth the trouble, I think.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages, 
pushed.

From 9cf3fca604db73866d0dc69dc88f95155027b3d7 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 9 Oct 2023 13:05:10 +0200
Subject: [PATCH] tree-optimization/111715 - improve TBAA for access paths with
 pun
To: gcc-patches@gcc.gnu.org

The following improves basic TBAA for access paths formed by
C++ abstraction where we are able to combine a path from an
address-taking operation with a path based on that access using
a pun to avoid memory access semantics on the address-taking part.

The trick is to identify the point the semantic memory access path
starts which allows us to use the alias set of the outermost access
instead of only that of the base of this path.

PR tree-optimization/111715
* alias.cc (reference_alias_ptr_type_1): When we have
a type-punning ref at the base search for the access
path part that's still semantically valid.

* gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.
---
 gcc/alias.cc| 17 ++-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c | 32 +
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 7c1af1fe96e..86d8f7104ad 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -774,7 +774,22 @@ reference_alias_ptr_type_1 (tree *t)
   && (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
  != TYPE_MAIN_VARIANT
   (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1))
-return TREE_TYPE (TREE_OPERAND (inner, 1));
+{
+  tree alias_ptrtype = TREE_TYPE (TREE_OPERAND (inner, 1));
+  /* Unless we have the (aggregate) effective type of the access
+somewhere on the access path.  If we have for example
+(&a->elts[i])->l.len exposed by abstraction we'd see
+MEM  [(B *)a].elts[i].l.len and we can use the alias set
+of 'len' when typeof (MEM  [(B *)a].elts[i]) == B for
+example.  See PR111715.  */
+  tree inner = *t;
+  while (handled_component_p (inner)
+&& (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
+!= TYPE_MAIN_VARIANT (TREE_TYPE (alias_ptrtype
+   inner = TREE_OPERAND (inner, 0);
+  if (TREE_CODE (inner) == MEM_REF)
+   return alias_ptrtype;
+}
 
   /* Otherwise, pick up the outermost object that we could have
  a pointer to.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
new file mode 100644
index 000..afd48050819
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
@@ -0,0 +1,32 @@
+/* PR/111715 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct B {
+   struct { int len; } l;
+   long n;
+};
+struct A {
+   struct B elts[8];
+};
+
+static void
+set_len (struct B *b, int len)
+{
+  b->l.len = len;
+}
+
+static int
+get_len (struct B *b)
+{
+  return b->l.len;
+}
+
+int foo (struct A *a, int i, long *q)
+{
+  set_len (&a->elts[i], 1);
+  *q = 2;
+  return get_len (&a->elts[i]);
+}
+
+/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
-- 
2.35.3



Re: [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-10-09 Thread Richard Biener
ge exit_e, unsigned vf, 
> bool flat)
>  {
>/* For flat profiles do not scale down proportionally by VF and only
>   cap by known iteration count bounds.  */
> @@ -10980,7 +11011,6 @@ scale_profile_for_vect_loop (class loop *loop, 
> unsigned vf, bool flat)
>return;
>  }
>/* Loop body executes VF fewer times and exit increases VF times.  */
> -  edge exit_e = single_exit (loop);
>profile_count entry_count = loop_preheader_edge (loop)->count ();
>  
>/* If we have unreliable loop profile avoid dropping entry
> @@ -11350,7 +11380,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>  
>/* Make sure there exists a single-predecessor exit bb.  Do this before 
>   versioning.   */
> -  edge e = single_exit (loop);
> +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
>if (! single_pred_p (e->dest))
>  {
>split_loop_exit_edge (e, true);
> @@ -11376,7 +11406,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>   loop closed PHI nodes on the exit.  */
>if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
>  {
> -  e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> +  e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
>if (! single_pred_p (e->dest))
>   {
> split_loop_exit_edge (e, true);
> @@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>   a zero NITERS becomes a nonzero NITERS_VECTOR.  */
>if (integer_onep (step_vector))
>  niters_no_overflow = true;
> -  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
> -niters_vector_mult_vf, !niters_no_overflow);
> +  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo), loop_vinfo,
> +niters_vector, step_vector, niters_vector_mult_vf,
> +!niters_no_overflow);
>  
>unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
>  
> @@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> assumed_vf) - 1
>: wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
>  assumed_vf) - 1);
> -  scale_profile_for_vect_loop (loop, assumed_vf, flat);
> +  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> +assumed_vf, flat);
>  
>if (dump_enabled_p ())
>  {
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e3740ecc4377f5a31e54
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -919,10 +919,24 @@ public:
>   analysis.  */
>vec<_loop_vec_info *> epilogue_vinfos;
>  
> +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> + controls the natural exits of the loop.  */
> +  edge vec_loop_iv;
> +
> +  /* The controlling loop IV for the epilogue loop when vectorizing.  This IV
> + controls the natural exits of the loop.  */
> +  edge vec_epilogue_loop_iv;
> +
> +  /* The controlling loop IV for the scalar loop being vectorized.  This IV
> + controls the natural exits of the loop.  */
> +  edge scalar_loop_iv;

all of the above sound as if they were IVs; the access macros have
_EXIT at the end, can you name the fields accordingly as well?
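
I.e. something like (just the naming I have in mind):

  edge vec_loop_iv_exit;
  edge vec_epilogue_loop_iv_exit;
  edge scalar_loop_iv_exit;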

Otherwise looks good to me.

Feel free to push approved patches of the series, no need to wait
until everything is approved.

Thanks,
Richard.

>  } *loop_vec_info;
>  
>  /* Access Functions.  */
>  #define LOOP_VINFO_LOOP(L) (L)->loop
> +#define LOOP_VINFO_IV_EXIT(L)  (L)->vec_loop_iv
> +#define LOOP_VINFO_EPILOGUE_IV_EXIT(L) (L)->vec_epilogue_loop_iv
> +#define LOOP_VINFO_SCALAR_IV_EXIT(L)   (L)->scalar_loop_iv
>  #define LOOP_VINFO_BBS(L)  (L)->bbs
>  #define LOOP_VINFO_NITERSM1(L) (L)->num_itersm1
>  #define LOOP_VINFO_NITERS(L)   (L)->num_iters
> @@ -2155,11 +2169,13 @@ class auto_purge_vect_location
>  
>  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
> in tree-vect-loop-manip.cc.  */
> -extern void vect_set_loop_condition (class loop *, loop_vec_info,
> +extern void vect_set_loop_condition (class loop *, edge, loop_vec_info,
>tree, tree, tree, bool);
> -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> -class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> -  class loop *, edge);
> +extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> +  const_edge);
> +class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> + class loop *, edge,
> + edge, edge *);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>   tree *, tree *, tree *, int, bool, bool,
> @@ -2169,6 +2185,7 @@ extern void vect_prepare_for_masked_peels 
> (loop_vec_info);
>  extern dump_user_location_t find_loop_location (class loop *);
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> +extern edge vec_init_loop_exit_info (class loop *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> @@ -2358,6 +2375,7 @@ struct vect_loop_form_info
>tree assumptions;
>gcond *loop_cond;
>gcond *inner_loop_cond;
> +  edge loop_exit;
>  };
>  extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info 
> *);
>  extern loop_vec_info vect_create_loop_vinfo (class loop *, vec_info_shared *,
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index 
> a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac60378935392aa7b73476efed74b
>  100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call,
>class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
>  
>LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
> +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
> += vec_init_loop_exit_info (scalar_loop);
>gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
>  == loop_vectorized_call);
>/* If we are going to vectorize outer loop, prevent vectorization
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)



Re: [PATCH] use get_range_query to replace get_global_range_query

2023-10-10 Thread Richard Biener
 
> -  get_global_range_query ()->range_of_expr (r, bound);
> +  get_range_query (cfun)->range_of_expr (r, bound);
>  
>if (r.undefined_p () || r.varying_p ())
>   return true;

The pass has a ranger instance, so yes, this should improve things.
Since the pass doesn't do any IL modification it should also be safe.

> diff --git a/gcc/tree-dfa.cc b/gcc/tree-dfa.cc
> index af8e9243947..5355af2c869 100644
> --- a/gcc/tree-dfa.cc
> +++ b/gcc/tree-dfa.cc
> @@ -531,10 +531,7 @@ get_ref_base_and_extent (tree exp, poly_int64 *poffset,
>  
>   value_range vr;
>   range_query *query;
> - if (cfun)
> -   query = get_range_query (cfun);
> - else
> -   query = get_global_range_query ();
> + query = get_range_query (cfun);
>  
>   if (TREE_CODE (index) == SSA_NAME
>   && (low_bound = array_ref_low_bound (exp),
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index 64464802c1e..e85a1881526 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -145,7 +145,7 @@ split_at_bb_p (class loop *loop, basic_block bb, tree 
> *border, affine_iv *iv,
>   else
> {
>   int_range<2> r;
> - get_global_range_query ()->range_of_expr (r, op0, stmt);
> + get_range_query (cfun)->range_of_expr (r, op0, stmt);

loop splitting doesn't have a ranger instance so this is a no-op change
but I'm also not sure it would be safe to use a dynamic ranger instance
here since we even do CFG manipulations in between.  Please leave
this change out.

>   if (!r.varying_p () && !r.undefined_p ()
>   && TREE_CODE (op1) == INTEGER_CST)
> {
> diff --git a/gcc/tree-ssa-loop-unswitch.cc b/gcc/tree-ssa-loop-unswitch.cc
> index 619b50fb4bb..b3dc2ded931 100644
> --- a/gcc/tree-ssa-loop-unswitch.cc
> +++ b/gcc/tree-ssa-loop-unswitch.cc
> @@ -764,7 +764,7 @@ evaluate_control_stmt_using_entry_checks (gimple *stmt,
>  
> int_range_max r;
>     if (!ranger->gori ().outgoing_edge_range_p (r, e, idx,
> -   *get_global_range_query 
> ()))
> +   *get_range_query (cfun)))
>   continue;

unswitching has a ranger instance but it does perform IL modification.
Did you check whether the use of the global ranger was intentional here?
Specifically we do have the 'ranger' object here and IIRC using global
ranges was intentional.  So please leave this change out.

Thanks,
Richard.

> r.intersect (path_range);
> if (r.undefined_p ())
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] wide-int: Remove rwide_int, introduce dw_wide_int

2023-10-10 Thread Richard Biener
> +  p->precision = w.get_precision ();
> +  p->len = w.get_len ();
> +  memcpy (p->val, w.get_val (), p->len * sizeof (HOST_WIDE_INT));
> +  return p;
> +}
> +
>  /* Add an unsigned wide integer attribute value to a DIE.  */
>  
>  static inline void
>  add_AT_wide (dw_die_ref die, enum dwarf_attribute attr_kind,
> -  const rwide_int& w)
> +  const wide_int_ref &w)
>  {
>dw_attr_node attr;
>  
>attr.dw_attr = attr_kind;
>attr.dw_attr_val.val_class = dw_val_class_wide_int;
>attr.dw_attr_val.val_entry = NULL;
> -  attr.dw_attr_val.v.val_wide = ggc_alloc ();
> -  *attr.dw_attr_val.v.val_wide = w;
> +  attr.dw_attr_val.v.val_wide = alloc_dw_wide_int (w);
>add_dwarf_attr (die, &attr);
>  }
>  
> @@ -16714,8 +16724,8 @@ mem_loc_descriptor (rtx rtl, machine_mod
> mem_loc_result->dw_loc_oprnd1.v.val_die_ref.external = 0;
> mem_loc_result->dw_loc_oprnd2.val_class
>   = dw_val_class_wide_int;
> -   mem_loc_result->dw_loc_oprnd2.v.val_wide = ggc_alloc ();
> -   *mem_loc_result->dw_loc_oprnd2.v.val_wide = rtx_mode_t (rtl, mode);
> +   mem_loc_result->dw_loc_oprnd2.v.val_wide
> + = alloc_dw_wide_int (rtx_mode_t (rtl, mode));
>   }
>break;
>  
> @@ -17288,8 +17298,8 @@ loc_descriptor (rtx rtl, machine_mode mo
> loc_result = new_loc_descr (DW_OP_implicit_value,
> GET_MODE_SIZE (int_mode), 0);
> loc_result->dw_loc_oprnd2.val_class = dw_val_class_wide_int;
> -   loc_result->dw_loc_oprnd2.v.val_wide = ggc_alloc ();
> -   *loc_result->dw_loc_oprnd2.v.val_wide = rtx_mode_t (rtl, int_mode);
> +   loc_result->dw_loc_oprnd2.v.val_wide
> + = alloc_dw_wide_int (rtx_mode_t (rtl, int_mode));
>   }
>break;
>  
> @@ -20189,7 +20199,7 @@ extract_int (const unsigned char *src, u
>  /* Writes wide_int values to dw_vec_const array.  */
>  
>  static void
> -insert_wide_int (const rwide_int &val, unsigned char *dest, int elt_size)
> +insert_wide_int (const wide_int_ref &val, unsigned char *dest, int elt_size)
>  {
>int i;
>  
> @@ -20274,8 +20284,7 @@ add_const_value_attribute (dw_die_ref di
> && (GET_MODE_PRECISION (int_mode)
> & (HOST_BITS_PER_WIDE_INT - 1)) == 0)
>   {
> -   rwide_int w = rtx_mode_t (rtl, int_mode);
> -   add_AT_wide (die, DW_AT_const_value, w);
> +   add_AT_wide (die, DW_AT_const_value, rtx_mode_t (rtl, int_mode));
> return true;
>   }
>return false;
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] MATCH: [PR111679] Add alternative simplification of `a | ((~a) ^ b)`

2023-10-10 Thread Richard Biener
On Mon, Oct 9, 2023 at 11:28 PM Andrew Pinski  wrote:
>
> So currently we have a simplification for `a | ~(a ^ b)`, but
> that does not match the case where we originally had `(~a) | (a ^ b)`,
> so we need to add a new pattern that matches that and uses
> bitwise_inverted_equal_p, which also catches comparisons.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/111679
>
> gcc/ChangeLog:
>
> * match.pd (`a | ((~a) ^ b)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-5.c: New test.
> ---
>  gcc/match.pd |  8 +++
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-5.c | 27 
>  2 files changed, 35 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-5.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 31bfd8b6b68..49740d189a7 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1350,6 +1350,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& TYPE_PRECISION (TREE_TYPE (@0)) == 1)
>(bit_ior @0 (bit_xor @1 { build_one_cst (type); }
>
> +/* a | ((~a) ^ b)  -->  a | (~b) (alt version of the above 2) */
> +(simplify
> + (bit_ior:c @0 (bit_xor:cs @1 @2))
> + (with { bool wascmp; }
> + (if (bitwise_inverted_equal_p (@0, @1, wascmp)
> +  && (!wascmp || element_precision (type) == 1))
> +  (bit_ior @0 (bit_not @2)
> +
>  /* (a | b) | (a &^ b)  -->  a | b  */
>  (for op (bit_and bit_xor)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-5.c
> new file mode 100644
> index 000..990610e3002
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-5.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/111679 */
> +
> +int f1(int a, int b)
> +{
> +return (~a) | (a ^ b); // ~(a & b) or (~a) | (~b)
> +}
> +
> +_Bool fb(_Bool c, _Bool d)
> +{
> +return (!c) | (c ^ d); // ~(c & d) or (~c) | (~d)
> +}
> +
> +_Bool fb1(int x, int y)
> +{
> +_Bool a = x == 10,  b = y > 100;
> +return (!a) | (a ^ b); // ~(a & b) or (~a) | (~b)
> +// or (x != 10) | (y <= 100)
> +}
> +
> +/* { dg-final { scan-tree-dump-not   "bit_xor_expr, "   "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_and_expr, " 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_ior_expr, " 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "ne_expr, _\[0-9\]+, x_\[0-9\]+"  1 
> "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "le_expr, _\[0-9\]+, y_\[0-9\]+"  1 
> "optimized" } } */
> --
> 2.39.3
>


Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr65935.c for RVV

2023-10-10 Thread Richard Biener
On Tue, 10 Oct 2023, Juzhe-Zhong wrote:

> Here is the reference comparing dump IR between ARM SVE and RVV.
> 
> https://godbolt.org/z/zqess8Gss
> 
> We can see RVV produces one more dump line:
> optimized: basic block part vectorized using 128 byte vectors
> since RVV has 1024-bit vectors.
> 
> The codegen is reasonably good.
> 
> However, I see GCN also has 1024-bit vectors.
> Could this patch cause this case to FAIL on the GCN port?
> 
> Hi, GCN folks, could you check this patch on the GCN port for me?

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
>   * lib/target-supports.exp: Ditto.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c | 3 ++-
>  gcc/testsuite/lib/target-supports.exp  | 6 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index 8df35327e7a..9ef1330b47c 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -67,7 +67,8 @@ int main()
>  
>  /* We should also be able to use 2-lane SLP to initialize the real and
> imaginary components in the first loop of main.  */
> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" } } 
> */
> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" { 
> target {! { vect1024 } } } } } */
> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 11 "slp1" { 
> target { { vect1024 } } } } } */
>  /* We should see the s->phase[dir] operand splatted and no other operand 
> built
> from scalars.  See PR97334.  */
>  /* { dg-final { scan-tree-dump "Using a splat" "slp1" } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index dc366d35a0a..95c489d7f76 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -8903,6 +8903,12 @@ proc check_effective_target_vect_variable_length { } {
>  return [expr { [lindex [available_vector_sizes] 0] == 0 }]
>  }
>  
> +# Return 1 if the target supports vectors of 1024 bits.
> +
> +proc check_effective_target_vect1024 { } {
> +return [expr { [lsearch -exact [available_vector_sizes] 1024] >= 0 }]
> +}
> +
>  # Return 1 if the target supports vectors of 512 bits.
>  
>  proc check_effective_target_vect512 { } {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] Fix missed CSE with a BLKmode entity

2023-10-10 Thread Richard Biener
The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
us cautious about CSEing a load to an object that has padding bits.
The added check also triggers for BLKmode entities like STRING_CSTs
but by definition a BLKmode entity does not have padding bits.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111751
* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
BLKmode result from the padding bits check.
---
 gcc/tree-ssa-sccvn.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d2aab38c2d2..ce8ae8c6753 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5747,8 +5747,9 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
 {
   /* Avoid the type punning in case the result mode has padding where
 the op we lookup has not.  */
-  if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
-   GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)
+  if (TYPE_MODE (TREE_TYPE (result)) != BLKmode
+ && maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
+  GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)
result = NULL_TREE;
   else
{
-- 
2.35.3


[PATCH] tree-optimization/111519 - strlen optimization skips clobbering store

2023-10-10 Thread Richard Biener
The following fixes a mistake in count_nonzero_bytes which happily
skips over stores clobbering the memory from which we load the value
being stored and then performs its analysis on the memory state before
the intermediate store.

The patch implements the simplest fix - guarantee that there are
no intervening stores by tracking the original active virtual operand
and comparing that to the one of a load we attempt to analyze.

This simple approach breaks two subtests of gcc.dg/Wstringop-overflow-69.c
which I chose to XFAIL.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

PR tree-optimization/111519
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes):
Add virtual operand argument and pass it through.  Compare
the memory state we try to analyze to the memory state we
are going to use the result on.
(strlen_pass::count_nonzero_bytes_addr): Add virtual
operand argument and pass it through.

* gcc.dg/torture/pr111519.c: New testcase.
* gcc.dg/Wstringop-overflow-69.c: XFAIL two subtests.
---
 gcc/testsuite/gcc.dg/Wstringop-overflow-69.c |  4 +-
 gcc/testsuite/gcc.dg/torture/pr111519.c  | 47 
 gcc/tree-ssa-strlen.cc   | 27 ++-
 3 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111519.c

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-69.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-69.c
index be361fe620d..3c17fe13d8e 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-69.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-69.c
@@ -57,9 +57,9 @@ void warn_vec_decl (void)
 {
   *(VC2*)a1 = c2;   // { dg-warning "writing 2 bytes into a region of size 
1" }
   *(VC4*)a2 = c4;   // { dg-warning "writing 4 bytes into a region of size 
2" }
-  *(VC4*)a3 = c4;   // { dg-warning "writing 4 bytes into a region of size 
3" }
+  *(VC4*)a3 = c4;   // { dg-warning "writing 4 bytes into a region of size 
3" "pr111519" { xfail *-*-* } }
   *(VC8*)a4 = c8;   // { dg-warning "writing 8 bytes into a region of size 
4" }
-  *(VC8*)a7 = c8;   // { dg-warning "writing 8 bytes into a region of size 
7" }
+  *(VC8*)a7 = c8;   // { dg-warning "writing 8 bytes into a region of size 
7" "pr111519" { xfail *-*-* } }
   *(VC16*)a15 = c16;// { dg-warning "writing 16 bytes into a region of 
size 15" }
 }
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr111519.c 
b/gcc/testsuite/gcc.dg/torture/pr111519.c
new file mode 100644
index 000..095bb1cd13b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111519.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+
+int a, o;
+char b, f, i;
+long c;
+static signed char d;
+static char g;
+unsigned *h;
+signed char *e = &f;
+static signed char **j = &e;
+static long k[2];
+unsigned **l = &h;
+short m;
+volatile int z;
+
+__attribute__((noipa)) void
+foo (char *p)
+{
+  (void) p;
+}
+
+int
+main ()
+{
+  int p = z;
+  signed char *n = &d;
+  *n = 0;
+  while (c)
+for (; i; i--)
+  ;
+  for (g = 0; g <= 1; g++)
+{
+  *n = **j;
+  k[g] = 0 != &m;
+  *e = l && k[0];
+}
+  if (p)
+foo (&b);
+  for (; o < 4; o++)
+{
+  a = d;
+  if (p)
+   foo (&b);
+}
+  if (a != 1)
+__builtin_abort ();
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 8b7ef919826..0ff3f2e308a 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -281,14 +281,14 @@ public:
gimple *stmt,
unsigned lenrange[3], bool *nulterm,
bool *allnul, bool *allnonnul);
-  bool count_nonzero_bytes (tree exp,
+  bool count_nonzero_bytes (tree exp, tree vuse,
gimple *stmt,
unsigned HOST_WIDE_INT offset,
unsigned HOST_WIDE_INT nbytes,
unsigned lenrange[3], bool *nulterm,
bool *allnul, bool *allnonnul,
ssa_name_limit_t &snlim);
-  bool count_nonzero_bytes_addr (tree exp,
+  bool count_nonzero_bytes_addr (tree exp, tree vuse,
 gimple *stmt,
 unsigned HOST_WIDE_INT offset,
 unsigned HOST_WIDE_INT nbytes,
@@ -4531,8 +4531,8 @@ nonzero_bytes_for_type (tree type, unsigned lenrange[3],
 }
 
 /* Recursively determine the minimum and maximum number of leading nonzero
-   bytes in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
-   to each.
+   bytes in the representation of EXP at memory state VUSE and set
+   LENRANGE[0] and LENRANGE[1] to each.
Sets LENRANGE[2] to the total size of the access (which may be less
than LENRANGE[1] when what's being referenced by EXP is a pointer
rather than an array).
@@ -4546,7 +4546,7 @@ nonzero_bytes_for_type (tree type, unsigned lenrange[3],
Returns true

Re: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.

2023-10-10 Thread Richard Biener
 STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  for (gcond *cond : info->conds)
> +{
> +  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> +  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +}
> +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
>LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
> @@ -3594,7 +3653,11 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>&& LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>&& !loop->simduid);
>if (!vect_epilogues)
> -return first_loop_vinfo;
> +{
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +  return first_loop_vinfo;
> +}

I think there's the 'inner' path where you leak these.  Maybe use auto_vec<>
in vect_loop_form_info instead?
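
I.e. something like (sketch; the element type is taken from the hunk
above):

  struct vect_loop_form_info
  {
    ...
    /* auto_vec releases its storage on destruction, so the explicit
       .release () calls (and the leak on the inner-loop path) go away.  */
    auto_vec<gcond *> conds;
    gcond *loop_cond;
    auto_vec<gcond *> alt_loop_conds;
    ...
  };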

Otherwise looks OK.

Thanks,
Richard.

>/* Now analyze first_loop_vinfo for epilogue vectorization.  */
>poly_uint64 lowest_th = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> @@ -3694,6 +3757,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared 
> *shared)
>  (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
>  }
>  
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +
>return first_loop_vinfo;
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> afa7a8e30891c782a0e5e3740ecc4377f5a31e54..55b6771b271d5072fa1327d595e1dddb112cfdf6
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -882,6 +882,12 @@ public:
>   we need to peel off iterations at the end to form an epilogue loop.  */
>bool peeling_for_niter;
>  
> +  /* List of loop additional IV conditionals found in the loop.  */
> +  auto_vec conds;
> +
> +  /* Main loop IV cond.  */
> +  gcond* loop_iv_cond;
> +
>/* True if there are no loop carried data dependencies in the loop.
>   If loop->safelen <= 1, then this is always true, either the loop
>   didn't have any loop carried data dependencies, or the loop is being
> @@ -984,6 +990,8 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)(L)->peeling_for_niter
> +#define LOOP_VINFO_LOOP_CONDS(L)   (L)->conds
> +#define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
>  #define LOOP_VINFO_SCALAR_LOOP(L)   (L)->scalar_loop
>  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> @@ -2373,7 +2381,9 @@ struct vect_loop_form_info
>tree number_of_iterations;
>tree number_of_iterationsm1;
>tree assumptions;
> +  vec conds;
>gcond *loop_cond;
> +  vec alt_loop_conds;
>gcond *inner_loop_cond;
>edge loop_exit;
>  };
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store

2023-10-10 Thread Richard Biener
On Tue, 10 Oct 2023, Jakub Jelinek wrote:

> On Tue, Oct 10, 2023 at 10:49:04AM +0000, Richard Biener wrote:
> > The following fixes a mistake in count_nonzero_bytes which happily
> > skips over stores clobbering the memory from which we load the value
> > being stored and then performs its analysis on the memory state before
> > the intermediate store.
> > 
> > The patch implements the most simple fix - guarantee that there are
> > no intervening stores by tracking the original active virtual operand
> > and comparing that to the one of a load we attempt to analyze.
> > 
> > This simple approach breaks two subtests of gcc.dg/Wstringop-overflow-69.c
> > which I chose to XFAIL.
> 
> This function is a total mess, but I think punting right after the
> gimple_assign_single_p test is too big hammer.
> There are various cases and some of them are fine even when vuse is
> different, others aren't.
> The function at that point tries to handle CONSTRUCTOR, MEM_REF, or decls.
> 
> I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
> vuse.  Though, am not really sure when a CONSTRUCTOR would appear, the
> lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
> VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
> element in it.

Yeah, but what should that be interpreted as in terms of object-size?!

I think the only real case we'd see here is the MEM_REF RHS given
we know we have a register type value.  I'll note the function
totally misses handled_component_p (), it only seems to handle
*p and 'decl'.

> The MEM_REF case supposedly only if we can guarantee nothing could overwrite
> it (so MEM_REF of TREE_STATIC TREE_READONLY could be fine, STRING_CST too,
> anything else is wrong - count_nonzero_bytes_addr uses the
> get_stridx/get_strinfo APIs, which for something that can be changed
> always reflects only the state at the current statement.  So, e.g. the
> get_stridx (exp, stmt) > 0 case in count_nonzero_bytes_addr is when
> the caller must definitely punt if vuse is different.
> Then for decls, again, CONST_DECLs, DECL_IN_CONSTANT_POOLs are certainly
> fine.  Other decls for which ctor_for_folding returns non-error_mark_node
> should be fine as well, I think ctor_useable_for_folding_p is supposed
> to verify that.  STRING_CSTs should be fine as well.
> 
> So maybe pass the vuse down to count_nonzero_bytes_addr and return false
> in the idx > 0 case in there if gimple_vuse (stmt) != vuse?

I don't know enough of the pass to do better, can you take it from here?
One of the points is that we need to know the memory context (aka vuse)
of both the original store and the load we are interpreting.  For
_addr I wasn't sure how we arrive here.  As you said, this is a bit
of spaghetti and I don't want to untangle this any further.

Thanks,
Richard.


[PATCH] tree-optimization/111751 - support 1024 bit vector constant reinterpretation

2023-10-10 Thread Richard Biener
The following ups the limit in fold_view_convert_expr to handle
1024-bit vectors as used by GCN and RVV.  It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111751
* fold-const.cc (fold_view_convert_expr): Up the buffer size
to 128 bytes.
* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
constants, giving up when re-interpretation to the target type
fails.
---
 gcc/fold-const.cc | 4 ++--
 gcc/tree-ssa-sccvn.cc | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..82299bb7f1d 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -9266,8 +9266,8 @@ fold_view_convert_vector_encoding (tree type, tree expr)
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
-  /* We support up to 512-bit values (for V8DFmode).  */
-  unsigned char buffer[64];
+  /* We support up to 1024-bit values (for GCN/RISC-V V128QImode).  */
+  unsigned char buffer[128];
   int len;
 
   /* Check that the host and target are sane.  */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index ce8ae8c6753..0b2c10dcc1a 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5751,6 +5751,8 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
  && maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
   GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)
result = NULL_TREE;
+  else if (CONSTANT_CLASS_P (result))
+   result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
   else
{
  /* We will be setting the value number of lhs to the value number
-- 
2.35.3


Re: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling

2023-10-10 Thread Richard Biener
> -  tree guard_arg
> - = find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> +  tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> +update_phi, e->dest_idx);
>/* If the var is live after loop but not a reduction, we simply
>use the old arg.  */
>if (!guard_arg)
> @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, 
> class loop *epilog,
>  }
>  }
>  
> -/* EPILOG loop is duplicated from the original loop for vectorizing,
> -   the arg of its loop closed ssa PHI needs to be updated.  */
> -
> -static void
> -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> -{
> -  gphi_iterator gsi;
> -  basic_block exit_bb = single_exit (epilog)->dest;
> -
> -  gcc_assert (single_pred_p (exit_bb));
> -  edge e = EDGE_PRED (exit_bb, 0);
> -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> -}
> -
>  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be 
> skipped.
> Return a value that equals:
>  
> @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
> tree nitersm1,
>  e, &prolog_e);
>gcc_assert (prolog);
>prolog->force_vectorize = false;
> -  slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, loop,
> -  exit_e, true);
> +
>first_loop = prolog;
>reset_original_copy_tables ();
>  
> @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
> tree nitersm1,
>LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
>gcc_assert (epilog);
>epilog->force_vectorize = false;
> -  slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> -  new_epilog_e, false);
>bb_before_epilog = loop_preheader_edge (epilog)->src;
>  
>/* Scalar version loop may be preferred.  In this case, add guard
> @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
> tree nitersm1,
>  irred_flag);
> if (vect_epilogues)
>   epilogue_vinfo->skip_this_loop_edge = guard_e;
> -   slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, epilog_e);
> +   edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +   slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
> +   epilog_e);
> /* Only need to handle basic block before epilog loop if it's not
>the guard_bb, which is the case when skip_vector is true.  */
> if (guard_bb != bb_before_epilog)
> @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
> tree nitersm1,
>   }
> scale_loop_profile (epilog, prob_epilog, -1);
>   }
> -  else
> - slpeel_update_phi_nodes_for_lcssa (epilog);
>  
>unsigned HOST_WIDE_INT bound;
>if (bound_scalar.is_constant (&bound))
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e024d666df46ef9208107
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
> loop_vinfo,
>basic_block exit_bb;
>tree scalar_dest;
>tree scalar_type;
> -  gimple *new_phi = NULL, *phi;
> +  gimple *new_phi = NULL, *phi = NULL;
>gimple_stmt_iterator exit_gsi;
>tree new_temp = NULL_TREE, new_name, new_scalar_dest;
>gimple *epilog_stmt = NULL;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd6012443403997e921066483
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const class 
> loop *, const_edge,
>const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
>   class loop *, edge,
> - edge, edge *);
> + edge, edge *, bool = true);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>   tree *, tree *, tree *, int, bool, bool,
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111519 - strlen optimization skips clobbering store

2023-10-10 Thread Richard Biener
On Tue, 10 Oct 2023, Jakub Jelinek wrote:

> On Tue, Oct 10, 2023 at 11:59:28AM +0000, Richard Biener wrote:
> > > I don't see why the CONSTRUCTOR case couldn't be fine regardless of the
> > > vuse.  Though, am not really sure when a CONSTRUCTOR would appear, the
> > > lhs would need to be an SSA_NAME, so wouldn't for vectors that be a
> > > VECTOR_CST instead, etc.?  Ah, perhaps a vector with some non-constant
> > > element in it.
> > 
> > Yeah, but what should that be interpreted to in terms of object-size?!
> 
> The function in question doesn't compute object sizes, just minimum/maximum
> number of non-zero bytes in some rhs of a store and whether everything
> stored is 0s, or everything non-zeros, or some non-zeros followed by zero.
> 
> > I think the only real case we'd see here is the MEM_REF RHS given
> > we know we have a register type value.  I'll note the function
> > totally misses handled_component_p (), it only seems to handle
> > *p and 'decl'.
> 
> Yeah, maybe we could handle even that at some point.
> Though perhaps better first to rewrite it completely, because the recursive
> calls where in some cases it means one thing and in another case something
> completely different are just bad design (or lack thereof).

Yeah ... (true for many similar pieces in pointer-query and other
diagnostic passes...)

> > > So maybe pass the vuse down to count_nonzero_bytes_addr and return false
> > > in the idx > 0 case in there if gimple_vuse (stmt) != vuse?
> > 
> > I don't know enough of the pass to do better, can you take it from here?
> > One of the points is that we need to know the memory context (aka vuse)
> > of both the original store and the load we are interpreting.  For
> > _addr I wasn't sure how we arrive here.  As you said, this is a bit
> > of spaghetti and I don't want to untangle this any further.
> 
> I meant something like below, without getting rid of the -Wshadow stuff
> in there my initial attempt didn't work.  This passes the new testcase
> as well as the testcase you've been touching, but haven't tested it beyond
> that yet.

Works for me if it turns out to work.

> In theory we could even handle some cases with gimple_vuse (stmt) != vuse,
> because we save a copy of the strinfo state at the end of basic blocks and
> only throw that away after we process all dominator children.  But we'd need
> to figure out at which bb to look and temporarily switch the vectors.

As we need something for the branch(es) I think we should do that as a
followup at most.

Thanks,
Richard.

> 2023-10-10  Richard Biener  
>   Jakub Jelinek  
> 
>   PR tree-optimization/111519
>   * tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Add vuse
>   argument and pass it through to recursive calls and
>   count_nonzero_bytes_addr calls.  Don't shadow the stmt argument, but
>   change stmt for gimple_assign_single_p statements for which we don't
>   immediately punt.
>   (strlen_pass::count_nonzero_bytes_addr): Add vuse argument and pass
>   it through to recursive calls and count_nonzero_bytes calls.  Don't
>   use get_strinfo if gimple_vuse (stmt) is different from vuse.  Don't
>   shadow the stmt argument.
> 
>   * gcc.dg/torture/pr111519.c: New testcase.
> 
> --- gcc/tree-ssa-strlen.cc.jj 2023-08-30 11:21:38.539521966 +0200
> +++ gcc/tree-ssa-strlen.cc2023-10-10 15:05:44.731871218 +0200
> @@ -281,14 +281,14 @@ public:
>   gimple *stmt,
>   unsigned lenrange[3], bool *nulterm,
>   bool *allnul, bool *allnonnul);
> -  bool count_nonzero_bytes (tree exp,
> +  bool count_nonzero_bytes (tree exp, tree vuse,
>   gimple *stmt,
>   unsigned HOST_WIDE_INT offset,
>   unsigned HOST_WIDE_INT nbytes,
>   unsigned lenrange[3], bool *nulterm,
>   bool *allnul, bool *allnonnul,
>   ssa_name_limit_t &snlim);
> -  bool count_nonzero_bytes_addr (tree exp,
> +  bool count_nonzero_bytes_addr (tree exp, tree vuse,
>gimple *stmt,
>unsigned HOST_WIDE_INT offset,
>unsigned HOST_WIDE_INT nbytes,
> @@ -4531,8 +4531,8 @@ nonzero_bytes_for_type (tree type, unsig
>  }
>  
>  /* Recursively determine the minimum and maximum number of leading nonzero
> -   bytes in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
> -   to each.
> +   bytes

Re: [PATCH] dwarf2out: Stop using wide_int in GC structures

2023-10-10 Thread Richard Biener
On Tue, 10 Oct 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Oct 10, 2023 at 09:30:31AM +, Richard Biener wrote:
> > On Mon, 9 Oct 2023, Jakub Jelinek wrote:
> > > > This makes wide_int unusable in GC structures, so for dwarf2out
> > > > which was the only place which needed it there is a new rwide_int type
> > > > (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> > > > inline and is trivially copyable (dwarf2out should never deal with large
> > > > _BitInt constants, those should have been lowered earlier).
> > > 
> > > As discussed on IRC, the dwarf2out.{h,cc} needs are actually quite 
> > > limited,
> > > it just needs to allocate new GC structures val_wide points to 
> > > (constructed
> > > from some const wide_int_ref &) and needs to call operator==,
> > > get_precision, elt, get_len and get_val methods on it.
> > > Even trailing_wide_int would be overkill for that, the following just adds
> > > a new struct with precision/len and trailing val array members and
> > > implements the needed methods (only 2 of them using wide_int_ref 
> > > constructed
> > > from those).
> > > 
> > > Incremental patch, so far compile time tested only:
> > 
> > LGTM, wonder if we can push this separately as a prerequisite?
> 
> Here it is as a separate independent patch.  Even without the
> wide_int changing patch it should save some memory, by not always
> allocating room for 9 limbs, but say just the 2/3 or how many we actually
> need.  And, another advantage is that if we really need it at some point,
> it could support even more than 9 limbs if it is created from a wide_int_ref
> with get_len () > 9.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK by me if Jason doesn't object.

Thanks,
Richard.

> 2023-10-10  Jakub Jelinek  
> 
>   * dwarf2out.h (wide_int_ptr): Remove.
>   (dw_wide_int_ptr): New typedef.
>   (struct dw_val_node): Change type of val_wide from wide_int_ptr
>   to dw_wide_int_ptr.
>   (struct dw_wide_int): New type.
>   (dw_wide_int::elt): New method.
>   (dw_wide_int::operator ==): Likewise.
>   * dwarf2out.cc (get_full_len): Change argument type to
>   const dw_wide_int & from const wide_int &.  Use CEIL.  Call
>   get_precision method instead of calling wi::get_precision.
>   (alloc_dw_wide_int): New function.
>   (add_AT_wide): Change w argument type to const wide_int_ref &
>   from const wide_int &.  Use alloc_dw_wide_int.
>   (mem_loc_descriptor, loc_descriptor): Use alloc_dw_wide_int.
>   (insert_wide_int): Change val argument type to const wide_int_ref &
>   from const wide_int &.
>   (add_const_value_attribute): Pass rtx_mode_t temporary directly to
>   add_AT_wide instead of using a temporary variable.
> 
> --- gcc/dwarf2out.h.jj2023-10-09 14:37:45.890939965 +0200
> +++ gcc/dwarf2out.h   2023-10-09 16:46:14.705816928 +0200
> @@ -30,7 +30,7 @@ typedef struct dw_cfi_node *dw_cfi_ref;
>  typedef struct dw_loc_descr_node *dw_loc_descr_ref;
>  typedef struct dw_loc_list_struct *dw_loc_list_ref;
>  typedef struct dw_discr_list_node *dw_discr_list_ref;
> -typedef wide_int *wide_int_ptr;
> +typedef struct dw_wide_int *dw_wide_int_ptr;
>  
>  
>  /* Call frames are described using a sequence of Call Frame
> @@ -252,7 +252,7 @@ struct GTY(()) dw_val_node {
>unsigned HOST_WIDE_INT
>   GTY ((tag ("dw_val_class_unsigned_const"))) val_unsigned;
>double_int GTY ((tag ("dw_val_class_const_double"))) val_double;
> -  wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide;
> +  dw_wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide;
>dw_vec_const GTY ((tag ("dw_val_class_vec"))) val_vec;
>struct dw_val_die_union
>   {
> @@ -313,6 +313,35 @@ struct GTY(()) dw_discr_list_node {
>int dw_discr_range;
>  };
>  
> +struct GTY((variable_size)) dw_wide_int {
> +  unsigned int precision;
> +  unsigned int len;
> +  HOST_WIDE_INT val[1];
> +
> +  unsigned int get_precision () const { return precision; }
> +  unsigned int get_len () const { return len; }
> +  const HOST_WIDE_INT *get_val () const { return val; }
> +  inline HOST_WIDE_INT elt (unsigned int) const;
> +  inline bool operator == (const dw_wide_int &) const;
> +};
> +
> +inline HOST_WIDE_INT
> +dw_wide_int::elt (unsigned int i) const
> +{
> +  if (i < len)
> +return val[i];
> +  wide_int_ref ref = wi::storage_ref (val, len, precision);
> 

Re: [PATCH V1] use more get_range_query

2023-10-11 Thread Richard Biener
On Wed, 11 Oct 2023, Jiufu Guo wrote:

> Hi,
> 
> For "get_global_range_query" SSA_NAME_RANGE_INFO can be queried.
> For "get_range_query", it could get more context-aware range info.
> And looking at the implementation of "get_range_query", it returns the
> global range if there is no local fun info.
> 
> So, if not querying for an SSA_NAME and not changing the IL, it would
> be ok to use get_range_query to replace get_global_range_query.
> 
> Patch https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630389.html,
> which uses get_range_query, could handle more cases.
> 
> Compare with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632401.html
> This patch removes some unsafe replacements.
> 
> Pass bootstrap & regtest on ppc64{,le} and x86_64.
> Is this ok for trunk?

OK.

Richard.

> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (expr_not_equal_to): Replace get_global_range_query
>   by get_range_query.
>   * gimple-fold.cc (size_must_be_zero_p): Likewise.
>   * gimple-range-fold.cc (fur_source::fur_source): Likewise.
>   * gimple-ssa-warn-access.cc (check_nul_terminated_array): Likewise.
>   * tree-dfa.cc (get_ref_base_and_extent): Likewise.
> 
> ---
>  gcc/fold-const.cc | 6 +-
>  gcc/gimple-fold.cc| 6 ++
>  gcc/gimple-range-fold.cc  | 4 +---
>  gcc/gimple-ssa-warn-access.cc | 2 +-
>  gcc/tree-dfa.cc   | 5 +
>  5 files changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..15134b21b9f 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -11056,11 +11056,7 @@ expr_not_equal_to (tree t, const wide_int &w)
>if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
>   return false;
>  
> -  if (cfun)
> - get_range_query (cfun)->range_of_expr (vr, t);
> -  else
> - get_global_range_query ()->range_of_expr (vr, t);
> -
> +  get_range_query (cfun)->range_of_expr (vr, t);
>if (!vr.undefined_p () && !vr.contains_p (w))
>   return true;
>/* If T has some known zero bits and W has any of those bits set,
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index dc89975270c..853edd9e5d4 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -876,10 +876,8 @@ size_must_be_zero_p (tree size)
>wide_int zero = wi::zero (TYPE_PRECISION (type));
>value_range valid_range (type, zero, ssize_max);
>value_range vr;
> -  if (cfun)
> -get_range_query (cfun)->range_of_expr (vr, size);
> -  else
> -get_global_range_query ()->range_of_expr (vr, size);
> +  get_range_query (cfun)->range_of_expr (vr, size);
> +
>if (vr.undefined_p ())
>  vr.set_varying (TREE_TYPE (size));
>vr.intersect (valid_range);
> diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
> index d1945ccb554..6e9530c3d7f 100644
> --- a/gcc/gimple-range-fold.cc
> +++ b/gcc/gimple-range-fold.cc
> @@ -50,10 +50,8 @@ fur_source::fur_source (range_query *q)
>  {
>if (q)
>  m_query = q;
> -  else if (cfun)
> -m_query = get_range_query (cfun);
>else
> -m_query = get_global_range_query ();
> +m_query = get_range_query (cfun);
>m_gori = NULL;
>  }
>  
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index fcaff128d60..e439d1b9b68 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -332,7 +332,7 @@ check_nul_terminated_array (GimpleOrTree expr, tree src, 
> tree bound)
>  {
>Value_Range r (TREE_TYPE (bound));
>  
> -  get_global_range_query ()->range_of_expr (r, bound);
> +  get_range_query (cfun)->range_of_expr (r, bound);
>  
>if (r.undefined_p () || r.varying_p ())
>   return true;
> diff --git a/gcc/tree-dfa.cc b/gcc/tree-dfa.cc
> index af8e9243947..5355af2c869 100644
> --- a/gcc/tree-dfa.cc
> +++ b/gcc/tree-dfa.cc
> @@ -531,10 +531,7 @@ get_ref_base_and_extent (tree exp, poly_int64 *poffset,
>  
>   value_range vr;
>   range_query *query;
> - if (cfun)
> -   query = get_range_query (cfun);
> - else
> -   query = get_global_range_query ();
> + query = get_range_query (cfun);
>  
>   if (TREE_CODE (index) == SSA_NAME
>   && (low_bound = array_ref_low_bound (exp),
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] MATCH: [PR111282] Simplify `a & (b ^ ~a)` to `a & b`

2023-10-11 Thread Richard Biener
On Wed, Oct 11, 2023 at 2:46 AM Andrew Pinski  wrote:
>
> While `a & (b ^ ~a)` is optimized to `a & b` on the rtl level,
> it is always good to optimize this at the gimple level and allows
> us to match a few extra things including where a is a comparison.
>
> Note I had to update/change the testcase and-1.c to avoid matching
> this case as we can match -2 and 1 as bitwise inversions.

OK.
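
(For the record, the identity is easy to verify bit by bit, considering each
bit of a separately:

  a-bit == 1:  b ^ ~a = b ^ 0 = b, so a & (b ^ ~a) = b = a & b
  a-bit == 0:  both sides are 0

so the three new patterns below are just different surface forms of the same
rewrite.)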

> PR tree-optimization/111282
>
> gcc/ChangeLog:
>
> * match.pd (`a & ~(a ^ b)`, `a & (a == b)`,
> `a & ((~a) ^ b)`): New patterns.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/and-1.c: Update testcase to avoid
> matching `~1 & (a ^ 1)` simplification.
> * gcc.dg/tree-ssa/bitops-6.c: New test.
> ---
>  gcc/match.pd | 20 ++
>  gcc/testsuite/gcc.dg/tree-ssa/and-1.c|  6 ++---
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c | 33 
>  3 files changed, 56 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 49740d189a7..26b05c157c1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1358,6 +1358,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& (!wascmp || element_precision (type) == 1))
>(bit_ior @0 (bit_not @2)
>
> +/* a & ~(a ^ b)  -->  a & b  */
> +(simplify
> + (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
> + (bit_and @0 @1))
> +
> +/* a & (a == b)  -->  a & b (boolean version of the above). */
> +(simplify
> + (bit_and:c @0 (nop_convert? (eq:c @0 @1)))
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
> +  (bit_and @0 @1)))
> +
> +/* a & ((~a) ^ b)  -->  a & b (alt version of the above 2) */
> +(simplify
> + (bit_and:c @0 (bit_xor:c @1 @2))
> + (with { bool wascmp; }
> + (if (bitwise_inverted_equal_p (@0, @1, wascmp)
> +  && (!wascmp || element_precision (type) == 1))
> +  (bit_and @0 @2
> +
>  /* (a | b) | (a &^ b)  -->  a | b  */
>  (for op (bit_and bit_xor)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/and-1.c
> index 276c2b9bd8a..27d38907eea 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/and-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/and-1.c
> @@ -2,10 +2,10 @@
>  /* { dg-options "-O -fdump-tree-optimized-raw" } */
>
>  int f(int in) {
> -  in = in | 3;
> -  in = in ^ 1;
> +  in = in | 7;
> +  in = in ^ 3;
>in = (in & ~(unsigned long)1);
>return in;
>  }
>
> -/* { dg-final { scan-tree-dump-not "bit_and_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "bit_and_expr, "  "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
> new file mode 100644
> index 000..e6ab2fd6c71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-6.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/111282 */
> +
> +
> +int f(int a, int b)
> +{
> +  return a & (b ^ ~a); // a & b
> +}
> +
> +_Bool fb(_Bool x, _Bool y)
> +{
> +  return x & (y ^ !x); // x & y
> +}
> +
> +int fa(int w, int z)
> +{
> +  return (~w) & (w ^ z); // ~w & z
> +}
> +
> +int fcmp(int x, int y)
> +{
> +  _Bool a = x == 2;
> +  _Bool b = y == 1;
> +  return a & (b ^ !a); // (x == 2) & (y == 1)
> +}
> +
> +/* { dg-final { scan-tree-dump-not   "bit_xor_expr, "   "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_and_expr, " 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not   "ne_expr, ""optimized" } } */
> +/* { dg-final { scan-tree-dump-times "eq_expr, "  2 "optimized" } } */
> +
> --
> 2.39.3
>


RE: [PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-10-11 Thread Richard Biener
; >  {
> > > -  e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > +  e = LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo);
> > >if (! single_pred_p (e->dest))
> > >   {
> > > split_loop_exit_edge (e, true);
> > > @@ -11625,8 +11655,9 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > >   a zero NITERS becomes a nonzero NITERS_VECTOR.  */
> > >if (integer_onep (step_vector))
> > >  niters_no_overflow = true;
> > > -  vect_set_loop_condition (loop, loop_vinfo, niters_vector, step_vector,
> > > -niters_vector_mult_vf, !niters_no_overflow);
> > > +  vect_set_loop_condition (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > loop_vinfo,
> > > +niters_vector, step_vector, niters_vector_mult_vf,
> > > +!niters_no_overflow);
> > >
> > >unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
> > >
> > > @@ -11699,7 +11730,8 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > > assumed_vf) - 1
> > >: wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
> > >  assumed_vf) - 1);
> > > -  scale_profile_for_vect_loop (loop, assumed_vf, flat);
> > > +  scale_profile_for_vect_loop (loop, LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > +assumed_vf, flat);
> > >
> > >if (dump_enabled_p ())
> > >  {
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> > f1d0cd79961abb095bc79d3b59a81930f0337e59..afa7a8e30891c782a0e5e
> > 3740ecc
> > > 4377f5a31e54 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -919,10 +919,24 @@ public:
> > >   analysis.  */
> > >vec<_loop_vec_info *> epilogue_vinfos;
> > >
> > > +  /* The controlling loop IV for the current loop when vectorizing.  
> > > This IV
> > > + controls the natural exits of the loop.  */  edge vec_loop_iv;
> > > +
> > > +  /* The controlling loop IV for the epilogue loop when vectorizing.  
> > > This IV
> > > + controls the natural exits of the loop.  */  edge
> > > + vec_epilogue_loop_iv;
> > > +
> > > +  /* The controlling loop IV for the scalar loop being vectorized.  This 
> > > IV
> > > + controls the natural exits of the loop.  */  edge
> > > + scalar_loop_iv;
> > 
> > all of the above sound as if they were IVs, the access macros have _EXIT at 
> > the
> > end, can you make the above as well?
> > 
> > Otherwise looks good to me.
> > 
> > Feel free to push approved patches of the series, no need to wait until
> > everything is approved.
> > 
> > Thanks,
> > Richard.
> > 
> > >  } *loop_vec_info;
> > >
> > >  /* Access Functions.  */
> > >  #define LOOP_VINFO_LOOP(L) (L)->loop
> > > +#define LOOP_VINFO_IV_EXIT(L)  (L)->vec_loop_iv
> > > +#define LOOP_VINFO_EPILOGUE_IV_EXIT(L) (L)->vec_epilogue_loop_iv
> > > +#define LOOP_VINFO_SCALAR_IV_EXIT(L)   (L)->scalar_loop_iv
> > >  #define LOOP_VINFO_BBS(L)  (L)->bbs
> > >  #define LOOP_VINFO_NITERSM1(L) (L)->num_itersm1
> > >  #define LOOP_VINFO_NITERS(L)   (L)->num_iters
> > > @@ -2155,11 +2169,13 @@ class auto_purge_vect_location
> > >
> > >  /* Simple loop peeling and versioning utilities for vectorizer's 
> > > purposes -
> > > in tree-vect-loop-manip.cc.  */
> > > -extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > +extern void vect_set_loop_condition (class loop *, edge,
> > > +loop_vec_info,
> > >tree, tree, tree, bool);
> > > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> > > const_edge); -class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *,
> > > -  class loop *, edge);
> > > +extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
> > > +  const_edge);
> > > +class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> > > + class loop *, edge,
> > > + edge, edge *);
> > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);  extern
> > > class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > >   tree *, tree *, tree *, int, bool, bool, @@ 
> > > -
> > 2169,6 +2185,7
> > > @@ extern void vect_prepare_for_masked_peels (loop_vec_info);  extern
> > > dump_user_location_t find_loop_location (class loop *);  extern bool
> > > vect_can_advance_ivs_p (loop_vec_info);  extern void
> > > vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> > > +extern edge vec_init_loop_exit_info (class loop *);
> > >
> > >  /* In tree-vect-stmts.cc.  */
> > >  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> > > @@ -2358,6 +2375,7 @@ struct vect_loop_form_info
> > >tree assumptions;
> > >gcond *loop_cond;
> > >gcond *inner_loop_cond;
> > > +  edge loop_exit;
> > >  };
> > >  extern opt_result vect_analyze_loop_form (class loop *,
> > > vect_loop_form_info *);  extern loop_vec_info vect_create_loop_vinfo
> > > (class loop *, vec_info_shared *, diff --git a/gcc/tree-vectorizer.cc
> > > b/gcc/tree-vectorizer.cc index
> > >
> > a048e9d89178a37455bd7b83ab0f2a238a4ce69e..d97e2b54c25ac6037893
> > 5392aa7b
> > > 73476efed74b 100644
> > > --- a/gcc/tree-vectorizer.cc
> > > +++ b/gcc/tree-vectorizer.cc
> > > @@ -943,6 +943,8 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo,
> > gimple *loop_vectorized_call,
> > >class loop *scalar_loop = get_loop (fun, tree_to_shwi (arg));
> > >
> > >LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
> > > +  LOOP_VINFO_SCALAR_IV_EXIT (loop_vinfo)
> > > += vec_init_loop_exit_info (scalar_loop);
> > >gcc_checking_assert (vect_loop_vectorized_call (scalar_loop)
> > >  == loop_vectorized_call);
> > >/* If we are going to vectorize outer loop, prevent vectorization
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.

2023-10-11 Thread Richard Biener
On Wed, 11 Oct 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 10, 2023 12:14 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 2/3]middle-end: updated niters analysis to handle
> > multiple exits.
> > 
> > On Mon, 2 Oct 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This second part updates niters analysis to be able to analyze any
> > > number of exits.  If we have multiple exits we determine the main exit
> > > by finding the first counting IV.
> > >
> > > The change allows the vectorizer to pass analysis for multiple loops,
> > > but we later gracefully reject them.  It does however allow us to test
> > > if the exit handling is using the right exit everywhere.
> > >
> > > Additionally since we analyze all exits, we now return all conditions
> > > for them and determine which condition belongs to the main exit.
> > >
> > > The main condition is needed because the vectorizer needs to ignore
> > > the main IV condition during vectorization as it will replace it during 
> > > codegen.
> > >
> > > To track versioned loops we extend the contract between ifcvt and the
> > > vectorizer to store the exit number in aux so that we can match it up 
> > > again
> > during peeling.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-linux-gnu,
> > > and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-if-conv.cc (tree_if_conversion): Record exits in aux.
> > >   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > Use
> > >   it.
> > >   * tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
> > >   (vec_init_loop_exit_info): Extend analysis when multiple exits.
> > >   (vect_analyze_loop_form): Record conds and determine main cond.
> > >   (vect_create_loop_vinfo): Extend bookkeeping of conds.
> > >   (vect_analyze_loop): Release conds.
> > >   * tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> > >   LOOP_VINFO_LOOP_IV_COND):  New.
> > >   (struct vect_loop_form_info): Add conds, alt_loop_conds;
> > >   (struct loop_vec_info): Add conds, loop_iv_cond.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index
> > >
> > 799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55
> > 903531
> > > b22829f4fdbd 100644
> > > --- a/gcc/tree-if-conv.cc
> > > +++ b/gcc/tree-if-conv.cc
> > > @@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop,
> > vec *preds)
> > >  }
> > >if (need_to_ifcvt)
> > >  {
> > > +  /* Before we rewrite edges we'll record their original position in 
> > > the
> > > +  edge map such that we can map the edges between the ifcvt and the
> > > +  non-ifcvt loop during peeling.  */
> > > +  uintptr_t idx = 0;
> > > +  for (edge exit : get_loop_exit_edges (loop))
> > > + exit->aux = (void*)idx++;
> > > +
> > >/* Now all statements are if-convertible.  Combine all the basic
> > >blocks into one huge basic block doing the if-conversion
> > >on-the-fly.  */
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4
> > 052e1b1
> > > 2e0d17300262 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop, edge loop_exit,
> > >scalar_loop = loop;
> > >scalar_exit = loop_exit;
> > >  }
> > > +  else if (scalar_loop == loop)
> > > +scalar_exit = loop_exit;
> > > +  else
> > > +{
> > > +  /* Loop has been version, match exits up using the aux index.  */
> > > +  for (edge exit : get_loop_exit_edges (scalar_loop))
> > > + if (exit->aux == loop_exit->aux)
> > > +   {
> > > + scalar_exit = exit;
> > > + break;
> > > +   }
> > > +}
> > >
> > >bbs = XNEWVEC (basic_block, scalar_loop->num_no

RE: [PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling

2023-10-11 Thread Richard Biener
g_arg, first_loop_e, UNKNOWN_LOCATION);
> > > +  tree def = get_current_def (var);
> > > +  if (!def)
> > > +continue;
> > > +  if (operand_equal_p (def,
> > > +   PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > +return PHI_RESULT (phi);
> > >   }
> > >  }
> > > +  return NULL_TREE;
> > >  }
> > >
> > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> > > beginning @@ -2796,11 +2768,11 @@
> > slpeel_update_phi_nodes_for_guard1 (class loop *skip_loop,
> > >  }
> > >  }
> > >
> > > -/* LOOP and EPILOG are two consecutive loops in CFG and EPILOG is copied
> > > -   from LOOP.  Function slpeel_add_loop_guard adds guard skipping from a
> > > -   point between the two loops to the end of EPILOG.  Edges GUARD_EDGE
> > > -   and MERGE_EDGE are the two pred edges of merge_bb at the end of
> > EPILOG.
> > > -   The CFG looks like:
> > > +/* LOOP and EPILOG are two consecutive loops in CFG connected by
> > LOOP_EXIT edge
> > > +   and EPILOG is copied from LOOP.  Function slpeel_add_loop_guard adds
> > guard
> > > +   skipping from a point between the two loops to the end of EPILOG.  
> > > Edges
> > > +   GUARD_EDGE and MERGE_EDGE are the two pred edges of merge_bb at
> > the end of
> > > +   EPILOG.  The CFG looks like:
> > >
> > >   loop:
> > > header_a:
> > > @@ -2851,6 +2823,7 @@ slpeel_update_phi_nodes_for_guard1 (class loop
> > > *skip_loop,
> > >
> > >  static void
> > >  slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop
> > > *epilog,
> > > + const_edge loop_exit,
> > >   edge guard_edge, edge merge_edge)  {
> > >gphi_iterator gsi;
> > > @@ -2859,13 +2832,11 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >gcc_assert (single_succ_p (merge_bb));
> > >edge e = single_succ_edge (merge_bb);
> > >basic_block exit_bb = e->dest;
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > >
> > >for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >  {
> > >gphi *update_phi = gsi.phi ();
> > > -  tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > +  tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > >
> > >tree merge_arg = NULL_TREE;
> > >
> > > @@ -2877,8 +2848,8 @@ slpeel_update_phi_nodes_for_guard2 (class loop
> > *loop, class loop *epilog,
> > >if (!merge_arg)
> > >   merge_arg = old_arg;
> > >
> > > -  tree guard_arg
> > > - = find_guard_arg (loop, epilog, single_exit (loop), update_phi);
> > > +  tree guard_arg = find_guard_arg (loop, epilog, loop_exit,
> > > +update_phi, e->dest_idx);
> > >/* If the var is live after loop but not a reduction, we simply
> > >use the old arg.  */
> > >if (!guard_arg)
> > > @@ -2898,21 +2869,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >  }
> > >  }
> > >
> > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > -
> > > -static void
> > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog) -{
> > > -  gphi_iterator gsi;
> > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > -
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > -rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > -}
> > > -
> > >  /* LOOP_VINFO is an epilogue loop whose corresponding main loop can be
> > skipped.
> > > Return a value that equals:
> > >
> > > @@ -3255,8 +3211,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  e, &prolog_e);
> > >gcc_assert (prolog);
> > >prolog->force_vectorize = false;
> > > -  slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, prolog_e, 
> > > loop,
> > > -  exit_e, true);
> > > +
> > >first_loop = prolog;
> > >reset_original_copy_tables ();
> > >
> > > @@ -3336,8 +3291,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
> > >gcc_assert (epilog);
> > >epilog->force_vectorize = false;
> > > -  slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, e, epilog,
> > > -  new_epilog_e, false);
> > >bb_before_epilog = loop_preheader_edge (epilog)->src;
> > >
> > >/* Scalar version loop may be preferred.  In this case, add
> > > guard @@ -3430,7 +3383,9 @@ vect_do_peeling (loop_vec_info
> > loop_vinfo, tree niters, tree nitersm1,
> > >  irred_flag);
> > > if (vect_epilogues)
> > >   epilogue_vinfo->skip_this_loop_edge = guard_e;
> > > -   slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > epilog_e);
> > > +   edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +   slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv,
> > guard_e,
> > > +   epilog_e);
> > > /* Only need to handle basic block before epilog loop if it's not
> > >the guard_bb, which is the case when skip_vector is true.  */
> > > if (guard_bb != bb_before_epilog)
> > > @@ -3441,8 +3396,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >   }
> > > scale_loop_profile (epilog, prob_epilog, -1);
> > >   }
> > > -  else
> > > - slpeel_update_phi_nodes_for_lcssa (epilog);
> > >
> > >unsigned HOST_WIDE_INT bound;
> > >if (bound_scalar.is_constant (&bound)) diff --git
> > > a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > f1caa5f207d3b13da58c3a313b11d1ef98374349..327cab0f736da7f1bd3e0
> > 24d666d
> > > f46ef9208107 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -5877,7 +5877,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> > >basic_block exit_bb;
> > >tree scalar_dest;
> > >tree scalar_type;
> > > -  gimple *new_phi = NULL, *phi;
> > > +  gimple *new_phi = NULL, *phi = NULL;
> > >gimple_stmt_iterator exit_gsi;
> > >tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > >gimple *epilog_stmt = NULL;
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> > 55b6771b271d5072fa1327d595e1dddb112cfdf6..25ceb6600673d71fd601
> > 24434039
> > > 97e921066483 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -2183,7 +2183,7 @@ extern bool slpeel_can_duplicate_loop_p (const
> > class loop *, const_edge,
> > >const_edge);
> > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
> > >   class loop *, edge,
> > > - edge, edge *);
> > > + edge, edge *, bool = true);
> > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);  extern
> > > class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > >   tree *, tree *, tree *, int, bool, bool,
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-11 Thread Richard Biener
On Wed, 11 Oct 2023, Juzhe-Zhong wrote:

> This patch fixes the following FAILs in the RISC-V regression:
> 
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> 
> The root cause of these FAILs is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> 
> Since for RVV, we build MASK_LEN_GATHER_LOAD with a dummy mask (-1) in 
> tree-vect-patterns.cc if it is the same
> situation as GATHER_LOAD (no conditional mask).
> 
> So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if the mask 
> argument is a dummy mask.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-slp.cc (vect_get_operand_map):
>   (vect_build_slp_tree_1):
>   (vect_build_slp_tree_2):
>   * tree-vect-stmts.cc (vectorizable_load):
> 
> ---
>  gcc/tree-vect-slp.cc   | 18 --
>  gcc/tree-vect-stmts.cc |  4 ++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index fa098f9ff4e..712c04ec278 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
> swap = 0)
> case IFN_MASK_GATHER_LOAD:
>   return arg1_arg4_map;
>  
> +   case IFN_MASK_LEN_GATHER_LOAD:
> + /* In tree-vect-patterns.cc, we will have these 2 situations:
> +
> + - Unconditional gather load transforms
> +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> +
> + - Conditional gather load transforms
> +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> +   : nullptr;
> +
> case IFN_MASK_STORE:
>   return arg3_arg2_map;
>  
> @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
>  
> if (cfn == CFN_MASK_LOAD
> || cfn == CFN_GATHER_LOAD
> -   || cfn == CFN_MASK_GATHER_LOAD)
> +   || cfn == CFN_MASK_GATHER_LOAD
> +   || cfn == CFN_MASK_LEN_GATHER_LOAD)
>   ldst_p = true;
> else if (cfn == CFN_MASK_STORE)
>   {
> @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> *swap,
> if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> && rhs_code != CFN_GATHER_LOAD
> && rhs_code != CFN_MASK_GATHER_LOAD
> +   && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> /* Not grouped loads are handled as externals for BB
>vectorization.  For loop vectorization we can handle
>splats the same we handle single element interleaving.  */
> @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>if (gcall *stmt = dyn_cast  (stmt_info->stmt))
>   gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
>   || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> + || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
> + || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
>else
>   {
> *max_nunits = this_max_nunits;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index cd7c1090d88..263acf5d3cd 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9575,9 +9575,9 @@ vectorizable_load (vec_info *vinfo,
>   return false;
>  
>mask_index = internal_fn_mask_index (ifn);
> -  if (mask_index >= 0 && slp_node)
> +  if (mask_index >= 0 && slp_node && internal_fn_len_index (ifn) < 0)
>   mask_index = vect_slp_child_index_for_operand (call, mask_index);
> -  if (mask_index >= 0
> +  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> &mask, NULL, &mask_dt, &mask_vectype))
>   return false;

You are ignoring the mask argument and only handle it here when the
IFN doesn't have a _LEN.  This doesn't seem forward-looking
to the point where you want to actually handle masked (aka conditional)
gathers.

Did you check that SLP is actually used to vectorize this?

Richard.


[PATCH] tree-optimization/111764 - wrong reduction vectorization

2023-10-12 Thread Richard Biener
The following removes a misguided attempt to allow x + x in a reduction
path, since it also allowed x * x, which isn't valid.  x + x actually never
arrives this way but instead is canonicalized to 2 * x.  This makes
reduction path handling consistent with how we handle the single-stmt
reduction case.
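
For the testcase below the expected scalar result is easy to work out by
hand: b starts at 2 and is squared nine times, so the values are
2, 4, 16, 256, 65536, and 65536 already truncates to 0 in the 16-bit short
(with GCC's modulo semantics for the narrowing conversion), after which it
stays 0 - hence the b != 0 abort check.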

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111764
* tree-vect-loop.cc (check_reduction_path): Remove the attempt
to allow x + x via special-casing of assigns.

* gcc.dg/vect/pr111764.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr111764.c | 16 
 gcc/tree-vect-loop.cc| 15 +++
 2 files changed, 19 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr111764.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr111764.c 
b/gcc/testsuite/gcc.dg/vect/pr111764.c
new file mode 100644
index 000..f4e110f3bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111764.c
@@ -0,0 +1,16 @@
+#include "tree-vect.h"
+
+short b = 2;
+
+int main()
+{
+  check_vect ();
+
+  for (int a = 1; a <= 9; a++)
+b = b * b;
+  if (b != 0)
+__builtin_abort ();
+
+  return 0;
+}
+
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..82b793db74b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3986,24 +3986,15 @@ pop:
 ???  We could relax this and handle arbitrary live stmts by
 forcing a scalar epilogue for example.  */
   imm_use_iterator imm_iter;
+  use_operand_p use_p;
   gimple *op_use_stmt;
   unsigned cnt = 0;
   FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op.ops[opi])
if (!is_gimple_debug (op_use_stmt)
&& (*code != ERROR_MARK
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
- {
-   /* We want to allow x + x but not x < 1 ? x : 2.  */
-   if (is_gimple_assign (op_use_stmt)
-   && gimple_assign_rhs_code (op_use_stmt) == COND_EXPR)
- {
-   use_operand_p use_p;
-   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
- cnt++;
- }
-   else
- cnt++;
- }
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   cnt++;
   if (cnt != 1)
{
  fail = true;
-- 
2.35.3


[PATCH] tree-optimization/111773 - avoid CD-DCE of noreturn special calls

2023-10-12 Thread Richard Biener
The support for eliding calls to allocation functions in DCE runs into
the issue that when an implementation is discovered noreturn we end
up DCEing the calls anyway, leaving blocks without termination and
without outgoing edges, which is both invalid IL and wrong code when,
as in the example, the noreturn call would throw.  The following
avoids taking advantage of both noreturn and the ability to elide
the allocation at the same time.

For the testcase it's valid to throw or to return 10 by eliding the
allocation.  But we have to do one or the other, where currently we'd
run off the end of the function.
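
Schematically (an illustrative dump fragment, not taken from the PR), the
bad situation is:

  <bb 2> :
    operator new (4);    ;; always throws here, so discovered noreturn;
                         ;; the fallthru edge was already pruned and bb 2
                         ;; has no outgoing edges
  ;; if DCE elides this call as a dead allocation, bb 2 is left without a
  ;; terminating statement and without successors (invalid IL) and the
  ;; exception the call would have raised is silently lost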

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Honza, any objections here?

Thanks,
Richard.

PR tree-optimization/111773
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
not elide noreturn calls that are reflected to the IL.

* g++.dg/torture/pr111773.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr111773.C | 31 +
 gcc/tree-ssa-dce.cc |  8 +++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr111773.C

diff --git a/gcc/testsuite/g++.dg/torture/pr111773.C 
b/gcc/testsuite/g++.dg/torture/pr111773.C
new file mode 100644
index 000..af8c687252c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr111773.C
@@ -0,0 +1,31 @@
+// { dg-do run }
+
+#include 
+
+void* operator new(std::size_t sz)
+{
+  throw std::bad_alloc{};
+}
+
+int __attribute__((noipa)) foo ()
+{
+  int* p1 = static_cast(::operator new(sizeof(int)));
+  return 10;
+}
+
+int main()
+{
+  int res;
+  try
+{
+  res = foo ();
+}
+  catch (...)
+{
+  return 0;
+}
+
+  if (res != 10)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index f0b02456132..bbdf9312c9f 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -221,6 +221,14 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool 
aggressive)
 
 case GIMPLE_CALL:
   {
+   /* Never elide a noreturn call we pruned control-flow for.  */
+   if ((gimple_call_flags (stmt) & ECF_NORETURN)
+   && gimple_call_ctrl_altering_p (stmt))
+ {
+   mark_stmt_necessary (stmt, true);
+   return;
+ }
+
tree callee = gimple_call_fndecl (stmt);
if (callee != NULL_TREE
&& fndecl_built_in_p (callee, BUILT_IN_NORMAL))
-- 
2.35.3


Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, Juzhe-Zhong wrote:

> Thanks Richi for pointing it out.
> 
> I found this patch can't make conditional gather load succeed on SLP.
> 
> I am considering changing MASK_LEN_GATHER_LOAD in pattern recognition:
> 
> If no condition mask, in tree-vect-patterns.cc,  I build MASK_LEN_GATHER_LOAD 
> (ptr, offset, scale, 0) -> 4 arguments same as GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the GATHER_LOAD SLP flow 
> naturally.
> 
> If has condition mask, in tree-vect-patterns.cc,  I build 
> MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> as MASK_GATHER_LOAD.
> In this situation, MASK_LEN_GATHER_LOAD can reuse the MASK_GATHER_LOAD SLP 
> flow naturally.
> 
> Is it reasonable?

What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
even when the mask is -1?

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-11 20:50
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
>  
> > This patch fixes this following FAILs in RISC-V regression:
> > 
> > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > vect "Loop contains only SLP stmts"
> > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > SLP stmts"
> > 
> > The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
> > 
> > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > tree-vect-patterns.cc if it is same
> > situation as GATHER_LOAD (no conditional mask).
> > 
> > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > argument is a dummy mask.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-slp.cc (vect_get_operand_map):
> > (vect_build_slp_tree_1):
> > (vect_build_slp_tree_2):
> > * tree-vect-stmts.cc (vectorizable_load):
> > 
> > ---
> >  gcc/tree-vect-slp.cc   | 18 --
> >  gcc/tree-vect-stmts.cc |  4 ++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index fa098f9ff4e..712c04ec278 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned 
> > char swap = 0)
> >case IFN_MASK_GATHER_LOAD:
> >  return arg1_arg4_map;
> >  
> > +   case IFN_MASK_LEN_GATHER_LOAD:
> > + /* In tree-vect-patterns.cc, we will have these 2 situations:
> > +
> > + - Unconditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with dummy mask which is -1.
> > +
> > + - Conditional gather load transforms
> > +   into MASK_LEN_GATHER_LOAD with real conditional mask.*/
> > + return integer_minus_onep (gimple_call_arg (call, 4)) ? arg1_map
> > +   : nullptr;
> > +
> >case IFN_MASK_STORE:
> >  return arg3_arg2_map;
> >  
> > @@ -1077,7 +1088,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> > *swap,
> >  
> >if (cfn == CFN_MASK_LOAD
> >|| cfn == CFN_GATHER_LOAD
> > -   || cfn == CFN_MASK_GATHER_LOAD)
> > +   || cfn == CFN_MASK_GATHER_LOAD
> > +   || cfn == CFN_MASK_LEN_GATHER_LOAD)
> >  ldst_p = true;
> >else if (cfn == CFN_MASK_STORE)
> >  {
> > @@ -1337,6 +1349,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
> > *swap,
> >if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
> >&& rhs_code != CFN_GATHER_LOAD
> >&& rhs_code != CFN_MASK_GATHER_LOAD
> > +   && rhs_code != CFN_MASK_LEN_GATHER_LOAD
> >/* Not grouped loads are handled as externals for BB
> >  vectorization.  For loop vectorization we can handle
> >  splats the same we handle single element interleaving.  */
> > @@ -1837,7 +1850,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> >if (gcall *stmt = dyn_cast  (stmt_info->stmt))
> >  gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
> >  || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
> > - || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
> > + || gimple_call_internal_p (

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> In tree-vect-slp.cc:
> vect_get_and_check_slp_defs
> 711: 
> 
>   tree type = TREE_TYPE (oprnd);
>   dt = dts[i];
>   if ((dt == vect_constant_def
>|| dt == vect_external_def)
>   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
>   && (TREE_CODE (type) == BOOLEAN_TYPE
>   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
>   type)))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "Build SLP failed: invalid type of def "
>  "for variable-length SLP %T\n", oprnd);
>   return -1;
> }
> 
> Here the mask = -1 built in tree-vect-patterns.cc has BOOLEAN type, so it
> reaches this condition and SLP fails:
> Build SLP failed: invalid type of def

I think this can be restricted to vect_external_def, but some history
might reveal the cases we put this code in for (we should be able to
materialize all constants?).  At least uniform boolean constants
should be fine.
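
(For instance, the dummy all-ones mask in question is just a uniform
VECTOR_CST; assuming the usual tree building helpers, something along the
lines of

  tree mask = build_minus_one_cst (truth_type_for (vectype));

yields a constant that can be materialized for any vector length, including
variable-length vectors.)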
 
>
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:44
> To: Juzhe-Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, ??? wrote:
>  
> > Thanks Richi point it out.
> > 
> > I found this patch can't make conditional gather load succeed on SLP.
> > 
> > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > 
> > If no condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP flow 
> > naturally.
> > 
> > If has condition mask, in tree-vect-patterns.cc,  I build 
> > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments same 
> > as MASK_GATHER_LOAD.
> > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD SLP 
> > flow naturally.
> > 
> > Is it reasonable ?
>  
> What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> even when the mask is -1?
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-11 20:50
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > This patch fixes this following FAILs in RISC-V regression:
> > > 
> > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> > > vect "Loop contains only SLP stmts"
> > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only 
> > > SLP stmts"
> > > 
> > > The root cause of these FAIL is that GCC SLP failed on 
> > > MASK_LEN_GATHER_LOAD.
> > > 
> > > Since for RVV, we build MASK_LEN_GATHER_LOAD with dummy mask (-1) in 
> > > tree-vect-patterns.cc if it is same
> > > situation as GATHER_LOAD (no conditional mask).
> > > 
> > > So we make MASK_LEN_GATHER_LOAD leverage the flow of GATHER_LOAD if mask 
> > > argument is a dummy mask.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-vect-slp.cc (vect_get_operand_map):
> > > (vect_build_slp_tree_1):
> > > (vect_build_slp_tree_2):
> > > * tree-vect-stmts.cc (vectorizable_load):
> > > 
> > > ---
> > >  gcc/tree-vect-slp.cc   | 18 --
> > >  gcc/tree-vect-stmts.cc |  4 ++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index fa098f9ff4e..712c04ec278 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -544,6 +544,17 @@ vect_get_operand_map (const gimple *stmt, unsigned 
> > > char swap = 0)
> > >case IFN_MASK_GATHER_LOAD:
> > >  return arg1_arg4_map;
> 

[PATCH] tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

2023-10-12 Thread Richard Biener
The following handles byte-aligned, power-of-two and byte-multiple
sized BIT_FIELD_REF reads in SRA.  In particular this should cover
BIT_FIELD_REFs created by optimize_bit_field_compare.
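
Concretely, with the predicate added below a read like

  BIT_FIELD_REF <a, 32, 64>   /* 32 bits at bit offset 64: byte aligned,
                                 power-of-two size -> handled  */

is now scalarized, while e.g. BIT_FIELD_REF <a, 24, 8> (size not a power of
two) or BIT_FIELD_REF <a, 8, 4> (offset not byte aligned) are still left
alone.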

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
appearing there leading to more DSE, fully eliding the aggregates.

This results in the same false positive -Wuninitialized as the
older attempt to remove the folding from optimize_bit_field_compare,
fixed by initializing part of the aggregate unconditionally.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.

Martin is on leave so I'll push this tomorrow unless the Fortran
folks have objections.

Thanks,
Richard.

PR tree-optimization/111779
gcc/
* tree-sra.cc (sra_handled_bf_read_p): New function.
(build_access_from_expr_1): Handle some BIT_FIELD_REFs.
(sra_modify_expr): Likewise.
(make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
* trans-expr.cc (gfc_trans_assignment_1): Initialize
lhs_caf_attr and rhs_caf_attr codimension flag to avoid
false positive -Wuninitialized.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
* gcc.dg/vect/vect-pr111779.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  4 +-
 gcc/testsuite/gcc.dg/vect/vect-pr111779.c  | 56 ++
 gcc/tree-sra.cc| 24 --
 3 files changed, 79 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr111779.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
index e3c33f49ef6..43152de5616 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
@@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
 && constraint_expr_equal (a.rhs, b.rhs);
 }
 
-/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" } } */
-/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr111779.c 
b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
new file mode 100644
index 000..79b72aebc78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
@@ -0,0 +1,56 @@
+#include 
+#include "tree-vect.h"
+
+struct C
+{
+int c;
+int d;
+bool f :1;
+float e;
+};
+
+struct A
+{
+  unsigned int a;
+  unsigned char c1, c2;
+  bool b1 : 1;
+  bool b2 : 1;
+  bool b3 : 1;
+  struct C b4;
+};
+
+void __attribute__((noipa))
+foo (const struct A * __restrict x, int y)
+{
+  int s = 0, i = 0;
+  for (i = 0; i < y; ++i)
+{
+  const struct A a = x[i];
+  s += a.b4.f ? 1 : 0;
+}
+  if (s != 0)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  struct A x[100];
+  int i;
+
+  check_vect ();
+
+  __builtin_memset (x, -1, sizeof (x));
+#pragma GCC novect
+  for (i = 0; i < 100; i++)
+{
+  x[i].b1 = false;
+  x[i].b2 = false;
+  x[i].b3 = false;
+  x[i].b4.f = false;
+}
+  foo (x, 100);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_int } } 
} */
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 56a8ba26135..24d0c20da6a 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1113,6 +1113,21 @@ disqualify_base_of_expr (tree t, const char *reason)
 disqualify_candidate (t, reason);
 }
 
+/* Return true if the BIT_FIELD_REF read EXPR is handled by SRA.  */
+
+static bool
+sra_handled_bf_read_p (tree expr)
+{
+  uint64_t size, offset;
+  if (bit_field_size (expr).is_constant (&size)
+  && bit_field_offset (expr).is_constant (&offset)
+  && size % BITS_PER_UNIT == 0
+  && offset % BITS_PER_UNIT == 0
+  && pow2p_hwi (size))
+return true;
+  return false;
+}
+
 /* Scan expression EXPR and create access structures for all accesses to
candidates for scalarization.  Return the created access or NULL if none is
created.  */
@@ -1123,7 +1138,8 @@ build_access_from_expr_1 (tree expr, gimple *stmt, bool 
write)
   struct access *ret = NULL;
   bool partial_ref;
 
-  if (TREE_CODE (expr) == BIT_FIELD_REF
+  if ((TREE_CODE (expr) == BIT_FIELD_REF
+   && (write || !sra_handled_bf_read_p (expr)))
   || TREE_CODE (expr) == IMAGPART_EXPR
   || TREE_CODE (expr) == REALPART_EXPR)
 {
@@ -1170,6 +1186,7 @@ build_access_from_expr_1 (tree expr, gimple *stmt, bool 
write)
 case COMPONENT_REF:
 case ARRAY_REF:
 case ARRAY_RANGE_REF:
+case BIT_FIELD_REF:
   ret = create_access (expr, stmt, write);
   break;
 
@@ -1549,6 +1566,7 @@ make_fancy_name_1 (tree expr)
   obstack_grow (&name_obstack, buffer, strlen (buffer));
   break;
 
+case BIT_FIELD_REF:
 case ADDR_EXPR:
   make_fancy_name_1 (TREE_OPER

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-12 Thread Richard Biener
 orig_code = new_code != ERROR_MARK ? new_code : orig_code;
> +}
> +
>STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
>  
>vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
> @@ -7678,7 +7748,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "reduction: not commutative/associative");
> + "reduction: not commutative/associative\n");
> return false;
>   }
>  }
> @@ -8062,9 +8132,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>   }
>else if (reduction_type == FOLD_LEFT_REDUCTION
> -&& reduc_fn == IFN_LAST
> +&& internal_fn_mask_index (reduc_fn) == -1
>  && FLOAT_TYPE_P (vectype_in)
> -&& HONOR_SIGNED_ZEROS (vectype_in)
>  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>   {
> if (dump_enabled_p ())
> @@ -8213,6 +8282,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  
>code_helper code = canonicalize_code (op.code, op.type);
>internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
> +
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, 
> vectype_in);
> @@ -8231,17 +8301,29 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>if (code == COND_EXPR)
>  gcc_assert (ncopies == 1);
>  
> +  /* A binary COND_OP reduction must have the same definition and else
> + value. */
> +  bool cond_fn_p = code.is_internal_fn ()
> +&& conditional_internal_fn_code (internal_fn (code)) != ERROR_MARK;
> +  if (cond_fn_p)
> +{
> +  gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
> +   || code == IFN_COND_MUL || code == IFN_COND_AND
> +   || code == IFN_COND_IOR || code == IFN_COND_XOR);
> +  gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3]));
> +}
> +
>bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>  
>vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
>if (reduction_type == FOLD_LEFT_REDUCTION)
>  {
>internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -  gcc_assert (code.is_tree_code ());
> +  gcc_assert (code.is_tree_code () || cond_fn_p);
>return vectorize_fold_left_reduction
> (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
> -tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
> -lens);
> +code, reduc_fn, op.ops, op.num_ops, vectype_in,
> +reduc_index, masks, lens);
>  }
>  
>bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
> @@ -8254,14 +8336,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
>tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
>  
> +  /* Get NCOPIES vector definitions for all operands except the reduction
> + definition.  */
>vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
>single_defuse_cycle && reduc_index == 0
>? NULL_TREE : op.ops[0], &vec_oprnds0,
>single_defuse_cycle && reduc_index == 1
>? NULL_TREE : op.ops[1], &vec_oprnds1,
> -  op.num_ops == 3
> -  && !(single_defuse_cycle && reduc_index == 2)
> +  op.num_ops == 4
> +  || (op.num_ops == 3
> +  && !(single_defuse_cycle && reduc_index == 2))
>? op.ops[2] : NULL_TREE, &vec_oprnds2);
> +
> +  /* For single def-use cycles get one copy of the vectorized reduction
> + definition.  */
>if (single_defuse_cycle)
>  {
>gcc_assert (!slp_node);
> @@ -8301,7 +8389,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>   }
>else
>   {
> -   if (op.num_ops == 3)
> +   if (op.num_ops >= 3)
>   vop[2] = vec_oprnds2[i];
>  
> if (masked_loop_p && mask_by_cond_expr)
> @@ -8314,10 +8402,16 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> if (emulated_mixed_dot_prod)
>   new_stmt = vect_emulate_mixed_dot_prod (loop_vinfo, stmt_info, gsi,
>   vec_dest, vop);
> -   else if (code.is_internal_fn ())
> +
> +   else if (code.is_internal_fn () && !cond_fn_p)
>   new_stmt = gimple_build_call_internal (internal_fn (code),
>  op.num_ops,
>  vop[0], vop[1], vop[2]);
> +   else if (code.is_internal_fn () && cond_fn_p)
> + new_stmt = gimple_build_call_internal (internal_fn (code),
> +op.num_ops,
> +vop[0], vop[1], vop[2],
> +vop[1]);
> else
>   new_stmt = gimple_build_assign (vec_dest, tree_code (op.code),
>   vop[0], vop[1], vop[2]);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index f1d0cd79961..e22067400af 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2319,7 +2319,7 @@ extern tree vect_create_addr_base_for_vector_ref 
> (vec_info *,
> tree);
>  
>  /* In tree-vect-loop.cc.  */
> -extern tree neutral_op_for_reduction (tree, code_helper, tree);
> +extern tree neutral_op_for_reduction (tree, code_helper, tree, bool = true);
>  extern widest_int vect_iv_limit_for_partial_vectors (loop_vec_info 
> loop_vinfo);
>  bool vect_rgroup_iv_might_wrap_p (loop_vec_info, rgroup_controls *);
>  /* Used in tree-vect-loop-manip.cc */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> I restrict as you said into vect_external_def.
> 
> Then this condition made SLP failed:
> 
> -  if (mask_index >= 0
> +  if (mask_index >= 0 && internal_fn_len_index (ifn) < 0
>   && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
>   &mask, NULL, &mask_dt, &mask_vectype))
> return false;
>
> So I add 'internal_fn_len_index (ifn) < 0' for MASK_LEN_GATHER_LOAD does not 
> check scalar mask.

Rather figure why.
 
> Then ICE here:
> 
> vect_slp_analyze_node_operations
> if (child
>   && (SLP_TREE_DEF_TYPE (child) == vect_constant_def
>   || SLP_TREE_DEF_TYPE (child) == vect_external_def)
>   /* Perform usual caching, note code-generation still
>  code-gens these nodes multiple times but we expect
>  to CSE them later.  */
>   && !visited_set.add (child))
> {
>   visited_vec.safe_push (child);
>   /* ???  After auditing more code paths make a "default"
>  and push the vector type from NODE to all children
>  if it is not already set.  */
>   /* Compute the number of vectors to be generated.  */
>   tree vector_type = SLP_TREE_VECTYPE (child);
>   if (!vector_type)
> {
>   /* For shifts with a scalar argument we don't need
>  to cost or code-generate anything.
>  ???  Represent this more explicitely.  */
>   gcc_assert ((STMT_VINFO_TYPE (SLP_TREE_REPRESENTATIVE (node)) 
> > assert FAILed.
>        == shift_vec_info_type)
>   && j == 1);
>   continue;
> }
> 
> Could you help me with that?
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP 
> > > flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments 
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue t

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> Oh. I see.
> 
> Here make vect_constant_def failed to SLP:
> 
> tree-vect-slp.cc:
> vect_build_slp_tree_2
> line 2354:
> 
>   if (oprnd_info->first_dt == vect_external_def
>   || oprnd_info->first_dt == vect_constant_def)
> {
>   slp_tree invnode = vect_create_new_slp_node (oprnd_info->ops);
>   SLP_TREE_DEF_TYPE (invnode) = oprnd_info->first_dt;
>   oprnd_info->ops = vNULL;
>   children.safe_push (invnode);
>   continue;
> }
> 
> It seems that we handle vect_constant_def same as vect_external_def.
> So failed to SLP ?

Why?  We _should_ see a SLP node for the all-true mask operand.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP 
> > > flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments 
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD 
> > > SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-

Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:

> In tree-vect-stmts.cc
> 
> vect_check_scalar_mask
> 
> Failed here:
> 
>   /* If the caller is not prepared for adjusting an external/constant
>  SLP mask vector type fail.  */
>   if (slp_node
>   && !mask_node

^^^

where's the mask_node?

>   && SLP_TREE_DEF_TYPE (mask_node_1) != vect_internal_def)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "SLP mask argument is not vectorized.\n");
>   return false;
> }
> 
> If we allow vect_constant_def, we should adjust constant SLP mask ? in the 
> caller "vectorizable_load" ?
> 
> But I don't know how to adjust that.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-10-12 17:55
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> On Thu, 12 Oct 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I tree-vect-slp.cc:
> > vect_get_and_check_slp_defs
> > 711: 
> > 
> >   tree type = TREE_TYPE (oprnd);
> >   dt = dts[i];
> >   if ((dt == vect_constant_def
> >|| dt == vect_external_def)
> >   && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> >   && (TREE_CODE (type) == BOOLEAN_TYPE
> >   || !can_duplicate_and_interleave_p (vinfo, stmts.length 
> > (),
> >   type)))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "Build SLP failed: invalid type of def "
> >  "for variable-length SLP %T\n", oprnd);
> >   return -1;
> > }
> > 
> > Here mask = -1 is BOOLEAN type in tree-vect-patterns.cc reaching this 
> > condition, then SLP failed:
> > Build SLP failed: invalid type of def
>  
> I think this can be restricted to vect_external_def, but some history
> might reveal the cases we put this code in for (we should be able to
> materialize all constants?).  At least uniform boolean constants
> should be fine.
> >
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-10-12 17:44
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > On Thu, 12 Oct 2023, Juzhe-Zhong wrote:
> >  
> > > Thanks Richi point it out.
> > > 
> > > I found this patch can't make conditional gather load succeed on SLP.
> > > 
> > > I am considering change MASK_LEN_GATHER_LOAD in pattern recognization:
> > > 
> > > If no condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0) -> 4 arguments same as 
> > > GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the GATHER_LOAD SLP 
> > > flow naturally.
> > > 
> > > If has condition mask, in tree-vect-patterns.cc,  I build 
> > > MASK_LEN_GATHER_LOAD (ptr, offset, scale, 0, condition) -> 5 arguments 
> > > same as MASK_GATHER_LOAD.
> > > In this situation, MASK_LEN_GATHER_LOAD can resue the MASK_GATHER_LOAD 
> > > SLP flow naturally.
> > > 
> > > Is it reasonable ?
> >  
> > What's wrong with handling MASK_LEN_GATHER_LOAD with all arguments
> > even when the mask is -1?
> >  
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-10-11 20:50
> > > To: Juzhe-Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> > > On Wed, 11 Oct 2023, Juzhe-Zhong wrote:
> > >  
> > > > This patch fixes this following FAILs in RISC-V regression:
> > > > 
> > > > FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains 
> > > > only SLP stmts"
> > > > FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  
> > > > scan-tree-dump vect "Loop contains only SLP stmts"
> >

Re: [PATCH] tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

2023-10-12 Thread Richard Biener
On Thu, 12 Oct 2023, Richard Biener wrote:

> The following handles byte-aligned, power-of-two and byte-multiple
> sized BIT_FIELD_REF reads in SRA.  In particular this should cover
> BIT_FIELD_REFs created by optimize_bit_field_compare.
> 
> For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
> appearing there leading to more DSE, fully eliding the aggregates.
> 
> This results in the same false positive -Wuninitialized as the
> older attempt to remove the folding from optimize_bit_field_compare,
> fixed by initializing part of the aggregate unconditionally.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.
> 
> Martin is on leave so I'll push this tomorrow unless the Fortran
> folks have objections.

Err, and I forgot that hunk.  It's

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7beefa2e69c..1b8be081a17 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12015,7 +12015,10 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * 
expr2, bool init_flag,
 && !is_runtime_conformable (expr1, expr2);
 
   /* Only analyze the expressions for coarray properties, when in coarray-lib
- mode.  */
+ mode.  Avoid false-positive uninitialized diagnostics with initializing
+ the codimension flag unconditionally.  */
+  lhs_caf_attr.codimension = false;
+  rhs_caf_attr.codimension = false;
   if (flag_coarray == GFC_FCOARRAY_LIB)
 {
   lhs_caf_attr = gfc_caf_attr (expr1, false, &lhs_refs_comp);


> Thanks,
> Richard.
> 
>   PR tree-optimization/111779
> gcc/
>   * tree-sra.cc (sra_handled_bf_read_p): New function.
>   (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
>   (sra_modify_expr): Likewise.
>   (make_fancy_name_1): Skip over BIT_FIELD_REF.
> 
> gcc/fortran/
>   * trans-expr.cc (gfc_trans_assignment_1): Initialize
>   lhs_caf_attr and rhs_caf_attr codimension flag to avoid
>   false positive -Wuninitialized.
> 
> gcc/testsuite/
>   * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
>   * gcc.dg/vect/vect-pr111779.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  4 +-
>  gcc/testsuite/gcc.dg/vect/vect-pr111779.c  | 56 ++
>  gcc/tree-sra.cc| 24 --
>  3 files changed, 79 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> index e3c33f49ef6..43152de5616 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> @@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
>  && constraint_expr_equal (a.rhs, b.rhs);
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" } } 
> */
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1 "dse1" } } 
> */
> +/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } 
> */
> +/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr111779.c 
> b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> new file mode 100644
> index 000..79b72aebc78
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-pr111779.c
> @@ -0,0 +1,56 @@
> +#include 
> +#include "tree-vect.h"
> +
> +struct C
> +{
> +int c;
> +int d;
> +bool f :1;
> +float e;
> +};
> +
> +struct A
> +{
> +  unsigned int a;
> +  unsigned char c1, c2;
> +  bool b1 : 1;
> +  bool b2 : 1;
> +  bool b3 : 1;
> +  struct C b4;
> +};
> +
> +void __attribute__((noipa))
> +foo (const struct A * __restrict x, int y)
> +{
> +  int s = 0, i = 0;
> +  for (i = 0; i < y; ++i)
> +{
> +  const struct A a = x[i];
> +  s += a.b4.f ? 1 : 0;
> +}
> +  if (s != 0)
> +__builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  struct A x[100];
> +  int i;
> +
> +  check_vect ();
> +
> +  __builtin_memset (x, -1, sizeof (x));
> +#pragma GCC novect
> +  for (i = 0; i < 100; i++)
> +{
> +  x[i].b1 = false;
> +  x[i].b2 = false;
> +  x[i].b3 = false;
> +  x[i].b4.f = false;
> +}
> +  foo (x, 100);
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target vect_int } 
> } } */
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 56a8ba26135..24d0c2

[PATCH] Add support for SLP vectorization of OpenMP SIMD clone calls

2023-10-13 Thread Richard Biener
This adds support for SLP vectorization of OpenMP SIMD clone calls.
There's a complication when vectorizing calls involving virtual
operands since, for the first time, these are not only leaves (loads
or stores).  With SLP this runs into the issue that the placement of
the vectorized stmts is not necessarily at one of the original
scalar stmts, which means the magic that updates virtual operands
in vect_finish_stmt_generation does not work.  So we run into the
assert that updating virtual operands isn't necessary.  I've
papered over this, similarly to how we handle mismatched const/pure
attribution, by setting vinfo->any_known_not_updated_vssa.

I've added two basic testcases with multi-lane SLP and verified
that with single-lane SLP enabled the rest of the existing testcases
pass.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push later 
today.

Richard.

* tree-vect-slp.cc (mask_call_maps): New.
(vect_get_operand_map): Handle IFN_MASK_CALL.
(vect_build_slp_tree_1): Likewise.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
SLP.

* gcc.dg/vect/slp-simd-clone-1.c: New testcase.
* gcc.dg/vect/slp-simd-clone-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/slp-simd-clone-1.c |  46 +
 gcc/testsuite/gcc.dg/vect/slp-simd-clone-2.c |  57 +++
 gcc/tree-vect-slp.cc |  20 +++-
 gcc/tree-vect-stmts.cc   | 102 ++-
 4 files changed, 196 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-simd-clone-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-simd-clone-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-simd-clone-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-simd-clone-1.c
new file mode 100644
index 000..6ccbb39b567
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-simd-clone-1.c
@@ -0,0 +1,46 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#include "tree-vect.h"
+
+int x[1024];
+
+#pragma omp declare simd simdlen(4) notinbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+void __attribute__((noipa))
+bar (void)
+{
+#pragma omp simd
+  for (int i = 0; i < 512; i++)
+{
+  x[2*i+0] = foo (x[2*i+0], x[2*i+0]);
+  x[2*i+1] = foo (x[2*i+1], x[2*i+1]);
+}
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+x[i] = i;
+
+  bar ();
+
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+if (x[i] != i + i)
+  abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-simd-clone-2.c 
b/gcc/testsuite/gcc.dg/vect/slp-simd-clone-2.c
new file mode 100644
index 000..98387c92486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-simd-clone-2.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
+
+#include "tree-vect.h"
+
+int x[1024];
+
+#pragma omp declare simd simdlen(4) inbranch
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+void __attribute__((noipa))
+bar (void)
+{
+#pragma omp simd
+  for (int i = 0; i < 512; i++)
+{
+  if (x[2*i+0] < 10)
+   x[2*i+0] = foo (x[2*i+0], x[2*i+0]);
+  if (x[2*i+1] < 20)
+   x[2*i+1] = foo (x[2*i+1], x[2*i+1]);
+}
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+x[i] = i;
+
+  bar ();
+
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+{
+  if (((i & 1) && i < 20)
+ || (!(i & 1) && i < 10))
+   {
+ if (x[i] != i + i)
+   abort ();
+   }
+  else if (x[i] != i)
+   abort ();
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target 
avx2_runtime } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4ff8cbaec04..436efdd4807 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -505,6 +505,14 @@ static const int arg2_map[] = { 1, 2 };
 static const int arg1_arg4_map[] = { 2, 1, 4 };
 static const int arg3_arg2_map[] = { 2, 3, 2 };
 static const int op1_op0_map[] = { 2, 1, 0 };
+static const int mask_call_maps[6][7] = {
+  { 1, 1, },
+  { 2, 1, 2, },
+  { 3, 1, 2, 3, },
+  { 4, 1, 2, 3, 4, },
+  { 5, 1, 2, 3, 4, 5, },
+  { 6, 1, 2, 3, 4, 5, 6 },
+};
 
 /* For most SLP statements, there is a one-to-one mapping between
gimple arguments and child nodes.  If that is not true for STMT,
@@ -547,6 +555,15 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
swap = 0)
  case IFN_MASK_STORE:
return arg3_arg2_map;
 
+ case IFN_MASK_CALL:
+   {
+ unsigned nargs = gimple_call_num_args (call);
+ if (nargs >= 2 && nargs <= 7)
+ 

Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-pr69907.c for RVV

2023-10-13 Thread Richard Biener
On Fri, 13 Oct 2023, Juzhe-Zhong wrote:

> Like ARM SVE and GCN, add RVV.

Adding RVV when SVE or GCN is already there looks obvious to me; these
kinds of changes are pre-approved.  No need for all the noise.

Thanks,
Richard.

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/bb-slp-pr69907.c: Add RVV.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
> index b348526b62f..f63b42a271a 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
> @@ -22,5 +22,5 @@ void foo(unsigned *p1, unsigned short *p2)
>  /* Disable for SVE because for long or variable-length vectors we don't
> get an unrolled epilogue loop.  Also disable for AArch64 Advanced SIMD,
> because there we can vectorize the epilogue using mixed vector sizes.
> -   Likewise for AMD GCN.  */
> -/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a 
> load is not supported" "slp1" { target { { ! aarch64*-*-* } && { ! 
> amdgcn*-*-* } } } } } */
> +   Likewise for AMD GCN and RVV.  */
> +/* { dg-final { scan-tree-dump "BB vectorization with gaps at the end of a 
> load is not supported" "slp1" { target { { ! aarch64*-*-* } && { { ! 
> amdgcn*-*-* } && { ! riscv_v } } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V Regression: Fix FAIL of bb-slp-68.c for RVV

2023-10-13 Thread Richard Biener
On Fri, 13 Oct 2023, Juzhe-Zhong wrote:

> Like comment said, this test failed on 64 bytes vector.
> Both RVV and GCN has 64 bytes vector.
> 
> So it's more reasonable to use vect512.

OK

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/bb-slp-68.c: Use vect512.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
> index e7573a14933..2dd3d8ee90c 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
> @@ -20,4 +20,4 @@ void foo ()
>  
>  /* We want to have the store group split into 4, 2, 4 when using 32byte 
> vectors.
> Unfortunately it does not work when 64-byte vectors are available.  */
> -/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* 
> } } } */
> +/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail vect512 } } 
> } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] OMP SIMD inbranch call vectorization for AVX512 style masks

2023-10-13 Thread Richard Biener
The following teaches vectorizable_simd_clone_call to handle
integer mode masks.  The tricky bit is to second-guess the
number of lanes represented by a single mask argument - the following
uses simdlen and the number of mask arguments to calculate that,
assuming ABIs have them uniform.
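
A back-of-the-envelope sketch of that lane computation (names and
numbers are illustrative, not the ones used in the patch):

  /* Assuming the clone ABI spreads lanes uniformly over its mask
     arguments, each integer-mode mask argument covers
     simdlen / num_mask_args lanes, e.g. 16 / 2 = 8.  */
  static unsigned int
  lanes_per_mask_arg (unsigned int simdlen, unsigned int num_mask_args)
  {
    return simdlen / num_mask_args;
  }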

Similar to the VOIDmode handling there's a restriction on not
supporting splitting/merging of incoming vector masks to
more/less SIMD call arguments.

Bootstrapped and tested on x86_64-unknown-linux-gnu, re-testing
after a minor change.  Will push later.

Richard.

PR tree-optimization/111795
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
integer mode mask arguments.

* gcc.target/i386/vect-simd-clone-avx512-1.c: New testcase.
* gcc.target/i386/vect-simd-clone-avx512-2.c: Likewise.
* gcc.target/i386/vect-simd-clone-avx512-3.c: Likewise.
---
 .../i386/vect-simd-clone-avx512-1.c   |  43 +
 .../i386/vect-simd-clone-avx512-2.c   |   6 +
 .../i386/vect-simd-clone-avx512-3.c   |   6 +
 gcc/tree-vect-stmts.cc| 150 ++
 4 files changed, 175 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c 
b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c
new file mode 100644
index 000..e350996439e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */
+
+#include "avx512vl-check.h"
+
+#ifndef SIMDLEN
+#define SIMDLEN 4
+#endif
+
+int x[1024];
+
+#pragma omp declare simd simdlen(SIMDLEN)
+__attribute__((noinline)) int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+void __attribute__((noipa))
+bar (void)
+{
+#pragma omp simd
+  for (int i = 0; i < 1024; i++)
+if (x[i] < 20)
+  x[i] = foo (x[i], x[i]);
+}
+
+void avx512vl_test ()
+{
+  int i;
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+x[i] = i;
+
+  bar ();
+
+#pragma GCC novector
+  for (i = 0; i < 1024; i++)
+if ((i < 20 && x[i] != i + i)
+   || (i >= 20 && x[i] != i))
+  abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c 
b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c
new file mode 100644
index 000..d9968ae30f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */
+
+#define SIMDLEN 8
+#include "vect-simd-clone-avx512-1.c"
diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c 
b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c
new file mode 100644
index 000..c05f6c8ce91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c
@@ -0,0 +1,6 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512vl } */
+/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */
+
+#define SIMDLEN 16
+#include "vect-simd-clone-avx512-1.c"
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0fb6fc3394a..abc8603f67c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4492,6 +4492,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
i = -1;
break;
  case SIMD_CLONE_ARG_TYPE_MASK:
+   if (SCALAR_INT_MODE_P (n->simdclone->mask_mode)
+   != SCALAR_INT_MODE_P (TYPE_MODE (arginfo[i].vectype)))
+ i = -1;
break;
  }
if (i == (size_t) -1)
@@ -4517,6 +4520,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (bestn == NULL)
 return false;
 
+  unsigned int num_mask_args = 0;
+  if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode))
+for (i = 0; i < nargs; i++)
+  if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
+   num_mask_args++;
+
   for (i = 0; i < nargs; i++)
 {
   if ((arginfo[i].dt == vect_constant_def
@@ -4541,30 +4550,50 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
  return false;
}
 
-  if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
- && bestn->simdclone->mask_mode == VOIDmode
- && (simd_clone_subparts (bestn->simdclone->args[i].vector_type)
- != simd_clone_subparts (arginfo[i].vectype)))
+  if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
{
- /* FORNOW we only have partial support for vector-type masks that
-can't hold all of simdlen. */
- 

Re: [PATCH] wide-int: Fix estimation of buffer sizes for wide_int printing [PR111800]

2023-10-14 Thread Richard Biener



> On 14.10.2023 at 10:21, Jakub Jelinek wrote:
> 
> Hi!
> 
> As mentioned in the PR, my estimations on needed buffer size for wide_int
> and especially widest_int printing were incorrect, I've used get_len ()
> in the estimations, but that is true only for !wi::neg_p (x) values.
> Under the hood, we have 3 ways to print numbers.
> print_decs which if
>  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
>  || (wi.get_len () == 1))
> uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive or
> negative) and otherwise uses print_hex,
> print_decu which if
>  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
>  || (wi.get_len () == 1 && !wi::neg_p (wi)))
> uses sprintf which always fits into WIDE_INT_PRINT_BUFFER_SIZE (positive
> only) and print_hex, which doesn't print most significant limbs which are
> zero and the first limb which is non-zero prints such that redundant 0
> hex digits aren't printed, while all limbs below that are printed with
> "%016" PRIx64.  For wi::neg_p (x) values, the first limb of the precision
> is always non-zero, so we print all the limbs for the precision.
> So, the current estimations are accurate if !wi::neg_p (x), or when
> print_decs will be used and x.get_len () == 1, otherwise we need to use
> estimation based on get_precision () rather than get_len ().
> 
> The following patch does that, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

Can we somehow abstract this common pattern?

> The patch doesn't address what I've talked about earlier, that we might
> actually stop using print_hex when asked for print_dec{s,u} - we could for
> negative print_decs just negate and call print_decu, and in print_decu
> e.g. in a loop UNSIGNED wi::divmod_trunc by
> HOST_WIDE_INT_UC (1000) and print the 19 decimal digits of
> remainder if quotient is non-zero, otherwise non-padded rest, and then
> reshuffle the buffer.  And/or perhaps print_hex should also take signop
> and print negative hex constants as -0x. if asked for SIGNED.
> And finally, I think we should try to rewrite tree-ssa-ccp.cc bit-cp from
> widest_int to wide_int, even the earlier:
> PHI node value: CONSTANT 
> 0xffe2
>  (0x19)
> in the -fdump-tree-ccp-details dumps is horribly confusing when the
> type is say just 32-bit or 64-bit, and with the recent widest_int changes
> those are now around with > 32000 f hex digits in there.  Not to mention we 
> shouldn't
> really care about state of bits beyond the precision and I think we always
> have the type in question around (x.val is INTEGER_CST of the right type
> and we just to::widest it, just x.mask is widest_int).
> 
> 2023-10-14  Jakub Jelinek  
> 
>PR tree-optimization/111800
> gcc/
>* wide-int.cc (assert_deceq): Use wi.get_len () for buffer size
>estimation only if !wi::neg_p (wi) or if len is 1 and sgn is SIGNED,
>otherwise use WIDE_INT_MAX_HWIS for wi.get_precision ().
>(assert_hexeq): Use wi.get_len () for buffer size estimation only
>if !wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
>wi.get_precision ().
>* wide-int-print.cc (print_decs): Use wi.get_len () for buffer size
>estimation only if !wi::neg_p (wi) or if len is 1, otherwise use
>WIDE_INT_MAX_HWIS for wi.get_precision ().
>(print_decu): Use wi.get_len () for buffer size estimation only if
>!wi::neg_p (wi), otherwise use WIDE_INT_MAX_HWIS for
>wi.get_precision ().
>(print_hex): Likewise.
>* value-range.cc (irange_bitmask::dump): Use get_len () for
>buffer size estimation only if !wi::neg_p (wi), otherwise use
>WIDE_INT_MAX_HWIS for get_precision ().
>* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
>Likewise.
>* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
>i_bound.get_len () for buffer size estimation only if
>!wi::neg_p (i_bound) or if len is 1 and !TYPE_UNSIGNED, otherwise use
>WIDE_INT_MAX_HWIS for i_bound.get_precision ().  Use TYPE_SIGN macro
>in print_dec call argument.
> gcc/c-family/
>* c-warn.cc (match_case_to_enum_1): Assert w.get_precision ()
>is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than
>w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS.
> 
> --- gcc/wide-int.cc.jj2023-10-13 19:34:44.288830022 +0200
> +++ gcc/wide-int.cc2023-10-13 20:23:12.889386810 +0200
> @@ -2450,7 +2450,9 @@ static void
> assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn)
> {
>   char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
>   unsigned len = wi.get_len ();
> +  if ((len != 1 || sgn == UNSIGNED) && wi::neg_p (wi))
> +len = WIDE_INT_MAX_HWIS (wi.get_precision ());
>   if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
>   print_dec (wi, p, 

Re: [PATCH] wide-int, v2: Fix estimation of buffer sizes for wide_int printing [PR111800]

2023-10-14 Thread Richard Biener



> On 14.10.2023 at 11:50, Jakub Jelinek wrote:
> 
> Hi!
> 
>> On Sat, Oct 14, 2023 at 10:41:28AM +0200, Richard Biener wrote:
>> Can we somehow abstract this common pattern?
> 
> So like this?  With room for the future tweaks like printing decimal
> instead of hex numbers by print_dec*, where we'd only need to adjust
> the inlines.  The XALLOCAVEC call is left for the callers, those would
> make the inlines uninlinable and not doing what they should.

LGTM.

Richard 

> 2023-10-14  Jakub Jelinek  
> 
>PR tree-optimization/111800
> gcc/
>* wide-int-print.h (print_dec_buf_size, print_decs_buf_size,
>print_decu_buf_size, print_hex_buf_size): New inline functions.
>* wide-int.cc (assert_deceq): Use print_dec_buf_size.
>(assert_hexeq): Use print_hex_buf_size.
>* wide-int-print.cc (print_decs): Use print_decs_buf_size.
>(print_decu): Use print_decu_buf_size.
>(print_hex): Use print_hex_buf_size.
>(pp_wide_int_large): Use print_dec_buf_size.
>* value-range.cc (irange_bitmask::dump): Use print_hex_buf_size.
>* value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks):
>Likewise.
>* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use
>print_dec_buf_size.  Use TYPE_SIGN macro in print_dec call argument.
> gcc/c-family/
>* c-warn.cc (match_case_to_enum_1): Assert w.get_precision ()
>is smaller or equal to WIDE_INT_MAX_INL_PRECISION rather than
>w.get_len () is smaller or equal to WIDE_INT_MAX_INL_ELTS.
> 
> --- gcc/wide-int-print.h.jj2023-10-13 19:34:44.283830089 +0200
> +++ gcc/wide-int-print.h2023-10-14 11:21:44.190603091 +0200
> @@ -36,4 +36,40 @@ extern void print_hex (const wide_int_re
> extern void print_hex (const wide_int_ref &wi, FILE *file);
> extern void pp_wide_int_large (pretty_printer *, const wide_int_ref &, 
> signop);
> 
> +inline bool
> +print_dec_buf_size (const wide_int_ref &wi, signop sgn, unsigned int *len)
> +{
> +  unsigned int l = wi.get_len ();
> +  if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi))
> +l = WIDE_INT_MAX_HWIS (wi.get_precision ());
> +  l = l * HOST_BITS_PER_WIDE_INT / 4 + 4;
> +  *len = l;
> +  return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE);
> +}
> +
> +inline bool
> +print_decs_buf_size (const wide_int_ref &wi, unsigned int *len)
> +{
> +  return print_dec_buf_size (wi, SIGNED, len);
> +}
> +
> +inline bool
> +print_decu_buf_size (const wide_int_ref &wi, unsigned int *len)
> +{
> +  return print_dec_buf_size (wi, UNSIGNED, len);
> +}
> +
> +inline bool
> +print_hex_buf_size (const wide_int_ref &wi, unsigned int *len)
> +{
> +  unsigned int l;
> +  if (wi::neg_p (wi))
> +l = WIDE_INT_MAX_HWIS (wi.get_precision ());
> +  else
> +l = wi.get_len ();
> +  l = l * HOST_BITS_PER_WIDE_INT / 4 + 4;
> +  *len = l;
> +  return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE);
> +}
> +
> #endif /* WIDE_INT_PRINT_H */
> --- gcc/wide-int.cc.jj2023-10-14 11:07:52.738850767 +0200
> +++ gcc/wide-int.cc2023-10-14 11:22:03.100347386 +0200
> @@ -2450,9 +2450,9 @@ static void
> assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn)
> {
>   char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
> -  unsigned len = wi.get_len ();
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
> +  unsigned len;
> +  if (print_dec_buf_size (wi, sgn, &len))
> +p = XALLOCAVEC (char, len);
>   print_dec (wi, p, sgn);
>   ASSERT_STREQ (expected, p);
> }
> @@ -2463,9 +2463,9 @@ static void
> assert_hexeq (const char *expected, const wide_int_ref &wi)
> {
>   char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
> -  unsigned len = wi.get_len ();
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
> +  unsigned len;
> +  if (print_hex_buf_size (wi, &len))
> +p = XALLOCAVEC (char, len);
>   print_hex (wi, p);
>   ASSERT_STREQ (expected, p);
> }
> --- gcc/wide-int-print.cc.jj2023-10-14 11:07:52.737850781 +0200
> +++ gcc/wide-int-print.cc2023-10-14 11:37:43.994623668 +0200
> @@ -75,9 +75,9 @@ void
> print_decs (const wide_int_ref &wi, FILE *file)
> {
>   char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
> -  unsigned len = wi.get_len ();
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> -p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
> +  unsigned len;
> +  if (print_decs_buf_size (wi, &len))
> +p = XALLOCAVEC (char, len);
>   print_decs (wi, p);
>   fputs (p, file);
> }
> @@ -102,9 +102,9 @@ 

Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-10-16 Thread Richard Biener
On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:

> lld and mold are platform-agnostic and not prefixed with target triple.
> Prepending the target triple makes it less likely to find the intended
> linker executable.
> 
> A potential breaking change is that we no longer try to search for
> triple-prefixed lld/mold binaries anymore. However, since there doesn't
> seem to be support to build LLVM or mold with triple-prefixed executable
> names, it seems better to just not bother with that case.
> 
>   PR driver/111605
> 
> gcc/Changelog:
> 
>   * collect2.cc (main): Do not prepend target triple to
>   -fuse-ld=lld,mold.
> ---
>  gcc/collect2.cc | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> index 63b9a0c233a..c943f9f577c 100644
> --- a/gcc/collect2.cc
> +++ b/gcc/collect2.cc
> @@ -865,12 +865,15 @@ main (int argc, char **argv)
>int i;
>  
>for (i = 0; i < USE_LD_MAX; i++)
> -full_ld_suffixes[i]
>  #ifdef CROSS_DIRECTORY_STRUCTURE
> -  = concat (target_machine, "-", ld_suffixes[i], NULL);
> -#else
> -  = ld_suffixes[i];
> -#endif
> +/* lld and mold are platform-agnostic and not prefixed with target
> +   triple.  */
> +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
> +  full_ld_suffixes[i] = concat (target_machine, "-", ld_suffixes[i],
> + NULL);
> +else
> +#endif
> +  full_ld_suffixes[i] = ld_suffixes[i];
>  
>p = argv[0] + strlen (argv[0]);
>while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))

Since we later do

  /* Search the compiler directories for `ld'.  We have protection against
 recursive calls in find_a_file.  */
  if (ld_file_name == 0)
ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], 
X_OK);
  /* Search the ordinary system bin directories
 for `ld' (if native linking) or `TARGET-ld' (if cross).  */
  if (ld_file_name == 0)
ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
X_OK);

I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
fixes anything?

Richard.


Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-10-16 Thread Richard Biener
On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:

> 
> 
> > On Oct 16, 2023, at 17:39, Richard Biener  wrote:
> > 
> > On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> > 
> >> lld and mold are platform-agnostic and not prefixed with target triple.
> >> Prepending the target triple makes it less likely to find the intended
> >> linker executable.
> >> 
> >> A potential breaking change is that we no longer try to search for
> >> triple-prefixed lld/mold binaries anymore. However, since there doesn't
> >> seem to be support to build LLVM or mold with triple-prefixed executable
> >> names, it seems better to just not bother with that case.
> >> 
> >>PR driver/111605
> >> 
> >> gcc/Changelog:
> >> 
> >>* collect2.cc (main): Do not prepend target triple to
> >>-fuse-ld=lld,mold.
> >> ---
> >> gcc/collect2.cc | 13 -
> >> 1 file changed, 8 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> >> index 63b9a0c233a..c943f9f577c 100644
> >> --- a/gcc/collect2.cc
> >> +++ b/gcc/collect2.cc
> >> @@ -865,12 +865,15 @@ main (int argc, char **argv)
> >>   int i;
> >> 
> >>   for (i = 0; i < USE_LD_MAX; i++)
> >> -full_ld_suffixes[i]
> >> #ifdef CROSS_DIRECTORY_STRUCTURE
> >> -  = concat (target_machine, "-", ld_suffixes[i], NULL);
> >> -#else
> >> -  = ld_suffixes[i];
> >> -#endif
> >> +/* lld and mold are platform-agnostic and not prefixed with target
> >> +   triple.  */
> >> +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
> >> +  full_ld_suffixes[i] = concat (target_machine, "-", ld_suffixes[i],
> >> +  NULL);
> >> +else
> >> +#endif
> >> +  full_ld_suffixes[i] = ld_suffixes[i];
> >> 
> >>   p = argv[0] + strlen (argv[0]);
> >>   while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
> > 
> > Since we later do
> > 
> >  /* Search the compiler directories for `ld'.  We have protection against
> > recursive calls in find_a_file.  */
> >  if (ld_file_name == 0)
> >ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], 
> > X_OK);
> >  /* Search the ordinary system bin directories
> > for `ld' (if native linking) or `TARGET-ld' (if cross).  */
> >  if (ld_file_name == 0)
> >ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
> > X_OK);
> > 
> > I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
> > fixes anything?
> 
> Per the linked PR, the intended use case for this is when one wants to use 
> their system lld/mold with a separately packaged cross toolchain, without 
> requiring them to symlink their system lld/mold into the cross toolchain bin 
> directory.
> 
> (Note that the first search is against COMPILER_PATH while the latter is 
> against PATH).

Ah.  So what about instead adding here

   /* Search the ordinary system bin directories for mold/lld even in
  a cross configuration.  */
   if (ld_file_name == 0
   && selected_linker == ...)
 ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);

instead?  That would keep things working in case the user has a
xyz-arch-mold in the system dir but uses GNU ld on the host
otherwise, lacking a 'mold' binary there?

That is, we'd only add, not change what we search for.

Thanks,
Richard.


Re: [PATCH] MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p.

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 12:00 AM Andrew Pinski  wrote:
>
> This improves the `A CMP 0 ? A : -A` set of match patterns to use
> bitwise_equal_p which allows an nop cast between signed and unsigned.
> This allows catching a few extra cases which were not being caught before.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> PR tree-optimization/101541
> * match.pd (A CMP 0 ? A : -A): Improve
> using bitwise_equal_p.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/101541
> * gcc.dg/tree-ssa/phi-opt-36.c: New test.
> * gcc.dg/tree-ssa/phi-opt-37.c: New test.
> ---
>  gcc/match.pd   | 49 -
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c | 51 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-37.c | 24 ++
>  3 files changed, 104 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-37.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 45624f3dcb4..142e2dfbeb1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5668,42 +5668,51 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* A == 0 ? A : -Asame as -A */
>   (for cmp (eq uneq)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate@1 @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> +   (cnd (cmp @0 zerop) @2 (negate@1 @2))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @2))
>   @1))
>(simplify
> -   (cnd (cmp @0 zerop) zerop (negate@1 @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> +   (cnd (cmp @0 zerop) zerop (negate@1 @2))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @2))
>   @1))
>   )
>   /* A != 0 ? A : -Asame as A */
>   (for cmp (ne ltgt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> - @0))
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @1))
> + @1))
>(simplify
> -   (cnd (cmp @0 zerop) @0 integer_zerop)
> -(if (!HONOR_SIGNED_ZEROS (type))
> - @0))
> +   (cnd (cmp @0 zerop) @1 integer_zerop)
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @1))
> + @1))
>   )
>   /* A >=/> 0 ? A : -Asame as abs (A) */
>   (for cmp (ge gt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type)
> -&& !TYPE_UNSIGNED (type))
> - (abs @0
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> +&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> +&& bitwise_equal_p (@0, @1))
> + (if (TYPE_UNSIGNED (type))
> +  (absu:type @0)
> +  (abs @0)
>   /* A <=/< 0 ? A : -Asame as -abs (A) */
>   (for cmp (le lt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type)
> -&& !TYPE_UNSIGNED (type))
> - (if (ANY_INTEGRAL_TYPE_P (type)
> - && !TYPE_OVERFLOW_WRAPS (type))
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> +&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> +&& bitwise_equal_p (@0, @1))
> + (if ((ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> + || TYPE_UNSIGNED (type))
>(with {
> -   tree utype = unsigned_type_for (type);
> +   tree utype = unsigned_type_for (TREE_TYPE(@0));
> }
> (convert (negate (absu:utype @0
> (negate (abs @0)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
> new file mode 100644
> index 000..4baf9f82a22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
> @@ -0,0 +1,51 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +unsigned f0(int A)
> +{
> +  unsigned t = A;
> +// A == 0? A : -Asame as -A
> +  if (A == 0)  return t;
> +  return -t;
> +}
> +
> +unsigned f1(int A)
> +{
> +  unsigned t = A;
> +// A != 0? A : -Asame as A
> +  if (A != 0)  return t;
> +  return -t;
> +}
> +unsigned f2(int A)
> +{
> +  unsigned t = A;
> +// A >= 0? A : -Asame as abs (A)
> +  if (A >= 0)  return t;
> +  return -t;
> +}
> +unsigned f3(int A)
> +{
> +  unsigned t = A;
> +// A > 0?  A : -Asame as abs (A)
> +  if (A > 0)  return t;
> +  return -t;
> +}
> +unsigned f4(int A)
> +{
> +  unsigned t = A;
> +// A <= 0? A : -Asame as -abs (A)
> +  if (A <= 0)  return t;
> +  return -t;
> +}
> +unsigned f5(int A)
> +{
> +  unsigned t = A;
> +// A < 0?  A : -A  same as -abs (A)
> +  if (A < 0)  return t;
> +  return -t;
> +}
> +
> +/* f4 and f5 are not allowed to be optimized in early phi-opt. */
> +/* { dg-final { scan-tree-dump-times "if " 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
> +
> 

Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-10-16 Thread Richard Biener
On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:

> 
> 
> > On Oct 16, 2023, at 17:55, Richard Biener  wrote:
> > 
> > On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> > 
> >> 
> >> 
> >>> On Oct 16, 2023, at 17:39, Richard Biener  wrote:
> >>> 
> >>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> >>> 
> >>>> lld and mold are platform-agnostic and not prefixed with target triple.
> >>>> Prepending the target triple makes it less likely to find the intended
> >>>> linker executable.
> >>>> 
> >>>> A potential breaking change is that we no longer try to search for
> >>>> triple-prefixed lld/mold binaries anymore. However, since there doesn't
> >>>> seem to be support to build LLVM or mold with triple-prefixed executable
> >>>> names, it seems better to just not bother with that case.
> >>>> 
> >>>>  PR driver/111605
> >>>> 
> >>>> gcc/Changelog:
> >>>> 
> >>>>  * collect2.cc (main): Do not prepend target triple to
> >>>>  -fuse-ld=lld,mold.
> >>>> ---
> >>>> gcc/collect2.cc | 13 -
> >>>> 1 file changed, 8 insertions(+), 5 deletions(-)
> >>>> 
> >>>> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> >>>> index 63b9a0c233a..c943f9f577c 100644
> >>>> --- a/gcc/collect2.cc
> >>>> +++ b/gcc/collect2.cc
> >>>> @@ -865,12 +865,15 @@ main (int argc, char **argv)
> >>>>  int i;
> >>>> 
> >>>>  for (i = 0; i < USE_LD_MAX; i++)
> >>>> -full_ld_suffixes[i]
> >>>> #ifdef CROSS_DIRECTORY_STRUCTURE
> >>>> -  = concat (target_machine, "-", ld_suffixes[i], NULL);
> >>>> -#else
> >>>> -  = ld_suffixes[i];
> >>>> -#endif
> >>>> +/* lld and mold are platform-agnostic and not prefixed with target
> >>>> +   triple.  */
> >>>> +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
> >>>> +  full_ld_suffixes[i] = concat (target_machine, "-", ld_suffixes[i],
> >>>> +NULL);
> >>>> +else
> >>>> +#endif
> >>>> +  full_ld_suffixes[i] = ld_suffixes[i];
> >>>> 
> >>>>  p = argv[0] + strlen (argv[0]);
> >>>>  while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
> >>> 
> >>> Since we later do
> >>> 
> >>> /* Search the compiler directories for `ld'.  We have protection against
> >>>recursive calls in find_a_file.  */
> >>> if (ld_file_name == 0)
> >>>   ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], 
> >>> X_OK);
> >>> /* Search the ordinary system bin directories
> >>>for `ld' (if native linking) or `TARGET-ld' (if cross).  */
> >>> if (ld_file_name == 0)
> >>>   ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
> >>> X_OK);
> >>> 
> >>> I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
> >>> fixes anything?
> >> 
> >> Per the linked PR, the intended use case for this is when one wants to use 
> >> their system lld/mold with a separately packaged cross toolchain, without 
> >> requiring them to symlink their system lld/mold into the cross toolchain 
> >> bin directory.
> >> 
> >> (Note that the first search is against COMPILER_PATH while the latter is 
> >> against PATH).
> > 
> > Ah.  So what about instead adding here
> > 
> >   /* Search the ordinary system bin directories for mold/lld even in
> >  a cross configuration.  */
> >   if (ld_file_name == 0
> >   && selected_linker == ...)
> > ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);
> > 
> > instead?  That would keep things working in case the user has a
> > xyz-arch-mold in the system dir but uses GNU ld on the host
> > otherwise, lacking a 'mold' binary there?
> > 
> > That is, we'd only add, not change what we search for.
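Spelling that out with the USE_LLD_LD/USE_MOLD_LD enum values used earlier in
the patch, the suggestion would amount to roughly this untested sketch, added
alongside the existing full_ld_suffixes search:

  /* Search the ordinary system bin directories for mold/lld even in
     a cross configuration.  */
  if (ld_file_name == 0
      && (selected_linker == USE_LLD_LD || selected_linker == USE_MOLD_LD))
    ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);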
> 
> I considered that, but as described in the commit message, it doesn't seem anyone 
> has created stuff named xyz-arch-lld or xyz-arch-mold. Closest is Gentoo's 
> symlink mentioned in this thread, but that's xyz-arch-ld -> ld.lld/mold.
> As such, this feels like a quirk, not something we need to keep compatibility 
> for.

I don't have a good idea whether this is the case or not, unfortunately,
so if it were my call I would err on the safe side.

We seem to recognize mold and lld only since GCC 12, and those branches
are all still maintained, so I think we might want to do the change on
all of them?

If you feel confident there are indeed no such installs then let's go
with your original patch.

Thus, OK for trunk and the affected branches after a while of no
reported issues.

Thanks,
Richard.

> The proposed change seems simple enough though, so if you consider this 
> a compatibility issue I can go that way as well.

> Tatsuyuki.
> 
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Improve factor_out_conditional_operation for conversions and constants

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 2:02 AM Andrew Pinski  wrote:
>
> In the case of a NOP conversion (precisions of the 2 types are equal),
> factoring out the conversion can be done even if int_fits_type_p returns
> false and even when the conversion is defined by a statement inside the
> conditional. Since it is a NOP conversion there is no zero/sign extending
> happening which is why it is ok to be done here.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/104376
> PR tree-optimization/101541
> * tree-ssa-phiopt.cc (factor_out_conditional_operation):
> Allow nop conversions even if it is defined by a statement
> inside the conditional.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/101541
> * gcc.dg/tree-ssa/phi-opt-38.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c | 44 ++
>  gcc/tree-ssa-phiopt.cc |  8 +++-
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> new file mode 100644
> index 000..ca04d1619e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> @@ -0,0 +1,44 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +unsigned f0(int A)
> +{
> +// A == 0? A : -A  same as -A
> +  if (A == 0)  return A;
> +  return -A;
> +}
> +
> +unsigned f1(int A)
> +{
> +// A != 0? A : -A  same as A
> +  if (A != 0)  return A;
> +  return -A;
> +}
> +unsigned f2(int A)
> +{
> +// A >= 0? A : -A  same as abs (A)
> +  if (A >= 0)  return A;
> +  return -A;
> +}
> +unsigned f3(int A)
> +{
> +// A > 0?  A : -A  same as abs (A)
> +  if (A > 0)  return A;
> +  return -A;
> +}
> +unsigned f4(int A)
> +{
> +// A <= 0? A : -A  same as -abs (A)
> +  if (A <= 0)  return A;
> +  return -A;
> +}
> +unsigned f5(int A)
> +{
> +// A < 0?  A : -A  same as -abs (A)
> +  if (A < 0)  return A;
> +  return -A;
> +}
> +
> +/* f4 and f5 are not allowed to be optimized in early phi-opt. */
> +/* { dg-final { scan-tree-dump-times "if" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if" "phiopt2" } } */
> +
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 312a6f9082b..0ab8fad5898 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -310,7 +310,9 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
> return NULL;
>/* If arg1 is an INTEGER_CST, fold it to new type.  */
>if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
> - && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
> + && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
> + || TYPE_PRECISION (TREE_TYPE (new_arg0))
> + == TYPE_PRECISION (TREE_TYPE (arg1

can you add parens for auto-indent?

> {
>   if (gimple_assign_cast_p (arg0_def_stmt))
> {
> @@ -323,7 +325,9 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
>  its basic block, because then it is possible this
>  could enable further optimizations (minmax replacement
>  etc.).  See PR71016.  */

Doesn't the comment also apply for equal precision?

> - if (new_arg0 != gimple_cond_lhs (cond_stmt)
> + if (TYPE_PRECISION (TREE_TYPE (new_arg0))
> +   != TYPE_PRECISION (TREE_TYPE (arg1))
> + && new_arg0 != gimple_cond_lhs (cond_stmt)
>   && new_arg0 != gimple_cond_rhs (cond_stmt)
>   && gimple_bb (arg0_def_stmt) == e0->src)
> {

When we later fold_convert () I think you want to drop TREE_OVERFLOW
which we eventually add.
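Something along these lines, presumably; an untested sketch reusing the names
from the patch, where drop_tree_overflow strips the overflow flag that
fold_convert can set on the INTEGER_CST:

  arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
  if (TREE_OVERFLOW_P (arg1))
    arg1 = drop_tree_overflow (arg1);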

Otherwise OK I think.

Richard.

> --
> 2.34.1
>


Re: [PATCH] [PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast inbetween ~ and a/b

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 4:34 AM Andrew Pinski  wrote:
>
> Currently we are able to simplify `~a CMP ~b` to `b CMP a` but we should allow a 
> nop
> conversion in between the `~` and the `a` which can show up. A similar 
> thing should
> be done for `~a CMP CST`.
>
> I had originally submitted the `~a CMP CST` case as
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html;
> I noticed we should do the same thing for the `~a CMP ~b` case and combined
> it with that one here.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/31531
>
> gcc/ChangeLog:
>
> * match.pd (~X op ~Y): Allow for an optional nop convert.
> (~X op C): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr31531-1.c: New test.
> * gcc.dg/tree-ssa/pr31531-2.c: New test.
> ---
>  gcc/match.pd  | 10 ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c | 19 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c | 34 +++
>  3 files changed, 59 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51e5065d086..e76ec1ec034 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5944,18 +5944,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Fold ~X op ~Y as Y op X.  */
>  (for cmp (simple_comparison)
>   (simplify
> -  (cmp (bit_not@2 @0) (bit_not@3 @1))
> +  (cmp (nop_convert1?@4 (bit_not@2 @0)) (nop_convert2? (bit_not@3 @1)))
>(if (single_use (@2) && single_use (@3))
> -   (cmp @1 @0
> +   (with { tree otype = TREE_TYPE (@4); }
> +(cmp (convert:otype @1) (convert:otype @0))
>
>  /* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
>  (for cmp (simple_comparison)
>   scmp (swapped_simple_comparison)
>   (simplify
> -  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
> +  (cmp (nop_convert? (bit_not@2 @0)) CONSTANT_CLASS_P@1)
>(if (single_use (@2)
> && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
> -   (scmp @0 (bit_not @1)
> +   (with { tree otype = TREE_TYPE (@1); }
> +(scmp (convert:otype @0) (bit_not @1))
>
>  (for cmp (simple_comparison)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
> new file mode 100644
> index 000..c27299151eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/31531 */
> +
> +int f(int a)
> +{
> +  int b = ~a;
> +  return b<0;
> +}
> +
> +
> +int f1(unsigned a)
> +{
> +  int b = ~a;
> +  return b<0;
> +}
> +/* We should convert the above two functions from b <0 to ((int)a) >= 0. */
> +/* { dg-final { scan-tree-dump-times ">= 0" 2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "~" 0 "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
> new file mode 100644
> index 000..865ea292215
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/31531 */
> +
> +int f0(unsigned x, unsigned t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +int f1(unsigned x, int t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +int f2(int x, unsigned t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +
> +/* We should be able to remove all ~ from the above functions. */
> +/* { dg-final { scan-tree-dump-times "~" 0 "optimized"} } */
> --
> 2.39.3
>


[PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest

2023-10-16 Thread Richard Biener
The following addresses build_reconstructed_reference failing to
build references with a different offset than the model's and thus
the caller conditional being off.  This manifests when attempting
to build a ref with offset 160 from the model BIT_FIELD_REF 
onto the same base l_4827 while the model's offset is 288.  This
cannot work for any kind of ref I can think of, not just with
BIT_FIELD_REFs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
later.

Martin - do you remember which case was supposed to be allowed
with offset < model->offset?

Thanks,
Richard.

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Only call
build_reconstructed_reference when the offsets are the same.

* gcc.dg/torture/pr111807.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111807.c | 12 
 gcc/tree-sra.cc |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111807.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111807.c 
b/gcc/testsuite/gcc.dg/torture/pr111807.c
new file mode 100644
index 000..09fbdcfb667
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111807.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+static struct A {
+  int x : 4;
+} a;
+static int b;
+int main()
+{
+  struct A t[2];
+  t[0] = b ? t[1] : a;
+  return (b ? t[1].x : 0) && 1;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 24d0c20da6a..f8dff8b27d7 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1751,7 +1751,7 @@ build_ref_for_model (location_t loc, tree base, 
HOST_WIDE_INT offset,
  && !TREE_THIS_VOLATILE (base)
  && (TYPE_ADDR_SPACE (TREE_TYPE (base))
  == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
- && offset <= model->offset
+ && offset == model->offset
  /* build_reconstructed_reference can still fail if we have already
 massaged BASE because of another type incompatibility.  */
  && (res = build_reconstructed_reference (loc, base, model)))
-- 
2.35.3


[PATCH] middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile

2023-10-16 Thread Richard Biener
The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.

The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles.  The following
changes maybe_optimize_var so that clearing DECL_NOT_GIMPLE_REG_P of
a register-typed decl corresponds 1:1 with actually rewriting it to SSA.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/111818
* tree-ssa.cc (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.

* gcc.dg/torture/pr111818.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111818.c | 11 +++
 gcc/tree-ssa.cc | 17 +++--
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111818.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111818.c 
b/gcc/testsuite/gcc.dg/torture/pr111818.c
new file mode 100644
index 000..a7a9d71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111818.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+static void foo(const volatile unsigned int x, void *p)
+{
+  __builtin_memcpy(p, (void *)&x, sizeof x);
+}
+
+void bar(void *number)
+{
+  foo(0, number);
+}
diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
index ebba02b8449..2f3210fcf61 100644
--- a/gcc/tree-ssa.cc
+++ b/gcc/tree-ssa.cc
@@ -1788,15 +1788,20 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
bitmap not_reg_needs,
  maybe_reg = true;
  DECL_NOT_GIMPLE_REG_P (var) = 0;
}
-  if (maybe_reg && is_gimple_reg (var))
+  if (maybe_reg)
{
- if (dump_file)
+ if (is_gimple_reg (var))
{
- fprintf (dump_file, "Now a gimple register: ");
- print_generic_expr (dump_file, var);
- fprintf (dump_file, "\n");
+ if (dump_file)
+   {
+ fprintf (dump_file, "Now a gimple register: ");
+ print_generic_expr (dump_file, var);
+ fprintf (dump_file, "\n");
+   }
+ bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
}
- bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
+ else
+   DECL_NOT_GIMPLE_REG_P (var) = 1;
}
 }
 }
-- 
2.35.3


Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Richard Biener
On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal  wrote:
>
> This patch improves code sinking pass to sink statements before call to reduce
> register pressure.
> Review comments are incorporated. Synced and modified with latest trunk 
> sources.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code after function calls.  This increases
> register pressure for callee-saved registers.  The following patch improves
> code sinking by placing the sunk code before calls in the use block or in
> the immediate dominator of the use blocks.

The patch no longer does what the description above says.

More comments below.

> 2023-10-12  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Move statements before
> calls.
> (select_best_block): Add heuristics to select the best blocks in the
> immediate post dominator.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-20.c: New test.
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
>  gcc/tree-ssa-sink.cc| 39 -
>  3 files changed, 56 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index a360c5cdd6e..95298bc8402 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>
>  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
> tree, return the best basic block between them (inclusive) to place
> -   statements.
> +   statements. The best basic block should be an immediate dominator of
> +   best basic block if the use stmt is after the call.
>
> We want the most control dependent block in the shallowest loop nest.
>
> @@ -196,6 +197,16 @@ select_best_block (basic_block early_bb,
>basic_block best_bb = late_bb;
>basic_block temp_bb = late_bb;
>int threshold;
> +  /* Get the sinking threshold.  If the statement to be moved has memory
> + operands, then increase the threshold by 7% as those are even more
> + profitable to avoid, clamping at 100%.  */
> +  threshold = param_sink_frequency_threshold;
> +  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> +{
> +  threshold += 7;
> +  if (threshold > 100)
> +   threshold = 100;
> +}
>
>while (temp_bb != early_bb)
>  {
> @@ -204,6 +215,14 @@ select_best_block (basic_block early_bb,
>if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
> best_bb = temp_bb;
>
> +  /* if we have temp_bb post dominated by use block block then immediate
> +   * dominator would be our best block.  */
> +  if (!gimple_vuse (stmt)
> + && bb_loop_depth (temp_bb) == bb_loop_depth (early_

Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-17 Thread Richard Biener
On Mon, Oct 16, 2023 at 9:27 PM Jeff Law  wrote:
>
>
>
> On 10/15/23 03:49, Roger Sayle wrote:
> >
> > Hi Jeff,
> > Thanks for the speedy review(s).
> >
> >> From: Jeff Law 
> >> Sent: 15 October 2023 00:03
> >> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
> >> make_compound_operation.
> >>
> >> On 10/14/23 16:14, Roger Sayle wrote:
> >>>
> >>> This patch is my proposed solution to PR rtl-optimization/91865.
> >>> Normally RTX simplification canonicalizes a ZERO_EXTEND of a
> >>> ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
> >>> possible for combine's make_compound_operation to unintentionally
> >>> generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
> >>> unlikely to be matched by the backend.
> >>>
> >>> For the new test case:
> >>>
> >>> const int table[2] = {1, 2};
> >>> int foo (char i) { return table[i]; }
> >>>
> >>> compiling with -O2 -mlarge on msp430 we currently see:
> >>>
> >>> Trying 2 -> 7:
> >>>   2: r25:HI=zero_extend(R12:QI)
> >>> REG_DEAD R12:QI
> >>>   7: r28:PSI=sign_extend(r25:HI)#0
> >>> REG_DEAD r25:HI
> >>> Failed to match this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>   (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]
> >>>
> >>> which results in the following code:
> >>>
> >>> foo:AND #0xff, R12
> >>>   RLAM.A #4, R12 { RRAM.A #4, R12
> >>>   RLAM.A  #1, R12
> >>>   MOVX.W  table(R12), R12
> >>>   RETA
> >>>
> >>> With this patch, we now see:
> >>>
> >>> Trying 2 -> 7:
> >>>   2: r25:HI=zero_extend(R12:QI)
> >>> REG_DEAD R12:QI
> >>>   7: r28:PSI=sign_extend(r25:HI)#0
> >>> REG_DEAD r25:HI
> >>> Successfully matched this instruction:
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>   (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
> >>> combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
> >>> cost 8
> >>>
> >>> foo:MOV.B   R12, R12
> >>>   RLAM.A  #1, R12
> >>>   MOVX.W  table(R12), R12
> >>>   RETA
> >>>
> >>>
> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >>> and make -k check, both with and without --target_board=unix{-m32}
> >>> with no new failures.  Ok for mainline?
> >>>
> >>> 2023-10-14  Roger Sayle  
> >>>
> >>> gcc/ChangeLog
> >>>   PR rtl-optimization/91865
> >>>   * combine.cc (make_compound_operation): Avoid creating a
> >>>   ZERO_EXTEND of a ZERO_EXTEND.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>>   PR rtl-optimization/91865
> >>>   * gcc.target/msp430/pr91865.c: New test case.
> >> Neither an ACK or NAK at this point.
> >>
> >> The bug report includes a patch from Segher which purports to fix this in 
> >> simplify-
> >> rtx.  Any thoughts on Segher's approach and whether or not it should be
> >> considered?
> >>
> >> The BZ also indicates that removal of 2 patterns from msp430.md would 
> >> solve this
> >> too (though it may cause regressions elsewhere?).  Any thoughts on that 
> >> approach
> >> as well?
> >>
> >
> > Great questions.  I believe Segher's proposed patch (in comment #4) was an
> > msp430-specific proof-of-concept workaround rather than intended to be fix.
> > Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
> > is not a solution that'll work on many platforms (and therefore not really 
> > suitable
> > for target-independent middle-end code in the RTL optimizers).
> Thanks.  I didn't really look at Segher's patch, so thanks for digging
> into it.  Certainly just flipping the mode of the hard register isn't
> correct.
>
>
> >
> > The underlying issue, which is applicable to all targets, is that 
> > combine.cc's
> > make_compound_operation is expected to reverse the local transformations
> > made by expand_compound_operation.  Hence, if an RTL expression is
> > canonical going into expand_compound_operation, it is expected (hoped)
> > to be canonical (and equivalent) coming out of make_compound_operation.
> In theory, correct.
>
>
> >
> > Hence, rather than be a MSP430 specific issue, no target should expect (or
> > be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
> > of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
> > CONST_INT operands, or a shift by zero, it's something the middle-end might
> > reasonably be expected to clean-up. [Yeah, I know... 😊]
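As an aside, the clean-up alluded to is essentially the simplify-rtx rule for
nested extensions; a sketch of its shape, assuming the ZERO_EXTEND case of
simplify_unary_operation_1 (illustrative only, not the proposed fix):

  /* (zero_extend:M (zero_extend:N X)) is canonically (zero_extend:M X).  */
  if (GET_CODE (op) == ZERO_EXTEND)
    return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
                               GET_MODE (XEXP (op, 0)));

The problem in this PR is that make_compound_operation can emit the nested
form without that canonicalization being applied.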
> Agreed.
>
>
>
> >
> >>> (set (reg:PSI 28 [ iD.1772 ])
> >>>   (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]
> >
> > As a rule of thumb, if the missed optimization bug report includes combine's
> > diagnostic "Failed to match this instruction:", things can be improved by 
> > adding
> > a pattern (often a define_insn_and_split) that matches the shown RTL.
> Yes, but we also need to ponder if that's the right way to fix any given
> problem.  Sometimes we're going to be better

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Richard Biener
On Mon, Oct 16, 2023 at 11:59 PM Richard Sandiford
 wrote:
>
> Robin Dapp  writes:
> >> Why are the contents of this if statement wrong for COND_LEN?
> >> If the "else" value doesn't matter, then the masked form can use
> >> the "then" value for all elements.  I would have expected the same
> >> thing to be true of COND_LEN.
> >
> > Right, that one was overly pessimistic.  Removed.
> >
> >> But isn't the test whether res_op->code itself is an internal_function?
> >> In other words, shouldn't it just be:
> >>
> >>   if (internal_fn_p (res_op->code)
> >>&& internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
> >>  return true;
> >>
> >> maybe_resimplify_conditional_op should already have converted to an
> >> internal function where possible, and if combined_fn (res_op->code)
> >> does any extra conversion on the fly, that conversion won't be reflected
> >> in res_op.
> >
> > I went through some of our test cases and believe most of the problems
> > are due to situations like the following:
> >
> > In vect-cond-arith-2.c we have (on riscv)
> >   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
> >   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, 
> > vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
> >
> > On aarch64 this is a situation that matches the VEC_COND_EXPR
> > simplification that I disabled with this patch.  We valueized
> > to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
> > vect_res_2.5_24 = VEC_COND_EXPR ;
> > This is later re-assembled into a COND_SUB.
> >
> > As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
> > achieve the same thing.  Would it be possible to create a COND_OP
> > directly instead, though?  I tried the following (not very polished
> > obviously):
> >
> > -  new_op.set_op (VEC_COND_EXPR, res_op->type,
> > -res_op->cond.cond, res_op->ops[0],
> > -res_op->cond.else_value);
> > -  *res_op = new_op;
> > -  return gimple_resimplify3 (seq, res_op, valueize);
> > +  if (!res_op->cond.len)
> > +   {
> > + new_op.set_op (VEC_COND_EXPR, res_op->type,
> > +res_op->cond.cond, res_op->ops[0],
> > +res_op->cond.else_value);
> > + *res_op = new_op;
> > + return gimple_resimplify3 (seq, res_op, valueize);
> > +   }
> > +  else if (seq && *seq && is_gimple_assign (*seq))
> > +   {
> > + new_op.code = gimple_assign_rhs_code (*seq);
> > + new_op.type = res_op->type;
> > + new_op.num_ops = gimple_num_ops (*seq) - 1;
> > + new_op.ops[0] = gimple_assign_rhs1 (*seq);
> > + if (new_op.num_ops > 1)
> > +   new_op.ops[1] = gimple_assign_rhs2 (*seq);
> > + if (new_op.num_ops > 2)
> > +   new_op.ops[2] = gimple_assign_rhs2 (*seq);
> > +
> > + new_op.cond = res_op->cond;
> > +
> > + gimple_match_op bla2;
> > + if (convert_conditional_op (&new_op, &bla2))
> > +   {
> > + *res_op = bla2;
> > + // SEQ should now be dead.
> > + return true;
> > +   }
> > +   }
> >
> > This would make the other hunk (check whether it was a LEN
> > and try to recreate it) redundant I hope.
> >
> > I don't know enough about valueization, whether it's always
> > safe to do that and other implications.  On riscv this seems
> > to work, though and the other backends never go through the LEN
> > path.  If, however, this is a feasible direction it could also
> > be done for the non-LEN targets?
>
> I don't know much about valueisation either :)  But it does feel
> like we're working around the lack of a LEN form of COND_EXPR.
> In other words, it seems odd that we can do:
>
>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>
> but we can't do:
>
>   IFN_COND_LEN (mask, a, b, len, bias)
>
> There seems to be no way of applying a length without also finding an
> operation to perform.

Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
IFN_COND{,_LEN} to be more consistent here?

> Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
> it solve some of these problems?
>
> I suppose in the worst case, IFN_COND_LEN is equivalent to IFN_COND_LEN_IOR
> with a zero input (and extended to floats).  So if the target can do
> IFN_COND_LEN_IOR, it could implement IFN_COND_LEN using the same instruction.

In principle one can construct a mask from the length via {0, 1, ... }
< len and then
AND that to the mask in a VEC_COND_EXPR but that's of course super ugly and
likely inefficient (or hard to match back on RTL land).
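Roughly, the element-wise semantics being discussed look like this in plain C;
an illustrative sketch only (not GIMPLE, names invented):

  /* res = IFN_COND_LEN (mask, a, b, len, bias), modelled by deriving a
     second mask from the length and combining the two.  */
  static void
  cond_len_select (int n, const _Bool *mask, const int *a, const int *b,
                   int *res, int len, int bias)
  {
    for (int i = 0; i < n; i++)
      {
        _Bool in_len = i < len + bias;  /* the "{0, 1, ...} < len" part  */
        res[i] = (mask[i] && in_len) ? a[i] : b[i];
      }
  }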

Richard.

> Thanks,
> Richard
>


Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Richard Biener
On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> On 17/10/23 2:03 pm, Richard Biener wrote:
> > On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal  
> > wrote:
> >>
> >> This patch improves code sinking pass to sink statements before call to 
> >> reduce
> >> register pressure.
> >> Review comments are incorporated. Synced and modified with latest trunk 
> >> sources.
> >>
> >> For example :
> >>
> >> void bar();
> >> int j;
> >> void foo(int a, int b, int c, int d, int e, int f)
> >> {
> >>   int l;
> >>   l = a + b + c + d +e + f;
> >>   if (a != 5)
> >> {
> >>   bar();
> >>   j = l;
> >> }
> >> }
> >>
> >> Code Sinking does the following:
> >>
> >> void bar();
> >> int j;
> >> void foo(int a, int b, int c, int d, int e, int f)
> >> {
> >>   int l;
> >>
> >>   if (a != 5)
> >> {
> >>   l = a + b + c + d +e + f;
> >>   bar();
> >>   j = l;
> >> }
> >> }
> >>
> >> Bootstrapped regtested on powerpc64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> tree-ssa-sink: Improve code sinking pass
> >>
> >> Currently, code sinking will sink code after function calls.  This 
> >> increases
> >> register pressure for callee-saved registers.  The following patch improves
> >> code sinking by placing the sunk code before calls in the use block or in
> >> the immediate dominator of the use blocks.
> >
> > The patch no longer does what the description above says.
> Why you think so. Please let me know.

You talk about calls above but the patch doesn't do anything about calls.  You
also don't do anything about register pressure; rather, the effect of
your changes
is to move some stmts by a smaller "distance", whatever effect that has.

> >
> > More comments below.
> >
> >> 2023-10-12  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >> PR tree-optimization/81953
> >> * tree-ssa-sink.cc (statement_sink_location): Move statements 
> >> before
> >> calls.
> >> (select_best_block): Add heuristics to select the best blocks in 
> >> the
> >> immediate post dominator.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> PR tree-optimization/81953
> >> * gcc.dg/tree-ssa/ssa-sink-20.c: New test.
> >> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> >> ---
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
> >>  gcc/tree-ssa-sink.cc| 39 -
> >>  3 files changed, 56 insertions(+), 17 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> new file mode 100644
> >> index 000..d3b79ca5803
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> @@ -0,0 +1,15 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >> +void bar();
> >> +int j;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >> +  l = a + b + c + d +e + f;
> >> +  if (a != 5)
> >> +{
> >> +  bar();
> >> +  j = l;
> >> +}
> >> +}
> >> +/* { dg-final { scan-tree-dump 
> >> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >> new file mode 100644
> >> index 000..84e7938c54f
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >> @@ -0,0 +1,19 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >> +void bar();
> >> +int j, x;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >>

Re: [PATCH 02/11] Handle epilogues that contain jumps

2023-10-17 Thread Richard Biener
On Thu, Oct 12, 2023 at 10:15 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Aug 22, 2023 at 12:42 PM Szabolcs Nagy via Gcc-patches
> >  wrote:
> >>
> >> From: Richard Sandiford 
> >>
> >> The prologue/epilogue pass allows the prologue sequence
> >> to contain jumps.  The sequence is then partitioned into
> >> basic blocks using find_many_sub_basic_blocks.
> >>
> >> This patch treats epilogues in the same way.  It's needed for
> >> a follow-on aarch64 patch that adds conditional code to both
> >> the prologue and the epilogue.
> >>
> >> Tested on aarch64-linux-gnu (including with a follow-on patch)
> >> and x86_64-linux-gnu.  OK to install?
> >>
> >> Richard
> >>
> >> gcc/
> >> * function.cc (thread_prologue_and_epilogue_insns): Handle
> >> epilogues that contain jumps.
> >> ---
> >>
> >> This is a previously approved patch that was not committed
> >> because it was not needed at the time, but i'd like to commit
> >> it as it is needed for the followup aarch64 eh_return changes:
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html
> >>
> >> ---
> >>  gcc/function.cc | 10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/gcc/function.cc b/gcc/function.cc
> >> index dd2c1136e07..70d1cd65303 100644
> >> --- a/gcc/function.cc
> >> +++ b/gcc/function.cc
> >> @@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void)
> >>   && returnjump_p (BB_END (e->src)))
> >> e->flags &= ~EDGE_FALLTHRU;
> >> }
> >> +
> >> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> >> + bitmap_clear (blocks);
> >> +   bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
> >> + find_many_sub_basic_blocks (blocks);
> >> }
> >>else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
> >> {
> >> @@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void)
> >>   set_insn_locations (seq, epilogue_location);
> >>
> >>   emit_insn_before (seq, insn);
> >> +
> >> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> >> + bitmap_clear (blocks);
> >> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
> >> + find_many_sub_basic_blocks (blocks);
> >
> > I'll note that clearing a full sbitmap to pass down a single basic block
> > to find_many_sub_basic_blocks is a quite expensive operation.  May I suggest
> > to add an overload operating on a single basic block?  It's only
> >
> >   FOR_EACH_BB_FN (bb, cfun)
> > SET_STATE (bb,
> >bitmap_bit_p (blocks, bb->index) ? BLOCK_TO_SPLIT :
> > BLOCK_ORIGINAL);
> >
> > using the bitmap, so factoring the rest of the function and customizing this
> > walk would do the trick.  Note that the whole function could be refactored 
> > to
> > handle single blocks more efficiently.
>
> Sorry for the late reply, but does this look OK?  Tested on
> aarch64-linux-gnu and x86_64-linux-gnu.

LGTM, not sure if I'm qualified enough to approve though (I think you
are more qualified here, so ..)

Thanks,
Richard.

> Thanks,
> Richard
>
> ---
>
> The prologue/epilogue pass allows the prologue sequence to contain
> jumps.  The sequence is then partitioned into basic blocks using
> find_many_sub_basic_blocks.
>
> This patch treats epilogues in a similar way.  Since only one block
> might need to be split, the patch (re)introduces a find_sub_basic_blocks
> routine to handle a single block.
>
> The new routine hard-codes the assumption that split_block will chain
> the new block immediately after the original block.  The routine doesn't
> try to replicate the fix for PR81030, since that was specific to
> gimple->rtl expansion.
>
> The patch is needed for follow-on aarch64 patches that add conditional
> code to the epilogue.  The tests are part of those patches.
>
> gcc/
> * cfgbuild.h (find_sub_basic_blocks): Declare.
> * cfgbuild.cc (update_profile_for_new_sub_basic_block): New function,
> split out from...
> (find_many_sub_basic_blocks): ...here.
> (find_sub_basic_blocks): New function.
> * function.cc (thread_prologue_and_epilogue_insns

Re: [PATCH] MATCH: [PR111432] Simplify `a & (x | CST)` to a when we know that (a & ~CST) == 0

2023-10-17 Thread Richard Biener
On Sat, Oct 14, 2023 at 2:57 AM Andrew Pinski  wrote:
>
> This adds the simplification `a & (x | CST)` to a when we know that
> `(a & ~CST) == 0`. In a similar fashion as `a & CST` is handle.
>
> I looked into handling `a | (x & CST)` but that I don't see any decent
> simplifications happening.
>
> OK? Bootstrapped and tested on x86_linux-gnu with no regressions.

OK.

> PR tree-optimization/111432
>
> gcc/ChangeLog:
>
> * match.pd (`a & (x | CST)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-7.c: New test.
> ---
>  gcc/match.pd |  8 
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-7.c | 24 
>  2 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-7.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51e5065d086..45624f3dcb4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1550,6 +1550,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>&& wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0)
>@0))
> +
> +/* `a & (x | CST)` -> a if we know that (a & ~CST) == 0   */
> +(simplify
> + (bit_and:c SSA_NAME@0 (bit_ior @1 INTEGER_CST@2))
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@2)) == 0)
> +  @0))
> +
>  /* x | C -> C if we know that x & ~C == 0.  */
>  (simplify
>   (bit_ior SSA_NAME@0 INTEGER_CST@1)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-7.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-7.c
> new file mode 100644
> index 000..7fb18db3a11
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-7.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/111432 */
> +
> +int
> +foo3(int c, int bb)
> +{
> +  if ((bb & ~3)!=0) __builtin_unreachable();
> +  return (bb & (c|3));
> +}
> +
> +int
> +foo_bool(int c, _Bool bb)
> +{
> +  return (bb & (c|7));
> +}
> +
> +/* Both of these functions should be able to remove the `IOR` and `AND`
> +   as the only bits that are non-zero for bb is set on the other side
> +   of the `AND`.
> + */
> +
> +/* { dg-final { scan-tree-dump-not   "bit_ior_expr, "   "optimized" } } */
> +/* { dg-final { scan-tree-dump-not   "bit_and_expr, "   "optimized" } } */
> --
> 2.39.3
>


Re: [PATCH] wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}

2023-10-17 Thread Richard Biener
)
> @@ -92,11 +94,37 @@ print_decu (const wide_int_ref &wi, char
>|| (wi.get_len () == 1 && !wi::neg_p (wi)))
>  sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, wi.to_uhwi ());
>else
> -print_hex (wi, buf);
> +{
> +  widest2_int w = widest2_int::from (wi, UNSIGNED), r;
> +  widest2_int ten19 = HOST_WIDE_INT_UC (10000000000000000000);
> +  char buf2[20], next1[19], next2[19];
> +  size_t l, c = 0, i;
> +  /* In order to avoid dividing this twice, print the 19 decimal
> +  digit chunks in reverse order into buffer and then reorder
> +  them in-place.  */
> +  while (wi::gtu_p (w, ten19))
> + {
> +   w = wi::divmod_trunc (w, ten19, UNSIGNED, &r);
> +   sprintf (buf + c * 19, "%019" PRIu64, r.to_uhwi ());
> +   ++c;
> + }
> +  l = sprintf (buf2, HOST_WIDE_INT_PRINT_UNSIGNED, w.to_uhwi ());
> +  buf[c * 19 + l] = '\0';
> +  memcpy (next1, buf, 19);
> +  memcpy (buf, buf2, l);
> +  for (i = 0; i < c / 2; ++i)
> + {
> +   memcpy (next2, buf + (c - i - 1) * 19, 19);
> +   memcpy (buf + l + (c - i - 1) * 19, next1, 19);
> +   memcpy (next1, buf + (i + 1) * 19, 19);
> +   memcpy (buf + l + i * 19, next2, 19);
> + }
> +  if (c & 1)
> + memcpy (buf + l + i * 19, next1, 19);
> +}
>  }
>  
> -/* Try to print the signed self in decimal to FILE if the number fits
> -   in a HWI.  Other print in hex.  */
> +/* Try to print the signed self in decimal to FILE.  */
>  
>  void
>  print_decu (const wide_int_ref &wi, FILE *file)
> @@ -155,8 +183,7 @@ void
>  pp_wide_int_large (pretty_printer *pp, const wide_int_ref &w, signop sgn)
>  {
>unsigned int len;
> -  if (!print_dec_buf_size (w, sgn, &len))
> -len = WIDE_INT_PRINT_BUFFER_SIZE;
> +  print_dec_buf_size (w, sgn, &len);
>char *buf = XALLOCAVEC (char, len);
>print_dec (w, buf, sgn);
>pp_string (pp, buf);
> --- gcc/pretty-print.h.jj 2023-10-15 23:04:06.095422965 +0200
> +++ gcc/pretty-print.h2023-10-16 10:51:56.053529117 +0200
> @@ -448,8 +448,9 @@ pp_wide_integer (pretty_printer *pp, HOS
>  inline void
>  pp_wide_int (pretty_printer *pp, const wide_int_ref &w, signop sgn)
>  {
> -  unsigned int prec = w.get_precision ();
> -  if (UNLIKELY ((prec + 3) / 4 > sizeof (pp_buffer (pp)->digit_buffer) - 3))
> +  unsigned int len;
> +  print_dec_buf_size (w, sgn, &len);
> +  if (UNLIKELY (len > sizeof (pp_buffer (pp)->digit_buffer)))
>  pp_wide_int_large (pp, w, sgn);
>else
>  {
> --- gcc/tree-pretty-print.cc.jj   2023-09-21 20:02:53.467522151 +0200
> +++ gcc/tree-pretty-print.cc  2023-10-16 11:05:51.131997367 +0200
> @@ -2248,10 +2248,11 @@ dump_generic_node (pretty_printer *pp, t
> pp_minus (pp);
> val = -val;
>   }
> -   unsigned int prec = val.get_precision ();
> -   if ((prec + 3) / 4 > sizeof (pp_buffer (pp)->digit_buffer) - 3)
> +   unsigned int len;
> +   print_hex_buf_size (val, &len);
> +   if (UNLIKELY (len > sizeof (pp_buffer (pp)->digit_buffer)))
>   {
> -   char *buf = XALLOCAVEC (char, (prec + 3) / 4 + 3);
> +   char *buf = XALLOCAVEC (char, len);
> print_hex (val, buf);
> pp_string (pp, buf);
>   }
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/111846 - put simd-clone-info into SLP tree

2023-10-17 Thread Richard Biener
The following avoids bogusly re-using the simd-clone-info we
currently hang off stmt_info from two different SLP contexts where
a different number of lanes should have chosen a different best
simdclone.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111846
* tree-vectorizer.h (_slp_tree::simd_clone_info): Add.
(SLP_TREE_SIMD_CLONE_INFO): New.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_SIMD_CLONE_INFO.
(_slp_tree::~_slp_tree): Release it.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO
dependent on if we're doing SLP.

* gcc.dg/vect/pr111846.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr111846.c | 12 ++
 gcc/tree-vect-slp.cc |  2 ++
 gcc/tree-vect-stmts.cc   | 35 +---
 gcc/tree-vectorizer.h|  6 +
 4 files changed, 36 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr111846.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr111846.c 
b/gcc/testsuite/gcc.dg/vect/pr111846.c
new file mode 100644
index 000..d283882f261
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111846.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+extern __attribute__((__simd__)) float powf(float, float);
+float gv[0][10];
+float eq_set_bands_real_adj[0];
+void eq_set_bands_real() {
+  for (int c = 0; c < 10; c++)
+for (int i = 0; i < 10; i++)
+  gv[c][i] = powf(0, eq_set_bands_real_adj[i]) - 1;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af8f5031bd2..d081999a763 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -117,6 +117,7 @@ _slp_tree::_slp_tree ()
   SLP_TREE_CHILDREN (this) = vNULL;
   SLP_TREE_LOAD_PERMUTATION (this) = vNULL;
   SLP_TREE_LANE_PERMUTATION (this) = vNULL;
+  SLP_TREE_SIMD_CLONE_INFO (this) = vNULL;
   SLP_TREE_DEF_TYPE (this) = vect_uninitialized_def;
   SLP_TREE_CODE (this) = ERROR_MARK;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
@@ -143,6 +144,7 @@ _slp_tree::~_slp_tree ()
   SLP_TREE_VEC_DEFS (this).release ();
   SLP_TREE_LOAD_PERMUTATION (this).release ();
   SLP_TREE_LANE_PERMUTATION (this).release ();
+  SLP_TREE_SIMD_CLONE_INFO (this).release ();
   if (this->failed)
 free (failed);
 }
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..9bb43e98f56 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4215,6 +4215,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (nargs == 0)
 return false;
 
+  vec& simd_clone_info = (slp_node ? SLP_TREE_SIMD_CLONE_INFO (slp_node)
+   : STMT_VINFO_SIMD_CLONE_INFO (stmt_info));
   arginfo.reserve (nargs, true);
   auto_vec slp_op;
   slp_op.safe_grow_cleared (nargs);
@@ -4256,25 +4258,22 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
gcc_assert (thisarginfo.vectype != NULL_TREE);
 
   /* For linear arguments, the analyze phase should have saved
-the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
-  if (i * 3 + 4 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
- && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2])
+the base and step in {STMT_VINFO,SLP_TREE}_SIMD_CLONE_INFO.  */
+  if (i * 3 + 4 <= simd_clone_info.length ()
+ && simd_clone_info[i * 3 + 2])
{
  gcc_assert (vec_stmt);
- thisarginfo.linear_step
-   = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2]);
- thisarginfo.op
-   = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 1];
+ thisarginfo.linear_step = tree_to_shwi (simd_clone_info[i * 3 + 2]);
+ thisarginfo.op = simd_clone_info[i * 3 + 1];
  thisarginfo.simd_lane_linear
-   = (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 3]
-  == boolean_true_node);
+   = (simd_clone_info[i * 3 + 3] == boolean_true_node);
  /* If loop has been peeled for alignment, we need to adjust it.  */
  tree n1 = LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
  tree n2 = LOOP_VINFO_NITERS (loop_vinfo);
  if (n1 != n2 && !thisarginfo.simd_lane_linear)
{
  tree bias = fold_build2 (MINUS_EXPR, TREE_TYPE (n1), n1, n2);
- tree step = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2];
+ tree step = simd_clone_info[i * 3 + 2];
  tree opt = TREE_TYPE (thisarginfo.op);
  bias = fold_convert (TREE_TYPE (step), bias);
  bias = fold_build2 (MULT_EXPR, TREE_TYPE (step), bias, step);
@@ -4328,8 +4327,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
   unsigned group_size = slp_node ? SLP_TREE_L

Re: PING Re: [PATCH v2 RFA] diagnostic: add permerror variants with opt

2023-10-17 Thread Richard Biener
On Tue, Oct 17, 2023 at 9:51 PM Jason Merrill  wrote:
>
> Ping?

OK.

Thanks,
Richard.

> On 10/3/23 17:09, Jason Merrill wrote:
> > This revision changes from using DK_PEDWARN for permerror-with-option to 
> > using
> > DK_PERMERROR.
> >
> > Tested x86_64-pc-linux-gnu.  OK for trunk?
> >
> > -- 8< --
> >
> > In the discussion of promoting some pedwarns to be errors by default, rather
> > than move them all into -fpermissive it seems to me to make sense to support
> > DK_PERMERROR with an option flag.  This way will also work with
> > -fpermissive, but users can also still use -Wno-error=narrowing to downgrade
> > that specific diagnostic rather than everything affected by -fpermissive.
> >
> > So, for diagnostics that we want to make errors by default we can just
> > change the pedwarn call to permerror.
> >
> > The tests check desired behavior for such a permerror in a system header
> > with various flags.  The patch preserves the existing permerror behavior of
> > ignoring -w and system headers by default, but respecting them when
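For illustration, the kind of call-site change this enables would look roughly
like the following sketch; loc, the option and the message are placeholders
rather than the actual typeck2.cc text:

  /* Before: pedwarn, i.e. a warning by default.  */
  pedwarn (loc, OPT_Wnarrowing, "narrowing conversion of %qE", init);
  /* After: an error by default, still adjustable via -Wno-error=narrowing
     or -fpermissive thanks to the new permerror overload taking an option.  */
  permerror (loc, OPT_Wnarrowing, "narrowing conversion of %qE", init);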
> > downgraded to a warning by -fpermissive.
> >
> > This seems similar to but a bit better than the approach of forcing
> > -pedantic-errors that I previously used for -Wnarrowing: specifically, in
> > that now -w by itself is not enough to silence the -Wnarrowing
> > error (integer-pack2.C).
> >
> > gcc/ChangeLog:
> >
> >   * doc/invoke.texi: Move -fpermissive to Warning Options.
> >   * diagnostic.cc (update_effective_level_from_pragmas): Remove
> >   redundant system header check.
> >   (diagnostic_report_diagnostic): Move down syshdr/-w check.
> >   (diagnostic_impl): Handle DK_PERMERROR with an option number.
> >   (permerror): Add new overloads.
> >   * diagnostic-core.h (permerror): Declare them.
> >
> > gcc/cp/ChangeLog:
> >
> >   * typeck2.cc (check_narrowing): Use permerror.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/integer-pack2.C: Add -fpermissive.
> >   * g++.dg/diagnostic/sys-narrow.h: New test.
> >   * g++.dg/diagnostic/sys-narrow1.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1a.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1b.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1c.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1d.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1e.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1f.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1g.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1h.C: New test.
> >   * g++.dg/diagnostic/sys-narrow1i.C: New test.
> > ---
> >   gcc/doc/invoke.texi   | 22 +++---
> >   gcc/diagnostic-core.h |  3 +
> >   gcc/testsuite/g++.dg/diagnostic/sys-narrow.h  |  2 +
> >   gcc/cp/typeck2.cc | 10 +--
> >   gcc/diagnostic.cc | 67 ---
> >   gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C |  4 ++
> >   .../g++.dg/diagnostic/sys-narrow1a.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1b.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1c.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1d.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1e.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1f.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1g.C  |  5 ++
> >   .../g++.dg/diagnostic/sys-narrow1h.C  |  6 ++
> >   .../g++.dg/diagnostic/sys-narrow1i.C  |  6 ++
> >   gcc/testsuite/g++.dg/ext/integer-pack2.C  |  2 +-
> >   16 files changed, 117 insertions(+), 40 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow.h
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1a.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1b.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1c.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1d.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1e.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1f.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1g.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1h.C
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1i.C
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 4085fc90907..6b6506a75b2 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -231,7 +231,7 @@ in the following sections.
> >   -fnew-inheriting-ctors
> >   -fnew-ttp-matching
> >   -fno-nonansi-builtins  -fnothrow-opt  -fno-operator-names
> > --fno-optional-diags  -fpermissive
> > +-fno-optional-diags
> >   -fno-pretty-templates
> >   -fno-rtti  -fsized-deallocation
> >   -ftemplate-backtrace-limit=@var{n}
> > @@ -323,7 +323,7 @@ Objective-C and Objective-C++ Dialects}.
> >   
