Re: [PATCH] Document that vector_size works with typedefs [PR92880]

2024-04-16 Thread Richard Biener
On Tue, Apr 16, 2024 at 2:26 AM Andrew Pinski  wrote:
>
> This just adds a clause to make it more obvious that the vector_size
> attribute extension works with typedefs.
> Note this whole section needs a rewrite to be in a format similar to the
> other extensions. But that is for another day.
>
> OK?

OK

>
> gcc/ChangeLog:
>
> PR c/92880
> * doc/extend.texi (Using Vector Instructions): Add that
> the base_types could be a typedef of them.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/doc/extend.texi | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 7b54a241a7b..e290265d68d 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -12901,12 +12901,13 @@ typedef int v4si __attribute__ ((vector_size (16)));
>  @end smallexample
>
>  @noindent
> -The @code{int} type specifies the @dfn{base type}, while the attribute specifies
> -the vector size for the variable, measured in bytes.  For example, the
> -declaration above causes the compiler to set the mode for the @code{v4si}
> -type to be 16 bytes wide and divided into @code{int} sized units.  For
> -a 32-bit @code{int} this means a vector of 4 units of 4 bytes, and the
> -corresponding mode of @code{foo} is @acronym{V4SI}.
> +The @code{int} type specifies the @dfn{base type} (which can be a
> +@code{typedef}), while the attribute specifies the vector size for the
> +variable, measured in bytes. For example, the declaration above causes
> +the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
> +and divided into @code{int} sized units.  For a 32-bit @code{int} this
> +means a vector of 4 units of 4 bytes, and the corresponding mode of
> +@code{foo} is @acronym{V4SI}.
>
>  The @code{vector_size} attribute is only applicable to integral and
>  floating scalars, although arrays, pointers, and function return values
> --
> 2.43.0
>
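As an aside, the amended sentence can be illustrated with a short example (a hypothetical sketch, not part of the patch; the names myint, add4 and first_lane_sum are invented):

```c
#include <assert.h>

/* vector_size accepts a typedef as the base type, as the doc change notes.  */
typedef int myint;
typedef myint v4si __attribute__ ((vector_size (16)));

v4si
add4 (v4si a, v4si b)
{
  return a + b;			/* element-wise addition */
}

int
first_lane_sum (void)
{
  v4si a = {1, 2, 3, 4};
  v4si b = {5, 6, 7, 8};
  return add4 (a, b)[0];	/* 1 + 5 */
}
```

With a 32-bit int, v4si is 16 bytes divided into 4 units whether the base type is spelled int directly or through the typedef.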


Re: [PATCH]middle-end: skip vectorization check on ilp32 on vect-early-break_124-pr114403.c

2024-04-16 Thread Richard Biener
On Tue, 16 Apr 2024, Tamar Christina wrote:

> Hi all,
> 
> The testcase seems to fail vectorization on -m32 since the access pattern is
> determined as too complex.  This skips the vectorization check on ilp32 systems
> as I couldn't find a better proxy for being able to do strided 64-bit loads and
> I suspect it would fail on all 32-bit targets.

You could try having Val aligned to 64 bits in the structure (32-bit
targets likely do not align it).
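A sketch of that suggestion (hypothetical, not a committed change; whether this is a suitable proxy is exactly the open question):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long PV;

/* Force 64-bit alignment of Val so that 32-bit targets lay the struct
   out the same way 64-bit targets do.  */
typedef struct _buff_t {
  int foo;
  PV Val __attribute__ ((aligned (8)));
} buff_t;

size_t
val_offset (void)
{
  return offsetof (buff_t, Val);	/* foo is padded out to offset 8 */
}
```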

> Regtested on x86_64-pc-linux-gnu with -m32 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114403
>   * gcc.dg/vect/vect-early-break_124-pr114403.c: Skip in ilp32.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
> index 1751296ab813fe85eaab1f58dc674bac10f6eb7a..db8e00556f116ca81c5a6558ec6ecd3b222ec93d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
> @@ -2,11 +2,11 @@
>  /* { dg-require-effective-target vect_early_break_hw } */
>  /* { dg-require-effective-target vect_long_long } */
>  
> -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { ! ilp32 } } } } */
>  
>  #include "tree-vect.h"
>  
> -typedef unsigned long PV;
> +typedef unsigned long long PV;
>  typedef struct _buff_t {
>  int foo;
>  PV Val;
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: docs: document early break support and pragma novector

2024-04-16 Thread Richard Biener
On Tue, 16 Apr 2024, Tamar Christina wrote:

> docs: document early break support and pragma novector

OK.

> ---
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index b4c602a523717c1d64333e44aefb60ba0ed02e7a..aceecb86f17443cfae637e90987427b98c42f6eb 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -200,6 +200,34 @@ a work-in-progress.
>  for indicating parameters that are expected to be null-terminated
>  strings.
>
> +  
> +The vectorizer now supports vectorizing loops which contain any number of early breaks.
> +This means loops such as:
> +
> + int z[100], y[100], x[100];
> + int foo (int n)
> + {
> +   int res = 0;
> +   for (int i = 0; i < n; i++)
> + {
> +y[i] = x[i] * 2;
> +res += x[i] + y[i];
> +
> +if (x[i] > 5)
> +  break;
> +
> +if (z[i] > 5)
> +  break;
> +
> + }
> +   return res;
> + }
> +
> +can now be vectorized on a number of targets.  In this first version any
> +input data sources must either have a statically known size at compile time
> +or the vectorizer must be able to determine, based on auxiliary information,
> +that the accesses are aligned.
> +  
>  
>  
>  New Languages and Language specific improvements
> @@ -231,6 +259,9 @@ a work-in-progress.
>previous options -std=c2x, -std=gnu2x
>and -Wc11-c2x-compat, which are deprecated but remain
>supported.
> +  GCC supports a new pragma, #pragma GCC novector, to
> +  indicate to the vectorizer not to vectorize the loop annotated with the
> +  pragma.
>  
>  
>  C++
> @@ -400,6 +431,9 @@ a work-in-progress.
>warnings are enabled for C++ as well
>The DR 2237 code no longer gives an error; it emits
>a -Wtemplate-id-cdtor warning instead
> +  GCC supports a new pragma, #pragma GCC novector, to
> +  indicate to the vectorizer not to vectorize the loop annotated with the
> +  pragma.
>  
>  
>  Runtime Library (libstdc++)
> 
> 
> 
> 
> 
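The entry above can be exercised with a loop like the following (a hypothetical usage sketch; sum is an invented name):

```c
#include <assert.h>

/* #pragma GCC novector asks the vectorizer to keep this loop scalar.  */
int
sum (const int *a, int n)
{
  int s = 0;
#pragma GCC novector
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```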

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Guard longjmp in test to not inf loop [PR114720]

2024-04-15 Thread Richard Biener
On Mon, Apr 15, 2024 at 2:35 PM Jørgen Kvalsvik  wrote:
>
> Guard the longjmp so the test does not loop infinitely.  The longjmp (jump)
> function is called unconditionally to make the test flow simpler, but the jump
> destination would return to a point in main that would call longjmp
> again. The longjmp is really there to exercise the then-branch of
> setjmp, to verify coverage is accurately counted in the presence of
> complex edges.

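The guarded pattern described above can be reduced to a self-contained sketch (names invented, independent of the gcov-22.c testcase):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf dest;
static int visits;

static void
jump (void)
{
  /* Guard so the longjmp is taken only once, as in the patch.  */
  static int called_once = 0;
  if (!called_once)
    {
      called_once = 1;
      longjmp (dest, 1);
    }
}

int
run (void)
{
  if (setjmp (dest) == 0)
    visits++;			/* direct fall-through path */
  else
    visits++;			/* path re-entered via longjmp */
  jump ();			/* the second call returns normally */
  return visits;
}
```

The first jump () re-enters through setjmp; the guard makes the second call fall through, so run () terminates instead of looping.
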
OK

> PR gcov-profile/114720
>
> gcc/testsuite/ChangeLog:
>
> * gcc.misc-tests/gcov-22.c: Guard longjmp to not loop.
> ---
>  gcc/testsuite/gcc.misc-tests/gcov-22.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-22.c b/gcc/testsuite/gcc.misc-tests/gcov-22.c
> index 641791a7223..7ca78467ca3 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov-22.c
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-22.c
> @@ -87,7 +87,19 @@ setdest ()
>  void
>  jump ()
>  {
> -longjmp (dest, 1);
> +/* Protect the longjmp so it will only be done once.  The whole purpose of
> +   this function is to help test conditions and instrumentation around
> +   setjmp and its complex edges, as both branches should count towards
> +   coverage, even when one is taken through longjmp.  If the jump is not
> +   guarded it causes an infinite loop, as setdest returns to a point in
> +   main before jump ().  See PR gcov-profile/114720.  */
> +static int called_once = 0;
> +if (!called_once) /* conditions(suppress) */
> +{
> +   called_once = 1;
> +   longjmp (dest, 1);
> +}
>  }
>
>  int
> --
> 2.30.2
>


Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Richard Biener
On Mon, Apr 15, 2024 at 12:04 PM Tobias Burnus  wrote:
>
> I experimented with some variants to make clearer that each of RDNA2 and
> RDNA3 applies to two card types, but in the end I settled on the
> fewest-word version.
>
> Comments, remarks, suggestions? (To this change or in general?)
>
> Current version: https://gcc.gnu.org/gcc-14/changes.html#amdgcn
>
> Compiler flags, listing the gfx* cards:
> https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html
>
> Tobias
>
> PS: On the compiler side, I am looking forward to a .def file which
> reduces the number of files to change when adding a new gfx* card, given
> that we have doubled the number of entries. [Well, 1 missing but I know
> of one WIP addition.]

I do wonder whether hot-patching the ELF header from the libgomp plugin
with the actual micro-subarch would be possible to make the driver happy.
We do query the device ISA when initializing the device so we should
be able to massage the ELF header of the object in GOMP_OFFLOAD_load_image
at least within some constraints (ideally we'd mark the ELF object as to
be matched with a device in some group).

Richard.


Re: [PATCH] c, v3: Fix ICE with -g and -std=c23 related to incomplete types [PR114361]

2024-04-15 Thread Richard Biener
On Mon, 15 Apr 2024, Jakub Jelinek wrote:

> On Mon, Apr 15, 2024 at 10:05:58AM +0200, Jakub Jelinek wrote:
> > On Mon, Apr 15, 2024 at 10:02:25AM +0200, Richard Biener wrote:
> > > > Though, haven't managed to reproduce it with -O2 -flto -std=c23
> > > > struct S;
> > > > typedef struct S **V[10];
> > > > V **foo (int x) { return 0; }
> > > > struct S { int s; };
> > > > either.
> > > > So, maybe let's drop the ipa-free-lang-data.cc part?
> > > > Seems fld_incomplete_type_of uses fld_type_variant which should
> > > > copy over TYPE_CANONICAL.
> > > 
> > > If you have a testcase that still triggers it would be nice to see it.
> > 
> > I don't, that is why I'm now suggesting to just drop that hunk.
> 
> Actually no, I've just screwed up something in my testing.
> One can reproduce it easily with -O2 -flto 20021205-1.c -std=c23
> if the ipa-free-lang-data.cc hunk is removed.
> This happens when fld_incomplete_type_of is called on a POINTER_TYPE
> to RECORD_TYPE x, where the RECORD_TYPE x is not the TYPE_MAIN_VARIANT,
> but another variant created by set_underlying_type.  The
> c_update_type_canonical didn't touch TYPE_CANONICAL in those, I was too
> afraid I don't know what TYPE_CANONICAL should be for all variant types,
> so that TREE_TYPE (t) had TYPE_CANONICAL NULL.  But when we call
> fld_incomplete_type_of on that TREE_TYPE (t), it sees it isn't
> TYPE_MAIN_VARIANT, so calls
>   return (fld_type_variant
>   (fld_incomplete_type_of (TYPE_MAIN_VARIANT (t), fld), t, fld));
> but TYPE_MAIN_VARIANT (t) has already TYPE_CANONICAL (TYPE_MAIN_VARIANT (t))
> == TYPE_MAIN_VARIANT (t), that one has been completed on finish_struct.
> And so we trigger the assertion, because
> TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t))
> is no longer true, the former is non-NULL, the latter is NULL.
> 
> But looking at all the build_variant_type_copy callers and the call itself,
> the call itself sets TYPE_CANONICAL to the TYPE_CANONICAL of the type on
> which it is called and the only caller I can find that changes
> TYPE_CANONICAL sometimes is build_qualified_type.
> So, I'd hope that normally all variant types of an aggregate type (or
> pointer type) have the same TYPE_CANONICAL if they have the same TYPE_QUALS
> and if they have it different, they have TYPE_CANONICAL of
> build_qualified_type of the base TYPE_CANONICAL.

The middle-end assumes that TYPE_CANONICAL of all variant types are
the same, for TBAA purposes it immediately "puns" to
TYPE_CANONICAL (TYPE_MAIN_VARIANT (..)).  It also assumes that
the canonical type is not a variant type.  Note we never "honor"
TYPE_STRUCTURAL_EQUALITY_P on a variant type (because we don't look
at it, we only look at whether the main variant has
TYPE_STRUCTURAL_EQUALITY_P).

Thus, TYPE_CANONICAL of variant types in principle doesn't need to be
set (but not all places might go the extra step looking at the main
variant before accessing TYPE_CANONICAL).

Richard.

> With the following updated patch (ipa-free-lang-data.cc hunk removed,
> c_update_type_canonical function updated, plus removed trailing whitespace
> from tests),
> make check-gcc RUNTESTFLAGS="--target_board=unix/-std=gnu23 compile.exp='20021205-1.c 20040214-2.c 20060109-1.c pr113623.c pr46866.c pta-1.c' execute.exp='pr33870-1.c pr33870.c'"
> no longer ICEs (have just expected FAILs on 20040214-2.c which isn't
> compatible with C23) and make check-gcc -j32 doesn't regress compared
> to the unpatched one.
> 
> Is this ok for trunk if it passes full bootstrap/regtest?
> 
> 2024-04-15  Martin Uecker  
>   Jakub Jelinek  
> 
>   PR lto/114574
>   PR c/114361
> gcc/c/
>   * c-decl.cc (shadow_tag_warned): For flag_isoc23 and code not
>   ENUMERAL_TYPE use SET_TYPE_STRUCTURAL_EQUALITY.
>   (parser_xref_tag): Likewise.
>   (start_struct): For flag_isoc23 use SET_TYPE_STRUCTURAL_EQUALITY.
>   (c_update_type_canonical): New function.
>   (finish_struct): Put NULL as second == operand rather than first.
>   Assert TYPE_STRUCTURAL_EQUALITY_P.  Call c_update_type_canonical.
>   * c-typeck.cc (composite_type_internal): Use
>   SET_TYPE_STRUCTURAL_EQUALITY.  Formatting fix.
> gcc/testsuite/
>   * gcc.dg/pr114574-1.c: New test.
>   * gcc.dg/pr114574-2.c: New test.
>   * gcc.dg/pr114361.c: New test.
>   * gcc.dg/c23-tag-incomplete-1.c: New test.
>   * gcc.dg/c23-tag-incomplete-2.c: New test.
> 
> --- gcc/c/c-decl.cc.jj	2024-04-09 09:29:04.824520299 +0200
> +++ gcc/c/c-decl.cc   2024-04-15 12:26:43.000790475 +0200
> @@ -5051,6 +5051,8 @@ shadow_tag_warned (const struct c_declsp
>

[PATCH] gcov-profile/114715 - missing coverage for switch

2024-04-15 Thread Richard Biener
The following avoids missing coverage for the line of a switch statement,
which happens when gimplification emits a BIND_EXPR wrapping the switch,
as that prevents us from setting locations on the containing statements
via annotate_all_with_location.  Instead, set the location of the GIMPLE
switch directly.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?

Thanks,
Richard.

PR gcov-profile/114715
* gimplify.cc (gimplify_switch_expr): Set the location of the
GIMPLE switch.

* gcc.misc-tests/gcov-24.c: New testcase.
---
 gcc/gimplify.cc|  1 +
 gcc/testsuite/gcc.misc-tests/gcov-24.c | 30 ++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-24.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 3df58b962f3..26e96ada4c7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3017,6 +3017,7 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
 
   switch_stmt = gimple_build_switch (SWITCH_COND (switch_expr),
 default_case, labels);
+  gimple_set_location (switch_stmt, EXPR_LOCATION (switch_expr));
   /* For the benefit of -Wimplicit-fallthrough, if switch_body_seq
 ends with a GIMPLE_LABEL holding SWITCH_BREAK_LABEL_P LABEL_DECL,
 wrap the GIMPLE_SWITCH up to that GIMPLE_LABEL into a GIMPLE_BIND,
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-24.c b/gcc/testsuite/gcc.misc-tests/gcov-24.c
new file mode 100644
index 000..395099bd7ae
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-24.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage" } */
+/* { dg-do run { target native } } */
+
+int main()
+{
+  int a = 1;
+  int b = 2;
+  int c = -3;
+  switch(a) /* count(1) */
+{
+case 1: /* count(1) */
+c = 3;
+switch(b) { /* count(1) */
+  case 1: /* count(#) */
+  c = 4;
+  break;
+  case 2: /* count(1) */
+  c = 5;
+  break;
+}
+break;
+case 2: /* count(#) */
+c = 6;
+break;
+default: /* count(#) */
+break;
+}
+}
+
+/* { dg-final { run-gcov gcov-24.c } } */
-- 
2.35.3


Re: [PATCH] attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]

2024-04-15 Thread Richard Biener
On Mon, 15 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> The enumerator still doesn't have TREE_TYPE set, but diag_attr_exclusions
> assumes that all decls must have types.
> I think it is better for something as unimportant as diag_attr_exclusions
> to be more robust: if there is no type, it can just diagnose exclusions
> on the DECL_ATTRIBUTES, like for types it only diagnoses them on
> TYPE_ATTRIBUTES.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, but can you add a comment?

Thanks,
Richard.

> 2024-04-15  Jakub Jelinek  
> 
>   PR c++/114634
>   * attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
>   decls with NULL TREE_TYPE.
> 
>   * g++.dg/ext/attrib68.C: New test.
> 
> --- gcc/attribs.cc.jj 2024-02-12 20:44:52.409074876 +0100
> +++ gcc/attribs.cc	2024-04-12 18:29:52.000381917 +0200
> @@ -468,7 +468,10 @@ diag_attr_exclusions (tree last_decl, tr
>if (DECL_P (node))
>  {
>attrs[0] = DECL_ATTRIBUTES (node);
> -  attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
> +  if (TREE_TYPE (node))
> + attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
> +  else
> + attrs[1] = NULL_TREE;
>  }
>else
>  {
> --- gcc/testsuite/g++.dg/ext/attrib68.C.jj	2024-04-12 18:31:38.100968098 +0200
> +++ gcc/testsuite/g++.dg/ext/attrib68.C	2024-04-12 18:30:57.011515625 +0200
> @@ -0,0 +1,8 @@
> +// PR c++/114634
> +// { dg-do compile }
> +
> +template 
> +struct A
> +{
> +  enum { e __attribute__ ((aligned (16))) }; // { dg-error "alignment may not be specified for 'e'" }
> +};
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [C PATCH, v2] Fix ICE with -g and -std=c23 related to incomplete types [PR114361]

2024-04-15 Thread Richard Biener
On Mon, 15 Apr 2024, Jakub Jelinek wrote:

> On Mon, Apr 15, 2024 at 09:38:29AM +0200, Jakub Jelinek wrote:
> > I had this spot instrumented to log the different cases (before adding the
> > code to fix up also pointer types in c_update_type_canonical) and the only 
> > thing
> > that triggered was that the 2 TYPE_CANONICALs weren't equal if
> > TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (t)), the other was just in case.
> > gcc.c-torture/compile/20021205-1.c
> > gcc.c-torture/compile/20040214-2.c
> > gcc.c-torture/compile/20060109-1.c
> > gcc.c-torture/compile/pr113623.c
> > gcc.c-torture/compile/pr46866.c
> > gcc.c-torture/compile/pta-1.c
> > gcc.c-torture/execute/pr33870-1.c
> > gcc.c-torture/execute/pr33870.c
> > gcc.dg/torture/pr57478.c
> > tests were affected in make check-gcc.
> > I thought it would be a clear consequence of the choice we've discussed on
> > IRC, that build_pointer_type_for_mode and other tree.cc functions which
> > lookup/create derived types don't try to fill in TYPE_CANONICAL for
> > types derived from something which initially had TYPE_STRUCTURAL_EQUALITY_P
> > but later changed to non-TYPE_STRUCTURAL_EQUALITY_P.  The patch updates
> > it solely for qualified types/related pointer types, but doesn't do that
> > for array types, pointer to array types, function types, ...
> > So, I think the assertion could still trigger if we have something like
> > -O2 -flto -std=c23
> > struct S;
> > typedef struct S *T;
> > typedef T U[10];
> > typedef U *V;
> > V foo (int x) { return 0; }
> > struct S { int s; };
> > (but doesn't, dunno what I'm missing; though here certainly V and U have
> > TYPE_STRUCTURAL_EQUALITY_P, even T has because it is a typedef, not
> > something actually normally returned by build_pointer_type).
> 
> Though, haven't managed to reproduce it with -O2 -flto -std=c23
> struct S;
> typedef struct S **V[10];
> V **foo (int x) { return 0; }
> struct S { int s; };
> either.
> So, maybe let's drop the ipa-free-lang-data.cc part?
> Seems fld_incomplete_type_of uses fld_type_variant which should
> copy over TYPE_CANONICAL.

If you have a testcase that still triggers it would be nice to see it.

Richard.


Re: [Backport 1/2] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-15 Thread Richard Biener
On Mon, 15 Apr 2024, Richard Biener wrote:

> On Sun, 14 Apr 2024, H.J. Lu wrote:
> 
> > We can't profile indirect calls to IFUNC resolvers nor their callees as
> > it requires TLS which hasn't been set up yet when the dynamic linker is
> > resolving IFUNC symbols.
> > 
> > Add an IFUNC resolver caller marker to cgraph_node and set it if the
> > function is called by an IFUNC resolver.  Disable indirect call profiling
> > for IFUNC resolvers and their callees.
> > 
> > Tested with profiledbootstrap on Fedora 39/x86-64.
> > 
> > gcc/ChangeLog:
> > 
> > PR tree-optimization/114115
> > * cgraph.h (symtab_node): Add check_ifunc_callee_symtab_nodes.
> > (cgraph_node): Add called_by_ifunc_resolver.
> > * cgraphunit.cc (symbol_table::compile): Call
> > symtab_node::check_ifunc_callee_symtab_nodes.
> > * symtab.cc (check_ifunc_resolver): New.
> > (ifunc_ref_map): Likewise.
> > (is_caller_ifunc_resolver): Likewise.
> > (symtab_node::check_ifunc_callee_symtab_nodes): Likewise.
> > * tree-profile.cc (gimple_gen_ic_func_profiler): Disable indirect
> > call profiling for IFUNC resolvers and their callees.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR tree-optimization/114115
> > * gcc.dg/pr114115.c: New test.
> > 
> > (cherry picked from commit cab32bacaea268ec062b1fb4fc662d90c9d1cfce)
> > ---
> >  gcc/cgraph.h|  6 +++
> >  gcc/cgraphunit.cc   |  2 +
> >  gcc/symtab.cc   | 89 +
> >  gcc/testsuite/gcc.dg/pr114115.c | 24 +
> >  gcc/tree-profile.cc |  8 ++-
> >  5 files changed, 128 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr114115.c
> > 
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index c1a3691b6f5..430c87d8bb7 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -479,6 +479,9 @@ public:
> >   Return NULL if there's no such node.  */
> >static symtab_node *get_for_asmname (const_tree asmname);
> >  
> > +  /* Check symbol table for callees of IFUNC resolvers.  */
> > +  static void check_ifunc_callee_symtab_nodes (void);
> > +
> >/* Verify symbol table for internal consistency.  */
> >static DEBUG_FUNCTION void verify_symtab_nodes (void);
> >  
> > @@ -896,6 +899,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
> >redefined_extern_inline (false), tm_may_enter_irr (false),
> >ipcp_clone (false), declare_variant_alt (false),
> >calls_declare_variant_alt (false), gc_candidate (false),
> > +  called_by_ifunc_resolver (false),
> >m_uid (uid), m_summary_id (-1)
> >{}
> >  
> > @@ -1491,6 +1495,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
> >   is set for local SIMD clones when they are created and cleared if the
> >   vectorizer uses them.  */
> >unsigned gc_candidate : 1;
> > +  /* Set if the function is called by an IFUNC resolver.  */
> > +  unsigned called_by_ifunc_resolver : 1;
> >  
> >  private:
> >/* Unique id of the node.  */
> > diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
> > index bccd2f2abb5..40dcceccca5 100644
> > --- a/gcc/cgraphunit.cc
> > +++ b/gcc/cgraphunit.cc
> > @@ -2313,6 +2313,8 @@ symbol_table::compile (void)
> >  
> >symtab_node::checking_verify_symtab_nodes ();
> >  
> > +  symtab_node::check_ifunc_callee_symtab_nodes ();
> > +
> >timevar_push (TV_CGRAPHOPT);
> >if (pre_ipa_mem_report)
> >  dump_memory_report ("Memory consumption before IPA");
> > diff --git a/gcc/symtab.cc b/gcc/symtab.cc
> > index 0470509a98d..df09def81e9 100644
> > --- a/gcc/symtab.cc
> > +++ b/gcc/symtab.cc
> > @@ -1369,6 +1369,95 @@ symtab_node::verify (void)
> >timevar_pop (TV_CGRAPH_VERIFY);
> >  }
> >  
> > +/* Return true and set *DATA to true if NODE is an ifunc resolver.  */
> > +
> > +static bool
> > +check_ifunc_resolver (cgraph_node *node, void *data)
> > +{
> > +  if (node->ifunc_resolver)
> > +{
> > +  bool *is_ifunc_resolver = (bool *) data;
> > +  *is_ifunc_resolver = true;
> > +  return true;
> > +}
> > +  return false;
> > +}
> > +
> > +static auto_bitmap ifunc_ref_map;
> 
> Please don't use static auto_bitmap, that isn't constructed
> properly.
> 
>

Re: [Backport 1/2] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-15 Thread Richard Biener
l.  */
> +  if (e->caller == node)
> + continue;
> +
> +  /* Skip if it has been visited.  */
> +  unsigned int uid = e->caller->get_uid ();
> +  if (bitmap_bit_p (ifunc_ref_map, uid))
> + continue;
> +  bitmap_set_bit (ifunc_ref_map, uid);
> +
> +  if (is_caller_ifunc_resolver (e->caller))
> + {
> +   /* Return true if caller is an IFUNC resolver.  */
> +   e->caller->called_by_ifunc_resolver = true;
> +   return true;
> + }
> +
> +  /* Check if caller's alias is an IFUNC resolver.  */
> +  e->caller->call_for_symbol_and_aliases (check_ifunc_resolver,
> +   &is_ifunc_resolver,
> +   true);
> +  if (is_ifunc_resolver)
> + {
> +   /* Return true if caller's alias is an IFUNC resolver.  */
> +   e->caller->called_by_ifunc_resolver = true;
> +   return true;
> + }
> +}
> +
> +  return false;
> +}
> +
> +/* Check symbol table for callees of IFUNC resolvers.  */
> +
> +void
> +symtab_node::check_ifunc_callee_symtab_nodes (void)
> +{
> +  symtab_node *node;
> +
> +  FOR_EACH_SYMBOL (node)
> +{
> +  cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
> +  if (!cnode)
> + continue;
> +
> +  unsigned int uid = cnode->get_uid ();
> +  if (bitmap_bit_p (ifunc_ref_map, uid))
> + continue;
> +  bitmap_set_bit (ifunc_ref_map, uid);
> +
> +  bool is_ifunc_resolver = false;
> +  cnode->call_for_symbol_and_aliases (check_ifunc_resolver,
> +   &is_ifunc_resolver, true);
> +  if (is_ifunc_resolver || is_caller_ifunc_resolver (cnode))
> + cnode->called_by_ifunc_resolver = true;
> +}
> +
> +  bitmap_clear (ifunc_ref_map);
> +}
> +
>  /* Verify symbol table for internal consistency.  */
>  
>  DEBUG_FUNCTION void
> diff --git a/gcc/testsuite/gcc.dg/pr114115.c b/gcc/testsuite/gcc.dg/pr114115.c
> new file mode 100644
> index 000..2629f591877
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114115.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -fprofile-generate -fdump-tree-optimized" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-require-ifunc "" } */
> +
> +void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));
> +
> +void bar(void)
> +{
> +}
> +
> +static int f3()
> +{
> +  bar ();
> +  return 5;
> +}
> +
> +void (*foo_resolver(void))(void)
> +{
> +  f3();
> +  return bar;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" "optimized" } } */
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index da300d5f9e8..b5de0fb914f 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -418,7 +418,13 @@ gimple_gen_ic_func_profiler (void)
>gcall *stmt1;
>tree tree_uid, cur_func, void0;
>  
> -  if (c_node->only_called_directly_p ())
> +  /* Disable indirect call profiling for an IFUNC resolver and its
> + callees since it requires TLS which hasn't been set up yet when
> + the dynamic linker is resolving IFUNC symbols.  See
> + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114115
> +   */
> +  if (c_node->only_called_directly_p ()
> +  || c_node->called_by_ifunc_resolver)
>  return;
>  
>gimple_init_gcov_profiler ();
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [C PATCH, v2] Fix ICE with -g and -std=c23 related to incomplete types [PR114361]

2024-04-15 Thread Richard Biener
c/testsuite/gcc.dg/pr114361.c b/gcc/testsuite/gcc.dg/pr114361.c
> new file mode 100644
> index 000..0f3feb53566
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114361.c
> @@ -0,0 +1,11 @@
> +/* PR c/114361 */
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu23 -g" } */
> +
> +void f()
> +{
> +typedef struct foo bar;
> +typedef __typeof( ({ (struct foo { bar *x; }){ }; }) ) wuz;
> +struct foo { wuz *x; };
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/pr114574-1.c b/gcc/testsuite/gcc.dg/pr114574-1.c
> new file mode 100644
> index 000..060dcdbe73e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114574-1.c
> @@ -0,0 +1,10 @@
> +/* PR lto/114574
> + * { dg-do compile }
> + * { dg-options "-flto" } */
> +
> +const struct S * x;
> +struct S {};
> +void f(const struct S **);
> +
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/pr114574-2.c b/gcc/testsuite/gcc.dg/pr114574-2.c
> new file mode 100644
> index 000..723291e2211
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114574-2.c
> @@ -0,0 +1,10 @@
> +/* PR lto/114574
> + * { dg-do compile }
> + * { dg-options "-flto -std=c23" } */
> +
> +const struct S * x;
> +struct S {};
> +void f(const struct S **);
> +
> +
> +
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].

2024-04-12 Thread Richard Biener
nfo) ? 1 : 0;
   int bias_for_assumed = bias_for_lowest;
   int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114403
>   * tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds for when
>   peeling for gaps and early break.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114403
>   * gcc.dg/vect/vect-early-break_124-pr114403.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ae5e53efc45e7bef89c5a72abd6afa48292668db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
> @@ -0,0 +1,74 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break_hw } */
> +/* { dg-require-effective-target vect_long_long } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include "tree-vect.h"
> +
> +typedef unsigned long PV;
> +typedef struct _buff_t {
> +int foo;
> +PV Val;
> +} buff_t;
> +
> +#define NUM 9
> +#define SZ NUM * sizeof (PV)
> +char buffer[SZ];
> +
> +__attribute__ ((noipa))
> +buff_t *copy (buff_t *first, buff_t *last)
> +{
> +  char *buffer_ptr = buffer;
> +  char *const buffer_end = &buffer[SZ-1];
> +  int store_size = sizeof(first->Val);
> +  while (first != last && (buffer_ptr + store_size) <= buffer_end)
> +{
> +  const char *value_data = (const char *)(&first->Val);
> +  __builtin_memcpy(buffer_ptr, value_data, store_size);
> +  buffer_ptr += store_size;
> +  ++first;
> +}
> +
> +  if (first == last)
> +return 0;
> +
> +  return first;
> +}
> +
> +int main ()
> +{
> +  /* Copy an ascii buffer.  We need to trigger the loop to exit from
> + the condition where we have more data to copy but not enough space.
> + For this test that means that OVL must be > SZ.  */
> +#define OVL NUM*2
> +  char str[OVL]="abcdefghiabcdefgh\0";
> +  buff_t tmp[OVL];
> +
> +#pragma GCC novector
> +  for (int i = 0; i < OVL; i++)
> +tmp[i].Val = str[i];
> +
> +  buff_t *start = &tmp[0];
> +  buff_t *last = &tmp[OVL-1];
> +  buff_t *res = 0;
> +
> +  /* This copy should exit on the early exit, in which case we know
> + that start != last as we had more data to copy but the buffer
> + was full.  */
> +  if (!(res = copy (start, last)))
> +__builtin_abort ();
> +
> +  /* Check if we have the right reduction value.  */
> +  if (res != &tmp[NUM-1])
> +__builtin_abort ();
> +
> +  int store_size = sizeof(PV);
> +#pragma GCC novector
> +  for (int i = 0; i < NUM - 1; i+=store_size)
> +if (0 != __builtin_memcmp (buffer+i, (char*)&tmp[i].Val, store_size))
> +  __builtin_abort ();
> +
> +  return 0;
> +}
> +
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 4375ebdcb493a90fd0501cbb4b07466077b525c3..024a24a305c4727f97eb022247f4dca791c52dfe 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12144,6 +12144,12 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>   -min_epilogue_iters to remove iterations that cannot be performed
> by the vector code.  */
>int bias_for_lowest = 1 - min_epilogue_iters;
> +  /* For an early break we must always assume that the vector loop can be
> + executed partially.  In this definition a partial iteration means that
> + we take an exit before the IV exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +bias_for_lowest = 1;
> +
>int bias_for_assumed = bias_for_lowest;
>int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
>if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-cfg: Make the verifier returns_twice message translatable

2024-04-12 Thread Richard Biener



> On 12.04.2024 at 09:58, Jakub Jelinek wrote:
> 
> Hi!
> 
> While translation of the verifier messages is questionable (these cases
> should ideally never be seen by anyone but gcc developers, so presumably
> English would be fine), we use the error etc. APIs, which imply
> translation, and some translators do translate these messages.
> The following patch adjusts the code such that we don't emit
> appel returns_twice est not first dans le bloc de base 33
> in French (i.e. 2 English words in the middle of a French message).
> Similarly for Swedish or Ukrainian.
> Note, the German translator did differentiate between these verifier
> messages vs. normal user facing and translated it to:
> "Interner Fehler: returns_twice call is %s in basic block %d"
> so just a German prefix before English message.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-04-12  Jakub Jelinek  
> 
>* tree-cfg.cc (gimple_verify_flow_info): Make the misplaced
>returns_twice diagnostics translatable.
> 
> --- gcc/tree-cfg.cc.jj2024-04-10 10:19:04.237471564 +0200
> +++ gcc/tree-cfg.cc2024-04-11 17:18:57.962672110 +0200
> @@ -5818,7 +5818,7 @@ gimple_verify_flow_info (void)
>  if (gimple_code (stmt) == GIMPLE_CALL
>  && gimple_call_flags (stmt) & ECF_RETURNS_TWICE)
>{
> -  const char *misplaced = NULL;
> +  bool misplaced = false;
>  /* TM is an exception: it points abnormal edges just after the
> call that starts a transaction, i.e. it must end the BB.  */
>  if (gimple_call_builtin_p (stmt, BUILT_IN_TM_START))
> @@ -5826,18 +5826,23 @@ gimple_verify_flow_info (void)
>  if (single_succ_p (bb)
>  && bb_has_abnormal_pred (single_succ (bb))
>  && !gsi_one_nondebug_before_end_p (gsi))
> -misplaced = "not last";
> +{
> +  error ("returns_twice call is not last in basic block "
> + "%d", bb->index);
> +  misplaced = true;
> +}
>}
>  else
>{
> -  if (seen_nondebug_stmt
> -  && bb_has_abnormal_pred (bb))
> -misplaced = "not first";
> +  if (seen_nondebug_stmt && bb_has_abnormal_pred (bb))
> +{
> +  error ("returns_twice call is not first in basic block "
> + "%d", bb->index);
> +  misplaced = true;
> +}
>}
>  if (misplaced)
>{
> -  error ("returns_twice call is %s in basic block %d",
> - misplaced, bb->index);
>  print_gimple_stmt (stderr, stmt, 0, TDF_SLIM);
>  err = true;
>}
> 
>Jakub
> 


Re: [PATCH] Limit special asan/ubsan/bitint returns_twice handling to calls in bbs with abnormal pred [PR114687]

2024-04-12 Thread Richard Biener



> On 12.04.2024 at 09:50, Jakub Jelinek wrote:
> 
> Hi!
> 
> The tree-cfg.cc verifier only diagnoses returns_twice calls preceded
> by non-label/debug stmts if it is in a bb with abnormal predecessor.
> The following testcase shows that if a user lies in the attributes
> (a function which never returns can't be pure, and can't return
> twice when it doesn't ever return at all), when we figure it out,
> we can remove the abnormal edges to the "returns_twice" call and perhaps
> whole .ABNORMAL_DISPATCHER etc.
> edge_before_returns_twice_call then ICEs because it can't find such
> an edge.
> 
> The following patch limits the special handling to calls in bbs where
> the verifier requires that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-04-12  Jakub Jelinek  
> 
>PR sanitizer/114687
>* gimple-iterator.cc (gsi_safe_insert_before): Only use
>edge_before_returns_twice_call if bb_has_abnormal_pred.
>(gsi_safe_insert_seq_before): Likewise.
>* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Only
>push to m_returns_twice_calls if bb_has_abnormal_pred.
> 
>* gcc.dg/asan/pr114687.c: New test.
> 
> --- gcc/gimple-iterator.cc.jj2024-03-14 09:57:09.024966285 +0100
> +++ gcc/gimple-iterator.cc2024-04-11 17:05:06.267081433 +0200
> @@ -1049,7 +1049,8 @@ gsi_safe_insert_before (gimple_stmt_iter
>   gimple *stmt = gsi_stmt (*iter);
>   if (stmt
>   && is_gimple_call (stmt)
> -  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
> +  && bb_has_abnormal_pred (gsi_bb (*iter)))
> {
>   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
>   basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
> @@ -1072,7 +1073,8 @@ gsi_safe_insert_seq_before (gimple_stmt_
>   gimple *stmt = gsi_stmt (*iter);
>   if (stmt
>   && is_gimple_call (stmt)
> -  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +  && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0
> +  && bb_has_abnormal_pred (gsi_bb (*iter)))
> {
>   edge e = edge_before_returns_twice_call (gsi_bb (*iter));
>   gimple *f = gimple_seq_first_stmt (seq);
> --- gcc/gimple-lower-bitint.cc.jj2024-04-09 09:28:21.261123664 +0200
> +++ gcc/gimple-lower-bitint.cc2024-04-11 17:06:58.033548199 +0200
> @@ -5320,7 +5320,7 @@ bitint_large_huge::lower_call (tree obj,
>  arg = make_ssa_name (TREE_TYPE (arg));
>  gimple *g = gimple_build_assign (arg, v);
>  gsi_insert_before (, g, GSI_SAME_STMT);
> -  if (returns_twice)
> +  if (returns_twice && bb_has_abnormal_pred (gimple_bb (stmt)))
>{
>  m_returns_twice_calls.safe_push (stmt);
>  returns_twice = false;
> --- gcc/testsuite/gcc.dg/asan/pr114687.c.jj2024-04-11 17:09:54.518127165 
> +0200
> +++ gcc/testsuite/gcc.dg/asan/pr114687.c2024-04-11 17:09:22.699563654 
> +0200
> @@ -0,0 +1,22 @@
> +/* PR sanitizer/114687 */
> +/* { dg-do compile } */
> +
> +int a;
> +int foo (int);
> +
> +__attribute__((pure, returns_twice)) int
> +bar (void)
> +{
> +  a = 1;
> +  while (a)
> +a = 2;
> +  return a;
> +}
> +
> +int
> +baz (void)
> +{
> +  int d = bar ();
> +  foo (d);
> +  return 0;
> +}
> 
>Jakub
> 


Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-12 Thread Richard Biener
On Fri, Apr 12, 2024 at 1:25 AM Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, April 11, 2024 2:31 AM
> > To: Andrew Pinski (QUIC) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 
> > bit
> > types [PR114666]
> >
> > On Thu, Apr 11, 2024 at 10:43 AM Andrew Pinski
> >  wrote:
> > >
> > > The issue here is that the `a?~t:t` pattern assumed (maybe correctly)
> > > that a here was always going to be an unsigned boolean type. This fixes
> > > the problem in both patterns to cast the operand to boolean type first.
> > >
> > > I should note that VRP seems to keep wanting to produce `a ==
> > > 0?1:-2` from `((int)a) ^ 1`, which is a bit odd and partly the cause
> > > of the issue, and there seems to be some disconnect on what should be
> > > the canonical form. That will be something to look at for GCC 15.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > PR tree-optimization/114666
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd (`!a?b:c`): Cast `a` to boolean type for cond for
> > > gimple.
> > > (`a?~t:t`): Cast `a` to boolean type before casting it
> > > to the type.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/match.pd| 10 +++---
> > >  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
> > >  2 files changed, 20 insertions(+), 3 deletions(-)  create mode 100644
> > > gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > 15a1e7350d4..ffc928b656a 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -5895,7 +5895,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   /* !A ? B : C -> A ? C : B.  */
> > >   (simplify
> > >(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> > > -  (cnd @0 @2 @1)))
> > > +  /* For gimple, make sure the operand to COND is a boolean type,
> > > + truth_valued_p will match 1bit integers too. */
> > > +  (if (GIMPLE && cnd == COND_EXPR)
> > > +   (cnd (convert:boolean_type_node @0) @2 @1)
> > > +   (cnd @0 @2 @1))))
> >
> > This looks "wrong" for GENERIC still?
>
> I tried without the GIMPLE check and ran into the testcase
> gcc.dg/torture/builtins-isinf-sign-1.c failing, because the extra convert
> was blocking the recognition that both sides of an equality were the same
> (I didn't look into it further than that). So I decided to limit it to GIMPLE only.
>
> > But this is not really part of the fix but deciding we should not have
> > signed:1 as
> > cond operand?  I'll note that truth_valued_p allows signed:1.
> >
> > Maybe as minimal surgery add a TYPE_UNSIGNED (TREE_TPE (@0)) check here
> > instead?
>
> That might work, let me try.
>
> >
> > >  /* abs/negative simplifications moved from
> > fold_cond_expr_with_comparison.
> > >
> > > @@ -7099,8 +7103,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > && (!wascmp || TYPE_PRECISION (type) == 1))
> > > (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> > > || TYPE_PRECISION (type) == 1)
> > > -(bit_xor (convert:type @0) @2)
> > > -(bit_xor (negate (convert:type @0)) @2)
> > > +(bit_xor (convert:type (convert:boolean_type_node @0)) @2)
> > > +(bit_xor (negate (convert:type (convert:boolean_type_node @0))) @2)
> > >  #endif
> >
> > This looks OK, but then testing TYPE_UNSIGNED (TREE_TYPE (@0)) might be
> > better?
> >
>
> Let me do that just like the other pattern.
>
> > Does this all just go downhill from what VRP creates?  That is, would IL
> > checking have had a chance of detecting it if we say signed:1 is not valid as
> > a condition?
>
> Yes. So what VRP produces in the testcase is:
> `_2 == 0 ? 1 : -2u` (where _2 is the signed 1bit integer).
> Now maybe the COND_EXPR should be the canonical form for constants (but that 
> is for a different patch I think, I added it to the list of things I should 
> look into for GCC 15).

Ah OK, so th

Re: [PATCH v2] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-12 Thread Richard Biener
On Fri, Apr 12, 2024 at 6:53 AM Andrew Pinski  wrote:
>
> The problem is that the `!a?b:c` pattern will create a COND_EXPR with a
> 1bit signed integer, which breaks patterns like `a?~t:t`. This rejects a
> signed operand for both patterns.
>
> Note for GCC 15, I am going to look at the canonicalization of `a?~t:t`
> where t is a constant, since I think keeping it a COND_EXPR might be more
> canonical and is what VRP produces from the same IR; if anything, expand
> should handle which one is better.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/114666
>
> gcc/ChangeLog:
>
> * match.pd (`!a?b:c`): Reject signed types for the condition.
> (`a?~t:t`): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd|  6 +-
>  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 15a1e7350d4..d401e7503e6 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5895,7 +5895,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* !A ? B : C -> A ? C : B.  */
>   (simplify
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> -  (cnd @0 @2 @1)))
> +  /* For CONDs, don't handle signed values here. */
> +  (if (cnd == VEC_COND_EXPR
> +   || TYPE_UNSIGNED (TREE_TYPE (@0)))
> +   (cnd @0 @2 @1))))
>
>  /* abs/negative simplifications moved from fold_cond_expr_with_comparison.
>
> @@ -7095,6 +7098,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond @0 @1 @2)
>   (with { bool wascmp; }
>(if (INTEGRAL_TYPE_P (type)
> +   && TYPE_UNSIGNED (TREE_TYPE (@0))
> && bitwise_inverted_equal_p (@1, @2, wascmp)
> && (!wascmp || TYPE_PRECISION (type) == 1))
> (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> new file mode 100644
> index 000..b0ff120ea51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/114666 */
> +/* We used to miscompile this to be always aborting
> +   due to the use of the signed 1bit into the COND_EXPR. */
> +
> +struct {
> +  signed a : 1;
> +} b = {-1};
> +char c;
> +int main()
> +{
> +  if ((b.a ^ 1UL) < 3)
> +__builtin_abort();
> +}
> --
> 2.43.0
>


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-12 Thread Richard Biener
On Thu, 11 Apr 2024, Thomas Schwinge wrote:

> Hi Chung-Lin, Richard!
> 
> From me just a few mechanical pieces, see below.  Richard, are you able
> to again comment on Chung-Lin's general strategy, as I'm not at all
> familiar with those parts of the code?

I've queued all stage1 material and will only be able to look at it
slowly after we have released.

> On 2024-04-03T19:50:55+0800, Chung-Lin Tang  
> wrote:
> > On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>>
> >>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >>> flag.
> >>>
> >>> The actual optimization then is done in this second patch.  Chung-Lin
> >>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >>> I don't have much experience with most of the following generic code, so
> >>> would appreciate a helping hand, whether that conceptually makes sense as
> >>> well as from the implementation point of view:
> >
> > First of all, I have removed all of the gimplify-stage scanning and setting 
> > of
> > DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no 
> > changes to
> > gimplify.cc now)
> >
> > I remember this code was an artifact of earlier attempts to allow
> > struct-member pointer mappings to also work
> > (e.g. map(readonly:rec.ptr[:N])), which failed anyway.
> > I think the omp_data_* member accesses when building the child-function
> > side receiver_refs are blocking points-to analysis from working (I didn't
> > try digging deeper)
> >
> > Also during gimplify, VAR_DECLs appeared to be reused (at least in some
> > cases) for map clause decl reference building, so hoping that the
> > variables "happen to be" single-use, with DECL_POINTS_TO_READONLY relayed
> > into SSA_NAME_POINTS_TO_READONLY_MEMORY, does appear to be a little risky.
> >
> > However, for firstprivate pointers processed during omp-low, it appears to 
> > be somewhat different.
> > (see below description)
> >
> >> No, I don't think you can use that flag on non-default-defs, nor
> >> preserve it on copying.  So
> >> it also doesn't nicely extend to DECLs as done by the patch.  We
> >> currently _only_ use it
> >> for incoming parameters.  When used on arbitrary code you can get to for 
> >> example
> >> 
> >> ptr1(points-to-readonly-memory) = >x;
> >> ... access via ptr1 ...
> >> ptr2 = >x;
> >> ... access via ptr2 ...
> >> 
> >> where both are your OMP regions differently constrained (the constraint
> >> is on the
> >> code in the region, _not_ on the actual protections of the pointed to
> >> data, much like
> >> for the fortran case).  But now CSE comes along and happily replaces all 
> >> ptr2
> >> with ptr2 in the second region and ... oops!
> >
> > Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 
> > in the second region"?
> >
> > That doesn't happen, because during omp-lower/expand, OMP target regions
> > (which is all this currently applies to) are separated into different
> > individual child functions.
> >
> > (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during 
> > omp-lower, when
> > for firstprivate pointers (i.e. 'a' here) we set this bit when constructing 
> > the first load
> > of this pointer)
> >
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> > foo (a, a[8]);
> > r = a[8];
> >   }
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> > foo (a, a[12]);
> > r = a[12];
> >   }
> >
> > After omp-expand (before SSA):
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> > {
> >  ...
> >:
> >   D.2962 = .omp_data_i->D.2947;
> >   a.8 = D.2962;
> >   r.1 = (*a.8)[12];
> >   foo (a.8, r.1);
> >   r.1 = (*a.8)[12];
> >   D.2965 = .omp_data_i->r;
> >   *D.2965 = r.1;
> >   return;
> > }
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> > {
> >   ...
> >:
> >   D.2968 = .omp_data_i->D.2939;

Re: [r14-9912 Regression] FAIL: gcc.dg/guality/pr54693-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 21 z == 30 - 3 * i on Linux/x86_64

2024-04-12 Thread Richard Biener
On Fri, 12 Apr 2024, haochen.jiang wrote:

> On Linux/x86_64,
> 
> c7e8a8d814229fd6fc4c16c2452f15dddc613479 is the first bad commit
> commit c7e8a8d814229fd6fc4c16c2452f15dddc613479
> Author: Richard Biener 
> Date:   Thu Apr 11 11:08:07 2024 +0200
> 
> tree-optimization/109596 - wrong debug stmt move by copyheader
> 
> caused
> 
> FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 34 c 
> == [0]
> FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 39 c 
> == [0]
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 x == 10 - i
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 y == 20 - 2 * i
> FAIL: gcc.dg/guality/pr54693-2.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 z == 30 - 3 * i

Just FYI these are the FAILs as they were present before the regression
this change fixed.


Re: Combine patch ping

2024-04-11 Thread Richard Biener



> On 11.04.2024 at 16:03, Segher Boessenkool wrote:
> 
> On Wed, Apr 10, 2024 at 08:32:39PM +0200, Uros Bizjak wrote:
>>> On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool
>>>  wrote:
>>> This is never okay.  You cannot commit a patch without approval, *ever*.
> 
> This is the biggest issue, to start with.  It is fundamental.

I have approved the patch as you might have noticed.

Richard 

>>> That patch is also obvious -- obviously *wrong*, that is.  There are
>>> big assumptions everywhere in the compiler how a CC reg can be used.
>>> This violates that, as explained elsewhere.
>> 
>> Can you please elaborate what is wrong with this concrete patch.
> 
> The explanation of the patch is contradictory to how RTL works at all,
> so it is just wrong.  It might even do something sane, but I didn't get
> that far at all!
> 
> Write good email explanations, and a good proposed commit message.
> Please.  It is the only way people can judge a patch.  Well, apart
> from doing everything myself from first principles, ignoring everything
> you said, just looking at the patch itself, but that is a hundred times
> more work.  I don't do that.
> 
>> The
>> part that the patch touches has several wrong assumptions, and the
>> fixed "???" comment just emphasizes that. I don't see what is wrong
>> with:
>> 
>> (define_insn "@pushfl2"
>>  [(set (match_operand:W 0 "push_operand" "=<")
>>(unspec:W [(match_operand 1 "flags_reg_operand")]
>>  UNSPEC_PUSHFL))]
>>  "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC"
>>  "pushf{}"
>>  [(set_attr "type" "push")
>>   (set_attr "mode" "")])
> 
> What does it even mean?  What is a flags:CC?  You always always always
> need to say what is *in* the flags, if you want to use it as input
> (which is what unspec does).  CC is weird like this.  Most targets do
> not have distinct physical flags for every condition, only a few
> conditions are "alive" at any point in the program!
> 
>> it is just a push of the flags reg to the stack. If the push can't be
>> described in this way, then it is the middle end at fault, we can't
>> just change modes at will.
> 
> But that is not what this describes: it operates on the flags register
> in some unspecified way, and pushes the result of *that* to the stack.
> 
> (Stack pointer modification is not described here btw, should it be?  Is
> that magically implemented by the backend some way, via type=push
> perhaps?)
> 
> 
> Segher


Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-11 Thread Richard Biener
On Thu, Apr 11, 2024 at 10:43 AM Andrew Pinski  wrote:
>
> The issue here is that the `a?~t:t` pattern assumed (maybe correctly) that a
> here was always going to be an unsigned boolean type. This fixes the problem
> in both patterns to cast the operand to boolean type first.
>
> I should note that VRP seems to keep wanting to produce `a == 0?1:-2`
> from `((int)a) ^ 1`, which is a bit odd and partly the cause of the issue,
> and there seems to be some disconnect on what should be the canonical
> form. That will be
> something to look at for GCC 15.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/114666
>
> gcc/ChangeLog:
>
> * match.pd (`!a?b:c`): Cast `a` to boolean type for cond for
> gimple.
> (`a?~t:t`): Cast `a` to boolean type before casting it
> to the type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd| 10 +++---
>  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
>  2 files changed, 20 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 15a1e7350d4..ffc928b656a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5895,7 +5895,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* !A ? B : C -> A ? C : B.  */
>   (simplify
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> -  (cnd @0 @2 @1)))
> +  /* For gimple, make sure the operand to COND is a boolean type,
> + truth_valued_p will match 1bit integers too. */
> +  (if (GIMPLE && cnd == COND_EXPR)
> +   (cnd (convert:boolean_type_node @0) @2 @1)
> +   (cnd @0 @2 @1))))

This looks "wrong" for GENERIC still?

But this is not really part of the fix but deciding we should not have
signed:1 as
cond operand?  I'll note that truth_valued_p allows signed:1.

Maybe as minimal surgery add a TYPE_UNSIGNED (TREE_TPE (@0)) check here
instead?

>  /* abs/negative simplifications moved from fold_cond_expr_with_comparison.
>
> @@ -7099,8 +7103,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && (!wascmp || TYPE_PRECISION (type) == 1))
> (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> || TYPE_PRECISION (type) == 1)
> -(bit_xor (convert:type @0) @2)
> -(bit_xor (negate (convert:type @0)) @2)
> +(bit_xor (convert:type (convert:boolean_type_node @0)) @2)
> +(bit_xor (negate (convert:type (convert:boolean_type_node @0))) @2)
>  #endif

This looks OK, but then testing TYPE_UNSIGNED (TREE_TYPE (@0)) might be
better?

Does this all just go downhill from what VRP creates?  That is, would
IL checking
have had a chance of detecting it if we say signed:1 is not valid as a condition?

That said, the latter pattern definitely needs guarding/adjustment; I'm not sure
the former is wrong?  Semantically [VEC_]COND_EXPR is op0 != 0 ? ... : ...

Richard.

>  /* Simplify pointer equality compares using PTA.  */
> diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> new file mode 100644
> index 000..b0ff120ea51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/114666 */
> +/* We used to miscompile this to be always aborting
> +   due to the use of the signed 1bit into the COND_EXPR. */
> +
> +struct {
> +  signed a : 1;
> +} b = {-1};
> +char c;
> +int main()
> +{
> +  if ((b.a ^ 1UL) < 3)
> +__builtin_abort();
> +}
> --
> 2.43.0
>


[PATCH] tree-optimization/109596 - wrong debug stmt move by copyheader

2024-04-11 Thread Richard Biener
The following fixes an omission in r14-162-gcda246f8b421ba causing
wrong-debug and a bunch of guality regressions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109596
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Propagate
debug stmts to nonexit->dest rather than exit->dest.
---
 gcc/tree-ssa-loop-ch.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 1f0033be4c4..b7ef485c4cc 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -957,7 +957,7 @@ ch_base::copy_headers (function *fun)
 
   edge entry = loop_preheader_edge (loop);
 
-  propagate_threaded_block_debug_into (exit->dest, entry->dest);
+  propagate_threaded_block_debug_into (nonexit->dest, entry->dest);
   if (!gimple_duplicate_seme_region (entry, exit, bbs, n_bbs, copied_bbs,
 true))
{
-- 
2.35.3


[PATCH] middle-end/114681 - condition coverage and inlining

2024-04-11 Thread Richard Biener
When inlining a gcond it can map to multiple stmts, esp. with
non-call EH.  The following makes sure to pick up the remapped
condition when dealing with condition coverage.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/114681
* tree-inline.cc (copy_bb): Key on the remapped stmt
to identify gconds to have condition coverage data remapped.

* gcc.misc-tests/gcov-pr114681.c: New testcase.
---
 gcc/testsuite/gcc.misc-tests/gcov-pr114681.c | 18 ++
 gcc/tree-inline.cc   |  2 +-
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114681.c

diff --git a/gcc/testsuite/gcc.misc-tests/gcov-pr114681.c 
b/gcc/testsuite/gcc.misc-tests/gcov-pr114681.c
new file mode 100644
index 000..a8dc666a452
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-pr114681.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fnon-call-exceptions -fno-exceptions 
-fcondition-coverage" } */
+
+float f, g;
+
+static void
+bar ()
+{
+  if (g < f)
+for (;;)
+  ;
+}
+
+void
+foo ()
+{
+  bar ();
+}
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 5f852885e7f..238afb7de80 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -2090,7 +2090,7 @@ copy_bb (copy_body_data *id, basic_block bb,
  /* If -fcondition-coverage is used, register the inlined conditions
 in the cond->expression mapping of the caller.  The expression tag
  is shifted so conditions from the two bodies are not mixed.  */
- if (id->src_cfun->cond_uids && is_a <gcond *> (orig_stmt))
+ if (id->src_cfun->cond_uids && is_a <gcond *> (stmt))
{
  gcond *orig_cond = as_a <gcond *> (orig_stmt);
  gcond *cond = as_a <gcond *> (stmt);
-- 
2.35.3


Re: [PATCH] asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]

2024-04-11 Thread Richard Biener
  prev_offset = frame_offset.to_constant ();
> }
> to -ASAN_RED_ZONE_SIZE.  The asan_emit_stack_protection code wasn't
> taking this into account though, so essentially assumed in the
> __asan_stack_malloc_N allocated memory it needs to align it such that
> pointer corresponding to offsets[0] is alignb aligned.  But that isn't
> correct if alignb > ASAN_RED_ZONE_SIZE, in that case it needs to ensure that
> pointer corresponding to frame offset 0 is alignb aligned.
> 
> The following patch fixes that.  Unlike the previous case where
> we knew that asan_frame_size + base_align_bias falls into the same bucket
> as asan_frame_size, this isn't in some cases true anymore, so the patch
> recomputes which bucket to use and if going to bucket 11 (because there is
> no __asan_stack_malloc_11 function in the library) disables the after return
> sanitization.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.

Thanks,
Richard.

> 2024-04-11  Jakub Jelinek  
> 
>   PR middle-end/110027
>   * asan.cc (asan_emit_stack_protection): Assert offsets[0] is
>   zero if there is no stack protect guard, otherwise
>   -ASAN_RED_ZONE_SIZE.  If alignb > ASAN_RED_ZONE_SIZE and there is
>   stack pointer guard, take the ASAN_RED_ZONE_SIZE bytes allocated at
>   the top of the stack into account when computing base_align_bias.
>   Recompute use_after_return_class from asan_frame_size + base_align_bias
>   and set to -1 if that would overflow to 11.
> 
>   * gcc.dg/asan/pr110027.c: New test.
> 
> --- gcc/asan.cc.jj2024-04-10 09:54:39.661231059 +0200
> +++ gcc/asan.cc   2024-04-10 12:12:11.337978004 +0200
> @@ -1911,19 +1911,39 @@ asan_emit_stack_protection (rtx base, rt
>  }
>str_cst = asan_pp_string (_pp);
>  
> +  gcc_checking_assert (offsets[0] == (crtl->stack_protect_guard
> +   ? -ASAN_RED_ZONE_SIZE : 0));
>/* Emit the prologue sequence.  */
>if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase
>&& param_asan_use_after_return)
>  {
> +  HOST_WIDE_INT adjusted_frame_size = asan_frame_size;
> +  /* The stack protector guard is allocated at the top of the frame
> +  and cfgexpand.cc then uses align_frame_offset (ASAN_RED_ZONE_SIZE);
> +  while in that case we can still use asan_frame_size, we need to take
> +  that into account when computing base_align_bias.  */
> +  if (alignb > ASAN_RED_ZONE_SIZE && crtl->stack_protect_guard)
> + adjusted_frame_size += ASAN_RED_ZONE_SIZE;
>use_after_return_class = floor_log2 (asan_frame_size - 1) - 5;
>/* __asan_stack_malloc_N guarantees alignment
>N < 6 ? (64 << N) : 4096 bytes.  */
>if (alignb > (use_after_return_class < 6
>   ? (64U << use_after_return_class) : 4096U))
>   use_after_return_class = -1;
> -  else if (alignb > ASAN_RED_ZONE_SIZE && (asan_frame_size & (alignb - 
> 1)))
> - base_align_bias = ((asan_frame_size + alignb - 1)
> -& ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
> +  else if (alignb > ASAN_RED_ZONE_SIZE
> +&& (adjusted_frame_size & (alignb - 1)))
> + {
> +   base_align_bias
> + = ((adjusted_frame_size + alignb - 1)
> +& ~(alignb - HOST_WIDE_INT_1)) - adjusted_frame_size;
> +   use_after_return_class
> + = floor_log2 (asan_frame_size + base_align_bias - 1) - 5;
> +   if (use_after_return_class > 10)
> + {
> +   base_align_bias = 0;
> +   use_after_return_class = -1;
> + }
> + }
>  }
>  
>/* Align base if target is STRICT_ALIGNMENT.  */
> --- gcc/testsuite/gcc.dg/asan/pr110027.c.jj   2024-04-10 12:01:19.939768472 
> +0200
> +++ gcc/testsuite/gcc.dg/asan/pr110027.c  2024-04-10 12:11:52.728229147 
> +0200
> @@ -0,0 +1,50 @@
> +/* PR middle-end/110027 */
> +/* { dg-do run } */
> +/* { dg-additional-options "-fstack-protector-strong" { target 
> fstack_protector } } */
> +/* { dg-set-target-env-var ASAN_OPTIONS "detect_stack_use_after_return=1" } 
> */
> +
> +struct __attribute__((aligned (128))) S { char s[128]; };
> +struct __attribute__((aligned (64))) T { char s[192]; };
> +struct __attribute__((aligned (32))) U { char s[256]; };
> +struct __attribute__((aligned (64))) V { char s[320]; };
> +struct __attribute__((aligned (128))) W { char s[512]; };
> +
> +__attribute__((noipa)) void
> +foo (void *p, void *q, void *r, void *s)
> +{
> +  if (((__UINTPTR_TYPE__) p & 31) != 0
> +  || ((__UINTPTR_TYPE__) q & 127) != 0
> +  || ((__UINTPTR_TYPE__) r & 63) != 0)
> +__builtin_abort ();
> +  (void *) s;
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  struct U u;
> +  struct S s;
> +  struct T t;
> +  char p[4];
> +  foo (&u, &s, &t, &p);
> +  return 42;
> +}
> +
> +__attribute__((noipa)) int
> +baz (void)
> +{
> +  struct W w;
> +  struct U u;
> +  struct V v;
> +  char p[4];
> +  foo (&u, &w, &v, &p);
> +  return 42;
> +}
> +
> +int
> +main ()
> +{
> +  bar ();
> +  baz ();
> +  return 0;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] s390: testsuite: Fix loop-interchange-16.c

2024-04-11 Thread Richard Biener
On Thu, Apr 11, 2024 at 9:02 AM Stefan Schulze Frielinghaus
 wrote:
>
> Revert parameter max-completely-peel-times to 16; otherwise, the
> innermost loop is removed and we are left with no loop interchange,
> which is what this test is all about.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/loop-interchange-16.c: Revert parameter
> max-completely-peel-times for s390.
> ---
>  Ok for mainline?

Can you check whether placing

#pragma GCC unroll 0

before the innermost loop works as well?  That'd be more to the point.
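Roughly like this — a sketch of the intended placement only, with a made-up loop nest rather than the actual loop-interchange-16.c kernel:

```c
#include <assert.h>

#define LEN 32
static double aa[LEN][LEN], bb[LEN][LEN];

void
compute (void)
{
  for (int i = 0; i < LEN; i++)
    {
      /* Ask GCC not to unroll (and thus not completely peel) the
         innermost loop, so the loop nest survives for the
         interchange pass to transform.  */
#pragma GCC unroll 0
      for (int j = 0; j < LEN; j++)
        aa[j][i] = aa[j][i] + bb[j][i];
    }
}
```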

OK if that works.

thanks,
Richard.

>  gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
> index 781555e085d..2530ec84bc0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-16.c
> @@ -1,6 +1,7 @@
>  /* PR/101280 */
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fdump-tree-linterchange-details" } */
> +/* { dg-additional-options "--param max-completely-peel-times=16" { target 
> s390*-*-* } } */
>
>  void dummy (double *, double *);
>  #define LEN_2D 32
> --
> 2.43.0
>


Re: Combine patch ping

2024-04-11 Thread Richard Biener
On Wed, 10 Apr 2024, Uros Bizjak wrote:

> On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool
>  wrote:
> >
> > On Sun, Apr 07, 2024 at 08:31:38AM +0200, Uros Bizjak wrote:
> > > If there are no further comments, I plan to commit the referred patch
> > > to the mainline on Wednesday. The latest version can be considered an
> > > obvious patch that solves certain oversight in the original
> > > implementation.
> >
> > This is never okay.  You cannot commit a patch without approval, *ever*.
> >
> > That patch is also obvious -- obviously *wrong*, that is.  There are
> > big assumptions everywhere in the compiler how a CC reg can be used.
> > This violates that, as explained elsewhere.
> 
> Can you please elaborate on what is wrong with this concrete patch?

Better show a correct patch.  The interchanges in the last months
have not been constructive at all.

Richard.

Re: [PATCH] c++/114409 - ANNOTATE_EXPR and templates

2024-04-10 Thread Richard Biener
On Wed, 10 Apr 2024, Jakub Jelinek wrote:

> On Wed, Apr 10, 2024 at 06:43:02PM +0200, Richard Biener wrote:
> > The following fixes a mismatch in COMPOUND_EXPR handling in
> > tsubst_expr vs tsubst_stmt where the latter allows a stmt in
> > operand zero but the former doesn't.  This makes a difference
> > for the case at hand because when the COMPOUND_EXPR is wrapped
> > inside an ANNOTATE_EXPR it gets handled by tsubst_expr and when
> > not, tsubst_stmt successfully handles it and the contained
> > DECL_EXPR in operand zero.
> > 
> > The following makes handling of COMPOUND_EXPR in tsubst_expr
> > consistent with that of tsubst_stmt for the operand that doesn't
> > specify the result and is thus the reason we choose one or the
> > other for substing.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR c++/114409
> > gcc/cp/
> > * pt.cc (tsubst_expr): Recurse to COMPOUND_EXPR operand
> > zero using tsubst_stmt, when that returns NULL return
> > the subst operand one, mimicking what tsubst_stmt does.
> > 
> > gcc/testsuite/
> > * g++.dg/pr114409.C: New testcase.
> 
> I've posted https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114409#c16
> for this already and Jason agreed to that version, so I just have to test it
> tonight:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649165.html

Ah, I saw the bugzilla patches and wanted this version to be sent
because I think the COMPOUND_EXPR inconsistency is odd.  So Jason,
please still have a look, not necessarily because of the bug
which can be fixed in multiple ways but because of that COMPOUND_EXPR
handling oddity (there are already some cases in tsubst_expr that
explicitly recurse with tsubst_stmt).

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] c++/114409 - ANNOTATE_EXPR and templates

2024-04-10 Thread Richard Biener
The following fixes a mismatch in COMPOUND_EXPR handling in
tsubst_expr vs tsubst_stmt where the latter allows a stmt in
operand zero but the former doesn't.  This makes a difference
for the case at hand because when the COMPOUND_EXPR is wrapped
inside an ANNOTATE_EXPR it gets handled by tsubst_expr and when
not, tsubst_stmt successfully handles it and the contained
DECL_EXPR in operand zero.

The following makes handling of COMPOUND_EXPR in tsubst_expr
consistent with that of tsubst_stmt for the operand that doesn't
specify the result and is thus the reason we choose one or the
other for substing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR c++/114409
gcc/cp/
* pt.cc (tsubst_expr): Recurse to COMPOUND_EXPR operand
zero using tsubst_stmt, when that returns NULL return
the subst operand one, mimicking what tsubst_stmt does.

gcc/testsuite/
* g++.dg/pr114409.C: New testcase.
---
 gcc/cp/pt.cc| 5 -
 gcc/testsuite/g++.dg/pr114409.C | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/pr114409.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index bf4b89d8413..dae423a751f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20635,8 +20635,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 
 case COMPOUND_EXPR:
   {
-   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
+   tree op0 = tsubst_stmt (TREE_OPERAND (t, 0), args,
complain & ~tf_decltype, in_decl);
+   if (op0 == NULL_TREE)
+ /* If the first operand was a statement, we're done with it.  */
+ RETURN (RECUR (TREE_OPERAND (t, 1)));
RETURN (build_x_compound_expr (EXPR_LOCATION (t),
   op0,
   RECUR (TREE_OPERAND (t, 1)),
diff --git a/gcc/testsuite/g++.dg/pr114409.C b/gcc/testsuite/g++.dg/pr114409.C
new file mode 100644
index 000..6343fe8d9f3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr114409.C
@@ -0,0 +1,8 @@
+// { dg-do compile }
+
> +template <int> int t() {
+#pragma GCC unroll 4
+while (int ThisEntry = 0) { } // { dg-bogus "ignoring loop annotation" "" 
{ xfail *-*-* } }
+return 0;
+}
+int tt = t<1>();
-- 
2.35.3


[PATCH] tree-optimization/114672 - WIDEN_MULT_PLUS_EXPR type mismatch

2024-04-10 Thread Richard Biener
The following makes sure to restrict WIDEN_MULT*_EXPR to a mode
precision final compute type as the mode is used to find the optab
and type checking chokes when seeing bit-precisions later which
would likely also not properly expanded to RTL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114672
* tree-ssa-math-opts.cc (convert_plusminus_to_widen): Only
allow mode-precision results.

* gcc.dg/torture/pr114672.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr114672.c | 14 ++
 gcc/tree-ssa-math-opts.cc   |  5 +++--
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114672.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr114672.c 
b/gcc/testsuite/gcc.dg/torture/pr114672.c
new file mode 100644
index 000..b69511fe8db
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114672.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+struct {
+  __INT64_TYPE__ m : 60;
+} s;
+
+short a;
+short b;
+
+void
+foo ()
+{
+  s.m += a * b;
+}
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index a8d25c2de48..705f4a4695a 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -2918,8 +2918,9 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, 
gimple *stmt,
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
-  if (TREE_CODE (type) != INTEGER_TYPE
-  && TREE_CODE (type) != FIXED_POINT_TYPE)
+  if ((TREE_CODE (type) != INTEGER_TYPE
+   && TREE_CODE (type) != FIXED_POINT_TYPE)
+  || !type_has_mode_precision_p (type))
 return false;
 
   if (code == MINUS_EXPR)
-- 
2.35.3


Re: [PATCH] testsuite: Adjust pr113359-2_*.c with unsigned long long [PR114662]

2024-04-10 Thread Richard Biener
On Wed, Apr 10, 2024 at 8:24 AM Kewen.Lin  wrote:
>
> Hi,
>
> pr113359-2_*.c define a struct with unsigned long members
> ay and az, which are 4 bytes in size at -m32, while the
> related constants CL1 and CL2 used for the equality check
> are always 8 bytes.  This makes the compiler consider the below
>
>   69   if (a.ay != CL1)
>   70 __builtin_abort ();
>
> always to abort and to optimize away the following call to
> getb, which causes the expected wpa dump of
> "Semantic equality" to go missing.
>
> This patch modifies the types to unsigned long long
> accordingly.  Tested well on powerpc64-linux-gnu.
>
> Is it ok for trunk?

OK
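As a sanity check of the reasoning (a hedged sketch, not the testcase itself — the constant below is illustrative, not CL1/CL2): unsigned long may be only 32 bits wide on an ILP32 target, while unsigned long long is guaranteed at least 64 bits, so a 64-bit constant always round-trips through the latter:

```c
#include <assert.h>
#include <limits.h>

/* unsigned long long is required to be at least 64 bits, so a
   constant wider than 32 bits always survives assignment to it.  On
   an ILP32 target, unsigned long is only 32 bits and would truncate
   it, which is why an equality check against the full 8-byte
   constant could be folded to "always unequal".  */
unsigned long long
roundtrip (unsigned long long v)
{
  return v;
}
```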

> BR,
> Kewen
> -
> PR testsuite/114662
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/lto/pr113359-2_0.c: Use unsigned long long instead of
> unsigned long.
> * gcc.dg/lto/pr113359-2_1.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/lto/pr113359-2_0.c | 8 
>  gcc/testsuite/gcc.dg/lto/pr113359-2_1.c | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/lto/pr113359-2_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr113359-2_0.c
> index 8b2d5bdfab2..8495667599d 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr113359-2_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr113359-2_0.c
> @@ -8,15 +8,15 @@
>  struct SA
>  {
>unsigned int ax;
> -  unsigned long ay;
> -  unsigned long az;
> +  unsigned long long ay;
> +  unsigned long long az;
>  };
>
>  struct SB
>  {
>unsigned int bx;
> -  unsigned long by;
> -  unsigned long bz;
> +  unsigned long long by;
> +  unsigned long long bz;
>  };
>
>  struct ZA
> diff --git a/gcc/testsuite/gcc.dg/lto/pr113359-2_1.c 
> b/gcc/testsuite/gcc.dg/lto/pr113359-2_1.c
> index 61bc0547981..8320f347efe 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr113359-2_1.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr113359-2_1.c
> @@ -5,15 +5,15 @@
>  struct SA
>  {
>unsigned int ax;
> -  unsigned long ay;
> -  unsigned long az;
> +  unsigned long long ay;
> +  unsigned long long az;
>  };
>
>  struct SB
>  {
>unsigned int bx;
> -  unsigned long by;
> -  unsigned long bz;
> +  unsigned long long by;
> +  unsigned long long bz;
>  };
>
>  struct ZA
> --
> 2.43.0


[PATCH] Revert "combine: Don't combine if I2 does not change"

2024-04-09 Thread Richard Biener
This reverts commit 839bc42772ba7af66af3bd16efed4a69511312ae.

I have now pushed the temporary reversion of this to resolve the
P1 regressions this caused.  I'll re-install it on trunk once 14.1
is released (which might be a week or two after stage1 opens).

Richard.

---
 gcc/combine.cc | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 71c9abc145c..92b8d98e6c1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4196,17 +4196,6 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   adjust_for_new_dest (i3);
 }
 
-  /* If I2 didn't change, this is not a combination (but a simplification or
- canonicalisation with context), which should not be done here.  Doing
- it here explodes the algorithm.  Don't.  */
-  if (rtx_equal_p (newi2pat, PATTERN (i2)))
-{
-  if (dump_file)
-   fprintf (dump_file, "i2 didn't change, not doing this\n");
-  undo_all ();
-  return 0;
-}
-
   /* We now know that we can do this combination.  Merge the insns and
  update the status of registers and LOG_LINKS.  */
 
-- 
2.35.3


Re: [PATCH] lto/114655 - -flto=4 at link time doesn't override -flto=auto at compile time

2024-04-09 Thread Richard Biener
On Tue, 9 Apr 2024, Jan Hubicka wrote:

> > The following adjusts -flto option processing in lto-wrapper to have
> > link-time -flto override any compile time setting.
> > 
> > LTO-bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > OK for trunk and branches?  GCC 11 seems to be unaffected by this.
> > 
> > Thanks,
> > Richard.
> > 
> > PR lto/114655
> > * lto-wrapper.cc (merge_flto_options): Add force argument.
> > (merge_and_complain): Do not force here.
> > (run_gcc): But here to make the link-time -flto option override
> > any compile-time one.
> Looks good to me.  I am actually surprised we propagate -flto settings
> from compile time at all.  I guess I never tried it since I never
> assumed it to work :)

We do magic now ;)  I think this was done because while people manage
to use CFLAGS=-flto, they eventually fail to adjust LDFLAGS, and without
plugin auto-loading you won't get LTO, and in particular not -flto=auto.

I checked that it now works as expected - fortunately -v now displays
the make invocation command, so it was easy to verify.
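The force semantics boil down to "link-time wins" — a reduced model of the logic, not the lto-wrapper.cc code itself (names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Reduced model of merge_flto_options: without FORCE, the first
   setting seen (from the IL files) is kept; with FORCE, the incoming
   link-time setting overrides any previous one.  */
const char *
merge_flto (const char *existing, const char *incoming, int force)
{
  if (existing == 0)
    return incoming;            /* no previous -flto= seen */
  return force ? incoming : existing;
}
```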

Richard.


[PATCH] lto/114655 - -flto=4 at link time doesn't override -flto=auto at compile time

2024-04-09 Thread Richard Biener
The following adjusts -flto option processing in lto-wrapper to have
link-time -flto override any compile time setting.

LTO-bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk and branches?  GCC 11 seems to be unaffected by this.

Thanks,
Richard.

PR lto/114655
* lto-wrapper.cc (merge_flto_options): Add force argument.
(merge_and_complain): Do not force here.
(run_gcc): But here to make the link-time -flto option override
any compile-time one.
---
 gcc/lto-wrapper.cc | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 610594cdc2b..02579951569 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
> @@ -218,15 +218,18 @@ find_option (vec<cl_decoded_option> &options,
> cl_decoded_option *option)
   return find_option (options, option->opt_index);
 }
 
-/* Merge -flto FOPTION into vector of DECODED_OPTIONS.  */
+/* Merge -flto FOPTION into vector of DECODED_OPTIONS.  If FORCE is true
+   then FOPTION overrides previous settings.  */
 
 static void
>  merge_flto_options (vec<cl_decoded_option> &decoded_options,
-   cl_decoded_option *foption)
+   cl_decoded_option *foption, bool force)
 {
   int existing_opt = find_option (decoded_options, foption);
   if (existing_opt == -1)
 decoded_options.safe_push (*foption);
+  else if (force)
+decoded_options[existing_opt].arg = foption->arg;
   else
 {
   if (strcmp (foption->arg, decoded_options[existing_opt].arg) != 0)
> @@ -493,7 +496,7 @@ merge_and_complain (vec<cl_decoded_option> &decoded_options,
  break;
 
case OPT_flto_:
- merge_flto_options (decoded_options, foption);
+ merge_flto_options (decoded_options, foption, false);
  break;
}
 }
@@ -1550,8 +1553,8 @@ run_gcc (unsigned argc, char *argv[])
  break;
 
case OPT_flto_:
- /* Merge linker -flto= option with what we have in IL files.  */
- merge_flto_options (fdecoded_options, option);
+ /* Override IL file settings with a linker -flto= option.  */
+ merge_flto_options (fdecoded_options, option, true);
  if (strcmp (option->arg, "jobserver") == 0)
jobserver_requested = true;
  break;
-- 
2.35.3


[PATCH] Remove live-info global bitmap

2024-04-09 Thread Richard Biener
The following removes the unused tree_live_info_d->global bitmap.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

Richard.

* tree-ssa-live.h (tree_live_info_d::global): Remove.
(partition_is_global): Likewise.
(make_live_on_entry): Do not set bit in global.
* tree-ssa-live.cc (new_tree_live_info): Do not allocate
global bitmap.
(delete_tree_live_info): Do not release it.
(set_var_live_on_entry): Do not set bits in it.
---
 gcc/tree-ssa-live.cc | 13 +
 gcc/tree-ssa-live.h  | 13 -
 2 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index d94e94eb3bc..fa6be2fced3 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -1015,7 +1015,6 @@ new_tree_live_info (var_map map)
   live->work_stack = XNEWVEC (int, last_basic_block_for_fn (cfun));
   live->stack_top = live->work_stack;
 
-  live->global = BITMAP_ALLOC (NULL);
   return live;
 }
 
@@ -1035,7 +1034,6 @@ delete_tree_live_info (tree_live_info_p live)
>    bitmap_obstack_release (&live->liveout_obstack);
   free (live->liveout);
 }
-  BITMAP_FREE (live->global);
   free (live->work_stack);
   free (live);
 }
@@ -1123,7 +1121,6 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p 
live)
   use_operand_p use;
   basic_block def_bb = NULL;
   imm_use_iterator imm_iter;
-  bool global = false;
 
   p = var_to_partition (live->map, ssa_name);
   if (p == NO_PARTITION)
@@ -1173,16 +1170,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p 
live)
 
   /* If there was a live on entry use, set the bit.  */
   if (add_block)
-{
- global = true;
> - bitmap_set_bit (&live->livein[add_block->index], p);
-   }
> +   bitmap_set_bit (&live->livein[add_block->index], p);
 }
-
-  /* If SSA_NAME is live on entry to at least one block, fill in all the live
- on entry blocks between the def and all the uses.  */
-  if (global)
-bitmap_set_bit (live->global, p);
 }
 
 
diff --git a/gcc/tree-ssa-live.h b/gcc/tree-ssa-live.h
index e86ce0c1768..ac39091f5d2 100644
--- a/gcc/tree-ssa-live.h
+++ b/gcc/tree-ssa-live.h
@@ -237,9 +237,6 @@ typedef struct tree_live_info_d
   /* Var map this relates to.  */
   var_map map;
 
-  /* Bitmap indicating which partitions are global.  */
-  bitmap global;
-
   /* Bitmaps of live on entry blocks for partition elements.  */
   bitmap_head *livein;
 
> @@ -276,15 +273,6 @@ extern bitmap live_vars_at_stmt (vec<bitmap_head> &,
live_vars_map *,
 gimple *);
> extern void destroy_live_vars (vec<bitmap_head> &);
 
-/*  Return TRUE if P is marked as a global in LIVE.  */
-
-inline int
-partition_is_global (tree_live_info_p live, int p)
-{
-  gcc_checking_assert (live->global);
-  return bitmap_bit_p (live->global, p);
-}
-
 
 /* Return the bitmap from LIVE representing the live on entry blocks for
partition P.  */
@@ -329,7 +317,6 @@ inline void
 make_live_on_entry (tree_live_info_p live, basic_block bb , int p)
 {
>    bitmap_set_bit (&live->livein[bb->index], p);
-  bitmap_set_bit (live->global, p);
 }
 
 
-- 
2.35.3


Re: [PATCH] Guard function->cond_uids access [PR114601]

2024-04-09 Thread Richard Biener
On Tue, 9 Apr 2024, Jørgen Kvalsvik wrote:

> PR114601 shows that it is possible to reach the condition_uid lookup
> without fn->cond_uids having been created, through
> compiler-generated conditionals.  Treat all lookups on non-existing
> maps as misses, which they are from the perspective of the source
> code, to avoid the NULL access.

OK.
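The shape of the fix, reduced to a standalone sketch — the struct below is a stand-in for fn->cond_uids, not GCC's hash_map, and the names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-entry stand-in for the cond->uid map.  */
struct cond_map
{
  const void *key;
  unsigned uid;
};

/* A lookup that treats a missing map as a guaranteed miss (returning
   the reserved value 0) instead of dereferencing a null pointer.  */
unsigned
lookup_cond_uid (const struct cond_map *map, const void *stmt)
{
  if (map == NULL || map->key != stmt)   /* missing map => miss */
    return 0;
  return map->uid;
}
```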

>   PR gcov-profile/114601
> 
> gcc/ChangeLog:
> 
>   * tree-profile.cc (condition_uid): Guard fn->cond_uids access.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.misc-tests/gcov-pr114601.c: New test.
> ---
>  gcc/testsuite/gcc.misc-tests/gcov-pr114601.c | 11 +++
>  gcc/tree-profile.cc  |  9 +++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114601.c
> 
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-pr114601.c 
> b/gcc/testsuite/gcc.misc-tests/gcov-pr114601.c
> new file mode 100644
> index 000..72248c8fd25
> --- /dev/null
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-pr114601.c
> @@ -0,0 +1,11 @@
> +/* PR gcov-profile/114601 */
> +/* { dg-do compile } */
> +/* { dg-options "-fcondition-coverage -finstrument-functions-once" } */
> +
> +/* -finstrument-functions-once inserts a hidden conditional expression into
> +   this function which otherwise has none.  This caused a crash on looking up
> +   the condition as the cond->expr map is not created unless it is necessary.  
> */
> +void
> +empty (void)
> +{
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index b85111624fe..b87c121790c 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -359,12 +359,17 @@ condition_index (unsigned flag)
> min-max, etc., which leaves ghost identifiers in basic blocks that do not
> end with a conditional jump.  They are not really meaningful for condition
> coverage anymore, but since coverage is unreliable under optimization 
> anyway
> -   this is not a big problem.  */
> +   this is not a big problem.
> +
> +   The cond_uids map in FN cannot be expected to exist.  It will only be
> +   created if it is needed, and a function may have gconds even though there
> +   are none in source.  This can be seen in PR gcov-profile/114601, when
> +   -finstrument-functions-once is used and the function has no conditions.  
> */
>  unsigned
>  condition_uid (struct function *fn, basic_block b)
>  {
>  gimple *stmt = gsi_stmt (gsi_last_bb (b));
> -if (!safe_is_a <gcond *> (stmt))
> +if (!safe_is_a <gcond *> (stmt) || !fn->cond_uids)
>   return 0;
>  
> unsigned *v = fn->cond_uids->get (as_a <gcond *> (stmt));
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH/RFC] On the use of -funreachable-traps to deal with PR 109627

2024-04-09 Thread Richard Biener
On Tue, Apr 9, 2024 at 9:11 AM Jakub Jelinek  wrote:
>
> On Tue, Apr 09, 2024 at 09:03:59AM +0200, Richard Biener wrote:
> > > With the possibility of sounding like a broken record, I think
> > > __builtin_unreachable is fundamentally flawed.   It generates no code
> > > and just lets the program continue if ever "reached".  This is a
> > > security risk and (IMHO) just plain silly.  We're in a situation that is
> > > never supposed to happen, so continuing to execute code is just asking
> > > for problems.
> > >
> > > If it were up to me, I'd have __builtin_unreachable emit a trap or
> > > similar construct that should (in general) halt execution.
> >
> > __builtin_unreachable tells the compiler it's OK to omit a path to it
> > while __builtin_trap doesn't.  So once we replace the former with the
> > latter we have to keep the path.  Maybe that's OK.  I do agree that
> > the RTL representation of expanding __builtin_unreachable () to
> > "nothing" is bad.  Expanding to a trap always would be OK with me.
>
> Even that would prevent tons of needed optimizations, especially the
> reason why __builtin_unreachable () has been added in the first place
> - for asm goto which always branches and so the kernel can put
> __builtin_unreachable () after it to say that it won't fall through.
> I think the kernel folks would be upset if we change that.
>
> So, can't we instead just emit a trap when in the last cfglayout -> cfgrtl
> switch we see that the last bb in the function doesn't have any successors?

That's probably a good middle-ground if we can identify that "last" switch
easily (why not do it at each such switch?)
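For reference, the semantic difference under discussion, as a sketch — what GCC actually emits on the broken-contract path is exactly what is being debated here, so the comments describe the language-level contract, not a guaranteed codegen:

```c
/* __builtin_unreachable lets the compiler assume the point is never
   reached, so it may delete the path entirely and simply fall
   through; __builtin_trap keeps a path that reliably halts.  */
int
classify_unreachable (int x)
{
  /* Caller promises 0 <= x < 4; a broken promise is undefined
     behavior and execution may continue arbitrarily.  */
  if (x < 0 || x >= 4)
    __builtin_unreachable ();
  return x * 2;
}

int
classify_trap (int x)
{
  /* Same contract, but a broken promise halts the program.  */
  if (x < 0 || x >= 4)
    __builtin_trap ();
  return x * 2;
}
```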

Richard.

> Jakub
>


Re: [PATCH] build: Check for cargo when building rust language

2024-04-09 Thread Richard Biener
On Mon, Apr 8, 2024 at 6:39 PM  wrote:
>
> From: Pierre-Emmanuel Patry 
>
> Hello,
>
> The rust frontend requires cargo to build some of its components,
> but its presence was not checked during configuration.

OK.

Please work on documenting build requirements for rust in doc/install.texi;
look for where Ada build requirements are documented.

Richard.

> Best regards,
> Pierre-Emmanuel
>
> --
>
> Prevent rust language from building when cargo is
> missing.
>
> config/ChangeLog:
>
> * acx.m4: Add a macro to check for rust
> components.
>
> ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Emit an error message when cargo
> is missing.
>
> Signed-off-by: Pierre-Emmanuel Patry 
> ---
>  config/acx.m4 |  11 +
>  configure | 117 ++
>  configure.ac  |  18 
>  3 files changed, 146 insertions(+)
>
> diff --git a/config/acx.m4 b/config/acx.m4
> index 7efe98aaf96..3c5fe67342e 100644
> --- a/config/acx.m4
> +++ b/config/acx.m4
> @@ -424,6 +424,17 @@ else
>  fi
>  ])
>
> +# Test for Rust
> +# We require cargo and rustc for some parts of the rust compiler.
> +AC_DEFUN([ACX_PROG_CARGO],
> +[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> +AC_CHECK_TOOL(CARGO, cargo, no)
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi])
> +
>  # Test for D.
>  AC_DEFUN([ACX_PROG_GDC],
>  [AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> diff --git a/configure b/configure
> index 874966fb9f0..46e66e20197 100755
> --- a/configure
> +++ b/configure
> @@ -714,6 +714,7 @@ PGO_BUILD_GEN_CFLAGS
>  HAVE_CXX11_FOR_BUILD
>  HAVE_CXX11
>  do_compare
> +CARGO
>  GDC
>  GNATMAKE
>  GNATBIND
> @@ -5786,6 +5787,104 @@ else
>have_gdc=no
>  fi
>
> +
> +if test -n "$ac_tool_prefix"; then
> +  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a 
> program name with args.
> +set dummy ${ac_tool_prefix}cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$CARGO"; then
> +  ac_cv_prog_CARGO="$CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +CARGO=$ac_cv_prog_CARGO
> +if test -n "$CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CARGO" >&5
> +$as_echo "$CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +
> +fi
> +if test -z "$ac_cv_prog_CARGO"; then
> +  ac_ct_CARGO=$CARGO
> +  # Extract the first word of "cargo", so it can be a program name with args.
> +set dummy cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$ac_ct_CARGO"; then
> +  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_ac_ct_CARGO="cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
> +if test -n "$ac_ct_CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
> +$as_echo "$ac_ct_CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +  if test "x$ac_ct_CARGO" = x; then
> +CARGO="no"
> +  else
> +case $cross_compiling:$ac_tool_warned in
> +yes:)
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not 
> prefixed with host triplet" >&5
> +$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" 
> >&2;}
> +ac_tool_warned=yes ;;
> +esac
> +CARGO=$ac_ct_CARGO
> +  fi
> +else
> +  CARGO="$ac_cv_prog_CARGO"
> +fi
> +
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to compare 
> bootstrapped objects" >&5
>  $as_echo_n "checking how to compare bootstrapped objects... " >&6; }
>  if ${gcc_cv_prog_cmp_skip+:} false; then :
> @@ -9099,6 +9198,24 

Re: [PATCH 2/2] Generate constant at start of loop, without UB

2024-04-09 Thread Richard Biener
On Mon, 8 Apr 2024, Jørgen Kvalsvik wrote:

> Generating the constants used for recording the edges taken for
> condition coverage would trigger undefined behavior when an expression
> had exactly 64 (== sizeof (1ULL)) conditions, as it would generate the
> constant for the next iteration at the end of the loop body, even if there
> was never a next iteration. By moving the check and constant generation
> to the top of the loop and hoisting the increment flag there is no
> opportunity for UB.

OK.

Thanks,
Richard.
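The hazard and the fix can be sketched outside the compiler — this is a model of the loop shape only (illustrative names), not the tree-profile.cc code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Evaluating 1ULL << 64 is undefined behavior.  Generating the
   "next" mask at the bottom of the loop body does exactly that when
   there are exactly 64 conditions; generating it at the top, guarded
   by a flag recording whether the previous iteration consumed a bit,
   never shifts past the width.  */
uint64_t
mask_of_last_condition (unsigned nconds)
{
  unsigned xi = 0;
  int increment = 0;
  uint64_t mask = 1ULL << xi;
  for (unsigned c = 0; c < nconds; c++)
    {
      if (increment)
        {
          xi += 1;
          assert (xi < 64);      /* the shift stays in range */
          mask = 1ULL << xi;
          increment = 0;
        }
      increment = 1;             /* this iteration consumed a bit */
    }
  return mask;
}
```

With 64 conditions the last mask computed is 1ULL << 63 and the shift by 64 is never attempted.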

>   PR 114627
> 
> gcc/ChangeLog:
> 
>   * tree-profile.cc (instrument_decisions): Generate constant
>   at the start of loop.
> ---
>  gcc/tree-profile.cc | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 33ff550a7bc..e58f5c83472 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -1049,6 +1049,7 @@ instrument_decisions (array_slice<basic_block> expr, 
> size_t condno,
>  zerocounter[2] = zero;
>  
>  unsigned xi = 0;
> +bool increment = false;
>  tree rhs = build_int_cst (gcov_type_node, 1ULL << xi);
>  for (basic_block current : expr)
>  {
> @@ -1057,7 +1058,14 @@ instrument_decisions (array_slice<basic_block> expr, 
> size_t condno,
>   candidates.safe_push (zerocounter);
>   counters prev = resolve_counters (candidates);
>  
> - int increment = 0;
> + if (increment)
> + {
> + xi += 1;
> + gcc_checking_assert (xi < sizeof (uint64_t) * BITS_PER_UNIT);
> + rhs = build_int_cst (gcov_type_node, 1ULL << xi);
> + increment = false;
> + }
> +
>   for (edge e : current->succs)
>   {
>   counters next = prev;
> @@ -1072,7 +1080,7 @@ instrument_decisions (array_slice<basic_block> expr, 
> size_t condno,
>   tree m = build_int_cst (gcov_type_node, masks[2*xi + k]);
>   next[2] = emit_bitwise_op (e, prev[2], BIT_IOR_EXPR, m);
>   }
> - increment = 1;
> + increment = true;
>   }
>   else if (e->flags & EDGE_COMPLEX)
>   {
> @@ -1085,11 +1093,13 @@ instrument_decisions (array_slice<basic_block> expr, 
> size_t condno,
>   }
>   table.get_or_insert (e->dest).safe_push (next);
>   }
> - xi += increment;
> - if (increment)
> - rhs = build_int_cst (gcov_type_node, 1ULL << xi);
>  }
>  
> +/* Since this is also the return value, the number of conditions, make 
> sure
> +   to include the increment of the last basic block.  */
> +if (increment)
> + xi += 1;
> +
>  gcc_assert (xi == bitmap_count_bits (core));
>  
>  const tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH 1/2] Add tree-inlined gconds to caller cond->expr map

2024-04-09 Thread Richard Biener
(struct function *callee, struct 
> function *caller,
> }
>   add_local_decl (caller, new_var);
>}
> -
> -  /* If -fcondition-coverage is used and the caller has conditions, copy the
> - mapping into the caller but and the end so the caller and callee
> - expressions aren't mixed.  */
> -  if (callee->cond_uids)
> -{
> -  if (!caller->cond_uids)
> - caller->cond_uids = new hash_map <gcond*, unsigned> ();
> -
> -  unsigned dst_max_uid = 0;
> -  for (auto itr : *callee->cond_uids)
> - if (itr.second >= dst_max_uid)
> -   dst_max_uid = itr.second + 1;
> -
> -  for (auto itr : *callee->cond_uids)
> - caller->cond_uids->put (itr.first, itr.second + dst_max_uid);
> -}
>  }
>  
>  /* Add to BINDINGS a debug stmt resetting SRCVAR if inlining might
> 


Re: [PATCH] Fix up duplicated words mostly in comments, part 2

2024-04-09 Thread Richard Biener
c_on_device built-in redeclaration,
> the built-in is declared with int rather than enum because
> the enum isn't intrinsic.  */
>  && !(TREE_CODE (olddecl) == FUNCTION_DECL
> --- gcc/tree-ssa-sccvn.cc.jj  2024-03-07 10:01:05.042180535 +0100
> +++ gcc/tree-ssa-sccvn.cc 2024-04-08 17:21:09.216842533 +0200
> @@ -5979,7 +5979,7 @@ visit_phi (gimple *phi, bool *inserted,
>if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
>  return set_ssa_val_to (PHI_RESULT (phi), PHI_RESULT (phi));
>  
> -  /* We track whether a PHI was CSEd to to avoid excessive iterations
> +  /* We track whether a PHI was CSEd to avoid excessive iterations
>   that would be necessary only because the PHI changed arguments
>   but not value.  */
>if (!inserted)
> --- gcc/rtl-ssa/accesses.h.jj 2024-01-24 13:11:21.212467024 +0100
> +++ gcc/rtl-ssa/accesses.h  2024-04-08 17:23:39.681854561 +0200
> @@ -358,7 +358,7 @@ public:
>use_info *next_any_insn_use () const;
>  
>// Return the next use by a debug instruction, or null if none.
> -  // This is only valid if if is_in_debug_insn ().
> +  // This is only valid if is_in_debug_insn ().
>use_info *next_debug_insn_use () const;
>  
>// Return the previous use by a phi node in the list, or null if none.
> --- gcc/jit/docs/topics/expressions.rst.jj  2024-02-02 22:13:29.307363148 +0100
> +++ gcc/jit/docs/topics/expressions.rst   2024-04-08 17:25:30.396391778 +0200
> @@ -238,7 +238,7 @@ Constructor expressions
> The fields in ``fields`` need to be the same objects that were used
> to create the struct.
>  
> -   Each value has to have have the same unqualified type as the field
> +   Each value has to have the same unqualified type as the field
> it is applied to.
>  
> A NULL value element  in ``values`` is a shorthand for zero initialization
> --- gcc/doc/options.texi.jj   2024-01-05 08:35:13.505829960 +0100
> +++ gcc/doc/options.texi  2024-04-08 17:32:31.513829809 +0200
> @@ -422,9 +422,9 @@ The option is the inverse of another opt
>  the options-processing script will declare @code{TARGET_@var{thisname}},
>  @code{TARGET_@var{name}_P} and @code{TARGET_@var{name}_OPTS_P} macros:
>  @code{TARGET_@var{thisname}} is 1 when the option is active and 0 otherwise,
> -@code{TARGET_@var{name}_P} is similar to @code{TARGET_@var{name}} but take an
> -argument as @samp{target_flags}, and and @code{TARGET_@var{name}_OPTS_P} also
> -similar to @code{TARGET_@var{name}} but take an argument as @code{gcc_options}.
> +@code{TARGET_@var{name}_P} is similar to @code{TARGET_@var{name}} but takes an
> +argument as @samp{target_flags}, and @code{TARGET_@var{name}_OPTS_P} is also
> +similar to @code{TARGET_@var{name}} but takes an argument as @code{gcc_options}.
>  
>  @item Enum(@var{name})
>  The option's argument is a string from the set of strings associated
> --- gcc/doc/invoke.texi.jj  2024-04-08 09:44:46.130803133 +0200
> +++ gcc/doc/invoke.texi   2024-04-08 17:33:25.245123475 +0200
> @@ -11709,7 +11709,7 @@ By default the analyzer attempts to reco
>  frames, and to emit events showing the inlined calls.
>  
>  With @option{-fno-analyzer-undo-inlining} this attempt to reconstruct
> -the original frame information can be be disabled, which may be of help
> +the original frame information can be disabled, which may be of help
>  when debugging issues in the analyzer.
>  
>  @item -fanalyzer-verbose-edges
> @@ -25750,7 +25750,7 @@ Outputs pseudo-c assembly dialect.
>  @item -minline-memops-threshold=@var{bytes}
>  Specifies a size threshold in bytes at or below which memmove, memcpy
>  and memset shall always be expanded inline.  Operations dealing with
> -sizes larger than this threshold would have to be be implemented using
> +sizes larger than this threshold would have to be implemented using
>  a library call instead of being expanded inline, but since BPF doesn't
>  allow libcalls, exceeding this threshold results in a compile-time
>  error.  The default is @samp{1024} bytes.
> 
>   Jakub
> 
> 



Re: [PATCH/RFC] On the use of -funreachable-traps to deal with PR 109627

2024-04-09 Thread Richard Biener
On Tue, Apr 9, 2024 at 6:03 AM Jeff Law  wrote:
>
>
>
> On 4/8/24 5:04 PM, Iain Sandoe wrote:
> > Hi
> >
> > PR 109627 is about functions that have had their bodies completely elided, 
> > but still have the wrappers for EH frames (either .cfi_xxx or LFSxx/LFExx).
> >
> > These are causing issues for some linkers because such functions result in 
> > FDEs with a 0 code extent.
> >
> > The simplest representation of this is (from PR109627)
> >
> > void foo () { __builtin_unreachable (); }
> With the possibility of sounding like a broken record, I think
> __builtin_unreachable is fundamentally flawed.   It generates no code
> and just lets the program continue if ever "reached".  This is a
> security risk and (IMHO) just plain silly.  We're in a situation that is
> never supposed to happen, so continuing to execute code is just asking
> for problems.
>
> If it were up to me, I'd have __builtin_unreachable emit a trap or
> similar construct that should (in general) halt execution.

__builtin_unreachable tells the compiler it's OK to omit a path to it
while __builtin_trap doesn't.  So once we replace the former with the
latter we have to keep the path.  Maybe that's OK.  I do agree that
the RTL representation of expanding __builtin_unreachable () to
"nothing" is bad.  Expanding to a trap always would be OK with me.

Richard.

> Jeff
>


Re: [PATCH] rs6000: Fix wrong align passed to build_aligned_type [PR88309]

2024-04-09 Thread Richard Biener
On Tue, Apr 9, 2024 at 4:07 AM Kewen.Lin  wrote:
>
> on 2024/4/8 18:47, Richard Biener wrote:
> > On Mon, Apr 8, 2024 at 11:22 AM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As the comments in PR88309 show, there are two oversights
> >> in rs6000_gimple_fold_builtin that pass align in bytes to
> >> build_aligned_type but which actually requires align in
> >> bits, it causes unexpected ICE or hanging in function
> >> is_miss_rate_acceptable due to zero align_unit value.
> >>
> >> This patch is to fix them by converting bytes to bits, add
> >> an assertion on positive align_unit value and notes function
> >> build_aligned_type requires align measured in bits in its
> >> function comment.
> >>
> >> Bootstrapped and regtested on x86_64-redhat-linux,
> >> powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9 and P10.
> >>
> >> Is it (the generic part code change) ok for trunk?
> >
> > OK
>
> Thanks, pushed as r14-9850, is it also ok to backport after burn-in time?

Sure.

> BR,
> Kewen
>
> >
> >> BR,
> >> Kewen
> >> -
> >> PR target/88309
> >>
> >> Co-authored-by: Andrew Pinski 
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Fix
> >> wrong align passed to function build_aligned_type.
> >> * tree-ssa-loop-prefetch.cc (is_miss_rate_acceptable): Add an
> >> assertion to ensure align_unit should be positive.
> >> * tree.cc (build_qualified_type): Update function comments.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/powerpc/pr88309.c: New test.
> >> ---
> >>  gcc/config/rs6000/rs6000-builtin.cc|  4 ++--
> >>  gcc/testsuite/gcc.target/powerpc/pr88309.c | 27 ++
> >>  gcc/tree-ssa-loop-prefetch.cc  |  2 ++
> >>  gcc/tree.cc|  3 ++-
> >>  4 files changed, 33 insertions(+), 3 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr88309.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> >> b/gcc/config/rs6000/rs6000-builtin.cc
> >> index 6698274031b..e7d6204074c 100644
> >> --- a/gcc/config/rs6000/rs6000-builtin.cc
> >> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> >> @@ -1900,7 +1900,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >> tree lhs_type = TREE_TYPE (lhs);
> >> /* In GIMPLE the type of the MEM_REF specifies the alignment.  The
> >>   required alignment (power) is 4 bytes regardless of data type.  
> >> */
> >> -   tree align_ltype = build_aligned_type (lhs_type, 4);
> >> +   tree align_ltype = build_aligned_type (lhs_type, 32);
> >> /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> >> Create
> >>the tree using the value from arg0.  The resulting type will 
> >> match
> >>the type of arg1.  */
> >> @@ -1944,7 +1944,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> >> tree arg2_type = ptr_type_node;
> >> /* In GIMPLE the type of the MEM_REF specifies the alignment.  The
> >>required alignment (power) is 4 bytes regardless of data type.  
> >> */
> >> -   tree align_stype = build_aligned_type (arg0_type, 4);
> >> +   tree align_stype = build_aligned_type (arg0_type, 32);
> >> /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> >> Create
> >>the tree using the value from arg1.  */
> >> gimple_seq stmts = NULL;
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88309.c 
> >> b/gcc/testsuite/gcc.target/powerpc/pr88309.c
> >> new file mode 100644
> >> index 000..c0078cf2b8c
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr88309.c
> >> @@ -0,0 +1,27 @@
> >> +/* { dg-require-effective-target powerpc_vsx_ok } */
> >> +/* { dg-options "-mvsx -O2 -fprefetch-loop-arrays" } */
> >> +
> >> +/* Verify there is no ICE or hanging.  */
> >> +
> >> +#include 
> >> +
> >> +void b(float *c, vector float a, vector float, vector float)
> >> +{
> >> +  vector float d;
> >> +  vector char ahbc;
> >> +  vec_xst(

Re: [PATCH] bitint: Don't move debug stmts from before returns_twice calls [PR114628]

2024-04-09 Thread Richard Biener
On Tue, 9 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> Debug stmts are allowed by the verifier before the returns_twice calls.

Huh, interesting ;)

> More importantly, they don't have a lhs, so the current handling of
> arg_stmts statements to force them on the edges ICEs.
> 
> The following patch just keeps them where they were before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-04-09  Jakub Jelinek  
> 
>   PR middle-end/114628
>   * gimple-lower-bitint.cc (gimple_lower_bitint): Keep debug stmts
>   before returns_twice calls as is, don't push them into arg_stmts
>   vector/move to edges.
> 
>   * gcc.dg/bitint-105.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-04-04 10:46:52.698026863 +0200
> +++ gcc/gimple-lower-bitint.cc2024-04-08 15:42:19.719892644 +0200
> @@ -7172,8 +7172,13 @@ gimple_lower_bitint (void)
> gimple_stmt_iterator gsi = gsi_after_labels (gimple_bb (stmt));
> while (gsi_stmt (gsi) != stmt)
>   {
> -   arg_stmts.safe_push (gsi_stmt (gsi));
> -   gsi_remove (&gsi, false);
> +   if (is_gimple_debug (gsi_stmt (gsi)))
> + gsi_next (&gsi);
> +   else
> + {
> +   arg_stmts.safe_push (gsi_stmt (gsi));
> +   gsi_remove (&gsi, false);
> + }
>   }
> gimple *g;
> basic_block bb = NULL;
> --- gcc/testsuite/gcc.dg/bitint-105.c.jj  2024-04-08 16:00:07.843630530 +0200
> +++ gcc/testsuite/gcc.dg/bitint-105.c 2024-04-08 15:49:28.687175492 +0200
> @@ -0,0 +1,29 @@
> +/* PR middle-end/114628 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2 -g" } */
> +
> +int foo (int);
> +#if __BITINT_MAXWIDTH__ >= 129
> +__attribute__((returns_twice)) int bar (_BitInt(129) x);
> +
> +void
> +baz (int x, _BitInt(129) y)
> +{
> +  void *q[] = { &&l1, &&l2 };
> +l2:
> +  x = foo (foo (3));
> +  bar (y);
> +  goto *q[x & 1];
> +l1:;
> +}
> +
> +void
> +qux (int x, _BitInt(129) y)
> +{
> +  void *q[] = { &&l1, &&l2 };
> +l2:
> +  x = foo (foo (3));
> +  bar (y);
> +l1:;
> +}
> +#endif
> 
>   Jakub
> 
> 



Re: [PATCH] middle-end/114604 - ranger allocates bitmap without initialized obstack

2024-04-09 Thread Richard Biener
On Tue, 9 Apr 2024, Aldy Hernandez wrote:

> BTW, I'm not opposed to this patch.  Thank you for tracking this down,
> and feel free to commit as is if y'all PMs agree it's OK.  I just
> wanted to know if there's a better way going forward.  I can certainly
> put it on my TODO list once stage1 opens again.
> 
> And no, there probably isn't an obstack for those classes, but I
> wonder if we should have a class local one, as we do for the rest of
> the classes.

OK, I pushed it now, it looks like the GCC 13 branch isn't affected
in obviously the same way (but I didn't try instrumenting there).

Feel free to improve next stage1.

Richard.

> Aldy
> 
> On Mon, Apr 8, 2024 at 7:47 PM Richard Biener
>  wrote:
> >
> >
> >
> > > Am 08.04.2024 um 18:40 schrieb Aldy Hernandez :
> > >
> > > On Mon, Apr 8, 2024 at 6:29 PM Richard Biener  wrote:
> > >>
> > >>
> > >>
> > >>>> Am 08.04.2024 um 18:09 schrieb Aldy Hernandez :
> > >>>
> > >>> On Mon, Apr 8, 2024 at 5:54 PM Jakub Jelinek  wrote:
> > >>>>
> > >>>> On Mon, Apr 08, 2024 at 05:40:23PM +0200, Aldy Hernandez wrote:
> > >>>>>>   PR middle-end/114604
> > >>>>>>   * gimple-range.cc (enable_ranger): Initialize the global
> > >>>>>>   bitmap obstack.
> > >>>>>>   (disable_ranger): Release it.
> > >>>>>> ---
> > >>>>>> gcc/gimple-range.cc | 4 
> > >>>>>> 1 file changed, 4 insertions(+)
> > >>>>>>
> > >>>>>> diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
> > >>>>>> index c16b776c1e3..4d3b1ce8588 100644
> > >>>>>> --- a/gcc/gimple-range.cc
> > >>>>>> +++ b/gcc/gimple-range.cc
> > >>>>>> @@ -689,6 +689,8 @@ enable_ranger (struct function *fun, bool 
> > >>>>>> use_imm_uses)
> > >>>>>> {
> > >>>>>>  gimple_ranger *r;
> > >>>>>>
> > >>>>>> +  bitmap_obstack_initialize (NULL);
> > >>>>>> +
> > >>>>>>  gcc_checking_assert (!fun->x_range_query);
> > >>>>>>  r = new gimple_ranger (use_imm_uses);
> > >>>>>>  fun->x_range_query = r;
> > >>>>>> @@ -705,6 +707,8 @@ disable_ranger (struct function *fun)
> > >>>>>>  gcc_checking_assert (fun->x_range_query);
> > >>>>>>  delete fun->x_range_query;
> > >>>>>>  fun->x_range_query = NULL;
> > >>>>>> +
> > >>>>>> +  bitmap_obstack_release (NULL);
> > >>>>>
> > >>>>> Are you not allowed to initialize/use obstacks unless
> > >>>>> bitmap_obstack_initialize(NULL) is called?
> > >>>>
> > >>>> You can use it with some other obstack, just not the default one.
> > >>>>
> > >>>>> If so, wouldn't it be
> > >>>>> better to lazily initialize it downstream (bitmap_alloc, or whomever
> > >>>>> needs it initialized)?
> > >>>>
> > >>>> No, you still need to decide where is the safe point to release it.
> > >>>> Unlike the non-default 
> > >>>> bitmap_obstack_initialize/bitmap_obstack_release,
> > >>>> the default one can nest (has associated nesting counter).  So, the 
> > >>>> above
> > >>>> patch just says that ranger starts using the default obstack in
> > >>>> enable_ranger and stops using it in disable_ranger and anything ranger
> > >>>> associated in the obstack can be freed at that point.
> > >>>
> > >>> I thought ranger never used the default one:
> > >>>
> > >>> $ grep bitmap_obstack_initialize *value* *range*
> > >>> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>> gimple-range-cache.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>> gimple-range-gori.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>> gimple-range-infer.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>> gimple-range-phi.cc:  bitmap_obstack_initialize (&m_bitmaps);
> > >>>
> > >>> or even

Re: [PATCH] middle-end/114604 - ranger allocates bitmap without initialized obstack

2024-04-08 Thread Richard Biener



> Am 08.04.2024 um 18:40 schrieb Aldy Hernandez :
> 
> On Mon, Apr 8, 2024 at 6:29 PM Richard Biener  wrote:
>> 
>> 
>> 
>>>> Am 08.04.2024 um 18:09 schrieb Aldy Hernandez :
>>> 
>>> On Mon, Apr 8, 2024 at 5:54 PM Jakub Jelinek  wrote:
>>>> 
>>>> On Mon, Apr 08, 2024 at 05:40:23PM +0200, Aldy Hernandez wrote:
>>>>>>   PR middle-end/114604
>>>>>>   * gimple-range.cc (enable_ranger): Initialize the global
>>>>>>   bitmap obstack.
>>>>>>   (disable_ranger): Release it.
>>>>>> ---
>>>>>> gcc/gimple-range.cc | 4 
>>>>>> 1 file changed, 4 insertions(+)
>>>>>> 
>>>>>> diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
>>>>>> index c16b776c1e3..4d3b1ce8588 100644
>>>>>> --- a/gcc/gimple-range.cc
>>>>>> +++ b/gcc/gimple-range.cc
>>>>>> @@ -689,6 +689,8 @@ enable_ranger (struct function *fun, bool 
>>>>>> use_imm_uses)
>>>>>> {
>>>>>>  gimple_ranger *r;
>>>>>> 
>>>>>> +  bitmap_obstack_initialize (NULL);
>>>>>> +
>>>>>>  gcc_checking_assert (!fun->x_range_query);
>>>>>>  r = new gimple_ranger (use_imm_uses);
>>>>>>  fun->x_range_query = r;
>>>>>> @@ -705,6 +707,8 @@ disable_ranger (struct function *fun)
>>>>>>  gcc_checking_assert (fun->x_range_query);
>>>>>>  delete fun->x_range_query;
>>>>>>  fun->x_range_query = NULL;
>>>>>> +
>>>>>> +  bitmap_obstack_release (NULL);
>>>>> 
>>>>> Are you not allowed to initialize/use obstacks unless
>>>>> bitmap_obstack_initialize(NULL) is called?
>>>> 
>>>> You can use it with some other obstack, just not the default one.
>>>> 
>>>>> If so, wouldn't it be
>>>>> better to lazily initialize it downstream (bitmap_alloc, or whomever
>>>>> needs it initialized)?
>>>> 
>>>> No, you still need to decide where is the safe point to release it.
>>>> Unlike the non-default bitmap_obstack_initialize/bitmap_obstack_release,
>>>> the default one can nest (has associated nesting counter).  So, the above
>>>> patch just says that ranger starts using the default obstack in
>>>> enable_ranger and stops using it in disable_ranger and anything ranger
>>>> associated in the obstack can be freed at that point.
>>> 
>>> I thought ranger never used the default one:
>>> 
>>> $ grep bitmap_obstack_initialize *value* *range*
>>> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> gimple-range-cache.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> gimple-range-gori.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> gimple-range-infer.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> gimple-range-phi.cc:  bitmap_obstack_initialize (&m_bitmaps);
>>> 
>>> or even:
>>> 
>>> $ grep obstack.*NULL *value* *range*
>>> value-range-storage.cc:obstack_free (&m_obstack, NULL);
>>> value-relation.cc:  obstack_free (&m_chain_obstack, NULL);
>>> value-relation.cc:  obstack_free (&m_chain_obstack, NULL);
>>> gimple-range-infer.cc:  obstack_free (&m_list_obstack, NULL);
>>> value-range-storage.cc:obstack_free (&m_obstack, NULL);
>>> 
>>> I'm obviously missing something here.
>> 
>> Look for BITMAP_ALLOC (NULL) in the backtrace in the PR
> 
> Ahh!  Thanks.
> 
> A few default obstack uses snuck in while I wasn't looking.
> 
> $ grep BITMAP_ALLOC.*NULL *range*
> gimple-range-cache.cc:  m_propfail = BITMAP_ALLOC (NULL);
> gimple-range-cache.h:  inline ssa_lazy_cache () { active_p = BITMAP_ALLOC (NULL); }
> gimple-range.cc:  m_pop_list = BITMAP_ALLOC (NULL);
> 
> I wonder if it would be cleaner to just change these to use named obstacks.

I didn’t find any obvious obstack to use, but sure.  This was the easiest fix ;)

Richard 

> Andrew, is there a reason we were using the default obstack for these?
> For reference, they are  class update_list used in the ranger cache,
> ssa_lazy_cache, and dom_ranger.
> 
> Aldy
> 


Re: [PATCH] middle-end/114604 - ranger allocates bitmap without initialized obstack

2024-04-08 Thread Richard Biener



> Am 08.04.2024 um 18:09 schrieb Aldy Hernandez :
> 
> On Mon, Apr 8, 2024 at 5:54 PM Jakub Jelinek  wrote:
>> 
>> On Mon, Apr 08, 2024 at 05:40:23PM +0200, Aldy Hernandez wrote:
PR middle-end/114604
* gimple-range.cc (enable_ranger): Initialize the global
bitmap obstack.
(disable_ranger): Release it.
 ---
 gcc/gimple-range.cc | 4 
 1 file changed, 4 insertions(+)
 
 diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
 index c16b776c1e3..4d3b1ce8588 100644
 --- a/gcc/gimple-range.cc
 +++ b/gcc/gimple-range.cc
 @@ -689,6 +689,8 @@ enable_ranger (struct function *fun, bool use_imm_uses)
 {
   gimple_ranger *r;
 
 +  bitmap_obstack_initialize (NULL);
 +
   gcc_checking_assert (!fun->x_range_query);
   r = new gimple_ranger (use_imm_uses);
   fun->x_range_query = r;
 @@ -705,6 +707,8 @@ disable_ranger (struct function *fun)
   gcc_checking_assert (fun->x_range_query);
   delete fun->x_range_query;
   fun->x_range_query = NULL;
 +
 +  bitmap_obstack_release (NULL);
>>> 
>>> Are you not allowed to initialize/use obstacks unless
>>> bitmap_obstack_initialize(NULL) is called?
>> 
>> You can use it with some other obstack, just not the default one.
>> 
>>> If so, wouldn't it be
>>> better to lazily initialize it downstream (bitmap_alloc, or whomever
>>> needs it initialized)?
>> 
>> No, you still need to decide where is the safe point to release it.
>> Unlike the non-default bitmap_obstack_initialize/bitmap_obstack_release,
>> the default one can nest (has associated nesting counter).  So, the above
>> patch just says that ranger starts using the default obstack in
>> enable_ranger and stops using it in disable_ranger and anything ranger
>> associated in the obstack can be freed at that point.
> 
> I thought ranger never used the default one:
> 
> $ grep bitmap_obstack_initialize *value* *range*
> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
> value-relation.cc:  bitmap_obstack_initialize (&m_bitmaps);
> gimple-range-cache.cc:  bitmap_obstack_initialize (&m_bitmaps);
> gimple-range-gori.cc:  bitmap_obstack_initialize (&m_bitmaps);
> gimple-range-infer.cc:  bitmap_obstack_initialize (&m_bitmaps);
> gimple-range-phi.cc:  bitmap_obstack_initialize (&m_bitmaps);
> 
> or even:
> 
> $ grep obstack.*NULL *value* *range*
> value-range-storage.cc:obstack_free (&m_obstack, NULL);
> value-relation.cc:  obstack_free (&m_chain_obstack, NULL);
> value-relation.cc:  obstack_free (&m_chain_obstack, NULL);
> gimple-range-infer.cc:  obstack_free (&m_list_obstack, NULL);
> value-range-storage.cc:obstack_free (&m_obstack, NULL);
> 
> I'm obviously missing something here.

Look for BITMAP_ALLOC (NULL) in the backtrace in the PR

Richard 


> Aldy
> 


Re: [PATCH 3/3] tree-optimization/114052 - niter analysis from undefined behavior

2024-04-08 Thread Richard Biener
On Mon, 8 Apr 2024, Richard Biener wrote:

> On Fri, 5 Apr 2024, Jan Hubicka wrote:
> 
> > > +   /* When there's a call that might not return the last iteration
> > > +  is possibly partial.  This matches what we check in invariant
> > > +  motion.
> > > +  ???  For the call argument evaluation it would be still OK.  */
> > > +   if (!may_have_exited
> > > +   && is_gimple_call (stmt)
> > > +   && gimple_has_side_effects (stmt))
> > > + may_have_exited = true;
> > 
> > I think you are missing here non-call EH, volatile asms and traps.
> >  We have stmt_may_terminate_function_p which tests there.
> 
> That returns true for all variable array accesses, I think we want
> to catch traps explicitly here.  I'm going to do
> 
>   if (!may_have_exited
>   && (gimple_has_side_effects (stmt)
>   || stmt_can_throw_external (cfun, stmt)))
> may_have_exited = true;
> 
> that should cover all but the generic trapping and not use IPA info
> to prove no side-effects.

Hum.  Maybe I'm a bit confused but we seem to "properly" take things
into account via maybe_lower_iteration_bound and are not directly using
the recorded bounds?  The function does a DFS walk though, not
reliably finding exits via calls early enough (it also lacks external
EH).  Oddly enough it seems to handle(?) gcc.dg/tree-ssa/cunroll-9.c
"correctly", but not gcc.dg/tree-ssa/cunroll-10.c which has the
number of iterations wrongly computed.

Maybe we should really record all the bounds properly but merge them
to a loop upper bound at one place?  gcc.dg/tree-ssa/cunroll-11.c
needs to see the g_x[i] bound is enforced in all paths to the latch
for example.

I'm most definitely defering this to GCC 15 now, I wonder if you
have any preferences here (and maybe want to pick this up also
for cleanup - it's mostly your code).

Richard.

> Richard.
> 
> > Honza
> > > +
> > > +   infer_loop_bounds_from_array (loop, stmt,
> > > + reliable && !may_have_exited);
> > >  
> > > -   if (reliable)
> > > +   if (reliable && !may_have_exited)
> > >  {
> > >infer_loop_bounds_from_signedness (loop, stmt);
> > >infer_loop_bounds_from_pointer_arith (loop, stmt);
> > >  }
> > >   }
> > > -
> > >  }
> > >  }
> > >  
> > > @@ -4832,7 +4855,7 @@ estimate_numbers_of_iterations (class loop *loop)
> > >   diagnose those loops with -Waggressive-loop-optimizations.  */
> > >    number_of_latch_executions (loop);
> > >  
> > > -  basic_block *body = get_loop_body (loop);
> > > +  basic_block *body = get_loop_body_in_rpo (cfun, loop);
> > >auto_vec exits = get_loop_exit_edges (loop, body);
> > >likely_exit = single_likely_exit (loop, exits);
> > >FOR_EACH_VEC_ELT (exits, i, ex)
> > > -- 
> > > 2.35.3
> > 
> 
> 



Re: [PATCH 3/3] tree-optimization/114052 - niter analysis from undefined behavior

2024-04-08 Thread Richard Biener
On Fri, 5 Apr 2024, Jan Hubicka wrote:

> > + /* When there's a call that might not return the last iteration
> > +is possibly partial.  This matches what we check in invariant
> > +motion.
> > +???  For the call argument evaluation it would be still OK.  */
> > + if (!may_have_exited
> > + && is_gimple_call (stmt)
> > + && gimple_has_side_effects (stmt))
> > +   may_have_exited = true;
> 
> I think you are missing here non-call EH, volatile asms and traps.
>  We have stmt_may_terminate_function_p which tests there.

That returns true for all variable array accesses, I think we want
to catch traps explicitly here.  I'm going to do

  if (!may_have_exited
  && (gimple_has_side_effects (stmt)
  || stmt_can_throw_external (cfun, stmt)))
may_have_exited = true;

that should cover all but the generic trapping and not use IPA info
to prove no side-effects.

Richard.

> Honza
> > +
> > + infer_loop_bounds_from_array (loop, stmt,
> > +   reliable && !may_have_exited);
> >  
> > - if (reliable)
> > + if (reliable && !may_have_exited)
> >  {
> >infer_loop_bounds_from_signedness (loop, stmt);
> >infer_loop_bounds_from_pointer_arith (loop, stmt);
> >  }
> > }
> > -
> >  }
> >  }
> >  
> > @@ -4832,7 +4855,7 @@ estimate_numbers_of_iterations (class loop *loop)
> >   diagnose those loops with -Waggressive-loop-optimizations.  */
> >number_of_latch_executions (loop);
> >  
> > -  basic_block *body = get_loop_body (loop);
> > +  basic_block *body = get_loop_body_in_rpo (cfun, loop);
> >auto_vec exits = get_loop_exit_edges (loop, body);
> >likely_exit = single_likely_exit (loop, exits);
> >FOR_EACH_VEC_ELT (exits, i, ex)
> > -- 
> > 2.35.3
> 



Re: [PATCH] testsuite: Add profile_update_atomic check to gcov-20.c [PR114614]

2024-04-08 Thread Richard Biener
On Mon, Apr 8, 2024 at 11:23 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR114614 shows, the newly added test case gcov-20.c by
> commit r14-9789-g08a52331803f66 failed on targets which do
> not support atomic profile update, there would be a message
> like:
>
>   warning: target does not support atomic profile update,
>single mode is selected
>
> Since the test case adopts -fprofile-update=atomic, it
> requires effective target check profile_update_atomic, this
> patch is to add the check accordingly.
>
> Tested well on x86_64-redhat-linux, powerpc64-linux-gnu P8/P9
> and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

OK

> BR,
> Kewen
> -
> PR testsuite/114614
>
> gcc/testsuite/ChangeLog:
>
> * gcc.misc-tests/gcov-20.c: Add effective target check
> profile_update_atomic.
> ---
>  gcc/testsuite/gcc.misc-tests/gcov-20.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.misc-tests/gcov-20.c 
> b/gcc/testsuite/gcc.misc-tests/gcov-20.c
> index 215faffc980..ca8c12aad2b 100644
> --- a/gcc/testsuite/gcc.misc-tests/gcov-20.c
> +++ b/gcc/testsuite/gcc.misc-tests/gcov-20.c
> @@ -1,5 +1,6 @@
>  /* { dg-options "-fcondition-coverage -ftest-coverage -fprofile-update=atomic" } */
>  /* { dg-do run { target native } } */
> +/* { dg-require-effective-target profile_update_atomic } */
>
>  /* Some side effect to stop branches from being pruned */
>  int x = 0;
> --
> 2.43.0


Re: [PATCH] rs6000: Fix wrong align passed to build_aligned_type [PR88309]

2024-04-08 Thread Richard Biener
On Mon, Apr 8, 2024 at 11:22 AM Kewen.Lin  wrote:
>
> Hi,
>
> As the comments in PR88309 show, there are two oversights
> in rs6000_gimple_fold_builtin that pass align in bytes to
> build_aligned_type but which actually requires align in
> bits, it causes unexpected ICE or hanging in function
> is_miss_rate_acceptable due to zero align_unit value.
>
> This patch is to fix them by converting bytes to bits, add
> an assertion on positive align_unit value and notes function
> build_aligned_type requires align measured in bits in its
> function comment.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9 and P10.
>
> Is it (the generic part code change) ok for trunk?

OK

> BR,
> Kewen
> -
> PR target/88309
>
> Co-authored-by: Andrew Pinski 
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Fix
> wrong align passed to function build_aligned_type.
> * tree-ssa-loop-prefetch.cc (is_miss_rate_acceptable): Add an
> assertion to ensure align_unit should be positive.
> * tree.cc (build_qualified_type): Update function comments.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr88309.c: New test.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc|  4 ++--
>  gcc/testsuite/gcc.target/powerpc/pr88309.c | 27 ++
>  gcc/tree-ssa-loop-prefetch.cc  |  2 ++
>  gcc/tree.cc|  3 ++-
>  4 files changed, 33 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr88309.c
>
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 6698274031b..e7d6204074c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1900,7 +1900,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> tree lhs_type = TREE_TYPE (lhs);
> /* In GIMPLE the type of the MEM_REF specifies the alignment.  The
>   required alignment (power) is 4 bytes regardless of data type.  */
> -   tree align_ltype = build_aligned_type (lhs_type, 4);
> +   tree align_ltype = build_aligned_type (lhs_type, 32);
> /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> Create
>the tree using the value from arg0.  The resulting type will match
>the type of arg1.  */
> @@ -1944,7 +1944,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> tree arg2_type = ptr_type_node;
> /* In GIMPLE the type of the MEM_REF specifies the alignment.  The
>required alignment (power) is 4 bytes regardless of data type.  */
> -   tree align_stype = build_aligned_type (arg0_type, 4);
> +   tree align_stype = build_aligned_type (arg0_type, 32);
> /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> Create
>the tree using the value from arg1.  */
> gimple_seq stmts = NULL;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88309.c 
> b/gcc/testsuite/gcc.target/powerpc/pr88309.c
> new file mode 100644
> index 000..c0078cf2b8c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr88309.c
> @@ -0,0 +1,27 @@
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mvsx -O2 -fprefetch-loop-arrays" } */
> +
> +/* Verify there is no ICE or hanging.  */
> +
> +#include <altivec.h>
> +
> +void b(float *c, vector float a, vector float, vector float)
> +{
> +  vector float d;
> +  vector char ahbc;
> +  vec_xst(vec_perm(a, d, ahbc), 0, c);
> +}
> +
> +vector float e(vector unsigned);
> +
> +void f() {
> +  float *dst;
> +  int g = 0;
> +  for (;; g += 16) {
> +vector unsigned m, i;
> +vector unsigned n, j;
> +vector unsigned k, l;
> +b(dst + g * 3, e(m), e(n), e(k));
> +b(dst + (g + 4) * 3, e(i), e(j), e(l));
> +  }
> +}
> diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
> index bbd98e03254..70073cc4fe4 100644
> --- a/gcc/tree-ssa-loop-prefetch.cc
> +++ b/gcc/tree-ssa-loop-prefetch.cc
> @@ -739,6 +739,8 @@ is_miss_rate_acceptable (unsigned HOST_WIDE_INT cache_line_size,
>if (delta >= (HOST_WIDE_INT) cache_line_size)
>  return false;
>
> +  gcc_assert (align_unit > 0);
> +
>miss_positions = 0;
>total_positions = (cache_line_size / align_unit) * distinct_iters;
>   max_allowed_miss_positions = (ACCEPTABLE_MISS_RATE * total_positions) / 1000;
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index f801712c9dd..6f8400e6640 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -5689,7 +5689,8 @@ build_qualified_type (tree type, int type_quals MEM_STAT_DECL)
>return t;
>  }
>
> -/* Create a variant of type T with alignment ALIGN.  */
> +/* Create a variant of type T with alignment ALIGN which
> +   is measured in bits.  */
>
>  tree
>  build_aligned_type (tree type, unsigned int align)
> --
> 2.43.0


[PATCH] middle-end/114604 - ranger allocates bitmap without initialized obstack

2024-04-08 Thread Richard Biener
The following fixes ranger bitmap allocation when invoked from IPA
context where the global bitmap obstack possibly isn't initialized.
Instead of trying to use one of the ranger obstacks the following
simply initializes the global bitmap obstack around an active ranger.

Bootstrapped and tested on x86_64-unknown-linux-gnu with bitmap_alloc
instrumentation (all ICEs gone, so ranger was the only offender).

OK?

Thanks,
Richard.

PR middle-end/114604
* gimple-range.cc (enable_ranger): Initialize the global
bitmap obstack.
(disable_ranger): Release it.
---
 gcc/gimple-range.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index c16b776c1e3..4d3b1ce8588 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -689,6 +689,8 @@ enable_ranger (struct function *fun, bool use_imm_uses)
 {
   gimple_ranger *r;
 
+  bitmap_obstack_initialize (NULL);
+
   gcc_checking_assert (!fun->x_range_query);
   r = new gimple_ranger (use_imm_uses);
   fun->x_range_query = r;
@@ -705,6 +707,8 @@ disable_ranger (struct function *fun)
   gcc_checking_assert (fun->x_range_query);
   delete fun->x_range_query;
   fun->x_range_query = NULL;
+
+  bitmap_obstack_release (NULL);
 }
 
 // 
-- 
2.35.3


[PATCH] tree-optimization/114624 - fix use-after-free in SCCP

2024-04-08 Thread Richard Biener
We're inspecting the replaced PHI node after releasing it.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114624
* tree-scalar-evolution.cc (final_value_replacement_loop):
Get at the PHI arg location before releasing the PHI node.

* gcc.dg/torture/pr114624.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr114624.c | 20 
 gcc/tree-scalar-evolution.cc|  4 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114624.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr114624.c b/gcc/testsuite/gcc.dg/torture/pr114624.c
new file mode 100644
index 000..ae031356982
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114624.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+int a, b;
+int main() {
+  int c, d = 1;
+  while (a) {
+while (b)
+  if (d)
+while (a)
+  ;
+for (; b < 2; b++)
+  if (b)
+for (c = 0; c < 8; c++)
+  d = 0;
+  else
+for (a = 0; a < 2; a++)
+  ;
+  }
+  return 0;
+}
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 25e3130e2f1..b0a5e09a77c 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3877,6 +3877,7 @@ final_value_replacement_loop (class loop *loop)
 to a GIMPLE sequence or to a statement list (keeping this a
 GENERIC interface).  */
   def = unshare_expr (def);
+  auto loc = gimple_phi_arg_location (phi, exit->dest_idx);
   remove_phi_node (&psi, false);
 
   /* Propagate constants immediately, but leave an unused initialization
@@ -3888,8 +3889,7 @@ final_value_replacement_loop (class loop *loop)
   gimple_seq stmts;
   def = force_gimple_operand (def, &stmts, false, NULL_TREE);
   gassign *ass = gimple_build_assign (rslt, def);
-  gimple_set_location (ass,
-  gimple_phi_arg_location (phi, exit->dest_idx));
+  gimple_set_location (ass, loc);
   gimple_seq_add_stmt (&stmts, ass);
 
   /* If def's type has undefined overflow and there were folded
-- 
2.35.3


Re: Combine patch ping

2024-04-07 Thread Richard Biener



> On 01.04.2024 at 21:28, Uros Bizjak wrote:
> 
> Hello!
> 
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html
> PR112560 P1 patch.

Ok.

Thanks,
Richard 

> Thanks,
> Uros.


Re: [PATCH 0/2] Condition coverage fixes

2024-04-07 Thread Richard Biener



> On 06.04.2024 at 22:41, Jørgen Kvalsvik wrote:
> 
> On 06/04/2024 13:15, Jørgen Kvalsvik wrote:
>>> On 06/04/2024 07:50, Richard Biener wrote:
>>> 
>>> 
>>>> On 05.04.2024 at 21:59, Jørgen Kvalsvik wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I propose these fixes for the current issues with the condition
>>>> coverage.
>>>> 
>>>> Rainer, I propose to simply delete the test with __sigsetjmp. I don't
>>>> think it actually detects anything reasonable any more, I kept it around
>>>> to prevent a regression. Since then I have built a lot of programs (with
>>>> optimization enabled) and not really seen this problem.
>>>> 
>>>> H.J., the problem you found with -O2 was really a problem of
>>>> tree-inlining, which was actually caught earlier by Jan [1]. It probably
>>>> warrants some more testing, but I could reproduce by tuning your test
>>>> case to use always_inline and not -O2 and trigger the error.
>>>> 
>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html
>>> 
>>> Ok
>> Thanks, committed.
>> I am wondering if the fn->cond_uids access (in tree-profile.cc) should 
>> always be guarded. Right now there is the assumption that if condition 
>> coverage is requested the map will exist and be populated, but as this 
>> shows there may be other circumstances where this is not true.
>> Or perhaps there should be a gcc_assert to (reliably) detect cases where the 
>> map is not constructed properly?
>> Thanks,
>> Jørgen
> 
> I gave this some more thought, and realised I was too eager to fix the 
> segfault. While trunk no longer crashes (at least on my x86_64 linux) the fix 
> itself is bad. It copies the gcond -> uid mappings into the caller, but the 
> stmts are deep copied into the caller, so no gcond will ever be a hit when we 
> look up the condition_uids in tree-profile.cc.
> 
> I did a very quick prototype to confirm. By applying this patch:
> 
> @@ -2049,6 +2049,9 @@ copy_bb (copy_body_data *id, basic_block bb,
> 
>   copy_gsi = gsi_start_bb (copy_basic_block);
> 
> +  if (!cfun->cond_uids && id->src_cfun->cond_uids)
> + cfun->cond_uids = new hash_map <gcond*, unsigned> ();
> +
>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> {
>   gimple_seq stmts;
> @@ -2076,6 +2079,12 @@ copy_bb (copy_body_data *id, basic_block bb,
>  if (gimple_nop_p (stmt))
>  continue;
> 
> + if (id->src_cfun->cond_uids && is_a <gcond *> (orig_stmt))
> +   {
> + unsigned *v
> +   = id->src_cfun->cond_uids->get (as_a <gcond *> (orig_stmt));
> + if (v) cfun->cond_uids->put (as_a <gcond *> (stmt), *v);
> +   }
> +
> 
> 
> and this test program:
> 
> __attribute__((always_inline))
> inline int
> inlinefn (int a)
> {
>if (a > 5)
>{
>printf ("a > 5\n");
>return a;
>}
>else
>printf ("a < 5, was %d\n", a);
>return a * a - 2;
> }
> 
> int
> mcdc027e (int a, int b)
> {
>int y = inlinefn (a);
>return y + b;
> }
> 
> 
> gcov reports:
> 
>2:   18:mcdc027e (int a, int b)
> condition outcomes covered 1/2
> condition  0 not covered (true)
>-:   19:{
>2:   20:int y = inlinefn (a);
>2:   21:return y + b;
>-:   22:}
> 
> but without the patch, gcov prints nothing.
> 
> I am not sure if this approach is even ideal. Probably the most problematic 
> is the source line mapping which is all messed up. I checked with gcov 
> --branch-probabilities and it too reports the callee at the top of the caller.
> 
> If you think it is a good strategy I can clean up the prototype and submit a 
> patch. I suppose the function _totals_ should be accurate, even if the source 
> mapping is a bit surprising.
> 
> What do you think? I am open to other strategies, too

I think the most important bit is that the segfault is gone.  The interaction 
of coverage with inlining, and with other optimizations applied to 
instrumented code, should be documented better.

Does condition coverage apply on top of regular coverage counting or is it 
an either/or?

Thanks,
Richard 

> Thanks,
> Jørgen
> 
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> 
>>>> Thanks,
>>>> Jørgen
>>>> 
>>>> Jørgen Kvalsvik (2):
>>>>   Remove unnecessary and broken MC/DC compile test
>>>>   Copy condition->expr map when inlining [PR114599]
>>>> 
>>>> gcc/testsuite/gcc.misc-tests/gcov-19.c   | 11 -
>>>> gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25 
>>>> gcc/tree-inline.cc   | 20 +++-
>>>> 3 files changed, 44 insertions(+), 12 deletions(-)
>>>> create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c
>>>> 
>>>> --
>>>> 2.30.2
>>>> 
> 


Re: [pushed] aarch64: Fix bogus cnot optimisation [PR114603]

2024-04-06 Thread Richard Biener
On Fri, Apr 5, 2024 at 3:52 PM Richard Sandiford wrote:
>
> aarch64-sve.md had a pattern that combined:
>
> cmpeq   pb.T, pa/z, zc.T, #0
> mov zd.T, pb/z, #1
>
> into:
>
> cnotzd.T, pa/m, zc.T
>
> But this is only valid if pa.T is a ptrue.  In other cases, the
> original would set inactive elements of zd.T to 0, whereas the
> combined form would copy elements from zc.T.
>
> This isn't a regression on a known testcase.  However, it's a nasty
> wrong code bug that could conceivably trigger for autovec code (although
> I've not been able to construct a reproducer so far).  That fix is also
> quite localised to the buggy operation.  I'd therefore prefer to push
> the fix now rather than wait for GCC 15.

wrong-code bugs (and also rejects-valid or ice-on-valid) are always exempt
from the regression-only fixing.  In practice every such bug will be a
regression,
in this case to when the combining pattern was introduced (unless that was
with the version with the initial introduction of the port of course).

Richard.

> Tested on aarch64-linux-gnu & pushed.  I'll backport to branches if
> there is no fallout.
>
> Richard
>
> gcc/
> PR target/114603
> * config/aarch64/aarch64-sve.md (@aarch64_pred_cnot): Replace
> with...
> (@aarch64_ptrue_cnot): ...this, requiring operand 1 to be
> a ptrue.
> (*cnot): Require operand 1 to be a ptrue.
> * config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
> Use aarch64_ptrue_cnot for _x operations that are predicated
> with a ptrue.  Represent other _x operations as fully-defined _m
> operations.
>
> gcc/testsuite/
> PR target/114603
> * gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 25 ---
>  gcc/config/aarch64/aarch64-sve.md | 22 
>  .../aarch64/sve/acle/general/cnot_1.c | 23 +
>  3 files changed, 50 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cnot_1.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 257ca5bf6ad..5be2315a3c6 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -517,15 +517,22 @@ public:
>expand (function_expander ) const override
>{
>  machine_mode mode = e.vector_mode (0);
> -if (e.pred == PRED_x)
> -  {
> -   /* The pattern for CNOT includes an UNSPEC_PRED_Z, so needs
> -  a ptrue hint.  */
> -   e.add_ptrue_hint (0, e.gp_mode (0));
> -   return e.use_pred_x_insn (code_for_aarch64_pred_cnot (mode));
> -  }
> -
> -return e.use_cond_insn (code_for_cond_cnot (mode), 0);
> +machine_mode pred_mode = e.gp_mode (0);
> +/* The underlying _x pattern is effectively:
> +
> +dst = src == 0 ? 1 : 0
> +
> +   rather than an UNSPEC_PRED_X.  Using this form allows autovec
> +   constructs to be matched by combine, but it means that the
> +   predicate on the src == 0 comparison must be all-true.
> +
> +   For simplicity, represent other _x operations as fully-defined _m
> +   operations rather than using a separate bespoke pattern.  */
> +if (e.pred == PRED_x
> +   && gen_lowpart (pred_mode, e.args[0]) == CONSTM1_RTX (pred_mode))
> +  return e.use_pred_x_insn (code_for_aarch64_ptrue_cnot (mode));
> +return e.use_cond_insn (code_for_cond_cnot (mode),
> +   e.pred == PRED_x ? 1 : 0);
>}
>  };
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index eca8623e587..0434358122d 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3363,24 +3363,24 @@ (define_insn_and_split 
> "trunc2"
>  ;; - CNOT
>  ;; -
>
> -;; Predicated logical inverse.
> -(define_expand "@aarch64_pred_cnot"
> +;; Logical inverse, predicated with a ptrue.
> +(define_expand "@aarch64_ptrue_cnot"
>[(set (match_operand:SVE_FULL_I 0 "register_operand")
> (unspec:SVE_FULL_I
>   [(unspec:
>  [(match_operand: 1 "register_operand")
> - (match_operand:SI 2 "aarch64_sve_ptrue_flag")
> + (const_int SVE_KNOWN_PTRUE)
>   (eq:
> -   (match_operand:SVE_FULL_I 3 "register_operand")
> -   (match_dup 4))]
> +   (match_operand:SVE_FULL_I 2 "register_operand")
> +   (match_dup 3))]
>  UNSPEC_PRED_Z)
> -  (match_dup 5)
> -  (match_dup 4)]
> +  (match_dup 4)
> +  (match_dup 3)]
>   UNSPEC_SEL))]
>"TARGET_SVE"
>{
> -operands[4] = CONST0_RTX (mode);
> -operands[5] = CONST1_RTX (mode);
> +

Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

2024-04-06 Thread Richard Biener
On Fri, Apr 5, 2024 at 11:29 PM Segher Boessenkool wrote:
>
> Hi!
>
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> > The following avoids re-walking and re-combining the instructions
> > between i2 and i3 when the pattern of i2 doesn't change.
> >
> > Bootstrap and regtest running ontop of a reversal of
> > r14-9692-g839bc42772ba7a.
>
> Please include that in the patch (or series, preferably).

I'd like reversal to be considered independently of this patch which is why
I didn't include the reversal.  Of course without reversal this patch doesn't
make sense.

> > It brings down memory use from 9GB to 400MB and compile-time from
> > 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> > but has shown code generation regressions across architectures.
> >
> > OK to revert r14-9692-g839bc42772ba7a?
>
> No.
>
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(

My patch isn't a replacement for your solution.  Reversal is to address
the P1 regressions caused by the change.  My change offers to address
some of the symptoms shown with the testcase without disabling the
offending 2->2 combinations.

> We had this before: 3->2 combinations that leave an instruction
> identical to what was there before.  This was just a combination with
> context as well.  The only reason this wasn't a huge problem then
> already was because this is a 3->2 combination, even if it really is a
> 2->1 one it still is beneficial in all the same cases.  But in the new
> case it can iterate indefinitely -- well not quite, but some polynomial
> number of times, for a polynomial at least of degree three, possibly
> more :-(
>
> With this patch you need to show combine still is linear.  I don't think
> it is, but some deeper analysis might show it still is.

We have come to different conclusions as to what the issue is that the
testcase exposes in combine.  While you spotted a transform that
combine shouldn't have done (and I think I agree on that!) I identified
the fact that while the number of successful combines is linear in the
number of log-links (and thus linear in the size of a basic-block), the
number of combination _attempts_ appears to be O(log-links) times
O(successful combinations).  The testcase hits hard on this because of
those 2->2 combinations done but this problem persists with all N->2
combinations.  This "quadraticness" (I know you don't like to call it
that) arises because for each successful N->2 combination of insns
{I2, .., I1} we try combining into all insns
between (and including) I2 and I1.  combine wants to retry combining
into I2 rightfully so but there's no good reason to retry _all_ of the
n intervening insns.  Yes, we want to retry all insns that have their
log-links point to I2 (and possibly more for second-level references,
dependent on how combine exactly identifies I2, I3 and I4).  Re-trying
all of them obviously works, but there's your quadraticness.

My patch makes this problem a little bit (in general) less often hit
since it avoids retrying iff I2 did not change.  I _think_ in that case
the log-links/notes shouldn't have changed either.  In any case I'm
disabling less of combine than what your patch did.

So again - is it OK to revert your patch?  Or do you expect you can
address the code generation regressions within the next week?  Since
the original issue is quite old postponing your solution to the next stage1
should be acceptable.  Currently those regressions will block the release
of GCC 14.

Do you agree that the patch I propose, while it doesn't solve any actual
issue (it doesn't fix the walking scheme nor does it avoid the combinations
combine shouldn't do), it helps in some cases and shouldn't cause code
generation regressions that your patch wouldn't have caused as well?
So is that change OK until we get your real solution implemented in a way
that doesn't cause regressions?

Thanks,
Richard.


Re: [PATCH 0/2] Condition coverage fixes

2024-04-05 Thread Richard Biener



> On 05.04.2024 at 21:59, Jørgen Kvalsvik wrote:
> 
> Hi,
> 
> I propose these fixes for the current issues with the condition
> coverage.
> 
> Rainer, I propose to simply delete the test with __sigsetjmp. I don't
> think it actually detects anything reasonable any more, I kept it around
> to prevent a regression. Since then I have built a lot of programs (with
> optimization enabled) and not really seen this problem.
> 
> H.J., the problem you found with -O2 was really a problem of
> tree-inlining, which was actually caught earlier by Jan [1]. It probably
> warrants some more testing, but I could reproduce by tuning your test
> case to use always_inline and not -O2 and trigger the error.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html

Ok

Thanks,
Richard 


> Thanks,
> Jørgen
> 
> Jørgen Kvalsvik (2):
>  Remove unnecessary and broken MC/DC compile test
>  Copy condition->expr map when inlining [PR114599]
> 
> gcc/testsuite/gcc.misc-tests/gcov-19.c   | 11 -
> gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25 
> gcc/tree-inline.cc   | 20 +++-
> 3 files changed, 44 insertions(+), 12 deletions(-)
> create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c
> 
> --
> 2.30.2
> 


Re: [PATCH] middle-end/114599 - fix bitmap allocation for check_ifunc_callee_symtab_nodes

2024-04-05 Thread Richard Biener



> On 05.04.2024 at 15:46, H.J. Lu wrote:
> 
> On Fri, Apr 5, 2024 at 1:21 AM Richard Biener wrote:
>> 
>> There's no default bitmap obstack during global CTORs, so allocate the
>> bitmap locally.
>> 
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>> 
>> Richard.
>> 
>>PR middle-end/114599
>>* symtab.cc (ifunc_ref_map): Do not use auto_bitmap.
>>(is_caller_ifunc_resolver): Optimize bitmap_bit_p/bitmap_set_bit
>>pair.
>>(symtab_node::check_ifunc_callee_symtab_nodes): Properly
>>allocate ifunc_ref_map here.
>> ---
>> gcc/symtab.cc | 11 +++
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>> 
>> diff --git a/gcc/symtab.cc b/gcc/symtab.cc
>> index 3256133891d..3b018ab3ea2 100644
>> --- a/gcc/symtab.cc
>> +++ b/gcc/symtab.cc
>> @@ -1383,7 +1383,7 @@ check_ifunc_resolver (cgraph_node *node, void *data)
>>   return false;
>> }
>> 
>> -static auto_bitmap ifunc_ref_map;
>> +static bitmap ifunc_ref_map;
>> 
>> /* Return true if any caller of NODE is an ifunc resolver.  */
>> 
>> @@ -1404,9 +1404,8 @@ is_caller_ifunc_resolver (cgraph_node *node)
>> 
>>   /* Skip if it has been visited.  */
>>   unsigned int uid = e->caller->get_uid ();
>> -  if (bitmap_bit_p (ifunc_ref_map, uid))
>> +  if (!bitmap_set_bit (ifunc_ref_map, uid))
>>continue;
>> -  bitmap_set_bit (ifunc_ref_map, uid);
>> 
>>   if (is_caller_ifunc_resolver (e->caller))
>>{
>> @@ -1437,6 +1436,9 @@ symtab_node::check_ifunc_callee_symtab_nodes (void)
>> {
>>   symtab_node *node;
>> 
>> +  bitmap_obstack_initialize (NULL);
>> +  ifunc_ref_map = BITMAP_ALLOC (NULL);
>> +
>>   FOR_EACH_SYMBOL (node)
>> {
>   cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
>> @@ -1455,7 +1457,8 @@ symtab_node::check_ifunc_callee_symtab_nodes (void)
>>cnode->called_by_ifunc_resolver = true;
>> }
>> 
>> -  bitmap_clear (ifunc_ref_map);
>> +  BITMAP_FREE (ifunc_ref_map);
>> +  bitmap_obstack_release (NULL);
>> }
>> 
>> /* Verify symbol table for internal consistency.  */
>> --
>> 2.35.3
> 
> The bug isn't fixed:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114599#c5

Ah, I reproduced with -coverage and that is fixed now.  The still existing bug 
must be something unrelated.

Richard 

> 
> --
> H.J.


Re: [PATCH 3/3] tree-optimization/114052 - niter analysis from undefined behavior

2024-04-05 Thread Richard Biener
On Fri, 5 Apr 2024, Richard Biener wrote:

> The following makes sure to only compute upper bounds for the number
> of iterations of loops from undefined behavior invoked by stmts when
> those are executed in each loop iteration, in particular also in the
> last one.  The latter cannot be guaranteed if there's possible
> infinite loops or calls with side-effects possibly executed before
> the stmt.  Rather than adjusting the bound by one or using the bound as
> estimate the following for now gives up.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

It FAILs

FAIL: gcc.dg/pr53265.c  at line 91 (test for warnings, line 90)
FAIL: gcc.dg/pr53265.c  at line 92 (test for warnings, line 90)

for diagnostic purposes we'd need to treat the call as not terminating

FAIL: gcc.dg/tree-ssa/cunroll-10.c scan-tree-dump-times cunroll 
"Forced statement unreachable" 2
FAIL: gcc.dg/tree-ssa/cunroll-11.c scan-tree-dump cunroll "Loop 1 
iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/cunroll-9.c scan-tree-dump-times cunrolli 
"Removed pointless exit:" 1
FAIL: gcc.dg/tree-ssa/loop-38.c scan-tree-dump cunrolli "Loop 1 
iterates at most 11 times"
FAIL: gcc.dg/tree-ssa/pr68234.c scan-tree-dump vrp2 ">> 6"
FAIL: c-c++-common/ubsan/unreachable-3.c   -O0   scan-tree-dump 
optimized "__builtin___ubsan_handle_builtin_unreachable"
...
FAIL: c-c++-common/ubsan/unreachable-3.c   -Os   scan-tree-dump 
optimized "__builtin___ubsan_handle_builtin_unreachable"


>   PR tree-optimization/114052
>   * tree-ssa-loop-niter.cc (infer_loop_bounds_from_undefined):
>   When we enter a possibly infinite loop or when we come across
>   a call with side-effects record the last iteration might not
>   execute all stmts.  Consider bounds as unreliable in that case.
> 
>   * gcc.dg/tree-ssa/pr114052-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c | 16 ++
>  gcc/tree-ssa-loop-niter.cc | 35 ++
>  2 files changed, 45 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c
> new file mode 100644
> index 000..54a2181e67e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int foo(void)
> +{
> +  int counter = 0;
> +  while (1)
> +{
> +  if (counter >= 2)
> + continue;
> +  __builtin_printf("%i\n", counter++);
> +}
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "unreachable" "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 0a77c1bb544..52a39eb3500 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4397,7 +4397,7 @@ infer_loop_bounds_from_undefined (class loop *loop, basic_block *bbs)
>unsigned i;
>gimple_stmt_iterator bsi;
>basic_block bb;
> -  bool reliable;
> +  bool may_have_exited = false;
>  
>for (i = 0; i < loop->num_nodes; i++)
>  {
> @@ -4407,21 +4407,44 @@ infer_loop_bounds_from_undefined (class loop *loop, basic_block *bbs)
>use the operations in it to infer reliable upper bound on the
># of iterations of the loop.  However, we can use it as a guess. 
>Reliable guesses come only from array bounds.  */
> -  reliable = dominated_by_p (CDI_DOMINATORS, loop->latch, bb);
> +  bool reliable = dominated_by_p (CDI_DOMINATORS, loop->latch, bb);
> +
> +  /* A possibly infinite inner loop makes further blocks not always
> +  executed.  Key on the entry of such a loop as that avoids RPO
> +  issues with where the exits of that loop are.  Any block
> +  inside an irreducible sub-region is problematic as well.
> +  ???  Note this technically only makes the last iteration
> +  possibly partially executed.  */
> +  if (!may_have_exited
> +   && bb != loop->header
> +   && (!loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)
> +   || bb->flags & BB_IRREDUCIBLE_LOOP
> +   || (bb->loop_father->header == bb
> +   && !finite_loop_p (bb->loop_father
> + may_have_exited = true;
>  
>for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next ())
>   {
> gimple *stmt = gsi_stmt (bsi);
>  
> -   infer_loop_bounds_fro

[PATCH 3/3] tree-optimization/114052 - niter analysis from undefined behavior

2024-04-05 Thread Richard Biener
The following makes sure to only compute upper bounds for the number
of iterations of loops from undefined behavior invoked by stmts when
those are executed in each loop iteration, in particular also in the
last one.  The latter cannot be guaranteed if there's possible
infinite loops or calls with side-effects possibly executed before
the stmt.  Rather than adjusting the bound by one or using the bound as
estimate the following for now gives up.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/114052
* tree-ssa-loop-niter.cc (infer_loop_bounds_from_undefined):
When we enter a possibly infinite loop or when we come across
a call with side-effects record the last iteration might not
execute all stmts.  Consider bounds as unreliable in that case.

* gcc.dg/tree-ssa/pr114052-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c | 16 ++
 gcc/tree-ssa-loop-niter.cc | 35 ++
 2 files changed, 45 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c
new file mode 100644
index 000..54a2181e67e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr114052-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(void)
+{
+  int counter = 0;
+  while (1)
+{
+  if (counter >= 2)
+   continue;
+  __builtin_printf("%i\n", counter++);
+}
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "unreachable" "optimized" } } */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 0a77c1bb544..52a39eb3500 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4397,7 +4397,7 @@ infer_loop_bounds_from_undefined (class loop *loop, basic_block *bbs)
   unsigned i;
   gimple_stmt_iterator bsi;
   basic_block bb;
-  bool reliable;
+  bool may_have_exited = false;
 
   for (i = 0; i < loop->num_nodes; i++)
 {
@@ -4407,21 +4407,44 @@ infer_loop_bounds_from_undefined (class loop *loop, basic_block *bbs)
 use the operations in it to infer reliable upper bound on the
 # of iterations of the loop.  However, we can use it as a guess. 
 Reliable guesses come only from array bounds.  */
-  reliable = dominated_by_p (CDI_DOMINATORS, loop->latch, bb);
+  bool reliable = dominated_by_p (CDI_DOMINATORS, loop->latch, bb);
+
+  /* A possibly infinite inner loop makes further blocks not always
+executed.  Key on the entry of such a loop as that avoids RPO
+issues with where the exits of that loop are.  Any block
+inside an irreducible sub-region is problematic as well.
+???  Note this technically only makes the last iteration
+possibly partially executed.  */
+  if (!may_have_exited
+ && bb != loop->header
+ && (!loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)
+ || bb->flags & BB_IRREDUCIBLE_LOOP
+ || (bb->loop_father->header == bb
+ && !finite_loop_p (bb->loop_father
+   may_have_exited = true;
 
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next ())
{
  gimple *stmt = gsi_stmt (bsi);
 
- infer_loop_bounds_from_array (loop, stmt, reliable);
+ /* When there's a call that might not return the last iteration
+is possibly partial.  This matches what we check in invariant
+motion.
+???  For the call argument evaluation it would be still OK.  */
+ if (!may_have_exited
+ && is_gimple_call (stmt)
+ && gimple_has_side_effects (stmt))
+   may_have_exited = true;
+
+ infer_loop_bounds_from_array (loop, stmt,
+   reliable && !may_have_exited);
 
- if (reliable)
+ if (reliable && !may_have_exited)
 {
   infer_loop_bounds_from_signedness (loop, stmt);
   infer_loop_bounds_from_pointer_arith (loop, stmt);
 }
}
-
 }
 }
 
@@ -4832,7 +4855,7 @@ estimate_numbers_of_iterations (class loop *loop)
  diagnose those loops with -Waggressive-loop-optimizations.  */
   number_of_latch_executions (loop);
 
-  basic_block *body = get_loop_body (loop);
+  basic_block *body = get_loop_body_in_rpo (cfun, loop);
   auto_vec<edge> exits = get_loop_exit_edges (loop, body);
   likely_exit = single_likely_exit (loop, exits);
   FOR_EACH_VEC_ELT (exits, i, ex)
-- 
2.35.3


[PATCH 2/3] Add get_loop_body_in_rpo

2024-04-05 Thread Richard Biener
The following adds another get_loop_body variant, one to get blocks
in RPO.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* cfgloop.h (get_loop_body_in_rpo): Declare.
* cfgloop.cc (get_loop_body_in_rpo): Compute loop body in RPO.
---
 gcc/cfgloop.cc | 68 ++
 gcc/cfgloop.h  |  1 +
 2 files changed, 69 insertions(+)

diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index 5202c3865d1..d79a006554f 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -1021,6 +1021,74 @@ get_loop_body_in_bfs_order (const class loop *loop)
   return blocks;
 }
 
+/* Get the body of LOOP in FN in reverse post order.  */
+
+basic_block *
+get_loop_body_in_rpo (function *fn, const class loop *loop)
+{
+  auto_vec<edge_iterator> stack (loop->num_nodes + 1);
+  auto_bb_flag visited (fn);
+
+  basic_block *blocks = XNEWVEC (basic_block, loop->num_nodes);
+  int rev_post_order_num = loop->num_nodes - 1;
+
+  /* Find a block leading to the loop header.  */
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+if (!flow_bb_inside_loop_p (loop, e->src))
+  break;
+  basic_block preheader = e->src;
+
+  stack.quick_push (ei_start (preheader->succs));
+
+  while (!stack.is_empty ())
+{
+  basic_block src;
+  basic_block dest;
+
+  /* Look at the edge on the top of the stack.  */
+  edge_iterator ei = stack.last ();
+  src = ei_edge (ei)->src;
+  dest = ei_edge (ei)->dest;
+
+  /* Check if the edge destination has been visited yet.  */
+  if (flow_bb_inside_loop_p (loop, dest)
+ && ! (dest->flags & visited))
+   {
+ /* Mark that we have visited the destination.  */
+ dest->flags |= visited;
+
+ if (EDGE_COUNT (dest->succs) > 0)
+   /* Since the DEST node has been visited for the first
+  time, check its successors.  */
+   stack.quick_push (ei_start (dest->succs));
+ else
+   /* There are no successors for the DEST node so record
+  the block.  */
+   blocks[rev_post_order_num--] = dest;
+   }
+  else
+   {
+ if (ei_one_before_end_p (ei)
+ && src != preheader)
+   /* There are no more successors for the SRC node
+  so record the block.  */
+   blocks[rev_post_order_num--] = src;
+
+ if (!ei_one_before_end_p (ei))
+   ei_next (&stack.last ());
+ else
+   stack.pop ();
+   }
+}
+
+  for (int i = rev_post_order_num + 1; i < (int) loop->num_nodes; ++i)
+blocks[i]->flags &= ~visited;
+
+  return blocks;
+}
+
 /* Hash function for struct loop_exit.  */
 
 hashval_t
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 30b5e40d0d9..42f3079102d 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -385,6 +385,7 @@ extern basic_block *get_loop_body_in_custom_order (const 
class loop *,
   int (*) (const void *, const void *));
 extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
   int (*) (const void *, const void *, void *));
+extern basic_block *get_loop_body_in_rpo (function *, const class loop *);
 
extern auto_vec<edge> get_loop_exit_edges (const class loop *, basic_block * = 
NULL);
 extern edge single_exit (const class loop *);
-- 
2.35.3



[PATCH 1/3] Pass reliable down to infer_loop_bounds_from_array

2024-04-05 Thread Richard Biener
The following passes down whether a stmt is always executed from
infer_loop_bounds_from_undefined to infer_loop_bounds_from_array.
The parameters were already documented.  The patch doesn't remove
possibly redundant checks from idx_infer_loop_bounds yet.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-ssa-loop-niter.cc (ilb_data::reliable): New.
(idx_infer_loop_bounds): Initialize upper from reliable.
(infer_loop_bounds_from_ref): Get and pass through reliable flag.
(infer_loop_bounds_from_array): Likewise.
(infer_loop_bounds_from_undefined): Pass reliable flag to
infer_loop_bounds_from_array.
---
 gcc/tree-ssa-loop-niter.cc | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index c6d010f6d89..0a77c1bb544 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -4123,6 +4123,7 @@ struct ilb_data
 {
   class loop *loop;
   gimple *stmt;
+  bool reliable;
 };
 
 static bool
@@ -4131,7 +4132,7 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
   struct ilb_data *data = (struct ilb_data *) dta;
   tree ev, init, step;
   tree low, high, type, next;
-  bool sign, upper = true, has_flexible_size = false;
+  bool sign, upper = data->reliable, has_flexible_size = false;
   class loop *loop = data->loop;
 
   if (TREE_CODE (base) != ARRAY_REF)
@@ -4224,12 +4225,14 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
STMT is guaranteed to be executed in every iteration of LOOP.*/
 
 static void
-infer_loop_bounds_from_ref (class loop *loop, gimple *stmt, tree ref)
+infer_loop_bounds_from_ref (class loop *loop, gimple *stmt, tree ref,
+   bool reliable)
 {
   struct ilb_data data;
 
   data.loop = loop;
   data.stmt = stmt;
+  data.reliable = reliable;
   for_each_index (&ref, idx_infer_loop_bounds, &data);
 }
 
@@ -4238,7 +4241,7 @@ infer_loop_bounds_from_ref (class loop *loop, gimple 
*stmt, tree ref)
executed in every iteration of LOOP.  */
 
 static void
-infer_loop_bounds_from_array (class loop *loop, gimple *stmt)
+infer_loop_bounds_from_array (class loop *loop, gimple *stmt, bool reliable)
 {
   if (is_gimple_assign (stmt))
 {
@@ -4248,10 +4251,10 @@ infer_loop_bounds_from_array (class loop *loop, gimple 
*stmt)
   /* For each memory access, analyze its access function
 and record a bound on the loop iteration domain.  */
   if (REFERENCE_CLASS_P (op0))
-   infer_loop_bounds_from_ref (loop, stmt, op0);
+   infer_loop_bounds_from_ref (loop, stmt, op0, reliable);
 
   if (REFERENCE_CLASS_P (op1))
-   infer_loop_bounds_from_ref (loop, stmt, op1);
+   infer_loop_bounds_from_ref (loop, stmt, op1, reliable);
 }
   else if (is_gimple_call (stmt))
 {
@@ -4260,13 +4263,13 @@ infer_loop_bounds_from_array (class loop *loop, gimple 
*stmt)
 
   lhs = gimple_call_lhs (stmt);
   if (lhs && REFERENCE_CLASS_P (lhs))
-   infer_loop_bounds_from_ref (loop, stmt, lhs);
+   infer_loop_bounds_from_ref (loop, stmt, lhs, reliable);
 
   for (i = 0; i < n; i++)
{
  arg = gimple_call_arg (stmt, i);
  if (REFERENCE_CLASS_P (arg))
-   infer_loop_bounds_from_ref (loop, stmt, arg);
+   infer_loop_bounds_from_ref (loop, stmt, arg, reliable);
}
 }
 }
@@ -4410,7 +4413,7 @@ infer_loop_bounds_from_undefined (class loop *loop, 
basic_block *bbs)
{
  gimple *stmt = gsi_stmt (bsi);
 
- infer_loop_bounds_from_array (loop, stmt);
+ infer_loop_bounds_from_array (loop, stmt, reliable);
 
  if (reliable)
 {
-- 
2.35.3



Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-04-05 Thread Richard Biener
On Fri, Apr 5, 2024 at 2:28 PM Manolis Tsamis  wrote:
>
> If we consider code like:
>
> if (bar1 == x)
>   return foo();
> if (bar2 != y)
>   return foo();
> return 0;
>
> We would like the ifcombine pass to convert this to:
>
> if (bar1 == x || bar2 != y)
>   return foo();
> return 0;
>
> The ifcombine pass can handle this transformation but it is run very early and
> it misses the opportunity because there are two separate blocks for foo().
> The pre pass is good at removing duplicate code and blocks and due to that
> running ifcombine again after it can increase the number of successful
> conversions.
>
> PR 102793
>
> gcc/ChangeLog:
>
> * common.opt: -ftree-ifcombine option, enabled by default.
> * doc/invoke.texi: Document.
> * passes.def: Re-run ssa-ifcombine after pre.
> * tree-ssa-ifcombine.cc: Make ifcombine cloneable. Add gate function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/20030922-2.c: Change flag to -fno-tree-ifcombine.
> * gcc.dg/uninit-pred-6_c.c: Remove inconsistent check.
> * gcc.target/aarch64/pr102793.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/common.opt  |  4 +++
>  gcc/doc/invoke.texi |  5 
>  gcc/passes.def  |  1 +
>  gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c  |  2 +-
>  gcc/testsuite/gcc.dg/uninit-pred-6_c.c  |  4 ---
>  gcc/testsuite/gcc.target/aarch64/pr102793.c | 30 +
>  gcc/tree-ssa-ifcombine.cc   |  5 
>  7 files changed, 46 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102793.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index ad348844775..e943202bcf1 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3163,6 +3163,10 @@ ftree-phiprop
>  Common Var(flag_tree_phiprop) Init(1) Optimization
>  Enable hoisting loads from conditional pointers.
>
> +ftree-ifcombine

Please don't add further -ftree-X flags, 'tree' means nothing
to users.  -fif-combine would be better.

> +Common Var(flag_tree_ifcombine) Init(1) Optimization
> +Merge some conditional branches to simplify control flow.
> +
>  ftree-pre
>  Common Var(flag_tree_pre) Optimization
>  Enable SSA-PRE optimization on trees.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e2edf7a6c13..8d2ff6b4512 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13454,6 +13454,11 @@ This flag is enabled by default at @option{-O1} and 
> higher.
>  Perform hoisting of loads from conditional pointers on trees.  This
>  pass is enabled by default at @option{-O1} and higher.
>
> +@opindex ftree-ifcombine
> +@item -ftree-ifcombine
> +Merge some conditional branches to simplify control flow.  This pass
> +is enabled by default at @option{-O1} and higher.
> +
>  @opindex fhoist-adjacent-loads
>  @item -fhoist-adjacent-loads
>  Speculatively hoist loads from both branches of an if-then-else if the
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1cbbd413097..1765b476131 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -270,6 +270,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_lim);
>NEXT_PASS (pass_walloca, false);
>NEXT_PASS (pass_pre);
> +  NEXT_PASS (pass_tree_ifcombine);
>NEXT_PASS (pass_sink_code, false /* unsplit edges */);

Please move it here, after sinking.

>NEXT_PASS (pass_sancov);
>NEXT_PASS (pass_asan);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> index 16c79da9521..66c9f481a2f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-dom2 -fdisable-tree-ifcombine" } */
> +/* { dg-options "-O1 -fdump-tree-dom2 -fno-tree-ifcombine" } */
>
>  struct rtx_def;
>  typedef struct rtx_def *rtx;
> diff --git a/gcc/testsuite/gcc.dg/uninit-pred-6_c.c 
> b/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> index f60868dad23..2d8e6501a45 100644
> --- a/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> +++ b/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> @@ -20,10 +20,6 @@ int foo (int n, int l, int m, int r)
>if ( (n > 10) && l)
>blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */
>
> -  if (l)
> -if (n > 12)
> -  blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */
> -

What's "inconsistent" about this check?  I suppose we now diagnose this?
The appropriate way would be to XFAIL this but I'd like you to explain
why we now diagnose this (I don't see obvious if-combining opportunities).

On a general note you rely on the tail-merging pass which is part of PRE
and which hasn't seen any love and which isn't very powerful either.  I'm not
sure it's worth doing if-combining on the whole IL again because of it.
It might 

Re: [PATCH] vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]

2024-04-05 Thread Richard Biener
On Fri, 5 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because in the vectorized
> epilogue the vectorizer assumes it can use aligned loads/stores
> (if the base decl gets alignment increased), but it actually doesn't
> increase that.
> This is because r10-4203-g97c1460367 added the hunk the following
> patch removes.  The explanation feels reasonable, but actually it
> is not true as the testcase proves.
> The thing is, we vectorize the main loop with 64-byte vectors
> and the corresponding data refs have base_alignment 16 (the
> a array has DECL_ALIGN 128) and offset_alignment 32.  Now, because
> of the offset_alignment 32 rather than 64, we need to use unaligned
> loads/stores in the main loop (and ditto in the first load/store
> in vectorized epilogue).  But the second load/store in the vectorized
> epilogue uses only 32-byte vectors and because it is a multiple
> of offset_alignment, it checks if we could increase alignment of the
> a VAR_DECL, the function returns true, sets base_misaligned = true
> and says the access is then aligned.
> But when update_epilogue_loop_vinfo clears base_misaligned with the
> assumption that the var had to have the alignment increased already,
> the update of DECL_ALIGN doesn't happen anymore.
> 
> Now, I'd think this base_misaligned = false was needed before
> r10-4030-gd2db7f7901 change was committed where it incorrectly
> overwrote DECL_ALIGN even if it was already larger, rather than
> just always increasing it.  But with that change in, it doesn't
> make sense to me anymore.

I think it was incorrect before - basically it assumed that the
main loop would never use misaligned accesses and the epilogue
analysis would align.  But for this we'd need to make sure to
never request such during epilogue analysis.

> Note, the testcase is latent on the trunk, but reproduces on the 13
> branch.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux on the trunk,
> plus tested with the testcase on 13 branch with -m32/-m64 without/with
> the tree-vect-loop.cc patch (where it FAILed before and now PASSes).
> Ok for trunk?

OK.

Thanks,
Richard.

> 2024-04-05  Jakub Jelinek  
> 
>   PR tree-optimization/114566
>   * tree-vect-loop.cc (update_epilogue_loop_vinfo): Don't clear
>   base_misaligned.
> 
>   * gcc.target/i386/avx512f-pr114566.c: New test.
> 
> --- gcc/tree-vect-loop.cc.jj  2024-04-04 00:48:05.932072711 +0200
> +++ gcc/tree-vect-loop.cc 2024-04-05 00:59:33.743101468 +0200
> @@ -11590,9 +11590,7 @@ find_in_mapping (tree t, void *context)
> corresponding dr_vec_info need to be reconnected to the EPILOGUE's
> stmt_vec_infos, their statements need to point to their corresponding 
> copy,
> if they are gather loads or scatter stores then their reference needs to 
> be
> -   updated to point to its corresponding copy and finally we set
> -   'base_misaligned' to false as we have already peeled for alignment in the
> -   prologue of the main loop.  */
> +   updated to point to its corresponding copy.  */
>  
>  static void
>  update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
> @@ -11736,10 +11734,6 @@ update_epilogue_loop_vinfo (class loop *
>   }
>DR_STMT (dr) = STMT_VINFO_STMT (stmt_vinfo);
>stmt_vinfo->dr_aux.stmt = stmt_vinfo;
> -  /* The vector size of the epilogue is smaller than that of the main 
> loop
> -  so the alignment is either the same or lower. This means the dr will
> -  thus by definition be aligned.  */
> -  STMT_VINFO_DR_INFO (stmt_vinfo)->base_misaligned = false;
>  }
>  
>epilogue_vinfo->shared->datarefs_copy.release ();
> --- gcc/testsuite/gcc.target/i386/avx512f-pr114566.c.jj   2024-04-05 
> 11:21:04.282639386 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512f-pr114566.c  2024-04-05 
> 11:21:04.282639386 +0200
> @@ -0,0 +1,34 @@
> +/* PR tree-optimization/114566 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -mavx512f" } */
> +/* { dg-additional-options "-fstack-protector-strong" { target 
> fstack_protector } } */
> +/* { dg-require-effective-target avx512f } */
> +
> +#define AVX512F
> +#include "avx512f-helper.h"
> +
> +__attribute__((noipa)) int
> +foo (float x, float y)
> +{
> +  float a[8][56];
> +  __builtin_memset (a, 0, sizeof (a));
> +
> +  for (int j = 0; j < 8; j++)
> +for (int k = 0; k < 56; k++)
> +  {
> + float b = k * y;
> + if (b < 0.)
> +   b = 0.;
> + if (b > 0.)
> +   b = 0.;
> + a[j][k] += b;
> +  }
> +
> +  return __builtin_log (x);
> +}
> +
> +void
> +TEST (void)
> +{
> +  foo (86.25f, 0.625f);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] testsuite: Fix up error on gcov1.d

2024-04-05 Thread Richard Biener
On Fri, 5 Apr 2024, Jakub Jelinek wrote:

> On Fri, Feb 23, 2024 at 12:18:00PM +0100, Jørgen Kvalsvik wrote:
> > This is a mostly straight port from the gcov-19.c tests from the C test
> > suite. The only notable differences from C to D are that D flips the
> > true/false outcomes for loop headers, and the D front end ties loop and
> > ternary conditions to slightly different locus.
> > 
> > The test for >64 conditions warning is disabled as it either needs
> > support from the testing framework or something similar to #pragma GCC
> > diagnostic push to not cause a test failure from detecting a warning.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gdc.dg/gcov.exp: New test.
> > * gdc.dg/gcov1.d: New test.
> 
> Unfortunately, this doesn't work.
> I see
> PASS: gdc.dg/gcov1.d   execution test
> ERROR: (DejaGnu) proc "run-gcov conditions { --conditions gcov1.d }" does not 
> exist.
> The error code is TCL LOOKUP COMMAND run-gcov
> The info on the error is:
> invalid command name "run-gcov"
> while executing
> "::tcl_unknown run-gcov conditions { --conditions gcov1.d }"
> ("uplevel" body line 1)
> invoked from within
> "uplevel 1 ::tcl_unknown $args"
> ERROR: gdc.dg/gcov1.d  : error executing dg-final: invalid command name 
> "run-gcov"
> both on x86_64-linux and i686-linux.
> The problem is that the test hasn't been added to a new directory, but
> to a directory already covered by a different *.exp file - dg.exp.
> Now, usually either one has a test directory like gcc.misc-tests where
> there are many *.exp files but each *.exp file globs for its own tests,
> or there is one *.exp per directory and covers everything in there.
> By having both dg.exp and gcov.exp in the same directory with dg.exp
> covering all *.d files in there and gcov.exp gcov*.d in there, the gcov*.d
> tests are tested twice, once using the dg.exp driver and once using gcov.exp
> driver.  With the latter, they do work properly, with the former they don't
> because gcov.exp lib file isn't loaded and so run-gcov isn't available.
> 
> The following patch fixes that similarly how g++.dg/modules/modules.exp,
> gcc.target/s390/s390.exp or gcc.target/i386/i386.exp deal with that,
> by pruning some tests based on glob patterns from the list.
> 
> Tested on x86_64-linux with make -j32 check-d, ok for trunk?

OK.

Richard.

> 2024-04-05  Jakub Jelinek  
> 
>   * gdc.dg/dg.exp: Prune gcov*.d from the list of tests to run.
>   * gdc.dg/gcov.exp: Update copyright years.
> 
> --- gcc/testsuite/gdc.dg/dg.exp.jj2024-01-03 22:33:38.249693029 +0100
> +++ gcc/testsuite/gdc.dg/dg.exp   2024-04-05 10:20:13.518823037 +0200
> @@ -30,7 +30,8 @@ dg-init
>  
>  # Main loop.
>  gdc-dg-runtest [lsort \
> -   [glob -nocomplain $srcdir/$subdir/*.d ] ] "" $DEFAULT_DFLAGS
> +   [prune [glob -nocomplain $srcdir/$subdir/*.d ] \
> +   $srcdir/$subdir/gcov*.d ] ] "" $DEFAULT_DFLAGS
>  
>  # All done.
>  dg-finish
> --- gcc/testsuite/gdc.dg/gcov.exp.jj  2024-04-04 21:45:56.025155257 +0200
> +++ gcc/testsuite/gdc.dg/gcov.exp 2024-04-05 10:20:23.678682559 +0200
> @@ -1,4 +1,4 @@
> -#   Copyright (C) 1997-2023 Free Software Foundation, Inc.
> +#   Copyright (C) 1997-2024 Free Software Foundation, Inc.
>  
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] middle-end/114599 - fix bitmap allocation for check_ifunc_callee_symtab_nodes

2024-04-05 Thread Richard Biener
There's no default bitmap obstack during global CTORs, so allocate the
bitmap locally.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

PR middle-end/114599
* symtab.cc (ifunc_ref_map): Do not use auto_bitmap.
(is_caller_ifunc_resolver): Optimize bitmap_bit_p/bitmap_set_bit
pair.
(symtab_node::check_ifunc_callee_symtab_nodes): Properly
allocate ifunc_ref_map here.
---
 gcc/symtab.cc | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/symtab.cc b/gcc/symtab.cc
index 3256133891d..3b018ab3ea2 100644
--- a/gcc/symtab.cc
+++ b/gcc/symtab.cc
@@ -1383,7 +1383,7 @@ check_ifunc_resolver (cgraph_node *node, void *data)
   return false;
 }
 
-static auto_bitmap ifunc_ref_map;
+static bitmap ifunc_ref_map;
 
 /* Return true if any caller of NODE is an ifunc resolver.  */
 
@@ -1404,9 +1404,8 @@ is_caller_ifunc_resolver (cgraph_node *node)
 
   /* Skip if it has been visited.  */
   unsigned int uid = e->caller->get_uid ();
-  if (bitmap_bit_p (ifunc_ref_map, uid))
+  if (!bitmap_set_bit (ifunc_ref_map, uid))
continue;
-  bitmap_set_bit (ifunc_ref_map, uid);
 
   if (is_caller_ifunc_resolver (e->caller))
{
@@ -1437,6 +1436,9 @@ symtab_node::check_ifunc_callee_symtab_nodes (void)
 {
   symtab_node *node;
 
+  bitmap_obstack_initialize (NULL);
+  ifunc_ref_map = BITMAP_ALLOC (NULL);
+
   FOR_EACH_SYMBOL (node)
 {
   cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
@@ -1455,7 +1457,8 @@ symtab_node::check_ifunc_callee_symtab_nodes (void)
cnode->called_by_ifunc_resolver = true;
 }
 
-  bitmap_clear (ifunc_ref_map);
+  BITMAP_FREE (ifunc_ref_map);
+  bitmap_obstack_release (NULL);
 }
 
 /* Verify symbol table for internal consistency.  */
-- 
2.35.3


Re: [PATCH]middle-end vect: adjust loop upper bounds when peeling for gaps and early break [PR114403]

2024-04-05 Thread Richard Biener
On Thu, 4 Apr 2024, Tamar Christina wrote:

> Hi All,
> 
> The report shows that we end up in a situation where the code has been peeled
> for gaps and we have an early break.
> 
> The code for peeling for gaps assumes that a scalar loop needs to perform at
> least one iteration.  However this doesn't take into account early break where
> the scalar loop may not need to be executed.

But we always re-start the vector iteration where the early break happens?

> That the early break loop can be partial is not accounted for in this 
> scenario.
> Loop partiality is normally handled by setting bias_for_lowest to 1, but when
> peeling for gaps we end up with 0, which when the loop upper bounds are
> calculated means that a partial loop iteration loses the final partial iter:
> 
> Analyzing # of iterations of loop 1
>   exit condition [8, + , 18446744073709551615] != 0
>   bounds on difference of bases: -8 ... -8
>   result:
> # of iterations 8, bounded by 8
> 
> and a VF=4 calculating:
> 
> Loop 1 iterates at most 1 times.
> Loop 1 likely iterates at most 1 times.
> Analyzing # of iterations of loop 1
>   exit condition [1, + , 1](no_overflow) < bnd.5505_39
>   bounds on difference of bases: 0 ... 4611686018427387902
> Matching expression match.pd:2011, generic-match-8.cc:27
> Applying pattern match.pd:2067, generic-match-1.cc:4813
>   result:
> # of iterations bnd.5505_39 + 18446744073709551615, bounded by 
> 4611686018427387902
> Estimating sizes for loop 1
> ...
>Induction variable computation will be folded away.
>   size:   2 if (ivtmp_312 < bnd.5505_39)
>Exit condition will be eliminated in last copy.
> size: 24-3, last_iteration: 24-5
>   Loop size: 24
>   Estimated size after unrolling: 26
> ;; Guessed iterations of loop 1 is 0.858446. New upper bound 1.
> 
> upper bound should be 2 not 1.

Why?  This means the vector loop will iterate once (thus the body
executed twice), isn't that correct?  Peeling for gaps means the
main IV will exit the loop in time.

> 
> This patch forced the bias_for_lowest to be 1 even when peeling for gaps.

(*)

> I have however not been able to write a standalone reproducer for this so I 
> have
> no tests but bootstrap and LLVM build fine now.
> 
> The testcase:
> 
> #define COUNT 9
> #define SIZE COUNT * 4
> #define TYPE unsigned long
> 
> TYPE x[SIZE], y[SIZE];
> 
> void __attribute__((noipa))
> loop (TYPE val)
> {
>   for (int i = 0; i < COUNT; ++i)
> {
>   if (x[i * 4] > val || x[i * 4 + 1] > val)
> return;
>   x[i * 4] = y[i * 2] + 1;
>   x[i * 4 + 1] = y[i * 2] + 2;
>   x[i * 4 + 2] = y[i * 2 + 1] + 3;
>   x[i * 4 + 3] = y[i * 2 + 1] + 4;
> }
> }
> 
> does perform the peeling for gaps and early break, however it creates a hybrid
> loop which works fine. adjusting the indices to non linear also works. So I'd
> like to submit the fix and work on a testcase separately if needed.

You can have peeling for gaps without SLP by doing interleaving.

#define COUNT 9
#define TYPE unsigned long

TYPE x[COUNT], y[COUNT*2];

void __attribute__((noipa))
loop (TYPE val)
{
  for (int i = 0; i < COUNT; ++i)
{ 
  if (x[i] > val)
return;
  x[i] = y[i * 2];
   }
}

gets me partial vectors and peeling for gaps with -O3 -march=armv8.2-a+sve 
-fno-vect-cost-model (with cost modeling we choose ADVSIMD).  Does
this reproduce the issue?

Richard.


> Bootstrapped Regtested on x86_64-pc-linux-gnu no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114403
>   * tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds for when
>   peeling for gaps and early break.
> 
> ---
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 4375ebdcb493a90fd0501cbb4b07466077b525c3..bf1bb9b005c68fbb13ee1b1279424865b237245a
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -12139,7 +12139,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>/* The minimum number of iterations performed by the epilogue.  This
>   is 1 when peeling for gaps because we always need a final scalar
>   iteration.  */
> -  int min_epilogue_iters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0;
> +  int min_epilogue_iters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +&& !LOOP_VINFO_EARLY_BREAKS (loop_vinfo) ? 1 : 0;

(*) This adjusts min_epilogue_iters though and honestly the whole code
looks like a mess.  I'm quoting a bit more here:

>/* +1 to convert latch counts to loop iteration counts,
>   -min_epilogue_iters to remove iterations that cannot be performed
> by the vector code.  */
  int bias_for_lowest = 1 - min_epilogue_iters;
  int bias_for_assumed = bias_for_lowest;

The variable names and comments now have nothing to do with the
actual magic we compute into them.

I think it would be an improvement to disentangle this a bit like
doing

   /* +1 to convert latch counts 

[PATCH] middle-end/114579 - speed up add_scope_conflicts

2024-04-04 Thread Richard Biener
The following speeds up stack variable conflict detection by recognizing
that the all-to-all conflict recording is only necessary for CFG merges
as it's the unioning of the live variable sets that doesn't come with
explicit mentions we record conflicts for.

If we employ this optimization we have to make sure to perform the
all-to-all conflict recording for all CFG merges even those into
empty blocks where we might previously have skipped this.

I have reworded the comment before the all-to-all conflict recording
since it seemed to be confusing and missing the point - but maybe I
am also missing something here.

Nevertheless for the testcase in the PR the compile-time spend in
add_scope_conflicts at -O1 drops from previously 67s (39%) to 10s (9%).

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?

Thanks,
Richard.

PR middle-end/114579
* cfgexpand.cc (add_scope_conflicts_1): Record all-to-all
conflicts only when there's a CFG merge but for all CFG merges.
---
 gcc/cfgexpand.cc | 46 +-
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index eef565eddb5..fa48a4c633f 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -640,21 +640,26 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool 
for_conflict)
{
  if (for_conflict && visit == visit_op)
{
- /* If this is the first real instruction in this BB we need
-to add conflicts for everything live at this point now.
-Unlike classical liveness for named objects we can't
-rely on seeing a def/use of the names we're interested in.
-There might merely be indirect loads/stores.  We'd not add any
-conflicts for such partitions.  */
+ /* When we are inheriting live variables from our predecessors
+through a CFG merge we might not see an actual mention of
+the variables to record the appropriate conflict as defs/uses
+might be through indirect stores/loads.  For this reason
+we have to make sure each live variable conflicts with
+each other.  When there's just a single predecessor the
+set of conflicts is already up-to-date.
+We perform this delayed at the first real instruction to
+allow clobbers starting this block to remove variables from
+the set of live variables.  */
  bitmap_iterator bi;
  unsigned i;
- EXECUTE_IF_SET_IN_BITMAP (work, 0, i, bi)
-   {
- class stack_var *a = &stack_vars[i];
- if (!a->conflicts)
-   a->conflicts = BITMAP_ALLOC (&stack_var_bitmap_obstack);
- bitmap_ior_into (a->conflicts, work);
-   }
+ if (EDGE_COUNT (bb->preds) > 1)
+   EXECUTE_IF_SET_IN_BITMAP (work, 0, i, bi)
+ {
+   class stack_var *a = &stack_vars[i];
+   if (!a->conflicts)
+ a->conflicts = BITMAP_ALLOC (&stack_var_bitmap_obstack);
+   bitmap_ior_into (a->conflicts, work);
+ }
  visit = visit_conflict;
}
  walk_stmt_load_store_addr_ops (stmt, work, visit, visit, visit);
@@ -662,6 +667,21 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool 
for_conflict)
add_scope_conflicts_2 (USE_FROM_PTR (use_p), work, visit);
}
 }
+
+  /* When there was no real instruction but there's a CFG merge we need
+ to add the conflicts now.  */
+  if (for_conflict && visit == visit_op && EDGE_COUNT (bb->preds) > 1)
+{
+  bitmap_iterator bi;
+  unsigned i;
+  EXECUTE_IF_SET_IN_BITMAP (work, 0, i, bi)
+   {
+ class stack_var *a = &stack_vars[i];
+ if (!a->conflicts)
+   a->conflicts = BITMAP_ALLOC (&stack_var_bitmap_obstack);
+ bitmap_ior_into (a->conflicts, work);
+   }
+}
 }
 
 /* Generate stack partition conflicts between all partitions that are
-- 
2.35.3


Re: [PATCH] fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]

2024-04-04 Thread Richard Biener
On Thu, 4 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is incorrectly rejected.  The problem is that
> for bit-fields native_encode_initializer expects the corresponding
> CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case
> here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location.
> We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for
> is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-04-04  Jakub Jelinek  
> 
>   PR c++/114537
>   * fold-const.cc (native_encode_initializer): Look through
>   NON_LVALUE_EXPR if val is INTEGER_CST.
> 
>   * g++.dg/cpp2a/bit-cast16.C: New test.
> 
> --- gcc/fold-const.cc.jj  2024-03-26 11:21:31.996860739 +0100
> +++ gcc/fold-const.cc 2024-04-03 16:59:05.747297410 +0200
> @@ -8601,6 +8601,8 @@ native_encode_initializer (tree init, un
> if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
>   return 0;
>  
> +   if (TREE_CODE (val) == NON_LVALUE_EXPR)
> + val = TREE_OPERAND (val, 0);
> if (TREE_CODE (val) != INTEGER_CST)
>   return 0;
>  
> --- gcc/testsuite/g++.dg/cpp2a/bit-cast16.C.jj2024-04-03 
> 17:06:46.720974426 +0200
> +++ gcc/testsuite/g++.dg/cpp2a/bit-cast16.C   2024-04-03 17:06:40.233063410 
> +0200
> @@ -0,0 +1,16 @@
> +// PR c++/114537
> +// { dg-do compile { target c++20 } }
> +
> +namespace std {
> +template <typename T, typename F>
> +constexpr T
> +bit_cast (const F& f) noexcept
> +{
> +  return __builtin_bit_cast (T, f);
> +}
> +}
> +
> +struct A { signed char b : 1 = 0; signed char c : 7 = 0; };
> +struct D { unsigned char e; };
> +constexpr unsigned char f = std::bit_cast<D> (A{}).e;
> +static_assert (f == 0);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] bitint: Handle m_bitfld_load cast in outer m_cast_conditional [PR114555]

2024-04-04 Thread Richard Biener
On Thu, 4 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> We ICE on the following testcase, because we use the result of a PHI node
> which is only conditional because of m_cast_conditional on the outermost
> loop's PHI node argument and so is invalid SSA form.
> 
> The following patch fixes it like similar cases elsewhere by adding
> needed intervening PHI(s).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-04-04  Jakub Jelinek  
> 
>   PR tree-optimization/114555
>   * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): For
>   m_bitfld_load and save_cast_conditional add any needed PHIs
>   and adjust t4 accordingly.
> 
>   * gcc.dg/bitint-103.c: New test.
>   * gcc.dg/bitint-104.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-23 11:19:53.0 +0100
> +++ gcc/gimple-lower-bitint.cc 2024-04-03 15:31:19.686583203 +0200
> @@ -1506,7 +1506,7 @@ bitint_large_huge::handle_cast (tree lhs
> if (m_bitfld_load)
>   {
> tree t4;
> -   if (!save_first)
> +   if (!save_first && !save_cast_conditional)
>   t4 = m_data[m_bitfld_load + 1];
> else
>   t4 = make_ssa_name (m_limb_type);
> @@ -1519,6 +1519,24 @@ bitint_large_huge::handle_cast (tree lhs
> if (edge_true_true)
>   add_phi_arg (phi, m_data[m_bitfld_load], edge_true_true,
>UNKNOWN_LOCATION);
> +   if (save_cast_conditional)
> + for (basic_block bb = gsi_bb (m_gsi);;)
> +   {
> + edge e1 = single_succ_edge (bb);
> + edge e2 = find_edge (e1->dest, m_bb), e3;
> + tree t5 = ((e2 && !save_first) ? m_data[m_bitfld_load + 1]
> +: make_ssa_name (m_limb_type));
> + phi = create_phi_node (t5, e1->dest);
> + edge_iterator ei;
> + FOR_EACH_EDGE (e3, ei, e1->dest->preds)
> +   add_phi_arg (phi, (e3 == e1 ? t4
> +  : build_zero_cst (m_limb_type)),
> +e3, UNKNOWN_LOCATION);
> + t4 = t5;
> + if (e2)
> +   break;
> + bb = e1->dest;
> +   }
> m_data[m_bitfld_load] = t4;
> m_data[m_bitfld_load + 2] = t4;
> m_bitfld_load = 0;
> --- gcc/testsuite/gcc.dg/bitint-103.c.jj 2024-04-03 15:34:19.468113199 +0200
> +++ gcc/testsuite/gcc.dg/bitint-103.c 2024-04-03 15:34:05.805300917 +0200
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/114555 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O2" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 1225
> +struct S { _BitInt(512) : 98; _BitInt(1225) b : 509; } s;
> +_BitInt(1225) a;
> +#endif
> +
> +void
> +foo (void)
> +{
> +#if __BITINT_MAXWIDTH__ >= 1225
> +  a ^= (unsigned _BitInt(1025)) s.b;
> +#endif
> +}
> --- gcc/testsuite/gcc.dg/bitint-104.c.jj 2024-04-03 15:36:59.385916107 +0200
> +++ gcc/testsuite/gcc.dg/bitint-104.c 2024-04-03 15:36:28.034346850 +0200
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/114555 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-std=c23 -O -fno-tree-forwprop" } */
> +
> +#if __BITINT_MAXWIDTH__ >= 4139
> +struct S { _BitInt(31) : 6; _BitInt(513) b : 241; } s;
> +_BitInt(4139) a;
> +#endif
> +
> +void
> +foo (void)
> +{
> +#if __BITINT_MAXWIDTH__ >= 4139
> +  int i = 0;
> +  a -= s.b << i;
> +#endif
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/114485 - neg induction with partial vectors

2024-04-04 Thread Richard Biener
We can't use vect_update_ivs_after_vectorizer for partial vectors,
the following fixes vect_can_peel_nonlinear_iv_p accordingly.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

We could handle this case by vectorizing the live lane but that's
a different thing and might be tackled next stage1.

PR tree-optimization/114485
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p):
Reject vect_step_op_neg for partial vectors; it is OK only
for unknown niter.

* gcc.dg/vect/pr114485.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr114485.c | 18 ++
 gcc/tree-vect-loop-manip.cc  | 14 +++---
 2 files changed, 25 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114485.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114485.c b/gcc/testsuite/gcc.dg/vect/pr114485.c
new file mode 100644
index 000..6536806e350
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114485.c
@@ -0,0 +1,18 @@
+#include "tree-vect.h"
+
+int b, c = 8, d;
+int e[23];
+int main()
+{
+  check_vect ();
+
+  int *h = e;
+  for (int i = 1; i < b + 21; i += 2)
+{
+  c *= -1;
+  d = h[i] ? i : 0;
+}
+  if (c != 8)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 56a6d8e4a8d..8d9b533d50f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2128,18 +2128,18 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
  For shift, when shift mount >= precision, there would be UD.
  For mult, don't known how to generate
  init_expr * pow (step, niters) for variable niters.
- For neg, it should be ok, since niters of vectorized main loop
+ For neg unknown niters are ok, since niters of vectorized main loop
  will always be multiple of 2.
- See also PR113163 and PR114196.  */
-  if ((!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
-   || LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
-   || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
-  && induction_type != vect_step_op_neg)
+ See also PR113163,  PR114196 and PR114485.  */
+  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()
+  || LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
+  || (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && induction_type != vect_step_op_neg))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Peeling for epilogue is not supported"
-" for nonlinear induction except neg"
+" for this nonlinear induction"
 " when iteration count is unknown or"
 " when using partial vectorization.\n");
   return false;
-- 
2.35.3


[PATCH] tree-optimization/114551 - loop splitting and undefined overflow

2024-04-04 Thread Richard Biener
When loop splitting hoists a guard computation it needs to make sure
that it can be safely evaluated at this place when it was previously
only conditionally evaluated.  The following fixes this for the
case of undefined overflow.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114551
* tree-ssa-loop-split.cc (split_loop): If the guard is
only conditionally evaluated rewrite computations with
possibly undefined overflow to unsigned arithmetic.

* gcc.dg/torture/pr114551.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr114551.c | 18 ++
 gcc/tree-ssa-loop-split.cc  | 22 --
 2 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114551.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr114551.c b/gcc/testsuite/gcc.dg/torture/pr114551.c
new file mode 100644
index 000..13c15fbc3d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114551.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+
+int a, b[4], c, d, e, f;
+int main()
+{
+  a--;
+  for (f = 3; f >= 0; f--)
+{
+  for (e = 0; e < 4; e++)
+   c = 0;
+  for (; c < 4; c++)
+   {
+ d = f && a > 0 && f > (2147483647 - a) ? 0 : b[f];
+ continue;
+   }
+}
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index c0bb1b71d17..a770ea371a2 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -653,8 +653,26 @@ split_loop (class loop *loop1)
gimple_seq stmts2;
border = force_gimple_operand (border, &stmts2, true, NULL_TREE);
if (stmts2)
- gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
-   stmts2);
+ {
+   /* When the split condition is not always evaluated make sure
+  to rewrite it to defined overflow.  */
+   if (!dominated_by_p (CDI_DOMINATORS, exit1->src, bbs[i]))
+ {
+   gimple_stmt_iterator gsi;
+   gsi = gsi_start (stmts2);
+   while (!gsi_end_p (gsi))
+ {
+   gimple *stmt = gsi_stmt (gsi);
+   if (is_gimple_assign (stmt)
+   && arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
> + rewrite_to_defined_overflow (&gsi);
> +   gsi_next (&gsi);
+ }
+ }
+   gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
+ stmts2);
+ }
tree cond = fold_build2 (guard_code, boolean_type_node,
 guard_init, border);
if (!initial_true)
-- 
2.35.3


Re: [Patch] lto-wrapper.cc: Add offload target name to 'offload_args' suffix

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Tobias Burnus wrote:

> Found when working with -save-temps and looking at 'mkoffload'
> with a GCC configured for both nvptx and gcn offloading.
> 
> Before (for 'a.out') mkoffload produced a.offload_args; now:
> a.amdgcn-amdhsa.offload_args and a.nvptx-none.offload_args.
> OK for mainline?

OK.

Richard.

> Tobias
> 
> PS: The code does not free the 'xmalloc'ed memory, but that's also
> the case of all/most 'concat' in this file; the concat could also
> be skipped when no save_temps is used, in case this optimization
> makes sense.
> 

Re: [PATCH] Don't set full_profile in auto-profile [PR113765]

2024-04-03 Thread Richard Biener
On Thu, Mar 28, 2024 at 4:03 AM Eugene Rozenfeld
 wrote:
>
> auto-profile currently doesn't guarantee that it will set probabilities
> on all edges because of zero basic block counts. Normally those edges
> just have probabilities set by the preceding profile_estimate pass but
> under -O0 profile_estimate pass doesn't run. The patch removes setting
> of full_profile to true in auto-profile.

OK.

Thanks,
Richard.

> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
> * auto-profile.cc (afdo_annotate_cfg): Don't set full_profile to true
> ---
>  gcc/auto-profile.cc | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index e5407d32fbb..de59b94bcb3 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1580,7 +1580,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
>  }
>update_max_bb_count ();
>profile_status_for_fn (cfun) = PROFILE_READ;
> -  cfun->cfg->full_profile = true;
>if (flag_value_profile_transformations)
>  {
>gimple_value_profile_transformations ();
> --
> 2.25.1
>


[PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

2024-04-03 Thread Richard Biener
The following avoids re-walking and re-combining the instructions
between i2 and i3 when the pattern of i2 doesn't change.

Bootstrap and regtest running on top of a reversal of
r14-9692-g839bc42772ba7a.

It brings down memory use from 9GB to 400MB and compile-time from
80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
but has shown code generation regressions across architectures.

OK to revert r14-9692-g839bc42772ba7a?

OK to install this instead?

Thanks,
Richard.

PR rtl-optimization/101523
* combine.cc (try_combine): When the pattern of i2 doesn't
change do not re-start combining at i2 or an earlier insn which
had links or notes added.
---
 gcc/combine.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   adjust_for_new_dest (i3);
 }
 
+  bool i2_unchanged = false;
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+i2_unchanged = true;
+
   /* We now know that we can do this combination.  Merge the insns and
  update the status of registers and LOG_LINKS.  */
 
@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   combine_successes++;
   undo_commit ();
 
+  if (i2_unchanged)
+return i3;
+
   rtx_insn *ret = newi2pat ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
 ret = added_links_insn;
-- 
2.35.3


Re: [PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Iain Sandoe wrote:

> Hi Richard,
> 
> > On 7 Mar 2024, at 13:40, FX Coudert  wrote:
> > 
> >> I think it's an obvious change ...
> > 
> > Thanks, pushed.
> > 
> > Dimitry, I suggest you post the second patch for review.
> 
> Given that the two patches here (for 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632) were considered obvious 
> - and are needed on release branches.
> 
> OK for backporting?

OK.

> (Gerald has volunteered to do the earlier ones, I have already made/tested 
> the gcc-13 case)
> 
> thanks
> Iain
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] expr: Fix up emit_push_insn [PR114552]

2024-04-03 Thread Richard Biener
On Wed, 3 Apr 2024, Jakub Jelinek wrote:

> Hi!
> 
> r13-990 added optimizations in multiple spots to optimize during
> expansion storing of constant initializers into targets.
> In the load_register_parameters and expand_expr_real_1 cases,
> it checks it has a tree as the source and so knows we are reading
> that whole decl's value, so the code is fine as is, but in the
> emit_push_insn case it checks for a MEM from which something
> is pushed and checks for SYMBOL_REF as the MEM's address, but
> still assumes the whole object is copied, which as the following
> testcase shows might not always be the case.  In the testcase,
> k is 6 bytes, then 2 bytes of padding, then another 4 bytes,
> while the emit_push_insn wants to store just the 6 bytes.
> 
> The following patch simply verifies it is the whole initializer
> that is being stored, I think that is best thing to do so late
> in GCC 14 cycle as well for backporting.
> 
> For GCC 15, perhaps the code could stop requiring it must be at offset zero,
> nor that the size is a subset, but could use
> get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually
> extract just part of the initializer if we e.g. push just some subset
> (of course, still verify that it is a subset).  For sizes which are power
> of two bytes and we have some integer modes, we could use as type for
> fold_ctor_reference corresponding integral types, otherwise dunno, punt
> or use some structure (e.g. try to find one in the initializer?), whatever.
> But even in the other spots it could perhaps handle loading of
> COMPONENT_REFs or MEM_REFs from the .rodata vars.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-04-02  Jakub Jelinek  
> 
>   PR middle-end/114552
>   * expr.cc (emit_push_insn): Only use store_constructor for
>   immediate_const_ctor_p if int_expr_size matches size.
> 
>   * gcc.c-torture/execute/pr114552.c: New test.
> 
> --- gcc/expr.cc.jj 2024-03-15 10:10:51.209237835 +0100
> +++ gcc/expr.cc   2024-04-02 16:01:39.566744302 +0200
> @@ -5466,6 +5466,7 @@ emit_push_insn (rtx x, machine_mode mode
> /* If source is a constant VAR_DECL with a simple constructor,
>   store the constructor to the stack instead of moving it.  */
> const_tree decl;
> +   HOST_WIDE_INT sz;
> if (partial == 0
> && MEM_P (xinner)
> && SYMBOL_REF_P (XEXP (xinner, 0))
> @@ -5473,9 +5474,11 @@ emit_push_insn (rtx x, machine_mode mode
> && VAR_P (decl)
> && TREE_READONLY (decl)
> && !TREE_SIDE_EFFECTS (decl)
> -   && immediate_const_ctor_p (DECL_INITIAL (decl), 2))
> - store_constructor (DECL_INITIAL (decl), target, 0,
> -int_expr_size (DECL_INITIAL (decl)), false);
> +   && immediate_const_ctor_p (DECL_INITIAL (decl), 2)
> +   && (sz = int_expr_size (DECL_INITIAL (decl))) > 0
> +   && CONST_INT_P (size)
> +   && INTVAL (size) == sz)
> + store_constructor (DECL_INITIAL (decl), target, 0, sz, false);
> else
>   emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
>   }
> --- gcc/testsuite/gcc.c-torture/execute/pr114552.c.jj 2024-04-02 16:08:12.959366793 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr114552.c 2024-04-02 16:03:49.829963659 +0200
> @@ -0,0 +1,24 @@
> +/* PR middle-end/114552 */
> +
> +struct __attribute__((packed)) S { short b; int c; };
> +struct T { struct S b; int e; };
> +static const struct T k = { { 1, 0 }, 0 };
> +
> +__attribute__((noinline)) void
> +foo (void)
> +{
> +  asm volatile ("" : : : "memory");
> +}
> +
> +__attribute__((noinline)) void
> +bar (struct S n)
> +{
> +  foo ();
> +}
> +
> +int
> +main ()
> +{
> +  bar (k.b);
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] c++: Implement C++26 P2809R3 - Trivial infinite loops are not Undefined Behavior

2024-04-03 Thread Richard Biener
On Wed, Apr 3, 2024 at 9:25 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch attempts to implement P2809R3, which has been voted
> in as a DR.
>
> The middle-end has its behavior documented:
> '-ffinite-loops'
>  Assume that a loop with an exit will eventually take the exit and
>  not loop indefinitely.  This allows the compiler to remove loops
>  that otherwise have no side-effects, not considering eventual
>  endless looping as such.
>
>  This option is enabled by default at '-O2' for C++ with -std=c++11
>  or higher.
>
> So, the following patch attempts to detect trivial infinite loops and
> turn their conditions into INTEGER_CSTs so that they don't really have
> exits in the middle-end and so regardless of -ffinite-loops or
> -fno-finite-loops they are handled as infinite loops by the middle-end.
> Otherwise, if the condition would be a large expression calling various
> constexpr functions, I'd be afraid we could e.g. just inline some of them
> and not all of them and the middle-end could still see tests in the
> condition and with -ffinite-loops optimize it by assuming that such loops
> need to be finite.
>
> The "A trivial infinite loop is a trivially empty iteration statement for
> which the converted controlling expression is a constant expression, when
> interpreted as a constant-expression ([expr.const]), and evaluates to true."
> wording isn't clear to me what it implies for manifest constant evaluation
> of the expression, especially given the
> int x = 42;
> while (std::is_constant_evaluated() || --x) ;
> example in the rationale.
>
> The patch assumes that the condition expression aren't manifestly constant
> evaluated.  If it would be supposed to be manifestly constant evaluated,
> then I think the DR would significantly change behavior of existing programs
> and have really weird effects.  Before the DR was voted in, I think
> void foo (int x)
> {
>   if (x == 0)
> while (std::is_constant_evaluated())
>   ;
>   else
> while (!std::is_constant_evaluated())
>   ;
> }
> would have well defined behavior of zero loop body iterations if x == 0 and
> undefined behavior otherwise.  If the condition expression is manifestly
> constant evaluated if it evaluates to true, and otherwise can be
> non-constant or not manifestly constant evaluated otherwise, then the
> behavior would be that for x == 0 it is a well-defined trivial infinite loop,
> while for x != 0 it would remain undefined behavior (an infinite loop,
> as !std::is_constant_evaluated() is false when manifestly constant evaluated
> and, if we keep the condition as is, it then evaluates to true).  I think it
> would be fairly strange if both loops are infinite even when their condition
> are negated.  Similar for anything that is dependent on if consteval or
> std::is_constant_evaluated() inside of it.
>
> So, the patch below attempts to discover trivially empty iteration
> statements at cp_fold time if it is the final mce_false folding,
> attempts to maybe_constant_value with mce_false evaluate the conditions
> and replaces it with the returned value if constant non-zero.
>
> The testcases then try to check if the FE changed the calls in the
> conditions into constants.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I'll note that -ffinite-loops is reflected into the IL (the loop structure) at
CFG build time in the following simple way:

  /* Push the global flag_finite_loops state down to individual loops.  */
  loop->finite_p = flag_finite_loops;

and the "has an exit" is checked when we evalutate a loop->finite_p loop.
The above flag evaluation could be made a langhook as it's suitably
early, but I'm not sure whether that would help.   The flag is evaluated
when we evaluate the loop ANNOTATE_EXPRs, so annotating finite
loops with a new annotate might be possible as well.

Just in case making the control expression constant to the middle-end
doesn't scale.

Richard.

> Or is it really supposed to be mce_true with the above described weird
> behavior?  If so, I think the standard at least should mention it in Annex C
> (though, where when it is a DR?).
>
> 2024-04-02  Jakub Jelinek  
>
> PR c++/114462
> * cp-gimplify.cc (cp_fold): Implement C++26 P2809R3 - Trivial infinite
> loops are not Undefined Behavior.  For trivially empty WHILE_STMT,
> DO_STMT or FOR_STMT iteration statements check if the condition is
> constant expression which evaluates to true and in that case replace
> the condition with true.
>
> * g++.dg/cpp26/trivial-infinite-loop1.C: New test.
> * g++.dg/cpp26/trivial-infinite-loop2.C: New test.
> * g++.dg/cpp26/trivial-infinite-loop3.C: New test.
>
> --- gcc/cp/cp-gimplify.cc.jj2024-03-23 11:17:06.958445857 +0100
> +++ gcc/cp/cp-gimplify.cc   2024-04-02 11:27:56.069170914 +0200
> @@ -3527,6 +3527,78 @@ cp_fold (tree x, fold_flags_t flags)
>x = 

Re: [PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use

2024-04-02 Thread Richard Biener
On Tue, 2 Apr 2024, Richard Biener wrote:

> The following reduces peak memory use for the PR114480 testcase at -O1
> which is almost exclusively spent by the ehcleanup pass in allocating
> PHI nodes.  The free_phinodes cache we maintain isn't very effective
> since it has effectively two slots, one for 4 and one for 9 argument
> PHIs and it is only ever used for allocations up to 9 arguments but
> we put all larger PHIs in the 9 argument bucket.  This proves
> ineffective, resulting in much garbage being kept when incrementally
> growing PHI nodes by edge redirection.
> 
> The mitigation is to rely on the GC freelist for larger sizes and
> thus immediately return all larger bucket sized PHIs to it via ggc_free.
> 
> This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
> from 359s to 168s.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> OK for trunk?  I'll leave more surgery for stage1.

Testing revealed one other use-after-free.  Revised patch as follows.

Richard.

From 3507c14d05994eba5396492f08a919847b9e54ab Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Tue, 2 Apr 2024 12:31:04 +0200
Subject: [PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use
To: gcc-patches@gcc.gnu.org

The following reduces peak memory use for the PR114480 testcase at -O1
which is almost exclusively spent by the ehcleanup pass in allocating
PHI nodes.  The free_phinodes cache we maintain isn't very effective
since it has effectively two slots, one for 4 and one for 9 argument
PHIs and it is only ever used for allocations up to 9 arguments but
we put all larger PHIs in the 9 argument bucket.  This proves
ineffective, resulting in much garbage being kept when incrementally
growing PHI nodes by edge redirection.

The mitigation is to rely on the GC freelist for larger sizes and
thus immediately return all larger bucket sized PHIs to it via ggc_free.

This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
from 359s to 168s.

PR tree-optimization/114557
PR tree-optimization/114480
* tree-phinodes.cc (release_phi_node): Return PHIs from
allocation buckets not covered by free_phinodes to GC.
(remove_phi_node): Release the PHI LHS before freeing the
PHI node.
* tree-vect-loop.cc (vectorizable_live_operation): Get PHI lhs
before releasing it.
---
 gcc/ggc-page.cc   |  6 ++
 gcc/tree-phinodes.cc  | 10 +-
 gcc/tree-vect-loop.cc |  2 +-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
index ddd731323e1..5a7e4a94e57 100644
--- a/gcc/tree-phinodes.cc
+++ b/gcc/tree-phinodes.cc
@@ -223,6 +223,14 @@ release_phi_node (gimple *phi)
   delink_imm_use (imm);
 }
 
+  /* Immediately return the memory to the allocator when we would
+ only ever re-use it for a smaller size allocation.  */
+  if (len - 2 >= NUM_BUCKETS - 2)
+{
+  ggc_free (phi);
+  return;
+}
+
   bucket = len > NUM_BUCKETS - 1 ? NUM_BUCKETS - 1 : len;
   bucket -= 2;
   vec_safe_push (free_phinodes[bucket], phi);
@@ -445,9 +445,9 @@ remove_phi_node (gimple_stmt_iterator *gsi, bool release_lhs_p)
 
   /* If we are deleting the PHI node, then we should release the
  SSA_NAME node so that it can be reused.  */
-  release_phi_node (phi);
   if (release_lhs_p)
 release_ssa_name (gimple_phi_result (phi));
+  release_phi_node (phi);
 }
 
 /* Remove all the phi nodes from BB.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f33629e9b04..984636edbc5 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10962,8 +10962,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 lhs_type, _gsi);
 
  auto gsi = gsi_for_stmt (use_stmt);
- remove_phi_node (&gsi, false);
  tree lhs_phi = gimple_phi_result (use_stmt);
+ remove_phi_node (&gsi, false);
  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
  gsi_insert_before (_gsi, copy, GSI_SAME_STMT);
  break;
-- 
2.35.3



[PATCH] tree-optimization/114557 - reduce ehcleanup peak memory use

2024-04-02 Thread Richard Biener
The following reduces peak memory use for the PR114480 testcase at -O1
which is almost exclusively spent by the ehcleanup pass in allocating
PHI nodes.  The free_phinodes cache we maintain isn't very effective
since it has effectively two slots, one for 4 and one for 9 argument
PHIs and it is only ever used for allocations up to 9 arguments but
we put all larger PHIs in the 9 argument bucket.  This proves
ineffective, resulting in much garbage being kept when incrementally
growing PHI nodes by edge redirection.

The mitigation is to rely on the GC freelist for larger sizes and
thus immediately return all larger bucket sized PHIs to it via ggc_free.

This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
from 359s to 168s.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk?  I'll leave more surgery for stage1.

Thanks,
Richard.

PR tree-optimization/114557
PR tree-optimization/114480
* tree-phinodes.cc (release_phi_node): Return PHIs from
allocation buckets not covered by free_phinodes to GC.
(remove_phi_node): Release the PHI LHS before freeing the
PHI node.
---
 gcc/tree-phinodes.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
index ddd731323e1..5a7e4a94e57 100644
--- a/gcc/tree-phinodes.cc
+++ b/gcc/tree-phinodes.cc
@@ -223,6 +223,14 @@ release_phi_node (gimple *phi)
   delink_imm_use (imm);
 }
 
+  /* Immediately return the memory to the allocator when we would
+ only ever re-use it for a smaller size allocation.  */
+  if (len - 2 >= NUM_BUCKETS - 2)
+{
+  ggc_free (phi);
+  return;
+}
+
   bucket = len > NUM_BUCKETS - 1 ? NUM_BUCKETS - 1 : len;
   bucket -= 2;
   vec_safe_push (free_phinodes[bucket], phi);
@@ -445,9 +445,9 @@ remove_phi_node (gimple_stmt_iterator *gsi, bool release_lhs_p)
 
   /* If we are deleting the PHI node, then we should release the
  SSA_NAME node so that it can be reused.  */
-  release_phi_node (phi);
   if (release_lhs_p)
 release_ssa_name (gimple_phi_result (phi));
+  release_phi_node (phi);
 }
 
 /* Remove all the phi nodes from BB.  */
-- 
2.35.3


Re: [PATCH] Fix up duplicated words mostly in comments, part 1

2024-04-02 Thread Richard Biener
905562 +0100
> @@ -4493,7 +4493,7 @@ AC_DEFUN([GLIBCXX_CHECK_GTHREADS], [
>  # Check whether LC_MESSAGES is available in .
>  # Ulrich Drepper , 1995.
>  #
> -# This file file be copied and used freely without restrictions.  It can
> +# This file can be copied and used freely without restrictions.  It can
>  # be used in projects which are not available under the GNU Public License
>  # but which still want to provide support for the GNU gettext functionality.
>  # Please note that the actual code is *not* freely available.
> --- libstdc++-v3/configure.host.jj2023-09-11 11:05:47.619726474 +0200
> +++ libstdc++-v3/configure.host   2024-03-28 15:41:56.740134644 +0100
> @@ -1,7 +1,7 @@
>  # configure.host
>  #
>  # This shell script handles all host based configuration for libstdc++.
> -# It sets various shell variables based on the the host and the
> +# It sets various shell variables based on the host and the
>  # configuration options.  You can modify this shell script without needing
>  # to rerun autoconf/aclocal/etc.  This file is "sourced" not executed.
>  #
> --- libvtv/vtv_rts.cc.jj  2024-01-03 12:08:23.803590499 +0100
> +++ libvtv/vtv_rts.cc 2024-03-28 15:41:17.621665634 +0100
> @@ -1791,7 +1791,7 @@ vtv_fail (const char *msg, void **data_s
>ptr_from_set_handle_handle (*data_set_ptr) :
> *data_set_ptr);
>buf_len = strlen (buffer);
> -  /*  Send this to to stderr.  */
> +  /* Send this to stderr.  */
>write (2, buffer, buf_len);
>  
>  #ifndef VTV_NO_ABORT
> --- libvtv/vtv_fail.cc.jj 2024-01-03 12:08:23.804590485 +0100
> +++ libvtv/vtv_fail.cc2024-03-28 15:41:05.055836203 +0100
> @@ -201,7 +201,7 @@ vtv_fail (const char *msg, void **data_s
>ptr_from_set_handle_handle (*data_set_ptr) :
> *data_set_ptr);
>buf_len = strlen (buffer);
> -  /*  Send this to to stderr.  */
> +  /* Send this to stderr.  */
>write (2, buffer, buf_len);
>  
>if (!vtv_no_abort)
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Fix up postboot dependencies [PR106472]

2024-04-02 Thread Richard Biener
013,25 @@ configure-target-[+module+]: maybe-all-g
>(unless (=* target "target-")
> (string-append "configure-" target ": " dep "\n"))
>  
> +   ;; Dependencies in between target modules if the dependencies
> +   ;; are bootstrap target modules and the target modules which
> +   ;; depend on them are emitted inside of @unless gcc-bootstrap.
> +   ;; Unfortunately, some target modules like libatomic or libbacktrace
> +   ;; have bootstrap flag set, but whether they are actually built
> +   ;; during bootstrap or after bootstrap depends on e.g. enabled languages;
> +   ;; if d is enabled, libphobos is built as target module and depends
> +   ;; on libatomic and libbacktrace, which are therefore also built as
> +   ;; bootstrap modules.  If d is not enabled but go is, libatomic and
> +   ;; libbacktrace are just dependencies of libgo which is not a bootstrap
> +   ;; target module, but we need dependencies on libatomic and libbacktrace
> +   ;; in that case even when gcc-bootstrap.  This lambda emits those.
> +   (define make-postboot-target-dep (lambda ()
> + (let ((target (dep-module "module")) (on (dep-module "on")))
> +   (when (=* on "target-")
> +  (when (=* target "target-")
> +(string-append "@unless " on "-bootstrap\n" (make-dep "" "")
> +   "\n@endunless " on "-bootstrap\n"))
> +
> ;; We now build the hash table that is used by dep-kind.
> (define boot-modules (make-hash-table 113))
> (define postboot-targets (make-hash-table 113))
> @@ -2045,6 +2064,11 @@ configure-target-[+module+]: maybe-all-g
>  [+ == "postbootstrap" +][+ (make-postboot-dep) +][+ ESAC +][+
>  ENDFOR dependencies +]@endif gcc-bootstrap
>  
> +@if gcc-bootstrap
> +[+ FOR dependencies +][+ CASE (dep-kind) +]
> +[+ == "postbootstrap" +][+ (make-postboot-target-dep) +][+ ESAC +][+
> +ENDFOR dependencies +]@endif gcc-bootstrap
> +
>  @unless gcc-bootstrap
>  [+ FOR dependencies +][+ CASE (dep-kind) +]
>  [+ == "postbootstrap" +][+ (make-dep "" "") +]
> --- Makefile.in.jj2024-01-16 22:51:10.233410651 +0100
> +++ Makefile.in   2024-03-29 15:50:34.676632723 +0100
> @@ -68677,6 +68677,39 @@ configure-flex: stage_last
>  configure-m4: stage_last
>  @endif gcc-bootstrap
>  
> +@if gcc-bootstrap
> +@unless target-zlib-bootstrap
> +configure-target-fastjar: maybe-configure-target-zlib
> +@endunless target-zlib-bootstrap
> +@unless target-zlib-bootstrap
> +all-target-fastjar: maybe-all-target-zlib
> +@endunless target-zlib-bootstrap
> +@unless target-libstdc++-v3-bootstrap
> +configure-target-libgo: maybe-all-target-libstdc++-v3
> +@endunless target-libstdc++-v3-bootstrap
> +@unless target-libbacktrace-bootstrap
> +all-target-libgo: maybe-all-target-libbacktrace
> +@endunless target-libbacktrace-bootstrap
> +@unless target-libatomic-bootstrap
> +all-target-libgo: maybe-all-target-libatomic
> +@endunless target-libatomic-bootstrap
> +@unless target-libstdc++-v3-bootstrap
> +configure-target-libgm2: maybe-all-target-libstdc++-v3
> +@endunless target-libstdc++-v3-bootstrap
> +@unless target-libatomic-bootstrap
> +all-target-libgm2: maybe-all-target-libatomic
> +@endunless target-libatomic-bootstrap
> +@unless target-libstdc++-v3-bootstrap
> +configure-target-libgrust: maybe-all-target-libstdc++-v3
> +@endunless target-libstdc++-v3-bootstrap
> +@unless target-libbacktrace-bootstrap
> +configure-target-libgfortran: maybe-all-target-libbacktrace
> +@endunless target-libbacktrace-bootstrap
> +@unless target-libbacktrace-bootstrap
> +configure-target-libgo: maybe-all-target-libbacktrace
> +@endunless target-libbacktrace-bootstrap
> +@endif gcc-bootstrap
> +
>  @unless gcc-bootstrap
>  all-gnattools: maybe-all-target-libstdc++-v3
>  configure-libcc1: maybe-configure-gcc
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH][Backport][GCC10] Fix SSA corruption due to widening_mul opt on conflict across an abnormal edge [PR111407]

2024-04-02 Thread Richard Biener
On Mon, Apr 1, 2024 at 3:36 PM Qing Zhao  wrote:
>
> This is a bug in tree-ssa-math-opts.c: when applying the widening-mul
> optimization, the compiler needs to check whether the operand is in an
> ABNORMAL PHI; if YES, we should avoid the transformation.
>
> PR tree-optimization/111407
>
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.c (convert_mult_to_widen): Avoid the transform
> when one of the operands is subject to abnormal coalescing.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr111407.c: New test.
>
> (cherry picked from commit 4aca1cfd6235090e48a53dab734437740671bbf3)
>
> Bootstrapped and regression tested on both aarch64 and x86.
>
> Okay for commit to GCC10?

Note the GCC 10 branch is closed.  If the patch bootstraps/tests on the
11, 12 and 13 branches it is OK there.  You do not need approval to
backport fixes for _regressions_ if the patch cherry-picks without major
edits and bootstraps/tests OK.

Thanks,
Richard.

> thanks.
>
> Qing
> ---
>  gcc/testsuite/gcc.dg/pr111407.c | 21 +
>  gcc/tree-ssa-math-opts.c|  8 
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr111407.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr111407.c b/gcc/testsuite/gcc.dg/pr111407.c
> new file mode 100644
> index ..a171074753f9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr111407.c
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/111407*/
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +enum { SEND_TOFILE } __sigsetjmp();
> +void fclose();
> +void foldergets();
> +void sendpart_stats(int *p1, int a1, int b1) {
> + int *a = p1;
> + fclose();
> + p1 = 0;
> + long t = b1;
> + if (__sigsetjmp()) {
> +   {
> + long t1 = a1;
> + a1+=1;
> + fclose(a1*(long)t1);
> +   }
> + }
> + if (p1)
> +   fclose();
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index dd0b8c6f0577..47981da20e05 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2543,6 +2543,14 @@ convert_mult_to_widen (gimple *stmt, 
> gimple_stmt_iterator *gsi)
>   if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
>  return false;
>
> +  /* If any one of rhs1 and rhs2 is subject to abnormal coalescing,
> + avoid the transform. */
> +  if ((TREE_CODE (rhs1) == SSA_NAME
> +   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1))
> +  || (TREE_CODE (rhs2) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs2)))
> +return false;
> +
>to_mode = SCALAR_INT_TYPE_MODE (type);
>from_mode = SCALAR_INT_TYPE_MODE (type1);
>if (to_mode == from_mode)
> --
> 2.31.1
>


Re: [PATCH] libiberty: Invoke D demangler when --format=auto

2024-04-02 Thread Richard Biener
On Sat, Mar 30, 2024 at 9:11 PM Tom Tromey  wrote:
>
> Investigating GDB PR d/31580 showed that the libiberty demangler
> doesn't automatically demangle D mangled names.  However, I think it
> should -- like C++ and Rust (new-style), D mangled names are readily
> distinguished by the leading "_D", and so the likelihood of confusion
> is low.  The other non-"auto" cases in this code are Ada (where the
> encoded form could more easily be confused by ordinary programs) and
> Java (which is long gone, but which also shared the C++ mangling and
> thus was just an output style preference).
>
> This patch also fixed another GDB bug, though of course that part
> won't apply to the GCC repository.

OK.

> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31580
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30276
>
> libiberty
> * cplus-dem.c (cplus_demangle): Try the D demangler with
> "auto" format.
> * testsuite/d-demangle-expected: Add --format=auto test.
> ---
>  gdb/testsuite/gdb.dlang/dlang-start-2.exp | 4 +---
>  libiberty/cplus-dem.c | 2 +-
>  libiberty/testsuite/d-demangle-expected   | 5 +
>  3 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/gdb/testsuite/gdb.dlang/dlang-start-2.exp 
> b/gdb/testsuite/gdb.dlang/dlang-start-2.exp
> index 4b3163ec97d..284f841b54a 100644
> --- a/gdb/testsuite/gdb.dlang/dlang-start-2.exp
> +++ b/gdb/testsuite/gdb.dlang/dlang-start-2.exp
> @@ -79,10 +79,8 @@ if {[gdb_start_cmd] < 0} {
>  return -1
>  }
>
> -# We should probably have "D main" instead of "_Dmain" here, filed PR30276
> -# '[gdb/symtab] function name is _Dmain instead of "D main"' about that.
>  gdb_test "" \
> -"in _Dmain \\(\\)" \
> +"in D main \\(\\)" \
>  "start"
>
>  gdb_test "show language" {"auto; currently d".}
> diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
> index 8b92946981f..ee9e84f5d6b 100644
> --- a/libiberty/cplus-dem.c
> +++ b/libiberty/cplus-dem.c
> @@ -186,7 +186,7 @@ cplus_demangle (const char *mangled, int options)
>if (GNAT_DEMANGLING)
>  return ada_demangle (mangled, options);
>
> -  if (DLANG_DEMANGLING)
> +  if (DLANG_DEMANGLING || AUTO_DEMANGLING)
>  {
>ret = dlang_demangle (mangled, options);
>if (ret)
> diff --git a/libiberty/testsuite/d-demangle-expected 
> b/libiberty/testsuite/d-demangle-expected
> index 47b059c4298..cfbdf2a52cb 100644
> --- a/libiberty/testsuite/d-demangle-expected
> +++ b/libiberty/testsuite/d-demangle-expected
> @@ -1470,3 +1470,8 @@ demangle.anonymous
>  --format=dlang
>  _D8demangle9anonymous03fooZ
>  demangle.anonymous.foo
> +#
> +# Test that 'auto' works.
> +--format=auto
> +_D8demangle9anonymous03fooZ
> +demangle.anonymous.foo
> --
> 2.43.0
>
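As a side note, the kind of prefix test that makes D names safe for
--format=auto can be sketched outside libiberty.  This is a hedged
illustration only: guess_mangling() and the lang enum are made-up names,
not libiberty's real interface, and the real detection (dlang_demangle)
also accepts special names such as "_Dmain".

```c
#include <assert.h>
#include <string.h>

enum lang { LANG_UNKNOWN, LANG_CXX, LANG_RUST, LANG_D };

/* Guess a mangling scheme from its distinctive leading prefix.  */
static enum lang
guess_mangling (const char *sym)
{
  if (strncmp (sym, "_D", 2) == 0)
    return LANG_D;      /* D names lead with "_D".  */
  if (strncmp (sym, "_R", 2) == 0)
    return LANG_RUST;   /* new-style (v0) Rust mangling */
  if (strncmp (sym, "_Z", 2) == 0)
    return LANG_CXX;    /* Itanium C++ ABI mangling */
  return LANG_UNKNOWN;
}
```

Because the "_D" prefix is this distinctive, the likelihood of an
ordinary program's symbol being misclassified is low, which is the
argument the patch makes for trying the D demangler under "auto".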


Re: [PATCH] Prettify output of debug_dwarf_die

2024-04-02 Thread Richard Biener
On Thu, Mar 28, 2024 at 8:35 PM Tom Tromey  wrote:
>
> When debugging gcc, I tried calling debug_dwarf_die and I saw this
> output:
>
>   DW_AT_location: location descriptor:
> (0x7fffe9c2e870) DW_OP_dup 0, 0
> (0x7fffe9c2e8c0) DW_OP_bra location descriptor (0x7fffe9c2e640)
> , 0
> (0x7fffe9c2e820) DW_OP_lit4 4, 0
> (0x7fffe9c2e910) DW_OP_skip location descriptor (0x7fffe9c2e9b0)
> , 0
> (0x7fffe9c2e640) DW_OP_dup 0, 0
>
> I think those ", 0" should not appear on their own lines.  The issue
> seems to be that print_dw_val should not generally emit a newline,
> except when recursing.

OK.

> gcc/ChangeLog
>
> * dwarf2out.cc (print_dw_val) : Don't
> print newline when not recursing.
> ---
>  gcc/dwarf2out.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 8f18bc4fe64..1b0e8b5a5b2 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -6651,7 +6651,7 @@ print_dw_val (dw_val_node *val, bool recurse, FILE 
> *outfile)
>  case dw_val_class_loc:
>fprintf (outfile, "location descriptor");
>if (val->v.val_loc == NULL)
> -   fprintf (outfile, " -> \n");
> +   fprintf (outfile, " -> ");
>else if (recurse)
> {
>   fprintf (outfile, ":\n");
> @@ -6662,9 +6662,9 @@ print_dw_val (dw_val_node *val, bool recurse, FILE 
> *outfile)
>else
> {
>   if (flag_dump_noaddr || flag_dump_unnumbered)
> -   fprintf (outfile, " #\n");
> +   fprintf (outfile, " #");
>   else
> -   fprintf (outfile, " (%p)\n", (void *) val->v.val_loc);
> +   fprintf (outfile, " (%p)", (void *) val->v.val_loc);
> }
>break;
>  case dw_val_class_loc_list:
> --
> 2.43.0
>


Re: [PATCH] middle-end/114480 - IDF compute is slow

2024-03-28 Thread Richard Biener
On Wed, 27 Mar 2024, Michael Matz wrote:

> Hey,
> 
> On Wed, 27 Mar 2024, Jakub Jelinek wrote:
> 
> > > @@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
> > >gcc_checking_assert (bb_index
> > >  < (unsigned) last_basic_block_for_fn (cfun));
> > >  
> > > -  EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], 
> > > phi_insertion_points,
> > > -   0, i, bi)
> > > - {
> > > +  EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
> > > + if (bitmap_set_bit (phi_insertion_points, i))
> > > bitmap_set_bit (work_set, i);
> > > -   bitmap_set_bit (phi_insertion_points, i);
> > > - }
> > >  }
> > 
> > I don't understand why the above is better.
> > Wouldn't it be best to do
> >   bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
> >  phi_insertion_points);
> >   bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);
> > ?
> 
> I had the same hunch, but:
> 
> 1) One would have to make work_set be non-tree-view again (which with the 
> current structure is a wash anyway, and that makes sense as accesses to 
> work_set aren't heavily random here).

The tree-view is a wash indeed (I tried many things).

> 2) But doing that and using bitmap_ior.._into is still measurably slower: 
> on a reduced testcase with -O0 -fno-checking, proposed structure 
> (tree-view or not-tree-view workset doesn't matter):
> 
>  tree SSA rewrite   :  14.93 ( 12%)   0.01 (  2%)  14.95 ( 
> 12%)27M (  8%)
> 
> with non-tree-view, and your suggestion:
> 
>  tree SSA rewrite   :  20.68 ( 12%)   0.02 (  4%)  20.75 ( 
> 12%)27M (  8%)
> 
> I can only speculate that the usually extreme sparsity of the bitmaps in 
> question makes the setup costs of the two bitmap_ior calls actually more 
> expensive than the often-skipped second call to bitmap_set_bit in Richi's 
> proposed structure.  (That or cache effects)

So slightly "better" than Jakub's variant would be

  if (bitmap_ior_and_compl_into (work_set, &dfs[bb_index],
 phi_insertion_points))
    bitmap_ior_into (phi_insertion_points, &dfs[bb_index]);

since phi_insertion_points grows that IOR becomes more expensive over 
time.

The above for me (today Zen2, yesterday Zen4) is

 tree SSA rewrite   : 181.02 ( 37%) 

with unconditional ior_into:

 tree SSA rewrite   : 180.93 ( 36%)

while my patch is

 tree SSA rewrite   :  22.04 (  6%)

not sure what uarch Micha tested on.  I think the testcase simply has
many variables we write into SSA (many compute_idf calls), many BBs,
but very low popcount DFS[], so iterating over DFS[] only is very
beneficial here as opposed to also walking phi_insertion_points
and work_set.  I think low popcount DFS[] is quite typical
for a CFG - but for sure the popcount of DFS[] is going to be lower
than the popcount of the IDF (phi_insertion_points).

Btw, with my patch compute_idf is completely off the profile, so it's
hard to improve further (we do process blocks possibly twice, for
example, but that doesn't make a difference here).

Indeed doing statistics shows the maximum popcount of a dominance
frontier is 8 but 99% have just a single block.  But the popcount
of the final IDF is more than 1 half of the time and more
than 1000 90% of the time.

I have pushed the patch now.

Richard.


Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener



> On 27.03.2024 at 18:37, Peter0x44  wrote:
> 
> 
>> 
>>> >> > Another way would be to have a portable solution to truncate a file
>>> >> > (maybe even removing it would work).  I don't think we should override
>>> >> > SHELL.
> I've been thinking harder about this, these files get unlinked at the end if 
> they are temporary.
> Is there no way to get make to communicate back this info so lto-wrapper can 
> just unlink the file on its own? It feels easier and less invasive than 
> introducing a whole new option to the driver for only that purpose. If there 
> is some way, I'm not aware of it.

The point is we want to do this earlier than when control gets back to 
LTO-wrapper to reduce the amount of disk space required.

Richard.
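For reference, the truncation the generated Makefile performs with
`touch -r file file.tmp && mv file.tmp file` can also be done without
any shell.  A minimal sketch, not lto-wrapper's actual code: the
truncate_file() helper is hypothetical, and it assumes `_chsize` on
Windows and POSIX `truncate` elsewhere.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#include <sys/stat.h>
#else
#include <unistd.h>
#endif

/* Empty a file in place to release its disk space early, with no
   POSIX shell involved.  Returns 0 on success, -1 on failure.  */
static int
truncate_file (const char *path)
{
#ifdef _WIN32
  int fd = _open (path, _O_WRONLY, _S_IREAD | _S_IWRITE);
  if (fd < 0)
    return -1;
  int ret = _chsize (fd, 0);
  _close (fd);
  return ret;
#else
  return truncate (path, 0); /* POSIX; timestamps are left to the OS */
#endif
}
```

A helper like this, invoked from the LTRANS compile itself, would drop
the shell dependency while still freeing the space as early as possible.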

[PATCH] middle-end/114480 - IDF compute is slow

2024-03-27 Thread Richard Biener
The testcase in this PR shows very slow IDF compute:

  tree SSA rewrite   :  76.99 ( 31%)
  24.78%243663  cc1plus  cc1plus [.] compute_idf

which can be mitigated to some extent by refactoring the bitmap
operations to simpler variants.  With the patch below this becomes

  tree SSA rewrite   :  15.23 (  8%)

when not optimizing and in addition to that

  tree SSA incremental   : 181.52 ( 30%)

to

  tree SSA incremental   :  24.09 (  6%)

when optimizing.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK if that succeeds?

Thanks,
Richard.

PR middle-end/114480
* cfganal.cc (compute_idf): Use simpler bitmap iteration,
touch work_set only when phi_insertion_points changed.
---
 gcc/cfganal.cc | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/gcc/cfganal.cc b/gcc/cfganal.cc
index 432775decf1..5ef629f677e 100644
--- a/gcc/cfganal.cc
+++ b/gcc/cfganal.cc
@@ -1701,8 +1701,7 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
  on earlier blocks first is better.
 ???  Basic blocks are by no means guaranteed to be ordered in
 optimal order for this iteration.  */
-  bb_index = bitmap_first_set_bit (work_set);
-  bitmap_clear_bit (work_set, bb_index);
+  bb_index = bitmap_clear_first_set_bit (work_set);
 
   /* Since the registration of NEW -> OLD name mappings is done
 separately from the call to update_ssa, when updating the SSA
@@ -1712,12 +1711,9 @@ compute_idf (bitmap def_blocks, bitmap_head *dfs)
   gcc_checking_assert (bb_index
   < (unsigned) last_basic_block_for_fn (cfun));
 
-  EXECUTE_IF_AND_COMPL_IN_BITMAP (&dfs[bb_index], phi_insertion_points,
- 0, i, bi)
-   {
+  EXECUTE_IF_SET_IN_BITMAP (&dfs[bb_index], 0, i, bi)
+   if (bitmap_set_bit (phi_insertion_points, i))
  bitmap_set_bit (work_set, i);
- bitmap_set_bit (phi_insertion_points, i);
-   }
 }
 
   return phi_insertion_points;
-- 
2.35.3
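The shape of the loop after this change can be sketched in miniature.
This is an illustration only: plain uint64_t bitsets stand in for GCC's
sparse bitmaps, and toy_df is a made-up dominance-frontier table, so it
models the control flow, not the real data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 8

/* Returns nonzero iff BIT was newly set (like GCC's bitmap_set_bit).  */
static int
set_bit (uint64_t *set, int bit)
{
  uint64_t mask = UINT64_C (1) << bit;
  int was_clear = (*set & mask) == 0;
  *set |= mask;
  return was_clear;
}

/* Miniature compute_idf: pull a block from the work set, walk only its
   dominance frontier, and push a block onward only the first time it
   enters phi_insertion_points.  */
static uint64_t
model_compute_idf (uint64_t def_blocks, const uint64_t df[NBLOCKS])
{
  uint64_t work_set = def_blocks;
  uint64_t phi_insertion_points = 0;

  while (work_set)
    {
      int bb = __builtin_ctzll (work_set); /* first set bit ...  */
      work_set &= work_set - 1;            /* ... and clear it */
      for (int i = 0; i < NBLOCKS; i++)
        if ((df[bb] >> i) & 1)
          /* Touch work_set only when phi_insertion_points changed.  */
          if (set_bit (&phi_insertion_points, i))
            set_bit (&work_set, i);
    }
  return phi_insertion_points;
}

/* Toy CFG: DF(1) = DF(2) = {4}, DF(4) = {1}.  */
static const uint64_t toy_df[NBLOCKS]
  = { 0, UINT64_C (1) << 4, UINT64_C (1) << 4, 0, UINT64_C (1) << 1,
      0, 0, 0 };
```

With defs in blocks 1 and 2 the loop converges on the IDF {1, 4}, and
each frontier bit is inspected just once per insertion attempt, which is
the point of replacing the AND_COMPL iteration.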


Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Richard Biener
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> On Wed, Mar 27, 2024 at 12:48:29PM +0100, Richard Biener wrote:
> > > The following patch attempts to fix the (view_convert (convert@0 @1))
> > > optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> > > and @0 has the same precision as @1 and it has a different sign
> > > and _BitInt with padding bits are extended on the target (x86 doesn't,
> > > aarch64 doesn't, but arm plans to do that), then optimizing it to
> > > just (view_convert @1) is wrong, the padding bits wouldn't be what
> > > it should be.
> > > E.g. bitint-64.c test with -O2 has
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _9 = (_BitInt(5)) _8;
> > >   _10 = VIEW_CONVERT_EXPR(_9);
> > > and forwprop1 changes that to just
> > >   _5 = (unsigned _BitInt(5)) _4;
> > >   _7 = (unsigned _BitInt(5)) e.0_1;
> > >   _8 = _5 + _7;
> > >   _10 = VIEW_CONVERT_EXPR(_8);
> > > The former makes the padding bits well defined (at least on arm in
> > > the future), while the latter has those bits zero extended.
> > 
> > So I think the existing rule, when extended to
> > 
> >|| (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
> > (@1))
> >&& TYPE_UNSIGNED (TREE_TYPE (@1))
> > 
> > assumes padding is extended according to the signedness.  But the
> > original
> > 
> >&& (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
> > (@1))
> > 
> > rule has the padding undefined.  I think that's inconsistent at least.
> 
> The whole simplification is just weird, all it tests are the precisions
> of TREE_TYPE (@0) and TREE_TYPE (@1), but doesn't actually test if there
> are padding bits (i.e. the TYPE_SIZE (type) vs. those precisions).
> 
> I've tried to construct a non-_BitInt testcase with
> struct S { signed long long a : 37; unsigned long long b : 37; } s;
> 
> unsigned long long
> foo (long long x, long long y)
> {
>   __typeof (s.a + 0) c = x, d = y;
>   __typeof (s.b + 0) e = (__typeof (s.b + 0)) (c + d);
>   union U { __typeof (s.b + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> unsigned long long
> bar (unsigned long long x, unsigned long long y)
> {
>   __typeof (s.b + 0) c = x, d = y;
>   __typeof (s.a + 0) e = (__typeof (s.a + 0)) (c + d);
>   union U { __typeof (s.a + 0) f; unsigned long long g; } u;
>   u.f = e;
>   return u.g;
> }
> 
> int
> main ()
> {
>   unsigned long long x = foo (-53245722422LL, -5719971797LL);
>   unsigned long long y = bar (136917222900, 5719971797LL);
>   __builtin_printf ("%016llx %016llx\n", x, y);
>   return 0;
> }
> which seems to print
> 0012455ee8f5 000135d6ddc9
> at -O0 and
> fff2455ee8f5 000135d6ddc9
> at -O2/-O3, dunno if in that case we consider those padding bits
> uninitialized/anything can happen, then it is fine, or something else.
> 
> > I'll note that we do not constrain 'type' so the V_C_E could
> > re-interpret the integer (with or without padding) as floating-point.
> 
> Anyway, if we have the partial int types with undefined padding bits
> (the above __typeof of a bitfield type or _BitInt on x86/aarch64),
> I guess the precision0 == precision1 case is ok, we have
> say 3 padding bits before/after, a valid program better should mask
> away those bits or it will be UB.
> If precision0 == precision1 and type1 has the extended padding bits
> and type0 has undefined padding bits, then I guess it is also ok.
> Other cases of same precision are not ok, turning well defined bits
> into undefined or changing sign extended to zero extended or vice versa.
> 
> For the precision0 > precision1 && unsigned1 case on the other side,
> I don't see how it is ever correct for the undefined padding bits cases,
> the padding bits above type0 are undefined before/after the simplification,
> but the bits in between are changed from previously zero to undefined.
> 
> For the precision0 > precision1 cases if both type0 and type1 are
> extended like _BitInt on arm, then the simplification is reasonable,
> all the padding bits above type0 are zeros and all the padding bits between
> type1 and type0 precision are zeros too, before and after.
> 
> So, indeed my patch isn't correct.
> 
> And because we don't have the _BitInt arm support for GCC 14, perhaps
> we could defer that part for GCC 15, though I wonder if we shouldn't just
> kill that TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISIO

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 1:20 PM Xi Ruoyao  wrote:
>
> On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> > On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
> > >
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > >
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > >
> > > int
> > > main (void)
> > > {
> > >   unsigned long stat = 0xdeadbeef;
> > >   for (int i = 0; i < 1; i++)
> > > stat = (stat * stat + stat * 114514 + 1919810) % 17;
> > >   asm(""::"r"(stat));
> > > }
> >
> > I think you should be able to see a constant divisor and thus could do
> > better than return the same latency for everything.  For non-constant
> > divisors using the best-case latency shouldn't be a problem.
>
> Hmm, it seems not really possible as at now.  expand_divmod does
> something like:
>
>   max_cost = (unsignedp
>   ? udiv_cost (speed, compute_mode)
>   : sdiv_cost (speed, compute_mode));
>
> which is reading the pre-calculated costs from a table.  Thus we don't
> really know the denominator and cannot estimate the cost based on it :(.

Ah, too bad.  OTOH for the actual case it decomposes, it could compute
the real cost, avoiding the table, which is filled with reg-reg operations only.

> CSE really invokes the cost hook with the actual (mod (a, (const_int
> 17)) RTX but it's less important.
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
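The "multiplication to reciprocal sequence" this cost change re-enables
replaces a division by a constant with a widening multiply plus a shift.
A sketch for unsigned 32-bit division by 17 (matching the % 17 in the
quoted test case): the magic pair 0xF0F0F0F1 / shift 36 is the standard
one from divide-by-constant tables, not taken from GCC's output, and
this mirrors what expand_divmod can emit rather than its actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Strength-reduced n / 17: multiply by a precomputed reciprocal and
   shift.  0xF0F0F0F1 == ceil(2^36 / 17); the product fits in 64 bits. */
static uint32_t
udiv17 (uint32_t n)
{
  return (uint32_t) (((uint64_t) n * UINT64_C (0xF0F0F0F1)) >> 36);
}
```

This sequence only wins if the costed division latency exceeds the
multiply-plus-shift cost, which is why under-reporting the division
latency (the best-case value from r14-6642) suppressed the transform.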


[PATCH] tree-optimization/114057 - handle BB reduction remain defs as LIVE

2024-03-27 Thread Richard Biener
The following makes sure to record the scalars we add to the BB
reduction vectorization result as scalar uses for the purpose of
computing live lanes.  This restores vectorization in the
bondfree.c TU of 435.gromacs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114057
* tree-vect-slp.cc (vect_bb_slp_mark_live_stmts): Mark
BB reduction remain defs as scalar uses.
---
 gcc/tree-vect-slp.cc | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index adb2d9ae1e5..2e5481acbc7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6665,8 +6665,14 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo)
   auto_vec worklist;
 
   for (slp_instance instance : bb_vinfo->slp_instances)
-if (!visited.add (SLP_INSTANCE_TREE (instance)))
-  worklist.safe_push (SLP_INSTANCE_TREE (instance));
+{
+  if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc)
+   for (tree op : SLP_INSTANCE_REMAIN_DEFS (instance))
+ if (TREE_CODE (op) == SSA_NAME)
+   scalar_use_map.put (op, 1);
+  if (!visited.add (SLP_INSTANCE_TREE (instance)))
+   worklist.safe_push (SLP_INSTANCE_TREE (instance));
+}
 
   do
 {
@@ -6684,7 +6690,8 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo)
if (child && !visited.add (child))
  worklist.safe_push (child);
}
-} while (!worklist.is_empty ());
+}
+  while (!worklist.is_empty ());
 
   visited.empty ();
 
-- 
2.35.3


Re: [PATCH] match.pd: Avoid (view_convert (convert@0 @1)) optimization for extended _BitInts with padding bits [PR114469]

2024-03-27 Thread Richard Biener
On Wed, 27 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch attempts to fix the (view_convert (convert@0 @1))
> optimization.  If TREE_TYPE (@0) is a _BitInt type with padding bits
> and @0 has the same precision as @1 and it has a different sign
> and _BitInt with padding bits are extended on the target (x86 doesn't,
> aarch64 doesn't, but arm plans to do that), then optimizing it to
> just (view_convert @1) is wrong, the padding bits wouldn't be what
> it should be.
> E.g. bitint-64.c test with -O2 has
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _9 = (_BitInt(5)) _8;
>   _10 = VIEW_CONVERT_EXPR(_9);
> and forwprop1 changes that to just
>   _5 = (unsigned _BitInt(5)) _4;
>   _7 = (unsigned _BitInt(5)) e.0_1;
>   _8 = _5 + _7;
>   _10 = VIEW_CONVERT_EXPR(_8);
> The former makes the padding bits well defined (at least on arm in
> the future), while the latter has those bits zero extended.

So I think the existing rule, when extended to

   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
(@1))
   && TYPE_UNSIGNED (TREE_TYPE (@1))

assumes padding is extended according to the signedness.  But the
original

   && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE 
(@1))

rule has the padding undefined.  I think that's inconsistent at least.
I'll note that we do not constrain 'type' so the V_C_E could
re-interpret the integer (with or without padding) as floating-point.

Given the previous

/* For integral conversions with the same precision or pointer
   conversions use a NOP_EXPR instead.  */
(simplify 
  (view_convert @0)

should match first we should only ever see non-integer/pointer here
or cases where the V_C_E looks at padding.  IMO we should change
this to

   && ((TYPE_PRECISION (TREE_TYPE (@0)) >= TYPE_PRECISION (TREE_TYPE 
(@1))
&& TYPE_SIGN (TREE_TYPE (@0)) == TYPE_SIGN (TREE_TYPE (@1
   || (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE 
(@1))
   && TYPE_UNSIGNED (TREE_TYPE (@1

?  Having this inconsistent treatment of padding is bad.

In your case the precision is equal so this should fix it, right?
Not sure if it's worth adding a INTEGERAL_TYPE_P (type) case with
TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1)), see the
previous pattern.


> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-03-26  Jakub Jelinek  
> 
>   PR tree-optimization/114469
>   * match.pd ((view_convert (convert@0 @1))): Don't optimize if
>   TREE_TYPE (@0) is a BITINT_TYPE with padding bits which are supposed
>   to be extended by the ABI.
> 
> --- gcc/match.pd.jj   2024-03-15 11:04:24.672914747 +0100
> +++ gcc/match.pd  2024-03-26 15:49:44.177864509 +0100
> @@ -4699,13 +4699,38 @@ (define_operator_list SYNC_FETCH_AND_AND
> zero-extend while keeping the same size (for bool-to-char).  */
>  (simplify
>(view_convert (convert@0 @1))
> -  (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> -   && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE 
> (@1)))
> -   && TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))
> -   && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> -|| (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (TREE_TYPE (@1))
> -&& TYPE_UNSIGNED (TREE_TYPE (@1)
> -   (view_convert @1)))
> +  (with { tree type0 = TREE_TYPE (@0);
> +   tree type1 = TREE_TYPE (@1);
> +   bool ok = false;
> +   if ((INTEGRAL_TYPE_P (type0) || POINTER_TYPE_P (type0))
> +   && (INTEGRAL_TYPE_P (type1) || POINTER_TYPE_P (type1))
> +   && TYPE_SIZE (type0) == TYPE_SIZE (type1))
> + {
> +   if (TYPE_PRECISION (type0) == TYPE_PRECISION (type1))
> + {
> +   if (TREE_CODE (type0) != BITINT_TYPE)
> + ok = true;
> +   else
> + {
> +   /* Avoid optimizing this if type0 is a _BitInt
> +  type with padding bits which are supposed to be
> +  extended.  */
> +   struct bitint_info info;
> +   targetm.c.bitint_type_info (TYPE_PRECISION (type0),
> +   );
> +   if (!info.extended)
> + ok = true;
> +   else
> + ok = (TYPE_PRECISION (type0)
> +   == tree_to_uhwi (TYPE_SIZE (type0)));
> + }
> + }
> +   else if (TYPE_PRECISION (type0) &

Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 10:13 AM peter0x44  wrote:
>
> On 2024-03-27 01:58, Richard Biener wrote:
> > On Wed, Mar 27, 2024 at 9:13 AM Peter0x44 
> > wrote:
> >>
> >> I accidentally replied off-list. Sorry.
> >>
> >> 27 Mar 2024 8:09:30 am Peter0x44 :
> >>
> >>
> >> 27 Mar 2024 7:51:26 am Richard Biener :
> >>
> >> > On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov 
> >> > wrote:
> >> >>
> >> >> lto-wrapper generates Makefiles that use the following:
> >> >> touch -r file file.tmp && mv file.tmp file
> >> >> to truncate files.
> >> >> If there is no suitable "touch" or "mv" available, then this errors
> >> >> with
> >> >> "The system cannot find the file specified".
> >> >>
> >> >> The solution here is to check if sh -c true works, before trying to
> >> >> use it in
> >> >> the Makefile. If it doesn't, then fall back to "copy /y nul file"
> >> >> instead.
> >> >> The fallback doesn't work exactly the same (the modified time of the
> >> >> file gets
> >> >> updated), but this doesn't seem to matter in practice.
> >> >
> >> > I suppose it doesn't matter as we (no longer?) have the input as
> >> > dependency
> >> > on the rule so make doesn't get confused to re-build it.  I guess we
> >> > only truncate
> >> > the file because it's going to be deleted by another process.
> >> >
> >> > Instead of doing sth like sh_exists I would suggest to simply use
> >> > #ifdef __WIN
> >> > or something like that?  Not sure if we have suitable macros to
> >> > identify the
> >> > host operating system though and whether mingw, cygwin, etc. behave the
> >> > same
> >> > here.
> >>
> >> They do, I tested. Using sh_exists is deliberate, I've had to program
> >> on
> >> school computers that had cmd.exe disabled, but had busybox-w32
> >> working,
> >> so it might be more flexible in that way. I would prefer a solution
> >> which
> >> didn't require invoking cmd.exe if there is a working shell present.
> >
> > Hmm, but then I'd expect SHELL to be appropriately set in such
> > situation?  So shouldn't sh_exists at least try to look at $SHELL?
> I'm not sure it would.  On Windows, make searches the PATH for an sh.exe
> if present, and then uses it by default.  The relevant code is here:
> https://git.savannah.gnu.org/cgit/make.git/tree/src/variable.c#n1628
> >
> >> I figured doing the "sh_exists" is okay because there is a basically
> >> identical check for make.
> >>
> >> >
> >> > As a stop-gap solution doing
> >> >
> >> >   ( @-touch -r ... ) || true
> >> >
> >> > might also work?  Or another way to note to make the command can fail
> >> > without causing a problem.
> >>
> >> I don't think it would work. cmd.exe can't run subshells like this.
> >
> > Hmm, OK.  So this is all for the case where 'make' is available (as you
> > say we check for that) but no POSIX command environment is
> > (IIRC both touch and mv are part of  POSIX).  Testing for 'touch' and
> > 'mv' would then be another option?
> I think it would work, but keep in mind they could be placed on the PATH
> but
> still invoked from cmd.exe, so you might have to be careful of the
> redirection
> syntax and no /dev/null. It's a bit more complexity that doesn't seem
> necessary to me.
> >
> >> >
> >> > Another way would be to have a portable solution to truncate a file
> >> > (maybe even removing it would work).  I don't think we should override
> >> > SHELL.
> >>
> >> Do you mean that perhaps an special command line argument could be
> >> added
> >> to lto-wrapper to do it, and then the makefile could invoke
> >> lto-wrapper
> >> to remove or truncate files instead of a shell? I'm not sure I get the
> >> proposed suggestion.
> >
> > The point is to truncate the file at the earliest point to reduce the
> > peak disk space required for a LTO link with many LTRANS units.
> > But yes, I guess it would be possible to add a flag to gcc itself
> > so the LTRANS compile would truncate the file.  Currently we
> > emit sth like
> >
> > ./libquantum.ltrans0.ltrans.o:
> >  

Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Wed, Mar 27, 2024 at 9:13 AM Peter0x44  wrote:
>
> I accidentally replied off-list. Sorry.
>
> 27 Mar 2024 8:09:30 am Peter0x44 :
>
>
> 27 Mar 2024 7:51:26 am Richard Biener :
>
> > On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov 
> > wrote:
> >>
> >> lto-wrapper generates Makefiles that use the following:
> >> touch -r file file.tmp && mv file.tmp file
> >> to truncate files.
> >> If there is no suitable "touch" or "mv" available, then this errors
> >> with
> >> "The system cannot find the file specified".
> >>
> >> The solution here is to check if sh -c true works, before trying to
> >> use it in
> >> the Makefile. If it doesn't, then fall back to "copy /y nul file"
> >> instead.
> >> The fallback doesn't work exactly the same (the modified time of the
> >> file gets
> >> updated), but this doesn't seem to matter in practice.
> >
> > I suppose it doesn't matter as we (no longer?) have the input as
> > dependency
> > on the rule so make doesn't get confused to re-build it.  I guess we
> > only truncate
> > the file because it's going to be deleted by another process.
> >
> > Instead of doing sth like sh_exists I would suggest to simply use
> > #ifdef __WIN
> > or something like that?  Not sure if we have suitable macros to
> > identify the
> > host operating system though and whether mingw, cygwin, etc. behave the
> > same
> > here.
>
> They do, I tested. Using sh_exists is deliberate, I've had to program on
> school computers that had cmd.exe disabled, but had busybox-w32 working,
> so it might be more flexible in that way. I would prefer a solution which
> didn't require invoking cmd.exe if there is a working shell present.

Hmm, but then I'd expect SHELL to be appropriately set in such
situation?  So shouldn't sh_exists at least try to look at $SHELL?

> I figured doing the "sh_exists" is okay because there is a basically
> identical check for make.
>
> >
> > As a stop-gap solution doing
> >
> >   ( @-touch -r ... ) || true
> >
> > might also work?  Or another way to note to make that the command can fail
> > without causing a problem.
>
> I don't think it would work. cmd.exe can't run subshells like this.

Hmm, OK.  So this is all for the case where 'make' is available (as you
say we check for that) but no POSIX command environment is
(IIRC both touch and mv are part of  POSIX).  Testing for 'touch' and
'mv' would then be another option?

> >
> > Another way would be to have a portable solution to truncate a file
> > (maybe even removing it would work).  I don't think we should override
> > SHELL.
>
> Do you mean that perhaps a special command-line argument could be added
> to lto-wrapper to do it, and then the makefile could invoke lto-wrapper
> to remove or truncate files instead of a shell? I'm not sure I get the
> proposed suggestion.

The point is to truncate the file at the earliest point to reduce the
peak disk space required for a LTO link with many LTRANS units.
But yes, I guess it would be possible to add a flag to gcc itself
so the LTRANS compile would truncate the file.  Currently we
emit sth like

./libquantum.ltrans0.ltrans.o:
@/space/rguenther/install/trunk-r14-8925/usr/local/bin/gcc
'-xlto' '-c' '-fno-openmp' '-fno-openacc' '-fno-pie'
'-fcf-protection=none' '-g' '-mtune=generic' '-march=x86-64' '-O2'
'-O2' '-g' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
'-dumpdir' 'libquantum.' '-dumpbase' './libquantum.ltrans0.ltrans'
'-fltrans' '-o' './libquantum.ltrans0.ltrans.o'
'./libquantum.ltrans0.o'

so adding a '-truncate-input' flag for the driver or even more
explicit '-truncate ./libquantum.ltrans0.o'
might be a more elegant solution then?

Richard.

> >
> > Richard.
> >
> >> I tested this both in environments both with and without sh present,
> >> and
> >> observed no issues.
> >>
> >> Signed-off-by: Peter Damianov 
> >> ---
> >> gcc/lto-wrapper.cc | 35 ---
> >> 1 file changed, 32 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> >> index 5186d040ce0..8dee0aaa2d8 100644
> >> --- a/gcc/lto-wrapper.cc
> >> +++ b/gcc/lto-wrapper.cc
> >> @@ -1389,6 +1389,27 @@ make_exists (void)
> >>return errmsg == NULL && exit_status == 0 && err == 0;
> >> }
> >>
> >> +/* Test that an sh command is present and working, return true if so.
> >> +   This is only relevant for windows hosts, where a /bin/sh shell cannot
> >> +   be assumed to exist. */

Re: [PATCH] LoongArch: Increase division costs

2024-03-27 Thread Richard Biener
On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao  wrote:
>
> The latency of LA464 and LA664 division instructions depends on the
> input.  When I updated the costs in r14-6642, I unintentionally set the
> division costs to the best-case latency (when the first operand is 0).
> Per a recent discussion [1] we should use "something sensible" instead
> of it.
>
> Use the average of the minimum and maximum latency observed instead.
> This enables reducing the division to a multiply-by-reciprocal sequence
> and speeds up the following test case by about 30%:
>
> int
> main (void)
> {
>   unsigned long stat = 0xdeadbeef;
>   for (int i = 0; i < 1; i++)
> stat = (stat * stat + stat * 114514 + 1919810) % 17;
>   asm(""::"r"(stat));
> }

I think you should be able to see a constant divisor and thus could do
better than return the same latency for everything.  For non-constant
divisors using the best-case latency shouldn't be a problem.

> [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
>
> gcc/ChangeLog:
>
> * config/loongarch/loongarch-def.cc
> (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Increase
> default division cost to the average of the best case and worst
> case scenarios observed.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/loongarch/div-const-reduction.c: New test.
> ---
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
>  gcc/config/loongarch/loongarch-def.cc| 8 
>  gcc/testsuite/gcc.target/loongarch/div-const-reduction.c | 9 +
>  2 files changed, 13 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
>
> diff --git a/gcc/config/loongarch/loongarch-def.cc 
> b/gcc/config/loongarch/loongarch-def.cc
> index e8c129ce643..93e72a520d5 100644
> --- a/gcc/config/loongarch/loongarch-def.cc
> +++ b/gcc/config/loongarch/loongarch-def.cc
> @@ -95,12 +95,12 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
>: fp_add (COSTS_N_INSNS (5)),
>  fp_mult_sf (COSTS_N_INSNS (5)),
>  fp_mult_df (COSTS_N_INSNS (5)),
> -fp_div_sf (COSTS_N_INSNS (8)),
> -fp_div_df (COSTS_N_INSNS (8)),
> +fp_div_sf (COSTS_N_INSNS (12)),
> +fp_div_df (COSTS_N_INSNS (15)),
>  int_mult_si (COSTS_N_INSNS (4)),
>  int_mult_di (COSTS_N_INSNS (4)),
> -int_div_si (COSTS_N_INSNS (5)),
> -int_div_di (COSTS_N_INSNS (5)),
> +int_div_si (COSTS_N_INSNS (14)),
> +int_div_di (COSTS_N_INSNS (22)),
>  movcf2gr (COSTS_N_INSNS (7)),
>  movgr2cf (COSTS_N_INSNS (15)),
>  branch_cost (6),
> diff --git a/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c 
> b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
> new file mode 100644
> index 000..0ee86410dd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/div-const-reduction.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=la464" } */
> +/* { dg-final { scan-assembler-not "div\.\[dw\]" } } */
> +
> +int
> +test (int a)
> +{
> +  return a % 17;
> +}
> --
> 2.44.0
>


Re: [PATCH] lto: Don't assume a posix shell is usable on windows [PR110710]

2024-03-27 Thread Richard Biener
On Tue, Mar 26, 2024 at 11:37 PM Peter Damianov  wrote:
>
> lto-wrapper generates Makefiles that use the following:
> touch -r file file.tmp && mv file.tmp file
> to truncate files.
> If there is no suitable "touch" or "mv" available, then this errors with
> "The system cannot find the file specified".
>
> The solution here is to check if sh -c true works, before trying to use it in
> the Makefile. If it doesn't, then fall back to "copy /y nul file" instead.
> The fallback doesn't work exactly the same (the modified time of the file gets
> updated), but this doesn't seem to matter in practice.

I suppose it doesn't matter as we (no longer?) have the input as dependency
on the rule so make doesn't get confused to re-build it.  I guess we
only truncate
the file because it's going to be deleted by another process.

Instead of doing sth like sh_exists I would suggest to simply use #ifdef __WIN
or something like that?  Not sure if we have suitable macros to identify the
host operating system though and whether mingw, cygwin, etc. behave the same
here.

As a stop-gap solution doing

  ( @-touch -r ... ) || true

might also work?  Or another way to note to make the command can fail
without causing a problem.

Another way would be to have a portable solution to truncate a file
(maybe even removing it would work).  I don't think we should override
SHELL.

Richard.

> I tested this both in environments both with and without sh present, and
> observed no issues.
>
> Signed-off-by: Peter Damianov 
> ---
>  gcc/lto-wrapper.cc | 35 ---
>  1 file changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index 5186d040ce0..8dee0aaa2d8 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1389,6 +1389,27 @@ make_exists (void)
>return errmsg == NULL && exit_status == 0 && err == 0;
>  }
>
> +/* Test that an sh command is present and working, return true if so.
> +   This is only relevant for windows hosts, where a /bin/sh shell cannot
> +   be assumed to exist. */
> +
> +static bool
> +sh_exists (void)
> +{
> +  const char *sh = "sh";
> +  const char *sh_args[] = {sh, "-c", "true", NULL};
> +#ifdef _WIN32
> +  int exit_status = 0;
> +  int err = 0;
> +  const char *errmsg
> += pex_one (PEX_SEARCH, sh_args[0], CONST_CAST (char **, sh_args),
> +  "sh", NULL, NULL, &exit_status, &err);
> +  return errmsg == NULL && exit_status == 0 && err == 0;
> +#else
> +  return true;
> +#endif
> +}
> +
>  /* Execute gcc. ARGC is the number of arguments. ARGV contains the 
> arguments. */
>
>  static void
> @@ -1402,6 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
>const char *collect_gcc;
>char *collect_gcc_options;
>int parallel = 0;
> +  bool have_sh = sh_exists ();
>int jobserver = 0;
>bool jobserver_requested = false;
>int auto_parallel = 0;
> @@ -2016,6 +2038,7 @@ cont:
>   argv_ptr[5] = NULL;
>   if (parallel)
> {
> + fprintf (mstream, "SHELL=%s\n", have_sh ? "sh" : "cmd");
>   fprintf (mstream, "%s:\n\t@%s ", output_name, new_argv[0]);
>   for (j = 1; new_argv[j] != NULL; ++j)
> fprintf (mstream, " '%s'", new_argv[j]);
> @@ -2024,9 +2047,15 @@ cont:
>  truncate them as soon as we have processed it.  This
>  reduces temporary disk-space usage.  */
>   if (! save_temps)
> -   fprintf (mstream, "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null "
> -"2>&1 && mv \"%s.tem\" \"%s\"\n",
> -input_name, input_name, input_name, input_name);
> +   {
> + fprintf (mstream,
> +  have_sh
> +  ? "\t@-touch -r \"%s\" \"%s.tem\" > /dev/null "
> +"2>&1 && mv \"%s.tem\" \"%s\"\n"
> +  : "\t@-copy /y nul \"%s\" > NUL "
> +"2>&1\n",
> +  input_name, input_name, input_name, input_name);
> +   }
> }
>   else
> {
> --
> 2.39.2
>


Re: [PATCH] fold-const: Punt on MULT_EXPR in extract_muldiv MIN/MAX_EXPR case [PR111151]

2024-03-26 Thread Richard Biener
On Tue, 26 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> As I've tried to explain in the comments, the extract_muldiv_1
> MIN/MAX_EXPR optimization is wrong for code == MULT_EXPR.
> If the multiplication is done in unsigned type or in signed
> type with -fwrapv, it is fairly obvious that max (a, b) * c
> in many cases isn't equivalent to max (a * c, b * c) (or min if c is
> negative) due to overflows, but even for signed with undefined overflow,
> the optimization could turn something without UB in it (where
> say a * c invokes UB, but max (or min) picks the other operand where
> b * c doesn't).
> As for division/modulo, I think it is in most cases safe, except if
> the problematic INT_MIN / -1 case could be triggered, but we can
> just punt for MAX_EXPR because for MIN_EXPR if one operand is INT_MIN,
> we'd pick that operand already.  It is just for completeness, match.pd
> already has an optimization which turns x / -1 into -x, so the division
> by zero is mostly theoretical.  That is also why in the testcase the
> i case isn't actually miscompiled without the patch, while the c and f
> cases are.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
> bootstrapped/regtested on x86_64-linux with statistics gathering when
> the patch changes behavior and it is solely on the new testcase and
> nothing else during the bootstrap/regtest.  Ok for trunk?

OK.

Thanks,
Richard.

> 2024-03-26  Jakub Jelinek  
> 
>   PR middle-end/111151
>   * fold-const.cc (extract_muldiv_1) : Punt for
>   MULT_EXPR altogether, or for MAX_EXPR if c is -1.
> 
>   * gcc.c-torture/execute/pr111151.c: New test.
> 
> --- gcc/fold-const.cc.jj  2024-03-11 09:42:04.544588951 +0100
> +++ gcc/fold-const.cc 2024-03-25 11:48:12.133625285 +0100
> @@ -7104,6 +7104,27 @@ extract_muldiv_1 (tree t, tree c, enum t
>if (TYPE_UNSIGNED (ctype) != TYPE_UNSIGNED (type))
>   break;
>  
> +  /* Punt for multiplication altogether.
> +  MAX (1U + INT_MAX, 1U) * 2U is not equivalent to
> +  MAX ((1U + INT_MAX) * 2U, 1U * 2U), the former is
> +  0U, the latter is 2U.
> +  MAX (INT_MIN / 2, 0) * -2 is not equivalent to
> +  MIN (INT_MIN / 2 * -2, 0 * -2), the former is
> +  well defined 0, the latter invokes UB.
> +  MAX (INT_MIN / 2, 5) * 5 is not equivalent to
> +  MAX (INT_MIN / 2 * 5, 5 * 5), the former is
> +  well defined 25, the latter invokes UB.  */
> +  if (code == MULT_EXPR)
> + break;
> +  /* For division/modulo, punt on c being -1 for MAX, as
> +  MAX (INT_MIN, 0) / -1 is not equivalent to
> +  MIN (INT_MIN / -1, 0 / -1), the former is well defined
> +  0, the latter invokes UB (or for -fwrapv is INT_MIN).
> +  MIN (INT_MIN, 0) / -1 already invokes UB, so the
> +  transformation won't make it worse.  */
> +  else if (tcode == MAX_EXPR && integer_minus_onep (c))
> + break;
> +
>/* MIN (a, b) / 5 -> MIN (a / 5, b / 5)  */
>sub_strict_overflow_p = false;
>if ((t1 = extract_muldiv (op0, c, code, wide_type,
> --- gcc/testsuite/gcc.c-torture/execute/pr111151.c.jj 2024-03-25 11:50:27.199744988 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr111151.c 2024-03-26 10:41:51.003384032 +0100
> @@ -0,0 +1,21 @@
> +/* PR middle-end/111151 */
> +
> +int
> +main ()
> +{
> +  unsigned a = (1U + __INT_MAX__) / 2U;
> +  unsigned b = 1U;
> +  unsigned c = (a * 2U > b * 2U ? a * 2U : b * 2U) * 2U;
> +  if (c != 0U)
> +__builtin_abort ();
> +  int d = (-__INT_MAX__ - 1) / 2;
> +  int e = 10;
> +  int f = (d * 2 > e * 5 ? d * 2 : e * 5) * 6;
> +  if (f != 300)
> +__builtin_abort ();
> +  int g = (-__INT_MAX__ - 1) / 2;
> +  int h = 0;
> +  int i = (g * 2 > h * 5 ? g * 2 : h * 5) / -1;
> +  if (i != 0)
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/114471 - ICE with mismatching vector types

2024-03-26 Thread Richard Biener
The following fixes too lax verification of vector type compatibility
in vectorizable_operation.  When we only have a single vector size then
comparing the number of elements is enough but with SLP we mix those
and thus for operations like BIT_AND_EXPR we need to verify compatible
element types as well.  Allow sign changes for ABSU_EXPR though.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114471
* tree-vect-stmts.cc (vectorizable_operation): Verify operand
types are compatible with the result type.

* gcc.dg/vect/pr114471.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr114471.c | 13 +
 gcc/tree-vect-stmts.cc   | 11 ---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr114471.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114471.c 
b/gcc/testsuite/gcc.dg/vect/pr114471.c
new file mode 100644
index 000..218c953e45e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114471.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+float f1, f0, fa[2];
+short sa[2];
+void quantize(short s0)
+{
+  _Bool ta[2] = {(fa[0] < 0), (fa[1] < 0)};
+  _Bool t = ((s0 > 0) & ta[0]);
+  short x1 = s0 + t;
+  _Bool t1 = ((x1 > 0) & ta[1]);
+  sa[0] = x1;
+  sa[1] = s0 + t1;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5a4eb136c6d..f8d8636b139 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6667,7 +6667,8 @@ vectorizable_operation (vec_info *vinfo,
 
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
-  if (maybe_ne (nunits_out, nunits_in))
+  if (maybe_ne (nunits_out, nunits_in)
+  || !tree_nop_conversion_p (TREE_TYPE (vectype_out), TREE_TYPE (vectype)))
 return false;
 
   tree vectype2 = NULL_TREE, vectype3 = NULL_TREE;
@@ -6685,7 +6686,9 @@ vectorizable_operation (vec_info *vinfo,
   is_invariant &= (dt[1] == vect_external_def
   || dt[1] == vect_constant_def);
   if (vectype2
- && maybe_ne (nunits_out, TYPE_VECTOR_SUBPARTS (vectype2)))
+ && (maybe_ne (nunits_out, TYPE_VECTOR_SUBPARTS (vectype2))
+ || !tree_nop_conversion_p (TREE_TYPE (vectype_out),
+TREE_TYPE (vectype2))))
return false;
 }
   if (op_type == ternary_op)
@@ -6701,7 +6704,9 @@ vectorizable_operation (vec_info *vinfo,
   is_invariant &= (dt[2] == vect_external_def
   || dt[2] == vect_constant_def);
   if (vectype3
- && maybe_ne (nunits_out, TYPE_VECTOR_SUBPARTS (vectype3)))
+ && (maybe_ne (nunits_out, TYPE_VECTOR_SUBPARTS (vectype3))
+ || !tree_nop_conversion_p (TREE_TYPE (vectype_out),
+TREE_TYPE (vectype3
return false;
 }
 
-- 
2.35.3


[PATCH] tree-optimization/114464 - verify types in recurrence vectorization

2024-03-26 Thread Richard Biener
The following adds missing verification of vector type compatibility
to recurrence vectorization.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114464
* tree-vect-loop.cc (vectorizable_recurr): Verify the latch
vector type is compatible with what we chose for the recurrence.

* g++.dg/vect/pr114464.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr114464.cc | 11 +++
 gcc/tree-vect-loop.cc | 22 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr114464.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr114464.cc 
b/gcc/testsuite/g++.dg/vect/pr114464.cc
new file mode 100644
index 000..0d872aae9d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr114464.cc
@@ -0,0 +1,11 @@
+// { dg-do compile }
+
+void h(unsigned char *scratch, bool carry)
+{
+  for (int i = 0; i < 16; i++) {
+bool b = scratch[i] <<= 1;
+if (carry)
+  scratch[i] |= 1;
+carry = b;
+  }
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 302c92e6f31..42e78cc9c32 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9212,6 +9212,28 @@ vectorizable_recurr (loop_vec_info loop_vinfo, 
stmt_vec_info stmt_info,
return false;
  }
}
+
+  /* Verify we have set up compatible types.  */
+  edge le = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
+  tree latch_vectype = NULL_TREE;
+  if (slp_node)
+   {
+ slp_tree latch_def = SLP_TREE_CHILDREN (slp_node)[le->dest_idx];
+ latch_vectype = SLP_TREE_VECTYPE (latch_def);
+   }
+  else
+   {
+ tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, le);
+ if (TREE_CODE (latch_def) == SSA_NAME)
+   {
+ stmt_vec_info latch_def_info = loop_vinfo->lookup_def (latch_def);
+ latch_def_info = vect_stmt_to_vectorize (latch_def_info);
+ latch_vectype = STMT_VINFO_VECTYPE (latch_def_info);
+   }
+   }
+  if (!types_compatible_p (latch_vectype, vectype))
+   return false;
+
   /* The recurrence costs the initialization vector and one permute
 for each copy.  */
   unsigned prologue_cost = record_stmt_cost (cost_vec, 1, scalar_to_vec,
-- 
2.35.3


Re: [PATCH] tsan: Don't instrument non-generic AS accesses [PR111736]

2024-03-26 Thread Richard Biener
On Tue, 26 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> Similar to the asan and ubsan changes, we shouldn't instrument non-generic
> address space accesses with tsan, because we just have library functions
> which take address of the objects as generic address space pointers, so they
> can't handle anything else.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-03-26  Jakub Jelinek  
> 
>   PR sanitizer/111736
>   * tsan.cc (instrument_expr): Punt on non-generic address space
>   accesses.
> 
>   * gcc.dg/tsan/pr111736.c: New test.
> 
> --- gcc/tsan.cc.jj2024-01-03 11:51:29.155764166 +0100
> +++ gcc/tsan.cc   2024-03-25 10:36:07.602861266 +0100
> @@ -139,6 +139,9 @@ instrument_expr (gimple_stmt_iterator gs
>if (TREE_READONLY (base) || (VAR_P (base) && DECL_HARD_REGISTER (base)))
>  return false;
>  
> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (TREE_TYPE (base
> +return false;
> +
>stmt = gsi_stmt (gsi);
>loc = gimple_location (stmt);
>rhs = is_vptr_store (stmt, expr, is_write);
> --- gcc/testsuite/gcc.dg/tsan/pr111736.c.jj   2024-03-25 10:38:07.663191030 
> +0100
> +++ gcc/testsuite/gcc.dg/tsan/pr111736.c  2024-03-25 10:43:08.071008937 
> +0100
> @@ -0,0 +1,17 @@
> +/* PR sanitizer/111736 */
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fsanitize=thread -fdump-tree-optimized -ffat-lto-objects" } */
> +/* { dg-final { scan-tree-dump-not "__tsan_read" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "__tsan_write" "optimized" } } */
> +
> +#ifdef __x86_64__
> +#define SEG __seg_fs
> +#else
> +#define SEG __seg_gs
> +#endif
> +
> +void
> +foo (int SEG *p, int SEG *q)
> +{
> +  *q = *p;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

