Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener
On Thu, 16 May 2024, Jeff Law wrote:

> 
> 
> On 5/16/24 6:03 AM, Richard Biener wrote:
> > Now that we handle pt.null conservatively we can implement the missing
> > tracking of constant pool entries (aka STRING_CST) and handle
> > ptr-ptr compares using points-to info in ptrs_compare_unequal.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.
> > 
> > Richard.
> > 
> >  PR tree-optimization/13962
> >  PR tree-optimization/96564
> >  * tree-ssa-alias.h (pt_solution::const_pool): New flag.
> >  * tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
> >  compares.
> >  (dump_points_to_solution): Dump the const_pool flag, fix guard
> >  of flag dumping.
> >  * gimple-pretty-print.cc (pp_points_to_solution): Likewise.
> >  * tree-ssa-structalias.cc (find_what_var_points_to): Set
> >  the const_pool flag for STRING.
> >  (pt_solution_ior_into): Handle the const_pool flag.
> >  (ipa_escaped_pt): Initialize it.
> > 
> >  * gcc.dg/tree-ssa/alias-39.c: New testcase.
> >  * g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
> >  to manifest in transforms no longer vectorizing this testcase
> >  for an ICE.
> You might want to test this against 92539 as well.  There's a nonzero chance
> it'll resolve that one.

Unfortunately it doesn't.

Richard.


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-05-16 Thread Richard Biener
On Wed, 3 Apr 2024, Chung-Lin Tang wrote:

> Hi Richard, Thomas,
> 
> On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>
> >> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >> flag.
> >>
> >> The actual optimization then is done in this second patch.  Chung-Lin
> >> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >> I don't have much experience with most of the following generic code, so
> >> would appreciate a helping hand, whether that conceptually makes sense as
> >> well as from the implementation point of view:
> 
> First of all, I have removed all of the gimplify-stage scanning and setting
> of DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no
> changes to gimplify.cc now)
> 
> I remember this code was an artifact of earlier attempts to allow
> struct-member pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])),
> but failed anyways.
> I think the omp_data_* member accesses when building child function side
> receiver_refs is blocking points-to analysis from working (didn't try
> digging deeper)
> 
> Also during gimplify, VAR_DECLs appeared to be reused (at least in some
> cases) for map clause decl reference building, so hoping that the variables
> "happen to be" single-use and DECL_POINTS_TO_READONLY relaying into
> SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be a little risky.
> 
> However, for firstprivate pointers processed during omp-low, it appears to
> be somewhat different.
> (see below description)
> 
> > No, I don't think you can use that flag on non-default-defs, nor
> > preserve it on copying.  So it also doesn't nicely extend to DECLs as
> > done by the patch.  We currently _only_ use it for incoming parameters.
> > When used on arbitrary code you can get to for example
> > 
> > ptr1(points-to-readonly-memory) = &x;
> > ... access via ptr1 ...
> > ptr2 = &x;
> > ... access via ptr2 ...
> > 
> > where both are your OMP regions differently constrained (the constraint
> > is on the code in the region, _not_ on the actual protections of the
> > pointed-to data, much like for the fortran case).  But now CSE comes
> > along and happily replaces all ptr2 with ptr2 in the second region and
> > ... oops!
> 
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1
> in the second region"?
> 
> That doesn't happen, because during omp-lower/expand, OMP target regions
> (which is all that this currently applies to) are separated into different
> individual child functions.
> 
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
> omp-lower, when for firstprivate pointers (i.e. 'a' here) we set this bit
> when constructing the first load of this pointer.)
> 
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[8]);
> r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[12]);
> r = a[12];
>   }
> 
> After omp-expand (before SSA):
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>:
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;

So 'readonly: a[:32]' is put in .omp_data_i->D.2947 in the caller
and extracted here.  And you arrange for 'a.8' to have
DECL_POINTS_TO_READONLY set by "magic"?  Looking at this I wonder
if it would be more useful to "const qualify" (but "really", not
in the C sense) .omp_data_i->D.2947 instead?  Thus have a
FIELD_POINTS_TO_READONLY_MEMORY flag on the FIELD_DECL.

Points-to analysis should then be able to handle this similar to how
it handles loads of restrict qualified pointers.  Well, of course not
as simple since it now adds "qualifiers" to storage since I presume
the same object can be both readonly and not readonly like via

 #pragma acc parallel copyin(readonly: a[:32], a[33:64]) copyout(r)

?  That is, currently there's only one "readonly" object kind in
points-to, that's STRING_CSTs which get all globbed to string_id
and "ignored" for alias purposes since you can't change them.

So possibly you want to combine this with restrict qualifying the
pointer so we know there's no other (read-write) access to the memory
possible.  But then you might get all the good stuff already by
_just_ doing that restrict qualification and ignoring

Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-16 Thread Richard Biener
On Fri, Apr 12, 2024 at 5:07 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to a certain sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

OK if the rs6000 part is approved.

> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index d2786f207b8..5262aa01660 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-05-16 Thread Richard Biener
On Fri, Apr 12, 2024 at 10:10 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to a certain sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

Looks good, if the rs6000 part is approved.

> Thanks
> Gui Haochen
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 3174f52ebe8..defb39de95f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 8:50 AM Tamar Christina  wrote:
>
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Thursday, May 16, 2024 5:06 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com; Richard Sandiford
> > ; Pan Li 
> > Subject: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit
> >
> > From: Pan Li 
> >
> > This patch adds early break auto-vectorization support for targets which
> > use length-based partial vectorization.  Consider the following example:
> >
> > unsigned vect_a[802];
> > unsigned vect_b[802];
> >
> > void test (unsigned x, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> > vect_b[i] = x + i;
> >
> > if (vect_a[i] > x)
> >   break;
> >
> > vect_a[i] = x;
> >   }
> > }
> >
> > We use VCOND_MASK_LEN to simulate the generated (mask && i < len + bias).
> > And then the IR of RVV looks like below:
> >
> >   ...
> >   _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
> >   _55 = (int) _87;
> >   ...
> >   mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
> >   vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
> > {0, ... }, _87, 0);
> >   if (vec_len_mask_72 != { 0, ... })
> > goto ; [5.50%]
> >   else
> > goto ; [94.50%]
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 fully regression tests.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
> >   handling for one or multiple stmt.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
> >   the loop len mask.
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
> >   vect_gen_loop_len_mask for 1 or more stmt(s).
> >   * tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
> >   for vect_gen_loop_len_mask.
> >
>
> Thanks, this version looks good to me!
>
> You'll need Richi's review still.

OK.

Thanks,
Richard.

> Cheers,
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/tree-vect-loop.cc  | 27 +++
> >  gcc/tree-vect-stmts.cc | 17 +++--
> >  gcc/tree-vectorizer.h  |  4 
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..83c0544b6aa 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -11416,6 +11416,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> >return loop_len;
> >  }
> >
> > +/* Generate the tree for the loop len mask and return it.  Given the lens,
> > +   nvectors, vectype, index and factor to gen the len mask as below.
> > +
> > +   tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
> > +*/
> > +tree
> > +vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> > +   gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens,
> > +   unsigned int nvectors, tree vectype, tree stmt,
> > +   unsigned int index, unsigned int factor)
> > +{
> > +  tree all_one_mask = build_all_ones_cst (vectype);
> > +  tree all_zero_mask = build_zero_cst (vectype);
> > +  tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype,
> > +   index, factor);
> > +  tree bias = build_int_cst (intQI_type_node,
> > +  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo));
> > +  tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL, "vec_len_mask");
> > +  gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt,
> > + all_one_mask, all_zero_mask, len,
> > + bias);
> > +  gimple_call_set_lhs (call, len_mask);
> > +  gsi_insert_before (cond_gsi, call, GSI_SAME_STMT);
> > +
> > +  return len_mask;
> > +}
> > +
> >  /* Scale profiling counters by estimation for LOOP which is vectorized
> > by factor VF.
> > If FLAT is true, the loop we started with had unrealistically flat
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index b8a71605f1b..672959501bb 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12895,7 +12895,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> >  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> >
> >   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> >bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> >
> >/* Now build the new conditional.  Pattern gimple_conds get dropped during
> >   codegen so we must replace the original insn.  */
> > @@ 

[PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener
Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.
---
 gcc/gimple-pretty-print.cc   |  5 +++-
 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c | 12 ++
 gcc/tree-ssa-alias.cc| 30 
 gcc/tree-ssa-alias.h |  5 
 gcc/tree-ssa-structalias.cc  |  6 ++---
 5 files changed, 50 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c

diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index abda8871f97..a71e1e0efc7 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -822,6 +822,8 @@ pp_points_to_solution (pretty_printer *buffer, const pt_solution *pt)
 pp_string (buffer, "unit-escaped ");
   if (pt->null)
 pp_string (buffer, "null ");
+  if (pt->const_pool)
+pp_string (buffer, "const-pool ");
   if (pt->vars
   && !bitmap_empty_p (pt->vars))
 {
@@ -838,7 +840,8 @@ pp_points_to_solution (pretty_printer *buffer, const pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  pp_string (buffer, " (");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
new file mode 100644
index 000..3b452893f6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop3" } */
+
+static int a, b;
+int foo (int n, int which)
+{
+  void *p = __builtin_malloc (n);
+  void *q = which ? &a : &b;
+  return p == q;
+}
+
+/* { dg-final { scan-tree-dump "return 0;" "forwprop3" } } */
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 96301bbde7f..6d31fc83691 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -484,9 +484,27 @@ ptrs_compare_unequal (tree ptr1, tree ptr2)
}
      return !pt_solution_includes (&pi->pt, obj1);
 }
-
-  /* ???  We'd like to handle ptr1 != NULL and ptr1 != ptr2
- but those require pt.null to be conservatively correct.  */
+  else if (TREE_CODE (ptr1) == SSA_NAME)
+{
+  struct ptr_info_def *pi1 = SSA_NAME_PTR_INFO (ptr1);
+  if (!pi1
+ || pi1->pt.vars_contains_restrict
+ || pi1->pt.vars_contains_interposable)
+   return false;
+  if (integer_zerop (ptr2) && !pi1->pt.null)
+   return true;
+  if (TREE_CODE (ptr2) == SSA_NAME)
+   {
+ struct ptr_info_def *pi2 = SSA_NAME_PTR_INFO (ptr2);
+ if (!pi2
+ || pi2->pt.vars_contains_restrict
+ || pi2->pt.vars_contains_interposable)
+   return false;
+ if ((!pi1->pt.null || !pi2->pt.null)
+ && (!pi1->pt.const_pool || !pi2->pt.const_pool))
+   return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
+   }
+}
 
   return false;
 }
@@ -636,6 +654,9 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->null)
 fprintf (file, ", points-to NULL");
 
+  if (pt->const_pool)
+fprintf (file, ", points-to const-pool");
+
   if (pt->vars)
 {
   fprintf (file, ", points-to vars: ");
@@ -643,7 +664,8 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  fprintf (file, " (");
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index b26fffeeb2d..e29dff58375 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -47,6 +47,11 @@ struct GTY(()) pt_solution
  includes memory at address NULL.  */
   

Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:
>
> > OK.
>
> Thanks Richard for help and coaching.  To double confirm, are you OK with
> this patch only, or with the series patch(es) of the SAT middle-end?
> Thanks again for reviewing and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
> 
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for help and coaching, let's wait for Richard for a while!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina 
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> > Liu, Hongtao 
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for 
> > unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -----Original Message-----
> > > From: pan2...@intel.com 
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > > ; richard.guent...@gmail.com;
> > > hongtao@intel.com; Pan Li 
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > > scalar
> > > int
> > >
> > > From: Pan Li 
> > >
> > > This patch would like to add the middle-end representation for the
> > > saturation add, i.e. set the result of the add to the max when it overflows.
> > > It will take the pattern similar as below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given below example for the unsigned scalar integer uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;succ:   EXIT
> > > }
> > >
> > > The below tests are passed for this patch:
> > > 1. The riscv fully regression tests.
> > > 2. The x86 bootstrap tests.
> > > 3. The x86 fully regression tests.
> > >
> > >   PR target/51492
> > >   PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >   to the return true switch case(s).
> > >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >   * match.pd: Add unsigned SAT_ADD match(es).
> > >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >   us/ssadd.
> > >   * tree

[PATCH] wrong code with points-to and volatile

2024-05-16 Thread Richard Biener
The following fixes points-to analysis which ignores the fact that
volatile qualified refs can result in any pointer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Btw, I noticed this working on ptr-vs-ptr compare simplification
using points-to info and running into gcc.c-torture/execute/pr64242.c

* tree-ssa-structalias.cc (get_constraint_for_1): For
volatile referenced or decls use ANYTHING.

* gcc.dg/tree-ssa/alias-38.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c | 14 ++
 gcc/tree-ssa-structalias.cc  |  7 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
new file mode 100644
index 000..a5c41493473
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int x;
+int y;
+
+int main ()
+{
+  int *volatile p = &x;
+  return (p != &y);
+}
+
+/* { dg-final { scan-tree-dump " != " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "return 1;" "optimized" } } */
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9c63305063c..f0454bea2ea 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -3575,6 +3575,10 @@ get_constraint_for_1 (tree t, vec<ce_s> *results, bool address_p,
   }
 case tcc_reference:
   {
+   if (TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
+
switch (TREE_CODE (t))
  {
  case MEM_REF:
@@ -3676,6 +3680,9 @@ get_constraint_for_1 (tree t, vec<ce_s> *results, bool address_p,
   }
 case tcc_declaration:
   {
+   if (VAR_P (t) && TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
get_constraint_for_ssa_var (t, results, address_p);
return;
   }
-- 
2.35.3


Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Richard Biener
On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
>
> On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  wrote:
> >
> > If we consider code like:
> >
> > if (bar1 == x)
> >   return foo();
> > if (bar2 != y)
> >   return foo();
> > return 0;
> >
> > We would like the ifcombine pass to convert this to:
> >
> > if (bar1 == x || bar2 != y)
> >   return foo();
> > return 0;
> >
> > The ifcombine pass can handle this transformation but it is run very
> > early and it misses the opportunity because there are two separate
> > blocks for foo().
> > The pre pass is good at removing duplicate code and blocks, and due to
> > that, running ifcombine again after it can increase the number of
> > successful conversions.
>
> I do think we should have something similar to re-running
> ssa-ifcombine but I think it should be much later, like after the loop
> optimizations are done.
> Maybe just a simplified version of it (that does the combining and not
> the optimizations part) included in isel or pass_optimize_widening_mul
> (which itself should most likely become part of isel or renamed since
> it handles more than just widening multiply these days).

I've long wished we had a (late?) pass that can also undo if-conversion
(basically do what RTL expansion would later do).  Maybe
gimple-predicate-analysis.cc (what's used by uninit analysis) can
represent mixed CFG + if-converted conditions so we can optimize
it and code-gen the condition in a more optimal manner much like
we have if-to-switch, switch-conversion and switch-expansion.

That said, I agree that re-running ifcombine should be later.  And there's
still the old task of splitting tail-merging from PRE (and possibly making
it more effective).

Richard.

>
> Thanks,
> Andrew Pinski
>
>
> >
> > PR 102793
> >
> > gcc/ChangeLog:
> >
> > * common.opt: -ftree-ifcombine option, enabled by default.
> > * doc/invoke.texi: Document.
> > * passes.def: Re-run ssa-ifcombine after pre.
> > * tree-ssa-ifcombine.cc: Make ifcombine cloneable. Add gate function.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/20030922-2.c: Change flag to -fno-tree-ifcombine.
> > * gcc.dg/uninit-pred-6_c.c: Remove inconsistent check.
> > * gcc.target/aarch64/pr102793.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/common.opt  |  4 +++
> >  gcc/doc/invoke.texi |  5 
> >  gcc/passes.def  |  1 +
> >  gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c  |  2 +-
> >  gcc/testsuite/gcc.dg/uninit-pred-6_c.c  |  4 ---
> >  gcc/testsuite/gcc.target/aarch64/pr102793.c | 30 +
> >  gcc/tree-ssa-ifcombine.cc   |  5 
> >  7 files changed, 46 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102793.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index ad348844775..e943202bcf1 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -3163,6 +3163,10 @@ ftree-phiprop
> >  Common Var(flag_tree_phiprop) Init(1) Optimization
> >  Enable hoisting loads from conditional pointers.
> >
> > +ftree-ifcombine
> > +Common Var(flag_tree_ifcombine) Init(1) Optimization
> > +Merge some conditional branches to simplify control flow.
> > +
> >  ftree-pre
> >  Common Var(flag_tree_pre) Optimization
> >  Enable SSA-PRE optimization on trees.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index e2edf7a6c13..8d2ff6b4512 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13454,6 +13454,11 @@ This flag is enabled by default at @option{-O1} and higher.
> >  Perform hoisting of loads from conditional pointers on trees.  This
> >  pass is enabled by default at @option{-O1} and higher.
> >
> > +@opindex ftree-ifcombine
> > +@item -ftree-ifcombine
> > +Merge some conditional branches to simplify control flow.  This pass
> > +is enabled by default at @option{-O1} and higher.
> > +
> >  @opindex fhoist-adjacent-loads
> >  @item -fhoist-adjacent-loads
> >  Speculatively hoist loads from both branches of an if-then-else if the
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 1cbbd413097..1765b476131 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -270,6 +270,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_lim);
> >NEXT_PASS (pass_walloca, false);
> >NEXT_PASS (pass_pre);
> > +  NEXT_PASS (pass_tree_ifcombine);
> >NEXT_PASS (pass_sink_code, false /* unsplit edges */);
> >NEXT_PASS (pass_sancov);
> >NEXT_PASS (pass_asan);
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > index 16c79da9521..66c9f481a2f 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> > @@ -1,5 +1,5 @@
> 

Re: [PATCH v3] driver: Output to a temp file; rename upon success [PR80182]

2024-05-16 Thread Richard Biener
On Sun, May 12, 2024 at 3:40 PM Peter Damianov  wrote:
>
> Currently, commands like:
> gcc -o file.c -lm
> will delete the user's code.
>
> This patch makes the linker write executables to a temp file, and then renames
> the temp file if successful. This fixes the case above, but has limitations.
> The source file will still get overwritten if the link "succeeds", such as the
> case of: gcc -o file.c -lm -r
>
> It's not perfect, but it should hopefully stop some people from ruining their
> day.

Hmm.  When suggesting this I was originally hoping for this to be implemented
in the linker so that it delays opening (and truncating) of the output
file as much as possible.

If we want to do something in the compiler driver then I like the filename based
heuristics more.  v3 seems to only address the case of -o specifying the linker
output file but of course

gcc -c t.c -o t2.c

or

gcc -S t.c -o t2.c

happily overwrite a source file as well.  For these cases
heuristically rejecting
source file patterns would be better.  As we've shown the rename trick when
the link was successful doesn't fully solve the issue.  And I bet some people
will claim it isn't an issue at all ...

That is, I do think the linker itself, as a quality of implementation issue,
should avoid truncating the output early.  In fact the BFD linker seems to
unlink the output very early:

24937 stat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
24937 lstat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
24937 unlink("t.c") = 0
24937 openat(AT_FDCWD, "t.c", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3

before even opening other inputs or the default linker script.

Richard.

> gcc/ChangeLog:
> PR driver/80182
> * gcc.cc (output_file_temp): New global variable
> (driver_handle_option): Create temp file for executable output
> (driver::maybe_run_linker): Rename output_file_temp to output_file if
> the linker ran successfully
>
> Signed-off-by: Peter Damianov 
> ---
>
> v3: don't attempt to create temp files -> rename for -o /dev/null
>
>  gcc/gcc.cc | 53 +
>  1 file changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..5e38c6e578a 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -2138,6 +2138,11 @@ static int have_E = 0;
>  /* Pointer to output file name passed in with -o. */
>  static const char *output_file = 0;
>
> +/* We write the output file to a temp file, and rename it if linking
> +   is successful. This is to prevent mistakes like: gcc -o file.c -lm from
> +   deleting the user's code.  */
> +static const char *output_file_temp = 0;
> +
>  /* Pointer to input file name passed in with -truncate.
> This file should be truncated after linking. */
>  static const char *totruncate_file = 0;
> @@ -4610,10 +4615,18 @@ driver_handle_option (struct gcc_options *opts,
>  #if defined(HAVE_TARGET_EXECUTABLE_SUFFIX) || 
> defined(HAVE_TARGET_OBJECT_SUFFIX)
>arg = convert_filename (arg, ! have_c, 0);
>  #endif
> -  output_file = arg;
> +  output_file_temp = output_file = arg;
> +  /* If creating an executable, create a temp file for the output, unless
> + -o /dev/null was requested. This will later get renamed, if the 
> linker
> + succeeds.  */
> +  if (!have_c && strcmp (output_file, HOST_BIT_BUCKET) != 0)
> +{
> +  output_file_temp = make_temp_file ("");
> +  record_temp_file (output_file_temp, false, true);
> +}
>/* On some systems, ld cannot handle "-o" without a space.  So
>  split the option from its argument.  */
> -  save_switch ("-o", 1, &arg, validated, true);
> +  save_switch ("-o", 1, &output_file_temp, validated, true);
>return true;
>
>  case OPT_pie:
> @@ -9266,22 +9279,30 @@ driver::maybe_run_linker (const char *argv0) const
>linker_was_run = (tmp != execution_count);
>  }
>
> -  /* If options said don't run linker,
> - complain about input files to be given to the linker.  */
> -
> -  if (! linker_was_run && !seen_error ())
> -for (i = 0; (int) i < n_infiles; i++)
> -  if (explicit_link_files[i]
> - && !(infiles[i].language && infiles[i].language[0] == '*'))
> +  if (!seen_error ())
> +{
> +  if (linker_was_run)
> +   /* If the linker finished without errors, rename the output from the
> +  temporary file to the real output name.  */
> +   rename (output_file_temp, output_file);
> +  else
> {
> - warning (0, "%s: linker input file unused because linking not done",
> -  outfiles[i]);
> - if (access (outfiles[i], F_OK) < 0)
> -   /* This is can be an indication the user specifed an errorneous
> -  separated option value, (or used the wrong prefix for an
> -  option).  */
> -   error ("%s: linker input file not found: %m", outfiles[i]);
> + 

Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]

2024-05-16 Thread Richard Biener
On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis  wrote:
>
> New patch with the requested changes can be found below.
>
> I don't know how much this affects SCEV, but I do believe that we
> should incorporate this change somehow. I've seen various cases of
> suboptimal address calculation codegen that boil down to this.

This misses the ChangeLog (I assume it's unchanged) and indent
of the match.pd part is now off.

Please fix that, the patch is OK with that change.

Thanks,
Richard.
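As a numeric sanity check of the rewrite the quoted patterns implement, ((T)(A +- C1)) * C2 -> ((T)A * C2) +- ((T)C1 * C2), the identity can be exercised directly; this is an editor's sketch, and the identity only holds when the narrow addition cannot overflow, which is what the TYPE_OVERFLOW_UNDEFINED guard in the patch ensures:

```c
/* Checks the distributive identity behind the quoted match.pd patterns:
   (long)(a + c1) * c2  ==  (long)a * c2 + (long)c1 * c2
   valid when the narrow (int) addition does not overflow.  */
#include <assert.h>

long
widened_lhs (int a, int c1, long c2)
{
  return (long) (a + c1) * c2;            /* the original form */
}

long
widened_rhs (int a, int c1, long c2)
{
  return (long) a * c2 + (long) c1 * c2;  /* the rewritten form */
}
```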

> gcc/match.pd | 31 +++
> gcc/testsuite/gcc.dg/pr109393.c | 16 
> 2 files changed, 47 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..1d642c205f0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (plus (convert @0) (op @2 (convert @1))
> #endif
> +/* ((T)(A + CST1)) * CST2 + CST3
> + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> + Where (A + CST1) doesn't need to have a single use. */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> + INTEGER_CST@3)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)
> +#endif
> +
> +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> +#if GIMPLE
> + (for op (plus minus)
> + (simplify
> + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> + && TYPE_OVERFLOW_WRAPS (type))
> + (op (mult (convert @0) @2) (mult (convert @1) @2)
> +#endif
> +
> /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> to a simple value. */
> (for op (plus minus)
> diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> new file mode 100644
> index 000..e9051273672
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109393.c
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/109393 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +
> +int foo(int *a, int j)
> +{
> + int k = j - 1;
> + return a[j - 1] == a[k];
> +}
> +
> +int bar(int *a, int j)
> +{
> + int k = j - 1;
> + return (&a[j + 1] - 2) == &a[k];
> +}
> --
> 2.44.0
>
>
> On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis  
> wrote:
> >
> > The original motivation for this pattern was that the following function 
> > does
> > not fold to 'return 1':
> >
> > int foo(int *a, int j)
> > {
> >   int k = j - 1;
> >   return a[j - 1] == a[k];
> > }
> >
> > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part 
> > of
> > address calculations (e.g. arrays). These patterns help fold and simplify 
> > more
> > expressions.
> >
> > PR tree-optimization/109393
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> >   ((T)(A +- CST1)) * CST2 + CST3.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/pr109393.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/match.pd| 30 ++
> >  gcc/testsuite/gcc.dg/pr109393.c | 16 
> >  2 files changed, 46 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..13c828ba70d 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (plus (convert @0) (op @2 (convert @1))
> >  #endif
> >
> > +/* ((T)(A + CST1)) * CST2 + CST3
> > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > +   Where (A + CST1) doesn't need to have a single use.  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +(plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) 
> > INTEGER_CST@3)
> > + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > + && TREE_CODE (type) == INTEGER_TYPE
> > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > + && TYPE_OVERFLOW_WRAPS (type))
> > +   (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)
> > +#endif
> > +
> > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > +#if GIMPLE
> > +  (for op (plus minus)
> > +   (simplify
> > +(mult 

Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Richard Biener
On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks Tamar for the help and coaching, let's wait for Richard for a while!

OK.

Thanks for the patience,
Richard.
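For reference, the unsigned saturating-add pattern the quoted patch recognizes can be exercised standalone; this is the exact source form from the patch description, wrapped as a function (editor's sketch):

```c
/* Branchless unsigned saturating add, the source pattern matched into
   .SAT_ADD: on overflow the wrapped sum compares below x, the mask
   -(x + y < x) becomes all-ones, and the OR saturates to UINT64_MAX.  */
#include <assert.h>
#include <stdint.h>

uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}
```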

> Pan
>
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> Liu, Hongtao 
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch would like to add the middle-end representation for the
> > saturation add, i.e. the result of the add is set to the max on overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The below tests are passed for this patch:
> > 1. The riscv full regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 full regression tests.
> >
> >   PR target/51492
> >   PR target/112600
> >
> > gcc/ChangeLog:
> >
> >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >   to the return true switch case(s).
> >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >   * match.pd: Add unsigned SAT_ADD match(es).
> >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >   us/ssadd.
> >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >   extern func decl generated in match.pd match.
> >   (match_saturation_arith): New func impl to match the saturation arith.
> >   (math_opts_dom_walker::after_dom_children): Try match saturation
> >   arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.cc|  1 +
> >  gcc/internal-fn.def   |  2 ++
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >  case IFN_UBSAN_CHECK_MUL:
> >  case IFN_ADD_OVERFLOW:
> >  case IFN_MUL_OVERFLOW:
> > +case IFN_SAT_ADD:
> >  case IFN_VEC_WIDEN_PLUS:
> >  case IFN_VEC_WIDEN_PLUS_LO:
> >  case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> > smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd 

[PATCH][v2] tree-optimization/79958 - make DSE track multiple paths

2024-05-16 Thread Richard Biener
DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops the info on the floor, handling only
fully dead stores in that case.

This version adds some testsuite adjustments to avoid regressions.
Will push after retesting completed.
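The walk structure described above (iterate along a unique successor, recurse when the path forks, share the visit budget and the visited set) can be sketched generically as follows; this is an editor's illustration of the shape, not the GCC implementation:

```c
/* Generic sketch of the walk shape: follow a unique successor
   iteratively, recurse once the path forks, and share one counter
   (the overall bound) plus one visited set so re-merged nodes are
   not walked twice.  Returns true if every path ends "dead".  */
#include <assert.h>
#include <stdbool.h>

#define LIMIT 32

struct node { int nsucc; int succ[2]; bool dead; };

bool
walk_paths (const struct node *g, int n, int *cnt, bool *visited)
{
  while (true)
    {
      if (visited[n])
        return true;                /* re-merge: already handled */
      visited[n] = true;
      if (++*cnt > LIMIT)
        return false;               /* shared overall bound */
      if (g[n].nsucc == 0)
        return g[n].dead;
      if (g[n].nsucc == 1)
        {
          n = g[n].succ[0];         /* single path: keep iterating */
          continue;
        }
      for (int i = 0; i < g[n].nsucc; i++)  /* fork: recurse per path */
        if (!walk_paths (g, g[n].succ[i], cnt, visited))
          return false;
      return true;
    }
}
```

On a diamond (0 forks to 1 and 2, both re-merging at a dead node 3), the second path hits node 3 via the visited set instead of walking it again, mirroring how the patch avoids revisiting CFG re-merges.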

Richard.

PR tree-optimization/79958
PR tree-optimization/109087
PR tree-optimization/100314
PR tree-optimization/114774
* tree-ssa-dse.cc (dse_classify_store): New forwarder.
(dse_classify_store): Add arguments cnt and visited, recurse
to track multiple paths when we end up with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
* gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
in loop.
* g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
* g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
-O2 optimization level, -O1 regresses.
---
 gcc/testsuite/g++.dg/ipa/devirt-24.C  |  4 ++-
 .../g++.dg/warn/Wuninitialized-pr107919-1.C   |  2 +-
 gcc/testsuite/gcc.dg/graphite/pr80906.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c| 17 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c| 18 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c| 25 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c| 24 ++
 gcc/tree-ssa-dse.cc   | 31 ---
 8 files changed, 116 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c

diff --git a/gcc/testsuite/g++.dg/ipa/devirt-24.C 
b/gcc/testsuite/g++.dg/ipa/devirt-24.C
index 7b5b806dd05..333c03cd8dd 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-24.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-24.C
@@ -37,4 +37,6 @@ C *b = new (C);
   }
 }
 /* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known 
target" 1 "inline" { xfail *-*-* } } } */
-/* { dg-final { scan-ipa-dump-times "Aggregate passed by reference" 2 "cp"  } 
} */
+/* We used to have IPA CP see two aggregates passed to sort() but as the
+   first argument is unused DSE now elides the vptr initialization.  */
+/* { dg-final { scan-ipa-dump-times "Aggregate passed by reference" 1 "cp"  } 
} */
diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C 
b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
index dd631dc8bfe..067a44a462e 100644
--- a/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
+++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-pr107919-1.C
@@ -1,6 +1,6 @@
 // { dg-do compile }
 // { dg-require-effective-target c++17 }
-// { dg-options "-O -Wuninitialized" }
+// { dg-options "-O2 -Wuninitialized" }
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr80906.c 
b/gcc/testsuite/gcc.dg/graphite/pr80906.c
index 59c7f59cadf..ec3840834fc 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr80906.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr80906.c
@@ -18,7 +18,7 @@ ec (int lh[][2])
  --bm;
if (bm != 0)
  --c5;
-   lh[0][0] = 0;
+   lh[hp][0] = 0;
m3 *= jv;
   }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
new file mode 100644
index 000..edfc62c7e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+int a;
+int foo (void);
+int bar (void);
+
+void
+baz (void)
+{
+  int *b[6];
+  b[0] = &a;
+  if (foo ())
+a |= bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: b\\\[0\\\] = " "dse1" } 
} */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
new file mode 100644
index 000..1eec284a415
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-dce -fdump-tree-dse1-details" } */
+
+struct X { int i; };
+void bar ();
+void foo (int b)
+{
+  struct X x;
+  x.i = 1;
+  if (b)
+{
+  bar ();
+  __builtin_abort ();
+}
+  bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: x.i = 1;" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
new file mode 100644
index 

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Richard Biener
On Thu, May 16, 2024 at 8:25 AM Hongyu Wang  wrote:
>
> Hi,
>
> In ix86_override_options_after_change, calls to ix86_default_align
> and ix86_recompute_optlev_based_flags will cause mismatched target
> opt_set when doing cl_optimization_restore. Move them back to
> ix86_option_override_internal to solve the issue.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu, and Rainer helped to
> test with i386-pc-solaris2.11 which also passed 32/64bit tests.

Since this is a tricky area, apparently without much test coverage, can
we have a testcase for this?
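Such a testcase would presumably need to toggle optimization settings per function so that cl_optimization_restore actually runs; a hypothetical shape (editor's sketch, not a verified reproducer, and the attribute name is GCC-specific):

```c
/* Hypothetical shape of such a testcase: a per-function `optimize'
   attribute forces options to be saved and restored through
   cl_optimization_restore around the function, the code path where
   the mismatched target opt_set showed up.  */
#ifdef __GNUC__
# define OPT_O1 __attribute__((optimize("O1")))
#else
# define OPT_O1 /* attribute is GCC-specific */
#endif

OPT_O1 int
with_o1 (int x)
{
  return x + 1;     /* compiled under the per-function override */
}

int
with_cmdline (int x)
{
  return x + 2;     /* command-line options must be restored here */
}
```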

> Ok for trunk and backport down to gcc12?
>
> gcc/ChangeLog:
>
> PR target/113719
> * config/i386/i386-options.cc (ix86_override_options_after_change):
> Remove call to ix86_default_align and
> ix86_recompute_optlev_based_flags.
> (ix86_option_override_internal): Call ix86_default_align and
> ix86_recompute_optlev_based_flags.
> ---
>  gcc/config/i386/i386-options.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index ac48b5c61c4..d97464f2c74 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1930,11 +1930,6 @@ ix86_recompute_optlev_based_flags (struct gcc_options 
> *opts,
>  void
>  ix86_override_options_after_change (void)
>  {
> -  /* Default align_* from the processor table.  */
> -  ix86_default_align (&global_options);
> -
> -  ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
> -
>/* Disable unrolling small loops when there's explicit
>   -f{,no}unroll-loop.  */
>if ((OPTION_SET_P (flag_unroll_loops))
> @@ -2530,6 +2525,8 @@ ix86_option_override_internal (bool main_args_p,
>
>set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
>
> +  ix86_recompute_optlev_based_flags (opts, opts_set);
> +
>ix86_override_options_after_change ();
>
>ix86_tune_cost = processor_cost_table[ix86_tune];
> @@ -2565,6 +2562,9 @@ ix86_option_override_internal (bool main_args_p,
>|| TARGET_64BIT_P (opts->x_ix86_isa_flags))
>  opts->x_ix86_regparm = REGPARM_MAX;
>
> +  /* Default align_* from the processor table.  */
> +  ix86_default_align (&global_options);
> +
>/* Provide default for -mbranch-cost= value.  */
>SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
>ix86_tune_cost->branch_cost);
> --
> 2.31.1
>


[PATCH] tree-optimization/79958 - make DSE track multiple paths

2024-05-15 Thread Richard Biener
DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops the info on the floor, handling only
fully dead stores in that case.

Bootstrapped on x86_64-unknown-linux-gnu for all languages, testing in 
progress.

Richard.

PR tree-optimization/79958
PR tree-optimization/109087
PR tree-optimization/100314
PR tree-optimization/114774
* tree-ssa-dse.cc (dse_classify_store): New forwarder.
(dse_classify_store): Add arguments cnt and visited, recurse
to track multiple paths when we end up with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c | 17 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c | 18 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c | 25 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c | 24 +
 gcc/tree-ssa-dse.cc| 31 +++---
 5 files changed, 111 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
new file mode 100644
index 000..edfc62c7e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-48.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+int a;
+int foo (void);
+int bar (void);
+
+void
+baz (void)
+{
+  int *b[6];
+  b[0] = &a;
+  if (foo ())
+a |= bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: b\\\[0\\\] = " "dse1" } 
} */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
new file mode 100644
index 000..1eec284a415
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-49.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-tree-dce -fdump-tree-dse1-details" } */
+
+struct X { int i; };
+void bar ();
+void foo (int b)
+{
+  struct X x;
+  x.i = 1;
+  if (b)
+{
+  bar ();
+  __builtin_abort ();
+}
+  bar ();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: x.i = 1;" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
new file mode 100644
index 000..7c42ae6a67a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-50.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+extern void foo(void);
+static int a, *c, g, **j;
+int b;
+static void e() {
+  int k, *l[5] = {&k, &k, &k, &k, &k};
+  while (g) {
+    j = &l[0];
+b++;
+  }
+}
+static void d(int m) {
+  int **h[30] = {}, ***i[1] = {&h[3]};
+  if (m)
+foo();
+  e();
+}
+int main() {
+  d(a);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 8 "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c
new file mode 100644
index 000..ac9d1bb1fc8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-51.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-aliasing -fdump-tree-dse1-details" } */
+
+int a;
+short *p;
+void
+test (int b)
+{
+  a=1;
+  if (b)
+{
+  (*p)++;
+  a=2;
+  __builtin_printf ("1\n");
+}
+  else
+{
+  (*p)++;
+  a=3;
+  __builtin_printf ("2\n");
+}
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store: a = 1;" "dse1" } } */
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index fce4fc76a56..9252ca34050 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -971,14 +971,13 @@ static hash_map<gimple *, data_reference_p>
*dse_stmt_to_dr_map;
if only clobber statements influenced the classification result.
Returns the classification.  */
 
-dse_store_status
+static dse_store_status
 dse_classify_store (ao_ref *ref, gimple *stmt,
bool byte_tracking_enabled, sbitmap live_bytes,
-   bool *by_clobber_p, tree stop_at_vuse)
+   bool *by_clobber_p, tree stop_at_vuse, int ,
+   bitmap visited)
 {
   gimple *temp;
-  int cnt = 0;
-  auto_bitmap visited;
  std::unique_ptr<data_reference, void (*)(data_reference_p)>
 dra (nullptr, free_data_ref);
 
@@ -1238,6 +1237,19 @@ dse_classify_store (ao_ref *ref, gimple *stmt,

Re: [PATCH] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Richard Biener
On Wed, 15 May 2024, Jakub Jelinek wrote:

> On Wed, May 15, 2024 at 01:41:04PM +0200, Richard Biener wrote:
> > PR middle-end/111422
> > * cfgexpand.cc (add_scope_conflicts_2): Handle PHIs
> > by recursing to their arguments.
> > ---
> >  gcc/cfgexpand.cc | 21 +
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > index 557cb28733b..e4d763fa998 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -584,10 +584,23 @@ add_scope_conflicts_2 (tree use, bitmap work,
> >   || INTEGRAL_TYPE_P (TREE_TYPE (use)))
> >  {
> >gimple *g = SSA_NAME_DEF_STMT (use);
> > -  if (is_gimple_assign (g))
> > -   if (tree op = gimple_assign_rhs1 (g))
> > - if (TREE_CODE (op) == ADDR_EXPR)
> > -   visit (g, TREE_OPERAND (op, 0), op, work);
> > +  if (gassign *a = dyn_cast <gassign *> (g))
> > +   {
> > + if (tree op = gimple_assign_rhs1 (a))
> > +   if (TREE_CODE (op) == ADDR_EXPR)
> > + visit (a, TREE_OPERAND (op, 0), op, work);
> > +   }
> > +  else if (gphi *p = dyn_cast <gphi *> (g))
> > +   {
> > + for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
> > +   if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
> > + if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
> > +   {
> > + if (tree op = gimple_assign_rhs1 (a))
> > +   if (TREE_CODE (op) == ADDR_EXPR)
> > + visit (a, TREE_OPERAND (op, 0), op, work);
> > +   }
> > +   }
> 
> Why the 2 {} pairs here?  Can't it be done without them (sure, before the
> else if it is required)?

Removed and pushed.

Richard.


[PATCH] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Richard Biener
The gcc.c-torture/execute/pr111422.c testcase after installing the
sink pass improvement reveals that we also need to handle

 _65 =  + _58;  _44 =  + _43;
 # _59 = PHI <_65, _44>
 *_59 = 8;
 g = {v} {CLOBBER(eos)};
 ...
 n[0] = 
 *_59 = 8;
 g = {v} {CLOBBER(eos)};

where we fail to see the conflict between n and g after the first
clobber of g.  Before the sinking improvement there was a conflict
recorded on a path where _65/_44 are unused, so the real conflict
was missed but the fake one avoided the miscompile.

The following handles PHI defs in add_scope_conflicts_2 which
fixes the issue.
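A rough source-level analogue (editor's illustration, with made-up names) of the case being handled: the escaping address only materializes at a PHI merging two address-based values, so the conflict walker must look through the PHI's arguments:

```c
/* Source-level analogue of the PHI case: both PHI arguments are
   &a-based, so `a' must be treated as live across the join even
   though no single address assignment dominates the use.  */
#include <assert.h>

int
phi_merge (int cond)
{
  int a[2] = { 0, 0 };
  int *p = cond ? &a[0] : &a[1];  /* gimple: p = PHI <&a[0], &a[1]> */
  *p = 8;                         /* `a' is live here on every path */
  return a[0] + a[1];
}
```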

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK if that succeeds?

Thanks,
Richard.

PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): Handle PHIs
by recursing to their arguments.
---
 gcc/cfgexpand.cc | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 557cb28733b..e4d763fa998 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -584,10 +584,23 @@ add_scope_conflicts_2 (tree use, bitmap work,
  || INTEGRAL_TYPE_P (TREE_TYPE (use)))
 {
   gimple *g = SSA_NAME_DEF_STMT (use);
-  if (is_gimple_assign (g))
-   if (tree op = gimple_assign_rhs1 (g))
- if (TREE_CODE (op) == ADDR_EXPR)
-   visit (g, TREE_OPERAND (op, 0), op, work);
+  if (gassign *a = dyn_cast <gassign *> (g))
+   {
+ if (tree op = gimple_assign_rhs1 (a))
+   if (TREE_CODE (op) == ADDR_EXPR)
+ visit (a, TREE_OPERAND (op, 0), op, work);
+   }
+  else if (gphi *p = dyn_cast <gphi *> (g))
+   {
+ for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
+   if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
+ if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
+   {
+ if (tree op = gimple_assign_rhs1 (a))
+   if (TREE_CODE (op) == ADDR_EXPR)
+ visit (a, TREE_OPERAND (op, 0), op, work);
+   }
+   }
 }
 }
 
-- 
2.35.3


Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 12:29 PM Tamar Christina
 wrote:
>
> Hi All,
>
> Some Neoverse Software Optimization Guides (SWoG) have a clause that states
> that for predicated operations that also produce a predicate it is preferred
> that the codegen should use a different register for the destination than that
> of the input predicate in order to avoid a performance overhead.
>
> This of course has the problem that it increases register pressure and so 
> should
> be done with care.  Additionally not all micro-architectures have this
> consideration and so it shouldn't be done as a default thing.
>
> The patch series adds support for doing conditional early clobbers through a
> combination of new alternatives and attributes to control their availability.

You could have two alternatives, one with early clobber and one with
a matching constraint where you'd disparage the matching constraint one?

> On high register pressure we also use LRA's costing to prefer not to use the
> alternative and instead just use the tie as this is preferable to a reload.
>
> Concretely this patch series does:
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2
>
> foo:
> mov z31.h, w0
> ptrue   p3.b, all
> cmplo   p0.h, p3/z, z0.h, z31.h
> b   use
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n1+sve
>
> foo:
> mov z31.h, w0
> ptrue   p0.b, all
> cmplo   p0.h, p0/z, z0.h, z31.h
> b   use
>
> > aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 
> > -ffixed-p[1-15]
>
> foo:
> mov z31.h, w0
> ptrue   p0.b, all
> cmplo   p0.h, p0/z, z0.h, z31.h
> b   use
>
> Testcases for the changes are in the last patch of the series.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Thanks,
> Tamar
>
> ---
>
> --


Re: [PATCH] [PATCH] Correct DLL Installation Path for x86_64-w64-mingw32 Multilib [PR115094]

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 11:39 AM unlvsur unlvsur  wrote:
>
> cqwrteur@DESKTOP-9B705LH:~/gcc$ grep -r "# DLL is installed to" .
> ./zlib/configure:# DLL is installed to $(libdir)/../bin by 
> postinstall_cmds
> ./libitm/configure:# DLL is installed to $(libdir)/../bin by 
> postinstall_cmds
> ./libitm/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libquadmath/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libssp/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libobjc/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libvtv/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libvtv/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libsanitizer/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libsanitizer/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libstdc++-v3/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libstdc++-v3/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libffi/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libffi/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./gcc/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./gcc/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libphobos/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgomp/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgomp/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgm2/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgm2/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libcc1/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libcc1/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libbacktrace/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgrust/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgrust/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libtool.m4:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgfortran/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgfortran/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./lto-plugin/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgo/config/libtool.m4:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libgo/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
> ./libatomic/configure:# DLL is installed to $(libdir)/../bin by postinstall_cmds
>
> I can only find the comment in libtool.m4 and the generated configure files;
> configure.ac does not contain the information.
>
> I just wrote a program to replace all the text throughout the gcc directory here.
>
> can you tell me how to generate configure from libtool.m4? Thank you

You need to have exactly autoconf 2.69 installed and then invoke
'autoconf' from each directory.
At least that's how I do it.  But my question was whether upstream
libtool has your fix or
whether this is a downstream patch against libtool.m4 which we need to carry.

Richard.

> 
> From: Richard Biener 
> Sent: Wednesday, May 15, 2024 5:28
> To: unlvsur unlvsur 
> Cc: gcc-patches@gcc.gnu.org ; trcrsired 
> 
> Subject: Re: [PATCH] [PATCH] Correct DLL Installation Path for 
> x86_64-w64-mingw32 Multilib [PR115094]
>
> On Wed, May 15, 2024 at 11:02 AM unlvsur unlvsur  wrote:
> >
> > Hi. Richard. I checked configure.ac and it is not in configure.ac. It is in 
> > the libtool.m4. The code was generated from libtool.m4 so it is correct.
>
> Ah, sorry - the libtool.m4 change escaped me ...
>
> It's been some time since we updated libtool, is this fixed in libtool
> upstream in the
> same way?  You are missing a ChangeLog entry which should indicate which
> files were just re-generated and which ones you edited (and what part).
>
> Richard.
>
> > 
> > From: Richard Biener 
> > Sent: Wednesday, May 15, 2024 3:46
> > To: trcrsired 
> > Cc: gcc-patches@gcc.gnu.org ; trcrsired 
> > 
> > Subject: Re: [PATCH] [PATCH] Correct DLL Installation Path for 
> > x86_64-w64-mingw32 Multilib [PR115094]

[PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-05-15 Thread Richard Biener
The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also streamlines the resulting select_best_block a bit but avoids
adjusting the heuristics further in this change.  gfortran.dg/streamio_9.f90
starts failing at execution with this on x86_64 with -m32 because the
(float)i * 9....e-7 computation is sunk across a STOP, causing it
to no longer be spilled, and thus the compare fails due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change doesn't fix the testcase in the PR by itself.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics.  Instead reject sink locations that sink
to post-dominators.  Move empty latch check here from
statement_sink_location.  Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check.  Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.

* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
---
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc | 62 
 2 files changed, 20 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
index b6bddb973f8..f29ded6ba54 100644
--- a/gcc/testsuite/gfortran.dg/streamio_9.f90
+++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-options "-ffloat-store" }
 ! PR29053 Stream IO test 9.
 ! Contributed by Jerry DeLisle .
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef4..2188b7523c7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
 {
+  /* Walk up the dominator tree, hopefully we'll find a shallower
+loop nest.  */
+  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-
-  /* Walk up the dominator tree, hopefully we'll find a shallower
-loop nest.  */
-  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
 return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
+return early_bb;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return early_bb;
+
   /* Avoid turning an unconditional read into a conditional one when we
  still might want to perform vectorization.  */
   if (best_bb->loop_father == early_bb->loop_father
@@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
   && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, 
best_bb))
 return early_bb;
 
-  /* Get the sinking threshold.  If the statement to be moved has memory
- operands, then increase the threshold by 7% as those are even more
- profitable to avoid, clamping at 100%.  */
-  threshold = param_sink_frequency_threshold;
-  if (gimple_vuse (stmt) || 

Re: [PATCH] [PATCH] Correct DLL Installation Path for x86_64-w64-mingw32 Multilib [PR115094]

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 11:02 AM unlvsur unlvsur  wrote:
>
> Hi. Richard. I checked configure.ac and it is not in configure.ac. It is in 
> the libtool.m4. The code was generated from libtool.m4 so it is correct.

Ah, sorry - the libtool.m4 change escaped me ...

It's been some time since we updated libtool, is this fixed in libtool
upstream in the
same way?  You are missing a ChangeLog entry which should indicate which
files were just re-generated and which ones you edited (and what part).

Richard.

> ____
> From: Richard Biener 
> Sent: Wednesday, May 15, 2024 3:46
> To: trcrsired 
> Cc: gcc-patches@gcc.gnu.org ; trcrsired 
> 
> Subject: Re: [PATCH] [PATCH] Correct DLL Installation Path for 
> x86_64-w64-mingw32 Multilib [PR115094]
>
> On Tue, May 14, 2024 at 10:27 PM trcrsired  wrote:
> >
> > From: trcrsired 
> >
> > When building native GCC for the x86_64-w64-mingw32 host, the compiler 
> > copies its library DLLs to the `bin` directory. However, in the case of a 
> > multilib configuration, both 32-bit and 64-bit libraries end up in the same 
> > `bin` directory, leading to conflicts where 64-bit DLLs are overridden by 
> > their 32-bit counterparts.
> >
> > This patch addresses the issue by adjusting the installation path for the 
> > libraries. Specifically, it installs the libraries to separate directories: 
> > `lib` for 64-bit and `lib32` for 32-bit. This behavior aligns with how 
> > libraries are installed when creating an x86_64-w64-mingw32 cross-compiler 
> > without copying them to the `bin` directory if it is a multilib build.
>
> You need to patch configure.ac, not only the generated files.
>
> > ---
> >  gcc/configure   | 26 ++
> >  libatomic/configure | 13 +
> >  libbacktrace/configure  | 13 +
> >  libcc1/configure| 26 ++
> >  libffi/configure| 26 ++
> >  libgfortran/configure   | 26 ++
> >  libgm2/configure| 26 ++
> >  libgo/config/libtool.m4 | 13 +
> >  libgo/configure | 13 +
> >  libgomp/configure   | 26 ++
> >  libgrust/configure  | 26 ++
> >  libitm/configure| 26 ++
> >  libobjc/configure   | 13 +
> >  libphobos/configure | 13 +
> >  libquadmath/configure   | 13 +
> >  libsanitizer/configure  | 26 ++
> >  libssp/configure| 13 +
> >  libstdc++-v3/configure  | 26 ++
> >  libtool.m4  | 13 +
> >  libvtv/configure| 26 ++
> >  lto-plugin/configure| 13 +
> >  zlib/configure  | 13 +
> >  22 files changed, 429 insertions(+)
> >
> > diff --git a/gcc/configure b/gcc/configure
> > index aaf5899cc03..beab6df1878 100755
> > --- a/gcc/configure
> > +++ b/gcc/configure
> > @@ -20472,6 +20472,18 @@ cygwin* | mingw* | pw32* | cegcc*)
> >yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
> >  library_names_spec='$libname.dll.a'
> >  # DLL is installed to $(libdir)/../bin by postinstall_cmds
> > +# If user builds GCC with mulitlibs enabled, it should just install on 
> > $(libdir)
> > +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
> > +if test ${multilib} = yes; then
> > +postinstall_cmds='base_file=`basename \${file}`~
> > +  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> > \$dlname'\''`~
> > +  dldir=$destdir/`dirname \$dlpath`~
> > +  $install_prog $dir/$dlname $destdir/$dlname~
> > +  chmod a+x $destdir/$dlname~
> > +  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
> > +eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
> > +  fi'
> > +else
> >  postinstall_cmds='base_file=`basename \${file}`~
> >dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> > \$dlname'\''`~
> >dldir=$destdir/`dirname \$dlpath`~
> > @@ -20481,6 +20493,7 @@ cygwin* | mingw* | pw32* | cegcc*)
> >if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
> >  eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
> >fi'
> > +fi
> >  postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo 
> >

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 4:15 AM Hongtao Liu  wrote:
>
> On Mon, May 13, 2024 at 3:40 PM Richard Biener
>  wrote:
> >
> > On Mon, May 13, 2024 at 4:29 AM liuhongt  wrote:
> > >
> > > As testcase in the PR, O3 cunrolli may prevent vectorization for the
> > > innermost loop and increase register pressure.
> > > The patch removes the 1/3 reduction of unr_insn for innermost loop for 
> > > UL_ALL.
> > > ul != UR_ALL is needed since some small loop complete unrolling at O2 
> > > relies
> > > the reduction.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > No big impact for SPEC2017.
> > > Ok for trunk?
> >
> > This removes the 1/3 reduction when unrolling a loop nest (the case I was
> > concerned about).  Unrolling of a nest is by iterating in
> > tree_unroll_loops_completely
> > so the to be unrolled loop appears innermost.  So I think you need a new
> > parameter on tree_unroll_loops_completely_1 indicating whether we're in the
> > first iteration (or whether to assume innermost loops will "simplify").
> yes, it would be better.
> >
> > Few comments below
> >
> > > gcc/ChangeLog:
> > >
> > > PR tree-optimization/112325
> > > * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
> > > new parameters: loop and ul, and remove unr_insns reduction
> > > for innermost loop.
> > > (try_unroll_loop_completely): Pass loop and ul to
> > > estimated_unrolled_size.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/pr112325.c: New test.
> > > * gcc.dg/vect/pr69783.c: Add extra option --param
> > > max-completely-peeled-insns=300.
> > > ---
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 
> > >  gcc/testsuite/gcc.dg/vect/pr69783.c  |  2 +-
> > >  gcc/tree-ssa-loop-ivcanon.cc | 16 +--
> > >  3 files changed, 71 insertions(+), 4 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c 
> > > b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > new file mode 100644
> > > index 000..14208b3e7f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > @@ -0,0 +1,57 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > > +
> > > +typedef unsigned short ggml_fp16_t;
> > > +static float table_f32_f16[1 << 16];
> > > +
> > > +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> > > +unsigned short s;
> > > +__builtin_memcpy(&s, &f, sizeof(unsigned short));
> > > +return table_f32_f16[s];
> > > +}
> > > +
> > > +typedef struct {
> > > +ggml_fp16_t d;
> > > +ggml_fp16_t m;
> > > +unsigned char qh[4];
> > > +unsigned char qs[32 / 2];
> > > +} block_q5_1;
> > > +
> > > +typedef struct {
> > > +float d;
> > > +float s;
> > > +char qs[32];
> > > +} block_q8_1;
> > > +
> > > +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void 
> > > * restrict vx, const void * restrict vy) {
> > > +const int qk = 32;
> > > +const int nb = n / qk;
> > > +
> > > +const block_q5_1 * restrict x = vx;
> > > +const block_q8_1 * restrict y = vy;
> > > +
> > > +float sumf = 0.0;
> > > +
> > > +for (int i = 0; i < nb; i++) {
> > > +unsigned qh;
> > > +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> > > +
> > > +int sumi = 0;
> > > +
> > > +for (int j = 0; j < qk/2; ++j) {
> > > +const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> > > +const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> > > +
> > > +const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> > > +const int x1 = (x[i].qs[j] >> 4) | xh_1;
> > > +
> > > +sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> > > +}
> > > +
> > > +sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
> > > ggml_lookup_fp16_to_fp

Re: [PATCH] tree-cfg: Move the returns_twice check to be last statement only [PR114301]

2024-05-15 Thread Richard Biener
On Tue, May 14, 2024 at 5:52 PM Andrew Pinski  wrote:
>
> When I was checking to making sure that all of the bugs dealing
> with the case where gimple_can_duplicate_bb_p would return false was fixed,
> I noticed that the code which was checking if a call statement was
> returns_twice was checking all call statements rather than just the
> last statement. Since calling gimple_call_flags has a small non-zero
> overhead due to a few string comparison, removing the uses of it
> can have a small performance improvement. In the case of returns_twice
> functions calls, will always end the basic-block due to the check in
> stmt_can_terminate_bb_p (and others). So checking only the last statement
> is a small optimization and will be safe.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/114301
> gcc/ChangeLog:
>
> * tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
> only on the last call statement rather than all.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-cfg.cc | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index b2d47b72084..7fb7b92966b 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -6495,6 +6495,13 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
> && gimple_call_internal_p (last)
> && gimple_call_internal_unique_p (last))
>return false;
> +
> +/* Prohibit duplication of returns_twice calls, otherwise associated
> +   abnormal edges also need to be duplicated properly.
> +   return_twice functions will always be the last statement.  */
> +if (is_gimple_call (last)
> +   && (gimple_call_flags (last) & ECF_RETURNS_TWICE))
> +  return false;
>}
>
>for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
> @@ -6502,15 +6509,12 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
>  {
>gimple *g = gsi_stmt (gsi);
>
> -  /* Prohibit duplication of returns_twice calls, otherwise associated
> -abnormal edges also need to be duplicated properly.
> -An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
> +  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
>  duplicated as part of its group, or not at all.
>  The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such 
> a
>  group, so the same holds there.  */
>if (is_gimple_call (g)
> - && (gimple_call_flags (g) & ECF_RETURNS_TWICE
> - || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
> + && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
>   || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
> --
> 2.34.1
>


Re: [PATCH] [PATCH] Correct DLL Installation Path for x86_64-w64-mingw32 Multilib [PR115094]

2024-05-15 Thread Richard Biener
On Tue, May 14, 2024 at 10:27 PM trcrsired  wrote:
>
> From: trcrsired 
>
> When building native GCC for the x86_64-w64-mingw32 host, the compiler copies 
> its library DLLs to the `bin` directory. However, in the case of a multilib 
> configuration, both 32-bit and 64-bit libraries end up in the same `bin` 
> directory, leading to conflicts where 64-bit DLLs are overridden by their 
> 32-bit counterparts.
>
> This patch addresses the issue by adjusting the installation path for the 
> libraries. Specifically, it installs the libraries to separate directories: 
> `lib` for 64-bit and `lib32` for 32-bit. This behavior aligns with how 
> libraries are installed when creating an x86_64-w64-mingw32 cross-compiler 
> without copying them to the `bin` directory if it is a multilib build.
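A sketch of the install layout the patch description implies (the paths and DLL name below are illustrative examples of mine, not output of an actual build):

```shell
# Multilib layout after the patch: 64-bit DLLs stay in $(libdir),
# 32-bit DLLs go to the lib32 multilib directory, so neither
# overwrites the other in a shared bin/.
cat <<'EOF'
prefix/lib/libstdc++-6.dll      64-bit
prefix/lib32/libstdc++-6.dll    32-bit
EOF
```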

You need to patch configure.ac, not only the generated files.

> ---
>  gcc/configure   | 26 ++
>  libatomic/configure | 13 +
>  libbacktrace/configure  | 13 +
>  libcc1/configure| 26 ++
>  libffi/configure| 26 ++
>  libgfortran/configure   | 26 ++
>  libgm2/configure| 26 ++
>  libgo/config/libtool.m4 | 13 +
>  libgo/configure | 13 +
>  libgomp/configure   | 26 ++
>  libgrust/configure  | 26 ++
>  libitm/configure| 26 ++
>  libobjc/configure   | 13 +
>  libphobos/configure | 13 +
>  libquadmath/configure   | 13 +
>  libsanitizer/configure  | 26 ++
>  libssp/configure| 13 +
>  libstdc++-v3/configure  | 26 ++
>  libtool.m4  | 13 +
>  libvtv/configure| 26 ++
>  lto-plugin/configure| 13 +
>  zlib/configure  | 13 +
>  22 files changed, 429 insertions(+)
>
> diff --git a/gcc/configure b/gcc/configure
> index aaf5899cc03..beab6df1878 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -20472,6 +20472,18 @@ cygwin* | mingw* | pw32* | cegcc*)
>yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
>  library_names_spec='$libname.dll.a'
>  # DLL is installed to $(libdir)/../bin by postinstall_cmds
> +# If user builds GCC with mulitlibs enabled, it should just install on 
> $(libdir)
> +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
> +if test ${multilib} = yes; then
> +postinstall_cmds='base_file=`basename \${file}`~
> +  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> \$dlname'\''`~
> +  dldir=$destdir/`dirname \$dlpath`~
> +  $install_prog $dir/$dlname $destdir/$dlname~
> +  chmod a+x $destdir/$dlname~
> +  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
> +eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
> +  fi'
> +else
>  postinstall_cmds='base_file=`basename \${file}`~
>dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> \$dlname'\''`~
>dldir=$destdir/`dirname \$dlpath`~
> @@ -20481,6 +20493,7 @@ cygwin* | mingw* | pw32* | cegcc*)
>if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
>  eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
>fi'
> +fi
>  postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo 
> \$dlname'\''`~
>dlpath=$dir/\$dldll~
> $RM \$dlpath'
> @@ -24200,6 +24213,18 @@ cygwin* | mingw* | pw32* | cegcc*)
>yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*)
>  library_names_spec='$libname.dll.a'
>  # DLL is installed to $(libdir)/../bin by postinstall_cmds
> +# If user builds GCC with mulitlibs enabled, it should just install on 
> $(libdir)
> +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones.
> +if test ${multilib} = yes; then
> +postinstall_cmds='base_file=`basename \${file}`~
> +  dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> \$dlname'\''`~
> +  dldir=$destdir/`dirname \$dlpath`~
> +  $install_prog $dir/$dlname $destdir/$dlname~
> +  chmod a+x $destdir/$dlname~
> +  if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
> +eval '\''$striplib $destdir/$dlname'\'' || exit \$?;
> +  fi'
> +else
>  postinstall_cmds='base_file=`basename \${file}`~
>dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo 
> \$dlname'\''`~
>dldir=$destdir/`dirname \$dlpath`~
> @@ -24209,6 +24234,7 @@ cygwin* | mingw* | pw32* | cegcc*)
>if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then
>  eval '\''$striplib \$dldir/$dlname'\'' || exit \$?;
>fi'
> +fi
>  postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo 
> \$dlname'\''`~
>

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-15 Thread Richard Biener
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 13:14, Richard Biener  wrote:
> > 
> > On Tue, 14 May 2024, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On May 14, 2024, at 10:29, Richard Biener  wrote:
> >>> 
> > [...]
> >>> It would of course
> >>> need experimenting since we can end up moving stmts and merging blocks
> >>> though the linear traces created by jump threading should be quite
> >>> stable (as opposed to say the unrolling case where multiple instances
> >>> of the loop body likely will end up in the exact same basic block).
> >> 
> >> Do you mean, for loop unrolling the approach with one extra stmt for one 
> >> basic block might be even harder and unreliable?
> > 
> > The question is whether the stmt marks the whole block or whether we
> > for example add both a START and END stmt covering a copied path.
> > I would guess for unrolling we need definitely need to do the latter
> > (so we can diagnose "on the 3rd iteration of an unrolled loop" or
> > similar).
> 
> Okay. I see. 
> 
> Is it possible that the START and END stmts might be moved around and 
> out-of-place by the different optimizations?

There is nothing preventing stmts from being moved across START or END.

Richard.


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-14 Thread Richard Biener
On Tue, 14 May 2024, Kees Cook wrote:

> On Tue, May 14, 2024 at 02:17:16PM +, Qing Zhao wrote:
> > The current major issue with the warning is:  the constant index value 4
> > is not in the source code, it’s a compiler generated intermediate value
> > (even though it’s a correct value -:)). Such warning messages confuse
> > the end-users with information that cannot be connected directly to the
> > source code.
> 
> Right -- this "4" comes from -fsanitize=array-bounds (in "warn but
> continue" mode).
> 
> Now, the minimized PoC shows a situation that triggers the situation, but
> I think it's worth looking at the original code that caused this false
> positive:
> 
>   for (i = 0; i < sg->num_entries; i++) {
> gce = &sg->gce[i];
> 
> 
> The issue here is that sg->num_entries has already been bounds-checked
> in a separate function. As a result, the value range tracking for "i"
> here is unbounded.
> 
> Enabling -fsanitize=array-bounds means the sg->gce[i] access gets
> instrumented, and suddenly "i" gains an implicit range, induced by the
> sanitizer.
> 
> (I would point out that this is very similar to the problems we've had
> with -fsanitize=shift[1][2]: the sanitizer induces a belief about a
> given variable's range that isn't true.)
> 
> Now, there is an argument to be made that the original code should be
> doing:
> 
>   for (i = 0; i < 4 && i < sg->num_entries; i++) {
> 
> But this is:
> 
> a) logically redundant (Linux maintainers don't tend to like duplicating
>their range checking)
> 
> b) a very simple case
> 
> The point of the sanitizers is to catch "impossible" situations at
> run-time for the cases where some value may end up out of range. Having
> it _induce_ a range on the resulting code makes no sense.
> 
> Could we, perhaps, have sanitizer code not influence the value range
> tracking? That continues to look like the root cause for these things.

The sanitizer code adds checks that are not distinguishable from
user code exactly because we want value-range analysis to eventually
elide even (redundant) sanitizer checks.

I think the fix for the source when there's a-priori knowledge
of sg->num_entries is to assert that knowledge through language
features or when using C through GNU extensions like assert()
using __builtin_unreachable ().  That also serves documentation
purposes "this code expects sg->num_entries to be bounds-checked".

To me it doesn't make much sense to mix sanitizing of array
accesses and at the same time do -Warray-bound diagnostics.

Note I tried to teach jump threading to be less aggressive about
threading paths to exceptional situations (I think the
sanitizer runtime calls are at least marked unlikely), but
the comment was that even those are very much desired; I can't
remember the details.  This was done as part of PR111515
but I think I've run into this earlier when trying to
improve back_threader_profitability::possibly_profitable_path_p.
There we have

  /* Threading is profitable if the path duplicated is hot but also
 in a case we separate cold path from hot path and permit optimization
 of the hot path later.  Be on the agressive side here. In some testcases,
 as in PR 78407 this leads to noticeable improvements.  */

here we have

  if (A)
unlikely();
  B;
  if (A)
unlikely ();

and we choose to perform that path separation which optimizes
the not exceptional path which automatically separates the
exceptional path as well.

IMO that sanitizer mode that continues running is bad - it makes
the compiler aware of undefined behavior and makes the code run
into it with open eyes.  You get what you asked for.

Richard.

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-14 Thread Richard Biener
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 10:29, Richard Biener  wrote:
> > 
[...]
> >  It would of course
> > need experimenting since we can end up moving stmts and merging blocks
> > though the linear traces created by jump threading should be quite
> > stable (as opposed to say the unrolling case where multiple instances
> > of the loop body likely will end up in the exact same basic block).
> 
> Do you mean, for loop unrolling the approach with one extra stmt for one 
> basic block might be even harder and unreliable?

The question is whether the stmt marks the whole block or whether we
for example add both a START and END stmt covering a copied path.
I would guess for unrolling we need definitely need to do the latter
(so we can diagnose "on the 3rd iteration of an unrolled loop" or
similar).

Richard.



Re: [PATCH v5 5/5] Add documentation for musttail attribute

2024-05-14 Thread Richard Biener
On Tue, May 14, 2024 at 6:30 PM Andi Kleen  wrote:
>
> > Looks generally OK though does this mean people can debug
> > programs using [[gnu::musttail]] only with optimized builds?  It
> > seems to me we should try harder to make [[gnu::musttail]] work
> > at -O0 and generally behave the same at all optimization levels?
>
> Yes that's a fair point. The problem is tree-tailcall failing,
> not the RTL backend. Have to see what it would take to fix. I would
> prefer to do this as a followon patch though.

Btw, -Og also doesn't run the tail-calls pass.  I think we should at least
try to run the pass at -Og and -O0 somewhere before pass_expand
and in find_tail_calls simply only consider the musttail annotated calls
in this case?  It should be reasonably cheap to walk the return stmts
of each function.

I'm not sure why we run pass_tail_calls so "early" and within the
regular post-IPA optimization pipeline rather than in the pre-expand
set of passes common to all optimization levels.  Possibly simply
moving the pass works out already.

Richard.

>
> -Andi


[PATCH][v2] tree-optimization/99954 - redo loop distribution memcpy recognition fix

2024-05-14 Thread Richard Biener
The following revisits the fix for PR99954 which was observed as
causing missed memcpy recognition and instead using memmove for
non-aliasing copies.  While the original fix mitigated bogus
recognition of memcpy the root cause was not properly identified.
The root cause is dr_analyze_indices "failing" to handle union
references and leaving the DR's indices in a state that's not correctly
handled by dr_may_alias.  The following mitigates this appropriately
there, restoring memcpy recognition for non-aliasing copies.

This makes us run into a latent issue in ptr_deref_may_alias_decl_p
when the pointer is something like [0].a in which case we fail
to handle non-SSA name pointers.  Add code similar to what we have
in ptr_derefs_may_alias_p.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

PR tree-optimization/99954
* tree-data-ref.cc (dr_may_alias_p): For bases that are
not completely analyzed fall back to TBAA and points-to.
* tree-loop-distribution.cc
(loop_distribution::classify_builtin_ldst): When there
is no dependence again classify as memcpy.
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.

* gcc.dg/tree-ssa/ldist-40.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c | 10 ++
 gcc/tree-data-ref.cc | 22 ++
 gcc/tree-loop-distribution.cc|  4 ++--
 gcc/tree-ssa-alias.cc|  5 +
 4 files changed, 39 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c
new file mode 100644
index 000..238a0098352
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+void copy_a_to_b (char * __restrict b, char * a, int n)
+{
+  for (int i = 0; i < n; ++i)
+b[i] = a[i];
+}
+
+/* { dg-final { scan-tree-dump "generated memcpy" "ldist" } } */
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index f37734b5340..db15ddb43de 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -3066,6 +3066,28 @@ dr_may_alias_p (const struct data_reference *a, const struct data_reference *b,
return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
   TREE_OPERAND (addr_b, 0));
 }
+  /* If dr_analyze_innermost failed to handle a component we are
+ possibly left with a non-base in which case we didn't analyze
+ a possible evolution of the base when analyzing a loop.  */
+  else if (loop_nest
+  && (handled_component_p (addr_a) || handled_component_p (addr_b)))
+{
+  /* For true dependences we can apply TBAA.  */
+  if (flag_strict_aliasing
+ && DR_IS_WRITE (a) && DR_IS_READ (b)
+ && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
+get_alias_set (DR_REF (b
+   return false;
+  if (TREE_CODE (addr_a) == MEM_REF)
+   return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
+  build_fold_addr_expr (addr_b));
+  else if (TREE_CODE (addr_b) == MEM_REF)
+   return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+  TREE_OPERAND (addr_b, 0));
+  else
+   return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+  build_fold_addr_expr (addr_b));
+}
 
   /* Otherwise DR_BASE_OBJECT is an access that covers the whole object
  that is being subsetted in the loop nest.  */
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 45932bae5e7..668dc420449 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -1840,11 +1840,11 @@ loop_distribution::classify_builtin_ldst (loop_p loop, 
struct graph *rdg,
   /* Now check that if there is a dependence.  */
   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);
 
-  /* Classify as memmove if no dependence between load and store.  */
+  /* Classify as memcpy if no dependence between load and store.  */
   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
 {
   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, 
size);
-  partition->kind = PKIND_MEMMOVE;
+  partition->kind = PKIND_MEMCPY;
   return;
 }
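[Background for why the kind matters, not part of the patch: memmove must handle overlapping source and destination while memcpy need not, so memcpy may only be used when the dependence analysis proves the accesses independent — which is exactly the chrec_known case guarded above. A minimal illustration of the overlap semantics:]

```c
#include <assert.h>
#include <string.h>

/* Shift five bytes right by one within the same buffer.  memmove is
   defined for this overlap; memcpy would be undefined behavior here.  */
void shift_right_one (char *buf)
{
  memmove (buf + 1, buf, 5);
}
```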
 
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index e7c1c1aa624..374ba04e6fd 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -294,6 +294,11 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
   if (!may_be_aliased (decl))
 return false;
 
+  /* From here we require a SSA name pointer.  Anything else aliases.  */
+  if (TREE_CODE (ptr) != SSA_NAME
+  || !POINTER_TYPE_P (TREE_TYPE (ptr)))
+return true;
+
   /* If we do not 

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-14 Thread Richard Biener
On Tue, 14 May 2024, Qing Zhao wrote:

> 
> 
> > On May 14, 2024, at 09:08, Richard Biener  wrote:
> > 
> > On Mon, 13 May 2024, Qing Zhao wrote:
> > 
> >> -Warray-bounds is an important option to enable the Linux kernel to keep
> >> the array out-of-bound errors out of the source tree.
> >> 
> >> However, due to the false positive warnings reported in PR109071
> >> (-Warray-bounds false positive warnings due to code duplication from
> >> jump threading), -Warray-bounds=1 cannot be added on by default.
> >> 
> >> Although it's impossible to eliminate all the false positive warnings
> >> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> >> documentation says "always out of bounds"), we should minimize the
> >> false positive warnings in -Warray-bounds=1.
> >> 
> >> The root reason for the false positive warnings reported in PR109071 is:
> >> 
> >> When the thread jump optimization tries to reduce the # of branches
> >> inside the routine, it sometimes needs to duplicate the code and
> >> split it into two conditional paths.  For example:
> >> 
> >> The original code:
> >> 
> >> void sparx5_set (int * ptr, struct nums * sg, int index)
> >> {
> >>  if (index >= 4)
> >>warn ();
> >>  *ptr = 0;
> >>  *val = sg->vals[index];
> >>  if (index >= 4)
> >>warn ();
> >>  *ptr = *val;
> >> 
> >>  return;
> >> }
> >> 
> >> With the thread jump, the above becomes:
> >> 
> >> void sparx5_set (int * ptr, struct nums * sg, int index)
> >> {
> >>  if (index >= 4)
> >>{
> >>  warn ();
> >>  *ptr = 0; // Code duplications since "warn" does return;
> >>  *val = sg->vals[index]; // likewise for this line.
> >> // In this path, since it's under the condition
> >> // "index >= 4", the compiler knows the value
> >> // of "index" is at least 4, therefore the
> >> // out-of-bound warning.
> >>  warn ();
> >>}
> >>  else
> >>{
> >>  *ptr = 0;
> >>  *val = sg->vals[index];
> >>}
> >>  *ptr = *val;
> >>  return;
> >> }
> >> 
> >> We can see that after the thread jump optimization, the # of branches inside
> >> the routine "sparx5_set" is reduced from 2 to 1; however, due to the
> >> code duplication (which is needed for the correctness of the code), we
> >> get a false positive out-of-bound warning.
> >> 
> >> In order to eliminate such false positive out-of-bound warning,
> >> 
> >> A. Add one more flag for GIMPLE: is_splitted.
> >> B. During the thread jump optimization, when the basic blocks are
> >>   duplicated, mark all the STMTs inside the original and duplicated
> >>   basic blocks as "is_splitted";
> >> C. Inside the array bound checker, add the following new heuristic:
> >> 
> >> If
> >>   1. the stmt is duplicated and split into two conditional paths;
> >> +  2. the warning level < 2;
> >> +  3. the current block is not dominating the exit block
> >> Then do not report the warning.
> >> 
> >> The false positive warnings are moved from -Warray-bounds=1 to
> >> -Warray-bounds=2 now.
> >> 
> >> Bootstrapped and regression tested on both x86 and aarch64.  Adjusted
> >> -Warray-bounds-61.c due to the false positive warnings.
> >> 
> >> Let me know if you have any comments and suggestions.
> > 
> > At the last Cauldron I talked with David Malcolm about these kind of
> > issues and thought of instead of suppressing diagnostics to record
> > how a block was duplicated.  For jump threading my idea was to record
> > the condition that was proved true when entering the path and do this
> > by recording the corresponding locations so that in the end we can
> > use the diagnostic-path infrastructure to say
> > 
> > warning: array index always above array bounds
> > events 1:
> > 
> > | 3 |  if (index >= 4)
> > |
> >(1) when index >= 4

As it's been quite some time, I think I remember that I thought of
constructing the diagnostic path at jump threading time and associating
that with the location.  But I don't remember exactly where I wanted to
put it - I think it was on an extra stmt to avoid having too many
ad-hoc locations as I'm not sur

Re: [PATCH 2/4] libcpp/init: remove unnecessary `struct` keyword

2024-05-14 Thread Richard Biener
On Sat, May 4, 2024 at 5:06 PM Ben Boeckel  wrote:
>
> The initial P1689 patches were written in 2019, and as the code moved
> around over time, a `struct` keyword was introduced into the
> implementation of `cpp_finish`. Remove it to match the rest of the file
> and its declaration in the header.
>
> Fixes: 024f135a1e9 (p1689r5: initial support, 2023-09-01)
>
> Reported-by: Roland Illig 

OK.

Thanks,
Richard.

> libcpp/
>
> * init.cc (cpp_finish): Remove unnecessary `struct` keyword.
>
> Signed-off-by: Ben Boeckel 
> ---
>  libcpp/init.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libcpp/init.cc b/libcpp/init.cc
> index 54fc9236d38..cbd22249b04 100644
> --- a/libcpp/init.cc
> +++ b/libcpp/init.cc
> @@ -862,7 +862,7 @@ read_original_directory (cpp_reader *pfile)
> Maybe it should also reset state, such that you could call
> cpp_start_read with a new filename to restart processing.  */
>  void
> -cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
> +cpp_finish (cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
>  {
>/* Warn about unused macros before popping the final buffer.  */
>if (CPP_OPTION (pfile, warn_unused_macros))
> --
> 2.44.0
>


Re: [PATCH v5 5/5] Add documentation for musttail attribute

2024-05-14 Thread Richard Biener
On Sun, May 5, 2024 at 8:16 PM Andi Kleen  wrote:
>
> gcc/ChangeLog:
>
> * doc/extend.texi: Document [[musttail]]
> ---
>  gcc/doc/extend.texi | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e290265d68d3..deb100ad93b6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9839,7 +9839,7 @@ same manner as the @code{deprecated} attribute.
>  @section Statement Attributes
>  @cindex Statement Attributes
>
> -GCC allows attributes to be set on null statements.  @xref{Attribute Syntax},
> +GCC allows attributes to be set on statements.  @xref{Attribute Syntax},
>  for details of the exact syntax for using attributes.  Other attributes are
>  available for functions (@pxref{Function Attributes}), variables
>  (@pxref{Variable Attributes}), labels (@pxref{Label Attributes}), enumerators
> @@ -9896,6 +9896,22 @@ foo (int x, int y)
>  @code{y} is not actually incremented and the compiler can but does not
>  have to optimize it to just @code{return 42 + 42;}.
>
> +@cindex @code{musttail} statement attribute
> +@item musttail
> +
> +The @code{gnu::musttail} or @code{clang::musttail} attribute
> +can be applied to a @code{return} statement with a return-value expression
> +that is a function call.  It asserts that the call must be a tail call that
> +does not allocate extra stack space.
> +
> +@smallexample
> +[[gnu::musttail]] return foo();
> +@end smallexample
> +
> +If the compiler cannot generate a tail call it generates
> +an error. Tail calls generally require enabling optimization.
> +On some targets they may not be supported.

Looks generally OK though does this mean people can debug
programs using [[gnu::musttail]] only with optimized builds?  It
seems to me we should try harder to make [[gnu::musttail]] work
at -O0 and generally behave the same at all optimization levels?

> +
>  @end table
>
>  @node Attribute Syntax
> @@ -10019,7 +10035,9 @@ the constant expression, if present.
>
>  @subsubheading Statement Attributes
>  In GNU C, an attribute specifier list may appear as part of a null
> -statement.  The attribute goes before the semicolon.
> +statement. The attribute goes before the semicolon.
> +Some attributes in new style syntax are also supported
> +on non-null statements.
>
>  @subsubheading Type Attributes
>
> --
> 2.44.0
>


Re: [PATCH v5 1/5] Improve must tail in RTL backend

2024-05-14 Thread Richard Biener
On Sun, May 5, 2024 at 8:16 PM Andi Kleen  wrote:
>
> - Give error messages for all causes of non sibling call generation
> - Don't override choices of other non sibling call checks with
> must tail. This causes ICEs. The must tail attribute now only
> overrides flag_optimize_sibling_calls locally.
> - Error out when tree-tailcall failed to mark a must-tail call
> sibcall. In this case it doesn't know the true reason and only gives
> a vague message (this could be improved, but it's already useful without
> that) tree-tailcall usually fails without optimization, so we must
> adjust the existing must-tail plugin test to specify -O2.
>
> PR83324
>
> gcc/ChangeLog:
>
> * calls.cc (expand_call): Fix mustcall implementation.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/plugin/must-tail-call-1.c: Adjust.
> ---
>  gcc/calls.cc  | 30 ---
>  .../gcc.dg/plugin/must-tail-call-1.c  |  1 +
>  2 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 21d78f9779fe..a6b8ee44cc29 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
>/* The type of the function being called.  */
>tree fntype;
>bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> +  /* tree-tailcall decided not to do tail calls. Error for the musttail 
> case.  */
> +  if (!try_tail_call)
> +  maybe_complain_about_tail_call (exp, "other reasons");
>int pass;
>
>/* Register in which non-BLKmode value will be returned,
> @@ -3022,10 +3024,22 @@ expand_call (tree exp, rtx target, int ignore)
>   pushed these optimizations into -O2.  Don't try if we're already
>   expanding a call, as that means we're an argument.  Don't try if
>   there's cleanups, as we know there's code to follow the call.  */
> -  if (currently_expanding_call++ != 0
> -  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> -  || args_size.var
> -  || dbg_cnt (tail_call) == false)
> +  if (currently_expanding_call++ != 0)
> +{
> +  maybe_complain_about_tail_call (exp, "inside another call");
> +  try_tail_call = 0;
> +}
> +  if (!flag_optimize_sibling_calls
> +   && !CALL_FROM_THUNK_P (exp)
> +   && !CALL_EXPR_MUST_TAIL_CALL (exp))
> +try_tail_call = 0;
> +  if (args_size.var)

If we are both inside another call and run into this we give two errors,
but I guess that's OK ...

> +{
> +  /* ??? correct message?  */
> +  maybe_complain_about_tail_call (exp, "stack space needed");

args_size.var != NULL_TREE means the argument size is not constant.
I'm quite sure this is an overly conservative check.

> +  try_tail_call = 0;
> +}
> +  if (dbg_cnt (tail_call) == false)
>  try_tail_call = 0;
>
>/* Workaround buggy C/C++ wrappers around Fortran routines with
> @@ -3046,15 +3060,11 @@ expand_call (tree exp, rtx target, int ignore)
> if (MEM_P (*iter))
>   {
> try_tail_call = 0;
> +   maybe_complain_about_tail_call (exp, "hidden string length 
> argument");

"hidden string length argument passed on stack"

from what I read the code.

> break;
>   }
> }
>
> -  /* If the user has marked the function as requiring tail-call
> - optimization, attempt it.  */
> -  if (must_tail_call)
> -try_tail_call = 1;
> -
>/*  Rest of purposes for tail call optimizations to fail.  */
>if (try_tail_call)
>  try_tail_call = can_implement_as_sibling_call_p (exp,
> diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c 
> b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> index 3a6d4cceaba7..44af361e2925 100644
> --- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> +++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile { target tail_call } } */
> +/* { dg-options "-O2" } */

So I think this is unfortunate - I think when there's a must-tail attribute
we should either run the tailcall pass to check the call even at -O0 or
trust the user with correctness  (hoping no optimization interfered with
the ability to tail-call).

What were the ICEs you ran into?

I would guess it's for example problematic to duplicate must-tail calls?

Thanks,
Richard.

>  /* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
>
>  extern void abort (void);
> --
> 2.44.0
>


Re: Ping [PATCH/RFC] target, hooks: Allow a target to trap on unreachable [PR109267].

2024-05-14 Thread Richard Biener
On Wed, May 8, 2024 at 9:37 PM Iain Sandoe  wrote:
>
> Hi Folks,
>
> I’d like to land a viable solution to this issue if possible (it is a
> show-stopper for the aarch64-darwin development branch).

I was looking as to how we handle __builtin_trap (whether we have an
optab for it) - we seem to use two target hooks, have_trap () and
gen_trap () to expand it (and fall back to a call to abort()).  So I guess
your target hook is reasonable though I'd name it
expand_unreachable_as_trap maybe (well, that's now bikeshedding).

Is this all still required or is there a workaround you can apply at
mdreorg or bb-reorder time to avoid expanding _all_ unreachable()s
as traps?

> > On 9 Apr 2024, at 14:55, Iain Sandoe  wrote:
> >
> > So far, tested lightly on aarch64-darwin; if this is acceptable then
> > it will be possible to back out of the ad hoc fixes used on x86 and
> > powerpc darwin.
> > Comments welcome, thanks,
>
> @Andrew - you were also (at one stage) talking about some ideas about
> how to handle this is in the middle end.
> Is that something you are likely to have time to do?
> Would it still be reasonable to have a target hook to control the behaviour.
> (the implementation below allows one to make the effect per TU)
>
>
> > Iain
> >
> > --- 8< ---
> >
> >
> > In the case cited in the PR, a target linker cannot handle empty FDEs;
> > arguably this is a linker bug - but in some cases we might still
> > wish to work around it.
> >
> > In the case of Darwin, the ABI does not allow two global symbols
> > to have the same address, so emitting empty functions has the
> > potential (almost a guarantee) to break the ABI.
> >
> > This patch allows a target to ask that __builtin_unreachable is
> > expanded in the same way as __builtin_trap (either to a trap
> > instruction or to abort() if there is no such insn).
> >
> > This means that the middle end's use of unreachability for
> > optimisation should not be altered.
> >
> > __builtin_unreachable is currently expanded to a barrier and
> > __builtin_trap is expanded to a trap insn + a barrier so that it
> > seems we should not be unduly affecting RTL optimisations.
> >
> > For Darwin, we enable this by default, but allow it to be disabled
> > per TU using -mno-unreachable-traps.
> >
> >   PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> >   * builtins.cc (expand_builtin_unreachable): Allow for
> >   a target to expand this as a trap.
> >   * config/darwin-protos.h (darwin_unreachable_traps_p): New.
> >   * config/darwin.cc (darwin_unreachable_traps_p): New.
> >   * config/darwin.h (TARGET_UNREACHABLE_SHOULD_TRAP): New.
> >   * config/darwin.opt (munreachable-traps): New.
> >   * doc/invoke.texi: Document -munreachable-traps.
> >   * doc/tm.texi: Regenerate.
> >   * doc/tm.texi.in: Document TARGET_UNREACHABLE_SHOULD_TRAP.
> >   * target.def (TARGET_UNREACHABLE_SHOULD_TRAP): New hook.
> >
> > Signed-off-by: Iain Sandoe 
> > ---
> > gcc/builtins.cc|  7 +++
> > gcc/config/darwin-protos.h |  1 +
> > gcc/config/darwin.cc   |  7 +++
> > gcc/config/darwin.h|  4 
> > gcc/config/darwin.opt  |  4 
> > gcc/doc/invoke.texi|  7 ++-
> > gcc/doc/tm.texi|  5 +
> > gcc/doc/tm.texi.in |  2 ++
> > gcc/target.def | 10 ++
> > 9 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index f8d94c4b435..13f321b6be6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -5929,6 +5929,13 @@ expand_builtin_trap (void)
> > static void
> > expand_builtin_unreachable (void)
> > {
> > +  /* If the target wants a trap in place of the fall-through, use that.  */
> > +  if (targetm.unreachable_should_trap ())
> > +{
> > +  expand_builtin_trap ();
> > +  return;
> > +}
> > +
> >   /* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
> >  to avoid this.  */
> >   gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> > diff --git a/gcc/config/darwin-protos.h b/gcc/config/darwin-protos.h
> > index b67e05264e1..48a32b2ccc2 100644
> > --- a/gcc/config/darwin-protos.h
> > +++ b/gcc/config/darwin-protos.h
> > @@ -124,6 +124,7 @@ extern void darwin_enter_string_into_cfstring_table 
> > (tree);
> > extern void darwin_asm_output_anchor (rtx symbol);
> > extern bool darwin_use_anchors_for_symbol_p (const_rtx symbol);
> > extern bool darwin_kextabi_p (void);
> > +extern bool darwin_unreachable_traps_p (void);
> > extern void darwin_override_options (void);
> > extern void darwin_patch_builtins (void);
> > extern void darwin_rename_builtins (void);
> > diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
> > index dcfccb4952a..018547d09c6 100644
> > --- a/gcc/config/darwin.cc
> > +++ b/gcc/config/darwin.cc
> > @@ -3339,6 +3339,13 @@ darwin_kextabi_p (void) {
> >   return flag_apple_kext;
> > }
> >
> > +/* True, iff we want to map __builtin_unreachable to a trap.  */
> > 

Re: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int

2024-05-14 Thread Richard Biener
On Mon, May 6, 2024 at 4:49 PM  wrote:
>
> From: Pan Li 
>
> This patch depends on below scalar enabling patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html
>
> For vectorization, we leverage the existing vect pattern recog to find
> a pattern similar to the scalar one and let the vectorizer perform
> the rest for the standard name usadd<mode>3 in vector mode.
> The riscv vector backend has the "Vector Single-Width Saturating
> Add and Subtract" insns, which can be leveraged when expanding
> usadd<mode>3 in vector mode.  For example:
>
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
>
>   for (i = 0; i < n; i++)
> out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
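[As a quick host-side sanity check of the scalar idiom the pattern matches — loop body copied from above, harness mine: on wraparound the comparison yields 1, whose negation is all-ones, so the OR saturates the lane to UINT64_MAX.]

```c
#include <assert.h>
#include <stdint.h>

/* Elementwise saturating add, written in the exact form the vectorizer
   pattern-matches: (x + y) | -(uint64_t)((uint64_t)(x + y) < x).  */
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}
```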
>
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, 
> vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
>
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
>
> The below test suites are passed for this patch.
> * The riscv fully regression tests.
> * The aarch64 fully regression tests.
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
>
> PR target/51492
> PR target/112600
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func
> decl generated by match.pd match.
> (vect_recog_sat_add_pattern): New func impl to recog the pattern
> for unsigned SAT_ADD.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 51 +++
>  1 file changed, 51 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..8ffcaf71d5c 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo,
>return pattern_stmt;
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplied to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +   tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +{
> +  tree itype = TREE_TYPE (res_ops[0]);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype != NULL_TREE && direct_internal_fn_supported_p (
> +   IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))

Please break the line before the && instead, like

  if (vtype != NULL_TREE
  && direct_internal_fn_supported_p (...

Otherwise this is OK once 1/3 is approved.

Thanks,
Richard.

> +   {
> + *type_out = vtype;
> + gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, 
> res_ops[0],
> +   res_ops[1]);
> +
> + gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> + gimple_call_set_nothrow (call, /* nothrow_p */ false);
> + gimple_set_location (call, gimple_location (last_stmt));
> +
> + vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> + return call;
> +   }
> +}
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
> otherwise vectorized:
>
> @@ -6987,6 +7037,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>{ vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>{ 

Re: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-14 Thread Richard Biener
On Mon, May 6, 2024 at 4:48 PM  wrote:
>
> From: Pan Li 
>
> This patch would like to add the middle-end representation for the
> saturation add, aka set the result of the add to the max on overflow.
> It will take a pattern similar to the one below.
>
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>
> Take uint8_t as example, we will have:
>
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
>
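[The uint8_t rows above can be checked with a direct transcription of the branchless formula — a sketch of the semantics, not the patch's internal function:]

```c
#include <assert.h>
#include <stdint.h>

/* (x + y) | (-(TYPE)((TYPE)(x + y) < x)) specialized to uint8_t:
   sum < x detects wraparound; negating the 0/1 flag gives 0 or 0xff.  */
static uint8_t sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | (uint8_t) - (uint8_t) (sum < x);
}
```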
> Given below example for the unsigned scalar integer uint64_t:
>
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
>
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
>
> }
>
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
>
> We perform the transform during widen_mult because the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW, or we may never catch the
> .SAT_ADD pattern.  Meanwhile, the isel pass runs after widen_mult, so we
> cannot perform the .SAT_ADD pattern match there, as the sub-expr will be
> optimized to .ADD_OVERFLOW first.
>
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 2. The aarch64 fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> PR target/51492
> PR target/112600
>
> gcc/ChangeLog:
>
> * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> to the return true switch case(s).
> * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> * match.pd: Add unsigned SAT_ADD match.
> * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
> func decl generated in match.pd match.
> (match_saturation_arith): New func impl to match the saturation arith.
> (math_opts_dom_walker::after_dom_children): Try match saturation
> arith.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc|  1 +
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 28 
>  gcc/optabs.def|  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46 +++
>  5 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>  case IFN_UBSAN_CHECK_MUL:
>  case IFN_ADD_OVERFLOW:
>  case IFN_MUL_OVERFLOW:
> +case IFN_SAT_ADD:
>  case IFN_VEC_WIDEN_PLUS:
>  case IFN_VEC_WIDEN_PLUS_LO:
>  case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | 
> ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
>   smulhrs, umulhrs, binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, 
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> || POINTER_TYPE_P (itype))
>&& wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
>
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0))
> +  && types_match (type, TREE_TYPE (@0))
> +  && types_match (type, TREE_TYPE (@1)
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0))
> +  && types_match (type, TREE_TYPE (@0))
> +  && types_match (type, TREE_TYPE (@1)
> +
> +(match (usadd_right_part @0 @1)

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-14 Thread Richard Biener
; +  ua3_a0.a2[i] = 0;   // { dg-bogus "\\\[-Warray-bounds" }
>  
>if (i > -1)
>  i = -1;
> diff --git a/gcc/testsuite/gcc.dg/pr109071-1.c 
> b/gcc/testsuite/gcc.dg/pr109071-1.c
> new file mode 100644
> index ..a405c80bd549
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109071-1.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> +   due to code duplication from jump threading 
> +   { dg-do compile }
> +   { dg-options "-O2 -Warray-bounds=2" }
> + */
> +
> +extern void warn(void);
> +static inline void assign(int val, int *regs, int index)
> +{
> +  if (index >= 4)
> +warn();
> +  *regs = val;
> +}
> +struct nums {int vals[4];};
> +
> +void sparx5_set (int *ptr, struct nums *sg, int index)
> +{
> +  int *val = &sg->vals[index]; /* { dg-warning "is above array bounds" } */
> +
> +  assign(0,ptr, index);
> +  assign(*val, ptr, index);
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr109071.c b/gcc/testsuite/gcc.dg/pr109071.c
> new file mode 100644
> index ..782dfad84ea2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr109071.c
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/109071 -Warray-bounds false positive warnings
> +   due to code duplication from jump threading 
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall" }
> + */
> +
> +extern void warn(void);
> +static inline void assign(int val, int *regs, int index)
> +{
> +  if (index >= 4)
> +warn();
> +  *regs = val;
> +}
> +struct nums {int vals[4];};
> +
> +void sparx5_set (int *ptr, struct nums *sg, int index)
> +{
> +  int *val = &sg->vals[index]; /* { dg-bogus "is above array bounds" } */
> +
> +  assign(0,ptr, index);
> +  assign(*val, ptr, index);
> +}
> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
> index fa61ba9512b7..9f338dd4d54d 100644
> --- a/gcc/tree-ssa-threadupdate.cc
> +++ b/gcc/tree-ssa-threadupdate.cc
> @@ -2371,6 +2371,17 @@ back_jt_path_registry::adjust_paths_after_duplication 
> (unsigned curr_path_num)
>  }
>  }
>  
> +/* Set all the stmts in the basic block BB as IS_SPLITTED.  */
> +
> +static void
> +set_stmts_in_bb_is_splitted (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +gimple_set_is_splitted (gsi_stmt (gsi), true);
> +  return;
> +}
> +
>  /* Duplicates a jump-thread path of N_REGION basic blocks.
> The ENTRY edge is redirected to the duplicate of the region.
>  
> @@ -2418,6 +2429,10 @@ back_jt_path_registry::duplicate_thread_path (edge 
> entry,
>basic_block *region_copy = XNEWVEC (basic_block, n_region);
>copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop,
>   split_edge_bb_loc (entry), false);
> +  /* Mark all the stmts in both original and copied basic blocks
> + as IS_SPLITTED.  */
> +  set_stmts_in_bb_is_splitted (*region);
> +  set_stmts_in_bb_is_splitted (*region_copy);
>  
>/* Fix up: copy_bbs redirects all edges pointing to copied blocks.  The
>   following code ensures that all the edges exiting the jump-thread path 
> are
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [COMMITTED 2/5] Fix ranger when called from SCEV.

2024-05-14 Thread Richard Biener
On Mon, May 13, 2024 at 8:28 PM Jan-Benedict Glaw  wrote:
>
> On Mon, 2024-05-13 20:19:42 +0200, Jan-Benedict Glaw  
> wrote:
> > On Tue, 2024-04-30 17:24:15 -0400, Andrew MacLeod  
> > wrote:
> > > Bootstrapped on x86_64-pc-linux-gnu with no regressions.  pushed.
> >
> > Starting with this patch (upstream as
> > e8ae56a7dc46e39a48017bb5159e4dc672ec7fad, can still be reproduced with
> > 0c585c8d0dd85601a8d116ada99126a48c8ce9fd as of May 13th), my CI builds fail 
> > for
> > csky-elf in all-target-libgcc by falling into an infinite loop:

Does the CI build GCC for the host and then use that compiler to build
the csky cross?  That said,
I can't see how the ref (or wasn't this a bisect?) can cause an issue
in LRA when building a cross-compiler.

Richard.

> > ../gcc/configure '--with-pkgversion=basepoints/gcc-15-432-g0c585c8d0dd, 
> > built at 1715608899'  \
> >   --prefix=/tmp/gcc-csky-elf --enable-werror-always 
> > --enable-languages=all\
> >   --disable-gcov --disable-shared --disable-threads --target=csky-elf 
> > --without-headers
> > make V=1 all-gcc
> > make V=1 install-strip-gcc
> > make V=1 all-target-libgcc
>
> Just to add:
>
> /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/cc1 -quiet 
>   \
> -I . -I . -I ../../.././gcc -I ../../../../gcc/libgcc 
>   \
> -I ../../../../gcc/libgcc/. -I ../../../../gcc/libgcc/../gcc  
>   \
> -I ../../../../gcc/libgcc/../include -imultilib ck801 
>   \
> -iprefix 
> /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/gcc/../lib/gcc/csky-elf/15.0.0/
>\
> -isystem 
> /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include
>  \
> -isystem 
> /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include-fixed  
>  \
> -MD unwind-dw2-fde.d -MF unwind-dw2-fde.dep -MP -MT unwind-dw2-fde.o  
>   \
> -D IN_GCC -D CROSS_DIRECTORY_STRUCTURE -D IN_LIBGCC2 -D inhibit_libc  
>   \
> -D HAVE_CC_TLS -D USE_EMUTLS -D HIDE_EXPORTS  
>   \
> -isystem /tmp/gcc-csky-elf/csky-elf/include   
>   \
> -isystem /tmp/gcc-csky-elf/csky-elf/sys-include   
>   \
> -isystem ./include ../../../../gcc/libgcc/unwind-dw2-fde.c -quiet 
>   \
> -dumpbase unwind-dw2-fde.c -dumpbase-ext .c -mcpu=ck801 -g -g -g -O2 
> -O2 -O2\
> -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
> -Wstrict-prototypes\
> -Wmissing-prototypes -Wold-style-definition -fbuilding-libgcc 
> -fno-stack-protector  \
> -fexceptions -fvisibility=hidden -o /tmp/cc3SHedS.s
>
> > (gdb) bt
> > #0  0x0098f1df in bitmap_list_find_element (head=0x38f2e18, 
> > indx=5001) at ../../gcc/gcc/bitmap.cc:375
> > #1  bitmap_set_bit (head=0x38f2e18, bit=640244) at 
> > ../../gcc/gcc/bitmap.cc:962
> > #2  0x00d39cd1 in process_bb_lives (bb=, 
> > curr_point=@0x7ffe062c1b2c: 3039473, dead_insn_p=) at 
> > ../../gcc/gcc/lra-lives.cc:889
> > #3  lra_create_live_ranges_1 (all_p=all_p@entry=true, 
> > dead_insn_p=) at ../../gcc/gcc/lra-lives.cc:1416
> > #4  0x00d3b810 in lra_create_live_ranges (all_p=all_p@entry=true, 
> > dead_insn_p=) at ../../gcc/gcc/lra-lives.cc:1486
> > #5  0x00d1a8bd in lra (f=, verbose=) 
> > at ../../gcc/gcc/lra.cc:2482
> > #6  0x00cd0e18 in do_reload () at ../../gcc/gcc/ira.cc:5973
> > #7  (anonymous namespace)::pass_reload::execute (this=) at 
> > ../../gcc/gcc/ira.cc:6161
> > #8  0x00de6368 in execute_one_pass (pass=pass@entry=0x367c490) at 
> > ../../gcc/gcc/passes.cc:2647
> > #9  0x00de6c00 in execute_pass_list_1 (pass=0x367c490) at 
> > ../../gcc/gcc/passes.cc:2756
> > #10 0x00de6c12 in execute_pass_list_1 (pass=0x367b2f0) at 
> > ../../gcc/gcc/passes.cc:2757
> > #11 0x00de6c39 in execute_pass_list (fn=0x7f24a1c06240, 
> > pass=) at ../../gcc/gcc/passes.cc:2767
> > #12 0x00a188c6 in cgraph_node::expand (this=0x7f24a1bfaaa0) at 
> > ../../gcc/gcc/context.h:48
> > #13 cgraph_node::expand (this=0x7f24a1bfaaa0) at 
> > ../../gcc/gcc/cgraphunit.cc:1798
> > #14 0x00a1a69b in expand_all_functions () at 
> > ../../gcc/gcc/cgraphunit.cc:2028
> > #15 symbol_table::compile (this=0x7f24a205b000) at 
> > ../../gcc/gcc/cgraphunit.cc:2404
> > #16 0x00a1ccb8 in symbol_table::compile (this=0x7f24a205b000) at 
> > ../../gcc/gcc/cgraphunit.cc:2315
> > #17 symbol_table::finalize_compilation_unit (this=0x7f24a205b000) at 
> > ../../gcc/gcc/cgraphunit.cc:2589
> > #18 0x00f0932d in compile_file () at ../../gcc/gcc/toplev.cc:476
> > #19 0x00839648 in do_compile () at ../../gcc/gcc/toplev.cc:2158
> > #20 toplev::main 

Re: Avoid TYPE_MAIN_VARIANT compares in TBAA

2024-05-14 Thread Richard Biener
On Tue, 14 May 2024, Jan Hubicka wrote:

> Hi,
> while building more testcases for ipa-icf I noticed that there are two places
> in aliasing code where we still compare TYPE_MAIN_VARIANT for pointer 
> equality.
> This is not a good idea for LTO since type merging may not happen, for example
> when in one unit the pointed-to type is forward declared while in the other it is 
> fully
> defined.  We have same_type_for_tbaa for that.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Richard.

> gcc/ChangeLog:
> 
>   * alias.cc (reference_alias_ptr_type_1): Use view_converted_memref_p.
>   * alias.h (view_converted_memref_p): Declare.
>   * tree-ssa-alias.cc (view_converted_memref_p): Export.
>   (ao_compare::compare_ao_refs): Use same_type_for_tbaa.
> 
> diff --git a/gcc/alias.cc b/gcc/alias.cc
> index 808e2095d9b..853e84d7439 100644
> --- a/gcc/alias.cc
> +++ b/gcc/alias.cc
> @@ -770,10 +770,7 @@ reference_alias_ptr_type_1 (tree *t)
>/* If the innermost reference is a MEM_REF that has a
>   conversion embedded treat it like a VIEW_CONVERT_EXPR above,
>   using the memory access type for determining the alias-set.  */
> -  if (TREE_CODE (inner) == MEM_REF
> -  && (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
> -   != TYPE_MAIN_VARIANT
> -(TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1))
> +  if (view_converted_memref_p (inner))
>  {
>tree alias_ptrtype = TREE_TYPE (TREE_OPERAND (inner, 1));
>/* Unless we have the (aggregate) effective type of the access
> diff --git a/gcc/alias.h b/gcc/alias.h
> index f8d93e8b5f4..36095f0bf73 100644
> --- a/gcc/alias.h
> +++ b/gcc/alias.h
> @@ -41,6 +41,7 @@ bool alias_ptr_types_compatible_p (tree, tree);
>  int compare_base_decls (tree, tree);
>  bool refs_same_for_tbaa_p (tree, tree);
>  bool mems_same_for_tbaa_p (rtx, rtx);
> +bool view_converted_memref_p (tree);
>  
>  /* This alias set can be used to force a memory to conflict with all
> other memories, creating a barrier across which no memory reference
> diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
> index e7c1c1aa624..632cf78028b 100644
> --- a/gcc/tree-ssa-alias.cc
> +++ b/gcc/tree-ssa-alias.cc
> @@ -2044,7 +2044,7 @@ decl_refs_may_alias_p (tree ref1, tree base1,
> which is done by ao_ref_base and thus one extra walk
> of handled components is needed.  */
>  
> -static bool
> +bool
>  view_converted_memref_p (tree base)
>  {
>if (TREE_CODE (base) != MEM_REF && TREE_CODE (base) != TARGET_MEM_REF)
> @@ -4325,8 +4325,8 @@ ao_compare::compare_ao_refs (ao_ref *ref1, ao_ref *ref2,
>else if ((end_struct_ref1 != NULL) != (end_struct_ref2 != NULL))
>  return flags | ACCESS_PATH;
>if (end_struct_ref1
> -  && TYPE_MAIN_VARIANT (TREE_TYPE (end_struct_ref1))
> -  != TYPE_MAIN_VARIANT (TREE_TYPE (end_struct_ref2)))
> +  && same_type_for_tbaa (TREE_TYPE (end_struct_ref1),
> +  TREE_TYPE (end_struct_ref2)) != 1)
>  return flags | ACCESS_PATH;
>  
>/* Now compare all handled components of the access path.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-14 Thread Richard Biener
;  }
>  
> +/* A subroutine of expand_vector_conversion, support indirect conversion for
> +   float <-> int, like char -> double.  */
> +bool
> +expand_vector_conversion_no_vec_pack (gimple_stmt_iterator *gsi,
> +   enum tree_code code,
> +   tree lhs,
> +   tree arg)
> +{
> +  gimple *g;
> +  tree ret_type = TREE_TYPE (lhs);
> +  tree arg_type = TREE_TYPE (arg);
> +  tree new_rhs;
> +  enum {NARROW, NONE, WIDEN} modifier = NONE;
> +  enum tree_code code1 = ERROR_MARK;
> +  enum tree_code codecvt1 = ERROR_MARK;
> +  bool float_expr_p = code == FLOAT_EXPR;
> +
> +  if (supportable_convert_operation (code, ret_type, arg_type, &code1))
> +{
> +  g = gimple_build_assign (lhs, code1, arg);
> +  gsi_replace (gsi, g, false);
> +  return true;
> +}
> +
> +  unsigned int ret_elt_bits = vector_element_bits (ret_type);
> +  unsigned int arg_elt_bits = vector_element_bits (arg_type);
> +  if (ret_elt_bits < arg_elt_bits)
> +modifier = NARROW;
> +  else if (ret_elt_bits > arg_elt_bits)
> +modifier = WIDEN;
> +
> +  if (((code == FIX_TRUNC_EXPR && !flag_trapping_math && modifier == NARROW)
> +   || (code == FLOAT_EXPR && modifier == WIDEN)))
> +{
> +  unsigned short target_size;
> +  scalar_mode tmp_cvt_mode;
> +  scalar_mode lhs_mode = GET_MODE_INNER (TYPE_MODE (ret_type));
> +  scalar_mode rhs_mode = GET_MODE_INNER (TYPE_MODE (arg_type));
> +  tree cvt_type = NULL_TREE;
> +  if (modifier == NARROW)
> + {
> +   tmp_cvt_mode = lhs_mode;
> +   target_size = GET_MODE_SIZE (rhs_mode);
> + }
> +  else
> + {
> +   target_size = GET_MODE_SIZE (lhs_mode);
> +   int rhs_size = GET_MODE_BITSIZE (rhs_mode);
> +   if (!int_mode_for_size (rhs_size, 0).exists (&tmp_cvt_mode))
> + return false;
> + }
> +
> +  code1 = float_expr_p ? code : NOP_EXPR;
> +  codecvt1 = float_expr_p ? NOP_EXPR : code;
> +  opt_scalar_mode mode_iter;
> +  enum tree_code tc1, tc2;
> +  unsigned HOST_WIDE_INT nelts
> + = constant_lower_bound (TYPE_VECTOR_SUBPARTS (arg_type));
> +
> +  FOR_EACH_2XWIDER_MODE (mode_iter, tmp_cvt_mode)
> + {
> +   tmp_cvt_mode = mode_iter.require ();
> +
> +   if (GET_MODE_SIZE (tmp_cvt_mode) > target_size)
> + break;
> +
> +   scalar_mode cvt_mode;
> +   int tmp_cvt_size = GET_MODE_BITSIZE (tmp_cvt_mode);
> +   if (!int_mode_for_size (tmp_cvt_size, 0).exists (&cvt_mode))
> + break;
> +
> +   int cvt_size = GET_MODE_BITSIZE (cvt_mode);
> +   bool isUnsigned = TYPE_UNSIGNED (ret_type) || TYPE_UNSIGNED 
> (arg_type);
> +   cvt_type = build_nonstandard_integer_type (cvt_size, isUnsigned);
> +
> +   cvt_type = build_vector_type (cvt_type, nelts);
> +   if (cvt_type == NULL_TREE
> +   || !supportable_convert_operation ((tree_code) code1,
> +  ret_type,
> +  cvt_type, &tc1)
> +   || !supportable_convert_operation ((tree_code) codecvt1,
> +  cvt_type,
> +  arg_type, &tc2))
> + continue;
> +
> +   new_rhs = make_ssa_name (cvt_type);
> +   g = vect_gimple_build (new_rhs, tc2, arg);
> +   gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +   g = gimple_build_assign (lhs, tc1, new_rhs);
> +   gsi_replace (gsi, g, false);
> +   return true;
> + }
> +}
> +  return false;
> +}
> +
>  /* Expand VEC_CONVERT ifn call.  */
>  
>  static void
> @@ -1871,14 +1969,11 @@ expand_vector_conversion (gimple_stmt_iterator *gsi)
>else if (ret_elt_bits > arg_elt_bits)
>  modifier = WIDEN;
>  
> +  if (expand_vector_conversion_no_vec_pack(gsi, code, lhs, arg))
> +return;
> +
>if (modifier == NONE && (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR))
>  {
> -  if (supportable_convert_operation (code, ret_type, arg_type, ))
> - {
> -   g = gimple_build_assign (lhs, code1, arg);
> -   gsi_replace (gsi, g, false);
> -   return;
> - }
>/* Can't use get_compute_type here, as supportable_convert_operation
>doesn't necessarily use an optab and needs two arguments.  */
>tree vec_compute_type
> --
> 2.31.1
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/99954 - redo loop distribution memcpy recognition fix

2024-05-14 Thread Richard Biener

The following revisits the fix for PR99954 which was observed as
causing missed memcpy recognition and instead using memmove for
non-aliasing copies.  While the original fix mitigated bogus
recognition of memcpy the root cause was not properly identified.
The root cause is dr_analyze_indices "failing" to handle union
references and leaving the DRs indices in a state that's not correctly
handled by dr_may_alias.  The following mitigates this there
appropriately, restoring memcpy recognition for non-aliasing copies.
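
As a hedged illustration of the distinction (my own sketch, not part of
the patch; copy_loop is a made-up name): __restrict is what rules out
overlap between the buffers, which is what makes memcpy a valid
classification where plain pointers would require the overlap-safe
memmove.

```c
#include <assert.h>
#include <string.h>

/* Sketch only: with __restrict on the destination the buffers cannot
   overlap, so loop distribution may legally turn this loop into a
   memcpy call; without __restrict, memmove is the conservative
   choice.  */
static void
copy_loop (char *__restrict b, const char *a, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = a[i];
}
```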

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/99954
* tree-data-ref.cc (dr_may_alias_p): For bases that are
not completely analyzed fall back to TBAA and points-to.
* tree-loop-distribution.cc
(loop_distribution::classify_builtin_ldst): When there
is no dependence again classify as memcpy.

* gcc.dg/tree-ssa/ldist-40.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c | 10 ++
 gcc/tree-data-ref.cc | 21 +
 gcc/tree-loop-distribution.cc|  4 ++--
 3 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c
new file mode 100644
index 000..238a0098352
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-40.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+void copy_a_to_b (char * __restrict b, char * a, int n)
+{
+  for (int i = 0; i < n; ++i)
+b[i] = a[i];
+}
+
+/* { dg-final { scan-tree-dump "generated memcpy" "ldist" } } */
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index f37734b5340..9d3f5d7507f 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -3066,6 +3066,27 @@ dr_may_alias_p (const struct data_reference *a, const 
struct data_reference *b,
return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
   TREE_OPERAND (addr_b, 0));
 }
+  /* If dr_analyze_innermost failed to handle a component we are
+ possibly left with a non-base in which case we didn't analyze
+ a possible evolution of the base.  */
+  else if (handled_component_p (addr_a) || handled_component_p (addr_b))
+{
+  /* For true dependences we can apply TBAA.  */
+  if (flag_strict_aliasing
+ && DR_IS_WRITE (a) && DR_IS_READ (b)
+ && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
> +get_alias_set (DR_REF (b))))
+   return false;
+  if (TREE_CODE (addr_a) == MEM_REF)
+   return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
+  build_fold_addr_expr (addr_b));
+  else if (TREE_CODE (addr_b) == MEM_REF)
+   return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+  TREE_OPERAND (addr_b, 0));
+  else
+   return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+  build_fold_addr_expr (addr_b));
+}

   /* Otherwise DR_BASE_OBJECT is an access that covers the whole object
  that is being subsetted in the loop nest.  */
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 45932bae5e7..668dc420449 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -1840,11 +1840,11 @@ loop_distribution::classify_builtin_ldst (loop_p loop, 
struct graph *rdg,
   /* Now check that if there is a dependence.  */
   ddr_p ddr = get_data_dependence (rdg, src_dr, dst_dr);

-  /* Classify as memmove if no dependence between load and store.  */
+  /* Classify as memcpy if no dependence between load and store.  */
   if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
 {
   partition->builtin = alloc_builtin (dst_dr, src_dr, base, src_base, 
size);
-  partition->kind = PKIND_MEMMOVE;
+  partition->kind = PKIND_MEMCPY;
   return;
 }

--
2.25.1


Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-13 Thread Richard Biener
On Mon, May 13, 2024 at 4:14 PM Robin Dapp  wrote:
>
> > What happens if we simply remove all of the force_reg here?
>
> On x86 I bootstrapped and tested the attached without fallout
> (gcc188, so it's no avx512-native machine and therefore limited
> coverage).  riscv regtest is unchanged.
> For aarch64 I would to rely on the pre-commit CI to pick it
> up (does that work on sub-threads?).

OK if that pre-commit CI works out.

Richard.

> Regards
>  Robin
>
>
> gcc/ChangeLog:
>
> PR middle-end/113474
>
> * internal-fn.cc (expand_vec_cond_mask_optab_fn):  Remove
> force_regs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr113474.c: New test.
> ---
>  gcc/internal-fn.cc  |  3 ---
>  .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 +
>  2 files changed, 13 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 2c764441cde..4d226c478b4 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3163,9 +3163,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>rtx_op1 = expand_normal (op1);
>rtx_op2 = expand_normal (op2);
>
> -  mask = force_reg (mask_mode, mask);
> -  rtx_op1 = force_reg (mode, rtx_op1);
> -
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>create_output_operand (&ops[0], target, mode);
>create_input_operand (&ops[1], rtx_op1, mode);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> new file mode 100644
> index 000..0364bf9f5e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target riscv_v } }  */
> +/* { dg-additional-options "-std=c99" }  */
> +
> +void
> +foo (int n, int **a)
> +{
> +  int b;
> +  for (b = 0; b < n; b++)
> +for (long e = 8; e > 0; e--)
> +  a[b][e] = a[b][e] == 15;
> +}
> +
> +/* { dg-final { scan-assembler "vmerge.vim" } }  */
> --
> 2.45.0
>


[PATCH] PR60276 fix for single-lane SLP

2024-05-13 Thread Richard Biener
When enabling single-lane SLP and not splitting groups the fix for
PR60276 is no longer effective since it, for an unknown reason, exempted
pure SLP.  The following removes this exemption, making
gcc.dg/vect/pr60276.c PASS even with --param vect-single-lane-slp=1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/60276
* tree-vect-stmts.cc (vectorizable_load): Do not exempt
pure_slp grouped loads from the STMT_VINFO_MIN_NEG_DIST
restriction.
---
 gcc/tree-vect-stmts.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21e8fe98e44..b8a71605f1b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9995,8 +9995,7 @@ vectorizable_load (vec_info *vinfo,
 
   /* Invalidate assumptions made by dependence analysis when vectorization
 on the unrolled body effectively re-orders stmts.  */
-  if (!PURE_SLP_STMT (stmt_info)
- && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+  if (STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
  && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
   STMT_VINFO_MIN_NEG_DIST (stmt_info)))
{
-- 
2.35.3


[PATCH] Refactor SLP reduction group discovery

2024-05-13 Thread Richard Biener
The following refactors a bit how we perform SLP reduction group
discovery possibly making it easier to have multiple reduction
groups later, esp. with single-lane SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
slp_inst_kind_reduc_group handling.
(vect_analyze_slp): Add the meat here.
---
 gcc/tree-vect-slp.cc | 67 ++--
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8c18f5308e2..f34ed54a70b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3586,7 +3586,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
   slp_instance_kind kind,
   unsigned max_tree_size, unsigned *limit)
 {
-  unsigned int i;
>vec<stmt_vec_info> scalar_stmts;
 
   if (is_a  (vinfo))
@@ -3620,35 +3619,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
-  else if (kind == slp_inst_kind_reduc_group)
-{
-  /* Collect reduction statements.  */
> -  const vec<stmt_vec_info> &reductions
> -   = as_a <loop_vec_info> (vinfo)->reductions;
-  scalar_stmts.create (reductions.length ());
> -  for (i = 0; reductions.iterate (i, &next_info); i++)
-   {
- gassign *g;
- next_info = vect_stmt_to_vectorize (next_info);
- if ((STMT_VINFO_RELEVANT_P (next_info)
-  || STMT_VINFO_LIVE_P (next_info))
- /* ???  Make sure we didn't skip a conversion around a reduction
-path.  In that case we'd have to reverse engineer that
-conversion stmt following the chain using reduc_idx and from
-the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
- /* Do not discover SLP reductions for lane-reducing ops, that
-will fail later.  */
> - && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
- || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
- && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
- && gimple_assign_rhs_code (g) != SAD_EXPR)))
-   scalar_stmts.quick_push (next_info);
-   }
-  /* If less than two were relevant/live there's nothing to SLP.  */
-  if (scalar_stmts.length () < 2)
-   return false;
-}
   else
 gcc_unreachable ();
 
@@ -3740,9 +3710,40 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
 
   /* Find SLP sequences starting from groups of reductions.  */
   if (loop_vinfo->reductions.length () > 1)
-   vect_analyze_slp_instance (vinfo, bst_map, loop_vinfo->reductions[0],
-  slp_inst_kind_reduc_group, max_tree_size,
-  );
+   {
+ /* Collect reduction statements.  */
> + vec<stmt_vec_info> scalar_stmts;
+ scalar_stmts.create (loop_vinfo->reductions.length ());
+ for (auto next_info : loop_vinfo->reductions)
+   {
+ gassign *g;
+ next_info = vect_stmt_to_vectorize (next_info);
+ if ((STMT_VINFO_RELEVANT_P (next_info)
+  || STMT_VINFO_LIVE_P (next_info))
+ /* ???  Make sure we didn't skip a conversion around a
+reduction path.  In that case we'd have to reverse
+engineer that conversion stmt following the chain using
+reduc_idx and from the PHI using reduc_def.  */
+ && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
+ /* Do not discover SLP reductions for lane-reducing ops, that
+will fail later.  */
> + && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
+ || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
+ && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
+ && gimple_assign_rhs_code (g) != SAD_EXPR)))
+   scalar_stmts.quick_push (next_info);
+   }
+ if (scalar_stmts.length () > 1)
+   {
> + vec<stmt_vec_info> roots = vNULL;
> + vec<tree> remain = vNULL;
+ vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
+  scalar_stmts, roots, remain,
> +  max_tree_size, &limit, bst_map, NULL);
+   }
+ else
+   scalar_stmts.release ();
+   }
 }
 
   hash_set visited_patterns;
-- 
2.35.3


RE: [PATCH] Allow patterns in SLP reductions

2024-05-13 Thread Richard Biener
On Mon, 13 May 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 10, 2024 2:07 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Allow patterns in SLP reductions
> > 
> > On Fri, Mar 1, 2024 at 10:21 AM Richard Biener  wrote:
> > >
> > > The following removes the over-broad rejection of patterns for SLP
> > > reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
> > > during pattern detection.  That's also insufficient in case the
> > > pattern only appears on the reduction path.  Instead this implements
> > > the proper correctness check in vectorizable_reduction and guides
> > > SLP discovery to heuristically avoid forming later invalid groups.
> > >
> > > I also couldn't find any testcase that FAILs when allowing the SLP
> > > reductions to form so I've added one.
> > >
> > > I came across this for single-lane SLP reductions with the all-SLP
> > > work where we rely on patterns to properly vectorize COND_EXPR
> > > reductions.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.
> > 
> > Re-bootstrapped/tested, r15-361-g52d4691294c847
> 
> Awesome!
> 
> Does this now allow us to write new reductions using patterns? i.e. 
> widening reductions?

Yes (SLP reductions, that is).  This is really only for SLP reductions
(not SLP reduction chains, not non-SLP reductions).  So it's just
a corner-case but since with SLP-only non-SLP reductions become
SLP reductions with a single lane that was important to fix ;)

Richard.

> Cheers,
> Tamar
> > 
> > Richard.
> > 
> > > Richard.
> > >
> > > * tree-vect-patterns.cc (vect_pattern_recog_1): Do not
> > > remove reductions involving patterns.
> > > * tree-vect-loop.cc (vectorizable_reduction): Reject SLP
> > > reduction groups with multiple lane-reducing reductions.
> > > * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
> > > SLP reduction groups avoid including lane-reducing ones.
> > >
> > > * gcc.dg/vect/vect-reduc-sad-9.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 
> > >  gcc/tree-vect-loop.cc| 15 +
> > >  gcc/tree-vect-patterns.cc| 13 
> > >  gcc/tree-vect-slp.cc | 26 +---
> > >  4 files changed, 101 insertions(+), 21 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > > new file mode 100644
> > > index 000..3c6af4510f4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > > @@ -0,0 +1,68 @@
> > > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > > +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } 
> > > } */
> > > +/* { dg-require-effective-target vect_usad_char } */
> > > +
> > > +#include 
> > > +#include "tree-vect.h"
> > > +
> > > +#define N 64
> > > +
> > > +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > > +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > > +int abs (int);
> > > +
> > > +/* Sum of absolute differences between arrays of unsigned char types.
> > > +   Detected as a sad pattern.
> > > +   Vectorized on targets that support sad for unsigned chars.  */
> > > +
> > > +__attribute__ ((noinline)) int
> > > +foo (int len, int *res2)
> > > +{
> > > +  int i;
> > > +  int result = 0;
> > > +  int result2 = 0;
> > > +
> > > +  for (i = 0; i < len; i++)
> > > +{
> > > +  /* Make sure we are not using an SLP reduction for this.  */
> > > +  result += abs (X[2*i] - Y[2*i]);
> > > +  result2 += abs (X[2*i + 1] - Y[2*i + 1]);
> > > +}
> > > +
> > > +  *res2 = result2;
> > > +  return result;
> > > +}
> > > +
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > +  i

Re: [PATCH] tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern [PR113982]

2024-05-13 Thread Richard Biener
  {
> +   g2 = gimple_build_assign (make_ssa_name (boolean_type_node),
> + ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> + ovf, build_int_cst (type, 0));
> +   gimple_stmt_iterator gsiu = gsi_for_stmt (use_stmt);
> +   gsi_insert_before (&gsiu, g2, GSI_SAME_STMT);
> +   gimple_assign_set_rhs_with_ops (&gsiu, NOP_EXPR,
> +   gimple_assign_lhs (g2));
> +   update_stmt (use_stmt);
> +   use_operand_p use;
> +   single_imm_use (gimple_assign_lhs (use_stmt), &use,
> +   &use_stmt);
> +   if (gimple_code (use_stmt) == GIMPLE_COND)
> + {
> +   gcond *cond_stmt = as_a <gcond *> (use_stmt);
> +   gimple_cond_set_lhs (cond_stmt, ovf);
> +   gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0));
> + }
> +   else
> + {
> +   gcc_checking_assert (is_gimple_assign (use_stmt));
> +   if (gimple_assign_rhs_class (use_stmt)
> +   == GIMPLE_BINARY_RHS)
> + {
> +   gimple_assign_set_rhs1 (use_stmt, ovf);
> +   gimple_assign_set_rhs2 (use_stmt,
> +   build_int_cst (type, 0));
> + }
> +   else if (gimple_assign_cast_p (use_stmt))
> + gimple_assign_set_rhs1 (use_stmt, ovf);
> +   else
> + {
> +   tree_code sc = gimple_assign_rhs_code (use_stmt);
> +   gcc_checking_assert (sc == COND_EXPR);
> +   tree cond = gimple_assign_rhs1 (use_stmt);
> +   cond = build2 (TREE_CODE (cond),
> +  boolean_type_node, ovf,
> +  build_int_cst (type, 0));
> +   gimple_assign_set_rhs1 (use_stmt, cond);
> + }
> + }
> +   update_stmt (use_stmt);
> +   gsi_remove (&gsiu, true);
> +   gsiu = gsi_for_stmt (g2);
> +   gsi_remove (&gsiu, true);
> +   continue;
> + }
> +   else
> + {
> +   gimple_assign_set_rhs1 (use_stmt, ovf);
> +   gimple_assign_set_rhs2 (use_stmt, build_int_cst (type, 0));
> +   gimple_assign_set_rhs_code (use_stmt,
> +   ovf_use == 1
> +   ? NE_EXPR : EQ_EXPR);
> + }
>   }
> else
>   {
> --- gcc/testsuite/gcc.dg/pr113982.c.jj2024-05-10 15:00:28.536651833 
> +0200
> +++ gcc/testsuite/gcc.dg/pr113982.c   2024-05-10 15:01:49.721570343 +0200
> @@ -0,0 +1,60 @@
> +/* PR middle-end/113982 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-widening_mul" } */
> +
> +#if __SIZEOF_INT128__
> +typedef __uint128_t W;
> +typedef unsigned long long T;
> +#else
> +typedef unsigned long long W;
> +typedef unsigned int T;
> +#endif
> +#define B __CHAR_BIT__ * sizeof (T)
> +
> +struct S { int p; T r; };
> +
> +struct S
> +foo (T x, T y)
> +{
> +  W z = (W) x + y;
> +  return (struct S) { z >> B, (T) z };
> +}
> +
> +struct S
> +bar (T x)
> +{
> +  W z = (W) x + 132;
> +  return (struct S) { z >> B, (T) z };
> +}
> +
> +struct S
> +baz (T x, unsigned short y)
> +{
> +  W z = (W) x + y;
> +  return (struct S) { z >> B, (T) z };
> +}
> +
> +struct S
> +qux (unsigned short x, T y)
> +{
> +  W z = (W) x + y;
> +  return (struct S) { z >> B, (T) z };
> +}
> +
> +struct S
> +corge (T x, T y)
> +{
> +  T w = x + y;
> +  W z = (W) x + y;
> +  return (struct S) { z >> B, w };
> +}
> +
> +struct S
> +garple (T x, T y)
> +{
> +  W z = (W) x + y;
> +  T w = x + y;
> +  return (struct S) { z >> B, w };
> +}
> +
> +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 6 "widening_mul" { 
> target { i?86-*-* x86_64-*-* } } } } */
> 
>   Jakub
> 
> 
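
For context, a sketch of the equivalence the pattern recognition relies
on (my own illustration, not from the patch; add_hi is a made-up
helper): for a single widening add, the high half of the double-width
sum is exactly the overflow flag of __builtin_add_overflow.

```c
#include <assert.h>

typedef unsigned int T;
typedef unsigned long long W;
#define B (8 * (int) sizeof (T))

/* Widening-add form as in the testcase: the high half of the
   double-width sum is the carry out of the narrow add, i.e. 0 or 1.  */
static T
add_hi (T x, T y, T *lo)
{
  W z = (W) x + y;
  *lo = (T) z;
  return (T) (z >> B);
}
```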

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-13 Thread Richard Biener
On Mon, May 13, 2024 at 8:18 AM Robin Dapp  wrote:
>
> > How does this make a difference in the end?  I'd expect say forwprop to
> > fix things?
>
> In general we try to only add the masking "boilerplate" of our
> instructions at split time so fwprop, combine et al. can do their
> work uninhibited of it (and we don't need numerous
> (if_then_else ... (if_then_else) ...) combinations in our patterns).
> A vec constant we expand directly to a masked representation, though
> which makes further simplification difficult.  I can experiment with
> changing that if preferred.
>
> My thinking was, however, that for other operations like binops we
> directly emit the right variant via expand_operands without
> forcing to a reg and don't even need to fwprop so I wanted to
> imitate that.

Ah, so yeah, it probably makes sense for constants.  Btw,
there's prepare_operand which I think might be better for
its CONST_INT handling?  I can also see we usually do not
bother with force_reg, the force_reg was added with the
initial r6-4696-ga414c77f2a30bb already.

What happens if we simply remove all of the force_reg here?

Thanks,
Richard.

> Regards
>  Robin
>


Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-13 Thread Richard Biener
On Mon, May 13, 2024 at 4:29 AM liuhongt  wrote:
>
> As testcase in the PR, O3 cunrolli may prevent vectorization for the
> innermost loop and increase register pressure.
> The patch removes the 1/3 reduction of unr_insn for innermost loop for UL_ALL.
> ul != UL_ALL is needed since some small loop complete unrolling at O2 relies
> on the reduction.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> No big impact for SPEC2017.
> Ok for trunk?

This removes the 1/3 reduction when unrolling a loop nest (the case I was
concerned about).  Unrolling of a nest is done by iterating in
tree_unroll_loops_completely
so the to-be-unrolled loop appears innermost.  So I think you need a new
parameter on tree_unroll_loops_completely_1 indicating whether we're in the
first iteration (or whether to assume innermost loops will "simplify").

Few comments below

> gcc/ChangeLog:
>
> PR tree-optimization/112325
> * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
> new parameters: loop and ul, and remove unr_insns reduction
> for innermost loop.
> (try_unroll_loop_completely): Pass loop and ul to
> estimated_unrolled_size.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr112325.c: New test.
> * gcc.dg/vect/pr69783.c: Add extra option --param
> max-completely-peeled-insns=300.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 
>  gcc/testsuite/gcc.dg/vect/pr69783.c  |  2 +-
>  gcc/tree-ssa-loop-ivcanon.cc | 16 +--
>  3 files changed, 71 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> new file mode 100644
> index 000..14208b3e7f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> @@ -0,0 +1,57 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> +
> +typedef unsigned short ggml_fp16_t;
> +static float table_f32_f16[1 << 16];
> +
> +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> +unsigned short s;
> +__builtin_memcpy(&s, &f, sizeof(unsigned short));
> +return table_f32_f16[s];
> +}
> +
> +typedef struct {
> +ggml_fp16_t d;
> +ggml_fp16_t m;
> +unsigned char qh[4];
> +unsigned char qs[32 / 2];
> +} block_q5_1;
> +
> +typedef struct {
> +float d;
> +float s;
> +char qs[32];
> +} block_q8_1;
> +
> +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * 
> restrict vx, const void * restrict vy) {
> +const int qk = 32;
> +const int nb = n / qk;
> +
> +const block_q5_1 * restrict x = vx;
> +const block_q8_1 * restrict y = vy;
> +
> +float sumf = 0.0;
> +
> +for (int i = 0; i < nb; i++) {
> +unsigned qh;
> +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> +
> +int sumi = 0;
> +
> +for (int j = 0; j < qk/2; ++j) {
> +const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> +const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> +
> +const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> +const int x1 = (x[i].qs[j] >> 4) | xh_1;
> +
> +sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> +}
> +
> +sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
> ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
> +}
> +
> +*s = sumf;
> +}
> +
> +/* { dg-final { scan-tree-dump {(?n)Not unrolling loop [1-9] \(--param 
> max-completely-peel-times limit reached} "cunrolli"} } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr69783.c 
> b/gcc/testsuite/gcc.dg/vect/pr69783.c
> index 5df95d0ce4e..a1f75514d72 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr69783.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr69783.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_float } */
> -/* { dg-additional-options "-Ofast -funroll-loops" } */
> +/* { dg-additional-options "-Ofast -funroll-loops --param 
> max-completely-peeled-insns=300" } */

If we rely on unrolling of a loop can you put #pragma unroll [N]
before the respective loop
instead?

>  #define NXX 516
>  #define NYY 516
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index bf017137260..5e0eca647a1 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -444,7 +444,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, 
> edge edge_to_cancel,
>
>  static unsigned HOST_WIDE_INT
>  estimated_unrolled_size (struct loop_size *size,
> -unsigned HOST_WIDE_INT nunroll)
> +unsigned HOST_WIDE_INT nunroll,
> +enum unroll_level ul,
> +class loop* loop)
>  {
>HOST_WIDE_INT unr_insns = ((nunroll)
>  * (HOST_WIDE_INT) (size->overall
> @@ -453,7 +455,15 @@ 

Re: [PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-13 Thread Richard Biener
On Mon, May 13, 2024 at 3:39 AM Kewen.Lin  wrote:
>
> Hi Joseph and Richi,
>
> Thanks for the suggestions and comments!
>
> on 2024/5/10 14:31, Richard Biener wrote:
> > On Thu, May 9, 2024 at 9:12 PM Joseph Myers  wrote:
> >>
> >> On Wed, 8 May 2024, Kewen.Lin wrote:
> >>
> >>> to widen IFmode to TFmode.  To make build_common_tree_nodes
> >>> be able to find the correct mode for long double type node,
> >>> it introduces one hook mode_for_longdouble to offer target
> >>> a way to specify the mode used for long double type node.
> >>
> >> I don't really like layering a hook on top of the old target macro as a
> >> way to address a deficiency in the design of that target macro (floating
> >> types should have their mode, not a poorly defined precision value,
> >> specified directly by the target).
>
> Good point!
>
> >
> > Seconded.
> >
> >> A better hook design might be something like mode_for_floating_type (enum
> >> tree_index), where the argument is TI_FLOAT_TYPE, TI_DOUBLE_TYPE or
> >> TI_LONG_DOUBLE_TYPE, replacing all definitions and uses of
> >> FLOAT_TYPE_SIZE, DOUBLE_TYPE_SIZE and LONG_DOUBLE_TYPE_SIZE with the
> >> single new hook and appropriate definitions for each target (with a
> >> default definition that uses SFmode for float and DFmode for double and
> >> long double, which would be suitable for many targets).
> >
>
> The originally proposed hook was meant to make the other ports unaffected,
> but I agree that introducing such hook would be more clear.
>
> > In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
> > though this removes the "convenient" defaulting, requiring each target to
> > enumerate all standard C ABI type modes.  But that might be also a good 
> > thing.
> >
>
> I guess the main value of extending from floating point types to all is to
> unify them?  (Assuming that, except for floating types, the others would
> not have multiple possible representations like what we face for 128-bit fp.)
>
> > The most pragmatic solution would be to do
> > s/LONG_DOUBLE_TYPE_SIZE/LONG_DOUBLE_TYPE_MODE/
>
> Yeah, this beats my proposed hook (assuming the default is VOIDmode too).
>
> So it seems we have three alternatives here:
>   1) s/LONG_DOUBLE_TYPE_SIZE/LONG_DOUBLE_TYPE_MODE/
>   2) mode_for_floating_type
>   3) mode_for_abi_type
>
> Since 1) would make long double type special (different from the other types
> having _TYPE_SIZE), personally I'm inclined to 3): implement 2) first, get
> this patch series landed, extend to all.
>
> Do you have any preference?

Maybe do 3) but have the default hook implementation look at
*_TYPE_SIZE when the target doesn't implement the hook?  That would
force you to transition rs6000 away from *_TYPE_SIZE completely
but this would also prove the design.

Btw, for .c.mode_for_abi_type I'd exclude ADA_LONG_TYPE_SIZE.

Joseph, do you agree with this?  I'd not touch the target macros like
PTRDIFF_TYPE (those evaluating to a string) at this point though
they could be handled with a common target hook as well (not sure
if we'd want to have a unified hook for both?).

Thanks,
Richard.

>
> BR,
> Kewen


[PATCH] tree-optimization/114998 - use-after-free with loop distribution

2024-05-10 Thread Richard Biener
When loop distribution releases a PHI node of the original IL it
can end up clobbering memory that is re-used, because upon releasing
its RDG it resets all stmt UIDs back to -1, even for stmts that were
already released.

The fix is to avoid resetting UIDs based on stmts in the RDG but
instead reset only those still present in the loop.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

PR tree-optimization/114998
* tree-loop-distribution.cc (free_rdg): Take loop argument.
Reset UIDs of stmts still in the IL rather than all stmts
referenced from the RDG.
(loop_distribution::build_rdg): Pass loop to free_rdg.
(loop_distribution::distribute_loop): Likewise.
(loop_distribution::transform_reduction_loop): Likewise.

* gcc.dg/torture/pr114998.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr114998.c | 35 +
 gcc/tree-loop-distribution.cc   | 24 -
 2 files changed, 53 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114998.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr114998.c 
b/gcc/testsuite/gcc.dg/torture/pr114998.c
new file mode 100644
index 000..81fc1e077cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114998.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-dce -ftree-loop-distribution" } */
+
+short a, d;
+int b, c, f, g, h, i, j[2], o;
+__attribute__((const)) int s(char r);
+int main() {
+  int l, m, k, n;
+  if (b) {
+char p;
+for (; p >= 0; p--) {
+  int e[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
+ 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
+ 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0};
+  if (j[p]) {
+int q[1];
+i = o;
+o = q[h];
+if (g)
+  n = d;
+m = 4;
+for (; m; m--) {
+  if (l)
+k |= c;
+  if (a)
+break;
+}
+  }
+  s(n);
+  f |= b;
+}
+  }
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 95203fefa18..45932bae5e7 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -778,7 +778,7 @@ loop_distribution::stmts_from_loop (class loop *loop, 
vec *stmts)
 /* Free the reduced dependence graph RDG.  */
 
 static void
-free_rdg (struct graph *rdg)
+free_rdg (struct graph *rdg, loop_p loop)
 {
   int i;
 
@@ -792,13 +792,25 @@ free_rdg (struct graph *rdg)
 
   if (v->data)
{
- gimple_set_uid (RDGV_STMT (v), -1);
  (RDGV_DATAREFS (v)).release ();
  free (v->data);
}
 }
 
   free_graph (rdg);
+
+  /* Reset UIDs of stmts still in the loop.  */
+  basic_block *bbs = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+{
+  basic_block bb = bbs[i];
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   gimple_set_uid (gsi_stmt (gsi), -1);
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   gimple_set_uid (gsi_stmt (gsi), -1);
+}
+  free (bbs);
 }
 
 struct graph *
@@ -812,7 +824,7 @@ loop_distribution::build_rdg (class loop *loop, 
control_dependences *cd)
   rdg = new_graph (stmts.length ());
   if (!create_rdg_vertices (rdg, stmts, loop))
 {
-  free_rdg (rdg);
+  free_rdg (rdg, loop);
   return NULL;
 }
   stmts.release ();
@@ -3062,7 +3074,7 @@ loop_distribution::distribute_loop (class loop *loop,
 "Loop %d not distributed: too many memory references.\n",
 loop->num);
 
-  free_rdg (rdg);
+  free_rdg (rdg, loop);
   loop_nest.release ();
   free_data_refs (datarefs_vec);
   delete ddrs_table;
@@ -3259,7 +3271,7 @@ loop_distribution::distribute_loop (class loop *loop,
   FOR_EACH_VEC_ELT (partitions, i, partition)
 partition_free (partition);
 
-  free_rdg (rdg);
+  free_rdg (rdg, loop);
   return nbp - *nb_calls;
 }
 
@@ -3665,7 +3677,7 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   auto_bitmap partition_stmts;
   bitmap_set_range (partition_stmts, 0, rdg->n_vertices);
   find_single_drs (loop, rdg, partition_stmts, &load_dr, &store_dr);
-  free_rdg (rdg);
+  free_rdg (rdg, loop);
 
   /* Bail out if there is no single load.  */
   if (load_dr == NULL)
-- 
2.35.3


Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 3:18 PM Robin Dapp  wrote:
>
> Hi,
>
> this only forces the first comparison operator into a register if it is
> not already suitable.
>
> Bootstrap and regtest is running on x86 and aarch64, successful on p10.
> Regtested on riscv.

How does this make a difference in the end?  I'd expect say forwprop to
fix things?

> gcc/ChangeLog:
>
> PR middle-end/113474
>
> * internal-fn.cc (expand_vec_cond_mask_optab_fn):  Only force
> op1 to reg if necessary.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr113474.c: New test.
>
> Regards
>  Robin
>
> ---
>  gcc/internal-fn.cc  |  3 ++-
>  .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 +
>  2 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 2c764441cde..72cc6b7a1f7 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3164,7 +3164,8 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>rtx_op2 = expand_normal (op2);
>
>mask = force_reg (mask_mode, mask);
> -  rtx_op1 = force_reg (mode, rtx_op1);
> +  if (!insn_operand_matches (icode, 1, rtx_op1))
> +rtx_op1 = force_reg (mode, rtx_op1);
>
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>create_output_operand ([0], target, mode);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> new file mode 100644
> index 000..0364bf9f5e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target riscv_v } }  */
> +/* { dg-additional-options "-std=c99" }  */
> +
> +void
> +foo (int n, int **a)
> +{
> +  int b;
> +  for (b = 0; b < n; b++)
> +for (long e = 8; e > 0; e--)
> +  a[b][e] = a[b][e] == 15;
> +}
> +
> +/* { dg-final { scan-assembler "vmerge.vim" } }  */
> --
> 2.45.0


Re: [PATCH] Allow patterns in SLP reductions

2024-05-10 Thread Richard Biener
On Fri, Mar 1, 2024 at 10:21 AM Richard Biener  wrote:
>
> The following removes the over-broad rejection of patterns for SLP
> reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
> during pattern detection.  That's also insufficient in case the
> pattern only appears on the reduction path.  Instead this implements
> the proper correctness check in vectorizable_reduction and guides
> SLP discovery to heuristically avoid forming later invalid groups.
>
> I also couldn't find any testcase that FAILs when allowing the SLP
> reductions to form so I've added one.
>
> I came across this for single-lane SLP reductions with the all-SLP
> work where we rely on patterns to properly vectorize COND_EXPR
> reductions.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

Re-bootstrapped/tested, r15-361-g52d4691294c847

Richard.

> Richard.
>
> * tree-vect-patterns.cc (vect_pattern_recog_1): Do not
> remove reductions involving patterns.
> * tree-vect-loop.cc (vectorizable_reduction): Reject SLP
> reduction groups with multiple lane-reducing reductions.
> * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
> SLP reduction groups avoid including lane-reducing ones.
>
> * gcc.dg/vect/vect-reduc-sad-9.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 
>  gcc/tree-vect-loop.cc| 15 +
>  gcc/tree-vect-patterns.cc| 13 
>  gcc/tree-vect-slp.cc | 26 +---
>  4 files changed, 101 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> new file mode 100644
> index 000..3c6af4510f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> @@ -0,0 +1,68 @@
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } } */
> +/* { dg-require-effective-target vect_usad_char } */
> +
> +#include 
> +#include "tree-vect.h"
> +
> +#define N 64
> +
> +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> +int abs (int);
> +
> +/* Sum of absolute differences between arrays of unsigned char types.
> +   Detected as a sad pattern.
> +   Vectorized on targets that support sad for unsigned chars.  */
> +
> +__attribute__ ((noinline)) int
> +foo (int len, int *res2)
> +{
> +  int i;
> +  int result = 0;
> +  int result2 = 0;
> +
> +  for (i = 0; i < len; i++)
> +{
> +  /* Make sure we are not using an SLP reduction for this.  */
> +  result += abs (X[2*i] - Y[2*i]);
> +  result2 += abs (X[2*i + 1] - Y[2*i + 1]);
> +}
> +
> +  *res2 = result2;
> +  return result;
> +}
> +
> +
> +int
> +main (void)
> +{
> +  int i;
> +  int sad;
> +
> +  check_vect ();
> +
> +  for (i = 0; i < N/2; i++)
> +{
> +  X[2*i] = i;
> +  Y[2*i] = N/2 - i;
> +  X[2*i+1] = i;
> +  Y[2*i+1] = 0;
> +  __asm__ volatile ("");
> +}
> +
> +
> +  int sad2;
> +  sad = foo (N/2, &sad2);
> +  if (sad != (N/2)*(N/4))
> +abort ();
> +  if (sad2 != (N/2-1)*(N/2)/2)
> +abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 35f1f8c7d42..13dcdba403a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7703,6 +7703,21 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>return false;
>  }
>
> +  /* Lane-reducing ops also never can be used in a SLP reduction group
> + since we'll mix lanes belonging to different reductions.  But it's
> + OK to use them in a reduction chain or when the reduction group
> + has just one element.  */
> +  if (lane_reduc_code_p
> +  && slp_node
> +  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
> +  && SLP_TREE_LANES (slp_node) > 1)
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"lane-reducing reduction 

Re: [PATCH] tree-optimization/114760 - check variants of >> and << in loop-niter

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 12:55 PM Di Zhao OS
 wrote:
>
> This patch tries to fix pr114760 by checking for the
> variants explicitly. When recognizing bit counting idiom,
> include pattern "x * 2" for "x << 1", and "x / 2" for
> "x >> 1" (given x is unsigned).
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> Thanks,
> Di Zhao
>
> ---
>
> gcc/ChangeLog:
> PR tree-optimization/114760
> * tree-ssa-loop-niter.cc (is_lshift_by_1): New function
> to check if STMT is equivalent to x << 1.
> (is_rshift_by_1): New function to check if STMT is
> equivalent to x >> 1.
> (number_of_iterations_cltz): Enhance the identification
> of logical shift by one.
> (number_of_iterations_cltz_complement): Enhance the
> identification of logical shift by one.
>
> gcc/testsuite/ChangeLog:
> PR tree-optimization/114760
> * gcc.dg/tree-ssa/pr114760-1.c: New test.
> * gcc.dg/tree-ssa/pr114760-2.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr114760-1.c | 69 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr114760-2.c | 20 +++
>  gcc/tree-ssa-loop-niter.cc | 56 +-
>  3 files changed, 131 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114760-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr114760-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr114760-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr114760-1.c
> new file mode 100644
> index 000..9f10ccc3b51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr114760-1.c
> @@ -0,0 +1,69 @@
> +/* PR tree-optimization/114760 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target clz } */
> +/* { dg-require-effective-target ctz } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +unsigned
> +ntz32_1 (unsigned x)
> +{
> +  int n = 32;
> +  while (x != 0)
> +{
> +  n = n - 1;
> +  x = x * 2;
> +}
> +  return n;
> +}
> +
> +unsigned
> +ntz32_2 (unsigned x)
> +{
> +  int n = 32;
> +  while (x != 0)
> +{
> +  n = n - 1;
> +  x = x + x;
> +}
> +  return n;
> +}
> +
> +unsigned
> +ntz32_3 (unsigned x)
> +{
> +  int n = 32;
> +  while (x != 0)
> +{
> +  n = n - 1;
> +  x = x << 1;
> +}
> +  return n;
> +}
> +
> +#define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
> +int
> +nlz32_1 (unsigned int b) {
> +int c = PREC;
> +
> +while (b != 0) {
> +   b >>= 1;
> +   c --;
> +}
> +
> +return c;
> +}
> +
> +int
> +nlz32_2 (unsigned int b) {
> +int c = PREC;
> +
> +while (b != 0) {
> +   b /= 2;
> +   c --;
> +}
> +
> +return c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 3 "optimized" } 
> } */
> +/* { dg-final { scan-tree-dump-times "__builtin_clz|\\.CLZ" 2 "optimized" } 
> } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr114760-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr114760-2.c
> new file mode 100644
> index 000..e1b4c4b1338
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr114760-2.c
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/114760 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target clz } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +// Check that for signed type, there's no CLZ.
> +#define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
> +int
> +no_nlz32 (int b) {
> +int c = PREC;
> +
> +while (b != 0) {
> +   b /= 2;
> +   c --;
> +}
> +
> +return c;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "__builtin_ctz|\\.CLZ" "optimized" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 0fde07e626f..1d99264949b 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -2303,6 +2303,38 @@ build_cltz_expr (tree src, bool leading, bool 
> define_at_zero)
>return call;
>  }
>
> +/* Returns true if STMT is equivalent to x << 1.  */
> +
> +static bool
> +is_lshift_by_1 (gimple *stmt)

You are checking for gimple-assign before calling these so please
use a 'gassign *' typed argument.

> +{
> +  if (gimple_assign_rhs_code (stmt) == LSHIFT_EXPR
> +  && integer_onep (gimple_assign_rhs2 (stmt)))
> +return true;
> +  if (gimple_assign_rhs_code (stmt) == MULT_EXPR
> +  && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
> +  && tree_to_shwi (gimple_assign_rhs2 (stmt)) == 2)

You need to check for tree_fits_shwi_p (which also checks for INTEGER_CST)
before using tree_to_shwi.

Ok with this change (to both functions).

Thanks,
Richard.

> +return true;
> +  return false;
> +}
> +
> +/* Returns true if STMT is equivalent to x >> 1.  */
> +
> +static bool
> +is_rshift_by_1 (gimple *stmt)
> +{
> +  if (!TYPE_UNSIGNED (TREE_TYPE (gimple_assign_lhs (stmt))))
> +return false;
> +  if (gimple_assign_rhs_code (stmt) == RSHIFT_EXPR
> +  && integer_onep (gimple_assign_rhs2 

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 12:54 PM Segher Boessenkool
 wrote:
>
> On Fri, May 10, 2024 at 12:19:35PM +0200, Richard Biener wrote:
> > On Fri, May 10, 2024 at 11:06 AM Segher Boessenkool
> >  wrote:
> > > *All* code using a cost will have to be inspected and possibly adjusted
> > > if you decide to use a different value for "unknown" than what we have
> > > had for ages.  All other cost functions interacting with this one, too.
> >
> > Btw, looking around pattern_cost is the only API documenting this special
> > value and the function after it using this function, insn_cost does the same
> > but
> >
> > int
> > insn_cost (rtx_insn *insn, bool speed)
> > {
> >   if (targetm.insn_cost)
> > return targetm.insn_cost (insn, speed);
> >
> > and the target hook doesn't document this special value.  set_src_cost
> > doesn't either, btw (that just uses rtx_cost).  So I don't think how
> > pattern_cost handles the set_src_cost result is warranted.  There's
> > simply no way to detect whether set_src_cost returns an actual
> > value - on the contrary, it always does.
>
> I introduced insn_cost.  I didn't think about documenting that 0 means
> unknown, precisely because that is so pervasive!

But for example a reg-reg move when optimizing for speed could have
an associated cost of zero.  You might argue that's a bug since there's
an actual instruction and thus at least a size cost (and decode cost),
but then I've seen too much zero-cost stuff in backends (like the
combine PR where the s390 backend made the address cost zero even
though it's really just "same cost").

IMO give we're dispatching to the rtx_cost hook eventually it needs
documenting there or alternatively catching zero and adjusting its
result there.  Of course cost == 0 ? 1 : cost is wrong as it makes
zero vs. one the same cost - using cost + 1 when from rtx_cost
might be more correct, at least preserving relative costs.

Richard.

>
>
> Segher


Re: [PATCH] Adjust range type of calls into fold_range for IPA passes [PR114985]

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 11:24 AM Aldy Hernandez  wrote:
>
> There are various calls into fold_range() that have the wrong type
> associated with the range temporary used to hold the result.  This
> used to work, because we could store either integers or pointers in a
> Value_Range, but this is no longer the case with pranges.  Now you must
> explicitly state which type of range the temporary will hold before
> storing into it.  You can change this at a later time with set_type(),
> but you must always have a type before using the temporary, and it
> must match what fold_range() returns.
>
> This patch adjusts the IPA code to restore the previous functionality,
> so I can re-enable the prange code, but I do question whether the
> previous code was correct.  I have added appropriate comments to help
> the maintainers, but someone with more knowledge should revamp this
> going forward.
>
> The basic problem is that pointer comparisons return a boolean, but
> the IPA code is initializing the resulting range as a pointer.  This
> wasn't a problem, because fold_range() would previously happily force
> the range into an integer one, and everything would work.  But now we
> must initialize the range to an integer before calling into
> fold_range.  The thing is, that the failing case sets the result back
> into a pointer, which is just weird but existing behavior.  I have
> documented this in the code.
>
>   if (!handler
>   || !op_res.supports_type_p (vr_type)
>   || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
> /* For comparison operators, the type here may be
>different than the range type used in fold_range above.
>For example, vr_type may be a pointer, whereas the type
>returned by fold_range will always be a boolean.
>
>This shouldn't cause any problems, as the set_varying
>below will happily change the type of the range in
>op_res, and then the cast operation in
>ipa_vr_operation_and_type_effects will ultimately leave
>things in the desired type, but it is confusing.
>
>Perhaps the original intent was to use the type of
>op_res here?  */
> op_res.set_varying (vr_type);
>
> BTW, this is not to say that the original gimple IR was wrong, but that
> IPA is setting the range type of the result of fold_range() to the type of
> the operands, which does not necessarily match in the case of a
> comparison.
>
> I am just restoring previous behavior here, but I do question whether it
> was right to begin with.
>
> Testing currently in progress on x86-64 and ppc64le with prange enabled.
>
> OK pending tests?

I think this "intermediate" patch is unnecessary and instead the code should
be fixed correctly, avoiding missed-optimization regressions.

Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/114985
> * ipa-cp.cc (ipa_value_range_from_jfunc): Adjust type of op_res.
> (propagate_vr_across_jump_function): Same.
> * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
> type for res.
> * ipa-prop.h (ipa_type_for_fold_range): New.
> ---
>  gcc/ipa-cp.cc| 18 --
>  gcc/ipa-fnsummary.cc |  6 +-
>  gcc/ipa-prop.h   | 13 +
>  3 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 5781f50c854..3c395632364 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1730,7 +1730,7 @@ ipa_value_range_from_jfunc (vrange &vr,
> }
>else
> {
> - Value_Range op_res (vr_type);
> + Value_Range op_res (ipa_type_for_fold_range (operation, vr_type));
>   Value_Range res (vr_type);
>   tree op = ipa_get_jf_pass_through_operand (jfunc);
>   Value_Range op_vr (TREE_TYPE (op));
> @@ -1741,6 +1741,19 @@ ipa_value_range_from_jfunc (vrange &vr,
>   if (!handler
>   || !op_res.supports_type_p (vr_type)
>   || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
> +   /* For comparison operators, the type here may be
> +  different than the range type used in fold_range above.
> +  For example, vr_type may be a pointer, whereas the type
> +  returned by fold_range will always be a boolean.
> +
> +  This shouldn't cause any problems, as the set_varying
> +  below will happily change the type of the range in
> +  op_res, and then the cast operation in
> +  ipa_vr_operation_and_type_effects will ultimately leave
> +  things in the desired type, but it is confusing.
> +
> +  Perhaps the original intent was to use the type of
> +  op_res here?  */
> op_res.set_varying (vr_type);
>
>   if (ipa_vr_operation_and_type_effects (res,
> @@ -2540,7 +2553,7 @@ 

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 11:06 AM Segher Boessenkool
 wrote:
>
> On Fri, May 10, 2024 at 04:50:10PM +0800, HAO CHEN GUI wrote:
> > Hi Richard,
> >   Thanks for your comments.
> >
> > 在 2024/5/10 15:16, Richard Biener 写道:
> > > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
> > > longer meaningful.  So shouldn't it instead be
> > >
> > >   return cost > 0 ? cost : 1;
> > Yes, it's better.
> >
> > >
> > > ?  Alternatively returning fractions of COSTS_N_INSNS (1) from 
> > > set_src_cost
> > > is invalid and thus the target is at fault (I do think that making zero 
> > > the
> > > unknown value is quite bad since that makes it impossible to have zero
> > > as cost represented).
> > >
> > > It seems the check is to aovid pattern_cost return zero (unknown), so the
> > > comment holds to pattern_cost the same (it returns an 'int' so the better
> > > exceptional value would have been -1, avoiding the compare).
> > But sometime it adds an insn cost. If the unknown cost is -1, the total cost
> > might be distorted.
>
> *All* code using a cost will have to be inspected and possibly adjusted
> if you decide to use a different value for "unknown" than what we have
> had for ages.  All other cost functions interacting with this one, too.

Btw, looking around pattern_cost is the only API documenting this special
value and the function after it using this function, insn_cost does the same
but

int
insn_cost (rtx_insn *insn, bool speed)
{
  if (targetm.insn_cost)
return targetm.insn_cost (insn, speed);

and the target hook doesn't document this special value.  set_src_cost
doesn't either, btw (that just uses rtx_cost).  So I don't think how
pattern_cost handles the set_src_cost result is warranted.  There's
simply no way to detect whether set_src_cost returns an actual
value - on the contrary, it always does.

Richard.

>
> Segher


Re: [COMMITTED] Remove obsolete Solaris 11.3 support

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 10:54 AM John Paul Adrian Glaubitz
 wrote:
>
> Hello Rainer,
>
> On Fri, 2024-05-10 at 10:20 +0200, Rainer Orth wrote:
> > > > Support for Solaris 11.3 had already been obsoleted in GCC 13.  However,
> > > > since the only Solaris system in the cfarm was running 11.3, I've kept
> > > > it in tree until now when both Solaris 11.4/SPARC and x86 systems have
> > > > been added.
> > > >
> > > > This patch actually removes the Solaris 11.3 support.
> > >
> > > I'm not sure I like this change since Solaris 11.3 is the last version of
> > > Solaris supported by a large number of SPARC systems.
> > >
> > > Oracle unfortunately raised the hardware baseline with Solaris 11.4 such
> > > that every system older than the SPARC T4 is no longer supported by 11.4
> > > while 11.3 still runs perfectly fine on these machines.
> >
> > I wonder why you didn't raise your concerns 1 1/2 years ago when I
> > announced the obsoletion of Solaris 11.3 support?
>
> Because I wasn't subscribed to gcc-patches and I'm also only subscribed now
> without receiving messages due to the large message volume on this list.

https://gcc.gnu.org/gcc-13/changes.html

> The problem with announcements on developer mailing lists is that they
> usually don't reach any users. I was made aware of this change only when I
> checked about the recent changes to GCC Git.

Where do you expect such announcement then?

Richard.

> > > While Oracle no longer provides feature updates to Solaris 11.3, there
> > > is still LTSS security support, so users still receive security updates
> > > and their systems continue to be protected against vulnerabilities.
> >
> > The Solaris 11.3 ESUs (Extended Support Updates) are available at a
> > premium only, and just contain the bare minimum of security updates,
> > often 6 to 9 month in between.
>
> That's not an argument for throwing away hardware that still works perfectly
> fine and that still has some users.
>
> > > I think Solaris 11.3 support should be kept since the resulting code
> > > removal is not so large that it would justify dropping support for such
> > > a large userbase.
> >
> > Do you have any indication on the size of the userbase?  I seriously
> > doubt it's large beyond some hobbyists that keep the old hardware
> > running.
>
> I don't have the exact numbers, no. But I know there are many users out there
> with pre-11.4 hardware that they still use. As you may know, there are no
> 11.4 SPARC desktop systems and most 11.4-capable hardware is usually very
> expensive.
>
> > You also seem to forget that my GCC (and LLVM) Solaris support work is
> > purely voluntary, done in my spare time.
>
> Not sure what makes you think so. I'm perfectly aware of the fact that lots of
> people do this work in their spare time as this applies to me as well.
>
> I'm not getting paid for my Debian work, my kernel maintenance and all the
> other stuff that I'm doing either. That doesn't mean users are not allowed
> to ask me questions or send me comments about my work.
>
> > Keeping Solaris 11.3 support working would be much more than restoring
> > the removal patch:
> >
> > * For each and every of my Solaris patches, I'd have to investigate if
> >   it works on 11.3 or needs adjustments and workarounds.
> >
> > * I'd also need to regularly test the result to keep things working.
> >
> > I honestly don't have the time or the energy to do this, nor the
> > hardware required for testing.  Besides, I have too much on my plate
> > already, and would rather spend it on more beneficial work.
>
> Does Solaris support in GCC really change so often that the necessary tests
> cannot be run by volunteers? I'd be happy to test changes for Solaris 11.3
> which can be installed inside an LDOM.
>
> > Above all, I always wonder why people insist on running ancient hardware
> > with an almost-unsupported OS, but require a bleeding edge version of
> > GCC.  What's wrong with continuing to use GCC 13 (or even 14, although I
> > haven't tested that on Solaris 11.3) instead?
>
> You could also ask why people use operating systems other than Linux and
> architectures other than x86_64. I don't think you will get a satisfactory
> answer to that question.
>
> > > Removing Solaris 11.3 support might make sense in the future when SPARC
> > > support in Illumos has matured enough that people can switch over their
> > > machines.
> >
> > As has been noted, SPARC is on its way out for Illumos.
>
> Which makes my point to keep Solaris 11.3 support even more valid.
>
> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Richard Biener
On Fri, May 10, 2024 at 4:25 AM HAO CHEN GUI  wrote:
>
> Hi,
>The cost return from set_src_cost might be zero. Zero for
> pattern_cost means unknown cost. So the regularization converts the zero
> to COSTS_N_INSNS (1).
>
>// pattern_cost
>cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
>return cost > 0 ? cost : COSTS_N_INSNS (1);
>
>But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's
> untouched and just returned by pattern_cost. Thus a "zero" from set_src_cost
> (regularized to COSTS_N_INSNS (1)) ends up higher than a "one" from
> set_src_cost (returned untouched).
>
>   For instance, i386 returns cost "one" for zero_extend op.
> //ix86_rtx_costs
> case ZERO_EXTEND:
>   /* The zero extensions is often completely free on x86_64, so make
>  it as cheap as possible.  */
>   if (TARGET_64BIT && mode == DImode
>   && GET_MODE (XEXP (x, 0)) == SImode)
> *total = 1;
>
>   This patch fixes the problem by converting all costs which are less than
> COSTS_N_INSNS (1) to COSTS_N_INSNS (1).
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
longer meaningful.  So shouldn't it instead be

  return cost > 0 ? cost : 1;

?  Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost
is invalid and thus the target is at fault (I do think that making zero the
unknown value is quite bad since that makes it impossible to have zero
as cost represented).

It seems the check is there to avoid pattern_cost returning zero (unknown), so
the same comment applies to pattern_cost as well (it returns an 'int', so the
better exceptional value would have been -1, avoiding the compare).
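As a standalone sketch of the regularization being discussed (not GCC code; in GCC, COSTS_N_INSNS (n) expands to n scaled by a constant, and a cost of 0 conventionally means "unknown"), the two behaviors compared in this thread look roughly like:

```c
/* Illustrative only: COSTS_N_INSNS and the helper names below are a
   simplified stand-in for GCC's cost machinery, not its actual code.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Current regularization in pattern_cost: maps 0 ("unknown") to
   COSTS_N_INSNS (1), but leaves a sub-insn cost like 1 untouched, so
   "unknown" ranks as more expensive than a known cheap operation.  */
static int
regularize_current (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

/* The patch's proposal: clamp everything below COSTS_N_INSNS (1),
   which as noted above makes sub-insn costs from targets meaningless.  */
static int
regularize_proposed (int cost)
{
  return cost > COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);
}
```

With these definitions the inversion is easy to see: regularize_current (0) yields COSTS_N_INSNS (1) while regularize_current (1) yields 1, so the "unknown" pattern looks costlier than the known-cheap one.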

Richard.

> Thanks
> Gui Haochen
>
> ChangeLog
> rtlanal: Correct cost regularization in pattern_cost
>
> For the pattern_cost (insn_cost), the smallest known cost is
> COSTS_N_INSNS (1) and zero means the cost is unknown.  The method calls
> set_src_cost which might return 0 or a value less than COSTS_N_INSNS (1).
> For these cases, pattern_cost should always return COSTS_N_INSNS (1).
> Current regularization is wrong and a value less than COSTS_N_INSNS (1)
> but larger than 0 will be returned.  This patch corrects it.
>
> gcc/
> * rtlanal.cc (pattern_cost): Return COSTS_N_INSNS (1) when the cost
> is less than COSTS_N_INSNS (1).
>
> patch.diff
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 4158a531bdd..f7b3d7d72ce 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -5762,7 +5762,7 @@ pattern_cost (rtx pat, bool speed)
>  return 0;
>
>cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
> -  return cost > 0 ? cost : COSTS_N_INSNS (1);
> +  return cost > COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);
>  }
>
>  /* Calculate the cost of a single instruction.  A return value of zero


Re: gcc/DATESTAMP wasn't updated since 20240507

2024-05-10 Thread Richard Biener
On Thu, 9 May 2024, Jakub Jelinek wrote:

> On Thu, May 09, 2024 at 12:14:43PM +0200, Jakub Jelinek wrote:
> > On Thu, May 09, 2024 at 12:04:38PM +0200, Rainer Orth wrote:
> > > I just noticed that gcc/DATESTAMP wasn't updated yesterday and today,
> > > staying at 20240507.
> > 
> > I think it is because of the r15-268 commit, we do support
> > This reverts commit ...
> > when the referenced commit contains a ChangeLog message, but here
> > it doesn't, as it is a revert commit.
> 
> Indeed and also the r15-311 commit.
> Please don't Revert Revert, we don't really support that, had to fix it all
> by hand.

I do wonder if we can run the ChangeLog processing checks as part of
the pre-commit hook and reject such pushes.  It seems we have two
implementations, one in the pre-commit hook and one in the processing itself,
rather than a single implementation that can run in two modes?

Sorry for the trouble.

Richard.


Re: Ping: [PATCH v3] doc: Correction of Tree SSA Passes info.

2024-05-10 Thread Richard Biener
 in-memory addressable objects to
> > -non-aliased variables that can be renamed into SSA form.  We also
> > -update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
> > -aggregates so that we get fewer false kills.  The pass is located
> > -in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
> > +It is located in @file{tree-ssa-structalias.cc} and is described
> > +by @code{pass_build_alias}.
> >   
> >   Interprocedural points-to information is located in
> >   @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
> > @@ -604,7 +571,7 @@ is described by @code{pass_ipa_tree_profile}.
> >   This pass implements series of heuristics to guess propababilities
> >   of branches.  The resulting predictions are turned into edge profile
> >   by propagating branches across the control flow graphs.
> > -The pass is located in @file{tree-profile.cc} and is described by
> > +The pass is located in @file{predict.cc} and is described by
> >   @code{pass_profile}.
> >   
> >   @item Lower complex arithmetic
> > @@ -653,7 +620,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
> >   @item Full redundancy elimination
> >   
> >   This is a simpler form of PRE that only eliminates redundancies that
> > -occur on all paths.  It is located in @file{tree-ssa-pre.cc} and
> > +occur on all paths.  It is located in @file{tree-ssa-sccvn.cc} and
> >   described by @code{pass_fre}.
> >   
> >   @item Loop optimization
> > @@ -708,7 +675,7 @@ to align the number of iterations, and to align the
> > memory accesses in the
> >   loop.
> >   The pass is implemented in @file{tree-vectorizer.cc} (the main driver),
> >   @file{tree-vect-loop.cc} and @file{tree-vect-loop-manip.cc} (loop specific
> >   parts
> > -and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > +and general loop utilities), @file{tree-vect-slp.cc} (loop-aware SLP
> >   functionality), @file{tree-vect-stmts.cc}, @file{tree-vect-data-refs.cc}
> >   and
> >   @file{tree-vect-slp-patterns.cc} containing the SLP pattern matcher.
> >   Analysis of data references is in @file{tree-data-ref.cc}.
> > @@ -755,10 +722,6 @@ the ``copy-of'' relation.  It eliminates redundant
> > copies from the
> >   code.  The pass is located in @file{tree-ssa-copy.cc} and described by
> >   @code{pass_copy_prop}.
> >   
> > -A related pass that works on memory copies, and not just register
> > -copies, is located in @file{tree-ssa-copy.cc} and described by
> > -@code{pass_store_copy_prop}.
> > -
> >   @item Value range propagation
> >   
> >   This transformation is similar to constant propagation but
> > @@ -811,14 +774,6 @@ run last so that we have as much time as possible to
> > prove that the
> >   statement is not reachable.  It is located in @file{tree-cfg.cc} and
> >   is described by @code{pass_warn_function_return}.
> >   
> > -@item Leave static single assignment form
> > -
> > -This pass rewrites the function such that it is in normal form.  At
> > -the same time, we eliminate as many single-use temporaries as possible,
> > -so the intermediate language is no longer GIMPLE, but GENERIC@.  The
> > -pass is located in @file{tree-outof-ssa.cc} and is described by
> > -@code{pass_del_ssa}.
> > -
> >   @item Merge PHI nodes that feed into one another
> >   
> >   This is part of the CFG cleanup passes.  It attempts to join PHI nodes
> > @@ -857,25 +812,9 @@ pass is located in @file{tree-object-size.cc} and is
> > described by
> >   @item Loop invariant motion
> >   
> >   This pass removes expensive loop-invariant computations out of loops.
> > -The pass is located in @file{tree-ssa-loop.cc} and described by
> > +The pass is located in @file{tree-ssa-loop-im.cc} and described by
> >   @code{pass_lim}.
> >   
> > -@item Loop nest optimizations
> > -
> > -This is a family of loop transformations that works on loop nests.  It
> > -includes loop interchange, scaling, skewing and reversal and they are
> > -all geared to the optimization of data locality in array traversals
> > -and the removal of dependencies that hamper optimizations such as loop
> > -parallelization and vectorization.  The pass is located in
> > -@file{tree-loop-linear.c} and described by
> > -@code{pass_linear_transform}.
> > -
> > -@item Removal of empty loops
> > -
> > -This pass removes loops with no code in them.  The pass is located in
> > -@file{tree-ssa-loop-ivcanon.cc} and described by
> > -@code{pass_empty_loop}.
> > -
> >   @item Unrolling of small loops
> >   
> >   This pass completely unrolls loops with few iterations.  The pass
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-10 Thread Richard Biener
On Thu, May 9, 2024 at 9:12 PM Joseph Myers  wrote:
>
> On Wed, 8 May 2024, Kewen.Lin wrote:
>
> > to widen IFmode to TFmode.  To make build_common_tree_nodes
> > be able to find the correct mode for long double type node,
> > it introduces one hook mode_for_longdouble to offer target
> > a way to specify the mode used for long double type node.
>
> I don't really like layering a hook on top of the old target macro as a
> way to address a deficiency in the design of that target macro (floating
> types should have their mode, not a poorly defined precision value,
> specified directly by the target).

Seconded.

> A better hook design might be something like mode_for_floating_type (enum
> tree_index), where the argument is TI_FLOAT_TYPE, TI_DOUBLE_TYPE or
> TI_LONG_DOUBLE_TYPE, replacing all definitions and uses of
> FLOAT_TYPE_SIZE, DOUBLE_TYPE_SIZE and LONG_DOUBLE_TYPE_SIZE with the
> single new hook and appropriate definitions for each target (with a
> default definition that uses SFmode for float and DFmode for double and
> long double, which would be suitable for many targets).

In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
though this removes the "convenient" defaulting, requiring each target to
enumerate all standard C ABI type modes.  But that might also be a good thing.

The most pragmatic solution would be to do
s/LONG_DOUBLE_TYPE_SIZE/LONG_DOUBLE_TYPE_MODE/

Richard.

> --
> Joseph S. Myers
> josmy...@redhat.com
>


Re: [PATCH] [RFC] Add function filtering to gcov

2024-05-08 Thread Richard Biener
On Fri, Mar 29, 2024 at 8:02 PM Jørgen Kvalsvik  wrote:
>
> This is a prototype for --include/--exclude flags, and I would like a
> review of both the approach and architecture, and the implementation,
> plus feedback on the feature itself. I did not update the manuals or
> carefully extend --help, in case the interface itself needs some
> revision before it can be merged.
>
> ---
>
> Add the --include and --exclude flags to gcov to control what functions
> to report on. This is meant to make gcov more practical when writing
> test suites or performing other coverage experiments, which tend to
> focus on a few functions at a time. This really shines in
> combination with the -t/--stdout flag. With support for more expansive
> metrics in gcov like modified condition/decision coverage (MC/DC) and
> path coverage, output quickly gets overwhelming without filtering.
>
> The approach is quite simple: filters are egrep regexes and are
> evaluated left-to-right, and the last filter "wins", that is, if a
> function matches an --include and a subsequent --exclude, it should not
> be included in the output. The output machinery is already interacting
> with the function table, which makes the json output work as expected,
> and only minor changes are needed to suppress the filtered-out
> functions.
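
The left-to-right, last-filter-wins semantics described above can be sketched as follows. This is not the gcov implementation; POSIX extended regexes stand in for egrep, and the rule that any --include switches the default to "not reported" is an assumption inferred from the demos below.

```c
/* Hedged sketch of last-match-wins --include/--exclude filtering.  */
#include <regex.h>
#include <stddef.h>

struct filter { const char *pattern; int include; };

static int
keep_function_p (const char *fn, const struct filter *filters, size_t n)
{
  /* Assumed default: if any --include is given, functions are
     excluded unless some filter matches them.  */
  int keep = 1;
  for (size_t i = 0; i < n; i++)
    if (filters[i].include)
      {
        keep = 0;
        break;
      }
  /* Evaluate filters left to right; the last matching filter wins.  */
  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      if (regcomp (&re, filters[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;  /* Ignore malformed patterns in this sketch.  */
      if (regexec (&re, fn, 0, NULL, 0) == 0)
        keep = filters[i].include;
      regfree (&re);
    }
  return keep;
}
```

Under these semantics, --include=sum --exclude=sum drops "sum" (the later exclude wins), while --include=su keeps both "sub" and "sum" but not "mul", matching the demos below.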
>
> Demo: math.c
>
> int mul (int a, int b) {
> return a * b;
> }
>
> int sub (int a, int b) {
> return a - b;
> }
>
> int sum (int a, int b) {
> return a + b;
> }
>
> Plain matches:
>
> $ gcov -t math --include=sum
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:9:int sum (int a, int b) {
> #:   10:return a + b;
>
> $ gcov -t math --include=mul
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:1:int mul (int a, int b) {
> #:2:return a * b;
>
> Regex match:
>
> $ gcov -t math --include=su
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:5:int sub (int a, int b) {
> #:6:return a - b;
> -:7:}
> #:9:int sum (int a, int b) {
> #:   10:return a + b;
>
> And similar for exclude:
>
> $ gcov -t math --exclude=sum
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:1:int mul (int a, int b) {
> #:2:return a * b;
> -:3:}
> #:5:int sub (int a, int b) {
> #:6:return a - b;
>
> And json, for good measure:
>
> $ gcov -t math --include=sum --json | jq ".files[].lines[]"
> {
>   "line_number": 9,
>   "function_name": "sum",
>   "count": 0,
>   "unexecuted_block": true,
>   "block_ids": [],
>   "branches": [],
>   "calls": []
> }
> {
>   "line_number": 10,
>   "function_name": "sum",
>   "count": 0,
>   "unexecuted_block": true,
>   "block_ids": [
> 2
>   ],
>   "branches": [],
>   "calls": []
> }
>
> Note that the last function gets "clipped" when lines are associated with
> functions, which means the closing brace is dropped from the report. I
> hope this can be fixed, but considering it is not really a part of the
> function body, the gcov report is "complete".
>
> Matching generally works well for mangled names, as the mangled names
> also have the base symbol name in them. A possible extension to the
> filtering commands would be to mix in demangling, making it easier to
> filter specific overloads without manually having to mangle the
> interesting symbols. The g++.dg/gcov/gcov-20.C test tests the
> matching of a mangled name.
>
> The dejagnu testing function verify-calls is somewhat minimal, but does
> the job well enough.
>
> Why not just use grep? grep is not really sufficient, as grep is very
> line-oriented, and the reports that benefit the most from filtering
> often span multiple lines, unpredictably.

For JSON output I suppose there's a way to "grep" without the line-oriented
issue?  I suppose we could make the JSON more hierarchical by adding
an outer function object?

That said, I think this is a useful feature and thus OK for trunk if there are
no other comments in about a week if you also update the gcov documentation.

Thanks,
Richard.

> ---
>  gcc/gcov.cc| 101 +++--
>  gcc/testsuite/g++.dg/gcov/gcov-19.C|  35 +
>  gcc/testsuite/g++.dg/gcov/gcov-20.C|  38 ++
>  gcc/testsuite/gcc.misc-tests/gcov-24.c |  20 +
>  gcc/testsuite/gcc.misc-tests/gcov-25.c |  23 ++
>  gcc/testsuite/gcc.misc-tests/gcov-26.c |  23 ++
>  gcc/testsuite/gcc.misc-tests/gcov-27.c |  22 ++
>  gcc/testsuite/lib/gcov.exp |  53 -
>  8 files changed, 306 insertions(+), 9 deletions(-)
>  create mode 100644 

Re: [PATCH] tree-ssa-sink: Improve code sinking pass

2024-05-08 Thread Richard Biener
On Wed, Mar 13, 2024 at 2:56 PM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> Currently, code sinking will sink code to the use points within loops of the
> same nesting depth. The following patch improves code sinking by placing the
> sunk code at the beginning of the block, after the labels.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code to the use points within loops of the
> same nesting depth. The following patch improves code sinking by placing the
> sunk code at the beginning of the block, after the labels.
>
> 2024-03-13  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Sink statements at
> the beginning of the basic block after labels.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
>  gcc/tree-ssa-sink.cc|  7 ++-
>  2 files changed, 17 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 880d6f70a80..1ec5c048fe7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -208,7 +208,6 @@ select_best_block (basic_block early_bb,
>  loop nest.  */
>temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>  }
> -
>/* Placing a statement before a setjmp-like function would be invalid
>   (it cannot be reevaluated when execution follows an abnormal edge).
>   If we selected a block with abnormal predecessors, just punt.  */
> @@ -430,6 +429,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> continue;
>   break;
> }
> +
>use = USE_STMT (one_use);
>
>if (gimple_code (use) != GIMPLE_PHI)

OK if you avoid the stray whitespace changes above.

Richard.

> @@ -439,10 +439,7 @@ statement_sink_location (gimple *stmt, basic_block 
> frombb,
>   if (sinkbb == frombb)
> return false;
>
> - if (sinkbb == gimple_bb (use))
> -   *togsi = gsi_for_stmt (use);
> - else
> -   *togsi = gsi_after_labels (sinkbb);
> + *togsi = gsi_after_labels (sinkbb);
>
>   return true;
> }
> --
> 2.39.3
>


[PATCH] Fix SLP reduction initial value for pointer reductions

2024-05-08 Thread Richard Biener
For pointer reductions we need to convert the initial value to
the vector component integer type.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

I've ran into this latent bug on the force-slp branch.

Richard.

* tree-vect-loop.cc (get_initial_defs_for_reduction): Convert
initial value to the vector component type.
---
 gcc/tree-vect-loop.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 29c03c246d4..704df7bdcc7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5618,7 +5618,14 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
   if (i >= initial_values.length () || (j > i && neutral_op))
op = neutral_op;
   else
-   op = initial_values[i];
+   {
+ if (!useless_type_conversion_p (TREE_TYPE (vector_type),
+ TREE_TYPE (initial_values[i])))
+   initial_values[i] = gimple_convert (_seq,
+   TREE_TYPE (vector_type),
+   initial_values[i]);
+ op = initial_values[i];
+   }
 
   /* Create 'vect_ = {op0,op1,...,opn}'.  */
   number_of_places_left_in_vector--;
-- 
2.35.3


[PATCH] Fix non-grouped SLP load/store accounting in alignment peeling

2024-05-08 Thread Richard Biener
When we have a non-grouped access we bogusly multiply by zero.
This shows most with single-lane SLP but also happens with
the multi-lane splat case.

Re-bootstrap & regtest running on x86_64-unknown-linux-gnu.

I've ran into this latent bug on the force-slp branch.

Richard.

* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
Properly guard DR_GROUP_SIZE access with STMT_VINFO_GROUPED_ACCESS.
---
 gcc/tree-vect-data-refs.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c531079d3bb..ae237407672 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2290,8 +2290,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
{
  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
- nscalars = (STMT_SLP_TYPE (stmt_info)
- ? vf * DR_GROUP_SIZE (stmt_info) : vf);
+ unsigned group_size = 1;
+ if (STMT_SLP_TYPE (stmt_info)
+ && STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   group_size = DR_GROUP_SIZE (stmt_info);
+ nscalars = vf * group_size;
}
 
  /* Save info about DR in the hash table.  Also include peeling
-- 
2.35.3


Re: [PATCH] tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 9:56 AM Stefan Schulze Frielinghaus
 wrote:
>
> On s390 the following tests fail
>
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CLZ (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .CLZ 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CTZ (vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .CTZ 
> (vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CTZ (vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .CTZ 
> (vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
>
> because aprefetch unrolls loops even if -fno-unroll-loops is used.
> Accordingly, the scan patterns match more than once.
>
> Could also be fixed by using -fno-prefetch-loop-arrays for the tests.
> Though, I tend to prefer if aprefetch honours -fno-unroll-loops.  Any
> preferences?
>
> Bootstrapped and regtested on x86_64 and s390.  Ok for mainline?

OK.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
> -fno-unroll-loops.
> ---
>  gcc/tree-ssa-loop-prefetch.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
> index 70073cc4fe4..bb5d5dec779 100644
> --- a/gcc/tree-ssa-loop-prefetch.cc
> +++ b/gcc/tree-ssa-loop-prefetch.cc
> @@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct 
> mem_ref_group *refs,
>struct mem_ref_group *agp;
>struct mem_ref *ref;
>
> +  /* Bail out early in case we must not unroll loops.  */
> +  if (!flag_unroll_loops)
> +return 1;
> +
>/* First check whether the loop is not too large to unroll.  We ignore
>   PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
>   from unrolling them enough to make exactly one cache line covered by 
> each
> --
> 2.44.0
>


[PATCH] Fix and speedup IDF pruning by dominator

2024-05-08 Thread Richard Biener
When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks it compares against ENTRY_BLOCK but that's never
going to be the common dominator.  In fact, if it ever were, the code
fails to copy IDF to PRUNED_IDF, leading to wrong code.

The following fixes that by avoiding the copy and pruning from the
IDF in-place as well as using the more appropriate check against
the single successor of the ENTRY_BLOCK.
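
The patch below also uses a deferred-removal idiom: a bit currently under the bitmap iterator is not cleared immediately but only after the iterator has advanced past it. A simplified sketch of that idiom (a plain uint64_t stands in for GCC's sparse bitmaps; this is illustrative, not GCC code):

```c
/* Prune SET in place, keeping only bits also present in ALLOWED.
   Each rejected bit is remembered in TO_REMOVE and cleared one
   iteration later, mirroring the EXECUTE_IF_SET_IN_BITMAP pattern
   where clearing the current bit mid-iteration would be unsafe.  */
#include <stdint.h>

static uint64_t
prune_in_place (uint64_t set, uint64_t allowed)
{
  unsigned to_remove = ~0U;
  for (unsigned b = 0; b < 64; b++)
    {
      if (!(set & ((uint64_t) 1 << b)))
        continue;
      /* Clear the previously rejected bit now that the iteration
         has moved past it.  */
      if (to_remove != ~0U)
        {
          set &= ~((uint64_t) 1 << to_remove);
          to_remove = ~0U;
        }
      if (!(allowed & ((uint64_t) 1 << b)))
        to_remove = b;
    }
  if (to_remove != ~0U)
    set &= ~((uint64_t) 1 << to_remove);
  return set;
}
```

The end result is equivalent to intersecting the two sets; the value of the idiom is that it avoids a full copy of the IDF while staying safe under iteration.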

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I've tried to split the patch but that runs into the pre-existing
issue; apparently I had never tested the two patches separately,
so now here's the squashed variant.  Pushed.

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK.  Do not copy IDF but prune it directly.
---
 gcc/tree-into-ssa.cc | 47 +++-
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 705e4119ba3..3732c269ca3 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3233,7 +3233,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 {
   basic_block entry;
   def_blocks *db;
-  bitmap idf, pruned_idf;
+  bitmap pruned_idf;
   bitmap_iterator bi;
   unsigned i;
 
@@ -3250,8 +3250,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 return;
 
   /* Compute the initial iterated dominance frontier.  */
-  idf = compute_idf (db->def_blocks, dfs);
-  pruned_idf = BITMAP_ALLOC (NULL);
+  pruned_idf = compute_idf (db->def_blocks, dfs);
 
   if (TREE_CODE (var) == SSA_NAME)
 {
@@ -3262,27 +3261,32 @@ insert_updated_phi_nodes_for (tree var, bitmap_head 
*dfs,
 common dominator of all the definition blocks.  */
  entry = nearest_common_dominator_for_set (CDI_DOMINATORS,
db->def_blocks);
- if (entry != ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   EXECUTE_IF_SET_IN_BITMAP (idf, 0, i, bi)
- if (BASIC_BLOCK_FOR_FN (cfun, i) != entry
- && dominated_by_p (CDI_DOMINATORS,
-BASIC_BLOCK_FOR_FN (cfun, i), entry))
-   bitmap_set_bit (pruned_idf, i);
+ if (entry != single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
+   {
+ unsigned to_remove = ~0U;
+ EXECUTE_IF_SET_IN_BITMAP (pruned_idf, 0, i, bi)
+   {
+ if (to_remove != ~0U)
+   {
+ bitmap_clear_bit (pruned_idf, to_remove);
+ to_remove = ~0U;
+   }
+ if (BASIC_BLOCK_FOR_FN (cfun, i) == entry
+ || !dominated_by_p (CDI_DOMINATORS,
+ BASIC_BLOCK_FOR_FN (cfun, i), entry))
+   to_remove = i;
+   }
+ if (to_remove != ~0U)
+   bitmap_clear_bit (pruned_idf, to_remove);
+   }
}
   else
-   {
- /* Otherwise, do not prune the IDF for VAR.  */
- gcc_checking_assert (update_flags == TODO_update_ssa_full_phi);
- bitmap_copy (pruned_idf, idf);
-   }
-}
-  else
-{
-  /* Otherwise, VAR is a symbol that needs to be put into SSA form
-for the first time, so we need to compute the full IDF for
-it.  */
-  bitmap_copy (pruned_idf, idf);
+   /* Otherwise, do not prune the IDF for VAR.  */
+   gcc_checking_assert (update_flags == TODO_update_ssa_full_phi);
 }
+  /* Otherwise, VAR is a symbol that needs to be put into SSA form
+ for the first time, so we need to compute the full IDF for
+ it.  */
 
   if (!bitmap_empty_p (pruned_idf))
 {
@@ -3309,7 +3313,6 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 }
 
   BITMAP_FREE (pruned_idf);
-  BITMAP_FREE (idf);
 }
 
 /* Sort symbols_to_rename after their DECL_UID.  */
-- 
2.35.3


Re: [PATCH] reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]

2024-05-08 Thread Richard Biener
On Wed, 8 May 2024, Jakub Jelinek wrote:

> Hi!
> 
> The optimize_range_tests_to_bit_test optimization normally emits a range
> test first:
>   if (entry_test_needed)
> {
>   tem = build_range_check (loc, optype, unshare_expr (exp),
>false, lowi, high);
>   if (tem == NULL_TREE || is_gimple_val (tem))
> continue;
> }
> so during the bit test we already know that exp is in the [lowi, high]
> range, but it skips that test if we have range info which tells us this
> isn't necessary.
> Also, normally it emits shifts by exp - lowi counter, but has an
> optimization to use just exp counter if the mask isn't a more expensive
> constant in that case and lowi is > 0 and high is smaller than prec.
> 
> The following testcase is miscompiled because the two abnormal cases
> are triggered.  The range of exp is [43, 43][48, 48][95, 95], so on a
> 64-bit arch we decide we don't need the entry test, because 95 - 43 < 64.
> And we also decide to use just exp as counter, because the range test
> tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
> Because 95 is in the exp range, we can't do that, we'd either need to
> do a range test first, i.e.
> if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1))
> or need to subtract lowi from the shift counter, i.e.
> if ((1UL << (exp - 43)) & mask2)
> but can't do both unless r.upper_bound () is < prec.
> 
> The following patch ensures that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-05-08  Jakub Jelinek  
> 
>   PR tree-optimization/114965
>   * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
>   optimize away exp - lowi subtraction from shift count unless entry
>   test is emitted or unless r.upper_bound () is smaller than prec.
> 
>   * gcc.c-torture/execute/pr114965.c: New test.
> 
> --- gcc/tree-ssa-reassoc.cc.jj2024-01-12 10:07:58.384848977 +0100
> +++ gcc/tree-ssa-reassoc.cc   2024-05-07 18:18:45.558814991 +0200
> @@ -3418,7 +3418,8 @@ optimize_range_tests_to_bit_test (enum t
>We can avoid then subtraction of the minimum value, but the
>mask constant could be perhaps more expensive.  */
> if (compare_tree_int (lowi, 0) > 0
> -   && compare_tree_int (high, prec) < 0)
> +   && compare_tree_int (high, prec) < 0
> +   && (entry_test_needed || wi::ltu_p (r.upper_bound (), prec)))
>   {
> int cost_diff;
> HOST_WIDE_INT m = tree_to_uhwi (lowi);
> --- gcc/testsuite/gcc.c-torture/execute/pr114965.c.jj 2024-05-07 
> 18:17:16.767031821 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr114965.c2024-05-07 
> 18:15:52.332188943 +0200
> @@ -0,0 +1,30 @@
> +/* PR tree-optimization/114965 */
> +
> +static void
> +foo (const char *x)
> +{
> +
> +  char a = '0';
> +  while (1)
> +{
> +  switch (*x)
> + {
> + case '_':
> +     case '+':
> +   a = *x;
> +   x++;
> +   continue;
> + default:
> +   break;
> + }
> +  break;
> +}
> +  if (a == '0' || a == '+')
> +__builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  foo ("_");
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 7:50 AM Kewen.Lin  wrote:
>
> Hi,
>
> As the discussion in PR112980, although the current
> implementation for -fpatchable-function-entry* conforms
> with the documentation (making N NOPs be consecutive),
> it's inefficient for both kernel and userspace livepatching
> (see comments in PR for the details).
>
> So this patch is to change the current implementation by
> emitting the "before" NOPs before global entry point and
> the "after" NOPs after local entry point.  The new behavior
> would not keep NOPs to be consecutive, so the documentation
> is updated to emphasize this.
>
> Bootstrapped and regress-tested on powerpc64-linux-gnu
> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?  And backporting to active branches
> after burn-in time?  I guess we should also mention this
> change in changes.html?
>
> BR,
> Kewen
> -
> PR target/112980
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
> Adjust the handling on patch area emitting with dual entry, remove
> the restriction on "before" NOPs count, not emit "before" NOPs any
> more but only emit "after" NOPs.
> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
> Adjust by respecting cfun->machine->stop_patch_area_print.
> (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
> cfun->machine->stop_patch_area_print as true.
> * config/rs6000/rs6000.h (struct machine_function): Remove member
> global_entry_emitted, add new member stop_patch_area_print.
> * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
> documentation for PowerPC ELFv2 dual entry.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/patchable_function_entry-default.c: Adjust.
> * gcc.target/powerpc/pr99888-4.c: Likewise.
> * gcc.target/powerpc/pr99888-5.c: Likewise.
> * gcc.target/powerpc/pr99888-6.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>  gcc/config/rs6000/rs6000.cc   | 15 +--
>  gcc/config/rs6000/rs6000.h| 10 +++--
>  gcc/doc/invoke.texi   |  8 ++--
>  .../patchable_function_entry-default.c|  3 --
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>  8 files changed, 33 insertions(+), 55 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..0eb019b44b3 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
>   fprintf (file, "\tadd 2,2,12\n");
> }
>
> -  unsigned short patch_area_size = crtl->patch_area_size;
> -  unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> -   {
> - cfun->machine->global_entry_emitted = true;
> - /* As ELFv2 ABI shows, the allowable bytes between the global
> -and local entry points are 0, 4, 8, 16, 32 and 64 when
> -there is a local entry point.  Considering there are two
> -non-prefixed instructions for global entry point prologue
> -(8 bytes), the count for patchable nops before local entry
> -point would be 2, 6 and 14.  It's possible to support those
> -other counts of nops by not making a local entry point, but
> -we don't have clear use cases for them, so leave them
> -unsupported for now.  */
> - if (patch_area_entry > 0)
> -   {
> - if (patch_area_entry != 2
> - && patch_area_entry != 6
> - && patch_area_entry != 14)
> -   error ("unsupported number of nops before function entry 
> (%u)",
> -  patch_area_entry);
> - rs6000_print_patchable_function_entry (file, patch_area_entry,
> -true);
> - patch_area_size -= patch_area_entry;
> -   }
> -   }
> -
>fputs ("\t.localentry\t", file);
>assemble_name (file, name);
>fputs (",.-", file);
>assemble_name (file, name);
>fputs ("\n", file);
>/* Emit the nops after local entry.  */
> -  if (patch_area_size > 0)
> -   rs6000_print_patchable_function_entry (file, patch_area_size,
> -  patch_area_entry == 0);
> +  unsigned short patch_area_size = crtl->patch_area_size;
> +  unsigned short patch_area_entry = crtl->patch_area_entry;
> +  if (patch_area_size > patch_area_entry)
> +   {
> + cfun->machine->stop_patch_area_print 
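To visualize the new behaviour the patch describes, an ELFv2 function compiled with `-fpatchable-function-entry=3,2` might lay out as follows (hand-written sketch, not actual compiler output):

```
        nop                    # 2 "before" NOPs, now emitted before
        nop                    # the global entry point
foo:                           # global entry point
        addis 2,12,.TOC.-foo@ha
        addi  2,2,.TOC.-foo@l
        .localentry foo,.-foo  # local entry point
        nop                    # remaining "after" NOP
        ...                    # function body
```

Note that the two NOP groups are no longer consecutive, which is why the documentation had to be updated.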

Re: [PATCH] match: `a CMP nonnegative ? a : ABS` simplified to just `ABS` [PR112392]

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 5:25 AM Andrew Pinski  wrote:
>
> We can optimize `a == nonnegative ? a : ABS`, `a > nonnegative ? a : 
> ABS`
> and `a >= nonnegative ? a : ABS` into `ABS`. This allows removal of
> some extra comparison and extra conditional moves in some cases.
> I don't remember where I had found though but it is simple to add so
> let's add it.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Note I have a secondary pattern for the equal case as either a or nonnegative
> could be used.

OK

> PR tree-optimization/112392
>
> gcc/ChangeLog:
>
> * match.pd (`x CMP nonnegative ? x : ABS`): New pattern;
> where CMP is ==, > and >=.
> (`x CMP nonnegative@y ? y : ABS`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-41.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd   | 15 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
>  2 files changed, 49 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 03a03c31233..07e743ae464 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (convert (absu:utype @0)))
>  @3
>
> +/* X >  Positive ? X : ABS(X) -> ABS(X) */
> +/* X >= Positive ? X : ABS(X) -> ABS(X) */
> +/* X == Positive ? X : ABS(X) -> ABS(X) */
> +(for cmp (eq gt ge)
> + (simplify
> +  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
> +  (if (INTEGRAL_TYPE_P (type))
> +   @3)))
> +
> +/* X == Positive ? Positive : ABS(X) -> ABS(X) */
> +(simplify
> + (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
> + (if (INTEGRAL_TYPE_P (type))
> +  @3))
> +
>  /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
> X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> new file mode 100644
> index 000..9774e283a7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-phiopt1" } */
> +/* PR tree-optimization/112392 */
> +
> +int feq_1(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a == absb)  return absb;
> +  return a > 0 ? a : -a;
> +}
> +int feq_2(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a == absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +int fgt(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a > absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +int fge(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a >= absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
> --
> 2.43.0
>


Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-08 Thread Richard Biener
On Tue, May 7, 2024 at 10:56 PM Andrew Pinski  wrote:
>
> On Tue, May 7, 2024 at 1:45 PM Jeff Law  wrote:
> >
> >
> >
> > On 4/30/24 9:21 PM, Andrew Pinski wrote:
> > > This adds a few more of what is currently done in phiopt's 
> > > value_replacement
> > > to match. I noticed this when I was hooking up phiopt's value_replacement
> > > code to use match and disabling the old code. But this can be done
> > > independently from the hooking up phiopt's value_replacement as phiopt
> > > is already hooked up for simplified versions already.
> > >
> > > /* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
> > > /* a != 0 ? a * b : 0 -> a * b */
> > > /* a != 0 ? a & b : 0 -> a & b */
> > >
> > > We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` 
> > > which
> > > uses that form.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > >   PR tree-optimization/114894
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd (`a != 0 ? a / b : 0`): New pattern.
> > >   (`a != 0 ? a * b : 0`): New pattern.
> > >   (`a != 0 ? a & b : 0`): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
> > Is there any need to also handle the reversed conditional with the arms
> > swapped?If not, this is fine as-is.  If yes, then fine with the
> > obvious generalization.
>
> The answer is yes and no. While the PHI-OPT pass will try both cases,
> the other (all?) passes do not. This is something I have been
> thinking about trying to solve in a generic way instead of adding many
> more patterns here. I will start working on that in the middle of
> June.
> Most of the time cond patterns in match are used inside phiopt, so
> having the reversed conditional has not been high on my priority, but
> with VRP and scev and match (itself) producing more cond_exprs, we
> should fix this once and for all for GCC 15.

IMO this is a classical case for canonicalization.  IIRC in fold we
rely on tree_swap_operands_p for the COND_EXPR arms and if
we can invert the condition we do so.  So there's a conflict of interest
with respect to condition canonicalization and true/false canonicalization.
We do not canonicalize COND_EXPRs in gimple_resimplify3, but
the only natural thing there would be to do it based on the op2/op3
operands, looking at the conditional would dive down one level too deep.

Richard.

> Thanks,
> Andrew Pinski
>
> >
> > jeff
> >


Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-08 Thread Richard Biener
asible.
> >
> > If the flexible array member is moved to the end of the compounds,
> > the static initialization still works. What’s the issue here?
> >
> > > > My question: is it possible to update your source code to move
> > > > the structure with flexible array member to the end of the containing
> > > > structure?
> > > >
> > > > i.e, in your example, in the struct Thread_Configured_control,
> > > > move the field “Thread_Control Control” to the end of the structure?
> > >
> > > If we move the Thread_Control to the end, how would I add a
> > > configuration defined number of elements at the end?
> >
> > Don’t understand this, why moving the “Thread_Control Control” to
> > the end of the containing structure will make this a problem?
> > Could you please explain this with a simplified example?
> 
> I found your example at [2] and tried to trim/summarize it here:
> 
> 
> struct _Thread_Control {
> Objects_Control Object;
> ...
> void*extensions[];
> };
> typedef struct _Thread_Control Thread_Control;
> 
> struct Thread_Configured_control {
>   Thread_Control Control;
> 
>   #if CONFIGURE_MAXIMUM_USER_EXTENSIONS > 0
> void *extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ];
>   #endif
>   Configuration_Scheduler_node Scheduler_nodes[ _CONFIGURE_SCHEDULER_COUNT ];
>   RTEMS_API_Control API_RTEMS;
>   #ifdef RTEMS_POSIX_API
> POSIX_API_Control API_POSIX;
>   #endif
>   #if CONFIGURE_MAXIMUM_THREAD_NAME_SIZE > 1
> char name[ CONFIGURE_MAXIMUM_THREAD_NAME_SIZE ];
>   #endif
>   #if defined(_CONFIGURE_ENABLE_NEWLIB_REENTRANCY) && \
> !defined(_REENT_THREAD_LOCAL)
> struct _reent Newlib;
>   #endif
> };
> 
> #define THREAD_INFORMATION_DEFINE( name, api, cls, max ) \
> ...
> static ... \
> Thread_Configured_control \
> name##_Objects[ _Objects_Maximum_per_allocation( max ) ]; \
> ...
> 
> 
> I don't see any static initialization of struct _Thread_Control::extensions
> nor any member initialization of the name##_Objects, and even then that
> is all legal in any arrangement:
> 
> struct flex  { int length; char data[]; };
> struct mid_flex { int m; struct flex flex_data; int n; int o; };
> struct end_flex { int m; int n; struct flex flex_data; };
> 
> struct flex f = { .length = 2 };
> struct mid_flex m = { .m = 5 };
> struct end_flex e = { .m = 5 };
> 
> struct flex fa[4] = { { .length = 2 } };
> struct mid_flex ma[4] = { { .m = 5 } };
> struct end_flex ea[4] = { { .m = 5 } };
> 
> These all work.
> 
> 
> But yes, I see why you can't move Thread_Control trivially to the end. It
> looks like you're depending on the implicit overlapping memory locations
> between struct _Thread_Control and things that include it as the first
> struct member, like struct Thread_Configured_control above:
> 
> cpukit/score/src/threaditerate.c:  the_thread = (Thread_Control *) 
> information->local_table[ index ];
> 
> (In the Linux kernel we found this kind of open casting to be very
> fragile and instead use a horrific wrapper called "container_of"[3] that
> does the pointer math (possibly to an offset of 0 for a direct cast) to
> find the member.)
> 
> Anyway, for avoiding the warning, you can just keep using the extension
> and add -Wno-... if it ever ends up in -Wall, or you can redefine struct
> _Thread_Control to avoid having the "extensions" member at all. This is
> what we've done in several cases in Linux. For example if we had this
> again, but made to look more like Thread_Control:
> 
> struct flex { int foo; int bar; char data[]; };
> struct mid_flex { struct flex hdr; int n; int o; };
> 
> It could be changed to:
> 
> struct flex_hdr { int foo; int bar; };
> struct flex { struct flex_hdr hdr; char data[]; };
> struct mid_flex { struct flex_hdr hdr; int n; int o; };
> 
> This has some collateral changes needed to reference the struct flex_hdr
> members from struct flex now (f->hdr.foo instead of f->foo). Sometimes
> this can be avoided by using a union, as I did in a recent refactoring
> in Linux: [4]
> 
> For more complex cases in Linux we've handled this by using our
> "struct_group"[5] macro, which allows for a union and tagged struct to
> be constructed:
> 
> struct flex {
>   __struct_group(flex_hdr, hdr,,
>   int foo;
>   int bar;
>   );
>   char data[];
> };
> struct mid_flex { struct flex_hdr hdr; int n; int o; };
> 
> Then struct flex member names don't have to change, but if anything is
> trying to get at struct flex::data through struct mid_flex::hdr, that'll
> need casting. But it _shouldn't_ since it has "n" and "o".
> 
> -Kees
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620122.html
> [2] https://github.com/RTEMS/rtems
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/container_of.h#n10
> [4] https://git.kernel.org/linus/896880ff30866f386ebed14ab81ce1ad3710cfc4
> [5] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/stddef.h?h=v6.8#n11
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-05-08 Thread Richard Biener
On Tue, May 7, 2024 at 10:44 PM Joseph Myers  wrote:
>
> On Fri, 3 May 2024, Richard Biener wrote:
>
> > So what I do not necessarily agree with is that we need to preserve
> > the multiplication with -fsignaling-nans.  Do we consider a program doing
> >
> > handler() { exit(0); }
> >
> >  x = sNaN;
> > ...
> >  sigaction(SIGFPE, ... handler)
> >  x*x;
> >  format_hard_drive();
> >
> > and expecting the program to exit(0) rather than formating the hard-disk
> > to be expecting something the C standard guarantees?  And is it enough
> > for the program to enable -fsignaling-nans for this?
> >
> > If so then the first and foremost bug is that 'x*x' doesn't have
> > TREE_SIDE_EFFECTS
> > set and thus we do not preserve it when optimizing __builtin_signbit () of 
> > it.
>
> Signaling NaNs don't seem relevant here.  "Signal" means "set the
> exception flag" - and 0 * Inf raises the same "invalid" exception flag as
> sNaN * sNaN.  Changing flow of control on an exception is outside the
> scope of standard C and requires nonstandard extensions such as
> feenableexcept.  (At present -ftrapping-math covers both kinds of
> exception handling - the default setting of a flag, and the nonstandard
> change of flow of control.)

So it's reasonable to require -fnon-call-exceptions (which now enables
-fexceptions) and -fno-delete-dead-exceptions to have GCC preserve
a change of control flow side-effect of x*x?  We do not preserve
FP exception bits set by otherwise unused operations, that is, we
do not consider that side-effect to be observable even with
-ftrapping-math.  In fact most uses of flag_trapping_math
are related to a possible control flow side-effect of FP math.
Exact preservation of FP exception flags will likely have to disable
all FP optimization if one considers FE_INEXACT and FE_UNDERFLOW.

Every time I try to make up my mind how to improve the situation for
the user I'm only confusing myself :/

Richard.

> --
> Joseph S. Myers
> josmy...@redhat.com
>


Re: [PATCH] expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]

2024-05-07 Thread Richard Biener



> Am 07.05.2024 um 18:02 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The HF and BF modes have the same size/precision and neither is
> a subset nor superset of the other.
> So, using either __extendhfbf2 or __trunchfbf2 is weird.
> The expansion apparently emits __extendhfbf2, but on the libgcc side
> we apparently have __trunchfbf2 implemented.
> 
> I think it is easier to switch to using what is available rather than
> adding new entrypoints to libgcc, even alias, because this is backportable.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok - do we have any target patterns that need adjustments?

Thanks,
Richard 

> 2024-05-07  Jakub Jelinek  
> 
>PR middle-end/114907
>* expr.cc (convert_mode_scalar): Use trunc_optab rather than
>sext_optab for HF->BF conversions.
>* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.
> 
>* gcc.dg/pr114907.c: New test.
> 
> --- gcc/expr.cc.jj2024-04-09 09:29:04.0 +0200
> +++ gcc/expr.cc2024-05-06 13:21:33.933798494 +0200
> @@ -355,8 +355,16 @@ convert_mode_scalar (rtx to, rtx from, i
>  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
> 
>   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
> -/* Conversion between decimal float and binary float, same size.  */
> -tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
> +{
> +  if (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> +  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
> +/* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
> +tab = trunc_optab;
> +  else
> +/* Conversion between decimal float and binary float, same
> +   size.  */
> +tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
> +}
>   else if (GET_MODE_PRECISION (from_mode) < GET_MODE_PRECISION (to_mode))
>tab = sext_optab;
>   else
> --- gcc/optabs-libfuncs.cc.jj2024-01-03 11:51:31.739728303 +0100
> +++ gcc/optabs-libfuncs.cc2024-05-06 15:50:21.611027802 +0200
> @@ -589,7 +589,9 @@ gen_trunc_conv_libfunc (convert_optab ta
>   if (GET_MODE_CLASS (float_tmode) != GET_MODE_CLASS (float_fmode))
> gen_interclass_conv_libfunc (tab, opname, float_tmode, float_fmode);
> 
> -  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode))
> +  if (GET_MODE_PRECISION (float_fmode) <= GET_MODE_PRECISION (float_tmode)
> +  && (REAL_MODE_FORMAT (float_tmode) != &arm_bfloat_half_format
> +  || REAL_MODE_FORMAT (float_fmode) != &ieee_half_format))
> return;
> 
>   if (GET_MODE_CLASS (float_tmode) == GET_MODE_CLASS (float_fmode))
> --- gcc/testsuite/gcc.dg/pr114907.c.jj2024-05-06 15:59:08.734958523 +0200
> +++ gcc/testsuite/gcc.dg/pr114907.c2024-05-06 16:02:38.914139829 +0200
> @@ -0,0 +1,27 @@
> +/* PR middle-end/114907 */
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-add-options float16 } */
> +/* { dg-require-effective-target float16_runtime } */
> +/* { dg-add-options bfloat16 } */
> +/* { dg-require-effective-target bfloat16_runtime } */
> +
> +__attribute__((noipa)) _Float16
> +foo (__bf16 x)
> +{
> +  return (_Float16) x;
> +}
> +
> +__attribute__((noipa)) __bf16
> +bar (_Float16 x)
> +{
> +  return (__bf16) x;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (11.125bf16) != 11.125f16
> +  || bar (11.125f16) != 11.125bf16)
> +__builtin_abort ();
> +}
> 
>Jakub
> 


Re: [PATCH] tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

2024-05-07 Thread Richard Biener



> Am 07.05.2024 um 17:54 schrieb Jakub Jelinek :
> 
> Hi!
> 
> In r9-5742 we started allowing always_inline functions to be inlined into
> functions which have e.g. address sanitization disabled, even when the
> always_inline function is implicitly sanitized via command-line options.
> 
> This mostly works fine because most of the asan instrumentation is done only
> late after ipa, but as the following testcase shows, the .ASAN_MARK ifn
> calls the gimplifier adds can result in ICEs.
> 
> Fixed by dropping those during inlining, similarly to how we drop
> .TSAN_FUNC_EXIT calls.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-05-07  Jakub Jelinek  
> 
>PR sanitizer/114956
>* tree-inline.cc: Include asan.h.
>(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
>sanitization disabled.
> 
>* gcc.dg/asan/pr114956.c: New test.
> 
> --- gcc/tree-inline.cc.jj2024-05-03 09:44:21.199055899 +0200
> +++ gcc/tree-inline.cc2024-05-06 10:45:37.231349328 +0200
> @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.
> #include "symbol-summary.h"
> #include "symtab-thunks.h"
> #include "symtab-clones.h"
> +#include "asan.h"
> 
> /* I'm not real happy about this, but we need to handle gimple and
>non-gimple trees.  */
> @@ -2226,13 +2227,26 @@ copy_bb (copy_body_data *id, basic_block
>}
>  else if (call_stmt
>   && id->call_stmt
> -   && gimple_call_internal_p (stmt)
> -   && gimple_call_internal_fn (stmt) == IFN_TSAN_FUNC_EXIT)
> -{
> -  /* Drop TSAN_FUNC_EXIT () internal calls during inlining.  */
> -  gsi_remove (&copy_gsi, false);
> -  continue;
> -}
> +   && gimple_call_internal_p (stmt))
> +switch (gimple_call_internal_fn (stmt))
> +  {
> +  case IFN_TSAN_FUNC_EXIT:
> +/* Drop .TSAN_FUNC_EXIT () internal calls during inlining.  */
> +gsi_remove (&copy_gsi, false);
> +continue;
> +  case IFN_ASAN_MARK:
> +/* Drop .ASAN_MARK internal calls during inlining into
> +   no_sanitize functions.  */
> +if (!sanitize_flags_p (SANITIZE_ADDRESS, id->dst_fn)
> +&& !sanitize_flags_p (SANITIZE_HWADDRESS, id->dst_fn))
> +  {
> +gsi_remove (&copy_gsi, false);
> +continue;
> +  }
> +break;
> +  default:
> +break;
> +  }
> 
>  /* Statements produced by inlining can be unfolded, especially
> when we constant propagated some operands.  We can't fold
> --- gcc/testsuite/gcc.dg/asan/pr114956.c.jj2024-05-06 10:54:52.601892840 
> +0200
> +++ gcc/testsuite/gcc.dg/asan/pr114956.c2024-05-06 10:54:33.920143734 
> +0200
> @@ -0,0 +1,26 @@
> +/* PR sanitizer/114956 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsanitize=address,null" } */
> +
> +int **a;
> +void qux (int *);
> +
> +__attribute__((always_inline)) static inline int *
> +foo (void)
> +{
> +  int b[1];
> +  qux (b);
> +  return a[1];
> +}
> +
> +__attribute__((no_sanitize_address)) void
> +bar (void)
> +{
> +  *a = foo ();
> +}
> +
> +void
> +baz (void)
> +{
> +  bar ();
> +}
> 
>Jakub
> 


[PATCH] Fix guard for IDF pruning by dominator

2024-05-07 Thread Richard Biener
When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks, it compares against ENTRY_BLOCK, but that is never
going to be the common dominator; instead it will be at most its single
successor.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK.
---
 gcc/tree-into-ssa.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 705e4119ba3..858c3840475 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3262,7 +3262,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 common dominator of all the definition blocks.  */
  entry = nearest_common_dominator_for_set (CDI_DOMINATORS,
db->def_blocks);
- if (entry != ENTRY_BLOCK_PTR_FOR_FN (cfun))
+ if (entry != single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
EXECUTE_IF_SET_IN_BITMAP (idf, 0, i, bi)
  if (BASIC_BLOCK_FOR_FN (cfun, i) != entry
  && dominated_by_p (CDI_DOMINATORS,
-- 
2.35.3


[PATCH] Avoid re-allocating vector

2024-05-07 Thread Richard Biener
The following avoids re-allocating the var map BB vector by
pre-allocating it to the exact size needed when operating on the
whole function.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-ssa-live.cc (init_var_map): Pre-allocate vec_bbs vector
to the correct size and use quick_push.
---
 gcc/tree-ssa-live.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index fa6be2fced3..e6ae551a457 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -113,8 +113,10 @@ init_var_map (int size, class loop *loop, bitmap bitint)
   map->outofssa_p = bitint == NULL;
   map->bitint = bitint;
   basic_block bb;
+  map->vec_bbs.reserve_exact (n_basic_blocks_for_fn (cfun)
+ - NUM_FIXED_BLOCKS);
   FOR_EACH_BB_FN (bb, cfun)
-   map->vec_bbs.safe_push (bb);
+   map->vec_bbs.quick_push (bb);
 }
   return map;
 }
-- 
2.35.3


[PATCH] Fix block index check in insert_updated_phi_nodes_for

2024-05-07 Thread Richard Biener
This replaces a >= 0 block index check with the appropriate
NUM_FIXED_BLOCKS; the check dates from the time when ENTRY_BLOCK was
negative.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Fix block
index check.
---
 gcc/tree-into-ssa.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 7642baaabae..3732c269ca3 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3305,7 +3305,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 
  mark_block_for_update (bb);
  FOR_EACH_EDGE (e, ei, bb->preds)
-   if (e->src->index >= 0)
+   if (e->src->index >= NUM_FIXED_BLOCKS)
  mark_block_for_update (e->src);
}
 
-- 
2.35.3


[PATCH] middle-end/27800 - avoid unnecessary temporary during gimplification

2024-05-07 Thread Richard Biener
This avoids a temporary when gimplifying reg = a ? b : c by re-using
the LHS of an assignment when it is a register.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR middle-end/27800
* gimplify.cc (gimplify_modify_expr_rhs): For a COND_EXPR
avoid a temporary from gimplify_cond_expr when the LHS is
a register by pushing the assignment into the COND_EXPR arms.

* gcc.dg/pr27800.c: New testcase.
---
 gcc/gimplify.cc|  7 +--
 gcc/testsuite/gcc.dg/pr27800.c | 11 +++
 2 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr27800.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 763843f45a7..26e96ada4c7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -6082,8 +6082,11 @@ gimplify_modify_expr_rhs (tree *expr_p, tree *from_p, 
tree *to_p,
  /* If we're assigning to a non-register type, push the assignment
 down into the branches.  This is mandatory for ADDRESSABLE types,
 since we cannot generate temporaries for such, but it saves a
-copy in other cases as well.  */
- if (!is_gimple_reg_type (TREE_TYPE (*from_p)))
+copy in other cases as well.
+Also avoid an extra temporary and copy when assigning to
+a register.  */
+ if (!is_gimple_reg_type (TREE_TYPE (*from_p))
+ || (is_gimple_reg (*to_p) && !gimplify_ctxp->allow_rhs_cond_expr))
{
  /* This code should mirror the code in gimplify_cond_expr. */
  enum tree_code code = TREE_CODE (*expr_p);
diff --git a/gcc/testsuite/gcc.dg/pr27800.c b/gcc/testsuite/gcc.dg/pr27800.c
new file mode 100644
index 000..e92ebc22e6f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr27800.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+int iii (int a, int b, int c)
+{
+  return a ? b : c;
+}
+
+/* Verify we end up with two assignments and not an extra copy
+   resulting from another temporary generated from gimplify_cond_expr.  */
+/* { dg-final { scan-tree-dump-times " = " 2 "gimple" } } */
-- 
2.35.3
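The effect of the change can be sketched in pseudo-GIMPLE (hand-written illustration, not actual dump output):

```
/* Before: gimplify_cond_expr forces a temporary, then copies it.  */
if (a != 0) goto T; else goto F;
T: iftmp = b; goto done;
F: iftmp = c;
done: _ret = iftmp;

/* After: the assignment is pushed into the arms, no iftmp copy.  */
if (a != 0) goto T; else goto F;
T: _ret = b; goto done;
F: _ret = c;
done: ...
```

This matches the two-assignment count the new testcase scans for in the gimple dump.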


[PATCH] Remove redundant check

2024-05-07 Thread Richard Biener
operand_equal_p already has checking code to verify that the hashes
are equal; avoid doing that again in gimplify_hasher::equal.

Re-bootstrap & regtest running on x86_64-unknown-linux-gnu.

* gimplify.cc (gimplify_hasher::equal): Remove redundant
checking.
---
 gcc/gimplify.cc | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index a6573a498d9..26e96ada4c7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -19603,9 +19603,5 @@ gimplify_hasher::equal (const elt_t *p1, const elt_t 
*p2)
   if (!operand_equal_p (t1, t2, 0))
 return false;
 
-  /* Only allow them to compare equal if they also hash equal; otherwise
- results are nondeterminate, and we fail bootstrap comparison.  */
-  gcc_checking_assert (hash (p1) == hash (p2));
-
   return true;
 }
-- 
2.35.3


Re: [PATCH] middle-end/114931 - type_hash_canon and structual equality types

2024-05-07 Thread Richard Biener
On Mon, 6 May 2024, Martin Uecker wrote:

> Am Montag, dem 06.05.2024 um 11:07 +0200 schrieb Richard Biener:
> > On Mon, 6 May 2024, Martin Uecker wrote:
> > 
> > > Am Montag, dem 06.05.2024 um 09:00 +0200 schrieb Richard Biener:
> > > > On Sat, 4 May 2024, Martin Uecker wrote:
> > > > 
> > > > > Am Freitag, dem 03.05.2024 um 21:16 +0200 schrieb Jakub Jelinek:
> > > > > > > On Fri, May 03, 2024 at 09:11:20PM +0200, Martin Uecker wrote:
> > > > > > > > > > > TYPE_CANONICAL as used by the middle-end cannot express 
> > > > > > > > > > > this but
> > > > > > > > > 
> > > > > > > > > Hm. so how does it work now for arrays?
> > > > > > > 
> > > > > > > Do you have a testcase which doesn't work correctly with the 
> > > > > > > arrays?
> > > > > 
> > > > > I am mostly trying to understand better how this works. But
> > > > > if I am not mistaken, the following example would indeed
> > > > > indicate that we do incorrect aliasing decisions for types
> > > > > derived from arrays:
> > > > > 
> > > > > https://godbolt.org/z/rTsE3PhKc
> > > > 
> > > > This example is about pointer-to-array types, int (*)[2] and
> > > > int (*)[1] are supposed to be compatible as in receive the same alias
> > > > set. 
> > > 
> > > In C, char (*)[2] and char (*)[1] are not compatible. But with
> > > COMPAT set, the example operates^1 with char (*)[] and char (*)[1]
> > > which are compatible.  If we form equivalence classes, then
> > > all three types would need to be treated as equivalent. 
> > > 
> > > ^1 Actually, pointer to functions returning pointers
> > > to arrays. Probably this example can still be simplified...
> > > 
> > > >  This is ensured by get_alias_set POINTER_TYPE_P handling,
> > > > the alias set is supposed to be the same as that of int *.  It seems
> > > > we do restrict the handling a bit, the code does
> > > > 
> > > >   /* Unnest all pointers and references.
> > > >  We also want to make pointer to array/vector equivalent to 
> > > > pointer to
> > > >  its element (see the reasoning above). Skip all those types, 
> > > > too.  
> > > > */
> > > >   for (p = t; POINTER_TYPE_P (p)
> > > >|| (TREE_CODE (p) == ARRAY_TYPE
> > > >&& (!TYPE_NONALIASED_COMPONENT (p)
> > > >|| !COMPLETE_TYPE_P (p)
> > > >|| TYPE_STRUCTURAL_EQUALITY_P (p)))
> > > >|| TREE_CODE (p) == VECTOR_TYPE;
> > > >p = TREE_TYPE (p))
> > > > 
> > > > where the comment doesn't exactly match the code - but C should
> > > > never have TYPE_NONALIASED_COMPONENT (p).
> > > > 
> > > > But maybe I misread the example or it goes wrong elsewhere.
> > > 
> > > If I am not confusing myself too much, the example shows that
> > > aliasing analysis treats the the types as incompatible in
> > > both cases, because it does not reload *a with -O2. 
> > > 
> > > For char (*)[1] and char (*)[2] this would be correct (but an
> > > implementation exploiting this would need to do structural
> > > comparisons and not equivalence classes) but for 
> > > char (*)[2] and char (*)[] it is not.
> > 
> > Oh, these are function pointers, so it's about the alias set of
> > a pointer to FUNCTION_TYPE.  I don't see any particular code
> > trying to make char[] * (*)() and char[1] *(*)() inter-operate
> > for TBAA iff the FUNCTION_TYPEs themselves are not having the
> > same TYPE_CANONICAL.
> > 
> > Can you open a bugreport and please point to the relevant parts
> > of the C standard that tells how pointer-to FUNCTION_TYPE TBAA
> > is supposed to work?
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114959

I have now pushed the patch after consulting with Jakub.

Richard.

> Martin
> > 
> 
> > Thanks,
> > Richard.
> > 
> > > Martin
> > > 
> > > 
> > > > 
> > > > Richard.
> > > > 
> > > > > Martin
> > > > > 
> > > > > > > 
> > > > > > > E.g. same_type_for_tbaa has
> > > > > > >

Re: [middle-end PATCH] Constant fold {-1,-1} << 1 in simplify-rtx.cc

2024-05-07 Thread Richard Biener
On Fri, Jan 26, 2024 at 7:26 PM Roger Sayle  wrote:
>
>
> This patch addresses a missed optimization opportunity in the RTL
> optimization passes.  The function simplify_const_binary_operation
> will constant fold binary operators with two CONST_INT operands,
> and those with two CONST_VECTOR operands, but is missing compile-time
> evaluation of binary operators with a CONST_VECTOR and a CONST_INT,
> such as vector shifts and rotates.
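
[Editorial note: the fold in question can be reproduced at the source level with GCC's
vector extensions — an illustrative sketch, not the patch's RTL code; the function
name is made up.]

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* A CONST_VECTOR shifted by a CONST_INT: once simplify_const_binary_operation
   handles the mixed-operand case, this whole function folds to the constant
   vector {-2, -2, -2, -2} at compile time.  */
static v4si
shift_all_ones (void)
{
  v4si v = { -1, -1, -1, -1 };
  return v << 1;   /* each lane: -1 << 1 == -2 (GCC-defined behavior) */
}
```

Inspecting the optimized dump (or the generated RTL) shows no shift instruction
remains once the fold applies.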
>
> My first version of this patch didn't contain a switch statement to
> explicitly check for valid binary opcodes, which bootstrapped and
> regression tested fine, but by paranoia has got the better of me,
> so this version now checks that VEC_SELECT or some funky (future)
> rtx_code doesn't cause problems.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline (in stage 1)?

OK.

Thanks,
Richard.

>
> 2024-01-26  Roger Sayle  
>
> gcc/ChangeLog
> * simplify-rtx.cc (simplify_const_binary_operation): Constant
> fold binary operations where the LHS is CONST_VECTOR and the
> RHS is CONST_INT (or CONST_DOUBLE) such as vector shifts.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] tree-optimization/110490 - bitcount for narrow modes

2024-05-07 Thread Richard Biener
On Tue, May 7, 2024 at 10:58 AM Stefan Schulze Frielinghaus
 wrote:
>
> Ping.  Ok for mainline?

OK.

Thanks,
Richard.

> On Thu, Apr 25, 2024 at 09:26:45AM +0200, Stefan Schulze Frielinghaus wrote:
> > Bitcount operations popcount, clz, and ctz are emulated for narrow modes
> > in case an operation is only supported for wider modes.  Beside that ctz
> > may be emulated via clz in expand_ctz.  Reflect this in
> > expression_expensive_p.
> >
> > I considered the emulation of ctz via clz as not expensive since this
> > basically reduces to ctz (x) = c - (clz (x & -x)) where c is the mode
> > precision minus 1 which should be faster than a loop.
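
[Editorial note: the identity relied on here can be checked directly with the GCC
builtins — a standalone sketch with an illustrative name; x must be nonzero, since
the builtins are undefined for 0.]

```c
#include <assert.h>
#include <limits.h>

/* ctz (x) = c - clz (x & -x), where c is the precision minus 1:
   x & -x isolates the lowest set bit 2^k, whose leading-zero count
   is c - k, so subtracting it from c recovers k = ctz (x).  */
static int
ctz_via_clz (unsigned x)
{
  const int c = sizeof (unsigned) * CHAR_BIT - 1;  /* precision - 1 */
  return c - __builtin_clz (x & -x);               /* x must be nonzero */
}
```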
> >
> > Bootstrapped and regtested on x86_64 and s390.  Though, this is probably
> > stage1 material?
> >
> > gcc/ChangeLog:
> >
> >   PR tree-optimization/110490
> >   * tree-scalar-evolution.cc (expression_expensive_p): Also
> >   consider mode widening for popcount, clz, and ctz.
> > ---
> >  gcc/tree-scalar-evolution.cc | 23 +++
> >  1 file changed, 23 insertions(+)
> >
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index b0a5e09a77c..622c7246c1b 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -3458,6 +3458,28 @@ bitcount_call:
> > && (optab_handler (optab, word_mode)
> > != CODE_FOR_nothing))
> > break;
> > +   /* If popcount is available for a wider mode, we emulate the
> > +  operation for a narrow mode by first zero-extending the value
> > +  and then computing popcount in the wider mode.  Analogue for
> > +  ctz.  For clz we do the same except that we additionally have
> > +  to subtract the difference of the mode precisions from the
> > +  result.  */
> > > +   if (is_a <scalar_int_mode> (mode, &int_mode))
> > + {
> > +   machine_mode wider_mode_iter;
> > +   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
> > + if (optab_handler (optab, wider_mode_iter)
> > + != CODE_FOR_nothing)
> > +   goto check_call_args;
> > +   /* Operation ctz may be emulated via clz in expand_ctz.  */
> > +   if (optab == ctz_optab)
> > + {
> > +   FOR_EACH_WIDER_MODE_FROM (wider_mode_iter, mode)
> > + if (optab_handler (clz_optab, wider_mode_iter)
> > + != CODE_FOR_nothing)
> > +   goto check_call_args;
> > + }
> > + }
> > return true;
> >   }
> > break;
> > @@ -3469,6 +3491,7 @@ bitcount_call:
> > break;
> >   }
> >
> > +check_call_args:
> >FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
> >   if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
> > return true;
> > --
> > 2.44.0
> >
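
[Editorial note: the wider-mode emulation described in the patch comment corresponds
to the following source-level sketch — illustrative names, with 16-bit operations
emulated via the 32-bit builtins.]

```c
#include <assert.h>
#include <stdint.h>

/* popcount and ctz in a narrow mode: zero-extend and use the wider
   operation; the extra zero bits sit above the interesting ones and
   do not change the result.  */
static int popcount16 (uint16_t x) { return __builtin_popcount (x); }
static int ctz16 (uint16_t x)      { return __builtin_ctz (x); }  /* x != 0 */

/* clz additionally subtracts the difference of the mode precisions,
   since zero-extension adds exactly that many leading zeros.  */
static int clz16 (uint16_t x)      { return __builtin_clz (x) - (32 - 16); }
```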


[PATCH] Use unsigned for stack var indexes during RTL expansion

2024-05-07 Thread Richard Biener
We're currently using size_t but at the same time storing them into
bitmaps which only support unsigned int index.  The following makes
it unsigned int throughout, saving memory as well.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

* cfgexpand.cc (stack_var::representative): Use 'unsigned'
for stack var indexes instead of 'size_t'.
(stack_var::next): Likewise.
(EOC): Likewise.
(stack_vars_alloc): Likewise.
(stack_vars_num): Likewise.
(decl_to_stack_part): Likewise.
(stack_vars_sorted): Likewise.
(add_stack_var): Likewise.
(add_stack_var_conflict): Likewise.
(stack_var_conflict_p): Likewise.
(visit_op): Likewise.
(visit_conflict): Likewise.
(add_scope_conflicts_1): Likewise.
(stack_var_cmp): Likewise.
(part_hashmap): Likewise.
(update_alias_info_with_stack_vars): Likewise.
(union_stack_vars): Likewise.
(partition_stack_vars): Likewise.
(dump_stack_var_partition): Likewise.
(expand_stack_vars): Likewise.
(account_stack_vars): Likewise.
(stack_protect_decl_phase_1): Likewise.
(stack_protect_decl_phase_2): Likewise.
(asan_decl_phase_3): Likewise.
(init_vars_expansion): Likewise.
(estimated_stack_frame_size): Likewise.
---
 gcc/cfgexpand.cc | 75 
 1 file changed, 37 insertions(+), 38 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index afee064aa15..557cb28733b 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -320,22 +320,22 @@ public:
   unsigned int alignb;
 
   /* The partition representative.  */
-  size_t representative;
+  unsigned representative;
 
   /* The next stack variable in the partition, or EOC.  */
-  size_t next;
+  unsigned next;
 
   /* The numbers of conflicting stack variables.  */
   bitmap conflicts;
 };
 
-#define EOC  ((size_t)-1)
+#define EOC  ((unsigned)-1)
 
 /* We have an array of such objects while deciding allocation.  */
 static class stack_var *stack_vars;
-static size_t stack_vars_alloc;
-static size_t stack_vars_num;
-static hash_map<tree, size_t> *decl_to_stack_part;
+static unsigned stack_vars_alloc;
+static unsigned stack_vars_num;
+static hash_map<tree, unsigned> *decl_to_stack_part;
 
 /* Conflict bitmaps go on this obstack.  This allows us to destroy
all of them in one big sweep.  */
@@ -343,7 +343,7 @@ static bitmap_obstack stack_var_bitmap_obstack;
 
 /* An array of indices such that stack_vars[stack_vars_sorted[i]].size
is non-decreasing.  */
-static size_t *stack_vars_sorted;
+static unsigned *stack_vars_sorted;
 
 /* The phase of the stack frame.  This is the known misalignment of
virtual_stack_vars_rtx from PREFERRED_STACK_BOUNDARY.  That is,
@@ -457,7 +457,7 @@ add_stack_var (tree decl, bool really_expand)
= XRESIZEVEC (class stack_var, stack_vars, stack_vars_alloc);
 }
   if (!decl_to_stack_part)
-    decl_to_stack_part = new hash_map<tree, size_t>;
+    decl_to_stack_part = new hash_map<tree, unsigned>;
 
   v = &stack_vars[stack_vars_num];
   decl_to_stack_part->put (decl, stack_vars_num);
@@ -491,7 +491,7 @@ add_stack_var (tree decl, bool really_expand)
 /* Make the decls associated with luid's X and Y conflict.  */
 
 static void
-add_stack_var_conflict (size_t x, size_t y)
+add_stack_var_conflict (unsigned x, unsigned y)
 {
   class stack_var *a = &stack_vars[x];
   class stack_var *b = &stack_vars[y];
@@ -508,7 +508,7 @@ add_stack_var_conflict (size_t x, size_t y)
 /* Check whether the decls associated with luid's X and Y conflict.  */
 
 static bool
-stack_var_conflict_p (size_t x, size_t y)
+stack_var_conflict_p (unsigned x, unsigned y)
 {
   class stack_var *a = &stack_vars[x];
   class stack_var *b = &stack_vars[y];
@@ -537,7 +537,7 @@ visit_op (gimple *, tree op, tree, void *data)
   && DECL_P (op)
   && DECL_RTL_IF_SET (op) == pc_rtx)
 {
-  size_t *v = decl_to_stack_part->get (op);
+  unsigned *v = decl_to_stack_part->get (op);
   if (v)
bitmap_set_bit (active, *v);
 }
@@ -557,10 +557,10 @@ visit_conflict (gimple *, tree op, tree, void *data)
   && DECL_P (op)
   && DECL_RTL_IF_SET (op) == pc_rtx)
 {
-  size_t *v = decl_to_stack_part->get (op);
+  unsigned *v = decl_to_stack_part->get (op);
   if (v && bitmap_set_bit (active, *v))
{
- size_t num = *v;
+ unsigned num = *v;
  bitmap_iterator bi;
  unsigned i;
  gcc_assert (num < stack_vars_num);
@@ -627,7 +627,7 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool 
for_conflict)
   if (gimple_clobber_p (stmt))
{
  tree lhs = gimple_assign_lhs (stmt);
- size_t *v;
+ unsigned *v;
  /* Nested function lowering might introduce LHSs
 that are COMPONENT_REFs.  */
  if (!VAR_P (lhs))
@@ -743,8 +743,8 @@ add_scope_conflicts (void)
 static int
 stack_var_cmp (const void *a, const void *b)
 {

GCC 14.1.1 Status Report (2024-05-07)

2024-05-07 Thread Richard Biener
Status
======

The GCC 14.1 release tarballs have been created, the releases/gcc-14
branch is open again for regression and documentation bugfixing.
GCC 14.2 can be expected in about two months unless something serious
changes the plans.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              605   -   1
P3               72   +  15
P4              217   -   2
P5               25
--------        ---   -----------------------
Total P1-P3     677   +  14
Total           919   +  12


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2024-April/243823.html


Re: [PATCH] Mention that some options are turned on by `-Ofast` in their descriptions [PR97263]

2024-05-07 Thread Richard Biener
On Mon, May 6, 2024 at 11:28 PM Andrew Pinski  wrote:
>
> Like was done for -ffast-math in r0-105946-ga570fc16fa8056, we should
> document that -Ofast enables -fmath-errno, -funsafe-math-optimizations,
> -finite-math-only, -fno-trapping-math in their documentation.
>
> Note this changes the stronger "must not" to be "is not" for 
> -fno-trapping-math
> since we do enable it for -Ofast already.
>
> OK?

OK

> gcc/ChangeLog:
>
> PR middle-end/97263
> * doc/invoke.texi(fmath-errno): Document it is turned on
> with -Ofast.
> (funsafe-math-optimizations): Likewise.
> (ffinite-math-only): Likewise.
> (fno-trapping-math): Likewise and use less strong language.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/doc/invoke.texi | 41 ++---
>  1 file changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9456ced468a..14ff4d25da7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14996,11 +14996,12 @@ with a single instruction, e.g., @code{sqrt}.  A 
> program that relies on
>  IEEE exceptions for math error handling may want to use this flag
>  for speed while maintaining IEEE arithmetic compatibility.
>
> -This option is not turned on by any @option{-O} option since
> -it can result in incorrect output for programs that depend on
> -an exact implementation of IEEE or ISO rules/specifications for
> -math functions. It may, however, yield faster code for programs
> -that do not require the guarantees of these specifications.
> +This option is not turned on by any @option{-O} option  besides
> +@option{-Ofast} since it can result in incorrect output for
> +programs that depend on an exact implementation of IEEE or
> +ISO rules/specifications for math functions. It may, however,
> +yield faster code for programs that do not require the guarantees
> +of these specifications.
>
>  The default is @option{-fmath-errno}.
>
> @@ -15017,11 +15018,12 @@ ANSI standards.  When used at link time, it may 
> include libraries
>  or startup files that change the default FPU control word or other
>  similar optimizations.
>
> -This option is not turned on by any @option{-O} option since
> -it can result in incorrect output for programs that depend on
> -an exact implementation of IEEE or ISO rules/specifications for
> -math functions. It may, however, yield faster code for programs
> -that do not require the guarantees of these specifications.
> +This option is not turned on by any @option{-O} option besides
> +@option{-Ofast} since it can result in incorrect output
> +for programs that depend on an exact implementation of IEEE
> +or ISO rules/specifications for math functions. It may, however,
> +yield faster code for programs that do not require the guarantees
> +of these specifications.
>  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
>  @option{-fassociative-math} and @option{-freciprocal-math}.
>
> @@ -15061,11 +15063,12 @@ The default is @option{-fno-reciprocal-math}.
>  Allow optimizations for floating-point arithmetic that assume
>  that arguments and results are not NaNs or +-Infs.
>
> -This option is not turned on by any @option{-O} option since
> -it can result in incorrect output for programs that depend on
> -an exact implementation of IEEE or ISO rules/specifications for
> -math functions. It may, however, yield faster code for programs
> -that do not require the guarantees of these specifications.
> +This option is not turned on by any @option{-O} option besides
> +@option{-Ofast} since it can result in incorrect output
> +for programs that depend on an exact implementation of IEEE or
> +ISO rules/specifications for math functions. It may, however,
> +yield faster code for programs that do not require the guarantees
> +of these specifications.
>
>  The default is @option{-fno-finite-math-only}.
>
> @@ -15089,10 +15092,10 @@ underflow, inexact result and invalid operation.  
> This option requires
>  that @option{-fno-signaling-nans} be in effect.  Setting this option may
>  allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.
>
> -This option should never be turned on by any @option{-O} option since
> -it can result in incorrect output for programs that depend on
> -an exact implementation of IEEE or ISO rules/specifications for
> -math functions.
> +This option is not turned on by any @option{-O} option besides
> +@option{-Ofast} since it can result in incorrect output for programs
> +that depend on an exact implementation of IEEE or ISO rules/specifications
> +for math functions.
>
>  The default is @option{-ftrapping-math}.
>
> --
> 2.43.0
>


[PATCH] tree-optimization/100923 - re-do VN with contextual PTA info fix

2024-05-06 Thread Richard Biener
The following implements the gist of the PR100923 fix in a leaner
(and more complete) way by realizing that all ao_ref_init_from_vn_reference
uses need to have an SSA name in the base valueized with availability
in mind.  Instead of re-valueizing the whole chain of operands we can
simply only and always valueize the SSA name we put in the base.

This handles also two omitted places in vn_reference_lookup_3.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push later.

Richard.

PR tree-optimization/100923
* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference): Valueize
base SSA_NAME.
(vn_reference_lookup_3): Adjust vn_context_bb around calls
to ao_ref_init_from_vn_reference.
(vn_reference_lookup_pieces): Revert original PR100923 fix.
(vn_reference_lookup): Likewise.
---
 gcc/tree-ssa-sccvn.cc | 58 +++
 1 file changed, 25 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index fbbfa557833..726e9d88b8f 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -1201,11 +1201,17 @@ ao_ref_init_from_vn_reference (ao_ref *ref,
case STRING_CST:
  /* This can show up in ARRAY_REF bases.  */
case INTEGER_CST:
-   case SSA_NAME:
  *op0_p = op->op0;
  op0_p = NULL;
  break;
 
+   case SSA_NAME:
+ /* SSA names we have to get at one available since it contains
+flow-sensitive info.  */
+ *op0_p = vn_valueize (op->op0);
+ op0_p = NULL;
+ break;
+
/* And now the usual component-reference style ops.  */
case BIT_FIELD_REF:
  offset += wi::to_poly_offset (op->op1);
@@ -2725,7 +2731,6 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
  copy_reference_ops_from_ref (lhs, &lhs_ops);
  valueize_refs_1 (&lhs_ops, &valueized_anything, true);
}
-  vn_context_bb = saved_rpo_bb;
   ao_ref_init (&lhs_ref, lhs);
   lhs_ref_ok = true;
   if (valueized_anything
@@ -2734,9 +2739,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
ao_ref_base_alias_set (&lhs_ref), TREE_TYPE (lhs), lhs_ops)
  && !refs_may_alias_p_1 (ref, &lhs_ref, data->tbaa_p))
{
+ vn_context_bb = saved_rpo_bb;
  *disambiguate_only = TR_VALUEIZE_AND_DISAMBIGUATE;
  return NULL;
}
+  vn_context_bb = saved_rpo_bb;
 
   /* When the def is a CLOBBER we can optimistically disambiguate
 against it since any overlap it would be undefined behavior.
@@ -3634,13 +3641,19 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
   /* Adjust *ref from the new operands.  */
   ao_ref rhs1_ref;
   ao_ref_init (&rhs1_ref, rhs1);
+  basic_block saved_rpo_bb = vn_context_bb;
+  vn_context_bb = gimple_bb (def_stmt);
   if (!ao_ref_init_from_vn_reference (&r,
  force_no_tbaa ? 0
  : ao_ref_alias_set (&rhs1_ref),
  force_no_tbaa ? 0
  : ao_ref_base_alias_set (&rhs1_ref),
  vr->type, vr->operands))
-   return (void *)-1;
+   {
+ vn_context_bb = saved_rpo_bb;
+ return (void *)-1;
+   }
+  vn_context_bb = saved_rpo_bb;
   /* This can happen with bitfields.  */
   if (maybe_ne (ref->size, r.size))
{
@@ -3839,8 +3852,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
*data_,
return data->finish (0, 0, val);
 
   /* Adjust *ref from the new operands.  */
+  basic_block saved_rpo_bb = vn_context_bb;
+  vn_context_bb = gimple_bb (def_stmt);
   if (!ao_ref_init_from_vn_reference (&r, 0, 0, vr->type, vr->operands))
-   return (void *)-1;
+   {
+ vn_context_bb = saved_rpo_bb;
+ return (void *)-1;
+   }
+  vn_context_bb = saved_rpo_bb;
   /* This can happen with bitfields.  */
   if (maybe_ne (ref->size, r.size))
return (void *)-1;
@@ -3928,31 +3947,13 @@ vn_reference_lookup_pieces (tree vuse, alias_set_type 
set,
   unsigned limit = param_sccvn_max_alias_queries_per_access;
   vn_walk_cb_data data (&vr1, NULL_TREE, NULL, kind, true, NULL_TREE,
false);
-  vec<vn_reference_op_s> ops_for_ref;
-  if (!valueized_p)
-   ops_for_ref = vr1.operands;
-  else
-   {
- /* For ao_ref_from_mem we have to ensure only available SSA names
-end up in base and the only convenient way to make this work
-for PRE is to re-valueize with that in mind.  */
- ops_for_ref.create (operands.length ());
- ops_for_ref.quick_grow (operands.length ());
- memcpy (ops_for_ref.address (),
- operands.address (),
- sizeof (vn_reference_op_s)
- * operands.length ());
- 

[PATCH] Complete ao_ref_init_from_vn_reference for all refs

2024-05-06 Thread Richard Biener
This makes sure we can create ao_refs from all VN operands we create.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Will push later.

Richard.

* tree-ssa-sccvn.cc (ao_ref_init_from_vn_reference): Add
TARGET_MEM_REF support.  Handle more bases.
---
 gcc/tree-ssa-sccvn.cc | 51 ---
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 02c3bd5f538..fbbfa557833 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -1148,8 +1148,29 @@ ao_ref_init_from_vn_reference (ao_ref *ref,
 {
   switch (op->opcode)
{
-   /* These may be in the reference ops, but we cannot do anything
-  sensible with them here.  */
+   case CALL_EXPR:
+ return false;
+
+   /* Record the base objects.  */
+   case MEM_REF:
+ *op0_p = build2 (MEM_REF, op->type,
+  NULL_TREE, op->op0);
+ MR_DEPENDENCE_CLIQUE (*op0_p) = op->clique;
+ MR_DEPENDENCE_BASE (*op0_p) = op->base;
+ op0_p = &TREE_OPERAND (*op0_p, 0);
+ break;
+
+   case TARGET_MEM_REF:
+ *op0_p = build5 (TARGET_MEM_REF, op->type,
+  NULL_TREE, op->op2, op->op0,
+  op->op1, ops[i+1].op0);
+ MR_DEPENDENCE_CLIQUE (*op0_p) = op->clique;
+ MR_DEPENDENCE_BASE (*op0_p) = op->base;
+ op0_p = &TREE_OPERAND (*op0_p, 0);
+ ++i;
+ break;
+
+   /* Unwrap some of the wrapped decls.  */
case ADDR_EXPR:
  /* Apart from ADDR_EXPR arguments to MEM_REF.  */
  if (base != NULL_TREE
@@ -1170,21 +1191,16 @@ ao_ref_init_from_vn_reference (ao_ref *ref,
  break;
}
  /* Fallthru.  */
-   case CALL_EXPR:
- return false;
-
-   /* Record the base objects.  */
-   case MEM_REF:
- *op0_p = build2 (MEM_REF, op->type,
-  NULL_TREE, op->op0);
- MR_DEPENDENCE_CLIQUE (*op0_p) = op->clique;
- MR_DEPENDENCE_BASE (*op0_p) = op->base;
- op0_p = &TREE_OPERAND (*op0_p, 0);
- break;
-
-   case VAR_DECL:
case PARM_DECL:
+   case CONST_DECL:
case RESULT_DECL:
+ /* ???  We shouldn't see these, but un-canonicalize what
+copy_reference_ops_from_ref does when visiting MEM_REF.  */
+   case VAR_DECL:
+ /* ???  And for this only have DECL_HARD_REGISTER.  */
+   case STRING_CST:
+ /* This can show up in ARRAY_REF bases.  */
+   case INTEGER_CST:
case SSA_NAME:
  *op0_p = op->op0;
  op0_p = NULL;
@@ -1234,13 +1250,12 @@ ao_ref_init_from_vn_reference (ao_ref *ref,
case VIEW_CONVERT_EXPR:
  break;
 
-   case STRING_CST:
-   case INTEGER_CST:
+   case POLY_INT_CST:
case COMPLEX_CST:
case VECTOR_CST:
case REAL_CST:
+   case FIXED_CST:
case CONSTRUCTOR:
-   case CONST_DECL:
  return false;
 
default:
-- 
2.35.3


[PATCH] tree-optimization/114921 - _Float16 -> __bf16 isn't noop fixup

2024-05-06 Thread Richard Biener
The following further strengthens the check which convert expressions
we allow to vectorize as simple copy by resorting to
tree_nop_conversion_p on the vector components.
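
[Editorial note: the underlying issue is that _Float16 and __bf16 share the same
16-bit mode but encode values differently, so the conversion is not a bit-for-bit
copy.  A sketch computing the two encodings of 1.0 from the IEEE parameters — pure
integer code, so no FP16 hardware or compiler support is required.]

```c
#include <assert.h>
#include <stdint.h>

/* Both formats are 16 bits wide: binary16 (_Float16) has 5 exponent bits
   (bias 15) and 10 mantissa bits; bfloat16 (__bf16) has 8 exponent bits
   (bias 127) and 7 mantissa bits.  The value 1.0 is encoded as just the
   biased exponent shifted past the mantissa.  */
static uint16_t half_one (void) { return (uint16_t) (15  << 10); }  /* 0x3C00 */
static uint16_t bf16_one (void) { return (uint16_t) (127 <<  7); }  /* 0x3F80 */
```

Same size, different bit pattern for the same value — which is why the mode-equality
check accepted a convert that tree_nop_conversion_p correctly rejects.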

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/114921
* tree-vect-stmts.cc (vectorizable_assignment): Use
tree_nop_conversion_p to identify converts we can vectorize
with a simple assignment.
---
 gcc/tree-vect-stmts.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7e571968a59..21e8fe98e44 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5957,15 +5957,15 @@ vectorizable_assignment (vec_info *vinfo,
 
   /* We can handle VIEW_CONVERT conversions that do not change the number
  of elements or the vector size or other conversions when the component
- mode keeps the same.  */
+ types are nop-convertible.  */
   if (!vectype_in
   || maybe_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits)
   || (code == VIEW_CONVERT_EXPR
  && maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
   GET_MODE_SIZE (TYPE_MODE (vectype_in
   || (CONVERT_EXPR_CODE_P (code)
- && (TYPE_MODE (TREE_TYPE (vectype))
- != TYPE_MODE (TREE_TYPE (vectype_in)
+ && !tree_nop_conversion_p (TREE_TYPE (vectype),
+TREE_TYPE (vectype_in
 return false;
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype) != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.35.3


Re: [PATCH] Driver: Reject output filenames with the same suffixes as source files [PR80182]

2024-05-06 Thread Richard Biener
On Mon, May 6, 2024 at 10:29 AM Peter0x44  wrote:
>
> On Mon May 6, 2024 at 8:14 AM BST, Richard Biener wrote:
> > On Sat, May 4, 2024 at 9:36 PM Peter Damianov  wrote:
> > >
> > > Currently, commands like:
> > > gcc -o file.c -lm
> > > will delete the user's code.
> >
> > Since there's an error from the linker in the end (missing 'main'), I 
> > wonder if
> > the linker can avoid truncating/opening the output file instead?  A trivial
> > solution might be to open a temporary file first and only atomically replace
> > the output file with the temporary file when there were no errors?
> I think this is a great idea! The only concern I have is that I think
> for mingw targets it would be necessary to be careful to append .exe if
> the file has no suffix when moving the temporary file to the output
> file. Maybe some other targets have similar concerns.
> >
> > > This patch checks the suffix of the output, and errors if the output ends 
> > > in
> > > any of the suffixes listed in default_compilers.
> > >
> > > Unfortunately, I couldn't come up with a better heuristic to diagnose 
> > > this case
> > > more specifically, so it is now not possible to directly make executables 
> > > with
> > > said suffixes. I am unsure if any users are depending on this.
> >
> > A way to provide a workaround would be to require the file not existing.  So
> > change the heuristic to only trigger if the output file exists (and is
> > non-empty?).
> I guess this could work, and has a lower chance of breaking anyone
> depending on this behavior, but I think it would still be confusing to
> anyone who did rely on this behavior, since then it wouldn't be allowed
> to overwrite an executable with the ".c" name. If anyone did rely on
> this behavior, their build would succeed once, and then error for every
> subsequent invokation, which would be confusing. It seems to me it is
> not a meaningful improvement.

That's true and the behavior would be confusing.

> With your previous suggestion, this whole heuristic becomes unnecessary
> anyway, so I think I will just forego it.

It of course wouldn't handle the case if there isn't a link error like

gcc -o file -lm -r

but it should still be an improvement.  And yes, I typoed a wrong -o myself
a few times ...

Richard.

> >
> > Richard.
> >
> > > PR driver/80182
> > > * gcc.cc (process_command): fatal_error if the output has the 
> > > suffix of
> > >   a source file.
> > > (have_c): Change type to bool.
> > > (have_O): Change type to bool.
> > > (have_E): Change type to bool.
> > > (have_S): New global variable.
> > > (driver_handle_option): Assign have_S
> > >
> > > Signed-off-by: Peter Damianov 
> > > ---
> > >  gcc/gcc.cc | 29 ++---
> > >  1 file changed, 26 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > > index 830a4700a87..53169c16460 100644
> > > --- a/gcc/gcc.cc
> > > +++ b/gcc/gcc.cc
> > > @@ -2127,13 +2127,16 @@ static vec<char *> at_file_argbuf;
> > >  static bool in_at_file = false;
> > >
> > >  /* Were the options -c, -S or -E passed.  */
> > > -static int have_c = 0;
> > > +static bool have_c = false;
> > >
> > >  /* Was the option -o passed.  */
> > > -static int have_o = 0;
> > > +static bool have_o = false;
> > >
> > >  /* Was the option -E passed.  */
> > > -static int have_E = 0;
> > > +static bool have_E = false;
> > > +
> > > +/* Was the option -S passed.  */
> > > +static bool have_S = false;
> > >
> > >  /* Pointer to output file name passed in with -o. */
> > >  static const char *output_file = 0;
> > > @@ -4593,6 +4596,10 @@ driver_handle_option (struct gcc_options *opts,
> > >have_E = true;
> > >break;
> > >
> > > +case OPT_S:
> > > +  have_S = true;
> > > +  break;
> > > +
> > >  case OPT_x:
> > >spec_lang = arg;
> > >if (!strcmp (spec_lang, "none"))
> > > @@ -5058,6 +5065,22 @@ process_command (unsigned int 
> > > decoded_options_count,
> > >output_file);
> > >  }
> > >
> > > +  /* Reject output file names that have the same suffix as a source
> > > + file. This is to catch mistakes like

Re: [PATCH] sra: Do not leave work for DSE (that it can sometimes not perform)

2024-05-06 Thread Richard Biener
On Fri, 3 May 2024, Martin Jambor wrote:

> Hi,
> 
> when looking again at the g++.dg/tree-ssa/pr109849.C testcase we
> discovered that it generates terrible store-to-load forwarding stalls
> because SRA was leaving behind aggregate loads but all the stores were
> by scalar parts and DSE failed to remove the useless load.  SRA has
> all the knowledge to remove the statement even now, so this small
> patch makes it do so.
> 
> With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9
> times faster (on an AMD EPYC 75F3 machine).
> 
> Bootstrapped and tested on x86_64.  OK for master?

OK.

> Given that the patch is simple but can sometimes have large benefit,
> could it possibly be backported to gcc-14 branch even if it is not a
> regression (at least not in the last decade) in a few weeks?

Sounds reasonable.  We have some more leeway for X.2 releases.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2024-04-18  Martin Jambor  
> 
>   * tree-sra.cc (sra_modify_assign): Remove the original statement
>   also when dealing with a store to a fully covered aggregate from a
>   non-candidate.
> 
> gcc/testsuite/ChangeLog:
> 
> 2024-04-23  Martin Jambor  
> 
>   * g++.dg/tree-ssa/pr109849.C: Also check that the aggeegate store
>   to cur disappears.
>   * gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE,
>   check that the unwanted stores were removed at early SRA time.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr109849.C   |  3 ++-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c |  6 +++---
>  gcc/tree-sra.cc| 14 --
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> index cd348c0f590..d06dbb10482 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-sra" } */
> +/* { dg-options "-O2 -fdump-tree-sra -fdump-tree-optimized" } */
>  
>  #include 
>  typedef unsigned int uint32_t;
> @@ -29,3 +29,4 @@ main()
>  }
>  
>  /* { dg-final { scan-tree-dump "Created a replacement for stack offset" 
> "sra"} } */
> +/* { dg-final { scan-tree-dump-not "cur = MEM" "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> index 43152de5616..1d01392c595 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-dse1-details -fno-short-enums 
> -fno-tree-fre" } */
> +/* { dg-options "-O2 -fdump-tree-esra -fno-short-enums -fno-tree-fre" } */
>  /* { dg-skip-if "we want a BIT_FIELD_REF from fold_truth_andor" { ! lp64 } } 
> */
>  /* { dg-skip-if "temporary variable names are not x and y" { 
> mmix-knuth-mmixware } } */
>  
> @@ -31,5 +31,5 @@ constraint_equal (struct constraint a, struct constraint b)
>  && constraint_expr_equal (a.rhs, b.rhs);
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 2 "dse1" } } 
> */
> -/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 2 "dse1" } } 
> */
> +/* { dg-final { scan-tree-dump-not "x = " "esra" } } */
> +/* { dg-final { scan-tree-dump-not "y = " "esra" } } */
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index 32fa28911f2..8040b0c5645 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -4854,8 +4854,18 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator 
> *gsi)
>But use the RHS aggregate to load from to expose more
>optimization opportunities.  */
> if (access_has_children_p (lacc))
> - generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
> -  0, 0, gsi, true, true, loc);
> + {
> +       generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
> +0, 0, gsi, true, true, loc);
> +   if (lacc->grp_covered)
> + {
> +   unlink_stmt_vdef (stmt);
> +   gsi_remove (& orig_gsi, true);
> +   release_defs (stmt);
> +   sra_stats.deleted++;
> +   return SRA_AM_REMOVED;
> + }
> + }
>   }
>  
>return SRA_AM_NONE;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end/114931 - type_hash_canon and structural equality types

2024-05-06 Thread Richard Biener
On Mon, 6 May 2024, Martin Uecker wrote:

> Am Montag, dem 06.05.2024 um 09:00 +0200 schrieb Richard Biener:
> > On Sat, 4 May 2024, Martin Uecker wrote:
> > 
> > > Am Freitag, dem 03.05.2024 um 21:16 +0200 schrieb Jakub Jelinek:
> > > > > On Fri, May 03, 2024 at 09:11:20PM +0200, Martin Uecker wrote:
> > > > > > > > > TYPE_CANONICAL as used by the middle-end cannot express this 
> > > > > > > > > but
> > > > > > > 
> > > > > > > Hm. so how does it work now for arrays?
> > > > > 
> > > > > Do you have a testcase which doesn't work correctly with the arrays?
> > > 
> > > I am mostly trying to understand better how this works. But
> > > if I am not mistaken, the following example would indeed
> > > indicate that we do incorrect aliasing decisions for types
> > > derived from arrays:
> > > 
> > > https://godbolt.org/z/rTsE3PhKc
> > 
> > This example is about pointer-to-array types, int (*)[2] and
> > int (*)[1] are supposed to be compatible as in receive the same alias
> > set. 
> 
> In C, char (*)[2] and char (*)[1] are not compatible. But with
> COMPAT set, the example operates^1 with char (*)[] and char (*)[1]
> which are compatible.  If we form equivalence classes, then
> all three types would need to be treated as equivalent. 
> 
> ^1 Actually, pointer to functions returning pointers
> to arrays. Probably this example can still be simplified...
> 
> >  This is ensured by get_alias_set POINTER_TYPE_P handling,
> > the alias set is supposed to be the same as that of int *.  It seems
> > we do restrict the handling a bit, the code does
> > 
> >   /* Unnest all pointers and references.
> >  We also want to make pointer to array/vector equivalent to 
> > pointer to
> >  its element (see the reasoning above). Skip all those types, too.  
> > */
> >   for (p = t; POINTER_TYPE_P (p)
> >|| (TREE_CODE (p) == ARRAY_TYPE
> >&& (!TYPE_NONALIASED_COMPONENT (p)
> >|| !COMPLETE_TYPE_P (p)
> >|| TYPE_STRUCTURAL_EQUALITY_P (p)))
> >|| TREE_CODE (p) == VECTOR_TYPE;
> >p = TREE_TYPE (p))
> > 
> > where the comment doesn't exactly match the code - but C should
> > never have TYPE_NONALIASED_COMPONENT (p).
> > 
> > But maybe I misread the example or it goes wrong elsewhere.
> 
> If I am not confusing myself too much, the example shows that
> aliasing analysis treats the the types as incompatible in
> both cases, because it does not reload *a with -O2. 
> 
> For char (*)[1] and char (*)[2] this would be correct (but an
> implementation exploiting this would need to do structural
> comparisons and not equivalence classes) but for 
> char (*)[2] and char (*)[] it is not.

Oh, these are function pointers, so it's about the alias set of
a pointer to FUNCTION_TYPE.  I don't see any particular code
trying to make char[] * (*)() and char[1] *(*)() inter-operate
for TBAA iff the FUNCTION_TYPEs themselves are not having the
same TYPE_CANONICAL.

Can you open a bugreport and please point to the relevant parts
of the C standard that tells how pointer-to FUNCTION_TYPE TBAA
is supposed to work?

Thanks,
Richard.

> Martin
> 
> 
> > 
> > Richard.
> > 
> > > Martin
> > > 
> > > > > 
> > > > > E.g. same_type_for_tbaa has
> > > > >   type1 = TYPE_MAIN_VARIANT (type1);
> > > > >   type2 = TYPE_MAIN_VARIANT (type2);
> > > > > 
> > > > >   /* Handle the most common case first.  */
> > > > >   if (type1 == type2)
> > > > > return 1;
> > > > > 
> > > > >   /* If we would have to do structural comparison bail out.  */
> > > > >   if (TYPE_STRUCTURAL_EQUALITY_P (type1)
> > > > >   || TYPE_STRUCTURAL_EQUALITY_P (type2))
> > > > > return -1;
> > > > > 
> > > > >   /* Compare the canonical types.  */
> > > > >   if (TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2))
> > > > > return 1;
> > > > > 
> > > > >   /* ??? Array types are not properly unified in all cases as we have
> > > > >  spurious changes in the index types for example.  Removing this
> > > > >  causes all sorts of problems with the Fortran frontend.  */
> > > > >   if (TREE_CODE (type1) == ARRAY_TYPE
> > > > >   && TREE_CODE (type2) == ARRAY_TYPE)
> > > > > return -1;
> > > > > ...
> > > > > and later compares alias sets and the like.
> > > > > So, even if int[] and int[0] have different TYPE_CANONICAL, they
> > > > > will be considered maybe the same.  Also, guess get_alias_set
> > > > > has some ARRAY_TYPE handling...
> > > > > 
> > > > > Anyway, I think we should just go with Richi's patch.
> > > > > 
> > > > >   Jakub
> > > > > 
> > > 
> > > 
> > > 
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] testsuite: c++: Skip g++.dg/analyzer on Solaris [PR111475]

2024-05-06 Thread Richard Biener
On Sun, 5 May 2024, Rainer Orth wrote:

> Rainer Orth  writes:
> 
> >> On Fri, May 03, 2024 at 09:31:08AM -0400, David Malcolm wrote:
> >>> Jakub, Richi, Rainer: this is a non-trivial change that cleans up
> >>> analyzer C++ testsuite results on Solaris, but has a slight risk of
> >>> affecting analyzer behavior on other targets.  As such, I was thinking
> >>> to hold off on backporting it to GCC 14 until after 14.1 is released.
> >>> Is that a good plan?
> >>
> >> Agreed 14.2 is better target than 14.1 for this, especially if committed
> >> shortly after 14.1 goes out.
> >
> > fully agreed: this is way too risky this close to the 14.1 release.  As
> > a stop-gap measure, one might consider just skipping the C++ analyzer
> > tests on Solaris to avoid the immense number of testsuite failures.
> 
> How about this?
> 
> Almost 1400 C++ analyzer tests FAIL on Solaris.  The patch is too risky
> to apply so close to the GCC 14.1.0 release, so disable the tests on
> Solaris instead to reduce the noise.
> 
> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
> x86_64-pc-linux-gnu.
> 
> Ok for gcc-14 branch?

OK.

Richard.


Re: [PATCH] Driver: Reject output filenames with the same suffixes as source files [PR80182]

2024-05-06 Thread Richard Biener
On Sat, May 4, 2024 at 9:36 PM Peter Damianov  wrote:
>
> Currently, commands like:
> gcc -o file.c -lm
> will delete the user's code.

Since there's an error from the linker in the end (missing 'main'), I wonder if
the linker can avoid truncating/opening the output file instead?  A trivial
solution might be to open a temporary file first and only atomically replace
the output file with the temporary file when there were no errors?

> This patch checks the suffix of the output, and errors if the output ends in
> any of the suffixes listed in default_compilers.
>
> Unfortunately, I couldn't come up with a better heuristic to diagnose this 
> case
> more specifically, so it is now not possible to directly make executables with
> said suffixes. I am unsure if any users are depending on this.

A way to provide a workaround would be to require the file not existing.  So
change the heuristic to only trigger if the output file exists (and is
non-empty?).

Richard.

> PR driver/80182
> * gcc.cc (process_command): fatal_error if the output has the suffix 
> of
>   a source file.
> (have_c): Change type to bool.
> (have_O): Change type to bool.
> (have_E): Change type to bool.
> (have_S): New global variable.
> (driver_handle_option): Assign have_S
>
> Signed-off-by: Peter Damianov 
> ---
>  gcc/gcc.cc | 29 ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..53169c16460 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -2127,13 +2127,16 @@ static vec at_file_argbuf;
>  static bool in_at_file = false;
>
>  /* Were the options -c, -S or -E passed.  */
> -static int have_c = 0;
> +static bool have_c = false;
>
>  /* Was the option -o passed.  */
> -static int have_o = 0;
> +static bool have_o = false;
>
>  /* Was the option -E passed.  */
> -static int have_E = 0;
> +static bool have_E = false;
> +
> +/* Was the option -S passed.  */
> +static bool have_S = false;
>
>  /* Pointer to output file name passed in with -o. */
>  static const char *output_file = 0;
> @@ -4593,6 +4596,10 @@ driver_handle_option (struct gcc_options *opts,
>have_E = true;
>break;
>
> +case OPT_S:
> +  have_S = true;
> +  break;
> +
>  case OPT_x:
>spec_lang = arg;
>if (!strcmp (spec_lang, "none"))
> @@ -5058,6 +5065,22 @@ process_command (unsigned int decoded_options_count,
>output_file);
>  }
>
> +  /* Reject output file names that have the same suffix as a source
> + file. This is to catch mistakes like: gcc -o file.c -lm
> + that could delete the user's code. */
> +  if (have_o && output_file != NULL && !have_E && !have_S)
> +{
> +  const char* filename = lbasename(output_file);
> +  const char* suffix = strchr(filename, '.');
> +  if (suffix != NULL)
> +   for (int i = 0; i < n_default_compilers; ++i)
> + if (!strcmp(suffix, default_compilers[i].suffix))
> +   fatal_error (input_location,
> +"output file suffix %qs could be a source file",
> +suffix);
> +}
> +
> +
>if (output_file != NULL && output_file[0] == '\0')
>  fatal_error (input_location, "output filename may not be empty");
>
> --
> 2.39.2
>


Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-06 Thread Richard Biener
On Sat, 4 May 2024, Sebastian Huber wrote:

> On 07.08.23 16:22, Qing Zhao via Gcc-patches wrote:
> > Hi,
> > 
> > This is the 2nd version of the patch.
> > Comparing to the 1st version, the only change is to address Richard's
> > comment on refering a warning option for diagnosing deprecated behavior.
> > 
> > 
> > Okay for committing?
> > 
> > thanks.
> > 
> > Qing
> > 
> > ==
> > 
> > *htdocs/gcc-14/changes.html (Caveats): Add notice about deprecating a C
> > extension about flexible array members.
> > ---
> >   htdocs/gcc-14/changes.html | 13 -
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > index dad1ba53..eae25f1a 100644
> > --- a/htdocs/gcc-14/changes.html
> > +++ b/htdocs/gcc-14/changes.html
> > @@ -30,7 +30,18 @@ a work-in-progress.
> >   
> >   Caveats
> >   
> > -  <li>...</li>
> > +  <li>C:
> > +  Support for the GCC extension, a structure containing a C99 flexible array
> > +  member, or a union containing such a structure, is not the last field of
> > +  another structure, is deprecated. Refer to
> > +  <a href="https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html">
> > +  Zero Length Arrays</a>.
> > +  Any code relying on this extension should be modified to ensure that
> > +  C99 flexible array members only end up at the ends of structures.
> > +  Please use the warning option
> > +  <a href="https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wflex-array-member-not-at-end">-Wflex-array-member-not-at-end</a> to
> > +  identify all such cases in the source code and modify them.
> > +  </li>
> >   
> 
> I have a question with respect to the static initialization of flexible array
> members. According to the documentation this is supported by GCC:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> 
> "GCC allows static initialization of flexible array members. This is
> equivalent to defining a new structure containing the original structure
> followed by an array of sufficient size to contain the data. E.g. in the
> following, f1 is constructed as if it were declared like f2.
> 
> struct f1 {
>   int x; int y[];
> } f1 = { 1, { 2, 3, 4 } };
> 
> struct f2 {
>   struct f1 f1; int data[3];
> } f2 = { { 1 }, { 2, 3, 4 } };
> "
> 
> However, when I compile this code, I get a warning like this:
> 
> flex-array.c:6:13: warning: structure containing a flexible array member is
> not at the end of another structure [-Wflex-array-member-not-at-end]
> 6 |   struct f1 f1; int data[3];
>   |
> 
> In general, I agree that flexible array members should be at the end, however
> the support for static initialization is quite important from my point of view
> especially for applications for embedded systems. Here, dynamic allocations
> may not be allowed or feasible.
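The documented behaviour is easy to check directly; this is the example from the manual, made observable with a couple of reads (a GCC extension, so the check is only meaningful on compilers implementing it):

```c
#include <assert.h>

/* Static initialization of a flexible array member: per the manual,
   f1 is constructed as if it were declared like f2, i.e. storage for
   the three initializer elements is allocated statically.  */
struct f1 {
  int x; int y[];
} f1 = { 1, { 2, 3, 4 } };

struct f2 {
  struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } };
```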

I do not get a diagnostic for this on trunk?  And I agree there shouldn't
be any.

Richard.


Re: [PATCH] middle-end/114931 - type_hash_canon and structural equality types

2024-05-06 Thread Richard Biener
On Sat, 4 May 2024, Martin Uecker wrote:

> Am Freitag, dem 03.05.2024 um 21:16 +0200 schrieb Jakub Jelinek:
> > > On Fri, May 03, 2024 at 09:11:20PM +0200, Martin Uecker wrote:
> > > > > > > TYPE_CANONICAL as used by the middle-end cannot express this but
> > > > > 
> > > > > Hm. so how does it work now for arrays?
> > > 
> > > Do you have a testcase which doesn't work correctly with the arrays?
> 
> I am mostly trying to understand better how this works. But
> if I am not mistaken, the following example would indeed
> indicate that we do incorrect aliasing decisions for types
> derived from arrays:
> 
> https://godbolt.org/z/rTsE3PhKc

This example is about pointer-to-array types, int (*)[2] and
int (*)[1] are supposed to be compatible as in receive the same alias
set.  This is ensured by get_alias_set POINTER_TYPE_P handling,
the alias set is supposed to be the same as that of int *.  It seems
we do restrict the handling a bit, the code does

  /* Unnest all pointers and references.
 We also want to make pointer to array/vector equivalent to 
pointer to
 its element (see the reasoning above). Skip all those types, too.  
*/
  for (p = t; POINTER_TYPE_P (p)
   || (TREE_CODE (p) == ARRAY_TYPE
   && (!TYPE_NONALIASED_COMPONENT (p)
   || !COMPLETE_TYPE_P (p)
   || TYPE_STRUCTURAL_EQUALITY_P (p)))
   || TREE_CODE (p) == VECTOR_TYPE;
   p = TREE_TYPE (p))

where the comment doesn't exactly match the code - but C should
never have TYPE_NONALIASED_COMPONENT (p).

But maybe I misread the example or it goes wrong elsewhere.

Richard.

> Martin
> 
> > > 
> > > E.g. same_type_for_tbaa has
> > >   type1 = TYPE_MAIN_VARIANT (type1);
> > >   type2 = TYPE_MAIN_VARIANT (type2);
> > > 
> > >   /* Handle the most common case first.  */
> > >   if (type1 == type2)
> > > return 1;
> > > 
> > >   /* If we would have to do structural comparison bail out.  */
> > >   if (TYPE_STRUCTURAL_EQUALITY_P (type1)
> > >   || TYPE_STRUCTURAL_EQUALITY_P (type2))
> > > return -1;
> > > 
> > >   /* Compare the canonical types.  */
> > >   if (TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2))
> > > return 1;
> > > 
> > >   /* ??? Array types are not properly unified in all cases as we have
> > >  spurious changes in the index types for example.  Removing this
> > >  causes all sorts of problems with the Fortran frontend.  */
> > >   if (TREE_CODE (type1) == ARRAY_TYPE
> > >   && TREE_CODE (type2) == ARRAY_TYPE)
> > > return -1;
> > > ...
> > > and later compares alias sets and the like.
> > > So, even if int[] and int[0] have different TYPE_CANONICAL, they
> > > will be considered maybe the same.  Also, guess get_alias_set
> > > has some ARRAY_TYPE handling...
> > > 
> > > Anyway, I think we should just go with Richi's patch.
> > > 
> > >   Jakub
> > > 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end/114931 - type_hash_canon and structural equality types

2024-05-03 Thread Richard Biener



> Am 03.05.2024 um 20:37 schrieb Martin Uecker :
> 
> Am Freitag, dem 03.05.2024 um 20:18 +0200 schrieb Jakub Jelinek:
>>> On Fri, May 03, 2024 at 08:04:18PM +0200, Martin Uecker wrote:
>>> A change that is not optimal but would avoid a lot of trouble is to
>>> only use the tag of the struct for computing a TYPE_CANONICAL, which
>>> could then be set already for incomplete types and never needs to
>>> change again. We would not differentiate between different struct
>>> types with the same tag for aliasing analysis, but in most cases
>>> I would expect different structs to have a different tag.
>> 
>> Having incompatible types have the same TYPE_CANONICAL would lead to wrong
>> code IMHO, while for aliasing purposes that might be conservative (though
>> not sure, the alias set computation is based on what types the element have
>> etc., so if the alias set is computed for say struct S { int s; }; and
>> then the same alias set used for struct S { long long a; double b; union {
>> short c; float d; } c; };, I think nothing good will come out of that),
> 
> The C type system requires us to form equivalence classes though.
> For example
> 
> int (*r)[1];
> int (*q)[];
> int (*p)[3];
> 
> need to be in the same equivalence class even though r and p are
> not compatible, while at the same time r and q and q and p
> are compatible.
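The non-transitivity is directly observable with GCC's `__builtin_types_compatible_p` (a small check, assuming a compiler that provides this builtin):

```c
#include <assert.h>

/* C compatibility of pointer-to-array types is not transitive:
   int (*)[1] and int (*)[3] are each compatible with int (*)[]
   (an unknown array size matches any size), but not with each
   other -- so any equivalence-class scheme must lump all three
   together.  */
static int
check_compat (void)
{
  int r_q = __builtin_types_compatible_p (int (*)[1], int (*)[]);
  int q_p = __builtin_types_compatible_p (int (*)[],  int (*)[3]);
  int r_p = __builtin_types_compatible_p (int (*)[1], int (*)[3]);
  return r_q == 1 && q_p == 1 && r_p == 0;
}
```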

TYPE_CANONICAL as used by the middle-end cannot express this but 
useless_type_conversion_p is directed and has similar behavior.  Note the 
dual-use for TBAA and compatibility was convenient but maybe we have to 
separate both since making the equivalence class for TBAA larger is more 
conservative while for compatibility it’s the other way around…

Richard 

> 
>> but middle-end also uses TYPE_CANONICAL to see if types are the same,
>> say e.g. useless_type_conversion_p says that conversions from one
>> RECORD_TYPE to a different RECORD_TYPE are useless if they have the
>> same TYPE_CANONICAL.
>>  /* For aggregates we rely on TYPE_CANONICAL exclusively and require
>> explicit conversions for types involving to be structurally
>> compared types.  */
>>  else if (AGGREGATE_TYPE_P (inner_type)
>>   && TREE_CODE (inner_type) == TREE_CODE (outer_type))
>>return TYPE_CANONICAL (inner_type)
>>   && TYPE_CANONICAL (inner_type) == TYPE_CANONICAL (outer_type);
>> So, if you have struct S { int s; } and struct S { short a, b; }; and
>> VIEW_CONVERT_EXPR between them, that VIEW_CONVERT_EXPR will be removed
>> as useless, etc.
> 
> Maybe we could limit this for purposes of computing TYPE_CANONICAL of derived
> types, e.g. TYPE_CANONICAL of structs stays the same with the transition
> from TYPE_STRUCTURAL_EQUALITY to TYPE_CANONICAL but all the derived types
> remain stable.
> 
> Martin
> 
>> 
>> BTW, the idea of lazily updating TYPE_CANONICAL is basically what I've
>> described as the option to update all the derived types where it would
>> pretty much do that for all TYPE_STRUCTURAL_EQUALITY_P types in the
>> hash table (see if they are derived from the type in question and recompute
>> the TYPE_CANONICAL after recomputing all the TYPE_CANONICAL of its base
>> types), except perhaps even more costly (if the trigger would be some
>> build_array_type/build_function_type/... function is called and found
>> a cached TYPE_STRUCTURAL_EQUALITY_P type).  Note also that
>> TYPE_STRUCTURAL_EQUALITY_P isn't the case just for the C23 types which
>> are marked that way when incomplete and later completed, but by various
>> other cases for types which will be permanently like that, so doing
>> expensive checks each time some build*_type* is called that refers
>> to those would be expensive.
>> 
>>Jakub
>> 
> 


Re: [PATCH] middle-end/114931 - type_hash_canon and structural equality types

2024-05-03 Thread Richard Biener



> Am 03.05.2024 um 17:33 schrieb Martin Uecker :
> 
> Am Freitag, dem 03.05.2024 um 14:13 +0200 schrieb Richard Biener:
>> TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have
>> to make sure to include that into the type unification done via
>> type_hash_canon.  This requires the flag to be set before querying
>> the hash which is the biggest part of the patch.
> 
> I assume this does not affect structs / unions because they
> do not make this mechanism of type unification (each tagged type
> is a unique type), but only derived types that end up having
> TYPE_STRUCTURAL_EQUALITY_P because they are constructed from
> incomplete structs / unions before TYPE_CANONICAL is set.
> 
> I do not yet understand why this change is needed. Type
> identity should not be affected by setting TYPE_CANONICAL, so
> why do we need to keep such types separate?  I understand that we
> created some inconsistencies, but I do not see why this change
> is needed to fix it.  But I also haven't understood how we ended
> up with a TYPE_CANONICAL having TYPE_STRUCTURAL_EQUALITY_P in
> PR 114931 ...

Because we created the canonical function type before, at a point where one of
its arguments had TYPE_STRUCTURAL_EQUALITY_P, which makes the function type so.

Richard 

> 
> Martin
> 
> 
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.
>> 
>> As said in the PR this merely makes sure to keep individual types
>> consistent with themselves.  We still will have a set of types
>> with TYPE_STRUCTURAL_EQUALITY_P and a set without that might be
>> otherwise identical.  That could be only avoided with changes in
>> the frontend.
>> 
>> OK for trunk?
>> 
>> Thanks,
>> Richard.
>> 
>>PR middle-end/114931
>> gcc/
>>* tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P.
>>(type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P.
>>(build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before
>>probing with type_hash_canon.
>>(build_function_type): Likewise.
>>(build_method_type_directly): Likewise.
>>(build_offset_type): Likewise.
>>(build_complex_type): Likewise.
>>* attribs.cc (build_type_attribute_qual_variant): Likewise.
>> 
>> gcc/c-family/
>>* c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P
>>before probing with type_hash_canon.
>> 
>> gcc/testsuite/
>>* gcc.dg/pr114931.c: New testcase.
>> ---
>> gcc/attribs.cc  | 20 +-
>> gcc/c-family/c-common.cc| 11 --
>> gcc/testsuite/gcc.dg/pr114931.c | 10 +
>> gcc/tree.cc | 65 +++--
>> 4 files changed, 74 insertions(+), 32 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr114931.c
>> 
>> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
>> index 12ffc5f170a..3ab0b0fd87a 100644
>> --- a/gcc/attribs.cc
>> +++ b/gcc/attribs.cc
>> @@ -1336,6 +1336,16 @@ build_type_attribute_qual_variant (tree otype, tree 
>> attribute, int quals)
>>   tree dtype = ntype = build_distinct_type_copy (ttype);
>> 
>>   TYPE_ATTRIBUTES (ntype) = attribute;
>> +  /* If the target-dependent attributes make NTYPE different from
>> + its canonical type, we will need to use structural equality
>> + checks for this type.
>> +
>> + We shouldn't get here for stripping attributes from a type;
>> + the no-attribute type might not need structural comparison.  But
>> + we can if was discarded from type_hash_table.  */
>> +  if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
>> +  || !comp_type_attributes (ntype, ttype))
>> +SET_TYPE_STRUCTURAL_EQUALITY (ntype);
>> 
>>   hashval_t hash = type_hash_canon_hash (ntype);
>>   ntype = type_hash_canon (hash, ntype);
>> @@ -1343,16 +1353,6 @@ build_type_attribute_qual_variant (tree otype, tree 
>> attribute, int quals)
>>   if (ntype != dtype)
>>/* This variant was already in the hash table, don't mess with
>>   TYPE_CANONICAL.  */;
>> -  else if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
>> -   || !comp_type_attributes (ntype, ttype))
>> -/* If the target-dependent attributes make NTYPE different from
>> -   its canonical type, we will need to use structural equality
>> -   checks for this type.
>> -
>> -   We shouldn't get here for stripping attributes from a type;
>> -   the no-attribute type might not need structural comparison.  But
>> -   we can if was discarded from type_hash_table.  */

[PATCH] middle-end/114931 - type_hash_canon and structural equality types

2024-05-03 Thread Richard Biener
TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have
to make sure to include that into the type unification done via
type_hash_canon.  This requires the flag to be set before querying
the hash which is the biggest part of the patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.

As said in the PR this merely makes sure to keep individual types
consistent with themselves.  We still will have a set of types
with TYPE_STRUCTURAL_EQUALITY_P and a set without that might be
otherwise identical.  That could be only avoided with changes in
the frontend.
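Outside of GCC's actual data structures, the invariant the patch restores can be sketched generically: every bit that is part of a node's identity must feed both the hash function and the equality predicate of the unification cache (a hypothetical miniature, not GCC code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A "type" here is just a code plus a structural-equality flag.  */
struct ty { int code; bool structural; };

#define CACHE_SIZE 64
static struct ty *cache[CACHE_SIZE];
static struct ty storage[CACHE_SIZE];
static size_t n_stored;

static unsigned
ty_hash (const struct ty *t)
{
  /* The flag participates in the hash (cf. type_hash_canon_hash).  */
  return (unsigned) t->code * 31u + (t->structural ? 1u : 0u);
}

static bool
ty_equal (const struct ty *a, const struct ty *b)
{
  /* ... and in equality (cf. type_cache_hasher::equal).  */
  return a->code == b->code && a->structural == b->structural;
}

/* Return the canonical node for T, interning it on first sight.
   Open addressing with linear probing, for brevity.  */
static struct ty *
ty_canon (struct ty t)
{
  unsigned h = ty_hash (&t) % CACHE_SIZE;
  for (unsigned i = 0; i < CACHE_SIZE; i++)
    {
      unsigned slot = (h + i) % CACHE_SIZE;
      if (cache[slot] == NULL)
        {
          storage[n_stored] = t;
          cache[slot] = &storage[n_stored++];
          return cache[slot];
        }
      if (ty_equal (cache[slot], &t))
        return cache[slot];
    }
  return NULL;  /* Cache full; unhandled in this sketch.  */
}
```

If the flag were dropped from either `ty_hash` or `ty_equal`, two types differing only in that bit could be unified to one node, which is exactly the inconsistency the patch guards against.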

OK for trunk?

Thanks,
Richard.

PR middle-end/114931
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P.
(type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P.
(build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before
probing with type_hash_canon.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.
(build_complex_type): Likewise.
* attribs.cc (build_type_attribute_qual_variant): Likewise.

gcc/c-family/
* c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P
before probing with type_hash_canon.

gcc/testsuite/
* gcc.dg/pr114931.c: New testcase.
---
 gcc/attribs.cc  | 20 +-
 gcc/c-family/c-common.cc| 11 --
 gcc/testsuite/gcc.dg/pr114931.c | 10 +
 gcc/tree.cc | 65 +++--
 4 files changed, 74 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr114931.c

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 12ffc5f170a..3ab0b0fd87a 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1336,6 +1336,16 @@ build_type_attribute_qual_variant (tree otype, tree 
attribute, int quals)
   tree dtype = ntype = build_distinct_type_copy (ttype);
 
   TYPE_ATTRIBUTES (ntype) = attribute;
+  /* If the target-dependent attributes make NTYPE different from
+its canonical type, we will need to use structural equality
+checks for this type.
+
+We shouldn't get here for stripping attributes from a type;
+the no-attribute type might not need structural comparison.  But
+we can if was discarded from type_hash_table.  */
+  if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
+ || !comp_type_attributes (ntype, ttype))
+   SET_TYPE_STRUCTURAL_EQUALITY (ntype);
 
   hashval_t hash = type_hash_canon_hash (ntype);
   ntype = type_hash_canon (hash, ntype);
@@ -1343,16 +1353,6 @@ build_type_attribute_qual_variant (tree otype, tree 
attribute, int quals)
   if (ntype != dtype)
/* This variant was already in the hash table, don't mess with
   TYPE_CANONICAL.  */;
-  else if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
-  || !comp_type_attributes (ntype, ttype))
-   /* If the target-dependent attributes make NTYPE different from
-  its canonical type, we will need to use structural equality
-  checks for this type.
-
-  We shouldn't get here for stripping attributes from a type;
-  the no-attribute type might not need structural comparison.  But
-  we can if was discarded from type_hash_table.  */
-   SET_TYPE_STRUCTURAL_EQUALITY (ntype);
   else if (TYPE_CANONICAL (ntype) == ntype)
TYPE_CANONICAL (ntype) = TYPE_CANONICAL (ttype);
 
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 01e3d247fc2..032dcb4b41d 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -7115,6 +7115,13 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   TYPE_TYPELESS_STORAGE (main_type) = TYPE_TYPELESS_STORAGE (type);
   layout_type (main_type);
 
+  /* Set TYPE_STRUCTURAL_EQUALITY_P early.  */
+  if (TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (main_type))
+  || TYPE_STRUCTURAL_EQUALITY_P (TYPE_DOMAIN (main_type)))
+SET_TYPE_STRUCTURAL_EQUALITY (main_type);
+  else
+TYPE_CANONICAL (main_type) = main_type;
+
   /* Make sure we have the canonical MAIN_TYPE. */
   hashval_t hashcode = type_hash_canon_hash (main_type);
   main_type = type_hash_canon (hashcode, main_type);
@@ -7122,7 +7129,7 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   /* Fix the canonical type.  */
   if (TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (main_type))
   || TYPE_STRUCTURAL_EQUALITY_P (TYPE_DOMAIN (main_type)))
-SET_TYPE_STRUCTURAL_EQUALITY (main_type);
+gcc_assert (TYPE_STRUCTURAL_EQUALITY_P (main_type));
   else if (TYPE_CANONICAL (TREE_TYPE (main_type)) != TREE_TYPE (main_type)
   || (TYPE_CANONICAL (TYPE_DOMAIN (main_type))
   != TYPE_DOMAIN (main_type)))
@@ -7130,8 +7137,6 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   = build_array_type (TYPE_CANONICAL (TREE_TYPE (main_type)),
 

Re: [PATCH 3/3] Add parentheses around DECL_INIT for .original [PR23872]

2024-05-03 Thread Richard Biener
On Thu, May 2, 2024 at 11:40 PM Andrew Pinski  wrote:
>
> When we have:
> `void f (int y, int z) { int x = ( z++,y); }`
>
> This would have printed the decl's initializer without
> parentheses, which can cause confusion if you think it is defining
> another variable rather than being part of the compound expression.
>
> This adds parentheses around DECL_INIT if it is a COMPOUND_EXPR.

Looking at it, it seems we'd hit a similar issue for

 foo ((z++,y), 2);

thus in CALL_EXPR context.  Also

int k;
void foo (int i, int j)
{
  k = (i, 2) + j;
}

dumps as

{
  k = i, j + 2;;
}

(ok that's folded to (i, j + 2) but still).

So shouldn't we bite the bullet and wrap all COMPOUND_EXPRs in
parens instead?  Possibly "tail-calling" the case of
a, b, c in COMPOUND_EXPR dumping itself?

Thanks,
Richard.
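The confusion is easy to demonstrate: without parentheses the dumped text parses differently from the tree it is meant to show (a small check of C precedence, not of the dumper itself):

```c
#include <assert.h>

/* `k = (i, 2) + j;` assigns 2 + j, but the unparenthesized rendering
   `k = i, 2 + j;` parses as `(k = i), (2 + j);` because assignment
   binds tighter than the comma operator -- so k ends up as i.  */
static int
with_parens (int i, int j)
{
  int k;
  k = (i, 2) + j;
  return k;           /* 2 + j */
}

static int
as_dumped (int i, int j)
{
  int k;
  k = i, 2 + j;       /* (k = i), then discard (2 + j) */
  return k;           /* i */
}
```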

> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * tree-pretty-print.cc (print_declaration): Add parentheses
> around DECL_INIT if it was a COMPOUND_EXPR.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-pretty-print.cc | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 825ba74443b..8b766dcd2b8 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -4240,7 +4240,14 @@ print_declaration (pretty_printer *pp, tree t, int 
> spc, dump_flags_t flags, bool
>   pp_equal (pp);
>   pp_space (pp);
>   if (!(flags & TDF_SLIM))
> -   dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
> +   {
> + bool need_paren = TREE_CODE (DECL_INITIAL (t)) == COMPOUND_EXPR;
> + if (need_paren)
> +   pp_left_paren (pp);
> + dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
> + if (need_paren)
> +   pp_right_paren (pp);
> +   }
>   else
> pp_string (pp, "<<< omitted >>>");
> }
> --
> 2.43.0
>

