from:"Andrew Pinski"

Re: [PATCH] MATCH: Simplify `a rrotate (32-b) -> a lrotate b` [PR109906]

2024-09-17 Thread Andrew Pinski

On Tue, Sep 17, 2024 at 11:23 PM Eikansh Gupta
 wrote:
>
> The pattern `a rrotate (32-b)` should be optimized to `a lrotate b`.
> The same is also true for `a lrotate (32-b)`. It can be optimized to
> `a rrotate b`.
>
> This patch adds the following patterns:
> a rrotate (32-b) -> a lrotate b
> a lrotate (32-b) -> a rrotate b
>
> PR tree-optimization/109906
>
> gcc/ChangeLog:
>
> * match.pd (a rrotate (32-b) -> a lrotate b): New pattern
> (a lrotate (32-b) -> a rrotate b): New pattern
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr109906.c: New test.
> ---
>  gcc/match.pd | 10 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr109906.c | 40 
>  2 files changed, 50 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5566c0e4c41..77cb3f8060a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4759,6 +4759,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> build_int_cst (TREE_TYPE (@1),
>element_precision (type)), @1); }))
>
> +/* a rrotate (32-b) -> a lrotate b */
> +/* a lrotate (32-b) -> a rrotate b */
> +(for rotate (lrotate rrotate)
> + (simplify
> +  (rotate @0 (minus INTEGER_CST@1 @2))
> +   (if (TYPE_PRECISION (TREE_TYPE (@0)) == wi::to_wide (@1))
> +(if (rotate == RROTATE_EXPR)
> + (lrotate @0 @2)
> +  (rrotate @0 @2)))))

I just noticed this can be simplified to:
```
(for rotate  (lrotate rrotate)
 orotate (rrotate lrotate)
 (simplify
  (rotate @0 (minus INTEGER_CST@1 @2))
   (if (TYPE_PRECISION (TREE_TYPE (@0)) == wi::to_wide (@1))
    (orotate @0 @2))))
```

Sorry for not noticing that in the internal review.

Thanks,
Andrew Pinski
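
For reference, the identity behind both patterns can be checked outside the compiler. This is only a sketch and not part of the patch: `rotl32`, `rotr32`, `rotate_identity_holds` and `check_rotate_identity` are hypothetical names, a 32-bit operand is assumed, and the rotate count is masked so that no shift is ever by 32 or more.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not from the patch: rotate a 32-bit value left.
// The count is masked to 0..31 so the shifts below stay in range.
static std::uint32_t rotl32 (std::uint32_t x, unsigned n)
{
  n &= 31;
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

// Hypothetical helper, not from the patch: rotate a 32-bit value right.
static std::uint32_t rotr32 (std::uint32_t x, unsigned n)
{
  n &= 31;
  return n == 0 ? x : (x >> n) | (x << (32 - n));
}

// Return true if, for every b in 1..31, rotating by (32 - b) in one
// direction gives the same result as rotating by b in the other.
bool rotate_identity_holds (std::uint32_t x)
{
  for (unsigned b = 1; b < 32; b++)
    if (rotr32 (x, 32 - b) != rotl32 (x, b)
        || rotl32 (x, 32 - b) != rotr32 (x, b))
      return false;
  return true;
}

// Small usage check over two sample values.
void check_rotate_identity ()
{
  assert (rotate_identity_holds (0x12345678u));
  assert (rotate_identity_holds (0xffffffffu));
}
```

Because the identity is symmetric in the two directions, one pattern with a paired iterator (`rotate`/`orotate`) is enough to cover both cases.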

> +
>  /* Turn (a OP c1) OP c2 into a OP (c1+c2).  */
>  (for op (lrotate rrotate rshift lshift)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
> new file mode 100644
> index 000..fe576d7ce3a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109906.c
> @@ -0,0 +1,40 @@
> +/* PR tree-optimization/109906 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +
> +/* Implementation of rotate right operation */
> +static inline
> +unsigned rrotate(unsigned x, int t)
> +{
> +  if (t >= 32) __builtin_unreachable();
> +  unsigned tl = x >> (t);
> +  unsigned th = x << (32-t);
> +  return tl | th;
> +}
> +
> +/* Here rotate left is achieved by doing rotate right by (32 - x) */
> +unsigned rotateleft(unsigned t, int x)
> +{
> +  return rrotate (t, 32-x);
> +}
> +
> +/* Implementation of rotate left operation */
> +static inline
> +unsigned lrotate(unsigned x, int t)
> +{
> +  if (t >= 32) __builtin_unreachable();
> +  unsigned tl = x << (t);
> +  unsigned th = x >> (32-t);
> +  return tl | th;
> +}
> +
> +/* Here rotate right is achieved by doing rotate left by (32 - x) */
> +unsigned rotateright(unsigned t, int x)
> +{
> +  return lrotate (t, 32-x);
> +}
> +
> +/* Shouldn't have instruction for (32 - x). */
> +/* { dg-final { scan-tree-dump-not "minus_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump "rrotate_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump "lrotate_expr" "optimized" } } */
> --
> 2.17.1
>

[PATCH 1/2] phiopt: Add some details dump to cselim

2024-09-17 Thread Andrew Pinski

While trying to debug PR 116747, I noticed there was no dump
saying what was done. So this adds the debug dump and it helps
debug what is going on in PR 116747 too.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Add debug dump.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 7b12692237e..488b45015e9 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3469,6 +3469,17 @@ cond_if_else_store_replacement_1 (basic_block then_bb, 
basic_block else_bb,
   then_locus = gimple_location (then_assign);
   else_locus = gimple_location (else_assign);
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf(dump_file, "factoring out stores:\n\tthen:\n");
+  print_gimple_stmt (dump_file, then_assign, 0,
+TDF_VOPS|TDF_MEMSYMS);
+  fprintf(dump_file, "\telse:\n");
+  print_gimple_stmt (dump_file, else_assign, 0,
+TDF_VOPS|TDF_MEMSYMS);
+  fprintf (dump_file, "\n");
+}
+
   /* Now we've checked the constraints, so do the transformation:
  1) Remove the stores.  */
   gsi = gsi_for_stmt (then_assign);
@@ -3490,6 +3501,16 @@ cond_if_else_store_replacement_1 (basic_block then_bb, 
basic_block else_bb,
   add_phi_arg (newphi, else_rhs, EDGE_SUCC (else_bb, 0), else_locus);
 
   new_stmt = gimple_build_assign (lhs, gimple_phi_result (newphi));
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf(dump_file, "to use phi:\n");
+  print_gimple_stmt (dump_file, newphi, 0,
+TDF_VOPS|TDF_MEMSYMS);
+  fprintf(dump_file, "\n");
+  print_gimple_stmt (dump_file, new_stmt, 0,
+TDF_VOPS|TDF_MEMSYMS);
+  fprintf(dump_file, "\n\n");
+}
 
   /* 3) Insert that PHI node.  */
   gsi = gsi_after_labels (join_bb);
-- 
2.43.0

[PATCH 2/2] phiopt: C++ify cond_if_else_store_replacement

2024-09-17 Thread Andrew Pinski

This C++ifies cond_if_else_store_replacement by using range fors
and by using a vec of std::pair instead of 2 vecs.
I had a hard time understanding the code when there were 2 vecs,
so having a vec of pairs makes it easier to understand the relationship
between the 2.
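
The shape of the change can be sketched outside GCC. This is an illustration only: it uses std::vector and std::string as stand-ins for GCC's auto_vec and gimple statements, and `store`, `describe_pairs` and `check_describe_pairs` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the statements the pass works on.
using store = std::string;

// Walk one vector of (then, else) pairs with a range-based for.
// Each iteration sees both halves together, so there is no index to
// keep in sync between two separate vectors.
std::vector<std::string>
describe_pairs (const std::vector<std::pair<store, store>> &stores_pairs)
{
  std::vector<std::string> out;
  for (const auto &store_pair : stores_pairs)
    out.push_back (store_pair.first + " / " + store_pair.second);
  return out;
}

// Small usage check: push two pairs, then read them back in order.
void check_describe_pairs ()
{
  std::vector<std::pair<store, store>> stores_pairs;
  stores_pairs.push_back (std::make_pair (store ("then0"), store ("else0")));
  stores_pairs.push_back (std::make_pair (store ("then1"), store ("else1")));
  std::vector<std::string> out = describe_pairs (stores_pairs);
  assert (out.size () == 2);
  assert (out[0] == "then0 / else0");
  assert (out[1] == "then1 / else1");
}
```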

gcc/ChangeLog:

* tree-ssa-phiopt.cc (cond_if_else_store_replacement): Use
range fors and use one vec for then/else stores instead of 2.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 488b45015e9..d43832b390b 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3587,9 +3587,6 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
   vec<ddr_p> then_ddrs, else_ddrs;
   gimple *then_store, *else_store;
   bool found, ok = false, res;
-  struct data_dependence_relation *ddr;
-  data_reference_p then_dr, else_dr;
-  int i, j;
   tree then_lhs, else_lhs;
   basic_block blocks[3];
 
@@ -3640,8 +3637,8 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
 }
 
   /* Find pairs of stores with equal LHS.  */
-  auto_vec<gimple *> then_stores, else_stores;
-  FOR_EACH_VEC_ELT (then_datarefs, i, then_dr)
+  auto_vec<std::pair<gimple *, gimple *>, 1> stores_pairs;
+  for (auto then_dr : then_datarefs)
 {
   if (DR_IS_READ (then_dr))
 continue;
@@ -3652,7 +3649,7 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
continue;
   found = false;
 
-  FOR_EACH_VEC_ELT (else_datarefs, j, else_dr)
+  for (auto else_dr : else_datarefs)
 {
   if (DR_IS_READ (else_dr))
 continue;
@@ -3672,13 +3669,12 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
   if (!found)
 continue;
 
-  then_stores.safe_push (then_store);
-  else_stores.safe_push (else_store);
+  stores_pairs.safe_push (std::make_pair (then_store, else_store));
 }
 
   /* No pairs of stores found.  */
-  if (!then_stores.length ()
-  || then_stores.length () > (unsigned) param_max_stores_to_sink)
+  if (!stores_pairs.length ()
+  || stores_pairs.length () > (unsigned) param_max_stores_to_sink)
 {
   free_data_refs (then_datarefs);
   free_data_refs (else_datarefs);
@@ -3706,7 +3702,7 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
 
   /* Check that there are no read-after-write or write-after-write dependencies
  in THEN_BB.  */
-  FOR_EACH_VEC_ELT (then_ddrs, i, ddr)
+  for (auto ddr : then_ddrs)
 {
   struct data_reference *dra = DDR_A (ddr);
   struct data_reference *drb = DDR_B (ddr);
@@ -3728,7 +3724,7 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
 
   /* Check that there are no read-after-write or write-after-write dependencies
  in ELSE_BB.  */
-  FOR_EACH_VEC_ELT (else_ddrs, i, ddr)
+  for (auto ddr : else_ddrs)
 {
   struct data_reference *dra = DDR_A (ddr);
   struct data_reference *drb = DDR_B (ddr);
@@ -3749,9 +3745,10 @@ cond_if_else_store_replacement (basic_block then_bb, 
basic_block else_bb,
 }
 
   /* Sink stores with same LHS.  */
-  FOR_EACH_VEC_ELT (then_stores, i, then_store)
+  for (auto &store_pair : stores_pairs)
 {
-  else_store = else_stores[i];
+  then_store = store_pair.first;
+  else_store = store_pair.second;
   res = cond_if_else_store_replacement_1 (then_bb, else_bb, join_bb,
   then_store, else_store);
   ok = ok || res;
-- 
2.43.0

Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

2024-09-17 Thread Andrew Pinski

On Tue, Sep 17, 2024 at 3:40 AM Dora, Sunil Kumar
 wrote:
>
> Hi Andrew,
>
> Initially, I thought to address long command line options (when exceeding 
> 128KB) without disrupting the existing GCC driver behavior.
>
> As you suggested, I implemented changes to use the response file format 
> (@file) within the set_collect_gcc_options function and ensured that this was 
> passed through COLLECT_GCC_OPTIONS.
> However, these changes have introduced a side effect: they impact the 
> behavior of the -save-temps switch by generating additional .args.N files. As 
> a result, some existing test cases, including the one reported by the Linaro 
> team, are now failing.
> (File: Attached)
> Could you please advise on how we should proceed? Specifically, should we 
> adjust the test cases to accommodate the impact on the -save-temps switch, or 
> is there an alternative approach you would recommend? Your guidance on how to 
> address these issues while implementing the response file approach would be 
> greatly appreciated.

Sounds like the testcase needs to be changed. If you were not keeping
around the file that was used for COLLECT_GCC_OPTIONS in the previous
patches (with -save-temps), then that was broken.
Or is the issue that the name of the file is not based on the aux dump file
but on something else?  That is, which file is kept
around for -save-temps?

Thanks,
Andrew

> Thank you for your support.
>
>
>
> Thanks,
> Sunil Dora
> 
> From: Andrew Pinski 
> Sent: Friday, September 6, 2024 11:33 PM
> To: Dora, Sunil Kumar 
> Cc: Hemraj, Deepthi ; GCC Patches 
> ; Richard Guenther ; Jeff Law 
> ; josmy...@redhat.com ; MacLeod, 
> Randy ; Gowda, Naveen 
> 
> Subject: Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option
>
>
> On Fri, Sep 6, 2024, 9:38 AM Dora, Sunil Kumar 
>  wrote:
>
> Hi Andrew,
>
> Thank you for your feedback. Initially, we attempted to address the issue by 
> utilizing GCC’s response files. However, we discovered that the 
> COLLECT_GCC_OPTIONS variable already contains the expanded contents of the 
> response files.
>
> As a result, using response files only mitigates the multiplication factor 
> but does not bypass the 128KB limit.
>
>
> I think you misunderstood me. What I was saying was: instead of creating
> a string inside set_collect_gcc_options, create the response file and pass
> that via COLLECT_GCC_OPTIONS with the @file format. Then, inside
> collect2.cc, when using COLLECT_GCC_OPTIONS/extract_string, read in the
> response file options if there was an @file, instead of running those 2
> loops. This requires more than what you did. It should also be less memory
> hungry and maybe slightly faster.
>
> Thanks,
> Andrew
>
>
>
> I have included the response file usage logs and the complete history in the 
> Bugzilla report for your reference: Bugzilla Link.
> Following your suggestion, I have updated the logic to avoid hardcoding /tmp.
> Please find the revised version of patch at the following link:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662519.html
>
> Thanks,
> Sunil Dora
> 
> From: Andrew Pinski 
> Sent: Friday, August 30, 2024 8:05 PM
> To: Hemraj, Deepthi 
> Cc: gcc-patches@gcc.gnu.org ; rguent...@suse.de 
> ; jeffreya...@gmail.com ; 
> josmy...@redhat.com ; MacLeod, Randy 
> ; Gowda, Naveen ; 
> Dora, Sunil Kumar 
> Subject: Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option
>
>
> On Fri, Aug 30, 2024 at 12:34 AM  wrote:
> >
> > From: Deepthi Hemraj 
> >
> > For excessively long environment variables i.e >128KB
> > Store the arguments in a temporary file and collect them back together in 
> > collect2.
> >
> > This commit patches for COLLECT_GCC_OPTIONS issue:
> > GCC should not limit the length of command line passed to collect2.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
> >
> > The Linux kernel has the following limits on shell commands:
> > I.  Total number of bytes used to specify arguments must be under 128KB.
> > II. Each environment variable passed to an executable must be under 128 KiB
> >
> > In order to circumvent these limitations, many build tools support
> > response-files, i.e. fil

[PATCH] vect: Use simple_dce_worklist in the vectorizer [PR116711]

2024-09-16 Thread Andrew Pinski

This adds use of simple_dce_from_worklist to both the SLP vectorizer and the
loop based vectorizer.
This is a step toward removing the DCE pass after the loop based vectorizer.
That DCE still does a few things, such as removing some of the induction
variables which have become unused. That is something which can be improved
afterwards.

Note this adds it to the SLP BB vectorizer too, as it is sometimes used from
the loop based one.
In the case of the BB SLP vectorizer, the dead statements don't get removed
until much later, in DSE, so removing them much earlier is important.

Note on the new testcase: it came up during bootstrap, where the SLP pass
would cause the need to invalidate the scev caches, but there was no testcase
for this beforehand, so adding one is a good idea.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/116711

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist): Returns
true if something was removed.
* tree-ssa-dce.h (simple_dce_from_worklist): Change return
type to bool.
* tree-vect-loop.cc (vectorizable_induction): Add phi result
to the dce worklist.
* tree-vect-slp.cc: Add includes of tree-ssa-dce.h,
tree-ssa-loop-niter.h and tree-scalar-evolution.h.
(vect_slp_region): Add DCE_WORKLIST argument. Copy
the dce_worklist from the bb vectorization info.
(vect_slp_bbs): Add DCE_WORKLIST argument. Update call to
vect_slp_region.
(vect_slp_if_converted_bb): Add DCE_WORKLIST argument. Update
call to vect_slp_bbs.
(vect_slp_function): Update call to vect_slp_bbs and call
simple_dce_from_worklist. Also free the loop iteration and
scev cache if something was removed.
* tree-vect-stmts.cc (vectorizable_bswap): Add the lhs of the scalar 
stmt
to the dce work list.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison_1): Likewise.
* tree-vectorizer.cc: Include tree-ssa-dce.h.
(vec_info::remove_stmt): Add all of the uses of the store to the
dce work list.
(try_vectorize_loop_1): Update call to vect_slp_if_converted_bb.
Copy the dce worklist into the loop's vectinfo dce worklist.
(pass_vectorize::execute): Copy loops' vectinfo dce worklist locally.
Add call to simple_dce_from_worklist.
* tree-vectorizer.h (vec_info): Add dce_worklist field.
(vect_slp_if_converted_bb): Add bitmap argument.
* tree-vectorizer.h (vect_slp_if_converted_bb): Add bitmap argument.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-77.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/vect/bb-slp-77.c | 15 +
 gcc/tree-ssa-dce.cc   |  5 +++--
 gcc/tree-ssa-dce.h|  2 +-
 gcc/tree-vect-loop.cc |  3 +++
 gcc/tree-vect-slp.cc  | 32 ---
 gcc/tree-vect-stmts.cc| 16 +-
 gcc/tree-vectorizer.cc| 21 +-
 gcc/tree-vectorizer.h |  5 -
 8 files changed, 85 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-77.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-77.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c
new file mode 100644
index 000..a74bb17e25c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-77.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+/* Make sure SLP vectorization updates the estimated loop bounds correctly. */
+
+void g(int);
+void f(int *a)
+{
+  int n = a[0]++;
+  int g1 = a[1]++;
+  for(int i = 0; i < n; i++)
+g(g1);
+}
+
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" { 
target vect_int } } } */
+
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 69249c73013..87c5df4216b 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2170,9 +2170,9 @@ make_pass_cd_dce (gcc::context *ctxt)
 /* A cheap DCE interface.  WORKLIST is a list of possibly dead stmts and
is consumed by this function.  The function has linear complexity in
the number of dead stmts with a constant factor like the average SSA
-   use operands number.  */
+   use operands number. Returns true if something was removed.  */
 
-void
+bool
 simple_dce_from_worklist (bitmap worklist, bitmap need_eh_cleanup)
 {
   int phiremoved = 0;
@@ -2269,4 +2269,5 @@ simple_dce_from_worklist (bitmap worklist, bitmap 
need_eh_cleanup)
phiremoved);
   statistics_counter_event (cfun, "Statements removed",
stmtremoved);
+  return phiremoved !=

Re: [PATCH] c++: alias of decltype(lambda) is opaque [PR116714]

2024-09-16 Thread Andrew Pinski

On Mon, Sep 16, 2024 at 8:12 AM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?  Sadly the prerequisite patch r15-2331-g523836716137d0
> probably isn't suitable for backporting, so I reckon this should be
> trunk-only.
>
> -- >8 --
>
> Here we're prematurely stripping the decltype(lambda) alias used inside
> the template-id during ahead of time template argument coercion, which
> means we treat it as if it were
>
>   is_same_v<decltype([]{}), decltype([]{})>
>
> which instead yields false since now we're substituting into the lambda
> twice, and every such substitution yields a unique lambda.  This
> demonstrates that such aliases should be considered opaque, a notion which
> coincidentally we recently introduced in r15-2331-g523836716137d0.

I wonder if this fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106221 too,
while https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107390 looks like it
would be fixed here too.

Thanks,
Andrew

>
> PR c++/116714
>
> gcc/cp/ChangeLog:
>
> * pt.cc (dependent_opaque_alias_p): Also return true for a
> decltype(lambda) alias.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/lambda-uneval18.C: New test.
> ---
>  gcc/cp/pt.cc |  6 --
>  gcc/testsuite/g++.dg/cpp2a/lambda-uneval18.C | 20 
>  2 files changed, 24 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval18.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 04987f66746..a72a1eadbc7 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -6764,8 +6764,10 @@ dependent_opaque_alias_p (const_tree t)
>  {
>return (TYPE_P (t)
>   && typedef_variant_p (t)
> - && any_dependent_type_attributes_p (DECL_ATTRIBUTES
> - (TYPE_NAME (t))));
> + && (any_dependent_type_attributes_p (DECL_ATTRIBUTES
> +  (TYPE_NAME (t)))
> + || (TREE_CODE (t) == DECLTYPE_TYPE
> + && TREE_CODE (DECLTYPE_TYPE_EXPR (t)) == LAMBDA_EXPR)));
>  }
>
>  /* Return the number of innermost template parameters in TMPL.  */
> diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-uneval18.C 
> b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval18.C
> new file mode 100644
> index 000..2942f8305c7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval18.C
> @@ -0,0 +1,20 @@
> +// PR c++/116714
> +// { dg-do compile { target c++20 } }
> +
> +template<class T, class U>
> +inline constexpr bool is_same_v = __is_same(T, U);
> +
> +template<class T, class U>
> +struct is_same { static constexpr bool value = false; };
> +
> +template<class T>
> +struct is_same<T, T> { static constexpr bool value = true; };
> +
> +template<class>
> +void f() {
> +  using type = decltype([]{});
> +  static_assert(is_same_v<type, type>);
> +  static_assert(is_same<type, type>::value);
> +};
> +
> +template void f<int>();
> --
> 2.46.1.506.ged155187b4
>

[PATCH] vect: Set pattern_stmt_p on the newly created stmt_vec_info

2024-09-15 Thread Andrew Pinski

While adding simple_dce_worklist to the vectorizer, there was a regression
because the SLP patterns would create an SSA name but never free it, even if
it never existed in the IR (addsub in this case, but the complex ones had the
same issue).
The reason it was never freed was that the stmt_vec_info was not marked as a
pattern stmt, unlike the other pattern stmts, which use vect_init_pattern_stmt
instead of vec_info::add_pattern_stmt (which is used for SLP patterns).

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-vectorizer.cc (vec_info::add_pattern_stmt): Set pattern_stmt_p.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-vectorizer.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index 0efabcbb258..4279b6db4cf 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -535,6 +535,7 @@ stmt_vec_info
 vec_info::add_pattern_stmt (gimple *stmt, stmt_vec_info stmt_info)
 {
   stmt_vec_info res = new_stmt_vec_info (stmt);
+  res->pattern_stmt_p = true;
   set_vinfo_for_stmt (stmt, res, false);
   STMT_VINFO_RELATED_STMT (res) = stmt_info;
   return res;
-- 
2.43.0

[PUSHED] testsuite; Fix execute/pr52286.c for 16bit

2024-09-14 Thread Andrew Pinski

The code path which was added for 16bit had a broken inline-asm which would
only assign maybe half of the registers for the `long` type to 0.

Adding L to the input operand of the inline-asm fixes the issue by now assigning
the full 32bit value of the input register that would match up with the output
register.

Fixes r0-115223-gb0408f13d4b317 which added the 16bit code path to fix the
testcase for 16bit.

Pushed as obvious.

PR testsuite/116716

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr52286.c: Fix inline-asm for 16bit case.
---
 gcc/testsuite/gcc.c-torture/execute/pr52286.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr52286.c 
b/gcc/testsuite/gcc.c-torture/execute/pr52286.c
index bb56295ab52..4fd5d6ac813 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr52286.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr52286.c
@@ -11,7 +11,7 @@ main ()
   b = (~a | 1) & -2038094497;
 #else
   long a, b;
-  asm ("" : "=r" (a) : "0" (0));
+  asm ("" : "=r" (a) : "0" (0L));
   b = (~a | 1) & -2038094497L;
 #endif
   if (b >= 0)
-- 
2.43.0

Re: [PATCH] tree-object-size: Fold PHI node offsets with constants [PR116556]

2024-09-14 Thread Andrew Pinski

On Sat, Sep 14, 2024 at 5:31 AM Siddhesh Poyarekar  wrote:
>
> In PTR + OFFSET cases, try harder to see if the target offset could
> result in a constant.  Specifically, if the offset is a PHI node with
> all constant branches, return the minimum (or maximum for OST_MINIMUM)
> of the possible values.
>
> gcc/ChangeLog:
>
> PR tree-optimization/116556
> * tree-object-size.cc (try_collapsing_offset): New function.
> (plus_stmt_object_size): Use it.
> * gcc/testsuite/gcc.dg/builtin-object-size-1.c (test12): New
> function.
> (main): Call it.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
> Tests underway for x86_64 and i686.  OK if they pass?
>
>  gcc/testsuite/gcc.dg/builtin-object-size-1.c | 25 
>  gcc/tree-object-size.cc  | 41 
>  2 files changed, 66 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c 
> b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> index d6d13c5ef7a..eed1653505c 100644
> --- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
> @@ -712,6 +712,30 @@ test11 (void)
>  }
>  #endif
>
> +void
> +__attribute__ ((noinline))
> +test12 (unsigned cond)
> +{
> +  char *buf2 = malloc (10);
> +  char *p;
> +  size_t t;
> +
> +  if (cond)
> +t = 8;
> +  else
> +t = 4;
> +
> +  p = &buf2[t];
> +
> +#ifdef __builtin_object_size
> +  if (__builtin_object_size (p, 0) != (cond ? 2 : 6))
> +FAIL ();
> +#else
> +  if (__builtin_object_size (p, 0) != 6)
> +FAIL ();
> +#endif
> +}
> +
>  int
>  main (void)
>  {
> @@ -729,6 +753,7 @@ main (void)
>test10 ();
>  #ifndef SKIP_STRNDUP
>test11 ();
> +  test12 (1);
>  #endif
>DONE ();
>  }
> diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
> index 4c1fa9b555f..d90cc43fa97 100644
> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -1467,6 +1467,44 @@ merge_object_sizes (struct object_size_info *osi, tree 
> dest, tree orig)
>return bitmap_bit_p (osi->reexamine, SSA_NAME_VERSION (orig));
>  }
>
> +/* For constant sizes, try collapsing a non-constant offset to a constant if
> +   possible.  The case handled at the moment is when the offset is a PHI node
> +   with all of its targets are constants.  */
> +
> +static tree
> +try_collapsing_offset (tree op, int object_size_type)
> +{
> +  gcc_assert (!(object_size_type & OST_DYNAMIC));
> +
> +  if (TREE_CODE (op) != SSA_NAME)
> +return op;
> +
> +  gimple *stmt = SSA_NAME_DEF_STMT (op);
> +
> +  if (gimple_code (stmt) != GIMPLE_PHI)
> +return op;
> +
> +  tree off = size_unknown (object_size_type);
> +
> +  for (unsigned i = 0; i < gimple_phi_num_args (stmt); i++)
> +{
> +  tree rhs = gimple_phi_arg (stmt, i)->def;
> +
> +  if (TREE_CODE (rhs) != INTEGER_CST)
> +   return op;
> +
> +  /* Note that this is the *opposite* of what we usually do with sizes,
> +because the maximum offset estimate here will give us a minimum size
> +estimate and vice versa.  */
> +  enum tree_code code = (object_size_type & OST_MINIMUM
> +? MAX_EXPR : MIN_EXPR);
> +
> +  off = size_binop (code, off, rhs);


I suspect this won't work for integer constants which have what
would be the sign bit set.

That is:
```

void
__attribute__ ((noinline))
test9 (unsigned cond)
{
  char *buf2 = __builtin_malloc (10);
  char *p;
  __SIZE_TYPE__ t;

  if (cond)
    t = -4;
  else
    t = 4;
  p = &buf2[4] + t;

  if (__builtin_object_size (&p[0], 0) != 10)
__builtin_abort ();
}
```

Since you do the MIN/MAX in unsigned.
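
The effect of the unsigned comparison can be shown in isolation. This is a sketch only, with hypothetical helper names, and it assumes the usual two's complement conversion between size_t and ptrdiff_t: converted to size_t, -4 becomes a huge value, so the unsigned minimum of the two offsets is 4 rather than -4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Minimum of two offsets compared as unsigned size_t values.
std::size_t min_unsigned (std::size_t a, std::size_t b)
{
  return std::min (a, b);
}

// Minimum of the same two offsets compared as signed values.
// Assumes the size_t -> ptrdiff_t conversion wraps (two's complement).
std::ptrdiff_t min_signed (std::size_t a, std::size_t b)
{
  return std::min (static_cast<std::ptrdiff_t> (a),
                   static_cast<std::ptrdiff_t> (b));
}

// Small usage check with the offsets from the example: -4 and 4.
void check_offset_minimum ()
{
  std::size_t neg4 = static_cast<std::size_t> (-4);
  assert (min_unsigned (neg4, 4) == 4);	/* -4 looks huge when unsigned.  */
  assert (min_signed (neg4, 4) == -4);	/* The offset that was intended.  */
}
```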

Thanks,
Andrew Pinski

> +}
> +
> +  gcc_assert (TREE_CODE (off) == INTEGER_CST);
> +  return off;
> +}
>
>  /* Compute object_sizes for VAR, defined to the result of an assignment
> with operator POINTER_PLUS_EXPR.  Return true if the object size might
> @@ -1499,6 +1537,9 @@ plus_stmt_object_size (struct object_size_info *osi, 
> tree var, gimple *stmt)
>if (object_sizes_unknown_p (object_size_type, varno))
>  return false;
>
> +  if (!(object_size_type & OST_DYNAMIC) && TREE_CODE (op1) != INTEGER_CST)
> +op1 = try_collapsing_offset (op1, object_size_type);
> +
>/* Handle PTR + OFFSET here.  */
>if (size_valid_p (op1, object_size_type)
>&& (TREE_CODE (op0) == SSA_NAME || TREE_CODE (op0) == ADDR_EXPR))
> --
> 2.45.1
>

[PATCH] vect: release defs of removed statement

2024-09-14 Thread Andrew Pinski

While trying to add use of simple_dce_from_worklist
to the vectorizer so we don't need to run a full blown
DCE pass after the vectorizer, there was a crash noticed
due to an SSA name which has a stmt without a bb. This was
due to not calling release_defs after the call to gsi_remove.

Note the code to remove zero use statements should be
removable once the use of simple_dce_from_worklist has been added.
But in the meantime, fixing this bug will also improve memory
usage and a few other things which look through all ssa names.

gcc/ChangeLog:

* tree-vect-loop.cc (optimize_mask_stores): Call release_defs
after the call to gsi_remove with last argument of true.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-vect-loop.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index cc15492f6a0..62c7f90779f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12803,6 +12803,7 @@ optimize_mask_stores (class loop *loop)
  if (has_zero_uses (lhs))
{
  gsi_remove (&gsi_from, true);
+ release_defs (stmt1);
  continue;
}
}
-- 
2.43.0

[PATCH] Mark the copy/move constructor/operator= of auto_bitmap as delete

2024-09-14 Thread Andrew Pinski

Since GCC is written in C++11, these should be marked as deleted rather
than just private.
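
As a sketch of the difference (not GCC code; `owner` is a hypothetical class standing in for auto_bitmap): a private, undefined copy constructor is rejected by access checking outside the class but only fails at link time when used from a member or friend, while a deleted one is rejected at compile time everywhere and is visible to type traits.

```cpp
#include <type_traits>

// Hypothetical owner class standing in for auto_bitmap.
class owner
{
public:
  owner () = default;

  // Deleted rather than private: any use is a compile-time error,
  // including uses from members and friends.
  owner (const owner &) = delete;
  owner &operator= (const owner &) = delete;
  owner (owner &&) = delete;
  owner &operator= (owner &&) = delete;
};

// The traits confirm that no copy or move operation is available.
static_assert (!std::is_copy_constructible<owner>::value, "no copy");
static_assert (!std::is_move_constructible<owner>::value, "no move");
static_assert (!std::is_copy_assignable<owner>::value, "no copy assign");
static_assert (!std::is_move_assignable<owner>::value, "no move assign");
```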

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* bitmap.h (class auto_bitmap): Mark copy/move constructor/operator=
as deleted.

Signed-off-by: Andrew Pinski 
---
 gcc/bitmap.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 4cad1b4d6c6..451edcfc590 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -959,10 +959,10 @@ class auto_bitmap
 
  private:
   // Prevent making a copy that references our bitmap.
-  auto_bitmap (const auto_bitmap &);
-  auto_bitmap &operator = (const auto_bitmap &);
-  auto_bitmap (auto_bitmap &&);
-  auto_bitmap &operator = (auto_bitmap &&);
+  auto_bitmap (const auto_bitmap &) = delete;
+  auto_bitmap &operator = (const auto_bitmap &) = delete;
+  auto_bitmap (auto_bitmap &&) = delete;
+  auto_bitmap &operator = (auto_bitmap &&) = delete;
 
   bitmap_head m_bits;
 };
-- 
2.43.0

[PATCH] phi-opt: Improve heuristics for factoring out with constant (again) [PR116699]

2024-09-13 Thread Andrew Pinski

The heuristics for factoring out with a constant check that the assignment
statement is the last statement of the basic block, but sometimes there is a
predicate or a nop statement after the assignment. Rejecting this case does
not make sense, since both predicates and nop statements are removed and
don't contribute any instructions. So we should skip over them when checking
whether the assignment statement was the last statement in the basic block.

phi-opt-factor-1.c's f0 is such an example: it should be caught at phiopt1
(before predicates are removed) and should be handled in a similar way as f1
(which uses a temporary variable rather than return).
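
The check can be modelled on a toy statement list. This is a sketch and not the GCC implementation: `kind`, `last_real_stmt_p` and `check_last_real_stmt` are hypothetical names, and a basic block is reduced to a vector of statement kinds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical statement kinds, loosely modelled on GIMPLE codes.
enum class kind { assign, nop, predict, call };

// Return true if the statement at index POS is the last one in BLOCK
// once any nops and predicts that follow it are ignored.
bool last_real_stmt_p (const std::vector<kind> &block, std::size_t pos)
{
  std::size_t i = pos + 1;
  // Skip past nops and predicts; they generate no instructions.
  while (i < block.size ()
	 && (block[i] == kind::nop || block[i] == kind::predict))
    i++;
  // Anything left over means another real statement follows.
  return i == block.size ();
}

// Small usage check: a trailing predict is fine, a trailing call is not.
void check_last_real_stmt ()
{
  std::vector<kind> with_predict = { kind::assign, kind::predict, kind::nop };
  std::vector<kind> with_call = { kind::assign, kind::predict, kind::call };
  assert (last_real_stmt_p (with_predict, 0));
  assert (!last_real_stmt_p (with_call, 0));
}
```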

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116699

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Skip over 
nop/predicates
for seeing the assignment is the last statement.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-factor-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 .../gcc.dg/tree-ssa/phi-opt-factor-1.c| 26 +++
 gcc/tree-ssa-phiopt.cc|  6 +
 2 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c
new file mode 100644
index 000..12b846b9337
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-factor-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fdump-tree-phiopt" } */
+
+/* PR tree-optimization/116699
+   Make sure the return PREDICT has no factor in deciding
+   if we factor out the conversion. */
+
+short f0(int a, int b, int c)
+{
+  int t1 = 4;
+  if (c < t1)  return (c > -1 ? c : -1);
+  return t1;
+}
+
+
+short f1(int a, int b, int c)
+{
+  int t1 = 4;
+  short t = t1;
+  if (c < t1)  t = (c > -1 ? c : -1);
+  return t;
+}
+
+/* Both f1 and f0  should be optimized at phiopt1 to the same thing. */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR " 2 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR " 2  "phiopt1" } } */
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index e5413e40572..7b12692237e 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -360,6 +360,12 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
}
  gsi = gsi_for_stmt (arg0_def_stmt);
  gsi_next_nondebug (&gsi);
+ /* Skip past nops and predicates. */
+ while (!gsi_end_p (gsi)
+ && (gimple_code (gsi_stmt (gsi)) == GIMPLE_NOP
+ || gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT))
+   gsi_next_nondebug (&gsi);
+ /* Reject if the statement was not at the end of the block. */
  if (!gsi_end_p (gsi))
return NULL;
}
-- 
2.43.0

[PATCH] Fix factor_out_conditional_operation heuristics for constants

2024-09-12 Thread Andrew Pinski

While working on a different patch, I noticed the heuristics were not
doing the right thing if there were statements before the NOP/PREDICTs.
(LABELs don't have other statements before them.)

This fixes that oversight which was added in r15-3334-gceda727dafba6e.
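
A toy model of the difference, for illustration only (this is not the GCC code: `Kind`, `stmt_to_check_old` and `stmt_to_check_new` are made-up names, and a vector index stands in for the statement iterator):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for GIMPLE statement codes; NOT the real GCC enums.
enum class Kind { Assign, Nop, Predict, Label, Cast };

static bool
ignorable_p (Kind k)
{
  return k == Kind::Nop || k == Kind::Predict || k == Kind::Label;
}

// Old shape: look only at the one statement before DEF.  If it is
// ignorable, nothing further back is examined.  Returns the index of the
// statement the heuristic would inspect, or -1 if there is none.
int
stmt_to_check_old (const std::vector<Kind> &stmts, std::size_t def)
{
  assert (def < stmts.size ());
  if (def == 0 || ignorable_p (stmts[def - 1]))
    return -1;
  return static_cast<int> (def - 1);
}

// New shape: walk backwards over every ignorable statement first, then
// inspect whatever real statement is found (if any).
int
stmt_to_check_new (const std::vector<Kind> &stmts, std::size_t def)
{
  assert (def < stmts.size ());
  std::size_t i = def;
  while (i > 0 && ignorable_p (stmts[i - 1]))
    --i;
  return i == 0 ? -1 : static_cast<int> (i - 1);
}
```

For a block shaped like { Assign, Predict, Cast } with the cast as the defining statement, the old shape inspects nothing, while the new shape finds the assignment at index 0 and can apply the heuristics to it.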

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Instead
of just ignoring a NOP/PREDICT, skip over them before checking
the heuristics.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 5710bc32e61..e5413e40572 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -332,15 +332,17 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
{
  gsi = gsi_for_stmt (arg0_def_stmt);
  gsi_prev_nondebug (&gsi);
+ /* Ignore nops, predicates and labels. */
+ while (!gsi_end_p (gsi)
+ && (gimple_code (gsi_stmt (gsi)) == GIMPLE_NOP
+ || gimple_code (gsi_stmt (gsi)) == GIMPLE_PREDICT
+ || gimple_code (gsi_stmt (gsi)) == GIMPLE_LABEL))
+   gsi_prev_nondebug (&gsi);
+
  if (!gsi_end_p (gsi))
{
  gimple *stmt = gsi_stmt (gsi);
- /* Ignore nops, predicates and labels. */
- if (gimple_code (stmt) == GIMPLE_NOP
- || gimple_code (stmt) == GIMPLE_PREDICT
- || gimple_code (stmt) == GIMPLE_LABEL)
-   ;
- else if (gassign *assign = dyn_cast <gassign *> (stmt))
+ if (gassign *assign = dyn_cast <gassign *> (stmt))
{
  tree lhs = gimple_assign_lhs (assign);
  enum tree_code ass_code
-- 
2.43.0

Re: [PATCH] JSON dumping for GENERIC trees

2024-09-11 Thread Andrew Pinski

On Wed, Sep 11, 2024 at 6:51 PM  wrote:
>
> From: Thor C Preimesberger 
>
> This patch allows the compiler to dump GENERIC trees as JSON objects.
>
> The dump flag -fdump-tree-original-json dumps each fndecl node in the
> C frontend's gimplifier as a JSON object and traverses related nodes
> in a manner analogous to raw dumping.
>
> Some JSON parsers expect there to be a single JSON value per file;
> the following shell command makes the output conformant:
>
>   tr -d '\n ' < out.json | sed -e 's/\]\[/,/g' | sed -e 's/}{/},{/g'
>
> There is also a debug function that simply prints a node as formatted JSON to
> stdout.
>
> The information in the dumped JSON is meant to be an amalgamation of
> tree-pretty-print.cc's dump_generic_node and print-tree.cc's debug_tree.

I don't think this is a good idea and there is no obvious use case.
GIMPLE yes, but not GENERIC.
Can you explain what the use case is for dumping GENERIC as JSON? Also,
you only hooked up the C and C++ family of front-ends. Why not
hook up Fortran, Ada, Rust and Go too? Why is it done in the
gimplifier?

Thanks,
Andrew

>
> Bootstrapped and tested on x86_64-pc-linux-gnu without issue.
>
> ChangeLog:
> * gcc/Makefile.in: Link tree-emit-json.o to c-gimplify.o
> * gcc/c-family/c-gimplify.cc (c_genericize): Hook for
> -fdump-tree-original-json
> * gcc/dumpfile.cc: Include tree-emit-json.h to expose
> node_emit_json and debug_tree_json. Also new headers needed for
> json.h being implicitly exposed
> * gcc/dumpfile.h (dump_flag): New dump flag TDF_JSON
> * gcc/tree-emit-json.cc: Logic for converting a tree to JSON
> and dumping.
> * gcc/tree-emit-json.h: Ditto

A few comments about the changelog entry here.
It should be something like:
gcc/ChangeLog:
 * Makefile.in: ...

gcc/c-family/ChangeLog:
  * c-gimplify.cc ...

Also there is no testcase or indication on how you tested it.


>
> Signed-off-by: Thor C Preimesberger 
>
> ---
>  gcc/Makefile.in|2 +
>  gcc/c-family/c-gimplify.cc |   30 +-
>  gcc/cp/dump.cc |1 +
>  gcc/dumpfile.cc|3 +
>  gcc/dumpfile.h |6 +
>  gcc/tree-emit-json.cc  | 3155 
>  gcc/tree-emit-json.h   |   82 +
>  7 files changed, 3268 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/tree-emit-json.cc
>  create mode 100644 gcc/tree-emit-json.h
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 68fda1a7591..b65cc7f0ad5 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1042,6 +1042,7 @@ OPTS_H = $(INPUT_H) $(VEC_H) opts.h $(OBSTACK_H)
>  SYMTAB_H = $(srcdir)/../libcpp/include/symtab.h $(OBSTACK_H)
>  CPP_INTERNAL_H = $(srcdir)/../libcpp/internal.h
>  TREE_DUMP_H = tree-dump.h $(SPLAY_TREE_H) $(DUMPFILE_H)
> +TREE_EMIT_JSON_H = tree-emit-json.h $(SPLAY_TREE_H) $(DUMPFILE_H) json.h
>  TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
>  TREE_SSA_H = tree-ssa.h tree-ssa-operands.h \
> $(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
> @@ -1709,6 +1710,7 @@ OBJS = \
> tree-diagnostic.o \
> tree-diagnostic-client-data-hooks.o \
> tree-dump.o \
> +   tree-emit-json.o \
> tree-eh.o \
> tree-emutls.o \
> tree-if-conv.o \
> diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> index 3e29766e092..8b0c80f4f75 100644
> --- a/gcc/c-family/c-gimplify.cc
> +++ b/gcc/c-family/c-gimplify.cc
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>  .  */
>
>  #include "config.h"
> +#define INCLUDE_MEMORY
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tm.h"
> @@ -43,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "tree-pass.h"
>  #include "internal-fn.h"
> +#include "tree-emit-json.h"
>
>  /*  The gimplification pass converts the language-dependent trees
>  (ld-trees) emitted by the parser into language-independent trees
> @@ -629,20 +631,26 @@ c_genericize (tree fndecl)
>local_dump_flags = dfi->pflags;
>if (dump_orig)
>  {
> -  fprintf (dump_orig, "\n;; Function %s",
> -  lang_hooks.decl_printable_name (fndecl, 2));
> -  fprintf (dump_orig, " (%s)\n",
> -  (!DECL_ASSEMBLER_NAME_SET_P (fndecl) ? "null"
> -   : IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fndecl))));
> -  fprintf (dump_orig, ";; enabled by -%s\n", dump_flag_name (TDI_original));
> -  fprintf (dump_orig, "\n");
> -
> -  if (local_dump_flags & TDF_RAW)
> -   dump_node (DECL_SAVED_TREE (fndecl),
> +  if (local_dump_flags & TDF_JSON)
> +   dump_node_json (DECL_SAVED_TREE (fndecl),
>TDF_SLIM | local_dump_flags, dump_orig);
>else
> -   print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
> +  {
> +   fprintf (dump_orig, "\n;; Function %s",
>

[PATCH 1/2] phiopt: Use gimple_phi_result rather than PHI_RESULT [PR116643]

2024-09-09 Thread Andrew Pinski

This converts the uses of PHI_RESULT in phiopt to gimple_phi_result
instead. Since there was already a mix of both forms here, it
would be good to use the preferred one (gimple_phi_result) throughout.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116643
gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): s/PHI_RESULT/gimple_phi_result/.
(factor_out_conditional_operation): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_store_replacement): Likewise.
(cond_if_else_store_replacement_1): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 06ec5875722..bd8ede06a98 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -97,7 +97,7 @@ replace_phi_edge_with_variable (basic_block cond_block,
 {
   basic_block bb = gimple_bb (phi);
   gimple_stmt_iterator gsi;
-  tree phi_result = PHI_RESULT (phi);
+  tree phi_result = gimple_phi_result (phi);
   bool deleteboth = false;
 
   /* Duplicate range info if they are the only things setting the target PHI.
@@ -373,7 +373,7 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
 return NULL;
 
   /* Create a new PHI stmt.  */
-  result = PHI_RESULT (phi);
+  result = gimple_phi_result (phi);
   temp = make_ssa_name (TREE_TYPE (new_arg0), NULL);
 
   gimple_match_op new_op = arg0_op;
@@ -1684,7 +1684,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
   tree smaller, larger, arg_true, arg_false;
   gimple_stmt_iterator gsi, gsi_from;
 
-  tree type = TREE_TYPE (PHI_RESULT (phi));
+  tree type = TREE_TYPE (gimple_phi_result (phi));
 
   gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
   enum tree_code cmp = gimple_cond_code (cond);
@@ -2022,7 +2022,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
   /* Emit the statement to compute min/max.  */
   location_t locus = gimple_location (last_nondebug_stmt (cond_bb));
   gimple_seq stmts = NULL;
-  tree phi_result = PHI_RESULT (phi);
+  tree phi_result = gimple_phi_result (phi);
   result = gimple_build (&stmts, locus, minmax, TREE_TYPE (phi_result),
 arg0, arg1);
   result = gimple_build (&stmts, locus, ass_code, TREE_TYPE (phi_result),
@@ -2224,7 +2224,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
 
   /* Emit the statement to compute min/max.  */
   gimple_seq stmts = NULL;
-  tree phi_result = PHI_RESULT (phi);
+  tree phi_result = gimple_phi_result (phi);
 
   /* When we can't use a MIN/MAX_EXPR still make sure the expression
  stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
@@ -2298,7 +2298,7 @@ spaceship_replacement (basic_block cond_bb, basic_block middle_bb,
   edge e0, edge e1, gphi *phi,
   tree arg0, tree arg1)
 {
-  tree phires = PHI_RESULT (phi);
+  tree phires = gimple_phi_result (phi);
   if (!INTEGRAL_TYPE_P (TREE_TYPE (phires))
   || TYPE_UNSIGNED (TREE_TYPE (phires))
   || !tree_fits_shwi_p (arg0)
@@ -3399,7 +3399,7 @@ cond_store_replacement (basic_block middle_bb, basic_block join_bb,
   add_phi_arg (newphi, rhs, e0, locus);
   add_phi_arg (newphi, name, e1, locus);
 
-  new_stmt = gimple_build_assign (lhs, PHI_RESULT (newphi));
+  new_stmt = gimple_build_assign (lhs, gimple_phi_result (newphi));
 
   /* 4) Insert that PHI node.  */
   gsi = gsi_after_labels (join_bb);
@@ -3481,7 +3481,7 @@ cond_if_else_store_replacement_1 (basic_block then_bb, basic_block else_bb,
   add_phi_arg (newphi, then_rhs, EDGE_SUCC (then_bb, 0), then_locus);
   add_phi_arg (newphi, else_rhs, EDGE_SUCC (else_bb, 0), else_locus);
 
-  new_stmt = gimple_build_assign (lhs, PHI_RESULT (newphi));
+  new_stmt = gimple_build_assign (lhs, gimple_phi_result (newphi));
 
   /* 3) Insert that PHI node.  */
   gsi = gsi_after_labels (join_bb);
-- 
2.43.0

[PATCH 2/2] phiopt: Move the common code between pass_phiopt and pass_cselim into a seperate function

2024-09-09 Thread Andrew Pinski

When r14-303-gb9fedabe381cce was done, it was missed that some of the common
parts could be moved into a template function, with a lambda supplying the
pass-specific part. This patch implements that. The new function can be used
later on to implement a simple ifcvt pass.
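
The general shape of this kind of refactoring, as a toy sketch (not the GCC code: `Block`, `for_each_cond_block` and the two small "passes" below are made-up names used only to show a shared template driver with per-pass lambdas):

```cpp
#include <cassert>
#include <vector>

// Hypothetical, much simplified stand-in for a basic block.
struct Block
{
  int index;
  bool ends_in_cond;
  int num_succs;
};

// Shared driver: does the common filtering once and hands each candidate
// block to FUNC.  The callable's type is a template parameter, so a lambda
// can be passed directly.
template <typename Func>
void
for_each_cond_block (const std::vector<Block> &blocks, Func func)
{
  for (const Block &bb : blocks)
    {
      if (!bb.ends_in_cond)
        continue;
      // A block ending in a condition has exactly two successors.
      assert (bb.num_succs == 2);
      func (bb);
    }
}

// Two "passes" that differ only in the lambda they supply.
int
count_cond_blocks (const std::vector<Block> &blocks)
{
  int count = 0;
  for_each_cond_block (blocks, [&] (const Block &) { ++count; });
  return count;
}

int
sum_cond_block_indices (const std::vector<Block> &blocks)
{
  int sum = 0;
  for_each_cond_block (blocks, [&] (const Block &bb) { sum += bb.index; });
  return sum;
}
```

Capturing by reference in the lambda is what lets each pass keep its own local state (here `count` and `sum`) while the traversal logic lives in one place.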

gcc/ChangeLog:

* tree-ssa-phiopt.cc (execute_over_cond_phis): New template function,
moved the common parts from pass_phiopt::execute/pass_cselim::execute.
(pass_phiopt::execute): Move the function specific parts of the loop
into a lambda.
(pass_cselim::execute): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 253 -
 1 file changed, 100 insertions(+), 153 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index bd8ede06a98..5710bc32e61 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3933,6 +3933,83 @@ gate_hoist_loads (void)
  && HAVE_conditional_move);
 }
 
+template <class func_type>
+static void
+execute_over_cond_phis (func_type func)
+{
+  unsigned n, i;
+  basic_block *bb_order;
+  basic_block bb;
+  /* Search every basic block for COND_EXPR we may be able to optimize.
+
+ We walk the blocks in order that guarantees that a block with
+ a single predecessor is processed before the predecessor.
+ This ensures that we collapse inner ifs before visiting the
+ outer ones, and also that we do not try to visit a removed
+ block.  */
+  bb_order = single_pred_before_succ_order ();
+  n = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
+
+  for (i = 0; i < n; i++)
+{
+  basic_block bb1, bb2;
+  edge e1, e2;
+  bool diamond_p = false;
+
+  bb = bb_order[i];
+
+  /* Check to see if the last statement is a GIMPLE_COND.  */
+  gcond *cond_stmt = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
+  if (!cond_stmt)
+   continue;
+
+  e1 = EDGE_SUCC (bb, 0);
+  bb1 = e1->dest;
+  e2 = EDGE_SUCC (bb, 1);
+  bb2 = e2->dest;
+
+  /* We cannot do the optimization on abnormal edges.  */
+  if ((e1->flags & EDGE_ABNORMAL) != 0
+ || (e2->flags & EDGE_ABNORMAL) != 0)
+   continue;
+
+  /* If either bb1's succ or bb2 or bb2's succ is non NULL.  */
+  if (EDGE_COUNT (bb1->succs) == 0
+ || EDGE_COUNT (bb2->succs) == 0)
+   continue;
+
+  /* Find the bb which is the fall through to the other.  */
+  if (EDGE_SUCC (bb1, 0)->dest == bb2)
+   ;
+  else if (EDGE_SUCC (bb2, 0)->dest == bb1)
+   {
+ std::swap (bb1, bb2);
+ std::swap (e1, e2);
+   }
+  else if (EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest
+  && single_succ_p (bb2))
+   {
+ diamond_p = true;
+ e2 = EDGE_SUCC (bb2, 0);
+ /* Make sure bb2 is just a fall through. */
+ if ((e2->flags & EDGE_FALLTHRU) == 0)
+   continue;
+   }
+  else
+   continue;
+
+  e1 = EDGE_SUCC (bb1, 0);
+
+  /* Make sure that bb1 is just a fall through.  */
+  if (!single_succ_p (bb1)
+ || (e1->flags & EDGE_FALLTHRU) == 0)
+   continue;
+
+  func (bb, bb1, bb2, e1, e2, diamond_p, cond_stmt);
+}
+  free (bb_order);
+}
+
 /* This pass tries to replaces an if-then-else block with an
assignment.  We have different kinds of transformations.
Some of these transformations are also performed by the ifcvt
@@ -4156,88 +4233,22 @@ unsigned int
 pass_phiopt::execute (function *)
 {
   bool do_hoist_loads = !early_p ? gate_hoist_loads () : false;
-  basic_block bb;
-  basic_block *bb_order;
-  unsigned n, i;
   bool cfgchanged = false;
 
   calculate_dominance_info (CDI_DOMINATORS);
   mark_ssa_maybe_undefs ();
 
-  /* Search every basic block for COND_EXPR we may be able to optimize.
-
- We walk the blocks in order that guarantees that a block with
- a single predecessor is processed before the predecessor.
- This ensures that we collapse inner ifs before visiting the
- outer ones, and also that we do not try to visit a removed
- block.  */
-  bb_order = single_pred_before_succ_order ();
-  n = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
-
-  for (i = 0; i < n; i++)
+  auto phiopt_exec = [&] (basic_block bb, basic_block bb1,
+ basic_block bb2, edge e1, edge e2,
+ bool diamond_p, gcond *cond_stmt)
 {
-  gphi *phi;
-  basic_block bb1, bb2;
-  edge e1, e2;
-  tree arg0, arg1;
-  bool diamond_p = false;
-
-  bb = bb_order[i];
-
-  /* Check to see if the last statement is a GIMPLE_COND.  */
-  gcond *cond_stmt = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
-  if (!cond_stmt)
-   continue;
-
-  e1 = EDGE_SUCC (bb, 0);
-  bb1 = e1->dest;
-  e2 = EDGE_SUCC (bb, 1);
-  bb2 = e2->dest;
-
-  /* We cannot do the optimization on abnormal edges.  */
-  if ((e

Re: [PATCH 00/10] __builtin_dynamic_object_size

2024-09-08 Thread Andrew Pinski

 regular ones when it can be proven at compile time that the object
>   size will always be less than the length of the write.  I am working
>   on it right now.
>
> - I need to enable _FORTIFY_SOURCE=3 for gcc in glibc; currently it is
>   llvm-only.  I've started working on these patches too on the side.
>
> - Instead of bailing out on non-constant sizes with
>   __builtin_object_size, it should be possible to use ranger to
>   get an upper and lower bound on the size expression and use that to
>   implement __builtin_object_size.

When I was implementing improvements to phiopt, I ran into a case where
objsz now fails because we get:
tmp = PHI 
ptr = ptr + tmp

where before the pointer plus was inside each branch instead. So my
question is: has there been any progress on implementing objsz with
ranger, or has that work been put off?
I filed https://gcc.gnu.org/PR116556 for this. Do you have the start
of any patches for this that someone could take over and finish? If not,
then I am going to try to implement it, since it blocks my work on
phiopt.

Thanks,
Andrew Pinski


>
> - More work could be done to reduce the performance impact of the
>   computation.  One way could be to add a heuristic where the pass keeps
>   track of nesting in the expression and either bail out or compute an
>   estimate if nesting crosses a threshold.  I'll take this up once we
>   have more data on the nature of the bottlenecks.
>
>
> Siddhesh Poyarekar (10):
>   tree-object-size: Replace magic numbers with enums
>   tree-object-size: Abstract object_sizes array
>   tree-object-size: Use tree instead of HOST_WIDE_INT
>   tree-object-size: Single pass dependency loop resolution
>   __builtin_dynamic_object_size: Recognize builtin
>   tree-object-size: Support dynamic sizes in conditions
>   tree-object-size: Handle function parameters
>   tree-object-size: Handle GIMPLE_CALL
>   tree-object-size: Dynamic sizes for ADDR_EXPR
>   tree-object-size: Handle dynamic offsets
>
>  gcc/builtins.c|   22 +-
>  gcc/builtins.def  |1 +
>  gcc/doc/extend.texi   |   13 +
>  gcc/gimple-fold.c |9 +-
>  .../g++.dg/ext/builtin-dynamic-object-size1.C |5 +
>  .../g++.dg/ext/builtin-dynamic-object-size2.C |5 +
>  .../gcc.dg/builtin-dynamic-alloc-size.c   |7 +
>  .../gcc.dg/builtin-dynamic-object-size-0.c|  463 +
>  .../gcc.dg/builtin-dynamic-object-size-1.c|7 +
>  .../gcc.dg/builtin-dynamic-object-size-10.c   |9 +
>  .../gcc.dg/builtin-dynamic-object-size-11.c   |7 +
>  .../gcc.dg/builtin-dynamic-object-size-12.c   |5 +
>  .../gcc.dg/builtin-dynamic-object-size-13.c   |5 +
>  .../gcc.dg/builtin-dynamic-object-size-14.c   |5 +
>  .../gcc.dg/builtin-dynamic-object-size-15.c   |5 +
>  .../gcc.dg/builtin-dynamic-object-size-16.c   |7 +
>  .../gcc.dg/builtin-dynamic-object-size-17.c   |8 +
>  .../gcc.dg/builtin-dynamic-object-size-18.c   |8 +
>  .../gcc.dg/builtin-dynamic-object-size-19.c   |  104 +
>  .../gcc.dg/builtin-dynamic-object-size-2.c|7 +
>  .../gcc.dg/builtin-dynamic-object-size-3.c|7 +
>  .../gcc.dg/builtin-dynamic-object-size-4.c|7 +
>  .../gcc.dg/builtin-dynamic-object-size-5.c|8 +
>  .../gcc.dg/builtin-dynamic-object-size-6.c|5 +
>  .../gcc.dg/builtin-dynamic-object-size-7.c|5 +
>  .../gcc.dg/builtin-dynamic-object-size-8.c|5 +
>  .../gcc.dg/builtin-dynamic-object-size-9.c|5 +
>  gcc/testsuite/gcc.dg/builtin-object-size-1.c  |  160 +-
>  gcc/testsuite/gcc.dg/builtin-object-size-16.c |2 +
>  gcc/testsuite/gcc.dg/builtin-object-size-17.c |2 +
>  gcc/testsuite/gcc.dg/builtin-object-size-2.c  |  134 ++
>  gcc/testsuite/gcc.dg/builtin-object-size-3.c  |  151 ++
>  gcc/testsuite/gcc.dg/builtin-object-size-4.c  |   99 +
>  gcc/testsuite/gcc.dg/builtin-object-size-5.c  |   12 +
>  gcc/tree-object-size.c| 1766 +++--
>  gcc/tree-object-size.h|3 +-
>  gcc/ubsan.c   |   46 +-
>  37 files changed, 2499 insertions(+), 620 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
>  create mode 100644 gcc/te

[PATCH] phiopt: Small refactoring/cleanup of non-ssa name case of factor_out_conditional_operation

2024-09-08 Thread Andrew Pinski

This small cleanup removes a redundant check for gimple_assign_cast_p and
reformats based on that. It also inverts the if statement that checks for an
integral type and whether the constant fits into the new type, so that it
returns NULL early, and reformats based on that.

It also moves the has_single_use checks earlier, since they are simpler and
cheaper than some of the other checks (like the ones on the integer side).

This was noticed when adding a few new things to
factor_out_conditional_operation, but those are not ready to submit yet.

Note there is no functional difference with this change.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move the
has_single_use checks much earlier. Remove redundant check for
gimple_assign_cast_p. Change around the check if the integral consts
fits into the new type.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 122 -
 1 file changed, 60 insertions(+), 62 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 271a5d51f09..06ec5875722 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -265,6 +265,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   tree new_arg0 = arg0_op.ops[0];
   tree new_arg1;
 
+  /* If arg0 have > 1 use, then this transformation actually increases
+ the number of expressions evaluated at runtime.  */
+  if (!has_single_use (arg0))
+return NULL;
+
   if (TREE_CODE (arg1) == SSA_NAME)
 {
   /* Check if arg1 is an SSA_NAME.  */
@@ -278,6 +283,11 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   if (arg1_op.operands_occurs_in_abnormal_phi ())
return NULL;
 
+  /* If arg1 have > 1 use, then this transformation actually increases
+the number of expressions evaluated at runtime.  */
+  if (!has_single_use (arg1))
+   return NULL;
+
   /* Either arg1_def_stmt or arg0_def_stmt should be conditional.  */
   if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi), gimple_bb 
(arg0_def_stmt))
  && dominated_by_p (CDI_DOMINATORS,
@@ -295,80 +305,68 @@ factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   if (dominated_by_p (CDI_DOMINATORS, gimple_bb (phi), gimple_bb 
(arg0_def_stmt)))
return NULL;
 
-  /* If arg1 is an INTEGER_CST, fold it to new type.  */
-  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
- && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
- || (TYPE_PRECISION (TREE_TYPE (new_arg0))
+  /* Only handle if arg1 is a INTEGER_CST and one that fits
+into the new type or if it is the same precision.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
+ || !(int_fits_type_p (arg1, TREE_TYPE (new_arg0))
+  || (TYPE_PRECISION (TREE_TYPE (new_arg0))
   == TYPE_PRECISION (TREE_TYPE (arg1)
+   return NULL;
+
+  /* For the INTEGER_CST case, we are just moving the
+conversion from one place to another, which can often
+hurt as the conversion moves further away from the
+statement that computes the value.  So, perform this
+only if new_arg0 is an operand of COND_STMT, or
+if arg0_def_stmt is the only non-debug stmt in
+its basic block, because then it is possible this
+could enable further optimizations (minmax replacement
+etc.).  See PR71016.
+Note no-op conversions don't have this issue as
+it will not generate any zero/sign extend in that case.  */
+  if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
+  != TYPE_PRECISION (TREE_TYPE (arg1)))
+ && new_arg0 != gimple_cond_lhs (cond_stmt)
+ && new_arg0 != gimple_cond_rhs (cond_stmt)
+ && gimple_bb (arg0_def_stmt) == e0->src)
{
- if (gimple_assign_cast_p (arg0_def_stmt))
+ gsi = gsi_for_stmt (arg0_def_stmt);
+ gsi_prev_nondebug (&gsi);
+ if (!gsi_end_p (gsi))
{
- /* For the INTEGER_CST case, we are just moving the
-conversion from one place to another, which can often
-hurt as the conversion moves further away from the
-statement that computes the value.  So, perform this
-only if new_arg0 is an operand of COND_STMT, or
-if arg0_def_stmt is the only non-debug stmt in
-its basic block, because then it is possible this
-could enable further optimizations (minmax replacement
-etc.).  See PR71016.
-Note no-op conversions don't have this issue as
-it will not generate any zero/sign extend in that case.  */
- if ((TYPE_PRECISION (TREE_TYPE (new_arg0))
-   != TYPE_PRECISION (TREE

Re: [r15-3529 Regression] FAIL: gcc.dg/pr116588.c (test for excess errors) on Linux/x86_64

2024-09-07 Thread Andrew Pinski

On Sat, Sep 7, 2024 at 4:11 PM haochen.jiang  wrote:
>
> On Linux/x86_64,
>
> 506417dbc8b1cbc1133a5322572cf94b671aadf6 is the first bad commit
> commit 506417dbc8b1cbc1133a5322572cf94b671aadf6
> Author: Andrew MacLeod 
> Date:   Fri Sep 6 11:42:14 2024 -0400
>
> Before running fast VRP, make sure all edges have EXECUTABLE set.
>
> caused
>
> FAIL: gcc.dg/pr116588.c (test for excess errors)

Fixed with r15-3533-g35c2bcb2389d34 .

Thanks,
Andrew Pinski

>
> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3529/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr116588.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr116588.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
>
> (Please do not reply to this email, for question about this report, contact 
> me at haochen dot jiang at intel.com.)
> (If you met problems with cascadelake related, disabling AVX512F in command 
> line might save that.)
> (However, please make sure that there is no potential problems with AVX512.)

[PUSHED] Fix pr116588.c for -m32

2024-09-07 Thread Andrew Pinski

This is a simple fix which adds the target supports requirement of int128
to the testcase too.

Pushed as obvious after testing to make sure the testcase is UNSUPPORTED now
with -m32 but working with -m64 on x86_64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/pr116588.c: Require int128.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/pr116588.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr116588.c b/gcc/testsuite/gcc.dg/pr116588.c
index 677964dd1d6..6b0678d465e 100644
--- a/gcc/testsuite/gcc.dg/pr116588.c
+++ b/gcc/testsuite/gcc.dg/pr116588.c
@@ -1,5 +1,6 @@
 /* PR tree-optimization/116588 */
 /* { dg-do run { target bitint575 } } */
+/* { dg-require-effective-target int128 } */
 /* { dg-options "-O2 -fno-vect-cost-model -fno-tree-dominator-opts -fno-tree-fre --param=vrp-block-limit=0  -DDEBUG -fdump-tree-vrp2-details" } */
 
 int a;
-- 
2.43.0

[PUSHED] split-path: Fix dump wording about duplicating too many statements

2024-09-07 Thread Andrew Pinski

It was pointed out in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662183.html
that the wording of this print has too many words.
Fixed thusly.

Pushed as obvious after a build and test for x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (is_feasible_trace): Fix wording
on the print.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-split-paths.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 32b5c445760..886d85a94e4 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -208,7 +208,7 @@ is_feasible_trace (basic_block bb)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
-"Duplicating block %d would be too duplicate "
+"Duplicating block %d would duplicate "
 "too many statments: %d >= %d\n",
 bb->index, num_stmts_in_join,
 param_max_jump_thread_duplication_stmts);
-- 
2.43.0

Re: [PATCH] Add new warning Wmissing-designated-initializers [PR39589]

2024-09-07 Thread Andrew Pinski

On Mon, Aug 26, 2024 at 1:59 PM Peter Frost  wrote:
>
> Currently the behaviour of Wmissing-field-initializers is inconsistent
> between C and C++. The C warning assumes that missing designated
> initializers are deliberate, and does not warn. The C++ warning does warn
> for missing designated initializers.
>
> This patch changes the behaviour of Wmissing-field-initializers to
> universally not warn about missing designated initializers, and adds a new
> warning specifically for missing designated initializers.
>
> NOTE TO MAINTAINERS: This is my first gcc contribution, so I don't have
> git write access.
>
> Successfully tested on x86_64-pc-linux-gnu.
>
> PR c/39589
>
> gcc/c-family/ChangeLog:
>
> * c.opt:
>
> gcc/c/ChangeLog:
>
> * c-typeck.cc (pop_init_level):
>
> gcc/cp/ChangeLog:
>
> * typeck2.cc (process_init_constructor_record):
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/diagnostic/base.C:
> * gcc.dg/20011021-1.c:
> * gcc.dg/missing-field-init-1.c:
> * gcc.dg/pr60784.c:
> * g++.dg/warn/missing-designated-initializers-1.C: New test.
> * g++.dg/warn/missing-designated-initializers-2.C: New test.
> * gcc.dg/missing-designated-initializers-1.c: New test.
> * gcc.dg/missing-designated-initializers-2.c: New test.


Your changelog is incomplete. It only lists which files and
functions were changed, but not how.
Other than that the patch looks good to me but I can't approve it.

Thanks,
Andrew Pinski

>
>
> ---
>  gcc/c-family/c.opt|  4 +++
>  gcc/c/c-typeck.cc | 36 +++
>  gcc/cp/typeck2.cc | 20 ---
>  gcc/testsuite/g++.dg/diagnostic/base.C|  4 +--
>  .../warn/missing-designated-initializers-1.C  | 11 ++
>  .../warn/missing-designated-initializers-2.C  | 11 ++
>  gcc/testsuite/gcc.dg/20011021-1.c |  4 +--
>  .../missing-designated-initializers-1.c   | 13 +++
>  .../missing-designated-initializers-2.c   | 13 +++
>  gcc/testsuite/gcc.dg/missing-field-init-1.c   |  2 +-
>  gcc/testsuite/gcc.dg/pr60784.c|  2 +-
>  11 files changed, 96 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/warn/missing-designated-initializers-1.C
>  create mode 100644 gcc/testsuite/g++.dg/warn/missing-designated-initializers-2.C
>  create mode 100644 gcc/testsuite/gcc.dg/missing-designated-initializers-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/missing-designated-initializers-2.c
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 491aa02e1a3..81e52f1417e 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -977,6 +977,10 @@ Wmissing-field-initializers
>  C ObjC C++ ObjC++ Var(warn_missing_field_initializers) Warning 
> EnabledBy(Wextra)
>  Warn about missing fields in struct initializers.
>
> +Wmissing-designated-initializers
> +C ObjC C++ ObjC++ Var(warn_missing_designated_initializers) Warning 
> EnabledBy(Wextra)
> +Warn about missing designated initialisers in struct initializers.
> +
>  Wmissing-format-attribute
>  C ObjC C++ ObjC++ Warning Alias(Wsuggest-attribute=format)
>  ;
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 094e41fa202..72b544e8f67 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -9795,7 +9795,7 @@ pop_init_level (location_t loc, int implicit,
>  }
>
>/* Warn when some struct elements are implicitly initialized to zero.  */
> -  if (warn_missing_field_initializers
> +  if ((warn_missing_field_initializers || 
> warn_missing_designated_initializers)
>&& constructor_type
>&& TREE_CODE (constructor_type) == RECORD_TYPE
>&& constructor_unfilled_fields)
> @@ -9806,21 +9806,29 @@ pop_init_level (location_t loc, int implicit,
>|| integer_zerop (DECL_SIZE 
> (constructor_unfilled_fields
>   constructor_unfilled_fields = DECL_CHAIN 
> (constructor_unfilled_fields);
>
> -   if (constructor_unfilled_fields
> -   /* Do not warn if this level of the initializer uses member
> -  designators; it is likely to be deliberate.  */
> -   && !constructor_designated
> -   /* Do not warn about initializing with { 0 } or with { }.  */
> -   && !constructor_zeroinit)
> - {
> -   if (warning_at (input_location, OPT_Wmissing_field_initializers,
> + if (constructor_unfilled_fields
> + /* Do not warn about initializing with { 0 } or with { }.  */
> + &am

[PATCH] gimple-fold: Move optimizing memcpy to memset to fold_stmt from fab

2024-09-06 Thread Andrew Pinski

I noticed this folding inside fab could be done elsewhere and could
even improve inlining decisions and a few other things, so let's
move it to fold_stmt.
It also fixes PR 116601 because places which call fold_stmt already
have to deal with the stmt becoming a non-throw statement.
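
As a source-level illustration of the pattern being folded (a minimal sketch only, not the GIMPLE code from the patch; the struct and function names are made up for this example), copying from an object that was just zeroed leaves the destination in the same state as zeroing it directly:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate used only for this illustration.
struct S { int x[4]; };

// Before the fold: zero A, then copy A into B.
void zero_then_copy (S *a, S *b)
{
  std::memset (a, 0, sizeof (*a));
  std::memcpy (b, a, sizeof (*a));
}

// After the fold: the copy from the freshly zeroed A becomes a zeroing of B.
void zero_both (S *a, S *b)
{
  std::memset (a, 0, sizeof (*a));
  std::memset (b, 0, sizeof (*b));
}

// Returns true if both variants leave A and B in the same state.
bool same_result ()
{
  S a1, b1, a2, b2;
  // Start from non-zero contents so the zeroing is observable.
  std::memset (&a1, 0xff, sizeof (a1));
  std::memset (&b1, 0xff, sizeof (b1));
  std::memset (&a2, 0xff, sizeof (a2));
  std::memset (&b2, 0xff, sizeof (b2));
  zero_then_copy (&a1, &b1);
  zero_both (&a2, &b2);
  return std::memcmp (&a1, &a2, sizeof (a1)) == 0
	 && std::memcmp (&b1, &b2, sizeof (b1)) == 0;
}
```

The second form no longer reads from the source object, which is why a statement that could throw with -fnon-call-exceptions can become non-throwing.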

The fix for PR 116601 on the branches should be the original patch
rather than a backport of this one.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116601

gcc/ChangeLog:

* gimple-fold.cc (optimize_memcpy_to_memset): Move
from tree-ssa-ccp.cc and rename. Also return true
if the optimization happened.
(gimple_fold_builtin_memory_op): Call
optimize_memcpy_to_memset.
(fold_stmt_1): Call optimize_memcpy_to_memset for
load/store copies.
* tree-ssa-ccp.cc (optimize_memcpy): Delete.
(pass_fold_builtins::execute): Remove code that
calls optimize_memcpy.

gcc/testsuite/ChangeLog:

* gcc.dg/pr78408-1.c: Adjust dump scan to match where
the optimization now happens.
* g++.dg/torture/except-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc  | 134 
 gcc/testsuite/g++.dg/torture/except-2.C |  18 
 gcc/testsuite/gcc.dg/pr78408-1.c|   5 +-
 gcc/tree-ssa-ccp.cc | 132 +--
 4 files changed, 156 insertions(+), 133 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 2746fcfe314..942de7720fd 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -894,6 +894,121 @@ size_must_be_zero_p (tree size)
   return vr.zero_p ();
 }
 
+/* Optimize
+   a = {};
+   b = a;
+   into
+   a = {};
+   b = {};
+   Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
+   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
+
+static bool
+optimize_memcpy_to_memset (gimple_stmt_iterator *gsip, tree dest, tree src, 
tree len)
+{
+  gimple *stmt = gsi_stmt (*gsip);
+  if (gimple_has_volatile_ops (stmt))
+return false;
+
+  tree vuse = gimple_vuse (stmt);
+  if (vuse == NULL || TREE_CODE (vuse) != SSA_NAME)
+return false;
+
+  gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
+  tree src2 = NULL_TREE, len2 = NULL_TREE;
+  poly_int64 offset, offset2;
+  tree val = integer_zero_node;
+  if (gimple_store_p (defstmt)
+  && gimple_assign_single_p (defstmt)
+  && TREE_CODE (gimple_assign_rhs1 (defstmt)) == CONSTRUCTOR
+  && !gimple_clobber_p (defstmt))
+src2 = gimple_assign_lhs (defstmt);
+  else if (gimple_call_builtin_p (defstmt, BUILT_IN_MEMSET)
+  && TREE_CODE (gimple_call_arg (defstmt, 0)) == ADDR_EXPR
+  && TREE_CODE (gimple_call_arg (defstmt, 1)) == INTEGER_CST)
+{
+  src2 = TREE_OPERAND (gimple_call_arg (defstmt, 0), 0);
+  len2 = gimple_call_arg (defstmt, 2);
+  val = gimple_call_arg (defstmt, 1);
+  /* For non-0 val, we'd have to transform stmt from assignment
+into memset (only if dest is addressable).  */
+  if (!integer_zerop (val) && is_gimple_assign (stmt))
+   src2 = NULL_TREE;
+}
+
+  if (src2 == NULL_TREE)
+return false;
+
+  if (len == NULL_TREE)
+len = (TREE_CODE (src) == COMPONENT_REF
+  ? DECL_SIZE_UNIT (TREE_OPERAND (src, 1))
+  : TYPE_SIZE_UNIT (TREE_TYPE (src)));
+  if (len2 == NULL_TREE)
+len2 = (TREE_CODE (src2) == COMPONENT_REF
+   ? DECL_SIZE_UNIT (TREE_OPERAND (src2, 1))
+   : TYPE_SIZE_UNIT (TREE_TYPE (src2)));
+  if (len == NULL_TREE
+  || !poly_int_tree_p (len)
+  || len2 == NULL_TREE
+  || !poly_int_tree_p (len2))
+return false;
+
+  src = get_addr_base_and_unit_offset (src, &offset);
+  src2 = get_addr_base_and_unit_offset (src2, &offset2);
+  if (src == NULL_TREE
+  || src2 == NULL_TREE
+  || maybe_lt (offset, offset2))
+return false;
+
+  if (!operand_equal_p (src, src2, 0))
+return false;
+
+  /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
+ Make sure that
+ [ src + offset, src + offset + len - 1 ] is a subset of that.  */
+  if (maybe_gt (wi::to_poly_offset (len) + (offset - offset2),
+   wi::to_poly_offset (len2)))
+return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Simplified\n  ");
+  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+  fprintf (dump_file, "after previous\n  ");
+  print_gimple_stmt (dump_file, defstmt, 0, dump_flags);
+}
+
+  /* For simplicity, don't change the kind of the stmt,
+ turn dest = src; into dest = {}; and memcpy (&dest, &src, len);
+ into memset (&dest, val, len);
+ In theory we could change dest = src into memset if dest
+ is addressable (maybe beneficial if va

Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

2024-09-06 Thread Andrew Pinski

On Fri, Sep 6, 2024, 9:38 AM Dora, Sunil Kumar <
sunilkumar.d...@windriver.com> wrote:

> Hi Andrew,
>
> Thank you for your feedback. Initially, we attempted to address the issue
> by utilizing GCC’s response files. However, we discovered that the
> COLLECT_GCC_OPTIONS variable already contains the expanded contents of
> the response files.
>
> As a result, using response files only mitigates the multiplication factor
> but does not bypass the 128KB limit.
>

I think you misunderstood me. What I was suggesting is that, instead of
creating a string inside set_collect_gcc_options, you create the response file
there and pass it via COLLECT_GCC_OPTIONS in the @file format. Then,
inside collect2.cc, where COLLECT_GCC_OPTIONS/extract_string is used, read
the options from the response file if there was an @file, instead of running
those 2 loops. This requires more than what you did, but it should be less
memory hungry and maybe slightly faster.
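
To make the idea concrete, here is a rough sketch of the reading side. It is standalone C++ rather than GCC code, and read_options is a hypothetical helper name, not something that exists in collect2.cc; the real change would also need proper error handling:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: VALUE is the content of an environment variable such
// as COLLECT_GCC_OPTIONS.  If it has the form "@path", return the contents
// of that file; otherwise return VALUE unchanged.
std::string read_options (const std::string &value)
{
  if (value.empty () || value[0] != '@')
    return value;

  // Everything after the '@' is taken as the response file name.
  std::ifstream in (value.substr (1));
  std::ostringstream contents;
  contents << in.rdbuf ();
  return contents.str ();
}
```

With this shape the environment variable only ever carries a short "@path" string, so its size no longer depends on the number of options.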

Thanks,
Andrew



> I have included the response file usage logs and the complete history in
> the Bugzilla report for your reference: Bugzilla Link
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527#c19>.
> Following your suggestion, I have updated the logic to avoid hardcoding
> /tmp.
> Please find the revised version of patch at the following link:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662519.html
>
> Thanks,
> Sunil Dora
> --
> *From:* Andrew Pinski 
> *Sent:* Friday, August 30, 2024 8:05 PM
> *To:* Hemraj, Deepthi 
> *Cc:* gcc-patches@gcc.gnu.org ; rguent...@suse.de
> ; jeffreya...@gmail.com ;
> josmy...@redhat.com ; MacLeod, Randy <
> randy.macl...@windriver.com>; Gowda, Naveen ;
> Dora, Sunil Kumar 
> *Subject:* Re: [PATCH v2] GCC Driver : Enable very long gcc command-line
> option
>
> On Fri, Aug 30, 2024 at 12:34 AM  wrote:
> >
> > From: Deepthi Hemraj 
> >
> > For excessively long environment variables i.e >128KB
> > Store the arguments in a temporary file and collect them back together
> in collect2.
> >
> > This commit patches for COLLECT_GCC_OPTIONS issue:
> > GCC should not limit the length of command line passed to collect2.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
> >
> > The Linux kernel has the following limits on shell commands:
> > I.  Total number of bytes used to specify arguments must be under 128KB.
> > II. Each environment variable passed to an executable must be under 128
> KiB
> >
> > In order to circumvent these limitations, many build tools support
> > response-files, i.e. files that contain the arguments for the executed
> > command. These are typically passed using @ syntax.
> >
> > Gcc uses the COLLECT_GCC_OPTIONS environment variable to transfer the
> > expanded command line to collect2. With many options, this exceeds the
> limit II.
> >
> > GCC : Added Testcase for PR111527
> >
> > TC1 : If the command line argument less than 128kb, gcc should use
> >   COLLECT_GCC_OPTION to communicate and compile fine.
> > TC2 : If the command line argument in the range of 128kb to 2mb,
> >   gcc should copy arguments in a file and use FILE_GCC_OPTIONS
> >   to communicate and compile fine.
> > TC3 : If the command line argument greater than 2mb, gcc should
> >   fail the compile and report error. (Expected FAIL)
> >
> > Signed-off-by: sunil dora 
> > Signed-off-by: Topi Kuutela 
> > Signed-off-by: Deepthi Hemraj 
> > ---
> >  gcc/collect2.cc   | 39 ++--
> >  gcc/gcc.cc| 37 +--
> >  gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
> >  gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
> >  7 files changed, 159 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c
> >
> > diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> > index 902014a9cc1..1f56963b1ce 100644
> > --- a/gcc/collect2.cc
&g

[PATCH] fab: Factor out the main folding part of pass_fold_builtins::execute [PR116601]

2024-09-06 Thread Andrew Pinski

This is an alternative patch to fix PR tree-optimization/116601 by factoring
out the main part of pass_fold_builtins::execute into its own function so that
we don't need to repeat the code for doing the eh cleanup. It also fixes the
problem I saw with the atomics which might skip over a statement; though I don't
have a testcase for that.
Just a note on the return value of fold_all_builtin_stmt: it does not indicate
whether something was folded, but rather whether the iterator is already at the
next statement to handle (true) or still has to be moved to the next statement
(false). This was the bug with atomics: in some cases the atomic builtins could
remove the statement being processed, but then another gsi_next would still
happen.
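
The same convention can be shown with a plain std::list (a sketch only; process and drop_negatives are made-up names standing in for fold_all_builtin_stmt and the pass loop, and nothing here is GCC code):

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for fold_all_builtin_stmt: removes negative values.
// Returns true if IT already points at the next element to handle (because
// the current one was erased), false if the caller must advance IT.
bool process (std::list<int> &l, std::list<int>::iterator &it)
{
  if (*it < 0)
    {
      // erase returns the iterator following the removed element.
      it = l.erase (it);
      return true;
    }
  return false;
}

// Driver loop: advance only when process says the iterator was not moved.
std::list<int> drop_negatives (std::list<int> l)
{
  for (auto it = l.begin (); it != l.end (); )
    if (!process (l, it))
      ++it;
  return l;
}
```

If the loop advanced unconditionally after a removal, the element right after the removed one would be skipped, which is the kind of problem described above for the atomics.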

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116601

gcc/ChangeLog:

* tree-ssa-ccp.cc (optimize_memcpy): Return true if the statement
was updated.
(pass_fold_builtins::execute): Factor out folding code into ...
(fold_all_builtin_stmt): This.

gcc/testsuite/ChangeLog:

* g++.dg/torture/except-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/torture/except-2.C |  18 +
 gcc/tree-ssa-ccp.cc | 534 
 2 files changed, 276 insertions(+), 276 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C

diff --git a/gcc/testsuite/g++.dg/torture/except-2.C 
b/gcc/testsuite/g++.dg/torture/except-2.C
new file mode 100644
index 000..d896937a118
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/except-2.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
+// PR tree-optimization/116601
+
+struct RefitOption {
+  char subtype;
+  int string;
+} n;
+void h(RefitOption);
+void k(RefitOption *__val)
+{
+  try {
+*__val = RefitOption{};
+RefitOption __trans_tmp_2 = *__val;
+h(__trans_tmp_2);
+  }
+  catch(...){}
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 44711018e0e..930432e3244 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -4166,18 +4166,19 @@ optimize_atomic_op_fetch_cmp_0 (gimple_stmt_iterator 
*gsip,
a = {};
b = {};
Similarly for memset (&a, ..., sizeof (a)); instead of a = {};
-   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;  */
+   and/or memcpy (&b, &a, sizeof (a)); instead of b = a;
+   Returns true if the statement was changed.  */
 
-static void
+static bool
 optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, tree src, tree len)
 {
   gimple *stmt = gsi_stmt (*gsip);
   if (gimple_has_volatile_ops (stmt))
-return;
+return false;
 
   tree vuse = gimple_vuse (stmt);
   if (vuse == NULL)
-return;
+return false;
 
   gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
   tree src2 = NULL_TREE, len2 = NULL_TREE;
@@ -4202,7 +4203,7 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, 
tree src, tree len)
 }
 
   if (src2 == NULL_TREE)
-return;
+return false;
 
   if (len == NULL_TREE)
 len = (TREE_CODE (src) == COMPONENT_REF
@@ -4216,24 +4217,24 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, 
tree src, tree len)
   || !poly_int_tree_p (len)
   || len2 == NULL_TREE
   || !poly_int_tree_p (len2))
-return;
+return false;
 
   src = get_addr_base_and_unit_offset (src, &offset);
   src2 = get_addr_base_and_unit_offset (src2, &offset2);
   if (src == NULL_TREE
   || src2 == NULL_TREE
   || maybe_lt (offset, offset2))
-return;
+return false;
 
   if (!operand_equal_p (src, src2, 0))
-return;
+return false;
 
   /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
  Make sure that
  [ src + offset, src + offset + len - 1 ] is a subset of that.  */
   if (maybe_gt (wi::to_poly_offset (len) + (offset - offset2),
wi::to_poly_offset (len2)))
-return;
+return false;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -4271,6 +4272,237 @@ optimize_memcpy (gimple_stmt_iterator *gsip, tree dest, 
tree src, tree len)
   fprintf (dump_file, "into\n  ");
   print_gimple_stmt (dump_file, stmt, 0, dump_flags);
 }
+  return true;
+}
+
+/* Fold statement STMT located at I. Maybe setting CFG_CHANGED if
+   the condition was changed and cfg_cleanup is needed to be run.
+   Returns true if the iterator I is at the statement to handle;
+   otherwise false means move the iterator to the next statement.  */
+static int
+fold_all_builtin_stmt (gimple_stmt_iterator &i, gimple *stmt,
+  bool &cfg_changed)
+{
+  /* Remove assume internal function calls. */
+  if (gimple_call_internal_p (stmt, IFN_ASSUME))
+{
+  gsi_remove (&i, true);
+  return true;
+   }
+
+  if (gimple_code (stmt) != GIMPLE_CALL)
+{
+  if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
+   return optimize_memcpy (&i,

Re: [PATCH] gimple ssa: Don't use __builtin_popcount in switch exp transform [PR116616]

2024-09-06 Thread Andrew Pinski

 RESULT.  Assumes that OP's value will
> -   be non-negative.  The generated check may give arbitrary answer for 
> negative
> -   values.
> -
> -   Before computing the check, OP may have to be converted to another type.
> -   This should be specified in TYPE.  Use can_pow2p to decide what this type
> -   should be.
> -
> -   Should only be used if can_pow2p returns true for type of OP.  */
> +/* Build a sequence of gimple statements checking that OP is a power of 2.
> +   Return the result as a boolean_type_node ssa name through RESULT.  Assumes
> +   that OP's value will be non-negative.  The generated check may give
> +   arbitrary answer for negative values.  */
>
>  static gimple_seq
> -gen_pow2p (tree op, location_t loc, tree *result, tree type)
> +gen_pow2p (tree op, location_t loc, tree *result)
>  {
>gimple_seq stmts = NULL;
>gimple_stmt_iterator gsi = gsi_last (stmts);
>
> -  built_in_function fn;
> -  if (type == unsigned_type_node)
> -fn = BUILT_IN_POPCOUNT;
> -  else if (type == long_unsigned_type_node)
> -fn = BUILT_IN_POPCOUNTL;
> -  else
> -{
> -  fn = BUILT_IN_POPCOUNTLL;
> -  gcc_checking_assert (type == long_long_unsigned_type_node);
> -}
> +  tree type = TREE_TYPE (op);
> +
> +  /* Build (op ^ (op - 1)) > (op - 1).  */
> +  tree tmp1 = gimple_build (&gsi, false, GSI_NEW_STMT, loc, MINUS_EXPR, type,
> +   op, build_one_cst (type));
> +  tree tmp2 = gimple_build (&gsi, false, GSI_NEW_STMT, loc, BIT_XOR_EXPR, 
> type,
> +   op, tmp1);
> +  *result = gimple_build (&gsi, false, GSI_NEW_STMT, loc, GT_EXPR,
> + boolean_type_node, tmp2, tmp1);

You need to do this in an unsigned type. Otherwise you get the wrong
answer and also introduce undefined behavior.
So you need to use:
tree utype = unsigned_type_for (type);
tree tmp3;
if (types_compatible_p (type, utype))
  tmp3 = op;
else
  tmp3 = gimple_build (&gsi, false, GSI_NEW_STMT, loc, CONVERT_EXPR, utype, op);

And then use utype and tmp3 instead of type and op.
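
In plain C++ terms the check looks roughly like this (a sketch for illustration; is_pow2 is a made-up name, and the conversion to unsigned mirrors the CONVERT_EXPR above). In a signed type, op - 1 overflows for the minimum value, which is undefined; in the unsigned type, 0 - 1 simply wraps:

```cpp
#include <cassert>

// Power-of-2 test using (x ^ (x - 1)) > (x - 1), done in the unsigned type.
bool is_pow2 (int op)
{
  // Convert first, so the subtraction and the comparison are unsigned.
  unsigned int u = static_cast<unsigned int> (op);
  // Wraps around to the maximum value for u == 0; no undefined behavior.
  unsigned int m = u - 1;
  return (u ^ m) > m;
}
```

For 0 the XOR and m are both all-ones, so the comparison is false; for a value with a single bit set the XOR has that bit plus all lower bits set, which is greater than m.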

Thanks,
Andrew Pinski

>
> -  tree orig_type = TREE_TYPE (op);
> -  tree tmp1;
> -  if (type != orig_type)
> -tmp1 = gimple_convert (&gsi, false, GSI_NEW_STMT, loc, type, op);
> -  else
> -tmp1 = op;
> -  /* Build __builtin_popcount{l,ll} (op) == 1.  */
> -  tree tmp2 = gimple_build (&gsi, false, GSI_NEW_STMT, loc,
> -   as_combined_fn (fn), integer_type_node, tmp1);
> -  *result = gimple_build (&gsi, false, GSI_NEW_STMT, loc, EQ_EXPR,
> - boolean_type_node, tmp2,
> - build_one_cst (integer_type_node));
>return stmts;
>  }
>
> @@ -371,9 +323,6 @@ switch_conversion::is_exp_index_transform_viable (gswitch 
> *swtch)
>m_exp_index_transform_log2_type = can_log2 (index_type, opt_type);
>if (!m_exp_index_transform_log2_type)
>  return false;
> -  m_exp_index_transform_pow2p_type = can_pow2p (index_type);
> -  if (!m_exp_index_transform_pow2p_type)
> -return false;
>
>/* Check that each case label corresponds only to one value
>   (no case 1..3).  */
> @@ -467,8 +416,7 @@ switch_conversion::exp_index_transform (gswitch *swtch)
>new_edge2->probability = profile_probability::even ();
>
>tree tmp;
> -  gimple_seq stmts = gen_pow2p (index, UNKNOWN_LOCATION, &tmp,
> -   m_exp_index_transform_pow2p_type);
> +  gimple_seq stmts = gen_pow2p (index, UNKNOWN_LOCATION, &tmp);
>gsi = gsi_last_bb (cond_bb);
>gsi_insert_seq_after (&gsi, stmts, GSI_LAST_NEW_STMT);
>gcond *stmt_cond = gimple_build_cond (NE_EXPR, tmp, boolean_false_node,
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index 14610499e5f..6468995eb31 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -920,11 +920,9 @@ public:
>bool m_exp_index_transform_applied;
>
>/* If switch conversion decided exponential index transform is viable, here
> - will be stored the types to which index variable has to be converted
> - before the logarithm and the "is power of 2" operations which are part 
> of
> - the transform.  */
> + will be stored the type to which index variable has to be converted
> + before the logarithm operation which is a part of the transform.  */
>tree m_exp_index_transform_log2_type;
> -  tree m_exp_index_transform_pow2p_type;
>  };
>
>  void
> --
> 2.46.0
>

Re: [PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-05 Thread Andrew Pinski

On Tue, Sep 3, 2024 at 11:30 PM Kyrylo Tkachov  wrote:
>
> Hi Andrew,
>
> > On 3 Sep 2024, at 20:11, Andrew Pinski  wrote:
> >
> > This moves the check for # of statements to copy in join to
> > be the first check. This check is the cheapest check so it
> > should be first. Plus add a print to the dump file since there
> > was none beforehand.
> >
> > gcc/ChangeLog:
> >
> >* gimple-ssa-split-paths.cc (is_feasible_trace): Move
> >check for # of statments in join earlier and add a
> >debug print.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> > gcc/gimple-ssa-split-paths.cc | 19 +--
> > 1 file changed, 13 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
> > index 8b4304fe59e..81a5d1dee5b 100644
> > --- a/gcc/gimple-ssa-split-paths.cc
> > +++ b/gcc/gimple-ssa-split-paths.cc
> > @@ -167,6 +167,19 @@ is_feasible_trace (basic_block bb)
> >   int num_stmts_in_pred2
> > = EDGE_COUNT (pred2->succs) == 1 ? count_stmts_in_block (pred2) : 0;
> >
> > +  /* Upper Hard limit on the number statements to copy.  */
> > +  if (num_stmts_in_join
> > +  >= param_max_jump_thread_duplication_stmts)
> > +{
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +   fprintf (dump_file,
> > +"Duplicating block %d would be too duplicate "
> > +"too many statments: %d >= %d\n",
>
> The “be too” is unnecessary here IMO.

I will fix this tomorrow; just trying to get a different fix done.

Thanks,
Andrew

> Thanks,
> Kyrill
>
> > +bb->index, num_stmts_in_join,
> > +param_max_jump_thread_duplication_stmts);
> > +  return false;
> > +}
> > +
> >   /* This is meant to catch cases that are likely opportunities for
> >  if-conversion.  Essentially we look for the case where
> >  BB's predecessors are both single statement blocks where
> > @@ -406,12 +419,6 @@ is_feasible_trace (basic_block bb)
> >   /* We may want something here which looks at dataflow and tries
> >  to guess if duplication of BB is likely to result in simplification
> >  of instructions in BB in either the original or the duplicate.  */
> > -
> > -  /* Upper Hard limit on the number statements to copy.  */
> > -  if (num_stmts_in_join
> > -  >= param_max_jump_thread_duplication_stmts)
> > -return false;
> > -
> >   return true;
> > }
> >
> > --
> > 2.43.0
> >
>

Re: [PATCH] fab: Cleanup eh after optimize_memcpy [PR116601]

2024-09-05 Thread Andrew Pinski

On Thu, Sep 5, 2024 at 12:26 AM Richard Biener
 wrote:
>
> On Thu, Sep 5, 2024 at 8:25 AM Andrew Pinski  wrote:
> >
> > When optimize_memcpy was added in r7-5443-g7b45d0dfeb5f85,
> > a path was added such that a statement was turned into a non-throwing
> > statement and maybe_clean_or_replace_eh_stmt/gimple_purge_dead_eh_edges
> > would not be called for that statement.
> > This adds these calls to that path.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > Ok? For the trunk, 14, 13 and 12 branches?
>
> I wonder if this can be somehow integrated better with the existing
>
>   old_stmt = stmt;
>   stmt = gsi_stmt (i);
>   update_stmt (stmt);
>
>   if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
>   && gimple_purge_dead_eh_edges (bb))
> cfg_changed = true;
>
> which frankly looks odd - update_stmt shouldn't ever change stmt.  Maybe
> moving the old_stmt assign before the switch works?

I agree it looks odd/wrong. But only moving the assignment before the
switch does not fix this issue, since the statement here is not a builtin
call; it is a memcpy-like statement:
  __trans_tmp_2 = MEM[(const struct RefitOption &)__val_5(D)];

I have a set of patches to refactor this code to simplify and fix the
issue with the update_stmt and more (since there are issues with the
atomic replacements too).

>
> > PR tree-optimization/116601
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-ccp.cc (pass_fold_builtins::execute): Cleanup eh
> >     after optimize_memcpy on a mem statement.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/torture/except-2.C: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/testsuite/g++.dg/torture/except-2.C | 18 ++
> >  gcc/tree-ssa-ccp.cc | 11 +--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C
> >
> > diff --git a/gcc/testsuite/g++.dg/torture/except-2.C 
> > b/gcc/testsuite/g++.dg/torture/except-2.C
> > new file mode 100644
> > index 000..d896937a118
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/torture/except-2.C
> > @@ -0,0 +1,18 @@
> > +// { dg-do compile }
> > +// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
> > +// PR tree-optimization/116601
> > +
> > +struct RefitOption {
> > +  char subtype;
> > +  int string;
> > +} n;
> > +void h(RefitOption);
> > +void k(RefitOption *__val)
> > +{
> > +  try {
> > +*__val = RefitOption{};
> > +RefitOption __trans_tmp_2 = *__val;
> > +h(__trans_tmp_2);
> > +  }
> > +  catch(...){}
> > +}
> > diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> > index 44711018e0e..3cd385f476b 100644
> > --- a/gcc/tree-ssa-ccp.cc
> > +++ b/gcc/tree-ssa-ccp.cc
> > @@ -4325,8 +4325,15 @@ pass_fold_builtins::execute (function *fun)
> >if (gimple_code (stmt) != GIMPLE_CALL)
> > {
> >   if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
> > -   optimize_memcpy (&i, gimple_assign_lhs (stmt),
> > -gimple_assign_rhs1 (stmt), NULL_TREE);
> > +   {
> > + optimize_memcpy (&i, gimple_assign_lhs (stmt),
> > +  gimple_assign_rhs1 (stmt), NULL_TREE);
> > + old_stmt = stmt;
> > + stmt = gsi_stmt (i);
> > + if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
> > + && gimple_purge_dead_eh_edges (bb))
> > +   cfg_changed = true;
> > +   }
> >   gsi_next (&i);
> >   continue;
> > }
> > --
> > 2.43.0
> >

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-09-05 Thread Andrew Pinski

On Thu, Sep 5, 2024 at 2:57 PM Jeff Law  wrote:
>
>
>
> On 9/5/24 12:59 PM, Palmer Dabbelt wrote:
> > On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> >> We have cheap logical ops, so let's just move this back to the default
> >> to take advantage of the standard branch/op heuristics.
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/116615
> >> * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> >> ---
> >> There's a bunch more discussion in the bug, but it's starting to smell
> >> like this was just a holdover from MIPS (where maybe it also shouldn't
> >> be set).  I haven't tested this, but I figured I'd send the patch to get
> >> a little more visibility.
> >>
> >> I guess we should also kick off something like a SPEC run to make sure
> >> there's no regressions?
> >
> > Sorry I missed it in the bug, but Ruoyao points to dddafe94823
> > ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where short-
> > circuiting the FP comparisons helps on LoongArch.
> >
> > Not sure if I'm also missing something here, but it kind of feels like
> > that should be handled by a more generic optimization decision that just
> > globally "should we short circuit logical ops" -- assuming it really is
> > the FP comparisons that are causing the cost, as opposed to the actual
> > logical ops themselves.
> >
> > Probably best to actually run the benchmarks, though...
> The #define essentially is overriding the generic heuristics which look
> at branch cost to determine how aggressively to try and combine several
> conditional branch conditions using logical ops so they can use a single
> conditional branch in the end.
>
> I don't remember all the history here, but in retrospect, the mere
> existence of that #define points to a failing in the costing models.

I provided the original history of LOGICAL_OP_NON_SHORT_CIRCUIT in the
RISC-V bug report.
And yes, there is a cost model failure here.
LOGICAL_OP_NON_SHORT_CIRCUIT is useful if you have a decent cset (or
these days have a ccmp optab).
One cost model issue is that LOGICAL_OP_NON_SHORT_CIRCUIT does not take
into account whether the comparison is fp or integer (which would handle
Loongson and MIPS, and to a lesser extent RISC-V).
The PowerPC backend does not implement the ccmp optab, nor does it have a
cset with a decent cost, so having it as 0 is correct, even though
BRANCH_COST might be low for the target (it could implement the ccmp optab
now, but nobody has done that yet).
Note RISC-V's cset is cheap (both size and speed) due to being close to
MIPS and just having instructions which set the GPRs and then
comparing against 0.
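
For anyone not familiar with what the macro controls, here is the difference at the source level (a sketch; both_sc and both_nsc are made-up names, and what a compiler actually emits for either form depends on the target and options):

```cpp
#include <cassert>

// Short-circuit form: the second comparison is only evaluated if the first
// one passes, which typically means two conditional branches.
bool both_sc (int a, int b)
{
  if (a == 0)
    return false;
  return b != 0;
}

// Non-short-circuit form: both comparisons are materialized as 0/1 values
// (cset-like) and combined with a bitwise AND, leaving at most one branch.
bool both_nsc (int a, int b)
{
  return (a != 0) & (b != 0);
}
```

The two forms give the same answer when the operands have no side effects; which one is cheaper depends on how expensive it is to materialize the comparison results compared with taking a branch, and that is the cost model question here.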

I don't have time until next year to start looking at improving the
situation with respect to LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST; it
is on my radar since I want to improve how aarch64's ccmp is done and
move the use of LOGICAL_OP_NON_SHORT_CIRCUIT out of fold-const so that it
is only in the ifcombine (or maybe even just in isel) pass.

Thanks,
Andrew Pinski

>
> FWIW, my general sense is that the gimple phases shouldn't work *too*
> hard to try and combine logical ops, but the if-converters in the RTL
> phases should be fairly aggressive.  The fact that we use BRANCH_COST
> to drive both is likely sub-optimal.
> jeff

[PATCH] fab: Cleanup eh after optimize_memcpy [PR116601]

2024-09-04 Thread Andrew Pinski

When optimize_memcpy was added in r7-5443-g7b45d0dfeb5f85,
a path was added such that a statement was turned into a non-throwing
statement and maybe_clean_or_replace_eh_stmt/gimple_purge_dead_eh_edges
would not be called for that statement.
This adds these calls to that path.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Ok? For the trunk, 14, 13 and 12 branches?

PR tree-optimization/116601

gcc/ChangeLog:

* tree-ssa-ccp.cc (pass_fold_builtins::execute): Cleanup eh
after optimize_memcpy on a mem statement.

gcc/testsuite/ChangeLog:

* g++.dg/torture/except-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/torture/except-2.C | 18 ++
 gcc/tree-ssa-ccp.cc | 11 +--
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C

diff --git a/gcc/testsuite/g++.dg/torture/except-2.C 
b/gcc/testsuite/g++.dg/torture/except-2.C
new file mode 100644
index 000..d896937a118
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/except-2.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
+// PR tree-optimization/116601
+
+struct RefitOption {
+  char subtype;
+  int string;
+} n;
+void h(RefitOption);
+void k(RefitOption *__val)
+{
+  try {
+*__val = RefitOption{};
+RefitOption __trans_tmp_2 = *__val;
+h(__trans_tmp_2);
+  }
+  catch(...){}
+}
diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 44711018e0e..3cd385f476b 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -4325,8 +4325,15 @@ pass_fold_builtins::execute (function *fun)
   if (gimple_code (stmt) != GIMPLE_CALL)
{
  if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
-   optimize_memcpy (&i, gimple_assign_lhs (stmt),
-gimple_assign_rhs1 (stmt), NULL_TREE);
+   {
+ optimize_memcpy (&i, gimple_assign_lhs (stmt),
+  gimple_assign_rhs1 (stmt), NULL_TREE);
+ old_stmt = stmt;
+ stmt = gsi_stmt (i);
+ if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
+ && gimple_purge_dead_eh_edges (bb))
+   cfg_changed = true;
+   }
  gsi_next (&i);
  continue;
}
-- 
2.43.0

Re: [PATCH] aarch64: Handle attributes in the global namespace for aarch64_lookup_shared_state_flags [PR116598]

2024-09-04 Thread Andrew Pinski

On Wed, Sep 4, 2024 at 2:44 PM Andrew Pinski  wrote:
>
> On Wed, Sep 4, 2024 at 2:36 PM Marek Polacek  wrote:
> >
> > On Wed, Sep 04, 2024 at 02:05:21PM -0700, Andrew Pinski wrote:
> > > The code in aarch64_lookup_shared_state_flags assumed that all C++11
> > > attributes on the function type had a namespace associated with them.
> > > But with the addition of reproducible/unsequenced, this was no longer true.
> > > This is the simple fix to ignore attributes in the global namespace since
> > > we are looking for ones in the `arm` namespace instead.
> > >
> > > Built and tested for aarch64-linux-gnu.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): 
> > > Ignore
> > >   attributes in the global namespace.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > >  gcc/config/aarch64/aarch64.cc | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 27e24ba70ab..3f7bc572edc 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -597,6 +597,10 @@ aarch64_lookup_shared_state_flags (tree attrs, const 
> > > char *state_name)
> > >if (!cxx11_attribute_p (attr))
> > >   continue;
> > >
> > > +  /* Skip the attributes in the global namespace. */
> > > +  if (!TREE_PURPOSE (TREE_PURPOSE (attr)))
> > > + continue;
> > > +
> > >auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
> > >if (strcmp (ns, "arm") != 0)
> > >   continue;
> >
> > Take it or leave it, but I think the whole thing could be just
> >
> >   tree ns = get_attribute_namespace (attr);
> >   if (!ns || !id_equal (ns, "arm"))
> > continue;
>
> Actually I think it could be reduced further down to just:
> if (!is_attribute_namespace_p ("arm", attr))
>   continue;
>
> In this case.
>
> If I get some time this weekend I will submit a patch to clean up this code.

In the end I did it today.
New patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662325.html

Thanks again for the suggestion and the review.

Thanks,
Andrew

>
> Thanks,
> Andrew Pinski
>
> >
> > You could also use get_attribute_name below for attr_name.
> >
> > Marek
> >

[PATCH] aarch64: Use is_attribute_namespace_p and get_attribute_name inside aarch64_lookup_shared_state_flags [PR116598]

2024-09-04 Thread Andrew Pinski

The code in aarch64_lookup_shared_state_flags assumed that all C++11
attributes on the function type had a namespace associated with them.
But with the addition of reproducible/unsequenced, this is no longer true.

This fixes the issue by using is_attribute_namespace_p instead of manually
checking whether the namespace is named "arm", and by using
get_attribute_name instead of manually grabbing the attribute name.
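
To illustrate why the null check matters, here is a minimal sketch on a toy
data model. The `toy_attr` type and `toy_in_namespace_p` are hypothetical
stand-ins (GCC represents attributes as trees); the only point is that an
attribute without a namespace must simply fail to match rather than be
dereferenced.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for an attribute.  NS is nullptr for attributes
// in the global namespace (for example [[reproducible]]).
struct toy_attr
{
  const char *ns;
  const char *name;
};

// Return true if ATTR lives in namespace NS.  A missing namespace never
// matches, so callers do not need a separate null check.
static bool
toy_in_namespace_p (const char *ns, const toy_attr &attr)
{
  assert (ns != nullptr);
  return attr.ns != nullptr && std::strcmp (attr.ns, ns) == 0;
}
```

With a helper of this shape, the loop body only needs one test before
looking at the attribute name.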

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

PR target/116598
* config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): Use
is_attribute_namespace_p and get_attribute_name instead of manually
grabbing the namespace and name of the attribute.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 27e24ba70ab..6a3f1a23a9f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -594,14 +594,10 @@ aarch64_lookup_shared_state_flags (tree attrs, const char *state_name)
 {
   for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
 {
-  if (!cxx11_attribute_p (attr))
+  if (!is_attribute_namespace_p ("arm", attr))
continue;
 
-  auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
-  if (strcmp (ns, "arm") != 0)
-   continue;
-
-  auto attr_name = IDENTIFIER_POINTER (TREE_VALUE (TREE_PURPOSE (attr)));
+  auto attr_name = IDENTIFIER_POINTER (get_attribute_name (attr));
   auto flags = aarch64_attribute_shared_state_flags (attr_name);
   if (!flags)
continue;
-- 
2.43.0

Re: [PATCH] aarch64: Handle attributes in the global namespace for aarch64_lookup_shared_state_flags [PR116598]

2024-09-04 Thread Andrew Pinski

On Wed, Sep 4, 2024 at 2:36 PM Marek Polacek  wrote:
>
> On Wed, Sep 04, 2024 at 02:05:21PM -0700, Andrew Pinski wrote:
> > The code in aarch64_lookup_shared_state_flags all C++11 attributes on the 
> > function type
> > had a namespace associated with them. But with the addition of 
> > reproducible/unsequenced,
> > this was no longer true.
> > This is the simple fix to ignore attributes in the global namespace since 
> > we are looking
> > for ones in the `arm` namespace instead.
> >
> > Built and tested for aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): 
> > Ignore
> >   attributes in the global namespace.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/aarch64.cc | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 27e24ba70ab..3f7bc572edc 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -597,6 +597,10 @@ aarch64_lookup_shared_state_flags (tree attrs, const 
> > char *state_name)
> >if (!cxx11_attribute_p (attr))
> >   continue;
> >
> > +  /* Skip the attributes in the global namespace. */
> > +  if (!TREE_PURPOSE (TREE_PURPOSE (attr)))
> > + continue;
> > +
> >auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
> >if (strcmp (ns, "arm") != 0)
> >   continue;
>
> Take it or leave it, but I think the whole thing could be just
>
>   tree ns = get_attribute_namespace (attr);
>   if (!ns || !id_equal (ns, "arm"))
> continue;

Actually I think it could be reduced further down to just:
if (!is_attribute_namespace_p ("arm", attr))
  continue;

In this case.

If I get some time this weekend I will submit a patch to clean up this code.

Thanks,
Andrew Pinski

>
> You could also use get_attribute_name below for attr_name.
>
> Marek
>

[PATCH] aarch64: Handle attributes in the global namespace for aarch64_lookup_shared_state_flags [PR116598]

2024-09-04 Thread Andrew Pinski

The code in aarch64_lookup_shared_state_flags assumed that all C++11
attributes on the function type had a namespace associated with them.
But with the addition of reproducible/unsequenced, this was no longer true.
This is the simple fix: ignore attributes in the global namespace, since we
are looking for ones in the `arm` namespace instead.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): Ignore
attributes in the global namespace.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 27e24ba70ab..3f7bc572edc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -597,6 +597,10 @@ aarch64_lookup_shared_state_flags (tree attrs, const char *state_name)
   if (!cxx11_attribute_p (attr))
continue;
 
+  /* Skip the attributes in the global namespace. */
+  if (!TREE_PURPOSE (TREE_PURPOSE (attr)))
+   continue;
+
   auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
   if (strcmp (ns, "arm") != 0)
continue;
-- 
2.43.0

[PATCH] expand: Add dump for costing of positive divides

2024-09-03 Thread Andrew Pinski

While trying to understand PR 115910, I found it useful to print out
the costs of the signed and unsigned division sequences, just as was done in
r15-3272-g3c89c41991d8e8 for popcount==1.
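
The selection logic being dumped can be sketched roughly as follows. This
is a toy model, not the code in expr.cc: `seq_costs` and `prefer_unsigned`
are hypothetical names, and the costs are passed in directly instead of
being computed from insn sequences.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical cost pair for one expansion: the cost when optimizing
// for speed and the cost when optimizing for size.
struct seq_costs
{
  unsigned speed;
  unsigned size;
};

// Return true if the unsigned sequence should be emitted.  If the
// primary costs tie, the other factor is used as the tie breaker.
// DUMP may be null, in which case nothing is printed.
static bool
prefer_unsigned (seq_costs uns, seq_costs sgn, bool speed_p,
		 bool unsignedp, std::FILE *dump)
{
  unsigned uns_cost = speed_p ? uns.speed : uns.size;
  unsigned sgn_cost = speed_p ? sgn.speed : sgn.size;
  bool was_tie = false;

  if (uns_cost == sgn_cost)
    {
      uns_cost = speed_p ? uns.size : uns.speed;
      sgn_cost = speed_p ? sgn.size : sgn.speed;
      was_tie = true;
    }

  if (dump)
    std::fprintf (dump, "positive division:%s unsigned cost: %u; "
		  "signed cost: %u\n",
		  was_tie ? " (needed tie breaker)" : "",
		  uns_cost, sgn_cost);

  return uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp);
}
```

Note that after a tie the printed costs are the tie-breaker costs, which
is why the dump says whether the tie breaker was needed.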

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (expand_expr_divmod): Add dump of the two costs for
positive division.

Signed-off-by: Andrew Pinski 
---
 gcc/expr.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 320be8b17a1..7a471f20e79 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9648,6 +9648,7 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
   end_sequence ();
   unsigned uns_cost = seq_cost (uns_insns, speed_p);
   unsigned sgn_cost = seq_cost (sgn_insns, speed_p);
+  bool was_tie = false;
 
   /* If costs are the same then use as tie breaker the other other
 factor.  */
@@ -9655,8 +9656,14 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
{
  uns_cost = seq_cost (uns_insns, !speed_p);
  sgn_cost = seq_cost (sgn_insns, !speed_p);
+ was_tie = true;
}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf(dump_file, "positive division:%s unsigned cost: %u; "
+ "signed cost: %u\n", was_tie ? "(needed tie breaker)":"",
+ uns_cost, sgn_cost);
+
   if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp))
{
  emit_insn (uns_insns);
-- 
2.43.0

[PUSHED] aarch64: Fix testcase vec-init-22-speed.c [PR116589]

2024-09-03 Thread Andrew Pinski

For this testcase, the trunk produces:
```
f_s16:
        fmov    s31, w0
        fmov    s0, w1
```

While the testcase was expecting what was produced in GCC 14:
```
f_s16:
        sxth    w0, w0
        sxth    w1, w1
        fmov    d31, x0
        fmov    d0, x1
```

After r15-1575-gea8061f46a30 the code was:
```
dup v31.4h, w0
dup v0.4h, w1
```
But since ext-dce was added in r15-1901-g98914f9eba5f19, we get better
code generation, with only the two fmov's.

Pushed as obvious after running the testcase.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vec-init-22-speed.c: Update scan for better
code gen.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c b/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
index 993ef8c4161..6edc82831a0 100644
--- a/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-22-speed.c
@@ -7,6 +7,6 @@
 
 #include "vec-init-22.h"
 
-/* { dg-final { scan-assembler-times {\tfmov\td[0-9]+, x[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmov\ts[0-9]+, w[0-9]+} 2 } } */
 /* { dg-final { scan-assembler-times {\tins\tv[0-9]+\.h\[[1-3]\], w[0-9]+} 6 } } */
 /* { dg-final { scan-assembler {\tzip1\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h} } } */
-- 
2.43.0

[PATCH] object-size: Use simple_dce_from_worklist in object-size pass

2024-09-03 Thread Andrew Pinski

While trying to see if there was a way to improve the object-size pass
to use the ranger (for pointer plus), I noticed that it leaves behind
the statement containing __builtin_object_size even after it has been
reduced to a constant.
This fixes that by using simple_dce_from_worklist.
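
As a rough illustration of the idea (not the actual
simple_dce_from_worklist implementation), here is a worklist DCE on a toy
SSA form. `toy_def` and `toy_dce_from_worklist` are hypothetical; each
definition is identified by its index, which plays the role of the SSA
version.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical SSA definition: the versions it uses, whether it must be
// kept regardless of uses, and whether it has been deleted.
struct toy_def
{
  std::vector<unsigned> operands;
  bool side_effects = false;
  bool removed = false;
};

// Count the remaining uses of VERSION among live definitions.
static unsigned
count_uses (const std::vector<toy_def> &defs, unsigned version)
{
  unsigned n = 0;
  for (const toy_def &d : defs)
    if (!d.removed)
      for (unsigned op : d.operands)
	if (op == version)
	  ++n;
  return n;
}

// Delete every definition reachable from WORKLIST that has no uses and
// no side effects.  Operands of a deleted definition are queued, since
// they may have become dead too.  Returns the number of deletions.
static unsigned
toy_dce_from_worklist (std::vector<toy_def> &defs,
		       std::set<unsigned> worklist)
{
  unsigned removed = 0;
  while (!worklist.empty ())
    {
      unsigned v = *worklist.begin ();
      worklist.erase (worklist.begin ());
      assert (v < defs.size ());

      toy_def &d = defs[v];
      if (d.removed || d.side_effects || count_uses (defs, v) != 0)
	continue;

      d.removed = true;
      ++removed;
      for (unsigned op : d.operands)
	worklist.insert (op);
    }
  return removed;
}
```

The pass only has to record the lhs whose uses were just replaced; the
worklist walk then cleans up that statement and anything that only fed it.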

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-object-size.cc (object_sizes_execute): Mark lhs for maybe dceing
if doing a propagate. Call simple_dce_from_worklist.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-object-size.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 4c1fa9b555f..6544730e153 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "gimplify-me.h"
 #include "gimplify.h"
+#include "tree-ssa-dce.h"
 
 struct object_size_info
 {
@@ -2187,6 +2188,7 @@ static unsigned int
 object_sizes_execute (function *fun, bool early)
 {
   todo = 0;
+  auto_bitmap sdce_worklist;
 
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
@@ -2277,13 +2279,18 @@ object_sizes_execute (function *fun, bool early)
 
  /* Propagate into all uses and fold those stmts.  */
  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-   replace_uses_by (lhs, result);
+   {
+ replace_uses_by (lhs, result);
+ /* Mark lhs as being possiblely DCEd. */
+ bitmap_set_bit (sdce_worklist, SSA_NAME_VERSION (lhs));
+   }
  else
replace_call_with_value (&i, result);
}
 }
 
   fini_object_sizes ();
+  simple_dce_from_worklist (sdce_worklist);
   return todo;
 }
 
-- 
2.43.0

Re: [pushed] c++: support C++11 attributes in C++98

2024-09-03 Thread Andrew Pinski

On Tue, Sep 3, 2024 at 3:01 PM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> -- 8< --
>
> I don't see any reason why we can't allow the [[]] attribute syntax in C++98
> mode with a pedwarn just like many other C++11 features.  In fact, we
> already do support it in some places in the grammar, but not in places that
> check cp_nth_tokens_can_be_std_attribute_p.
>
> Let's also follow the C front-end's lead in only warning about them when
>  -pedantic.
>
> It still isn't necessary for this function to guard against Objective-C
> message passing syntax; we handle that with tentative parsing in
> cp_parser_statement, and we don't call this function in that context anyway.
>
> gcc/cp/ChangeLog:
>
> * parser.cc (cp_nth_tokens_can_be_std_attribute_p): Don't check
> cxx_dialect.
> * error.cc (maybe_warn_cpp0x): Only complain about C++11 attributes
> if pedantic.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp0x/gen-attrs-1.C: Also run in C++98 mode.
> * g++.dg/cpp0x/gen-attrs-11.C: Likewise.
> * g++.dg/cpp0x/gen-attrs-13.C: Likewise.
> * g++.dg/cpp0x/gen-attrs-15.C: Likewise.
> * g++.dg/cpp0x/gen-attrs-75.C: Don't expect C++98 warning after
> __extension__.
> ---
>  gcc/cp/error.cc   |  7 ---
>  gcc/cp/parser.cc  |  9 -
>  gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C  |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/gen-attrs-13.C |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/gen-attrs-15.C |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C | 10 +-
>  7 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
> index 57cd76caf49..4a9e9aa3cdc 100644
> --- a/gcc/cp/error.cc
> +++ b/gcc/cp/error.cc
> @@ -4735,9 +4735,10 @@ maybe_warn_cpp0x (cpp0x_warn_str str, location_t 
> loc/*=input_location*/)
>  "only available with %<-std=c++11%> or %<-std=gnu++11%>");
>  break;
>case CPP0X_ATTRIBUTES:
> -   pedwarn (loc, OPT_Wc__11_extensions,
> -"C++11 attributes "
> -"only available with %<-std=c++11%> or %<-std=gnu++11%>");
> +   if (pedantic)
> + pedwarn (loc, OPT_Wc__11_extensions,
> +  "C++11 attributes "
> +  "only available with %<-std=c++11%> or %<-std=gnu++11%>");

Shouldn't the warning also change to mention -std=gnu++98 now? Or
maybe be reworded a little more?

Thanks,
Andrew Pinski


> break;
>case CPP0X_REF_QUALIFIER:
> pedwarn (loc, OPT_Wc__11_extensions,
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index edfa5a49440..64122d937fa 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -29924,11 +29924,10 @@ cp_nth_tokens_can_be_std_attribute_p (cp_parser 
> *parser, size_t n)
>  {
>cp_token *token = cp_lexer_peek_nth_token (parser->lexer, n);
>
> -  return (cxx_dialect >= cxx11
> - && ((token->type == CPP_KEYWORD && token->keyword == RID_ALIGNAS)
> - || (token->type == CPP_OPEN_SQUARE
> - && (token = cp_lexer_peek_nth_token (parser->lexer, n + 1))
> - && token->type == CPP_OPEN_SQUARE)));
> +  return ((token->type == CPP_KEYWORD && token->keyword == RID_ALIGNAS)
> + || (token->type == CPP_OPEN_SQUARE
> + && (token = cp_lexer_peek_nth_token (parser->lexer, n + 1))
> + && token->type == CPP_OPEN_SQUARE));
>  }
>
>  /* Return TRUE iff the next Nth tokens in the stream are possibly the
> diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C 
> b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
> index c2cf912047e..b1625d96916 100644
> --- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-1.C
> @@ -1,3 +1,3 @@
> -// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wno-c++11-extensions" }
>
>  int  [[gnu::format(printf, 1, 2)]] foo(const char *, ...); // { 
> dg-warning "only applies to function types" }
> diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C 
> b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
> index 504b4565679..040f15c9dbb 100644
> --- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-11.C
> @@ -1,4 +1,4 @@
> -// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-Wno-c++11-

[PATCH 2/2] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski

This simplifies the heuristic that split-paths uses to see if the join
bb is an ifcvt candidate.
Each predecessor bb needs to be either empty or contain only one
statement that could be a decent ifcvt candidate.
The previous heuristics would miss that:
```
if (a) goto B else goto C;
B:  goto C;
C:
c = PHI
```

Would be a decent ifcvt candidate. And would also miss:
```
if (a) goto B else goto C;
B: d = f + 1;  goto C;
C:
c = PHI
```

Also, since at most 3 cmovs can currently be produced, we
should only assume that joins with `<= 3` phis can be ifcvt candidates.
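
The predecessor check described above can be sketched on a toy IR as
follows. Everything here is a hypothetical model (the `toy_*` types, the
`costly` flag standing in for poor_ifcvt_candidate_code, and the way the
phi limit is applied); it is only meant to show the shape of the test, not
the code in gimple-ssa-split-paths.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR; none of these types exist in GCC.
struct toy_stmt
{
  std::string lhs;	// name defined by the statement, "" if none
  bool costly;		// true for operations that convert poorly
};

struct toy_block
{
  unsigned num_succs;	// number of outgoing edges
  std::vector<toy_stmt> stmts;
};

struct toy_phi
{
  std::string arg0, arg1;
};

// Assumed limit, taken from the "<= 3 phis" remark above.
const std::size_t toy_max_phis = 3;

// Return true if PRED makes the join a poor ifcvt candidate.
static bool
toy_poor_ifcvt_pred (const toy_block &pred, const std::vector<toy_phi> &phis)
{
  assert (pred.num_succs != 0);
  // More than one successor: this is the block holding the condition.
  if (pred.num_succs != 1)
    return false;
  // An empty middle block is never a poor candidate.
  if (pred.stmts.empty ())
    return false;
  // Otherwise require exactly one cheap statement ...
  if (pred.stmts.size () != 1)
    return true;
  const toy_stmt &s = pred.stmts[0];
  if (s.lhs.empty () || s.costly)
    return true;
  // ... whose result feeds a PHI in the join.
  for (const toy_phi &phi : phis)
    if (phi.arg0 == s.lhs || phi.arg1 == s.lhs)
      return false;
  return true;
}

// Return true if the join with predecessors P1 and P2 looks like an
// ifcvt candidate, in which case splitting the path is not worthwhile.
static bool
toy_ifcvt_candidate_p (const toy_block &p1, const toy_block &p2,
		       const std::vector<toy_phi> &phis)
{
  return (phis.size () <= toy_max_phis
	  && !toy_poor_ifcvt_pred (p1, phis)
	  && !toy_poor_ifcvt_pred (p2, phis));
}
```

Both shapes from the examples above (an empty B, and a B containing only
`d = f + 1` feeding the PHI) come out as candidates under this model.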

The testcase change for split-path-6.c is because the lookharder function
is a true ifcvt case where we get a cmov as expected; it looks like it
was not a candidate when the heuristic was added but became one later on.
pr88797.C is now rejected because it is an ifcvt candidate rather than
because of DCE/const prop.

The rest of the testsuite changes are just a slight change in the dump,
removing the "*diamond" part, as it was removed from the print.

Bootstrapped and tested on x86_64.

PR tree-optimization/112402

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (poor_ifcvt_pred): New function.
(is_feasible_trace): Remove old heuristics for ifcvt cases.
For num_stmts <= 1 in both preds, check poor_ifcvt_pred on both
preds.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/split-path-11.c: Update scan.
* gcc.dg/tree-ssa/split-path-2.c: Update scan.
* gcc.dg/tree-ssa/split-path-5.c: Update scan.
* gcc.dg/tree-ssa/split-path-6.c: Update scan.
* g++.dg/tree-ssa/pr88797.C: Update scan.
* gcc.dg/tree-ssa/split-path-13.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-split-paths.cc | 172 ++
 gcc/testsuite/g++.dg/tree-ssa/pr88797.C   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-11.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-13.c |  26 +++
 gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c  |   4 +-
 7 files changed, 88 insertions(+), 122 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/split-path-13.c

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 81a5d1dee5b..32b5c445760 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "fold-const.h"
+#include "cfghooks.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -141,6 +142,40 @@ poor_ifcvt_candidate_code (enum tree_code code)
  || code == CALL_EXPR);
 }
 
+/* Return TRUE if PRED of BB is an poor ifcvt candidate. */
+static bool
+poor_ifcvt_pred (basic_block pred, basic_block bb)
+{
+  /* If the edge count of the pred is not 1, then
+ this is the predecessor from the if rather
+ than middle one. */
+  if (EDGE_COUNT (pred->succs) != 1)
+return false;
+
+  /* Empty middle bb are never a poor ifcvt candidate. */
+  if (empty_block_p (pred))
+return false;
+  /* If BB's predecessors are single statement blocks where
+ the output of that statement feed the same PHI in BB,
+ it an ifcvt candidate. */
+  gimple *stmt = last_and_only_stmt (pred);
+  if (!stmt || gimple_code (stmt) != GIMPLE_ASSIGN)
+return true;
+  tree_code code = gimple_assign_rhs_code (stmt);
+  if (poor_ifcvt_candidate_code (code))
+return true;
+  tree lhs = gimple_assign_lhs (stmt);
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *phi = gsi_stmt (gsi);
+  if (gimple_phi_arg_def (phi, 0) == lhs
+ || gimple_phi_arg_def (phi, 1) == lhs)
+   return false;
+}
+  return true;
+}
+
 /* Return TRUE if BB is a reasonable block to duplicate by examining
its size, false otherwise.  BB will always be a loop latch block.
 
@@ -181,127 +216,30 @@ is_feasible_trace (basic_block bb)
 }
 
   /* This is meant to catch cases that are likely opportunities for
- if-conversion.  Essentially we look for the case where
- BB's predecessors are both single statement blocks where
- the output of that statement feed the same PHI in BB.  */
-  if (num_stmts_in_pred1 == 1 && num_stmts_in_pred2 == 1)
-{
-  gimple *stmt1 = last_and_only_stmt (pred1);
-  gimple *stmt2 = last_and_only_stmt (pred2);
-
-  if (stmt1 && stmt2
- && gimple_code (stmt1) == GIMPLE_ASSIGN
- && gimple_code (stmt2) == GIMPLE_ASSIGN)
-   {
- enum tree_code code1 = gimple_assign_rhs_code (stmt1);
- enum tree_code code2 = gimple_assign_rhs_code (stmt

[PATCH 1/2] split-paths: Move check for # of statements in join earlier

2024-09-03 Thread Andrew Pinski

This moves the check for the number of statements to copy in the join to
be the first check. It is the cheapest check, so it
should come first. Also add a print to the dump file, since there
was none beforehand.

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (is_feasible_trace): Move
check for # of statements in join earlier and add a
debug print.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-split-paths.cc | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 8b4304fe59e..81a5d1dee5b 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -167,6 +167,19 @@ is_feasible_trace (basic_block bb)
   int num_stmts_in_pred2
 = EDGE_COUNT (pred2->succs) == 1 ? count_stmts_in_block (pred2) : 0;
 
+  /* Upper Hard limit on the number statements to copy.  */
+  if (num_stmts_in_join
+  >= param_max_jump_thread_duplication_stmts)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Duplicating block %d would be too duplicate "
+"too many statments: %d >= %d\n",
+bb->index, num_stmts_in_join,
+param_max_jump_thread_duplication_stmts);
+  return false;
+}
+
   /* This is meant to catch cases that are likely opportunities for
  if-conversion.  Essentially we look for the case where
  BB's predecessors are both single statement blocks where
@@ -406,12 +419,6 @@ is_feasible_trace (basic_block bb)
   /* We may want something here which looks at dataflow and tries
  to guess if duplication of BB is likely to result in simplification
  of instructions in BB in either the original or the duplicate.  */
-
-  /* Upper Hard limit on the number statements to copy.  */
-  if (num_stmts_in_join
-  >= param_max_jump_thread_duplication_stmts)
-return false;
-
   return true;
 }
 
-- 
2.43.0

[PATCH] split-path: Improve ifcvt heurstic for split path [PR112402]

2024-09-03 Thread Andrew Pinski

This simplifies the heuristic that split-paths uses to see if the join
bb is an ifcvt candidate.
Each predecessor bb needs to be either empty or contain only one
statement that could be a decent ifcvt candidate.
The previous heuristics would miss that:
```
if (a) goto B else goto C;
B:  goto C;
C:
c = PHI
```

Would be a decent ifcvt candidate. And would also miss:
```
if (a) goto B else goto C;
B: d = f + 1;  goto C;
C:
c = PHI
```

Also, since at most 3 cmovs can currently be produced, we
should only assume that joins with `<= 3` phis can be ifcvt candidates.

The testcase change for split-path-6.c is because the lookharder function
is a true ifcvt case where we get a cmov as expected; it looks like it
was not a candidate when the heuristic was added but became one later on.
pr88797.C is now rejected because it is an ifcvt candidate rather than
because of DCE/const prop.

The rest of the testsuite changes are just a slight change in the dump,
removing the "*diamond" part, as it was removed from the print.

Bootstrapped and tested on x86_64.

PR tree-optimization/112402

gcc/ChangeLog:

* gimple-ssa-split-paths.cc (poor_ifcvt_pred): New function.
(is_feasible_trace): Remove old heuristics for ifcvt cases.
For num_stmts <= 1 in both preds, check poor_ifcvt_pred on both
preds.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/split-path-11.c: Update scan.
* gcc.dg/tree-ssa/split-path-2.c: Update scan.
* gcc.dg/tree-ssa/split-path-5.c: Update scan.
* gcc.dg/tree-ssa/split-path-6.c: Update scan.
* g++.dg/tree-ssa/pr88797.C: Update scan.
* gcc.dg/tree-ssa/split-path-13.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-split-paths.cc | 172 ++
 gcc/testsuite/g++.dg/tree-ssa/pr88797.C   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-11.c |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-13.c |  26 +++
 gcc/testsuite/gcc.dg/tree-ssa/split-path-2.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-6.c  |   4 +-
 7 files changed, 88 insertions(+), 122 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/split-path-13.c

diff --git a/gcc/gimple-ssa-split-paths.cc b/gcc/gimple-ssa-split-paths.cc
index 81a5d1dee5b..32b5c445760 100644
--- a/gcc/gimple-ssa-split-paths.cc
+++ b/gcc/gimple-ssa-split-paths.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "fold-const.h"
+#include "cfghooks.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -141,6 +142,40 @@ poor_ifcvt_candidate_code (enum tree_code code)
  || code == CALL_EXPR);
 }
 
+/* Return TRUE if PRED of BB is an poor ifcvt candidate. */
+static bool
+poor_ifcvt_pred (basic_block pred, basic_block bb)
+{
+  /* If the edge count of the pred is not 1, then
+ this is the predecessor from the if rather
+ than middle one. */
+  if (EDGE_COUNT (pred->succs) != 1)
+return false;
+
+  /* Empty middle bb are never a poor ifcvt candidate. */
+  if (empty_block_p (pred))
+return false;
+  /* If BB's predecessors are single statement blocks where
+ the output of that statement feed the same PHI in BB,
+ it an ifcvt candidate. */
+  gimple *stmt = last_and_only_stmt (pred);
+  if (!stmt || gimple_code (stmt) != GIMPLE_ASSIGN)
+return true;
+  tree_code code = gimple_assign_rhs_code (stmt);
+  if (poor_ifcvt_candidate_code (code))
+return true;
+  tree lhs = gimple_assign_lhs (stmt);
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *phi = gsi_stmt (gsi);
+  if (gimple_phi_arg_def (phi, 0) == lhs
+ || gimple_phi_arg_def (phi, 1) == lhs)
+   return false;
+}
+  return true;
+}
+
 /* Return TRUE if BB is a reasonable block to duplicate by examining
its size, false otherwise.  BB will always be a loop latch block.
 
@@ -181,127 +216,30 @@ is_feasible_trace (basic_block bb)
 }
 
   /* This is meant to catch cases that are likely opportunities for
- if-conversion.  Essentially we look for the case where
- BB's predecessors are both single statement blocks where
- the output of that statement feed the same PHI in BB.  */
-  if (num_stmts_in_pred1 == 1 && num_stmts_in_pred2 == 1)
-{
-  gimple *stmt1 = last_and_only_stmt (pred1);
-  gimple *stmt2 = last_and_only_stmt (pred2);
-
-  if (stmt1 && stmt2
- && gimple_code (stmt1) == GIMPLE_ASSIGN
- && gimple_code (stmt2) == GIMPLE_ASSIGN)
-   {
- enum tree_code code1 = gimple_assign_rhs_code (stmt1);
- enum tree_code code2 = gimple_assign_rhs_code (stmt

Re: [PATCH 1/3] SVE intrinsics: Fold constant operands.

2024-09-03 Thread Andrew Pinski

On Fri, Aug 30, 2024 at 4:41 AM Jennifer Schmitz  wrote:
>
> This patch implements constant folding of binary operations for SVE intrinsics
> by calling the constant-folding mechanism of the middle-end for a given
> tree_code.
> In fold-const.cc, the code for folding vector constants was moved from
> const_binop to a new function vector_const_binop. This function takes a
> function pointer as argument specifying how to fold the vector elements.
> The code for folding operations where the first operand is a vector
> constant and the second argument is an integer constant was also moved
> into vector_const_binop to fold binary SVE intrinsics where the second
> operand is an integer (_n).
> In the aarch64 backend, the new function aarch64_const_binop was
> created, which - in contrast to int_const_binop - does not treat operations as
> overflowing. This function is passed as callback to vector_const_binop
> during gimple folding in intrinsic implementations.
> Because aarch64_const_binop calls poly_int_binop, the latter was made public.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?

This broke almost all targets (except for aarch64 and riscv, since
those are NUM_POLY_INT_COEFFS != 1 targets), because the assert in
poly_int_binop for NUM_POLY_INT_COEFFS now comes before the check for
both arg1/arg2 being INTEGER_CST, since you moved that check from
int_const_binop into poly_int_binop.

The obvious patch would move the assert below the check for
INTEGER_CSTs. I can't test it right now though.
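
To make the ordering issue concrete, here is a toy model of the suggested
fix. It is not the fold-const.cc code: `toy_num_coeffs`, `toy_operand` and
`toy_binop_plus` are hypothetical, and the polynomial path is elided. The
point is only that the plain-constant path must return before the assert
is reached.

```cpp
#include <cassert>

// Stand-in for NUM_POLY_INT_COEFFS; assumed to be 1 here, as on most
// targets.
constexpr int toy_num_coeffs = 1;

struct toy_operand
{
  bool is_integer_cst;	// true for a plain integer constant
  long value;
};

// Add A and B into RES.  Returns false if the fold is not possible.
static bool
toy_binop_plus (long &res, const toy_operand &a, const toy_operand &b)
{
  // Handle plain integer constants first; this path is valid on every
  // target.
  if (a.is_integer_cst && b.is_integer_cst)
    {
      res = a.value + b.value;
      return true;
    }

  // Only the polynomial path needs more than one coefficient, so the
  // assert belongs after the early return above.
  assert (toy_num_coeffs != 1);
  return false;	// polynomial folding elided in this sketch
}
```

With the assert placed first instead, every call would trip it on a target
where the coefficient count is 1, even for two plain constants.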

Thanks,
Andrew Pinski

>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
> * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
> New function to fold binary SVE intrinsics without overflow.
> * config/aarch64/aarch64-sve-builtins.h: Declare aarch64_const_binop.
> * fold-const.h: Declare vector_const_binop.
> * fold-const.cc (const_binop): Remove cases for vector constants.
> (vector_const_binop): New function that folds vector constants
> element-wise.
> (int_const_binop): Remove call to wide_int_binop.
> (poly_int_binop): Add call to wide_int_binop.

Re: [PATCH v1 7/9] aarch64: Disable the anchors

2024-09-02 Thread Andrew Pinski

On Mon, Sep 2, 2024 at 6:12 AM Evgeny Karpov
 wrote:
>
> The anchors have been disabled as they use symbol + offset, which is
> not applicable for COFF AArch64.

This does not make sense to me at all. Anchors are a small
optimization to group together some static decls so that you could
reuse an anchor point.
Could you expand on this and why you think disabling is correct?
It is so you could do:
        adrp    x0, .LANCHOR0
        add     x2, x0, :lo12:.LANCHOR0
        ldr     w1, [x0, #:lo12:.LANCHOR0]
        ldr     w0, [x2, 4]

Rather than:
        adrp    x1, t
        adrp    x0, t1
        ldr     w1, [x1, #:lo12:t]
        ldr     w0, [x0, #:lo12:t1]
        add     w0, w1, w0

Notice how there is only one adrp in the anchor case.
Could you expand on why the section anchors don't work for pe-coff?

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (TARGET_MIN_ANCHOR_OFFSET):
> Keep default TARGET_MAX_ANCHOR_OFFSET for PECOFF target.
> (TARGET_MAX_ANCHOR_OFFSET): Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 1d88814f28d..eea9ac02df0 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -30961,11 +30961,13 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_MIN_ANCHOR_OFFSET
>  #define TARGET_MIN_ANCHOR_OFFSET -256
>
> +#if !TARGET_PECOFF
>  /* Limit the maximum anchor offset to 4k-1, since that's the limit for a
> byte offset; we can do much more for larger data types, but have no way
> to determine the size of the access.  We assume accesses are aligned.  */
>  #undef TARGET_MAX_ANCHOR_OFFSET
>  #define TARGET_MAX_ANCHOR_OFFSET 4095
> +#endif
>
>  #undef TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
>  #define TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT \
> --
> 2.34.1
>

Re: [PATCH 2/8] i386: Optimize ordered and nonequal

2024-09-02 Thread Andrew Pinski

On Mon, Sep 2, 2024 at 11:20 AM Jakub Jelinek  wrote:
>
> On Mon, Aug 26, 2024 at 02:42:31PM +0800, Haochen Jiang wrote:
> >   * match.pd: Optimize (and ordered non-equal) to
> >   (not (or unordered  equal))
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/optimize_one.c: New test.
>
> The testcase FAILs on i686-linux, because it uses -mfpmath=sse
> without enabling -msse2.
>
> I've committed the following fix as obvious to fix that.
>
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6636,6 +6636,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (ltgt @0 @0)
> >   (if (!flag_trapping_math || !tree_expr_maybe_nan_p (@0))
> >{ constant_boolean_node (false, type); }))
> > +(simplify
> > + (bit_and (ordered @0 @1) (ne @0 @1))
> > + (bit_not (uneq @0 @1)))
>
> I wonder whether there shouldn't be some :c (e.g. on bit_and and maybe
> ne too), because ordered is commutative and so is ne and so is bit_and,
> and perhaps you want to match also (bit_and (ne @0 @1) (ordered @1 @0))
> etc.  What about negation of this (bit_ior (unordered @0 @1) (eq @0 @1))?

The :c is needed for bit_and for sure. But it should not be needed for
ordered/ne, because canonicalization of the operations
should already put the operands in the same order, as `a ordered b` is the
same as `b ordered a`.
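
For reference, the identity behind the pattern and the commutativity
claim can be checked at the source level. This is only a sketch using
std::isunordered on doubles; the helper names are made up for the example
and have nothing to do with match.pd itself.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// (a ordered b) && (a != b)
static bool
ordered_and_ne (double a, double b)
{
  return !std::isunordered (a, b) && a != b;
}

// !(a uneq b), i.e. !((a unordered b) || (a == b))
static bool
not_uneq (double a, double b)
{
  return !(std::isunordered (a, b) || a == b);
}

// Check the identity and operand commutativity over a few
// representative values, including NaN, signed zeros and infinity.
static bool
identity_holds ()
{
  const double nan = std::numeric_limits<double>::quiet_NaN ();
  const double inf = std::numeric_limits<double>::infinity ();
  const double vals[] = { nan, -inf, -1.0, -0.0, 0.0, 1.0, inf };

  for (double a : vals)
    for (double b : vals)
      {
	if (ordered_and_ne (a, b) != not_uneq (a, b))
	  return false;
	if (ordered_and_ne (a, b) != ordered_and_ne (b, a))
	  return false;
      }
  return true;
}
```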

Thanks,
Andrew Pinski

>
> And I think the test is really badly named...
>
> 2024-09-02  Jakub Jelinek  
>
> * gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.
>
> --- gcc/testsuite/gcc.target/i386/optimize_one.c.jj 2024-09-02 
> 15:41:30.070228957 +0200
> +++ gcc/testsuite/gcc.target/i386/optimize_one.c2024-09-02 
> 20:09:14.151727645 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mfpmath=sse" } */
> +/* { dg-options "-O2 -mfpmath=sse -msse2" } */
>  /* { dg-final { scan-assembler-times "comi" 1 } } */
>  /* { dg-final { scan-assembler-times "set" 1 } } */
>
>
>
> Jakub
>

Re: [PATCH] MATCH: add abs support for half float

2024-09-01 Thread Andrew Pinski

On Sun, Sep 1, 2024 at 4:27 PM Kugan Vivekanandarajah
 wrote:
>
> Hi Andrew.
>
> > On 28 Aug 2024, at 2:23 pm, Andrew Pinski  wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Aug 27, 2024 at 8:54 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >> Thanks for the reply.
> >>
> >>> On 27 Aug 2024, at 7:05 pm, Richard Biener  
> >>> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
> >>>  wrote:
> >>>>
> >>>> Hi Richard,
> >>>>
> >>>>> On 22 Aug 2024, at 10:34 pm, Richard Biener 
> >>>>>  wrote:
> >>>>>
> >>>>>
> >>>>> On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
> >>>>>  wrote:
> >>>>>>
> >>>>>> Hi Richard,
> >>>>>>
> >>>>>>> On 20 Aug 2024, at 6:09 pm, Richard Biener 
> >>>>>>>  wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> Thanks for the comments.
> >>>>>>>>
> >>>>>>>>> On 2 Aug 2024, at 8:36 pm, Richard Biener 
> >>>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
> >>>>>>>>>  wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 1 Aug 2024, at 10:46 pm, Richard Biener 
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski 
> >>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
> >>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> >>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> >>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
> >>>>>>>>>>>>>>>>>>  wrote:

[PATCH] slsr: Use simple_dce_from_worklist in SLSR [PR116554]

2024-08-31 Thread Andrew Pinski

While working on a phiopt patch, it was noticed that
SLSR would leave around some unused ssa names. Let's
add simple_dce_from_worklist usage to SLSR to remove
the dead statements. This should give a small improvement
for passes afterwards.

Bootstrapped and tested on x86_64.

gcc/ChangeLog:

PR tree-optimization/116554
* gimple-ssa-strength-reduction.cc: Include tree-ssa-dce.h.
(replace_mult_candidate): Add sdce_worklist argument, mark
the rhs1/rhs2 for maybe dceing.
(replace_unconditional_candidate): Add sdce_worklist argument,
Update call to replace_mult_candidate.
(replace_conditional_candidate): Add sdce_worklist argument,
update call to replace_mult_candidate.
(replace_uncond_cands_and_profitable_phis): Add sdce_worklist argument,
update call to replace_conditional_candidate,
replace_unconditional_candidate, and 
replace_uncond_cands_and_profitable_phis.
(replace_one_candidate): Add sdce_worklist argument, mark
the orig_rhs1/orig_rhs2 for maybe dceing.
(replace_profitable_candidates): Add sdce_worklist argument,
update call to replace_one_candidate and replace_profitable_candidates.
(analyze_candidates_and_replace): Call simple_dce_from_worklist and
update calls to replace_profitable_candidates, and
replace_uncond_cands_and_profitable_phis.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-ssa-strength-reduction.cc | 59 +++-
 1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/gcc/gimple-ssa-strength-reduction.cc 
b/gcc/gimple-ssa-strength-reduction.cc
index 1cb3625c7eb..39cd9339c77 100644
--- a/gcc/gimple-ssa-strength-reduction.cc
+++ b/gcc/gimple-ssa-strength-reduction.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-affine.h"
 #include "tree-eh.h"
 #include "builtins.h"
+#include "tree-ssa-dce.h"
 
 /* Information about a strength reduction candidate.  Each statement
in the candidate table represents an expression of one of the
@@ -2126,7 +2127,8 @@ cand_already_replaced (slsr_cand_t c)
replace_conditional_candidate.  */
 
 static void
-replace_mult_candidate (slsr_cand_t c, tree basis_name, offset_int bump)
+replace_mult_candidate (slsr_cand_t c, tree basis_name, offset_int bump,
+   auto_bitmap &sdce_worklist)
 {
   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
@@ -2193,6 +2195,11 @@ replace_mult_candidate (slsr_cand_t c, tree basis_name, 
offset_int bump)
   if (cand_code != NEGATE_EXPR) {
rhs1 = gimple_assign_rhs1 (c->cand_stmt);
rhs2 = gimple_assign_rhs2 (c->cand_stmt);
+   /* Mark the 2 original rhs for maybe DCEing.  */
+   if (TREE_CODE (rhs1) == SSA_NAME)
+ bitmap_set_bit (sdce_worklist, SSA_NAME_VERSION (rhs1));
+   if (TREE_CODE (rhs2) == SSA_NAME)
+ bitmap_set_bit (sdce_worklist, SSA_NAME_VERSION (rhs2));
   }
   if (cand_code != NEGATE_EXPR
  && ((operand_equal_p (rhs1, basis_name, 0)
@@ -2237,7 +2244,7 @@ replace_mult_candidate (slsr_cand_t c, tree basis_name, 
offset_int bump)
folded value ((i - i') * S) is referred to here as the "bump."  */
 
 static void
-replace_unconditional_candidate (slsr_cand_t c)
+replace_unconditional_candidate (slsr_cand_t c, auto_bitmap &sdce_worklist)
 {
   slsr_cand_t basis;
 
@@ -2247,7 +2254,8 @@ replace_unconditional_candidate (slsr_cand_t c)
   basis = lookup_cand (c->basis);
   offset_int bump = cand_increment (c) * wi::to_offset (c->stride);
 
-  replace_mult_candidate (c, gimple_assign_lhs (basis->cand_stmt), bump);
+  replace_mult_candidate (c, gimple_assign_lhs (basis->cand_stmt), bump,
+ sdce_worklist);
 }
 
 /* Return the index in the increment vector of the given INCREMENT,
@@ -2507,7 +2515,8 @@ create_phi_basis (slsr_cand_t c, gimple *from_phi, tree 
basis_name,
basis.  */
 
 static void
-replace_conditional_candidate (slsr_cand_t c)
+replace_conditional_candidate (slsr_cand_t c, auto_bitmap &sdce_worklist)
+
 {
   tree basis_name, name;
   slsr_cand_t basis;
@@ -2527,7 +2536,7 @@ replace_conditional_candidate (slsr_cand_t c)
   /* Replace C with an add of the new basis phi and a constant.  */
   offset_int bump = c->index * wi::to_offset (c->stride);
 
-  replace_mult_candidate (c, name, bump);
+  replace_mult_candidate (c, name, bump, sdce_worklist);
 }
 
 /* Recursive helper function for phi_add_costs.  SPREAD is a measure of
@@ -2608,7 +2617,8 @@ phi_add_costs (gimple *phi, slsr_cand_t c, int 
one_add_cost)
so, replace the candidate and introduce the compensation code.  */
 
 static void
-replace_uncond_cands_and_profitable_phis (slsr_cand_t c)
+replace_uncond_cands_and_profitable_phis (slsr_can

[PUSHED] libobjc: Add cast to void* to disable warning for casting between incompatible function types [PR89586]

2024-08-31 Thread Andrew Pinski

Even though __objc_get_forward_imp returns an IMP type, the result will be cast
to a compatible function type before it is called. So adding a cast to `void*`
disables the warning about casting between incompatible function types.
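
A minimal sketch of the idiom, not the libobjc code itself: `toy_imp`
and the `toy_*` functions below are hypothetical stand-ins for `IMP` and
the forwarding hooks. Note that converting between function pointers and
`void *` is an extension that GCC (and POSIX) support rather than
something ISO C guarantees, so `-Wpedantic` may still comment on it.

```c
#include <assert.h>

/* Hypothetical stand-in for libobjc's IMP type.  */
typedef void *(*toy_imp) (void *, const char *);

/* A forwarding hook whose real type differs from toy_imp.  */
double toy_double_forward (int x)
{
  return x * 0.5;
}

/* A direct cast, (toy_imp) toy_double_forward, is the kind of cast
   -Wcast-function-type is meant to flag; going through void * first
   is the form used in the patch.  */
toy_imp toy_get_forward_imp (void)
{
  return (toy_imp) (void *) toy_double_forward;
}

/* The caller casts back to the real function type before calling.  */
double toy_call (int x)
{
  double (*fn) (int) = (double (*) (int)) (void *) toy_get_forward_imp ();
  return fn (x);
}

void toy_check (void)
{
  assert (toy_call (3) == 1.5);
}
```

The pointer is only ever called through its original type, which is what
makes the round trip safe in practice.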

Pushed after bootstrap/test on x86_64.

libobjc/ChangeLog:

PR libobjc/89586
* sendmsg.c (__objc_get_forward_imp): Add cast to `void*` before 
casting to IMP.

Signed-off-by: Andrew Pinski 
---
 libobjc/sendmsg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libobjc/sendmsg.c b/libobjc/sendmsg.c
index e781b2a9e50..65bc250ad90 100644
--- a/libobjc/sendmsg.c
+++ b/libobjc/sendmsg.c
@@ -126,11 +126,11 @@ __objc_get_forward_imp (id rcv, SEL sel)
   && objc_sizeof_type (t) > OBJC_MAX_STRUCT_BY_VALUE
 #endif
   )
-return (IMP)__objc_block_forward;
+return (IMP)(void*)__objc_block_forward;
   else if (t && (*t == 'f' || *t == 'd'))
-return (IMP)__objc_double_forward;
+return (IMP)(void*)__objc_double_forward;
   else
-return (IMP)__objc_word_forward;
+return (IMP)(void*)__objc_word_forward;
 }
 }
 
-- 
2.43.0

[PATCH 2/2] phiopt: Ignore some nop statements in heuristics [PR116098]

2024-08-30 Thread Andrew Pinski

The heuristics that were added for PR71016 try to search to see
if the conversion is being moved away from its definition. The problem
is that the heuristics would stop at the first non-GIMPLE_ASSIGN statement
(debug statements were already ignored), and in this case there was a
GIMPLE_LABEL that was not being ignored. So we need to ignore GIMPLE_NOP,
GIMPLE_LABEL and GIMPLE_PREDICT as well.
Note this is now similar to how gimple_empty_block_p behaves.

Note this also fixes the wrong code that was reported: the VCE (conversion) is
now moved out before phiopt/match can convert the phi into a bit_ior and move
the VCE out while the VCE is only conditionally valid.
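
To illustrate the walk described above, here is a toy model in C. It
does not use GCC's gimple API; the `toy_*` names and the statement kinds
are invented for the example. The point is only that the backwards scan
steps over statements that generate no code instead of stopping at them.

```c
#include <assert.h>

/* Invented statement kinds, loosely mirroring the GIMPLE codes named
   in the patch description.  */
enum toy_kind
{
  TOY_ASSIGN, TOY_NOP, TOY_LABEL, TOY_PREDICT, TOY_DEBUG, TOY_CALL
};

/* Statements the heuristic should look through.  */
int toy_ignorable_p (enum toy_kind k)
{
  return k == TOY_NOP || k == TOY_LABEL || k == TOY_PREDICT
         || k == TOY_DEBUG;
}

/* Return the index of the nearest statement before START that is not
   ignorable, or -1 if there is none.  */
int toy_prev_real_stmt (const enum toy_kind *stmts, int start)
{
  for (int i = start - 1; i >= 0; i--)
    if (!toy_ignorable_p (stmts[i]))
      return i;
  return -1;
}

/* The conversion at index 3 is separated from the assignment at index 0
   only by a label and a predict, so the scan should land on index 0.  */
int toy_example (void)
{
  const enum toy_kind stmts[] =
    { TOY_ASSIGN, TOY_LABEL, TOY_PREDICT, TOY_ASSIGN };
  return toy_prev_real_stmt (stmts, 3);
}
```

Without the label and predict cases in `toy_ignorable_p`, the scan would
stop at index 2, which is the situation the unused label in the testcase
triggers.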

Bootstrapped and tested on x86_64-linux-gnu.
Also built and tested for aarch64-linux-gnu.

PR tree-optimization/116098

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Ignore
nops, labels and predicts for heuristic for conversion with a constant.

gcc/testsuite/ChangeLog:

* c-c++-common/torture/pr116098-1.c: New test.
* gcc.target/aarch64/csel-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 .../c-c++-common/torture/pr116098-1.c | 84 +++
 gcc/testsuite/gcc.target/aarch64/csel-1.c | 28 +++
 gcc/tree-ssa-phiopt.cc|  9 +-
 3 files changed, 119 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/torture/pr116098-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/csel-1.c

diff --git a/gcc/testsuite/c-c++-common/torture/pr116098-1.c 
b/gcc/testsuite/c-c++-common/torture/pr116098-1.c
new file mode 100644
index 000..b9d9a342305
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/torture/pr116098-1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* PR tree-optimization/116098 */
+/* truthy was being miscompiled where the VCE was not being pulled out
+   of the if statement by factor_out_conditional_operation before the rest of
+   phiopt would happen which assumed VCE would be correct. */
+/* The unused label was causing truthy to have different code generation than 
truthy_1. */
+
+
+#ifndef __cplusplus
+#define bool _Bool
+#endif
+
+enum ValueType {
+VALUE_BOOLEAN,
+VALUE_NUM,
+};
+
+struct Value {
+enum ValueType type;
+union {
+bool boolean;
+int num;
+};
+};
+
+static struct Value s_value;
+static bool s_b;
+
+
+bool truthy_1(void) __attribute__((noinline));
+bool
+truthy_1(void)
+{
+struct Value value = s_value;
+if (s_b) s_b = 0;
+enum ValueType t = value.type;
+if (t != VALUE_BOOLEAN)
+  return 1;
+  return value.boolean;
+}
+bool truthy(void) __attribute__((noinline));
+bool
+truthy(void)
+{
+struct Value value = s_value;
+if (s_b) s_b = 0;
+enum ValueType t = value.type;
+if (t != VALUE_BOOLEAN)
+  return 1;
+  /* This unused label should not cause any difference in code generation. */
+a: __attribute__((unused));
+  return value.boolean;
+}
+
+int
+main(void)
+{
+s_b = 0;
+s_value = (struct Value) {
+.type = VALUE_NUM,
+.num = 2,
+};
+s_value = (struct Value) {
+.type = VALUE_BOOLEAN,
+.boolean = !truthy_1(),
+};
+bool b = truthy_1();
+if (b)
+  __builtin_abort();
+
+s_b = 0;
+s_value = (struct Value) {
+.type = VALUE_NUM,
+.num = 2,
+};
+s_value = (struct Value) {
+.type = VALUE_BOOLEAN,
+.boolean = !truthy(),
+};
+b = truthy();
+if (b)
+  __builtin_abort();
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/csel-1.c 
b/gcc/testsuite/gcc.target/aarch64/csel-1.c
new file mode 100644
index 000..a20d39ea375
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* These 2 functions should be the same; even though there is a label in f1. 
+   The label should not make a difference in code generation.
+   There sign extend should be removed as it is not needed. */
+void f(int t, int a, short *b)
+{
+  short t1 = 1;
+  if (a)
+{
+  t1 = t;
+}
+  *b = t1;
+}
+
+void f1(int t, int a, short *b)
+{
+  short t1 = 1;
+  if (a)
+{
+  label1: __attribute__((unused))
+  t1 = t;
+}
+  *b = t1;
+}
+
+/* { dg-final { scan-assembler-not "sxth\t" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 9a009e187ee..271a5d51f09 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -324,8 +324,13 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
  gsi_prev_nondebug (&gsi);
  if (!gsi_end_p (gsi))
{
- if (gassign *assign
-   = dyn_cast <gassign *> (gsi_stmt (gsi)))
+ gimple *stmt = gsi_stmt (gsi);
+ /* Ignore nops, predicates and labels. */
+ if (gimple_code (stmt) == GIMPLE_NOP
+

[PATCH 1/2] testsuite: Change what is being tested for pr66726-2.c

2024-08-30 Thread Andrew Pinski

r14-575-g6d6c17e45f62cf changed the debug dump message but the testcase
pr66726-2.c was not updated for the change. The testcase was searching to
make sure we didn't factor out a conversion but the testcase was no longer
testing that so we needed to update what was being searched for.

Tested on x86_64-linux.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr66726-2.c: Update scan dump message.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-2.c
index ab43d4835d2..a59a643f5c1 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-2.c
@@ -16,4 +16,4 @@ foo (char b)
   return a + b;
 }
 
-/* { dg-final { scan-tree-dump-times "factor conversion out" 0 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "factor operation out" 0 "phiopt1" } } */
-- 
2.43.0

Re: [PATCH v2] GCC Driver : Enable very long gcc command-line option

2024-08-30 Thread Andrew Pinski

On Fri, Aug 30, 2024 at 12:34 AM  wrote:
>
> From: Deepthi Hemraj 
>
> For excessively long environment variables i.e >128KB
> Store the arguments in a temporary file and collect them back together in 
> collect2.
>
> This commit patches for COLLECT_GCC_OPTIONS issue:
> GCC should not limit the length of command line passed to collect2.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111527
>
> The Linux kernel has the following limits on shell commands:
> I.  Total number of bytes used to specify arguments must be under 128KB.
> II. Each environment variable passed to an executable must be under 128 KiB
>
> In order to circumvent these limitations, many build tools support
> response-files, i.e. files that contain the arguments for the executed
> command. These are typically passed using @ syntax.
>
> Gcc uses the COLLECT_GCC_OPTIONS environment variable to transfer the
> expanded command line to collect2. With many options, this exceeds the limit 
> II.
>
> GCC : Added Testcase for PR111527
>
> TC1 : If the command line argument less than 128kb, gcc should use
>   COLLECT_GCC_OPTION to communicate and compile fine.
> TC2 : If the command line argument in the range of 128kb to 2mb,
>   gcc should copy arguments in a file and use FILE_GCC_OPTIONS
>   to communicate and compile fine.
> TC3 : If the command line argument greater thean 2mb, gcc shuld
>   fail the compile and report error. (Expected FAIL)
>
> Signed-off-by: sunil dora 
> Signed-off-by: Topi Kuutela 
> Signed-off-by: Deepthi Hemraj 
> ---
>  gcc/collect2.cc   | 39 ++--
>  gcc/gcc.cc| 37 +--
>  gcc/testsuite/gcc.dg/longcmd/longcmd.exp  | 16 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-1.c | 44 +++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-2.c |  9 +
>  gcc/testsuite/gcc.dg/longcmd/pr111527-3.c | 10 ++
>  gcc/testsuite/gcc.dg/longcmd/pr111527-4.c | 10 ++
>  7 files changed, 159 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/longcmd.exp
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/longcmd/pr111527-4.c
>
> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> index 902014a9cc1..1f56963b1ce 100644
> --- a/gcc/collect2.cc
> +++ b/gcc/collect2.cc
> @@ -376,6 +376,39 @@ typedef int scanfilter;
>
>  static void scan_prog_file (const char *, scanpass, scanfilter);
>
> +char* getenv_extended (const char* var_name)
> +{
> +  int file_size;
> +  char* buf = NULL;
> +  const char* prefix = "/tmp";
> +
> +  char* string = getenv (var_name);
> +  if (strncmp (var_name, prefix, strlen(prefix)) == 0)

This is not what was meant by using the same env var and supporting
response files.
Instead, what Richard meant was to pass `@file` as the option
via COLLECT_GCC_OPTIONS and then, on seeing `@`, expand the
options the same way it is done for the normal command line.
Hard coding "/tmp" here is wrong because TMPDIR might not be set to
"/tmp"; moreover, with -save-temps the response file should stay
around afterwards and be in the working directory rather than TMPDIR.
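
A rough sketch of the two points above, with hypothetical helper names
(`toy_is_response_file_arg`, `toy_response_file_dir`) that do not exist
in GCC. The real driver would presumably reuse its existing response-file
expansion (libiberty's `expandargv`) and temp-file machinery rather than
anything like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* An option of the form @file names a response file to expand.  */
int toy_is_response_file_arg (const char *arg)
{
  return arg != NULL && arg[0] == '@' && arg[1] != '\0';
}

/* Where the response file should live: the working directory when
   -save-temps is in effect, otherwise TMPDIR, with "/tmp" used only as
   a last-resort fallback.  */
const char *toy_response_file_dir (int save_temps)
{
  if (save_temps)
    return ".";
  const char *dir = getenv ("TMPDIR");
  return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
}

void toy_check (void)
{
  assert (toy_is_response_file_arg ("@opts.rsp"));
  assert (!toy_is_response_file_arg ("-O2"));
  assert (strcmp (toy_response_file_dir (1), ".") == 0);
}
```

The check on the option string, rather than on the variable name as in
the quoted patch, is what lets collect2 keep reading COLLECT_GCC_OPTIONS
unchanged.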

Thanks,
Andrew Pinski

> +{
> +  FILE *fptr;
> +  fptr = fopen (string, "r");
> +  if (fptr == NULL)
> +   return (0);
> +  /* Copy contents from temporary file to buffer */
> +  if (fseek (fptr, 0, SEEK_END) == -1)
> +   return (0);
> +  file_size = ftell (fptr);
> +  rewind (fptr);
> +  buf = (char *) xmalloc (file_size + 1);
> +  if (buf == NULL)
> +   return (0);
> +  if (fread ((void *) buf, file_size, 1, fptr) <= 0)
> +   {
> + free (buf);
> + fatal_error (input_location, "fread failed");
> + return (0);
> +   }
> +  buf[file_size] = '\0';
> +  return buf;
> +}
> +  return string;
> +}
> +
>
>  /* Delete tempfiles and exit function.  */
>
> @@ -1004,7 +1037,7 @@ main (int argc, char **argv)
>  /* Now pick up any flags we want early from COLLECT_GCC_OPTIONS
> The LTO options are passed here as are other options that might
> be unsuitable for ld (e.g. -save-temps).  */
> -p = getenv ("COLLECT_GCC_OPTIONS");
> +p = getenv_extended ("COLLECT_GCC_OPTIONS");
>  while (p && *p)
>{
> const char *q = extract_string (&p);
> @@ -1200,7 +1233,7 @@ main (int argc, char **argv)
>   A

[PATCH] middle-end: also optimize `popcount(a) <= 1` [PR90693]

2024-08-29 Thread Andrew Pinski

This expands on optimizing `popcount(a) == 1` to also handle
`popcount(a) <= 1`. `<= 1` can be expanded as `(a & (a - 1)) == 0`,
which is what is already done for `== 1` when a is known to be nonzero.
We have to do the optimization in 2 places, depending on whether we have
an optab entry for popcount or not.
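
For reference, a small C sketch of the bit trick, using the
`arg & (arg - 1)` form from the expand_POPCOUNT comment in the patch.
The function names are illustrative only, and `__builtin_popcount` is
used purely as the reference to compare against.

```c
#include <assert.h>
#include <stdint.h>

/* popcount (a) <= 1: clearing the lowest set bit leaves nothing.
   This holds for a == 0 as well.  */
int popcount_le_1 (uint32_t a)
{
  return (a & (a - 1)) == 0;
}

/* popcount (a) > 1 is simply the negation.  */
int popcount_gt_1 (uint32_t a)
{
  return (a & (a - 1)) != 0;
}

void check_popcount_le_1 (void)
{
  const uint32_t vals[] = { 0, 1, 2, 3, 6, 8, 0x80000000u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++)
    {
      int pc = __builtin_popcount (vals[i]);
      assert (popcount_le_1 (vals[i]) == (pc <= 1));
      assert (popcount_gt_1 (vals[i]) == (pc > 1));
    }
}
```

This matches the `sub`/`tst`/`cset` sequence the new aarch64 tests
expect for `le32` and `gt32`.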

Built and tested for aarch64-linux-gnu.

PR middle-end/90693

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Handle the second argument
being `-1` for `<= 1`.
* tree-ssa-math-opts.cc (match_single_bit_test): Handle LE/GT
cases.
(math_opts_dom_walker::after_dom_children): Call match_single_bit_test
for LE_EXPR/GT_EXPR also.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-le-1.c: New test.
* gcc.target/aarch64/popcnt-le-2.c: New test.
* gcc.target/aarch64/popcnt-le-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/internal-fn.cc| 20 +++-
 .../gcc.target/aarch64/popcnt-le-1.c  | 29 +
 .../gcc.target/aarch64/popcnt-le-2.c  | 31 +++
 .../gcc.target/aarch64/popcnt-le-3.c  | 31 +++
 gcc/tree-ssa-math-opts.cc | 25 +++
 5 files changed, 129 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-le-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-le-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-le-3.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4e33db365ac..b55f089cf56 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -5304,11 +5304,16 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
  Use rtx costs in that case to determine if .POPCOUNT (arg) == 1
  or (arg ^ (arg - 1)) > arg - 1 is cheaper.
  If .POPCOUNT second argument is 0, we additionally know that arg
- is non-zero, so use arg & (arg - 1) == 0 instead.  */
+ is non-zero, so use arg & (arg - 1) == 0 instead.
+ If .POPCOUNT second argument is -1, the comparison was either `<= 1`
+ or `> 1`.  */
   bool speed_p = optimize_insn_for_speed_p ();
   tree lhs = gimple_call_lhs (stmt);
   tree arg = gimple_call_arg (stmt, 0);
   bool nonzero_arg = integer_zerop (gimple_call_arg (stmt, 1));
+  bool was_le = integer_minus_onep (gimple_call_arg (stmt, 1));
+  if (was_le)
+nonzero_arg = true;
   tree type = TREE_TYPE (arg);
   machine_mode mode = TYPE_MODE (type);
   machine_mode lhsmode = TYPE_MODE (TREE_TYPE (lhs));
@@ -5360,10 +5365,23 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
 emit_insn (popcount_insns);
   else
 {
+  start_sequence ();
   emit_insn (cmp_insns);
   plhs = expand_normal (lhs);
   if (GET_MODE (cmp) != GET_MODE (plhs))
cmp = convert_to_mode (GET_MODE (plhs), cmp, 1);
+  /* For `<= 1`, we need to produce `2 - cmp` or `cmp ? 1 : 2` as that
+then gets compared against 1 and we need the false case to be 2.  */
+  if (was_le)
+   {
+ cmp = expand_simple_binop (GET_MODE (cmp), MINUS, const2_rtx,
+cmp, NULL_RTX, 1, OPTAB_WIDEN);
+ if (!cmp)
+   goto fail;
+   }
   emit_move_insn (plhs, cmp);
+  rtx_insn *all_insns = get_insns ();
+  end_sequence ();
+  emit_insn (all_insns);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-le-1.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-le-1.c
new file mode 100644
index 000..b4141da982c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-le-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand-details" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR middle-end/90693 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** le32:
+** sub w([0-9]+), w0, #1
+** tst w0, w\1
+** cset w0, eq
+** ret
+*/
+
+unsigned le32 (const unsigned int a) {
+  return __builtin_popcountg (a) <= 1;
+}
+
+/*
+** gt32:
+** sub w([0-9]+), w0, #1
+** tst w0, w\1
+** cset w0, ne
+** ret
+*/
+unsigned gt32 (const unsigned int a) {
+  return __builtin_popcountg (a) > 1;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-le-2.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-le-2.c
new file mode 100644
index 000..975552ca63e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-le-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mgeneral-regs-only -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "POPCOUNT \\\(" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "__builtin_popcount \\\(" "optimized" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR middle-end/90693 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** le32:
+** sub w(

[PATCH 1/2] expand: Small speed up expansion of __builtin_prefetch

2024-08-29 Thread Andrew Pinski

This is a small speed up of the expansion of __builtin_prefetch.
For the optional arguments, there is no reason to call expand_normal
on a constant integer whose value we already know; just use
GEN_INT/const0_rtx instead.
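
For context, the defaults in question are the documented ones for the
builtin: the second argument (rw) defaults to 0 and the third (locality)
defaults to 3. A small usage sketch, with made-up function names and an
arbitrary prefetch distance of 8 elements:

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while prefetching ahead.  The one-argument call below
   is equivalent to __builtin_prefetch (p, 0, 3): a read prefetch with
   the highest degree of temporal locality.  */
long sum_with_prefetch (const int *data, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i + 8 < n)
        __builtin_prefetch (&data[i + 8]);
      total += data[i];
    }
  return total;
}

/* Fill 0..31 and sum it; prefetching must not change the result.  */
long check_sum (void)
{
  int data[32];
  for (int i = 0; i < 32; i++)
    data[i] = i;
  return sum_with_prefetch (data, 32);
}
```

With one argument, both optional operands take the default path that
the patch now expands directly to const0_rtx and GEN_INT (3).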

Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* builtins.cc (expand_builtin_prefetch): Rewrite expansion of the optional
arguments to not expand known constants.
---
 gcc/builtins.cc | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b4d51eaeba5..37c7c98e5c7 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -1280,25 +1280,22 @@ expand_builtin_prefetch (tree exp)
  zero (read) and argument 2 (locality) defaults to 3 (high degree of
  locality).  */
   nargs = call_expr_nargs (exp);
-  if (nargs > 1)
-arg1 = CALL_EXPR_ARG (exp, 1);
-  else
-arg1 = integer_zero_node;
-  if (nargs > 2)
-arg2 = CALL_EXPR_ARG (exp, 2);
-  else
-arg2 = integer_three_node;
+  arg1 = nargs > 1 ? CALL_EXPR_ARG (exp, 1) : NULL_TREE;
+  arg2 = nargs > 2 ? CALL_EXPR_ARG (exp, 2) : NULL_TREE;
 
   /* Argument 0 is an address.  */
   op0 = expand_expr (arg0, NULL_RTX, Pmode, EXPAND_NORMAL);
 
   /* Argument 1 (read/write flag) must be a compile-time constant int.  */
-  if (TREE_CODE (arg1) != INTEGER_CST)
+  if (arg1 == NULL_TREE)
+op1 = const0_rtx;
+  else if (TREE_CODE (arg1) != INTEGER_CST)
 {
   error ("second argument to %<__builtin_prefetch%> must be a constant");
-  arg1 = integer_zero_node;
+  op1 = const0_rtx;
 }
-  op1 = expand_normal (arg1);
+  else
+op1 = expand_normal (arg1);
   /* Argument 1 must be either zero or one.  */
   if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
 {
@@ -1308,12 +1305,15 @@ expand_builtin_prefetch (tree exp)
 }
 
   /* Argument 2 (locality) must be a compile-time constant int.  */
-  if (TREE_CODE (arg2) != INTEGER_CST)
+  if (arg2 == NULL_TREE)
+op2 = GEN_INT (3);
+  else if (TREE_CODE (arg2) != INTEGER_CST)
 {
   error ("third argument to %<__builtin_prefetch%> must be a constant");
-  arg2 = integer_zero_node;
+  op2 = const0_rtx;
 }
-  op2 = expand_normal (arg2);
+  else
+op2 = expand_normal (arg2);
   /* Argument 2 must be 0, 1, 2, or 3.  */
   if (INTVAL (op2) < 0 || INTVAL (op2) > 3)
 {
-- 
2.43.0

[PATCH 2/2] middle-end: Remove integer_three_node [PR116537]

2024-08-29 Thread Andrew Pinski

After the small expansion patch for __builtin_prefetch, the
only use of integer_three_node is inside tree-ssa-loop-prefetch.cc so let's
remove it as the loop prefetch pass is not enabled these days by default and
having a tree node around just for that pass is a little wasteful. Integer
constants are also shared these days so calling build_int_cst will use the 
cached
node anyways.

Bootstrapped and tested on x86_64-linux.

PR middle-end/116537

gcc/ChangeLog:

* tree-core.h (enum tree_index): Remove TI_INTEGER_THREE
* tree-ssa-loop-prefetch.cc (issue_prefetch_ref): Call build_int_cst
instead of using integer_three_node.
* tree.cc (build_common_tree_nodes): Remove initialization
of integer_three_node.
* tree.h (integer_three_node): Delete.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-core.h   | 1 -
 gcc/tree-ssa-loop-prefetch.cc | 2 +-
 gcc/tree.cc   | 1 -
 gcc/tree.h| 1 -
 4 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..a36817059da 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -662,7 +662,6 @@ enum tree_index : unsigned {
 
   TI_INTEGER_ZERO,
   TI_INTEGER_ONE,
-  TI_INTEGER_THREE,
   TI_INTEGER_MINUS_ONE,
   TI_NULL_POINTER,
 
diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
index bb5d5dec779..3569403c618 100644
--- a/gcc/tree-ssa-loop-prefetch.cc
+++ b/gcc/tree-ssa-loop-prefetch.cc
@@ -1182,7 +1182,7 @@ issue_prefetch_ref (struct mem_ref *ref, unsigned 
unroll_factor, unsigned ahead)
   addr_base = force_gimple_operand_gsi (&bsi, unshare_expr (addr_base),
true, NULL, true, GSI_SAME_STMT);
   write_p = ref->write_p ? integer_one_node : integer_zero_node;
-  local = nontemporal ? integer_zero_node : integer_three_node;
+  local = nontemporal ? integer_zero_node : build_int_cst (integer_type_node, 
3);
 
   for (ap = 0; ap < n_prefetches; ap++)
 {
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 17a5cea7c25..b14cfbe7929 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9567,7 +9567,6 @@ build_common_tree_nodes (bool signed_char)
   /* Define these next since types below may used them.  */
   integer_zero_node = build_int_cst (integer_type_node, 0);
   integer_one_node = build_int_cst (integer_type_node, 1);
-  integer_three_node = build_int_cst (integer_type_node, 3);
   integer_minus_one_node = build_int_cst (integer_type_node, -1);
 
   size_zero_node = size_int (0);
diff --git a/gcc/tree.h b/gcc/tree.h
index c501019717f..93aa7d22d6f 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -,7 +,6 @@ tree_strip_any_location_wrapper (tree exp)
 
 #define integer_zero_node  global_trees[TI_INTEGER_ZERO]
 #define integer_one_node   global_trees[TI_INTEGER_ONE]
-#define integer_three_node  global_trees[TI_INTEGER_THREE]
 #define integer_minus_one_node global_trees[TI_INTEGER_MINUS_ONE]
 #define size_zero_node global_trees[TI_SIZE_ZERO]
 #define size_one_node  global_trees[TI_SIZE_ONE]
-- 
2.43.0

[PATCH] expand: Allow widening optab when expanding popcount==1 [PR116508]

2024-08-28 Thread Andrew Pinski

After adding popcount{qi,hi}2 to the aarch64 backend, I noticed that
the expansion for popcount==1 was no longer trying to do the trick
of handling popcount==1 as `(arg ^ (arg - 1)) > arg - 1`. The problem
is the expansion was using OPTAB_DIRECT, when using OPTAB_WIDEN
will allow modes which are smaller than SImode (in the aarch64 case).
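
A quick C sketch of the trick, for readers who want to convince
themselves it is right. The function names are made up; the 16-bit
variant truncates explicitly after each step, which is roughly what the
`and ..., 65535` / `uxth` in the expected aarch64 output does once the
operation has been widened.

```c
#include <assert.h>
#include <stdint.h>

/* popcount (a) == 1 as (a ^ (a - 1)) > a - 1, in 32 bits.  */
int popcount_eq_1_u32 (uint32_t a)
{
  uint32_t m1 = a - 1;
  return (a ^ m1) > m1;
}

/* Same trick for a 16-bit value.  The arithmetic happens in int after
   promotion, so each intermediate is truncated back to 16 bits.  */
int popcount_eq_1_u16 (uint16_t a)
{
  uint16_t m1 = (uint16_t) (a - 1);
  uint16_t x = (uint16_t) (a ^ m1);
  return x > m1;
}

/* Exhaustively compare against the builtin over all 16-bit values.  */
void check_popcount_eq_1 (void)
{
  for (uint32_t a = 0; a <= 0xffff; a++)
    {
      int expected = __builtin_popcount (a) == 1;
      assert (popcount_eq_1_u16 ((uint16_t) a) == expected);
      assert (popcount_eq_1_u32 (a) == expected);
    }
}
```

The a == 0 case is the interesting one: `a - 1` wraps to all ones, the
xor gives all ones again, and the strict `>` makes the result false.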

Note QImode's cost still needs some improvements so part of popcnt-eq-1.c
is xfailed. Though there is a check to make sure the costs are compared now.

Built and tested on aarch64-linux-gnu.

PR middle-end/116508

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Use OPTAB_WIDEN for PLUS and
XOR/AND expansion.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-eq-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/internal-fn.cc|  4 +-
 .../gcc.target/aarch64/popcnt-eq-1.c  | 45 +++
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-eq-1.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 78997ef056a..4e33db365ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -5332,11 +5332,11 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   start_sequence ();
   rtx op0 = expand_normal (arg);
   rtx argm1 = expand_simple_binop (mode, PLUS, op0, constm1_rtx, NULL_RTX,
-  1, OPTAB_DIRECT);
+  1, OPTAB_WIDEN);
   if (argm1 == NULL_RTX)
 goto fail;
   rtx argxorargm1 = expand_simple_binop (mode, nonzero_arg ? AND : XOR, op0,
-argm1, NULL_RTX, 1, OPTAB_DIRECT);
+argm1, NULL_RTX, 1, OPTAB_WIDEN);
   if (argxorargm1 == NULL_RTX)
 goto fail;
   rtx cmp;
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-eq-1.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-eq-1.c
new file mode 100644
index 000..bb9e2bf0a54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-eq-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand-details" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR middle-end/116508 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h16:
+** sub w([0-9]+), w0, #1
+** eor w([0-9]+), w0, w\1
+** and w([0-9]+), w\1, 65535
+** cmp w\3, w\2, uxth
+** cset w0, cc
+** ret
+*/
+
+/* when expanding popcount == 1, should use
+   `(arg ^ (arg - 1)) > arg - 1` as that has a lower latency
+   than doing the popcount then comparing against 1.
+   The popcount/addv can be costly. */
+unsigned h16 (const unsigned short a) {
+ return __builtin_popcountg (a) == 1;
+}
+
+/* unsigned char should also do the same trick */
+/* Currently xfailed since the cost does not take into account the
+   moving between gprs and vector regs correctly. */
+/*
+** h8: { xfail *-*-* }
+** sub w([0-9]+), w0, #1
+** eor w([0-9]+), w0, w\1
+** and w([0-9]+), w\1, 255
+** cmp w\3, w\2, uxtb
+** cset w0, cc
+** ret
+*/
+
+
+unsigned h8 (const unsigned char a) {
+ return __builtin_popcountg (a) == 1;
+}
+
+/* There should be printing out the costs for h8 and h16's popcount == 1 */
+/* { dg-final { scan-rtl-dump-times "popcount == 1:" 2 "expand"} } */
-- 
2.43.0

Re: [PATCH v2 2/5] testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

2024-08-28 Thread Andrew Pinski

On Wed, Aug 28, 2024 at 4:05 AM Alex Coplan  wrote:
>
> On 28/08/2024 11:53, Richard Sandiford wrote:
> > Alex Coplan  writes:
> > > Hi,
> > >
> > > This is a v2 of:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659966.html
> > > which is rebased on top of Richard S's patch to reduce the cut-and-paste 
> > > in
> > > scanltranstree.exp (thanks again for doing that).
> > >
> > > Tested on aarch64-linux-gnu, OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > -- >8 --
> > >
> > > This extends the scan-ltrans-tree* helpers to create RTL variants.  This
> > > is needed to check the behaviour of an RTL pass under LTO.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR libstdc++/116140
> > > * doc/sourcebuild.texi: Document ltrans-rtl value of kind for
> > > scan--dump*.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR libstdc++/116140
> > > * lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
> > > (scan-ltrans-rtl-dump-not): New.
> > > (scan-ltrans-rtl-dump-dem): New.
> > > (scan-ltrans-rtl-dump-dem-not): New.
> > > (scan-ltrans-rtl-dump-times): New.
> >
> > The patch only contains the gcc/testsuite changes, but those are ok
> > for trunk, thanks.
>
> Gah, sorry -- those got lost in the rebase.  Is it OK to commit this
> together with the doc changes included as per the previous patch?

I am getting a new ERROR after this. Maybe you didn't notice it
since you were only looking for new FAILs.
+ERROR: gcc.dg/ipa/ipa-icf-38.c: error executing dg-final: variable is not assigned by any conversion specifiers

The error corresponds to:
/* { dg-final { scan-ltrans-tree-dump "Function foo" "optimized" } } */

That is even the only testcase which uses `scan-ltrans-tree-dump*`.

Thanks,
Andrew Pinski



>
> Alex
>
> >
> > Richard
> >
> > > ---
> > >  gcc/testsuite/lib/scanltranstree.exp | 80 +---
> > >  1 file changed, 37 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/lib/scanltranstree.exp 
> > > b/gcc/testsuite/lib/scanltranstree.exp
> > > index bc6e02dc369..a7d4de3765f 100644
> > > --- a/gcc/testsuite/lib/scanltranstree.exp
> > > +++ b/gcc/testsuite/lib/scanltranstree.exp
> > > @@ -19,50 +19,44 @@
> > >
> > >  load_lib scandump.exp
> > >
> > > -# The first item in the list is an LTO equivalent of the second item
> > > -# in the list; see the documentation of the second item for details.
> > > -foreach { name scan type suffix } {
> > > -scan-ltrans-tree-dump scan-dump ltrans-tree t
> > > -scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
> > > -scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
> > > -scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
> > > -} {
> > > -eval [string map [list @NAME@ $name \
> > > -  @SCAN@ $scan \
> > > -  @TYPE@ $type \
> > > -  @SUFFIX@ $suffix] {
> > > -proc @NAME@ { args } {
> > > -   if { [llength $args] < 2 } {
> > > -   error "@NAME@: too few arguments"
> > > -   return
> > > -   }
> > > -   if { [llength $args] > 3 } {
> > > -   error "@NAME@: too many arguments"
> > > -   return
> > > +# Define scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.  These are LTO
> > > +# variants of the corresponding functions without -ltrans in the name.
> > > +foreach ir { tree rtl } {
> > > +foreach modifier { {} -not -dem -dem-not } {
> > > +   eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
> > > +  @SCAN@ scan$modifier \
> > > +  @TYPE@ ltrans-$ir \
> > > +  @SUFFIX@ [string index $ir 0]] {
> > > +   proc @NAME@ { args } {
> > > +   if { [llength $args] < 2 } {
> > > +   error "@NAME@: too few arguments"
> > > +   return
> > > +   }
> > > +   if { [llength $args] > 3 } {
> > > +   error "@NAME@: too many arguments"
> > > +   return
> > > +   }
> > > +   if { [llength $args] >=

Re: [PATCH 1/3] expand: Add debug dump on the cost for `popcount==1` expand

2024-08-28 Thread Andrew Pinski

On Wed, Aug 28, 2024 at 12:26 AM Richard Biener
 wrote:
>
> On Wed, Aug 28, 2024 at 6:34 AM Andrew Pinski  
> wrote:
> >
> > While working on PR 114224, I found it would be useful to dump the
> > different costs of the expansion to make easier to understand why one
> > was chosen over the other.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> > Build and tested for aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (expand_POPCOUNT): Dump the costs for
> > the two choices.
> > ---
> >  gcc/internal-fn.cc | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 89da13b38ce..91210976a0a 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -5351,6 +5351,14 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
> >unsigned popcount_cost = (seq_cost (popcount_insns, speed_p)
> > + seq_cost (popcount_cmp_insns, speed_p));
> >unsigned cmp_cost = seq_cost (cmp_insns, speed_p);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +{
> > +  fprintf(dump_file, "popcount == 1, cost\n");
> > +  fprintf(dump_file, "popcount: %u\n", popcount_cost);
> > +  fprintf(dump_file, "cmp: %u\n\n", cmp_cost);
>
> Can you make this more brief in a single line, like
>
> choice for popcount == 1: popcount cost: %u; cmp cost: %u\n
>
> ?
>
> OK with that change.

Yes, that makes better sense really; I was originally thinking about
putting it on one line but then decided against it just because the
multi-line version was easier to program at the time.
Attached is what I pushed in the end.
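
For readers without the attachment, here is a rough sketch of the one-line
form suggested in the review. It is not the pushed patch: the helper name
`format_popcount_choice` is hypothetical, it writes into a buffer instead of
`dump_file` so it can be tried standalone, and the exact wording that was
committed may differ.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GCC): format both costs on a single
   line, using the wording proposed in the review.  Returns the number
   of characters snprintf would have written.  */
static int
format_popcount_choice (char *buf, size_t size,
                        unsigned popcount_cost, unsigned cmp_cost)
{
  return snprintf (buf, size,
                   "choice for popcount == 1: popcount cost: %u; "
                   "cmp cost: %u\n",
                   popcount_cost, cmp_cost);
}
```

In the real code the same format string would simply be passed to
`fprintf (dump_file, ...)` under the existing `TDF_DETAILS` check.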

Thanks,
Andrew

>
> > +}
> > +
> >if (popcount_cost <= cmp_cost)
> >  emit_insn (popcount_insns);
> >else
> > --
> > 2.43.0
> >


0001-expand-Add-debug-dump-on-the-cost-for-popcount-1-exp.patch
Description: Binary data

[PATCH 2/3] aarch64: Handle cost for vector add reduction

2024-08-27 Thread Andrew Pinski

While working on PR 114224 (popcount cost is not modeled), I noticed
that addv (vector reduction add) was not handled either. This adds that
handling. Some of the extends are part of the instruction, so we need to
handle those too.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_addv_costs): New function.
(aarch64_rtx_costs): For unspec_addv, call aarch64_rtx_addv_costs.
For unspec_addv under a zero_extend, call aarch64_rtx_addv_costs.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 40dacfcf2e7..7607b85e3cf 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14097,6 +14097,31 @@ aarch64_abd_rtx_p (rtx x)
   return rtx_equal_p (maxop0, minop0) && rtx_equal_p (maxop1, minop1);
 }
 
+/* Handle the cost for unspec ADDV (reduction add).
+   Result is true if the total cost of the operation
+   has now been calculated. */
+static bool
+aarch64_rtx_addv_costs (rtx op0, int *cost, bool speed)
+{
+  const struct cpu_cost_table *extra_cost
+= aarch64_tune_params.insn_extra_cost;
+
+  if (speed)
+*cost += extra_cost->vect.alu;
+
+  /* The zero/sign extend part of the reduction is part of the instruction. */
+  if (GET_CODE (op0) == ZERO_EXTEND
+  || GET_CODE (op0) == SIGN_EXTEND)
+   {
+ *cost += rtx_cost (XEXP (op0, 0), GET_MODE (XEXP (op0, 0)),
+   UNSPEC, 0, speed);
+ return true;
+   }
+
+   *cost += rtx_cost (op0, GET_MODE (op0), UNSPEC, 0, speed);
+   return true;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
@@ -14912,6 +14937,11 @@ cost_plus:
 case ZERO_EXTEND:
 
   op0 = XEXP (x, 0);
+  /* Addv with an implicit zero extend. */
+  if (GET_CODE (op0) == UNSPEC
+ && XINT (op0, 1) == UNSPEC_ADDV)
+   return aarch64_rtx_addv_costs (XVECEXP (op0, 0, 0),
+  cost, speed);
   /* If a value is written in SI mode, then zero extended to DI
 mode, the operation will in general be free as a write to
 a 'w' register implicitly zeroes the upper bits of an 'x'
@@ -15378,6 +15408,11 @@ cost_plus:
 
   return false;
 }
+  /* The vector integer/floating point add reduction instructions. */
+  if (XINT (x, 1) == UNSPEC_ADDV
+ || XINT (x, 1) == UNSPEC_FADDV)
+   return aarch64_rtx_addv_costs (XVECEXP (x, 0, 0), cost, speed);
+
   break;
 
 case TRUNCATE:
-- 
2.43.0

[PATCH 3/3] aarch64: Add rtx cost for popcount [PR114224]

2024-08-27 Thread Andrew Pinski

While looking into some popcount-related issues, I noticed that the popcount
cost is not modeled at all. This adds both the vector and scalar (for CSSC)
costs. For CSSC, we default to `COSTS_N_INSNS (3)` based on the Ampere1B's
cycle count found in LLVM's model.

Built and tested for aarch64-linux-gnu.
Also built for arm-linux-eabi because of the shared structure.

PR target/114224

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle POPCOUNT.
* config/arm/aarch-common-protos.h (struct alu_cost_table): Add pop 
field.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, 
thunderx_extra_costs,
thunderx2t99_extra_costs, thunderx3t110_extra_costs,
tsv110_extra_costs, a64fx_extra_costs,
ampere1_extra_costs, ampere1a_extra_costs,
ampere1b_extra_costs): Update for pop field.
* config/arm/aarch-cost-tables.h (generic_extra_costs, 
cortexa53_extra_costs,
cortexa57_extra_costs, cortexa76_extra_costs, exynosm1_extra_costs,
xgene1_extra_costs): Likewise.
* config/arm/arm.cc (cortexa9_extra_costs, cortexa8_extra_costs,
cortexa5_extra_costs, cortexa7_extra_costs, cortexa12_extra_costs,
cortexa15_extra_costs, v7m_extra_costs): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt11.c: New test.
* gcc.target/aarch64/popcnt12.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-cost-tables.h|  9 +
 gcc/config/aarch64/aarch64.cc   | 20 +++
 gcc/config/arm/aarch-common-protos.h|  1 +
 gcc/config/arm/aarch-cost-tables.h  |  6 
 gcc/config/arm/arm.cc   |  7 
 gcc/testsuite/gcc.target/aarch64/popcnt11.c | 37 +
 gcc/testsuite/gcc.target/aarch64/popcnt12.c | 37 +
 7 files changed, 117 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt12.c

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 7c794916117..a9005d02d4e 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -42,6 +42,7 @@ const struct cpu_cost_table qdf24xx_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0,/* rev.  */
+COSTS_N_INSNS (2), /* pop.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
@@ -150,6 +151,7 @@ const struct cpu_cost_table thunderx_extra_costs =
 0, /* Bfx.  */
 COSTS_N_INSNS (5), /* Clz.  */
 0, /* rev.  */
+COSTS_N_INSNS (2),  /* pop.  */
 0, /* UNUSED: non_exec.  */
 false  /* UNUSED: non_exec_costs_exec.  */
   },
@@ -257,6 +259,7 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
 0, /* Bfx.  */
 COSTS_N_INSNS (3), /* Clz.  */
 0, /* Rev.  */
+COSTS_N_INSNS (2),  /* pop.  */
 0, /* Non_exec.  */
 true   /* Non_exec_costs_exec.  */
   },
@@ -364,6 +367,7 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
 0, /* Bfx.  */
 COSTS_N_INSNS (3), /* Clz.  */
 0, /* Rev.  */
+COSTS_N_INSNS (2),  /* pop.  */
 0, /* Non_exec.  */
 true   /* Non_exec_costs_exec.  */
   },
@@ -471,6 +475,7 @@ const struct cpu_cost_table tsv110_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0, /* rev.  */
+COSTS_N_INSNS (2),  /* pop.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
@@ -579,6 +584,7 @@ const struct cpu_cost_table a64fx_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0, /* rev.  */
+COSTS_N_INSNS (2), /* pop.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
@@ -686,6 +692,7 @@ const struct cpu_cost_table ampere1_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0, /* rev.  */
+COSTS_N_INSNS (2), /* pop.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
@@ -793,6 +800,7 @@ const struct cpu_cost_table ampere1a_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0, /* rev.  */
+COSTS_N_INSNS (2), /* pop.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
@@ -900,6 +908,7 @@ const struct cpu_cost_table ampere1b_extra_costs =
 0, /* bfx.  */
 0, /* clz.  */
 0, /* rev.  */
+COSTS_

[PATCH 1/3] expand: Add debug dump on the cost for `popcount==1` expand

2024-08-27 Thread Andrew Pinski

While working on PR 114224, I found it would be useful to dump the
different costs of the expansion to make it easier to understand why one
was chosen over the other.
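
As background, the two choices being costed compute the same predicate. The
sketch below is only an illustration in plain C, not the expander code: the
function names are made up, and the comparison form is inferred from the
sub/eor/cmp sequence expected in the aarch64 tests of this series.

```c
#include <stdbool.h>

/* Portable bit count, standing in for the popcount instruction.  */
static unsigned
count_bits (unsigned x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)   /* Clear the lowest set bit each round.  */
    n++;
  return n;
}

/* Choice 1: do the popcount, then compare against 1.  */
static bool
single_bit_popcount (unsigned a)
{
  return count_bits (a) == 1;
}

/* Choice 2: a has exactly one bit set iff (a ^ (a - 1)) > (a - 1).
   For a == 0 both sides are all-ones, so the result is false.  */
static bool
single_bit_cmp (unsigned a)
{
  return (a ^ (a - 1)) > (a - 1);
}
```

Both functions agree for every input, so the only question at expand time is
which sequence is cheaper on the target.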

Bootstrapped and tested on x86_64-linux-gnu.
Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Dump the costs for
the two choices.
---
 gcc/internal-fn.cc | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 89da13b38ce..91210976a0a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -5351,6 +5351,14 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   unsigned popcount_cost = (seq_cost (popcount_insns, speed_p)
+ seq_cost (popcount_cmp_insns, speed_p));
   unsigned cmp_cost = seq_cost (cmp_insns, speed_p);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf(dump_file, "popcount == 1, cost\n");
+  fprintf(dump_file, "popcount: %u\n", popcount_cost);
+  fprintf(dump_file, "cmp: %u\n\n", cmp_cost);
+}
+
   if (popcount_cost <= cmp_cost)
 emit_insn (popcount_insns);
   else
-- 
2.43.0

Re: [PATCH] MATCH: add abs support for half float

2024-08-27 Thread Andrew Pinski

On Tue, Aug 27, 2024 at 8:54 PM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the reply.
>
> > On 27 Aug 2024, at 7:05 pm, Richard Biener  
> > wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >>> On 22 Aug 2024, at 10:34 pm, Richard Biener  
> >>> wrote:
> >>>
> >>> On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah
> >>>  wrote:
> >>>>
> >>>> Hi Richard,
> >>>>
> >>>>> On 20 Aug 2024, at 6:09 pm, Richard Biener  
> >>>>> wrote:
> >>>>>
> >>>>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah
> >>>>>  wrote:
> >>>>>>
> >>>>>> Thanks for the comments.
> >>>>>>
> >>>>>>> On 2 Aug 2024, at 8:36 pm, Richard Biener 
> >>>>>>>  wrote:
> >>>>>>>
> >>>>>>> On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 1 Aug 2024, at 10:46 pm, Richard Biener 
> >>>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah
> >>>>>>>>>  wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski  
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> >>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> >>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> >>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski 
> >>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> >>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Revised based on the comment and moved it into existing 
> >>>>>>>>>>>>>>>>>> patterns as.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> * match.pd: Extend A

RE: [PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c` [PR103660]

2024-08-26 Thread Andrew Pinski (QUIC)

> -Original Message-
> From: Marc Glisse 
> Sent: Monday, August 26, 2024 4:46 AM
> To: Richard Biener 
> Cc: Andrew Pinski (QUIC) ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a
> ? 0 : c)` into `a ? b : c` [PR103660]
> 
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -2339,6 +2339,16 @@
> DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>   (if (INTEGRAL_TYPE_P (type))
> >>(bit_and @0 @1)))
> >>
> >> +/* Fold `(a ? b : 0) | (a ? 0 : c)` into (a ? b : c).
> >> +Handle also ^ and + in replacement of `|`. */ (for cnd
> (cond
> >> +vec_cond)  (for op (bit_ior bit_xor plus)
> >> +  (simplify
> >> +   (op:c
> >> +(cnd:s @0 @00 integer_zerop)
> >> +(cnd:s @0 integer_zerop @01))
> >> +   (cnd @0 @00 @01
> 
> Wouldn't it fall into something more generic like
> 
> (for cnd (cond vec_cond)
>   (for op (any_binary)
>(simplify
> (op
>  (cnd:s @0 @1 @2)
>  (cnd:s @0 @3 @4))
> (cnd @0 (op! @1 @3) (op! @2 @4)
> 
> ?
> 
> The example given in the doc for the use of '!' is pretty close

Yes, we can extend the pattern that is already there for vec_cond too. Though I
also think we should keep the special case for the newly added pattern, because
otherwise we need extra steps to see that op is no longer there.
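
To make the special case concrete, here is a small C sketch of what the new
pattern folds. The function names are only for illustration; unsigned
operands are used so the `+` variant cannot overflow.

```c
/* Forms matched by the new pattern, with op being |, ^ or +.  */
static unsigned
before_ior (int a, unsigned b, unsigned c)
{
  return (a ? b : 0) | (a ? 0 : c);
}

static unsigned
before_xor (int a, unsigned b, unsigned c)
{
  return (a ? b : 0) ^ (a ? 0 : c);
}

static unsigned
before_plus (int a, unsigned b, unsigned c)
{
  return (a ? b : 0) + (a ? 0 : c);
}

/* What all three simplify to: op with a zero operand is the identity,
   so the operation itself disappears.  */
static unsigned
after (int a, unsigned b, unsigned c)
{
  return a ? b : c;
}
```

With the generic form, the result would first be `a ? (b op 0) : (0 op c)`,
and it takes a further simplification of each arm to drop the op.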

Another thing, longer term, is to remove VEC_COND_EXPR and merge it with
COND_EXPR. I know this was already mentioned in a different thread, but I don't
want to duplicate work someone else might be doing; so, I have held back on
trying to implement that.

Thanks,
Andrew Pinski 

> 
> @smallexample
> (simplify
>(plus (vec_cond:s @@0 @@1 @@2) @@3)
>(vec_cond @@0 (plus! @@1 @@3) (plus! @@2 @@3)))
> @end smallexample
> 
> --
> Marc Glisse

Re: [committed] libstdc++: Make std::vector::reference constructor private [PR115098]

2024-08-25 Thread Andrew Pinski

On Fri, Aug 23, 2024 at 5:20 AM Jonathan Wakely  wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> -- >8 --
>
> The standard says this constructor should be private.  LWG 4141 proposes
> to remove it entirely. We still need it, but it doesn't need to be
> public.
>
> For std::bitset the default constructor is already private (and never
> even defined) but there's a non-standard constructor that's public, but
> doesn't need to be.

This looks like it broke the pretty-printers testcases:
```
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:
In function 'int main()':
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:156:
error: 'std::_Bit_reference::_Bit_reference()' is private within this
context
In file included from
/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/vector:67,
 from
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:31:
/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_bvector.h:90:
note: declared private here
compiler exited with status 1

...
spawn -ignore SIGHUP
/home/apinski/src/upstream-gcc-isel/gcc/objdir/./gcc/xg++
-shared-libgcc -B/home/apinski/src/upstream-gcc-isel/gcc/objdir/./gcc
-nostdinc++ 
-L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src
-L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/.libs
-L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
-B/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/bin/
-B/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/lib/ -isystem
/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/include -isystem
/home/apinski/upstream-gcc-isel/x86_64-pc-linux-gnu/sys-include
-fchecking=1 
-B/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/./libstdc++-v3/src/.libs
-fmessage-length=0 -fno-show-column -ffunction-sections
-fdata-sections -fcf-protection -mshstk -g -O2 -D_GNU_SOURCE
-DLOCALEDIR="." -nostdinc++
-I/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include
-I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/libsupc++
-I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/include/backward
-I/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/util
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
-g -O0 -fdiagnostics-plain-output ./libtestc++.a -Wl,--gc-sections
-L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/filesystem/.libs
-L/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/src/experimental/.libs
-lm -o ./simple11.exe
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:
In function 'int main()':
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:149:
error: 'std::_Bit_reference::_Bit_reference()' is private within this
context
In file included from
/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/vector:67,
 from
/home/apinski/src/upstream-gcc-isel/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc:31:
/home/apinski/src/upstream-gcc-isel/gcc/objdir/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_bvector.h:90:
note: declared private here
compiler exited with status 1
```

I noticed this because of the new UNRESOLVED results.

Thanks,
Andrew Pinski




>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/115098
> * include/bits/stl_bvector.h (_Bit_reference): Make default
> constructor private. Declare vector and bit iterators as
> friends.
> * include/std/bitset (bitset::reference): Make constructor and
> data members private.
> * testsuite/20_util/bitset/115098.cc: New test.
> * testsuite/23_containers/vector/bool/115098.cc: New test.
> ---
>  libstdc++-v3/include/bits/stl_bvector.h  | 12 +---
>  libstdc++-v3/include/std/bitset  |  5 +
>  libstdc++-v3/testsuite/20_util/bitset/115098.cc  | 11 +++
>  .../testsuite/23_containers/vector/bool/115098.cc|  8 
>  4 files changed, 29 insertions(+), 7 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/bitset/115098.cc
>  create mode 100644 libstdc++-v3/testsuite/23_containers/vector/bool/115098.cc
>
> diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
> b/libstdc++-v3/inclu

[PATCH] expand: Use the correct mode for store flags for popcount [PR116480]

2024-08-25 Thread Andrew Pinski

When expanding popcount used for equal to 1 (or rather
__builtin_stdc_has_single_bit), the wrong mode was being used for the mode
of the store flags. We were using the mode of the argument to popcount, but
since popcount's return value is always int, the mode of the expansion here
should have been the mode of the return type rather than the argument.
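
The mismatch is easiest to see at the source level. The following is a
minimal sketch, assuming a GCC- or Clang-compatible compiler (for
`__builtin_popcountll`) and C11 (for `_Generic`); the names
`has_single_bit_ull` and `POPCOUNT_RESULT_IS_INT` are made up for this note.

```c
/* The argument is 64 bits wide, but the count that gets compared
   against 1 is a plain int.  */
static int
has_single_bit_ull (unsigned long long x)
{
  int count = __builtin_popcountll (x);
  return count == 1;
}

/* Evaluates to 1 when the builtin's result type is int, whatever the
   width of the argument.  */
#define POPCOUNT_RESULT_IS_INT(x) \
  _Generic (__builtin_popcountll (x), int: 1, default: 0)
```

So the `== 1` comparison operates on an int-sized value even when the
argument is much wider, as with the `__int128` and `_BitInt(127)` arguments
in the new tests.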

Built and tested on aarch64-linux-gnu with no regressions.
Also bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/116480

gcc/ChangeLog:

* internal-fn.cc (expand_POPCOUNT): Use the correct mode
for store flags.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116480-1.c: New test.
* gcc.dg/torture/pr116480-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/internal-fn.cc| 3 ++-
 gcc/testsuite/gcc.dg/torture/pr116480-1.c | 8 
 gcc/testsuite/gcc.dg/torture/pr116480-2.c | 8 
 3 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116480-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116480-2.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a96e61e527c..89da13b38ce 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -5311,6 +5311,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   bool nonzero_arg = integer_zerop (gimple_call_arg (stmt, 1));
   tree type = TREE_TYPE (arg);
   machine_mode mode = TYPE_MODE (type);
+  machine_mode lhsmode = TYPE_MODE (TREE_TYPE (lhs));
   do_pending_stack_adjust ();
   start_sequence ();
   expand_unary_optab_fn (fn, stmt, popcount_optab);
@@ -5318,7 +5319,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt)
   end_sequence ();
   start_sequence ();
   rtx plhs = expand_normal (lhs);
-  rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, mode, 0, 0);
+  rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, lhsmode, 0, 0);
   if (pcmp == NULL_RTX)
 {
 fail:
diff --git a/gcc/testsuite/gcc.dg/torture/pr116480-1.c 
b/gcc/testsuite/gcc.dg/torture/pr116480-1.c
new file mode 100644
index 000..15a5727941c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116480-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target int128 } } */
+
+int
+foo(unsigned __int128 b)
+{
+  return __builtin_popcountg(b) == 1;
+}
+
diff --git a/gcc/testsuite/gcc.dg/torture/pr116480-2.c 
b/gcc/testsuite/gcc.dg/torture/pr116480-2.c
new file mode 100644
index 000..7bf690283b4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116480-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target bitint } } */
+
+int
+foo(unsigned _BitInt(127) b)
+{
+  return __builtin_popcountg(b) == 1;
+}
+
-- 
2.43.0

Re: [PATCH] tree-optimization/116463 - complex lowering leaves around dead stmts

2024-08-23 Thread Andrew Pinski

On Fri, Aug 23, 2024 at 5:38 AM Richard Biener  wrote:
>
> Complex lowering generally replaces existing complex defs with
> COMPLEX_EXPRs but those might be dead when it can always refer to
> components from the lattice.  This in turn can pessimize followup
> transforms like forwprop and reassoc, the following makes sure to
> get rid of dead COMPLEX_EXPRs generated by using
> simple_dce_from_worklist.

Just an FYI, I had noticed this also when looking into PR 115544, 2
months ago, and I was thinking about implementing it then.
It also fixes that issue without the change to the _BitInt lowering.

Thanks,
Andrew Pinski

>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, this will cause
> the following fallout which is similar to the aarch64 fallout in
> PR116463, complex SLP recognition being somewhat fragile.  I'll track
> this there.  Pushed.
>
> FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c scan-assembler-not
> vfma
> dd[123]*ph[ t]
> FAIL: gcc.target/i386/avx512fp16-vector-complex-float.c
> scan-assembler-times vf
> maddcph[ t] 1
> FAIL: gcc.target/i386/part-vect-complexhf.c scan-assembler-times
> vfmaddcph[ t] 1
>
>
> PR tree-optimization/116463
> * tree-complex.cc: Include tree-ssa-dce.h.
> (dce_worklist): New global.
> (update_complex_assignment): Add SSA def to the DCE worklist.
> (tree_lower_complex): Perform DCE.
> ---
>  gcc/tree-complex.cc | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
> index dfb45b9d91c..7480c07640e 100644
> --- a/gcc/tree-complex.cc
> +++ b/gcc/tree-complex.cc
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "case-cfn-macros.h"
>  #include "builtins.h"
>  #include "optabs-tree.h"
> +#include "tree-ssa-dce.h"
>
>  /* For each complex ssa name, a lattice value.  We're interested in finding
> out whether a complex number is degenerate in some way, having only real
> @@ -88,6 +89,9 @@ static vec phis_to_revisit;
>  /* BBs that need EH cleanup.  */
>  static bitmap need_eh_cleanup;
>
> +/* SSA defs we should try to DCE.  */
> +static bitmap dce_worklist;
> +
>  /* Lookup UID in the complex_variable_components hashtable and return the
> associated tree.  */
>  static tree
> @@ -731,6 +735,7 @@ update_complex_assignment (gimple_stmt_iterator *gsi, 
> tree r, tree i)
>update_stmt (stmt);
>if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
>  bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
> +  bitmap_set_bit (dce_worklist, SSA_NAME_VERSION (gimple_assign_lhs (stmt)));
>
>update_complex_components (gsi, gsi_stmt (*gsi), r, i);
>  }
> @@ -1962,6 +1967,7 @@ tree_lower_complex (void)
>complex_propagate.ssa_propagate ();
>
>need_eh_cleanup = BITMAP_ALLOC (NULL);
> +  dce_worklist = BITMAP_ALLOC (NULL);
>
>complex_variable_components = new int_tree_htab_type (10);
>
> @@ -2008,6 +2014,9 @@ tree_lower_complex (void)
>
>gsi_commit_edge_inserts ();
>
> +  simple_dce_from_worklist (dce_worklist, need_eh_cleanup);
> +  BITMAP_FREE (dce_worklist);
> +
>unsigned todo
>  = gimple_purge_all_dead_eh_edges (need_eh_cleanup) ? TODO_cleanup_cfg : 
> 0;
>BITMAP_FREE (need_eh_cleanup);
> --
> 2.43.0

Re: [PATCH] PR tree-optimization/101390: Vectorize modulo operator

2024-08-22 Thread Andrew Pinski

On Thu, Aug 22, 2024 at 11:28 AM Andrew Pinski  wrote:
>
> On Thu, Aug 22, 2024 at 4:12 AM Richard Biener  wrote:
> >
> > On Thu, 22 Aug 2024, Jennifer Schmitz wrote:
> >
> > > On 19 Aug 2024, at 21:02, Richard Sandiford  
> > > wrote:
> > > >
> > > > Jennifer Schmitz  writes:
> > > >> Thanks for the comments. I updated the patch accordingly and 
> > > >> bootstrapped and tested again.
> > > >> Best, Jennifer
> > > >>
> > > >> From 9ef423f23afaeaa650d511c51bbc1a167e40b349 Mon Sep 17 00:00:00 2001
> > > >> From: Jennifer Schmitz 
> > > >> Date: Wed, 7 Aug 2024 08:56:45 -0700
> > > >> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
> > > >>
> > > >> This patch adds a new vectorization pattern that detects the modulo
> > > >> operation where the second operand is a variable.
> > > >> It replaces the statement by division, multiplication, and subtraction.
> > > >>
> > > >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> > > >> regression.
> > > >> Ok for mainline?
> > > >>
> > > >> Signed-off-by: Jennifer Schmitz 
> > > >>
> > > >> gcc/
> > > >>
> > > >>  PR tree-optimization/101390
> > > >>  * tree-vect-pattern.cc (vect_recog_mod_var_pattern): Add new 
> > > >> pattern.
> > > >>
> > > >> gcc/testsuite/
> > > >>  PR tree-optimization/101390
> > > >>  * gcc.dg/vect/vect-mod-var.c: New test.
> > > >>  * gcc.target/aarch64/sve/mod_1.c: Likewise.
> > > >>  * lib/target-supports.exp: New selector expression.
> > > >
> > > > LGTM, thanks.  Please give others a couple of days to comment though.
> > > >
> > > Pushed to trunk with 9bbad3685131ec95d970f81bf75f9556d4d92742.
> >
> > The gcc.dg/vect/vect-mod-var.c seems to FAIL execution for me on
> > x86_64-linux:
> >
> > FAIL: gcc.dg/vect/vect-mod-var.c -flto -ffat-lto-objects execution test
> > FAIL: gcc.dg/vect/vect-mod-var.c execution test
>
> And on powerpc64: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116461 .

I pushed the fix for the testcase as
r15-3098-gf6b10fe45b9b704fd6a7124ab02c6e6cbd8efce4. The issue was
just that division by 0 is undefined (well, mod by 0).
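
For context, the rewrite the vectorizer pattern performs (division,
multiplication, subtraction) can be sketched in scalar C as follows. The
helper name is hypothetical, and the precondition is exactly the one the
testcase violated.

```c
/* Scalar model of the rewrite: a % b computed as a - (a / b) * b.
   Precondition: b != 0 (and not INT_MIN / -1), otherwise the division
   is undefined, just like the original modulo.  */
static int
mod_via_div (int a, int b)
{
  return a - (a / b) * b;
}
```

For every non-zero divisor the two forms agree, so only the zero divisor in
the test data was a problem.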

Thanks,
Andrew Pinski

> Thanks,
> Andrew
>
> >
> > Richard.
> >
> > > Best, Jennifer
> > > > Richard
> > > >
> > > >> ---
> > > >> gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 37 +++
> > > >> gcc/testsuite/gcc.target/aarch64/sve/mod_1.c | 28 +
> > > >> gcc/testsuite/lib/target-supports.exp|  5 ++
> > > >> gcc/tree-vect-patterns.cc| 66 
> > > >> 4 files changed, 136 insertions(+)
> > > >> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mod_1.c
> > > >>
> > > >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c 
> > > >> b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > > >> new file mode 100644
> > > >> index 000..eeed318c62b
> > > >> --- /dev/null
> > > >> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > > >> @@ -0,0 +1,37 @@
> > > >> +#include "tree-vect.h"
> > > >> +
> > > >> +#define N 64
> > > >> +
> > > >> +__attribute__ ((noinline)) int
> > > >> +f (int *restrict a, int *restrict b, int *restrict c)
> > > >> +{
> > > >> +  for (int i = 0; i < N; ++i)
> > > >> +c[i] = a[i] % b[i];
> > > >> +}
> > > >> +
> > > >> +#define BASE1 -126
> > > >> +#define BASE2 116
> > > >> +
> > > >> +int
> > > >> +main (void)
> > > >> +{
> > > >> +  check_vect ();
> > > >> +
> > > >> +  int a[N], b[N], c[N];
> > > >> +
> > > >> +  for (int i = 0; i < N; ++i)
> > > >> +{
> > > >> +  a[i] = BASE1 + i * 5;
> >

[PATCH] testsuite: Fix vect-mod-var.c for division by 0 [PR116461]

2024-08-22 Thread Andrew Pinski

The testcase gcc.dg/vect/vect-mod-var.c has a division by 0,
which is undefined. On some targets (aarch64), the scalar and
the vectorized versions give the same result for division by 0.
On other targets (x86), we get a SIGFPE, and on yet others (powerpc),
the results are different.

The fix is to make sure the testcase does not test division by 0 (or really
mod by 0).
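
A standalone sketch of where the zero comes from and how the guard avoids
it, using the constants from the testcase. The function names are made up
for this illustration.

```c
#define N 64
#define BASE1 (-126)
#define BASE2 116

/* Index at which the unguarded initializer BASE2 - i * 4 hits zero,
   or -1 if it never does within the array.  */
static int
first_zero_divisor_index (void)
{
  for (int i = 0; i < N; ++i)
    if (BASE2 - i * 4 == 0)
      return i;
  return -1;
}

/* Initialization with the guard: a zero divisor is replaced by 1.  */
static void
init_operands (int *a, int *b)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = BASE1 + i * 5;
      b[i] = BASE2 - i * 4;
      b[i] = b[i] ? b[i] : 1;
    }
}
```

With these constants the divisor is zero only at index 29 (116 = 29 * 4), so
only that one element changes.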

Pushed as obvious after testing on x86_64-linux-gnu to make sure the testcase
passes now.

PR testsuite/116461

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-mod-var.c: Change the initialization loop so that
`b[i]` is never 0. Use 1 in those places.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c 
b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
index eeed318c62b..c552941faef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
@@ -23,6 +23,9 @@ main (void)
 {
   a[i] = BASE1 + i * 5;
   b[i] = BASE2 - i * 4;
+  /* b[i] cannot be 0 as that would cause undefined
+behavior with respect to `% b[i]`. */
+  b[i] = b[i] ? b[i] : 1;
   __asm__ volatile ("");
 }
 
-- 
2.43.0
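
A minimal sketch of the guarded initialization described above, assuming the
same constants as the testcase. The names init_inputs, mod_all and mod_via_div
are illustrative only and do not come from the GCC testsuite. mod_via_div
spells out the division, multiplication and subtraction form that the PR101390
vectorizer pattern is described as producing, which is why a zero divisor
matters for both the scalar and the vector loop.

```c
#include <assert.h>

#define N 64
#define BASE1 -126
#define BASE2 116

/* Fill A and B the way the testcase does, replacing a zero divisor by 1.  */
static void
init_inputs (int *a, int *b)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = BASE1 + i * 5;
      b[i] = BASE2 - i * 4;	/* Would be 0 at i == 29.  */
      b[i] = b[i] ? b[i] : 1;
    }
}

/* a % b written as a - (a / b) * b; b must be nonzero.  */
static int
mod_via_div (int a, int b)
{
  return a - (a / b) * b;
}

/* Scalar reference loop, the same shape as f () in the testcase.  */
static void
mod_all (const int *a, const int *b, int *c)
{
  for (int i = 0; i < N; ++i)
    c[i] = a[i] % b[i];
}
```

Without the guard, b[29] would be 116 - 29 * 4 == 0, and the remainder at that
index would be undefined.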

[PUSHED] testsuite: Fix gcc.dg/torture/pr116420.c for targets default unsigned char [PR116464]

2024-08-22 Thread Andrew Pinski

This is an obvious fix to the gcc.dg/torture/pr116420.c testcase which simply
changes from plain `char` to `signed char` so it works on targets where plain
char defaults to unsigned.

Pushed as obvious after a quick test for aarch64-linux-gnu to make sure the
testcase passes now.

PR testsuite/116464

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116420.c:

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr116420.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr116420.c b/gcc/testsuite/gcc.dg/torture/pr116420.c
index 9a784f59429..81a6e133647 100644
--- a/gcc/testsuite/gcc.dg/torture/pr116420.c
+++ b/gcc/testsuite/gcc.dg/torture/pr116420.c
@@ -1,7 +1,7 @@
 /* { dg-do run } */
 /* { dg-additional-options "-fno-forward-propagate -fno-tree-ch" } */
 int a, d, e;
-char b = -1, c, f;
+signed char b = -1, c, f;
 int main() {
   int g;
   for (; d < 1; d++) {
-- 
2.43.0
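
A small sketch of why the plain `char` version is target dependent. The helper
names are hypothetical. It only relies on CHAR_MIN and UCHAR_MAX from
<limits.h>: when plain char is unsigned, storing -1 and widening to int gives
UCHAR_MAX rather than -1, while `signed char` behaves the same everywhere.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when plain char is a signed type on this target.  */
static int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Store -1 in a plain char and widen it to int.  */
static int
widen_plain_char (void)
{
  char b = -1;
  return b;
}

/* Store -1 in a signed char and widen it to int; always -1.  */
static int
widen_signed_char (void)
{
  signed char b = -1;
  return b;
}
```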

[PATCH] toplevel: Error out if using --disable-libstdcxx with bootstrap [PR105474]

2024-08-22 Thread Andrew Pinski

Bootstrapping with --disable-libstdcxx causes a build failure deep in
compiling stage2, so instead error out early in the toplevel configure, which
is more user friendly.

Bootstrapped and tested on x86_64-linux-gnu.
Also made sure --disable-libstdcxx without --disable-bootstrap failed.

PR bootstrap/105474

ChangeLog:

* configure: Regenerate.
* configure.ac: Error out if libstdc++ is not enabled
with bootstrapping.

Signed-off-by: Andrew Pinski 
---
 configure| 9 +
 configure.ac | 9 +
 2 files changed, 18 insertions(+)

diff --git a/configure b/configure
index 51bf1d1add1..0722242389d 100755
--- a/configure
+++ b/configure
@@ -10235,6 +10235,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs :,$stage1_languages," in
 ;;
 esac
 
+# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is disabled with bootstrapping
+# Note C++ is always enabled for stage1 now.
+case "$enable_bootstrap:${noconfigdirs}" in
+  yes:*target-libstdc++-v3*)
+as_fn_error $? "bootstrapping with --disable-libstdcxx is not supported" "$LINENO" 5
+;;
+esac
+
+
 extrasub_build=
 for module in ${build_configdirs} ; do
   if test -z "${no_recursion}" \
diff --git a/configure.ac b/configure.ac
index 20457005e29..8be11e84db8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3191,6 +3191,15 @@ case "$enable_bootstrap:$ENABLE_GOLD: $configdirs :,$stage1_languages," in
 ;;
 esac
 
+# Bootstrapping GCC requires libstdc++-v3 so error out if libstdc++ is disabled with bootstrapping
+# Note C++ is always enabled for stage1 now.
+case "$enable_bootstrap:${noconfigdirs}" in
+  yes:*target-libstdc++-v3*)
+AC_MSG_ERROR([bootstrapping with --disable-libstdcxx is not supported])
+;;
+esac
+
+
 extrasub_build=
 for module in ${build_configdirs} ; do
   if test -z "${no_recursion}" \
-- 
2.43.0

[PATCH] Don't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]

2024-08-22 Thread Andrew Pinski

With newer ld, the default library search path includes neither /usr/lib nor
/lib, but the driver does not pass -L down to the linker for these directories,
so in some/most cases libc is not found.
This code dates from at least 1992 and it is done in a way which is not safe and
does not make sense. So let's remove it.

Bootstrapped and tested on x86_64-linux-gnu (which defaults to being a multilib).

gcc/ChangeLog:

PR driver/104707
PR driver/97304

* gcc.cc (is_directory): Don't exclude /usr/lib and /lib
for library directory paths. Remove library argument.
(add_to_obstack): Update call to is_directory.
(driver_handle_option): Likewise.
(spec_path): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/gcc.cc | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index abdb40bfe6e..a02af80ec6e 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -408,7 +408,7 @@ static int do_spec_2 (const char *, const char *);
 static void do_option_spec (const char *, const char *);
 static void do_self_spec (const char *);
 static const char *find_file (const char *);
-static int is_directory (const char *, bool);
+static int is_directory (const char *);
 static const char *validate_switches (const char *, bool, bool);
 static void validate_all_switches (void);
 static inline void validate_switches_from_spec (const char *, bool);
@@ -2940,7 +2940,7 @@ add_to_obstack (char *path, void *data)
 {
   struct add_to_obstack_info *info = (struct add_to_obstack_info *) data;
 
-  if (info->check_dir && !is_directory (path, false))
+  if (info->check_dir && !is_directory (path))
 return NULL;
 
   if (!info->first_time)
@@ -4576,7 +4576,7 @@ driver_handle_option (struct gcc_options *opts,
   if appending a directory separator actually makes a
   valid directory name.  */
if (!IS_DIR_SEPARATOR (arg[len - 1])
-   && is_directory (arg, false))
+   && is_directory (arg))
  {
char *tmp = XNEWVEC (char, len + 2);
strcpy (tmp, arg);
@@ -6019,7 +6019,7 @@ spec_path (char *path, void *data)
   memcpy (path + len, info->append, info->append_len + 1);
 }
 
-  if (!is_directory (path, true))
+  if (!is_directory (path))
 return NULL;
 
   do_spec_1 (info->option, 1, NULL);
@@ -8041,11 +8041,10 @@ find_file (const char *name)
   return newname ? newname : name;
 }
 
-/* Determine whether a directory exists.  If LINKER, return 0 for
-   certain fixed names not needed by the linker.  */
+/* Determine whether a directory exists.  */
 
 static int
-is_directory (const char *path1, bool linker)
+is_directory (const char *path1)
 {
   int len1;
   char *path;
@@ -8063,17 +8062,6 @@ is_directory (const char *path1, bool linker)
   *cp++ = '.';
   *cp = '\0';
 
-  /* Exclude directories that the linker is known to search.  */
-  if (linker
-  && IS_DIR_SEPARATOR (path[0])
-  && ((cp - path == 6
-  && filename_ncmp (path + 1, "lib", 3) == 0)
- || (cp - path == 10
- && filename_ncmp (path + 1, "usr", 3) == 0
- && IS_DIR_SEPARATOR (path[4])
- && filename_ncmp (path + 5, "lib", 3) == 0)))
-return 0;
-
   return (stat (path, &st) >= 0 && S_ISDIR (st.st_mode));
 }
 
-- 
2.43.0
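
With the linker special case removed, the check reduces to a plain stat test
that treats /usr/lib and /lib like any other directory. A simplified sketch,
assuming a POSIX system; path_is_directory is a hypothetical name, and the
sketch omits the path copying that the real is_directory does before calling
stat.

```c
#include <assert.h>
#include <sys/stat.h>

/* Return nonzero if PATH names an existing directory.  No directory
   names are special-cased.  */
static int
path_is_directory (const char *path)
{
  struct stat st;
  return stat (path, &st) >= 0 && S_ISDIR (st.st_mode);
}
```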

Re: [PATCH] PR tree-optimization/101390: Vectorize modulo operator

2024-08-22 Thread Andrew Pinski

On Thu, Aug 22, 2024 at 4:12 AM Richard Biener  wrote:
>
> On Thu, 22 Aug 2024, Jennifer Schmitz wrote:
>
> > On 19 Aug 2024, at 21:02, Richard Sandiford  
> > wrote:
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > Jennifer Schmitz  writes:
> > >> Thanks for the comments. I updated the patch accordingly and 
> > >> bootstrapped and tested again.
> > >> Best, Jennifer
> > >>
> > >> From 9ef423f23afaeaa650d511c51bbc1a167e40b349 Mon Sep 17 00:00:00 2001
> > >> From: Jennifer Schmitz 
> > >> Date: Wed, 7 Aug 2024 08:56:45 -0700
> > >> Subject: [PATCH] PR tree-optimization/101390: Vectorize modulo operator
> > >>
> > >> This patch adds a new vectorization pattern that detects the modulo
> > >> operation where the second operand is a variable.
> > >> It replaces the statement by division, multiplication, and subtraction.
> > >>
> > >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> > >> regression.
> > >> Ok for mainline?
> > >>
> > >> Signed-off-by: Jennifer Schmitz 
> > >>
> > >> gcc/
> > >>
> > >>  PR tree-optimization/101390
> > >>  * tree-vect-pattern.cc (vect_recog_mod_var_pattern): Add new 
> > >> pattern.
> > >>
> > >> gcc/testsuite/
> > >>  PR tree-optimization/101390
> > >>  * gcc.dg/vect/vect-mod-var.c: New test.
> > >>  * gcc.target/aarch64/sve/mod_1.c: Likewise.
> > >>  * lib/target-supports.exp: New selector expression.
> > >
> > > LGTM, thanks.  Please give others a couple of days to comment though.
> > >
> > Pushed to trunk with 9bbad3685131ec95d970f81bf75f9556d4d92742.
>
> The gcc.dg/vect/vect-mod-var.c seems to FAIL execution for me on
> x86_64-linux:
>
> FAIL: gcc.dg/vect/vect-mod-var.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/vect-mod-var.c execution test

And on powerpc64: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116461 .
Thanks,
Andrew

>
> Richard.
>
> > Best, Jennifer
> > > Richard
> > >
> > >> ---
> > >> gcc/testsuite/gcc.dg/vect/vect-mod-var.c | 37 +++
> > >> gcc/testsuite/gcc.target/aarch64/sve/mod_1.c | 28 +
> > >> gcc/testsuite/lib/target-supports.exp|  5 ++
> > >> gcc/tree-vect-patterns.cc| 66 
> > >> 4 files changed, 136 insertions(+)
> > >> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mod_1.c
> > >>
> > >> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mod-var.c 
> > >> b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > >> new file mode 100644
> > >> index 000..eeed318c62b
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.dg/vect/vect-mod-var.c
> > >> @@ -0,0 +1,37 @@
> > >> +#include "tree-vect.h"
> > >> +
> > >> +#define N 64
> > >> +
> > >> +__attribute__ ((noinline)) int
> > >> +f (int *restrict a, int *restrict b, int *restrict c)
> > >> +{
> > >> +  for (int i = 0; i < N; ++i)
> > >> +c[i] = a[i] % b[i];
> > >> +}
> > >> +
> > >> +#define BASE1 -126
> > >> +#define BASE2 116
> > >> +
> > >> +int
> > >> +main (void)
> > >> +{
> > >> +  check_vect ();
> > >> +
> > >> +  int a[N], b[N], c[N];
> > >> +
> > >> +  for (int i = 0; i < N; ++i)
> > >> +{
> > >> +  a[i] = BASE1 + i * 5;
> > >> +  b[i] = BASE2 - i * 4;
> > >> +  __asm__ volatile ("");
> > >> +}
> > >> +
> > >> +  f (a, b, c);
> > >> +
> > >> +#pragma GCC novector
> > >> +  for (int i = 0; i < N; ++i)
> > >> +if (c[i] != a[i] % b[i])
> > >> +  __builtin_abort ();
> > >> +}
> > >> +
> > >> +/* { dg-final { scan-tree-dump "vect_recog_mod_var_pattern: detected" 
> > >> "vect" { target vect_int_div } } } */
> > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mod_1.c 
> > >> b/gcc/testsuite/gcc.target/aarch64/sve/mod_1.c
> > >> new file mode 100644
> > >> index 000..eb37f1e3636
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/mod_1.c
> > >> @@ -0,0 +1,28 @@
> > >> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > >> +/* { dg-options "-Ofast -ftree-vectorize -fno-vect-cost-model 
> > >> --save-temps" } */
> > >> +
> > > >> +#include <stdint.h>
> > >> +
> > >> +#define DEF_LOOP(TYPE)   \
> > >> +void __attribute__ ((noipa)) \
> > >> +mod_##TYPE (TYPE *restrict dst, TYPE *restrict src1, \
> > >> + TYPE *restrict src2, int count) \
> > >> +{\
> > >> +  for (int i = 0; i < count; ++i)\
> > >> +dst[i] = src1[i] % src2[i];  \
> > >> +}
> > >> +
> > >> +#define TEST_ALL(T) \
> > >> +  T (int32_t) \
> > >> +  T (uint32_t) \
> > >> +  T (int64_t) \
> > >> +  T (uint64_t)
> > >> +
> > >> +TEST_ALL (DEF_LOOP)
> > >> +
> > >> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m, 
> > >> z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> > >> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m, 
> > >> z[0-9]+\.s, z[0-9]+\.s\n}

[PATCH] fold: Fix `a * 1j` if a has side effects [PR116454]

2024-08-22 Thread Andrew Pinski

The problem here was a missing save_expr around arg0 since
it is used twice, once in REALPART_EXPR and once in IMAGPART_EXPR.
This adds the save_expr and reformats the code slightly so it is a
little easier to understand.  It excludes the case when arg0 is
a COMPLEX_EXPR since in that case we'll end up with the distinct
real and imaginary parts.  This is important to retain early
optimization in some testcases.

Bootstrapped and tested on x86_64-linux-gnu with no regressions, pushed.

PR 116454

gcc/ChangeLog:

* fold-const.cc (fold_binary_loc): Fix `a * +-1i`
by wrapping arg0 with save_expr when it is not COMPLEX_EXPR.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116454-1.c: New test.
* gcc.dg/torture/pr116454-2.c: New test.

Signed-off-by: Andrew Pinski 
Co-Authored-By: Richard Biener  
---
 gcc/fold-const.cc | 32 ---
 gcc/testsuite/gcc.dg/torture/pr116454-1.c | 16 
 gcc/testsuite/gcc.dg/torture/pr116454-2.c | 12 +
 3 files changed, 50 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116454-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116454-2.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index eeadc8db9f6..35402a59768 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12093,17 +12093,29 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
{
  tree rtype = TREE_TYPE (TREE_TYPE (arg0));
  if (real_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
-rtype, arg0)),
-  fold_build1_loc (loc, REALPART_EXPR, rtype, 
arg0));
+   {
+ if (TREE_CODE (arg0) != COMPLEX_EXPR)
+   arg0 = save_expr (arg0);
+ tree iarg0 = fold_build1_loc (loc, IMAGPART_EXPR,
+   rtype, arg0);
+ tree rarg0 = fold_build1_loc (loc, REALPART_EXPR,
+   rtype, arg0);
+ return fold_build2_loc (loc, COMPLEX_EXPR, type,
+ negate_expr (iarg0),
+ rarg0);
+   }
  else if (real_minus_onep (TREE_IMAGPART (arg1)))
-   return
- fold_build2_loc (loc, COMPLEX_EXPR, type,
-  fold_build1_loc (loc, IMAGPART_EXPR, rtype, 
arg0),
-  negate_expr (fold_build1_loc (loc, REALPART_EXPR,
-rtype, arg0)));
+   {
+ if (TREE_CODE (arg0) != COMPLEX_EXPR)
+   arg0 = save_expr (arg0);
+ tree iarg0 = fold_build1_loc (loc, IMAGPART_EXPR,
+   rtype, arg0);
+ tree rarg0 = fold_build1_loc (loc, REALPART_EXPR,
+   rtype, arg0);
+ return fold_build2_loc (loc, COMPLEX_EXPR, type,
+ iarg0,
+ negate_expr (rarg0));
+   }
}
 
  /* Optimize z * conj(z) for floating point complex numbers.
diff --git a/gcc/testsuite/gcc.dg/torture/pr116454-1.c b/gcc/testsuite/gcc.dg/torture/pr116454-1.c
new file mode 100644
index 000..6210dcce4a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116454-1.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ffast-math" } */
+
+static int t = 0;
+_Complex float f()
+{
+t++;
+return 0;
+}
+int main() {
+   t = 0;
+   /* Would cause f() to be incorrectly invoked twice. */
+   f() * 1j;
+   if (t != 1)
+  __builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116454-2.c b/gcc/testsuite/gcc.dg/torture/pr116454-2.c
new file mode 100644
index 000..a1e1604e616
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116454-2.c
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ffast-math" } */
+_Complex float arr[2];
+
+int main() {
+  _Complex float *ptr;
+  ptr = arr;
+  *++ptr * 1j; 
+  /* ptr should only increment once, not twice. */
+  if (ptr != arr + 1)
+__builtin_abort ();
+}
-- 
2.43.0
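
At the source level, the effect of the save_expr can be sketched as follows:
the operand is evaluated once into a temporary, and both the real and the
imaginary part are read from that temporary. This is only an illustration in
GNU C (it uses the __real__ and __imag__ extensions), not the fold-const.cc
logic itself; next_value and times_i are hypothetical names.

```c
#include <assert.h>

static int calls;

/* An operand with a side effect: counts how often it is evaluated.  */
static _Complex float
next_value (void)
{
  _Complex float z;
  calls++;
  __real__ z = 3.0f;
  __imag__ z = 4.0f;
  return z;
}

/* (x + yi) * i == -y + xi.  Z is a single evaluated copy of the operand,
   so reading both parts does not re-evaluate the original expression.  */
static _Complex float
times_i (_Complex float z)
{
  _Complex float r;
  __real__ r = -__imag__ z;
  __imag__ r = __real__ z;
  return r;
}
```

Calling times_i (next_value ()) evaluates next_value exactly once, which is
the behavior pr116454-1.c checks for `f() * 1j`.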

Re: [committed] libstdc++: Fix std::variant to reject array types [PR116381]

2024-08-21 Thread Andrew Pinski

On Wed, Aug 21, 2024 at 1:56 AM Jonathan Wakely  wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> Probably worth backporting too. It could potentially cause new errors
> for people using arrays in std::variant, but that's forbidden by the
> standard.

It might be worth mentioning in the porting_to guide just in case. You
never know since we have gotten bug reports about broken code that was
also rejected by clang/MSVC due to a change in GCC.

Thanks,
Andrew

>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/116381
> * include/std/variant (variant): Fix conditions for
> static_assert to match the spec.
> * testsuite/20_util/variant/types_neg.cc: New test.
> ---
>  libstdc++-v3/include/std/variant|  6 ++
>  .../testsuite/20_util/variant/types_neg.cc  | 17 +
>  2 files changed, 19 insertions(+), 4 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/variant/types_neg.cc
>
> diff --git a/libstdc++-v3/include/std/variant 
> b/libstdc++-v3/include/std/variant
> index 12108d07f0b..5fb7770d889 100644
> --- a/libstdc++-v3/include/std/variant
> +++ b/libstdc++-v3/include/std/variant
> @@ -1457,10 +1457,8 @@ namespace __detail::__variant
>
>static_assert(sizeof...(_Types) > 0,
> "variant must have at least one alternative");
> -  static_assert(!(std::is_reference_v<_Types> || ...),
> -   "variant must have no reference alternative");
> -  static_assert(!(std::is_void_v<_Types> || ...),
> -   "variant must have no void alternative");
> +  static_assert(((std::is_object_v<_Types> && !is_array_v<_Types>) && 
> ...),
> +   "variant alternatives must be non-array object types");
>
>using _Base = __detail::__variant::_Variant_base<_Types...>;
>
> diff --git a/libstdc++-v3/testsuite/20_util/variant/types_neg.cc 
> b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc
> new file mode 100644
> index 000..5cd3d02154b
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/variant/types_neg.cc
> @@ -0,0 +1,17 @@
> +// { dg-do compile { target c++17 } }
> +
> +# include <variant>
> +
> +std::variant<> v0; // { dg-error "here" }
> +// { dg-error "must have at least one alternative" "" { target *-*-* } 0 }
> +std::variant v1; // { dg-error "here" }
> +std::variant v2; // { dg-error "here" }
> +std::variant v3; // { dg-error "here" }
> +std::variant v4; // { dg-error "here" }
> +std::variant v5; // { dg-error "here" }
> +std::variant v6; // { dg-error "here" }
> +// { dg-error "must be non-array object types" "" { target *-*-* } 0 }
> +
> +// All of variant's base classes are instantiated before checking any
> +// static_assert, so we get lots of errors before the expected errors above.
> +// { dg-excess-errors "" }
> --
> 2.46.0
>

Re: [PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]

2024-08-20 Thread Andrew Pinski

On Tue, Aug 20, 2024 at 11:18 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > Andrew Pinski  writes:
> >> When CSSC is not enabled, 128bit popcount can be implemented
> >> just via the vector (v16qi) cnt instruction followed by a reduction,
> >> like how the 64bit one is currently implemented instead of
> >> splitting into 2 64bit popcount.
> >>
> >> Build and tested for aarch64-linux-gnu.
> >>
> >>  PR target/113042
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/aarch64/aarch64.md (popcountti2): New define_expand.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.target/aarch64/popcnt10.c: New test.
> >>  * gcc.target/aarch64/popcnt9.c: New test.
> >
> > OK if there are no other comments in the next 24 hours.
>
> Sorry, only thought about it later, but:

Yes that is a good idea since that would be the same code in the end
anyways and it is slightly cleaner.
I was not 100% sure if you removed your approval or approved it with
the changes so I submitted a new patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660960.html

Thanks,
Andrew Pinski

>
> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> >> index 12dcc16529a..73506e71f43 100644
> >> --- a/gcc/config/aarch64/aarch64.md
> >> +++ b/gcc/config/aarch64/aarch64.md
> >> @@ -5378,6 +5378,22 @@ (define_expand "popcount2"
> >>  }
> >>  })
> >>
> >> +(define_expand "popcountti2"
> >> +  [(set (match_operand:TI 0 "register_operand")
> >> +(popcount:TI (match_operand:TI 1 "register_operand")))]
>
> Could you try making the output :DI instead of :TI?  I'd expect
> internal-fn.cc to handle that correctly and extend the result to
> 128 bits where needed.
>
> That would make the dummy popcount rtx malformed, so I suppose
> the pattern should just be:
>
>   [(match_operand:DI 0 "register_operand")
>(match_operand:TI 1 "register_operand")]
>
> >> +  "TARGET_SIMD && !TARGET_CSSC"
> >> +{
> >> +  rtx v = gen_reg_rtx (V16QImode);
> >> +  rtx v1 = gen_reg_rtx (V16QImode);
> >> +  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
> >> +  emit_insn (gen_popcountv16qi2 (v1, v));
> >> +  rtx out = gen_reg_rtx (DImode);
> >> +  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
>
> We could then use operands[0] directly as the output here.
>
> Thanks,
> Richard
>
> >> +  out = convert_to_mode (TImode, out, true);
> >> +  emit_move_insn (operands[0], out);
> >> +  DONE;
> >> +})

Re: [PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

2024-08-20 Thread Andrew Pinski

On Tue, Aug 20, 2024 at 9:46 AM Richard Sandiford
 wrote:
>
> Andrew Pinski  writes:
> > On aarch64 (without !CSSC instructions), since popcount is implemented 
> > using the SIMD instruction cnt,
> > instead of using two SIMD cnt (V8QI mode), it is better to use one 128bit 
> > cnt (V16QI mode). And only one
> > reduction addition instead of 2. Currently fold_builtin_bit_query will 
> > expand always without checking
> > if there was an optab for the type, so this changes that to check the optab 
> > to see if we should expand
> > or have the backend handle it.
> >
> > Bootstrapped and tested on x86_64-linux-gnu and built and tested for 
> > aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >   * builtins.cc (fold_builtin_bit_query): Don't expand double
> >   `unsigned long long` typess if there is an optab entry for that
> >   type.
>
> OK.  The logic in the function seems a bit twisty (the same condition
> is checked later), but all my attempts to improve it only made it worse.

I tried to look if there was a good refactoring here too but I didn't
see any either.
Anyways I have now pushed it as
r15-3056-g50b5000a5e430aaf99a5e00465cc9e25563d908b .

Thanks,
Andrew

>
> Thanks,
> Richard
>
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/builtins.cc | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 0b902896ddd..b4d51eaeba5 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum 
> > built_in_function fcode,
> >tree call = NULL_TREE, tem;
> >if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
> >&& (TYPE_PRECISION (arg0_type)
> > -   == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
> > +   == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
> > +  /* If the target supports the optab, then don't do the expansion. */
> > +  && !direct_internal_fn_supported_p (ifn, arg0_type, 
> > OPTIMIZE_FOR_BOTH))
> >  {
> >/* __int128 expansions using up to 2 long long builtins.  */
> >arg0 = save_expr (arg0);

Re: [PATCH v2] ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

2024-08-20 Thread Andrew Pinski

On Sun, Aug 11, 2024 at 9:36 PM Andrew Pinski  wrote:
>
> Sometimes initialize_sanitizer_builtins is not called before emitting
> the asan builtins with hwasan. In the case of the bug report, there
> was a path with the fortran front-end where it was not called.
> So let's call it in asan_instrument before calling transform_statements
> and from hwasan_finish_file.
>
> Built and tested for aarch64-linux-gnu with no regressions.

I pushed this in the end as obvious since other places call
initialize_sanitizer_builtins without any extra checks.

Thanks,
Andrew Pinski

>
> Changes since v1:
> * v2: Also call initialize_sanitizer_builtins from hwasan_finish_file.
>
> gcc/ChangeLog:
>
> PR sanitizer/115205
> * asan.cc (asan_instrument): Call initialize_sanitizer_builtins
> for hwasan.
>     (hwasan_finish_file): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/asan.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/asan.cc b/gcc/asan.cc
> index 9e0f51b1477..5f262d54a3a 100644
> --- a/gcc/asan.cc
> +++ b/gcc/asan.cc
> @@ -4276,6 +4276,7 @@ asan_instrument (void)
>  {
>if (hwasan_sanitize_p ())
>  {
> +  initialize_sanitizer_builtins ();
>transform_statements ();
>return 0;
>  }
> @@ -4694,6 +4695,8 @@ hwasan_finish_file (void)
>if (flag_sanitize & SANITIZE_KERNEL_HWADDRESS)
>  return;
>
> +  initialize_sanitizer_builtins ();
> +
>/* Avoid instrumenting code in the hwasan constructors/destructors.  */
>flag_sanitize &= ~SANITIZE_HWADDRESS;
>int priority = MAX_RESERVED_INIT_PRIORITY - 1;
> --
> 2.43.0
>

[PATCH v2] aarch64: Implement popcountti2 pattern [PR113042]

2024-08-20 Thread Andrew Pinski

When CSSC is not enabled, 128bit popcount can be implemented
just via the vector (v16qi) cnt instruction followed by a reduction,
like how the 64bit one is currently implemented instead of
splitting into 2 64bit popcount.

Changes since v1:
* v2: Make operand 0 be DImode instead of TImode and simplify.

Built and tested for aarch64-linux-gnu.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcountti2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt10.c: New test.
* gcc.target/aarch64/popcnt9.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md   | 13 +++
 gcc/testsuite/gcc.target/aarch64/popcnt10.c | 25 +
 gcc/testsuite/gcc.target/aarch64/popcnt9.c  | 25 +
 3 files changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt9.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 12dcc16529a..c54b29cd64b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5378,6 +5378,19 @@ (define_expand "popcount2"
 }
 })
 
+(define_expand "popcountti2"
+  [(match_operand:DI 0 "register_operand")
+   (match_operand:TI 1 "register_operand")]
+  "TARGET_SIMD && !TARGET_CSSC"
+{
+  rtx v = gen_reg_rtx (V16QImode);
+  rtx v1 = gen_reg_rtx (V16QImode);
+  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
+  emit_insn (gen_popcountv16qi2 (v1, v));
+  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (operands[0], v1));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt10.c b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
new file mode 100644
index 000..4d01fc67022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+cssc"
+
+/*
+** h128:
+** ldp x([0-9]+), x([0-9]+), \[x0\]
+** cnt x([0-9]+), x([0-9]+)
+** cnt x([0-9]+), x([0-9]+)
+** add w0, w([0-9]+), w([0-9]+)
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+  return __builtin_popcountg (a[0]);
+}
+
+/* popcount with CSSC should be split into 2 sections. */
+/* { dg-final { scan-tree-dump-not "POPCOUNT " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " __builtin_popcount" 2 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt9.c b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
new file mode 100644
index 000..c778fc7f420
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h128:
+** ldr q([0-9]+), \[x0\]
+** cnt v([0-9]+).16b, v\1.16b
+** addv b([0-9]+), v\2.16b
+** fmov w0, s\3
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+ return __builtin_popcountg (a[0]);
+}
+
+/* There should be only one POPCOUNT. */
+/* { dg-final { scan-tree-dump-times "POPCOUNT " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " __builtin_popcount"  "optimized" } } */
+
-- 
2.43.0
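
For comparison, the split form that is still expected with CSSC (two 64-bit
popcounts and an add) can be written by hand as below. This is a sketch under
the assumptions that the compiler provides `unsigned __int128` and that
`unsigned long long` is 64 bits wide; popcount128_split is a hypothetical
name. The new expander instead does one v16qi cnt over all 128 bits followed
by a single reduction.

```c
#include <assert.h>

/* Popcount of a 128-bit value as the sum of the popcounts of its
   two 64-bit halves.  */
static int
popcount128_split (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);
  return __builtin_popcountll (lo) + __builtin_popcountll (hi);
}
```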

[PATCH v2 2/2] match: Reject non-ssa name/min invariants in gimple_extract [PR116412]

2024-08-20 Thread Andrew Pinski

After the conversion for phiopt's conditional operand
to use maybe_push_res_to_seq, it was found that gimple_extract
will extract a memory load out of REALPART_EXPR/IMAGPART_EXPR/VCE
and BIT_FIELD_REF. But that extraction is not needed, as memory loads are not
simplified in match and simplify. So gimple_extract should return false
in those cases.

Changes since v1:
* Move the rejection to gimple_extract from factor_out_conditional_operation.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116412

gcc/ChangeLog:

* gimple-match-exports.cc (gimple_extract): Return false if op0
is neither an SSA name nor a min invariant for
REALPART_EXPR/IMAGPART_EXPR/VCE and BIT_FIELD_REF.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116412-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-exports.cc   | 6 ++
 gcc/testsuite/gcc.dg/torture/pr116412-1.c | 6 ++
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116412-1.c

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index 15d54b7d843..86e40100899 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -740,6 +740,9 @@ gimple_extract (gimple *stmt, gimple_match_op *res_op,
|| code == VIEW_CONVERT_EXPR)
  {
tree op0 = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+   /* op0 needs to be a SSA name or an min invariant. */
+   if (TREE_CODE (op0) != SSA_NAME && !is_gimple_min_invariant 
(op0))
+ return false;
res_op->set_op (code, type, valueize_op (op0));
return true;
  }
@@ -747,6 +750,9 @@ gimple_extract (gimple *stmt, gimple_match_op *res_op,
  {
tree rhs1 = gimple_assign_rhs1 (stmt);
tree op0 = valueize_op (TREE_OPERAND (rhs1, 0));
+   /* op0 needs to be a SSA name or an min invariant. */
+   if (TREE_CODE (op0) != SSA_NAME && !is_gimple_min_invariant 
(op0))
+ return false;
res_op->set_op (code, type, op0,
TREE_OPERAND (rhs1, 1),
TREE_OPERAND (rhs1, 2),
diff --git a/gcc/testsuite/gcc.dg/torture/pr116412-1.c b/gcc/testsuite/gcc.dg/torture/pr116412-1.c
new file mode 100644
index 000..3bc26ecd8b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116412-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+double f(_Complex double a, _Complex double *b, int c)
+{
+  if (c) return __real__ a;
+  return __real__ *b;
+}
-- 
2.43.0

[PATCH v2 1/2] phi-opt: Fix for failing maybe_push_res_to_seq in factor_out_conditional_operation [PR 116409]

2024-08-20 Thread Andrew Pinski

The code was assuming that maybe_push_res_to_seq would not fail if
gimple_extract_op returned true. But in some cases, when the function is pure
rather than const, it can fail. This change moves the code around to check the
result of maybe_push_res_to_seq instead of assuming it will always work.

Changes since v1:
* v2: Instead of directly testing for non-pure builtin functions, test if
maybe_push_res_to_seq fails.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR  tree-optimization/116409

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move
maybe_push_res_to_seq before creating the phi node and the debug dump.
Return false if maybe_push_res_to_seq fails.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116409-1.c: New test.
* gcc.dg/torture/pr116409-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr116409-1.c |  7 ++
 gcc/testsuite/gcc.dg/torture/pr116409-2.c |  7 ++
 gcc/tree-ssa-phiopt.cc| 30 +++
 3 files changed, 34 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116409-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116409-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116409-1.c b/gcc/testsuite/gcc.dg/torture/pr116409-1.c
new file mode 100644
index 000..7bf8d49c9a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116409-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-frounding-math -fno-math-errno" } */
+double f(int c, double a, double b) {
+  if (c)
+return __builtin_sqrt(a);
+  return __builtin_sqrt(b);
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116409-2.c b/gcc/testsuite/gcc.dg/torture/pr116409-2.c
new file mode 100644
index 000..c27f11312d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116409-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+
+int f (int t, char *a, char *b) {
+  if (t)
+return __builtin_strlen (a);
+  return __builtin_strlen (b);
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 2d4aba5b087..95bac330c8f 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dce.h"
+#include "calls.h"
 
 /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
 
@@ -370,6 +371,25 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   /* Create a new PHI stmt.  */
   result = PHI_RESULT (phi);
   temp = make_ssa_name (TREE_TYPE (new_arg0), NULL);
+
+  gimple_match_op new_op = arg0_op;
+
+  /* Create the operation stmt if possible and insert it.  */
+  new_op.ops[0] = temp;
+  gimple_seq seq = NULL;
+  result = maybe_push_res_to_seq (&new_op, &seq, result);
+
+  /* If we can't create the new statement, release the temp name
+ and return back.  */
+  if (!result)
+{
+  release_ssa_name (temp);
+  return NULL;
+}
+
+  gsi = gsi_after_labels (gimple_bb (phi));
+  gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
+
   newphi = create_phi_node (temp, gimple_bb (phi));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -398,16 +418,6 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   add_phi_arg (newphi, new_arg0, e0, locus);
   add_phi_arg (newphi, new_arg1, e1, locus);
 
-  gimple_match_op new_op = arg0_op;
-
-  /* Create the operation stmt and insert it.  */
-  new_op.ops[0] = temp;
-  gimple_seq seq = NULL;
-  result = maybe_push_res_to_seq (&new_op, &seq, result);
-  gcc_assert (result);
-  gsi = gsi_after_labels (gimple_bb (phi));
-  gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
-
   /* Remove the original PHI stmt.  */
   gsi = gsi_for_stmt (phi);
   gsi_remove (&gsi, true);
-- 
2.43.0

[PATCH 2/2] phiopt: Reject non gimple val inside factor_out_conditional_operation [PR116412]

2024-08-19 Thread Andrew Pinski

After the conversion to use maybe_push_res_to_seq, the argument is sometimes
not a gimple value (for REALPART_EXPR, IMAGPART_EXPR and VIEW_CONVERT_EXPR),
and phiopt would then create an invalid PHI.
The fix is to check that both new arguments are gimple values.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/116412

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Make sure 
new_arg0
and new_arg1 are both gimple values.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116412-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr116412-1.c | 6 ++
 gcc/tree-ssa-phiopt.cc| 4 
 2 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116412-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116412-1.c 
b/gcc/testsuite/gcc.dg/torture/pr116412-1.c
new file mode 100644
index 000..3bc26ecd8b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116412-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+double f(_Complex double a, _Complex double *b, int c)
+{
+  if (c) return __real__ a;
+  return __real__ *b;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 770f3629fe1..be95798a065 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -368,6 +368,10 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   if (!types_compatible_p (TREE_TYPE (new_arg0), TREE_TYPE (new_arg1)))
 return NULL;
 
+  /* The new args need to be both gimple values. */
+  if (!is_gimple_val (new_arg0) || !is_gimple_val (new_arg1))
+return NULL;
+
   /* Function calls can only be const or an internal function
  as maybe_push_res_to_seq only handles those currently.  */
   if (!arg0_op.code.is_tree_code ())
-- 
2.43.0

[PATCH 1/2] phi-opt: Fix for non-const functions for factor_out_conditional_operation [PR 116409]

2024-08-19 Thread Andrew Pinski

Currently maybe_push_res_to_seq does not handle non-const builtins (it does
handle internal functions, though). So we need to disable factoring out
non-const builtins. This will be fixed in a better way later, but this fixes
the regression at hand and does not change the goal of moving
factor_out_conditional_operation over to use gimple_match_op.
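
For readers less familiar with the const/pure distinction, here is a rough
source-level sketch of what factoring would mean for the strlen testcase. The
function names are made up for illustration, and this is only a C-level
analogy of the GIMPLE transformation, not what the pass emits. strlen reads
memory, so it is pure but not const, which is the kind of call this patch
declines to factor:
```c
#include <stddef.h>
#include <string.h>

/* Same shape as pr116409-2.c: both arms apply the same one-operand call.
   (len_branchy and len_factored are hypothetical names.)  */
size_t len_branchy (int t, const char *a, const char *b)
{
  if (t)
    return strlen (a);
  return strlen (b);
}

/* Source-level picture of the factored form: select the operand first,
   then call once.  strlen depends on the memory its argument points to,
   so it is "pure" rather than "const".  */
size_t len_factored (int t, const char *a, const char *b)
{
  const char *p = t ? a : b;
  return strlen (p);
}
```
The two forms compute the same value; the rejection is about what
maybe_push_res_to_seq can currently build, not about the result being wrong.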

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR  tree-optimization/116409

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Reject
non const builtins (except for internal functions).

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr116409-1.c: New test.
* gcc.dg/torture/pr116409-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr116409-1.c |  7 +++
 gcc/testsuite/gcc.dg/torture/pr116409-2.c |  7 +++
 gcc/tree-ssa-phiopt.cc| 18 ++
 3 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116409-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116409-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116409-1.c 
b/gcc/testsuite/gcc.dg/torture/pr116409-1.c
new file mode 100644
index 000..7bf8d49c9a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116409-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-frounding-math -fno-math-errno" } */
+double f(int c, double a, double b) {
+  if (c)
+return __builtin_sqrt(a);
+  return __builtin_sqrt(b);
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116409-2.c 
b/gcc/testsuite/gcc.dg/torture/pr116409-2.c
new file mode 100644
index 000..c27f11312d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116409-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+
+int f (int t, char *a, char *b) {
+  if (t)
+return __builtin_strlen (a);
+  return __builtin_strlen (b);
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 2d4aba5b087..770f3629fe1 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-dce.h"
+#include "calls.h"
 
 /* Return the singleton PHI in the SEQ of PHIs for edges E0 and E1. */
 
@@ -367,6 +368,23 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   if (!types_compatible_p (TREE_TYPE (new_arg0), TREE_TYPE (new_arg1)))
 return NULL;
 
+  /* Function calls can only be const or an internal function
+ as maybe_push_res_to_seq only handles those currently.  */
+  if (!arg0_op.code.is_tree_code ())
+{
+  auto fn = combined_fn (arg0_op.code);
+  if (!internal_fn_p (fn))
+   {
+ tree decl = builtin_decl_implicit (as_builtin_fn (fn));
+ if (!decl)
+   return NULL;
+
+ /* Non-const functions are not supported currently.  */
+ if (!(flags_from_decl_or_type (decl) & ECF_CONST))
+   return NULL;
+   }
+}
+
   /* Create a new PHI stmt.  */
   result = PHI_RESULT (phi);
   temp = make_ssa_name (TREE_TYPE (new_arg0), NULL);
-- 
2.43.0

Re: [PATCH] PHIOPT: move factor_out_conditional_operation over to use gimple_match_op

2024-08-18 Thread Andrew Pinski

On Sun, Aug 18, 2024 at 11:06 AM Jeff Law  wrote:
>
>
>
> On 8/16/24 8:13 PM, Andrew Pinski wrote:
> > To start working on more with expressions with more than one operand, 
> > converting
> > over to use gimple_match_op is needed.
> > The added side-effect here is factor_out_conditional_operation can now 
> > support
> > builtins/internal calls that has one operand without any extra code added.
> >
> > Note on the changed testcases:
> > * pr87007-5.c: the test was testing testing for avoiding partial register 
> > stalls
> > for the sqrt and making sure there is only one zero of the register before 
> > the
> > branch, the phiopt would now merge the sqrt's so disable phiopt.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >   * gimple-match-exports.cc 
> > (gimple_match_op::operands_occurs_in_abnormal_phi):
> >   New function.
> >   * gimple-match.h (gimple_match_op): Add 
> > operands_occurs_in_abnormal_phi.
> >   * tree-ssa-phiopt.cc (factor_out_conditional_operation): Use 
> > gimple_match_op
> >   instead of manually extracting from/creating the gimple.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/pr87007-5.c: Disable phi-opt.
> >
>
> > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
> > b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > index 8f2dc947f6c..1a240adef63 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > @@ -1,8 +1,11 @@
> >   /* { dg-do compile } */
> > -/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> > -fno-tree-vectorize -fdump-tree-cddce3-details 
> > -fdump-tree-lsplit-optimized" } */
> > +/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> > -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized 
> > -fno-ssa-phiopt" } */
> >   /* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> >  are sunk out of the loop and the loop is elided.  One vsqrtsd with
> >  memory operand needs a xor to avoid partial dependence.  */
> > +/* Phi-OPT needs to ne disabled otherwise, sqrt calls are merged which is 
> > better
> > +   but we are testing to make sure the partial register stall for SSE is 
> > still avoided
> > +   for sqrts.  */
> Nit.  s/to ne/to be/g
>
> OK with the nit fixed.
>
> Note this is getting closer to doing generalized sinking a common op
> through PHI nodes which is something we've wanted for a long time.

Yes that is the plan; I just want to do it in steps as I have a few
other projects in progress; and I don't know how much of each I will
be able to get done in time for GCC 15. I originally had this done
differently but I thought it would be better to reuse infrastructure
that was already there instead of creating new ones.
I had implemented this patch back in April and I didn't know if I
could get to the rest due to other projects going on so I submitted it
finally.

Thanks,
Andrew

>
> Jeff
>

Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days

2024-08-18 Thread Andrew Pinski

On Sun, Aug 18, 2024 at 3:42 PM Andrew Pinski  wrote:
>
> On Sun, Aug 18, 2024 at 3:39 PM Eric Gallager  wrote:
> >
> > On Sun, Aug 18, 2024 at 4:52 AM Gerald Pfeifer  wrote:
> > >
> > > When I updated one of the links yesterday I noticed we have this obsolete
> > > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years
> > > ago.
> > >
> > > I suggest we remove these.
> > >
> >
> > Instead of just removing it, I wonder if it might be worthwhile to
> > just bump the version numbers to something more recent? What's the
> > current minimum version of binutils that libstdc++ requires?
>
> Well considering the binutils version is also mentioned as part of the
> prerequisites for GCC with a newish version; I think mentioning it
> also (which might get out of sync) in libstdc++ manual a little over
> board.
> See https://gcc.gnu.org/install/prerequisites.html .

Looks like most of the versions mentioned in
https://gcc.gnu.org/install/specific.html need to be updated to at
least the version that was mentioned in libstdc++'s manual.

hppa*-hp-hpux11, i?86-*-linux*, and sparc-sun-solaris2* all mention
versions older than 2.15.9. At least
https://gcc.gnu.org/install/prerequisites.html recommends 2.35+ (due
to LTO requirements).

Thanks,
Andrew



>
> Thanks,
> Andrew Pinski
>
> >
> > > Okay?
> > >
> > > Gerald
> > >
> > >
> > > libstdc++-v3:
> > > * doc/xml/manual/prerequisites.xml: Remove note from the GCC 4.0.1
> > > days.
> > > * doc/html/manual/setup.html: Regenerate.
> > >
> > > diff --git a/libstdc++-v3/doc/html/manual/setup.html 
> > > b/libstdc++-v3/doc/html/manual/setup.html
> > > index 78d2a00c50a..d8c5ff65cff 100644
> > > --- a/libstdc++-v3/doc/html/manual/setup.html
> > > +++ b/libstdc++-v3/doc/html/manual/setup.html
> > > @@ -29,10 +29,7 @@
> > > the tools you will need if you wish to modify the source.
> > >  
> > > Additional data is given here only where it applies to libstdc++.
> > > -  As of GCC 4.0.1 the minimum version of binutils required to 
> > > build
> > > -  libstdc++ is 2.15.90.0.1.1.
> > > -  Older releases of libstdc++ do not require such a recent version,
> > > -  but to take full advantage of useful space-saving features and
> > > +  To take full advantage of useful space-saving features and
> > >bug-fixes you should use a recent binutils whenever possible.
> > >The configure process will automatically detect and use these
> > >features if the underlying support is present.
> > > diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml 
> > > b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > > index a3c6e732a77..0efe63bcd46 100644
> > > --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > > +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > > @@ -25,10 +25,7 @@
> > > Additional data is given here only where it applies to libstdc++.
> > >
> > >
> > > -   As of GCC 4.0.1 the minimum version of binutils required to 
> > > build
> > > -  libstdc++ is 2.15.90.0.1.1.
> > > -  Older releases of libstdc++ do not require such a recent version,
> > > -  but to take full advantage of useful space-saving features and
> > > +   To take full advantage of useful space-saving features and
> > >bug-fixes you should use a recent binutils whenever possible.
> > >The configure process will automatically detect and use these
> > >features if the underlying support is present.

Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days

2024-08-18 Thread Andrew Pinski

On Sun, Aug 18, 2024 at 3:39 PM Eric Gallager  wrote:
>
> On Sun, Aug 18, 2024 at 4:52 AM Gerald Pfeifer  wrote:
> >
> > When I updated one of the links yesterday I noticed we have this obsolete
> > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years
> > ago.
> >
> > I suggest we remove these.
> >
>
> Instead of just removing it, I wonder if it might be worthwhile to
> just bump the version numbers to something more recent? What's the
> current minimum version of binutils that libstdc++ requires?

Well, considering that the binutils version is already mentioned (with a
newish version) as part of the prerequisites for GCC, I think also mentioning
it in the libstdc++ manual, where it might get out of sync, is a little
overboard.
See https://gcc.gnu.org/install/prerequisites.html .

Thanks,
Andrew Pinski

>
> > Okay?
> >
> > Gerald
> >
> >
> > libstdc++-v3:
> > * doc/xml/manual/prerequisites.xml: Remove note from the GCC 4.0.1
> > days.
> > * doc/html/manual/setup.html: Regenerate.
> >
> > diff --git a/libstdc++-v3/doc/html/manual/setup.html 
> > b/libstdc++-v3/doc/html/manual/setup.html
> > index 78d2a00c50a..d8c5ff65cff 100644
> > --- a/libstdc++-v3/doc/html/manual/setup.html
> > +++ b/libstdc++-v3/doc/html/manual/setup.html
> > @@ -29,10 +29,7 @@
> > the tools you will need if you wish to modify the source.
> >  
> > Additional data is given here only where it applies to libstdc++.
> > -  As of GCC 4.0.1 the minimum version of binutils required to build
> > -  libstdc++ is 2.15.90.0.1.1.
> > -  Older releases of libstdc++ do not require such a recent version,
> > -  but to take full advantage of useful space-saving features and
> > +  To take full advantage of useful space-saving features and
> >bug-fixes you should use a recent binutils whenever possible.
> >The configure process will automatically detect and use these
> >features if the underlying support is present.
> > diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml 
> > b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > index a3c6e732a77..0efe63bcd46 100644
> > --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml
> > @@ -25,10 +25,7 @@
> > Additional data is given here only where it applies to libstdc++.
> >
> >
> > -   As of GCC 4.0.1 the minimum version of binutils required to build
> > -  libstdc++ is 2.15.90.0.1.1.
> > -  Older releases of libstdc++ do not require such a recent version,
> > -  but to take full advantage of useful space-saving features and
> > +   To take full advantage of useful space-saving features and
> >bug-fixes you should use a recent binutils whenever possible.
> >The configure process will automatically detect and use these
> >features if the underlying support is present.

[PATCH] forwprop: Also dce from added statements from gimple_simplify

2024-08-17 Thread Andrew Pinski

This extends r14-3982-g9ea74d235c7e78 to also include the newly added statements,
since some of them might be dead too (due to the way match and simplify works).
This was noticed while working on a new match and simplify pattern where a
newly added statement was not being used.
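
As a rough picture of the bookkeeping involved, here is a toy model in plain
C. Everything in it (toy_stmt, toy_mark_lhs_for_dce, the 64-bit mask standing
in for a bitmap) is made up for illustration and is not the GCC data
structure; it only shows the idea of recording the SSA version of every
left-hand side in a sequence so a later DCE step can revisit those names:
```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a statement: lhs_version < 0 means the statement does
   not define an SSA name (for example, a store).  Hypothetical type.  */
struct toy_stmt { int lhs_version; };

/* Set one bit per defined SSA version.  A null worklist means the caller
   did not ask for DCE tracking.  Versions must be below 64 in this toy.  */
void toy_mark_lhs_for_dce (uint64_t *worklist,
                           const struct toy_stmt *seq, size_t n)
{
  if (!worklist)
    return;

  for (size_t i = 0; i < n; i++)
    if (seq[i].lhs_version >= 0 && seq[i].lhs_version < 64)
      *worklist |= UINT64_C (1) << seq[i].lhs_version;
}
```
The early return on a null worklist mirrors the optional argument in the
patch: callers that do not track dead statements pay nothing.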

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-fold.cc (mark_lhs_in_seq_for_dce): New function.
(replace_stmt_with_simplification): Call mark_lhs_in_seq_for_dce
right before inserting the sequence.
(fold_stmt_1): Add dce_worklist argument, update call to
replace_stmt_with_simplification.
(fold_stmt): Add dce_worklist argument, update call to fold_stmt_1.
(fold_stmt_inplace): Update call to fold_stmt_1.
* gimple-fold.h (fold_stmt): Add bitmap argument.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Update call to 
fold_stmt.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc   | 43 +---
 gcc/gimple-fold.h|  4 ++--
 gcc/tree-ssa-forwprop.cc |  2 +-
 3 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 18d7a6b176d..0bec35d06f6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5914,6 +5914,24 @@ has_use_on_stmt (tree name, gimple *stmt)
   return false;
 }
 
+/* Add the lhs of each statement of SEQ to DCE_WORKLIST. */
+
+static void
+mark_lhs_in_seq_for_dce (bitmap dce_worklist, gimple_seq seq)
+{
+  if (!dce_worklist)
+return;
+
+  for (gimple_stmt_iterator i = gsi_start (seq);
+   !gsi_end_p (i); gsi_next (&i))
+{
+  gimple *stmt = gsi_stmt (i);
+  tree name = gimple_get_lhs (stmt);
+  if (name && TREE_CODE (name) == SSA_NAME)
+   bitmap_set_bit (dce_worklist, SSA_NAME_VERSION (name));
+}
+}
+
 /* Worker for fold_stmt_1 dispatch to pattern based folding with
gimple_simplify.
 
@@ -5924,7 +5942,8 @@ has_use_on_stmt (tree name, gimple *stmt)
 static bool
 replace_stmt_with_simplification (gimple_stmt_iterator *gsi,
  gimple_match_op *res_op,
- gimple_seq *seq, bool inplace)
+ gimple_seq *seq, bool inplace,
+ bitmap dce_worklist)
 {
   gimple *stmt = gsi_stmt (*gsi);
   tree *ops = res_op->ops;
@@ -5992,6 +6011,8 @@ replace_stmt_with_simplification (gimple_stmt_iterator 
*gsi,
  print_gimple_stmt (dump_file, gsi_stmt (*gsi),
 0, TDF_SLIM);
}
+  // Mark the lhs of the new statements maybe for dce
+  mark_lhs_in_seq_for_dce (dce_worklist, *seq);
   gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT);
   return true;
 }
@@ -6015,6 +6036,8 @@ replace_stmt_with_simplification (gimple_stmt_iterator 
*gsi,
  print_gimple_stmt (dump_file, gsi_stmt (*gsi),
 0, TDF_SLIM);
}
+ // Mark the lhs of the new statements maybe for dce
+ mark_lhs_in_seq_for_dce (dce_worklist, *seq);
  gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT);
  return true;
}
@@ -6032,6 +6055,8 @@ replace_stmt_with_simplification (gimple_stmt_iterator 
*gsi,
print_gimple_seq (dump_file, *seq, 0, TDF_SLIM);
  print_gimple_stmt (dump_file, gsi_stmt (*gsi), 0, TDF_SLIM);
}
+  // Mark the lhs of the new statements maybe for dce
+  mark_lhs_in_seq_for_dce (dce_worklist, *seq);
   gsi_insert_seq_before (gsi, *seq, GSI_SAME_STMT);
   return true;
 }
@@ -6047,6 +6072,8 @@ replace_stmt_with_simplification (gimple_stmt_iterator 
*gsi,
  fprintf (dump_file, "gimple_simplified to ");
  print_gimple_seq (dump_file, *seq, 0, TDF_SLIM);
}
+ // Mark the lhs of the new statements maybe for dce
+ mark_lhs_in_seq_for_dce (dce_worklist, *seq);
  gsi_replace_with_seq_vops (gsi, *seq);
  return true;
}
@@ -6214,7 +6241,8 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = 
false)
distinguishes both cases.  */
 
 static bool
-fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, tree (*valueize) (tree))
+fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, tree (*valueize) (tree),
+bitmap dce_worklist = nullptr)
 {
   bool changed = false;
   gimple *stmt = gsi_stmt (*gsi);
@@ -6382,7 +6410,8 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, 
tree (*valueize) (tree))
   if (gimple_simplify (stmt, &res_op, inplace ? NULL : &seq,
   valueize, valueize))
{
- if (replace_stmt_with_simplification (gsi, &res_op, &seq, inplace))
+ if (replace_stmt_with_simplification (gsi, &res_op, &seq, inplace,
+   dce_worklist))
changed = true;

[PATCH] PHIOPT: move factor_out_conditional_operation over to use gimple_match_op

2024-08-16 Thread Andrew Pinski

To start working on expressions with more than one operand, converting
over to use gimple_match_op is needed.
A side effect is that factor_out_conditional_operation can now support
builtin/internal calls that have one operand without any extra code.
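
For anyone who has not looked at this part of phiopt, a source-level sketch
of the factoring follows. The function names are hypothetical and the real
transformation happens on GIMPLE PHI nodes, not on C source; the example uses
a conversion, and the same shape applies to a one-operand call such as the
sqrt in pr87007-5.c:
```c
/* Before: the same unary operation (a widening conversion) in both arms.
   (widen_branchy and widen_factored are made-up names.)  */
long widen_branchy (int c, int a, int b)
{
  long r;
  if (c)
    r = (long) a;
  else
    r = (long) b;
  return r;
}

/* After: one PHI-like selection of the operand, then a single conversion
   applied to the selected value.  */
long widen_factored (int c, int a, int b)
{
  int t = c ? a : b;
  return (long) t;
}
```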

Note on the changed testcases:
* pr87007-5.c: the test checks that partial register stalls are avoided
for the sqrt and that there is only one zeroing of the register before the
branch; phiopt would now merge the sqrts, so disable phiopt.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-match-exports.cc 
(gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use 
gimple_match_op
instead of manually extracting from/creating the gimple.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr87007-5.c: Disable phi-opt.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-exports.cc   | 14 +
 gcc/gimple-match.h|  2 +
 gcc/testsuite/gcc.target/i386/pr87007-5.c |  5 +-
 gcc/tree-ssa-phiopt.cc| 66 ++-
 4 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..15d54b7d843 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -126,6 +126,20 @@ gimple_match_op::resimplify (gimple_seq *seq, tree 
(*valueize)(tree))
 }
 }
 
+/* Returns true if any of the operands of THIS occurs
+   in abnormal phis. */
+bool
+gimple_match_op::operands_occurs_in_abnormal_phi() const
+{
+  for (unsigned int i = 0; i < num_ops; i++)
+{
+   if (TREE_CODE (ops[i]) == SSA_NAME
+  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[i]))
+   return true;
+}
+  return false;
+}
+
 /* Return whether T is a constant that we'll dispatch to fold to
evaluate fully constant expressions.  */
 
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..8edff578ba9 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -136,6 +136,8 @@ public:
 
   /* The operands to CODE.  Only the first NUM_OPS entries are meaningful.  */
   tree ops[MAX_NUM_OPS];
+
+  bool operands_occurs_in_abnormal_phi() const;
 };
 
 inline
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index 8f2dc947f6c..1a240adef63 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,8 +1,11 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize 
-fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize 
-fdump-tree-cddce3-details -fdump-tree-lsplit-optimized -fno-ssa-phiopt" } */
 /* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
are sunk out of the loop and the loop is elided.  One vsqrtsd with
memory operand needs a xor to avoid partial dependence.  */
+/* Phi-OPT needs to ne disabled otherwise, sqrt calls are merged which is 
better
+   but we are testing to make sure the partial register stall for SSE is still 
avoided
+   for sqrts.  */
 
 #include
 
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index aa414f6..2d4aba5b087 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -220,13 +220,12 @@ static gphi *
 factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   tree arg0, tree arg1, gimple *cond_stmt)
 {
-  gimple *arg0_def_stmt = NULL, *arg1_def_stmt = NULL, *new_stmt;
-  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
+  gimple *arg0_def_stmt = NULL, *arg1_def_stmt = NULL;
   tree temp, result;
   gphi *newphi;
   gimple_stmt_iterator gsi, gsi_for_def;
   location_t locus = gimple_location (phi);
-  enum tree_code op_code;
+  gimple_match_op arg0_op, arg1_op;
 
   /* Handle only PHI statements with two arguments.  TODO: If all
  other arguments to PHI are INTEGER_CST or if their defining
@@ -250,31 +249,31 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   /* Check if arg0 is an SSA_NAME and the stmt which defines arg0 is
  an unary operation.  */
   arg0_def_stmt = SSA_NAME_DEF_STMT (arg0);
-  if (!is_gimple_assign (arg0_def_stmt)
-  || (gimple_assign_rhs_class (arg0_def_stmt) != GIMPLE_UNARY_RHS
- && gimple_assign_rhs_code (arg0_def_stmt) != VIEW_CONVERT_EXPR))
+  if (!gimple_extract_op (arg0_def_stmt, &arg0_op))
 return NULL;
 
-  /* Use the RHS as new_arg0.  */
-  op_code = gimple_assign_rhs_code (arg0_def_stmt);
-  new_arg0 = gimple_assign_rhs1 (arg0_def_stmt);
-  if (op_code == VIEW_CONVERT_EXPR)
-{
-

[PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]

2024-08-16 Thread Andrew Pinski

When CSSC is not enabled, 128-bit popcount can be implemented
with just the vector (v16qi) cnt instruction followed by a reduction,
the way the 64-bit one is currently implemented, instead of
splitting into two 64-bit popcounts.
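
A scalar C model of that sequence, as a sketch only: the 128-bit value is
passed as two 64-bit halves to stay portable, byte_popcount stands in for one
lane of the vector cnt, and the final sum stands in for the reduction. The
names are made up for illustration and this is not the code the pattern
generates:
```c
#include <stdint.h>

/* Popcount of a single byte, modelling one lane of the vector cnt.  */
static unsigned byte_popcount (uint8_t b)
{
  unsigned n = 0;
  while (b)
    {
      b &= (uint8_t) (b - 1);  /* clear the lowest set bit */
      n++;
    }
  return n;
}

/* Count bits per byte across all 16 bytes, then add the 16 counts
   together (the reduction step).  */
unsigned popcount128_bytewise (uint64_t lo, uint64_t hi)
{
  unsigned sum = 0;
  for (int i = 0; i < 8; i++)
    {
      sum += byte_popcount ((uint8_t) (lo >> (8 * i)));
      sum += byte_popcount ((uint8_t) (hi >> (8 * i)));
    }
  return sum;
}
```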

Built and tested for aarch64-linux-gnu.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcountti2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt10.c: New test.
* gcc.target/aarch64/popcnt9.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md   | 16 +
 gcc/testsuite/gcc.target/aarch64/popcnt10.c | 25 +
 gcc/testsuite/gcc.target/aarch64/popcnt9.c  | 25 +
 3 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt9.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 12dcc16529a..73506e71f43 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5378,6 +5378,22 @@ (define_expand "popcount2"
 }
 })
 
+(define_expand "popcountti2"
+  [(set (match_operand:TI 0 "register_operand")
+   (popcount:TI (match_operand:TI 1 "register_operand")))]
+  "TARGET_SIMD && !TARGET_CSSC"
+{
+  rtx v = gen_reg_rtx (V16QImode);
+  rtx v1 = gen_reg_rtx (V16QImode);
+  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
+  emit_insn (gen_popcountv16qi2 (v1, v));
+  rtx out = gen_reg_rtx (DImode);
+  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
+  out = convert_to_mode (TImode, out, true);
+  emit_move_insn (operands[0], out);
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt10.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
new file mode 100644
index 000..4d01fc67022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+cssc"
+
+/*
+** h128:
+** ldp x([0-9]+), x([0-9]+), \[x0\]
+** cnt x([0-9]+), x([0-9]+)
+** cnt x([0-9]+), x([0-9]+)
+** add w0, w([0-9]+), w([0-9]+)
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+  return __builtin_popcountg (a[0]);
+}
+
+/* popcount with CSSC should be split into 2 sections. */
+/* { dg-final { scan-tree-dump-not "POPCOUNT " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " __builtin_popcount" 2 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt9.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
new file mode 100644
index 000..c778fc7f420
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h128:
+** ldr q([0-9]+), \[x0\]
+** cnt v([0-9]+).16b, v\1.16b
+** addv b([0-9]+), v\2.16b
+** fmov w0, s\3
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+ return __builtin_popcountg (a[0]);
+}
+
+/* There should be only one POPCOUNT. */
+/* { dg-final { scan-tree-dump-times "POPCOUNT " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " __builtin_popcount"  "optimized" } } */
+
-- 
2.43.0

[PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

2024-08-16 Thread Andrew Pinski

On aarch64 (without CSSC instructions), popcount is implemented using the
SIMD instruction cnt. Instead of using two SIMD cnt instructions (V8QI mode),
it is better to use one 128-bit cnt (V16QI mode) and only one reduction
addition instead of two. Currently fold_builtin_bit_query always expands
without checking whether there is an optab for the type, so this changes it
to check the optab to decide whether to expand or to let the backend handle
it.
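
To make the new condition concrete, here is a small C model. It is a sketch
under assumptions: choose_lowering and its two boolean inputs are
hypothetical stand-ins for the precision test and the optab query in
fold_builtin_bit_query, and popcount128_split only illustrates the generic
two-halves expansion for popcount:
```c
#include <stdbool.h>
#include <stdint.h>

/* Portable 64-bit popcount used by the generic two-halves expansion.  */
static unsigned popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x; x &= x - 1)  /* clear the lowest set bit each iteration */
    n++;
  return n;
}

enum lowering { LOWER_SPLIT_IN_TWO, LOWER_NO_SPLIT };

/* Hypothetical model of the new condition: split only when the type is
   the double-width case AND the target has no direct support for it.  */
enum lowering choose_lowering (bool is_double_width, bool target_has_optab)
{
  if (is_double_width && !target_has_optab)
    return LOWER_SPLIT_IN_TWO;
  return LOWER_NO_SPLIT;
}

/* The generic expansion: popcount (hi) + popcount (lo).  */
unsigned popcount128_split (uint64_t lo, uint64_t hi)
{
  return popcount64 (hi) + popcount64 (lo);
}
```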

Bootstrapped and tested on x86_64-linux-gnu and built and tested for 
aarch64-linux-gnu.

gcc/ChangeLog:

* builtins.cc (fold_builtin_bit_query): Don't expand double
`unsigned long long` types if there is an optab entry for that
type.

Signed-off-by: Andrew Pinski 
---
 gcc/builtins.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..b4d51eaeba5 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum 
built_in_function fcode,
   tree call = NULL_TREE, tem;
   if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
   && (TYPE_PRECISION (arg0_type)
- == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
+ == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
+  /* If the target supports the optab, then don't do the expansion. */
+  && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
 {
   /* __int128 expansions using up to 2 long long builtins.  */
   arg0 = save_expr (arg0);
-- 
2.43.0

[PUSHED] PHIOPT: Fix comment before factor_out_conditional_operation

2024-08-15 Thread Andrew Pinski

From: Andrew Pinski 

I didn't update the comment before factor_out_conditional_operation
correctly. This updates it to be correct and to mention unary operations
rather than just conversions.

Pushed as obvious.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Update
comment.
---
 gcc/tree-ssa-phiopt.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f05ca727503..aa414f6 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -212,7 +212,7 @@ replace_phi_edge_with_variable (basic_block cond_block,
 }
 
 /* PR66726: Factor operations out of COND_EXPR.  If the arguments of the PHI
-   stmt are CONVERT_STMT, factor out the conversion and perform the conversion
+   stmt are Unary operator, factor out the operation and perform the operation
to the result of PHI stmt.  COND_STMT is the controlling predicate.
Return the newly-created PHI, if any.  */
 
-- 
2.43.0

Re: [PATCH v2] aarch64: Improve popcount for bytes [PR113042]

2024-08-14 Thread Andrew Pinski

On Wed, Aug 14, 2024 at 2:21 PM Richard Sandiford
 wrote:
>
> Andrew Pinski  writes:
> > For popcount for bytes, we don't need the reduction addition
> > after the vector cnt instruction as we are only counting one
> > byte's popcount.
> > This changes the popcount extend to cover all ALLI rather than GPI.
> >
> > Changes since v1:
> > * v2 - Use ALLI iterator and combine all into one pattern.
> >Add new testcases popcnt[6-8].c.
> >
> > Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> >
> >   PR target/113042
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.md (popcount2): Update pattern
> >   to support ALLI modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/popcnt5.c: New test.
> >   * gcc.target/aarch64/popcnt6.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/config/aarch64/aarch64.md  | 52 +++---
> >  gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
> >  gcc/testsuite/gcc.target/aarch64/popcnt6.c | 19 
> >  gcc/testsuite/gcc.target/aarch64/popcnt7.c | 18 
> >  gcc/testsuite/gcc.target/aarch64/popcnt8.c | 18 
> >  5 files changed, 119 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt6.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt7.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt8.c
>
> Sorry for the slow review.
>
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index 389a1906e23..dd88fd891b5 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -5332,28 +5332,66 @@ (define_insn "*aarch64_popcount2_cssc_insn"
> >  ;; MOV   w0, v2.b[0]
> >
> >  (define_expand "popcount2"
> > -  [(set (match_operand:GPI 0 "register_operand")
> > - (popcount:GPI (match_operand:GPI 1 "register_operand")))]
> > +  [(set (match_operand:ALLI 0 "register_operand")
> > + (popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
> >"TARGET_CSSC || TARGET_SIMD"
>
> Could we restrict this to:
>
>   TARGET_CSSC
>   ? GET_MODE_BITSIZE (mode) >= 32
>   : TARGET_SIMD
>
> >  {
> > +  rtx in = operands[1];
> > +  rtx out = operands[0];
> > +  if (TARGET_CSSC
> > +  && (mode == HImode
> > +  || mode == QImode))
> > +{
> > +  rtx tmp = gen_reg_rtx (SImode);
> > +  rtx out1 = gen_reg_rtx (SImode);
> > +  if (mode == HImode)
> > +emit_insn (gen_zero_extendhisi2 (tmp, in));
> > +  else
> > +emit_insn (gen_zero_extendqisi2 (tmp, in));
> > +  emit_insn (gen_popcountsi2 (out1, tmp));
> > +  emit_move_insn (out, gen_lowpart (mode, out1));
> > +  DONE;
> > +}
>
> ...and then skip this part (including the rtx in/out)?  It should be
> what target-independent code would do.
>
> >if (!TARGET_CSSC)
> >  {
> >rtx v = gen_reg_rtx (V8QImode);
> >rtx v1 = gen_reg_rtx (V8QImode);
> >rtx in = operands[1];
> >rtx out = operands[0];
> > -  if(mode == SImode)
> > +  /* SImode and HImode should be zero extended to DImode. */
> > +  if (mode == SImode || mode == HImode)
> >   {
> > rtx tmp;
> > tmp = gen_reg_rtx (DImode);
> > -   /* If we have SImode, zero extend to DImode, pop count does
> > -  not change if we have extra zeros. */
> > -   emit_insn (gen_zero_extendsidi2 (tmp, in));
> > +   /* If we have SImode, zero extend to DImode,
> > +  pop count does not change if we have extra zeros. */
>
> The doubled comment seems redundant.  How about making the first one:
>
>   /* SImode and HImode should be zero extended to DImode.
>  popcount does not change if we have extra zeros.  */
>
> and deleting the second comment?
>
> > +   if (mode == SImode)
> > + emit_insn (gen_zero_extendsidi2 (tmp, in));
> > +   else
> > + emit_insn (gen_zero_extendhidi2 (tmp, in));
> > in = tmp;
>
> I think the if body can be replaced with:
>
>   in = convert_to_mode (DImode, in, true);
>
> >   }
> >emit_move_insn (v, gen_lowpart (V8QImode, in));
> >emit_i

[PATCH v3] aarch64: Improve popcount for bytes [PR113042]

2024-08-14 Thread Andrew Pinski

For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This changes the popcount extend to cover all ALLI rather than GPI.
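
As a rough illustration of why the reduction can be dropped for bytes, here is
a plain C++ model of the idea. It is not the aarch64 expansion itself, and
`cnt_v8qi`, `addv_v8qi`, `popcount_byte` and `popcount_wide` are made-up names
for this sketch. CNT counts bits per byte lane, so for a one-byte input lane 0
already holds the whole answer, while wider inputs still need the lanes summed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of the vector CNT instruction on a V8QI value: each of the
// eight byte lanes is replaced by the number of set bits in that lane.
std::array<std::uint8_t, 8> cnt_v8qi (std::uint64_t in)
{
  std::array<std::uint8_t, 8> lanes{};
  for (int i = 0; i < 8; ++i)
    {
      std::uint8_t byte = (in >> (8 * i)) & 0xff;
      std::uint8_t bits = 0;
      for (; byte != 0; byte >>= 1)
        bits += byte & 1;
      lanes[i] = bits;
    }
  return lanes;
}

// Model of the ADDV reduction: sum all lanes.
unsigned addv_v8qi (const std::array<std::uint8_t, 8> &lanes)
{
  unsigned sum = 0;
  for (std::uint8_t lane : lanes)
    sum += lane;
  return sum;
}

// Byte popcount: only lane 0 is meaningful, so no reduction is needed.
unsigned popcount_byte (std::uint8_t x)
{
  return cnt_v8qi (x)[0];
}

// Wider popcount: the per-lane counts still have to be added up.
unsigned popcount_wide (std::uint64_t x)
{
  return addv_v8qi (cnt_v8qi (x));
}

// Self-check: for every byte value both routes give the same count.
void check_byte_popcount ()
{
  for (unsigned x = 0; x < 256; ++x)
    assert (popcount_byte (x) == popcount_wide (x));
}
```

In this model the byte case reads one lane and skips `addv_v8qi` entirely,
which mirrors the QImode branch of the expander extracting the low part of
the vector.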

Changes since v1:
* v2 - Use ALLI iterator and combine all into one pattern.
   Add new testcases popcnt[6-8].c.
* v3 - Simplify TARGET_CSSC path.
   Use convert_to_mode instead of gen_zero_extend* directly.
   Some other small cleanups.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcount2): Update pattern
to support ALLI modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt5.c: New test.
* gcc.target/aarch64/popcnt6.c: New test.
* gcc.target/aarch64/popcnt7.c: New test.
* gcc.target/aarch64/popcnt8.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 37 ++
 gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 +++
 gcc/testsuite/gcc.target/aarch64/popcnt6.c | 19 +++
 gcc/testsuite/gcc.target/aarch64/popcnt7.c | 18 +++
 gcc/testsuite/gcc.target/aarch64/popcnt8.c | 18 +++
 5 files changed, 98 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt8.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 665a333903c..12dcc16529a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5341,9 +5341,9 @@ (define_insn "*aarch64_popcount2_cssc_insn"
 ;; MOV w0, v2.b[0]
 
 (define_expand "popcount2"
-  [(set (match_operand:GPI 0 "register_operand")
-   (popcount:GPI (match_operand:GPI 1 "register_operand")))]
-  "TARGET_CSSC || TARGET_SIMD"
+  [(set (match_operand:ALLI 0 "register_operand")
+   (popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
+  "TARGET_CSSC ? GET_MODE_BITSIZE (mode) >= 32 : TARGET_SIMD"
 {
   if (!TARGET_CSSC)
 {
@@ -5351,18 +5351,29 @@ (define_expand "popcount2"
   rtx v1 = gen_reg_rtx (V8QImode);
   rtx in = operands[1];
   rtx out = operands[0];
-  if(mode == SImode)
-   {
- rtx tmp;
- tmp = gen_reg_rtx (DImode);
- /* If we have SImode, zero extend to DImode, pop count does
-not change if we have extra zeros. */
- emit_insn (gen_zero_extendsidi2 (tmp, in));
- in = tmp;
-   }
+  /* SImode and HImode should be zero extended to DImode.
+popcount does not change if we have extra zeros.  */
+  if (mode == SImode || mode == HImode)
+   in = convert_to_mode (DImode, in, true);
+
   emit_move_insn (v, gen_lowpart (V8QImode, in));
   emit_insn (gen_popcountv8qi2 (v1, v));
-  emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1));
+  /* QImode, just extract from the v8qi vector.  */
+  if (mode == QImode)
+   emit_move_insn (out, gen_lowpart (QImode, v1));
+  /* HI and SI, reduction is zero extended to SImode. */
+  else if (mode == SImode || mode == HImode)
+   {
+ rtx out1 = gen_reg_rtx (SImode);
+ emit_insn (gen_aarch64_zero_extendsi_reduc_plus_v8qi (out1, v1));
+ emit_move_insn (out, gen_lowpart (mode, out1));
+   }
+  /* DImode, reduction is zero extended to DImode. */
+  else
+   {
+ gcc_assert (mode == DImode);
+ emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v8qi (out, v1));
+   }
   DONE;
 }
 })
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
new file mode 100644
index 000..406369d9b29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h8:
+** ldr b[0-9]+, \[x0\]
+** cnt v[0-9]+.8b, v[0-9]+.8b
+** smovw0, v[0-9]+.b\[0\]
+** ret
+*/
+/* We should not need the addv here since we only need a byte popcount. */
+
+unsigned h8 (const unsigned char *a) {
+ return __builtin_popcountg (a[0]);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt6.c b/gcc/testsuite/gcc.target/aarch64/popcnt6.c
new file mode 100644
index 000..e882cb24126
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt6.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc

RE: [PATCH v2] aarch64: Improve popcount for bytes [PR113042]

2024-08-13 Thread Andrew Pinski (QUIC)

> -----Original Message-----
> From: Andrew Pinski (QUIC) 
> Sent: Monday, June 10, 2024 12:23 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH v2] aarch64: Improve popcount for bytes
> [PR113042]
> 
> For popcount for bytes, we don't need the reduction addition
> after the vector cnt instruction as we are only counting one
> byte's popcount.
> This changes the popcount extend to cover all ALLI rather than
> GPI.

Ping? 
https://patchwork.sourceware.org/project/gcc/patch/20240610192255.402779-1-quic_apin...@quicinc.com/

Thanks,
Andrew Pinski

> 
> Changes since v1:
> * v2 - Use ALLI iterator and combine all into one pattern.
>Add new testcases popcnt[6-8].c.
> 
> Bootstrapped and tested on aarch64-linux-gnu with no
> regressions.
> 
>   PR target/113042
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.md (popcount2):
> Update pattern
>   to support ALLI modes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/popcnt5.c: New test.
>   * gcc.target/aarch64/popcnt6.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.md  | 52
> +++---
>  gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
> gcc/testsuite/gcc.target/aarch64/popcnt6.c | 19 
> gcc/testsuite/gcc.target/aarch64/popcnt7.c | 18 
> gcc/testsuite/gcc.target/aarch64/popcnt8.c | 18 
>  5 files changed, 119 insertions(+), 7 deletions(-)  create mode
> 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/popcnt6.c
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/popcnt7.c
>  create mode 100644
> gcc/testsuite/gcc.target/aarch64/popcnt8.c
> 
> diff --git a/gcc/config/aarch64/aarch64.md
> b/gcc/config/aarch64/aarch64.md index
> 389a1906e23..dd88fd891b5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5332,28 +5332,66 @@ (define_insn
> "*aarch64_popcount2_cssc_insn"
>  ;; MOV   w0, v2.b[0]
> 
>  (define_expand "popcount2"
> -  [(set (match_operand:GPI 0 "register_operand")
> - (popcount:GPI (match_operand:GPI 1
> "register_operand")))]
> +  [(set (match_operand:ALLI 0 "register_operand")
> + (popcount:ALLI (match_operand:ALLI 1
> "register_operand")))]
>"TARGET_CSSC || TARGET_SIMD"
>  {
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if (TARGET_CSSC
> +  && (mode == HImode
> +  || mode == QImode))
> +{
> +  rtx tmp = gen_reg_rtx (SImode);
> +  rtx out1 = gen_reg_rtx (SImode);
> +  if (mode == HImode)
> +emit_insn (gen_zero_extendhisi2 (tmp, in));
> +  else
> +emit_insn (gen_zero_extendqisi2 (tmp, in));
> +  emit_insn (gen_popcountsi2 (out1, tmp));
> +  emit_move_insn (out, gen_lowpart (mode,
> out1));
> +  DONE;
> +}
>if (!TARGET_CSSC)
>  {
>rtx v = gen_reg_rtx (V8QImode);
>rtx v1 = gen_reg_rtx (V8QImode);
>rtx in = operands[1];
>rtx out = operands[0];
> -  if(mode == SImode)
> +  /* SImode and HImode should be zero extended to
> DImode. */
> +  if (mode == SImode || mode ==
> HImode)
>   {
> rtx tmp;
> tmp = gen_reg_rtx (DImode);
> -   /* If we have SImode, zero extend to DImode, pop count
> does
> -  not change if we have extra zeros. */
> -   emit_insn (gen_zero_extendsidi2 (tmp, in));
> +   /* If we have SImode, zero extend to DImode,
> +  pop count does not change if we have extra zeros. */
> +   if (mode == SImode)
> + emit_insn (gen_zero_extendsidi2 (tmp, in));
> +   else
> + emit_insn (gen_zero_extendhidi2 (tmp, in));
> in = tmp;
>   }
>emit_move_insn (v, gen_lowpart (V8QImode, in));
>emit_insn (gen_popcountv8qi2 (v1, v));
> -  emit_insn
> (gen_aarch64_zero_extend_reduc_plus_v8qi (out,
> v1));
> +  /* QImode, just extract from the v8qi vector.  */
> +  if (mode == QImode)
> + {
> +   emit_move_insn (out, gen_lowpart (QImode, v1));
> + }
> +  /* HI and SI, reduction is zero extended to SImode. */
> +  else if (mode == SImode || mode ==
> HImode)
> + {
> +   rtx out1;
> +   out1 = gen_reg_rtx (SImode);
> +   emit_insn (gen_aarch64_zero_extendsi_reduc_plus_v8qi
> (out1, v1));
> +   emit_move_insn (out, gen_lowpart (mode,
> out1));
> + }
> +  /* DImode, reduction is zero extended to DImode. */
> +  else
&

Re: [optc-save-gen.awk] Fix streaming of command line options for offloading

2024-08-12 Thread Andrew Pinski

On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
 wrote:
>
> Hi,
> As mentioned in:
> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
>
> AArch64 cl_optimization_stream_out streams out target-specific optimization 
> options like flag_aarch64_early_ldp_fusion, aarch64_early_ra etc, which 
> breaks AArch64/nvptx offloading,
> since nvptx cl_optimization_stream_in doesn't have corresponding stream-in 
> for these options and ends up setting invalid values for ptr->explicit_mask 
> (and subsequent data structures).
>
> This makes even a trivial test like the following to cause ICE in 
> lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
>
> int main()
> {
>   int x;
>   #pragma omp target map(x)
> x;
> }
>
> The attached patch modifies optc-save-gen.awk to generate if 
> (!lto_stream_offload_p) check before streaming out target-specific opt in 
> cl_optimization_stream_out, which
> fixes the issue. cl_optimization_stream_out after patch (last few entries):
>
>   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
>   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
>   if (!lto_stream_offload_p)
>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
>   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> bp_pack_value (bp, ptr->explicit_mask[i], 64);
>
> For target-specific options, streaming out is gated on !lto_stream_offload_p 
> check.
>
> The patch also fixes failures due to same issue with x86_64->nvptx offloading 
> for target-print-1.f90 (and couple more).
> Does the patch look OK ?

I think it seems to be on the right track. One thing that is also
going to be an issue is streaming in: there could be a target option
on the offload side that is marked as Optimization, which might
also cause issues. We should check to make sure that also gets fixed
here. Or we could error out during the build, so that offloading
targets can't have target options marked with Optimization.
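
To make the concern concrete, here is a toy model of the mismatch. It is only
a sketch and not GCC's streamer: `Stream`, `stream_out` and `stream_in` are
hypothetical names, and `for_offload` merely plays the role of the
lto_stream_offload_p check. The point is that the writer and the reader have
to agree on the list of fields in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a bit-pack stream: a flat sequence of integers.
using Stream = std::vector<long>;

// Host side: two generic options, optionally one host-only
// target-specific option, then a trailing mask.
Stream stream_out (bool for_offload)
{
  Stream s;
  s.push_back (1);        // generic option A
  s.push_back (2);        // generic option B
  if (!for_offload)
    s.push_back (99);     // host-only, target-specific option
  s.push_back (0xff);     // trailing mask
  return s;
}

// Offload side: skips the generic options, then however many
// target-specific options it believes were streamed, then reads the mask.
// Returns false if the stream runs out, i.e. the layouts disagree.
bool stream_in (const Stream &s, std::size_t offload_target_opts, long &mask)
{
  std::size_t pos = 2 + offload_target_opts;
  if (pos >= s.size ())
    return false;
  mask = s[pos];
  return true;
}

void check_streams ()
{
  long mask = 0;

  // Ungated writer, reader without target options: the host-only
  // option is misread as the mask.
  assert (stream_in (stream_out (false), 0, mask) && mask == 99);

  // Gated writer, reader without target options: in sync.
  assert (stream_in (stream_out (true), 0, mask) && mask == 0xff);

  // Gated writer, but the reader expects one target option of its
  // own: the reader runs past the end of the stream.
  assert (!stream_in (stream_out (true), 1, mask));
}
```

The last case is the one I am worried about: gating only the stream-out side
does not help if the offload side has its own Optimization target options.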

Thanks,
Andrew Pinski

>
> Signed-off-by: Prathamesh Kulkarni 
>
> Thanks,
> Prathamesh

[PATCH 2/3] match: extend the `((a CMP b) ? c : 0) | ((a CMP' b) ? d : 0)` patterns to support ^ and + [PR103660]

2024-08-12 Thread Andrew Pinski

r13-4620-g4d9db4bdd458 added a few patterns, and some of them can be extended to
support XOR and PLUS.
This extends those patterns to support XOR and PLUS in addition to IOR.
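
A quick scalar sanity check of the identity behind the extension, as a small
C++ sketch (`before`, `after` and `check_pr103660_fold` are names made up
here, not part of the patch): since the two comparisons are inverses, one arm
is always 0, and `x | 0`, `x ^ 0` and `x + 0` are all just `x`.

```cpp
#include <cassert>

// Left-hand side of the pattern: each arm is zero unless its
// comparison holds, and the two comparisons are inverses.
template <typename Op>
int before (int a, int b, int c, int d, Op op)
{
  return op (a < b ? c : 0, a >= b ? d : 0);
}

// What the match.pd pattern rewrites it to.
int after (int a, int b, int c, int d)
{
  return a < b ? c : d;
}

// Check |, ^ and + over a small grid of values.
void check_pr103660_fold ()
{
  const int vals[] = { -7, -1, 0, 1, 2, 42 };
  for (int a : vals)
    for (int b : vals)
      for (int c : vals)
        for (int d : vals)
          {
            int want = after (a, b, c, d);
            int ior = before (a, b, c, d, [] (int x, int y) { return x | y; });
            int eor = before (a, b, c, d, [] (int x, int y) { return x ^ y; });
            int sum = before (a, b, c, d, [] (int x, int y) { return x + y; });
            assert (ior == want);
            assert (eor == want);
            assert (sum == want);
          }
}
```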

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/103660

gcc/ChangeLog:

* match.pd (`((a CMP b) ? c : 0) | ((a CMP' b) ? d : 0)`): Extend to
support XOR and PLUS.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-2.C: New test.
* g++.dg/tree-ssa/pr103660-3.C: New test.
* gcc.dg/tree-ssa/pr103660-2.c: New test.
* gcc.dg/tree-ssa/pr103660-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 42 +++-
 gcc/testsuite/g++.dg/tree-ssa/pr103660-2.C | 30 +++
 gcc/testsuite/g++.dg/tree-ssa/pr103660-3.C | 30 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr103660-2.c | 45 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr103660-3.c | 35 +
 5 files changed, 163 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103660-2.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103660-3.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103660-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103660-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c9c8478d286..b43ceb6def0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2356,18 +2356,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
  /* Fold ((-(a < b) & c) | (-(a >= b) & d)) into a < b ? c : d.  This is
 canonicalized further and we recognize the conditional form:
-(a < b ? c : 0) | (a >= b ? d : 0) into a < b ? c : d.  */
- (simplify
-  (bit_ior
-   (cond (cmp@0  @01 @02) @3 zerop)
-   (cond (icmp@4 @01 @02) @5 zerop))
-(if (INTEGRAL_TYPE_P (type)
-&& invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
-/* The scalar version has to be canonicalized after vectorization
-   because it makes unconditional loads conditional ones, which
-   means we lose vectorization because the loads may trap.  */
-&& canonicalize_math_after_vectorization_p ())
-(cond @0 @3 @5)))
+(a < b ? c : 0) | (a >= b ? d : 0) into a < b ? c : d.
+Handle also ^ and + in replacement of `|`. */
+ (for op (bit_ior bit_xor plus)
+  (simplify
+   (op
+(cond (cmp@0  @01 @02) @3 zerop)
+(cond (icmp@4 @01 @02) @5 zerop))
+ (if (INTEGRAL_TYPE_P (type)
+ && invert_tree_comparison (cmp, HONOR_NANS (@01)) == icmp
+ /* The scalar version has to be canonicalized after vectorization
+because it makes unconditional loads conditional ones, which
+means we lose vectorization because the loads may trap.  */
+ && canonicalize_math_after_vectorization_p ())
+ (cond @0 @3 @5))))
 
  /* Vector Fold (((a < b) & c) | ((a >= b) & d)) into a < b ? c : d. 
 and ((~(a < b) & c) | (~(a >= b) & d)) into a < b ? c : d.  */
@@ -2391,13 +2393,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(vec_cond @0 @3 @2))
 
  /* Scalar Vectorized Fold ((-(a < b) & c) | (-(a >= b) & d))
-into a < b ? d : c.  */
- (simplify
-  (bit_ior
-   (vec_cond:s (cmp@0 @4 @5) @2 integer_zerop)
-   (vec_cond:s (icmp@1 @4 @5) @3 integer_zerop))
-  (if (invert_tree_comparison (cmp, HONOR_NANS (@4)) == icmp)
-   (vec_cond @0 @2 @3))))
+into a < b ? d : c.
+Handle also ^ and + in replacement of `|`. */
+ (for op (bit_ior bit_xor plus)
+  (simplify
+   (op
+(vec_cond:s (cmp@0 @4 @5) @2 integer_zerop)
+(vec_cond:s (icmp@1 @4 @5) @3 integer_zerop))
+   (if (invert_tree_comparison (cmp, HONOR_NANS (@4)) == icmp)
+(vec_cond @0 @2 @3)))))
 
 /* Transform X & -Y into X * Y when Y is { 0 or 1 }.  */
 (simplify
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103660-2.C b/gcc/testsuite/g++.dg/tree-ssa/pr103660-2.C
new file mode 100644
index 000..95205c02bc3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103660-2.C
@@ -0,0 +1,30 @@
+/* PR tree-optimization/103660 */
+/* Vector type version. */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -Wno-psabi" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+#define funcs(OP,n)\
+v4si min_##n(v4si a, v4si b) { \
+  v4si X = a < b ? a : 0;  \
+  v4si Y = a >= b ? b : 0; \
+  return (X OP Y); \
+}  \
+v4si f_##n(v4si a, v4si b, \
+  v4si c, v4si d) {\
+  v4si X = a < b ? c : 0;  \
+  v4si Y = a >= b ? d : 0; \
+  return (X OP Y); \
+}
+
+
+funcs(^, xor)
+funcs(+, plus)
+
+/* min_xor/min_plus should produce min or `a < b ? a : b` depending on if the target
+   supports min on the vector type or not

[PATCH 1/3] testsuite: Add testcases for part of PR 103660

2024-08-12 Thread Andrew Pinski

The IOR part of the bug report was fixed by r13-4620-g4d9db4bdd458, but
that added only aarch64-specific testcases. This adds 4
generic testcases to make sure these cases are optimized.
The C++ testcases are the vector type versions.

PR tree-optimization/103660

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-0.C: New test.
* g++.dg/tree-ssa/pr103660-1.C: New test.
* gcc.dg/tree-ssa/pr103660-0.c: New test.
* gcc.dg/tree-ssa/pr103660-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/tree-ssa/pr103660-0.C | 28 ++
 gcc/testsuite/g++.dg/tree-ssa/pr103660-1.C | 28 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr103660-0.c | 33 +
 gcc/testsuite/gcc.dg/tree-ssa/pr103660-1.c | 43 ++
 4 files changed, 132 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103660-0.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103660-1.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103660-0.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103660-1.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103660-0.C b/gcc/testsuite/g++.dg/tree-ssa/pr103660-0.C
new file mode 100644
index 000..766ec92457c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103660-0.C
@@ -0,0 +1,28 @@
+/* PR tree-optimization/103660 */
+/* Vector type version. */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -Wno-psabi" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+#define funcs(OP,n)\
+v4si min_##n(v4si a, v4si b) { \
+  v4si X = -(a < b) * a;   \
+  v4si Y = -(a >= b) * b;  \
+  return (X OP Y); \
+}  \
+v4si f_##n(v4si a, v4si b, \
+  v4si c, v4si d) {\
+  v4si X = -(a < b) * c;   \
+  v4si Y = -(a >= b) * d;  \
+  return (X OP Y); \
+}
+
+
+funcs(|, ior)
+
+/* min_ior should produce min or `a < b ? a : b` depending on if the target
+   supports min on the vector type or not. */
+/* f_ior should produce (a < b) ? c : d */
+/* { dg-final { scan-tree-dump-not   "bit_ior_expr, " "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "(?:lt_expr|min_expr), " 2 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "(?:vec_cond_expr|min_expr), "  2 "forwprop1" } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103660-1.C b/gcc/testsuite/g++.dg/tree-ssa/pr103660-1.C
new file mode 100644
index 000..713057586f2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103660-1.C
@@ -0,0 +1,28 @@
+/* PR tree-optimization/103660 */
+/* Vector type version. */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -Wno-psabi" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+#define funcs(OP,n)\
+v4si min_##n(v4si a, v4si b) { \
+  v4si X = a < b ? a : 0;  \
+  v4si Y = a >= b ? b : 0; \
+  return (X OP Y); \
+}  \
+v4si f_##n(v4si a, v4si b, \
+  v4si c, v4si d) {\
+  v4si X = a < b ? c : 0;  \
+  v4si Y = a >= b ? d : 0; \
+  return (X OP Y); \
+}
+
+
+funcs(|, ior)
+
+/* min_ior should produce min or `a < b ? a : b` depending on if the target
+   supports min on the vector type or not. */
+/* f_ior should produce (a < b) ? c : d */
+/* { dg-final { scan-tree-dump-not   "bit_ior_expr, " "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "(?:lt_expr|min_expr), " 2 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "(?:vec_cond_expr|min_expr), "  2 "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103660-0.c b/gcc/testsuite/gcc.dg/tree-ssa/pr103660-0.c
new file mode 100644
index 000..6be0721aedd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103660-0.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/103660 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop4-raw" } */
+
+#define funcs(OP,n)\
+int min_##n(int a, int b) {\
+  int t;   \
+  int t1;  \
+  int t2;  \
+  t1 = (a < b) * a;\
+  t2 = (a >= b) * b;   \
+  t = t1 OP t2;\
+  return t;\
+}  \
+int f_##n(int a, int b, int c, \
+int d) {   \
+  int t;   \
+  int t1;  \
+  int t2;  \
+  t1 = (a < b) * c;\
+  t2 = (a >= b) * d;   \
+  t = t1 OP

[PATCH 3/3] Match: Add pattern for `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c` [PR103660]

2024-08-12 Thread Andrew Pinski

This adds a pattern to convert `(a ? b : 0) | (a ? 0 : c)` into `a ? b : c`,
which is simpler. It adds the pattern for both cond and vec_cond; vec_cond is
currently handled via a different pattern, but that requires extra steps for
matching, so this should be slightly faster.

Also handle xor and plus, since those can be handled the same way.
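
For reference, the scalar identity can be checked with a few lines of C++.
This is only a sketch; `two_conds`, `one_cond` and `check_same_cond_fold` are
hypothetical helper names. Both arms test the same condition with the zero on
opposite sides, so exactly one operand of the outer operation is 0.

```cpp
#include <cassert>

// Before: both arms test the same condition, with the zero on
// opposite sides, so one arm always evaluates to 0.
template <typename Op>
int two_conds (bool a, int b, int c, Op op)
{
  return op (a ? b : 0, a ? 0 : c);
}

// After: the single conditional the new pattern produces.
int one_cond (bool a, int b, int c)
{
  return a ? b : c;
}

// Check |, ^ and + for both values of the condition.
void check_same_cond_fold ()
{
  const bool conds[] = { false, true };
  const int vals[] = { -5, 0, 3, 1000 };
  for (bool a : conds)
    for (int b : vals)
      for (int c : vals)
        {
          int want = one_cond (a, b, c);
          int ior = two_conds (a, b, c, [] (int x, int y) { return x | y; });
          int eor = two_conds (a, b, c, [] (int x, int y) { return x ^ y; });
          int sum = two_conds (a, b, c, [] (int x, int y) { return x + y; });
          assert (ior == want);
          assert (eor == want);
          assert (sum == want);
        }
}
```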

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/103660

gcc/ChangeLog:

* match.pd (`(a ? b : 0) | (a ? 0 : c)`): New pattern.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr103660-4.C: New test.
* gcc.dg/tree-ssa/pr103660-4.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 10 +
 gcc/testsuite/g++.dg/tree-ssa/pr103660-4.C | 35 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr103660-4.c | 43 ++
 3 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr103660-4.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103660-4.c

diff --git a/gcc/match.pd b/gcc/match.pd
index b43ceb6def0..65a3aae2243 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2339,6 +2339,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type))
   (bit_and @0 @1)))
 
+/* Fold `(a ? b : 0) | (a ? 0 : c)` into (a ? b : c).
+Handle also ^ and + in replacement of `|`. */
+(for cnd (cond vec_cond)
+ (for op (bit_ior bit_xor plus)
+  (simplify
+   (op:c
+(cnd:s @0 @00 integer_zerop)
+(cnd:s @0 integer_zerop @01))
+   (cnd @0 @00 @01))))
+
 (for cmp (tcc_comparison)
  icmp (inverted_tcc_comparison)
  /* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.  */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103660-4.C b/gcc/testsuite/g++.dg/tree-ssa/pr103660-4.C
new file mode 100644
index 000..47727f86e24
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103660-4.C
@@ -0,0 +1,35 @@
+/* PR tree-optimization/103660 */
+/* Vector type version. */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop2-raw -Wno-psabi" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+#define funcs(OP,n)\
+v4si min_##n(v4si a, v4si b) { \
+  v4si t = {0,0,0,0};  \
+  v4si X = a < b ? a : t;  \
+  v4si Y = a < b ? t : b;  \
+  return (X OP Y); \
+}  \
+v4si f_##n(v4si a, v4si b, \
+  v4si c, v4si d) {\
+  v4si t = {0,0,0,0};  \
+  v4si X = a < b ? c : t;  \
+  v4si Y = a < b ? t : d;  \
+  return (X OP Y); \
+}
+
+
+funcs(|, ior)
+funcs(^, xor)
+funcs(+, plus)
+
+/* min_ior/min_xor/min_plus should produce min or `a < b ? a : b` depending on if the target
+   supports min on the vector type or not. */
+/* f_ior/f_xor/f_plus should produce (a < b) ? c : d */
+/* { dg-final { scan-tree-dump-not   "bit_xor_expr, " "forwprop2" } } */
+/* { dg-final { scan-tree-dump-not   "bit_ior_expr, " "forwprop2" } } */
+/* { dg-final { scan-tree-dump-not   "plus_expr, " "forwprop2" } } */
+/* { dg-final { scan-tree-dump-not   "bit_ior_expr, " "forwprop2" } } */
+/* { dg-final { scan-tree-dump-times "(?:lt_expr|min_expr), " 6 "forwprop2" } } */
+/* { dg-final { scan-tree-dump-times "(?:vec_cond_expr|min_expr), "  6 "forwprop2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103660-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr103660-4.c
new file mode 100644
index 000..26c956fdcec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103660-4.c
@@ -0,0 +1,43 @@
+/* PR tree-optimization/103660 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fgimple -fdump-tree-forwprop1-raw" } */
+
+#define funcs(OP,n)\
+__GIMPLE() \
+int min_##n(int a, int b) {\
+  _Bool X; \
+  int t;   \
+  int t1;  \
+  int t2;  \
+  X = a < b;   \
+  t1 = X ? a : 0;  \
+  t2 = X ? 0 : b;  \
+  t = t1 OP t2;\
+  return t;\
+}  \
+__GIMPLE() \
+int f_##n(int a, int b, int c, \
+int d) {   \
+  _Bool X; \
+  int t;   \
+  int t1;  \
+  int t2;  \
+  X = a < b;   \
+  t1 = X ? c : 0;  \
+  t2 = X ? 0 : d;  \
+  t = t1 OP t2;\
+  return t;\
+}
+
+funcs(|, ior)
+funcs(^, xor)
+funcs(+, plus)
+
+/* min_i/min_ioror/min_plus should produce min */
+/* f_xor/f_ior/f_plus shoul

[PATCH v2] ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

2024-08-11 Thread Andrew Pinski

Sometimes initialize_sanitizer_builtins is not called before emitting
the asan builtins with hwasan. In the case of the bug report, there
was a path with the fortran front-end where it was not called.
So let's call it in asan_instrument before calling transform_statements
and from hwasan_finish_file.

Built and tested for aarch64-linux-gnu with no regressions.

Changes since v1:
* v2: Add call of initialize_sanitizer_builtins to hwasan_finish_file also.

gcc/ChangeLog:

PR sanitizer/115205
* asan.cc (asan_instrument): Call initialize_sanitizer_builtins
for hwasan.
(hwasan_finish_file): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/asan.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 9e0f51b1477..5f262d54a3a 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -4276,6 +4276,7 @@ asan_instrument (void)
 {
   if (hwasan_sanitize_p ())
 {
+  initialize_sanitizer_builtins ();
   transform_statements ();
   return 0;
 }
@@ -4694,6 +4695,8 @@ hwasan_finish_file (void)
   if (flag_sanitize & SANITIZE_KERNEL_HWADDRESS)
 return;
 
+  initialize_sanitizer_builtins ();
+
   /* Avoid instrumenting code in the hwasan constructors/destructors.  */
   flag_sanitize &= ~SANITIZE_HWADDRESS;
   int priority = MAX_RESERVED_INIT_PRIORITY - 1;
-- 
2.43.0

Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]

2024-08-09 Thread Andrew Pinski

On Fri, Aug 9, 2024 at 8:11 PM Xi Ruoyao  wrote:
>
> On Fri, 2024-08-09 at 17:55 -0400, Vladimir Makarov wrote:
>
> > Still, for GCC developer novice, I think it is important to test all
> > major targets and aarch64 (one target on which bootstrap was broken) is
> > the 2nd most important target.
>
> Linaro CI will complain (via off-list email) if a patch posted on the
> list breaks aarch64.  It complained some of my patches and I fixed them
> before commit.  Why this case was not caught?

I had been wondering the same until I looked into it earlier today.
Linaro's CI builds with `--disable-bootstrap`, and there were no extra
testsuite failures with the patch.
So Linaro's CI does not catch all the bugs that a developer would
catch in the end, because bootstrap is one of the normal requirements,
though usually only on one target.

Thanks,
Andrew Pinski

>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University

Re: [committed] c-family: Add some more ARRAY_SIZE uses

2024-08-09 Thread Andrew Pinski

On Fri, Aug 9, 2024 at 12:39 AM Jakub Jelinek  wrote:
>
> Hi!
>
> These two spots were just non-standard, because they divided
> sizeof (omp_pragmas_simd) by sizeof (*omp_pragmas) and not
> the expected sizeof (*omp_pragmas_simd) and so weren't converted
> into ARRAY_SIZE.  Both of the latter sizes are the same though,
> as both arrays have the same type, so this patch doesn't change
> anything but readability.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed
> to trunk.
>
> 2024-08-09  Jakub Jelinek  
>
> * c-pragma.cc (c_pp_lookup_pragma): Use ARRAY_SIZE in
> n_omp_pragmas_simd initializer.

I noticed the loops in c_pp_lookup_pragma could be candidates to use
C++11 range based for loops instead of using ARRAY_SIZE.

> (init_pragmas): Likewise.

Likewise here too.

That would definitely be even less error prone than using ARRAY_SIZE.
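
Something along these lines, as a self-contained sketch (the struct and table
below are stand-ins I made up, not the real c-pragma.cc tables): with a
range-based for loop the bound comes from the array type, so there is no way
to divide by the element size of the wrong array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the pragma tables in c-pragma.cc.
struct pragma_entry
{
  const char *name;
  unsigned int id;
};

static const pragma_entry demo_pragmas[] = {
  { "atomic", 1 },
  { "barrier", 2 },
  { "simd", 3 },
};

// Index-based lookup: the bound has to name the right array.
const char *lookup_indexed (unsigned int id)
{
  const std::size_t n = sizeof (demo_pragmas) / sizeof (*demo_pragmas);
  for (std::size_t i = 0; i < n; ++i)
    if (demo_pragmas[i].id == id)
      return demo_pragmas[i].name;
  return nullptr;
}

// Range-based lookup: the bound comes from the array type itself.
const char *lookup_ranged (unsigned int id)
{
  for (const pragma_entry &p : demo_pragmas)
    if (p.id == id)
      return p.name;
  return nullptr;
}

// Both lookups return the same table entry (or nullptr) for every id.
void check_lookups ()
{
  for (unsigned int id = 0; id <= 4; ++id)
    assert (lookup_indexed (id) == lookup_ranged (id));
}
```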

Thanks,
Andrew

>
> --- gcc/c-family/c-pragma.cc.jj 2024-06-05 19:09:54.054616902 +0200
> +++ gcc/c-family/c-pragma.cc2024-08-09 00:44:58.913489987 +0200
> @@ -1565,8 +1565,7 @@ c_pp_lookup_pragma (unsigned int id, con
>  {
>const int n_oacc_pragmas = ARRAY_SIZE (oacc_pragmas);
>const int n_omp_pragmas = ARRAY_SIZE (omp_pragmas);
> -  const int n_omp_pragmas_simd = sizeof (omp_pragmas_simd)
> -/ sizeof (*omp_pragmas);
> +  const int n_omp_pragmas_simd = ARRAY_SIZE (omp_pragmas_simd);
>int i;
>
>for (i = 0; i < n_oacc_pragmas; ++i)
> @@ -1807,8 +1806,7 @@ init_pragma (void)
> }
>if (flag_openmp || flag_openmp_simd)
> {
> - const int n_omp_pragmas_simd
> -   = sizeof (omp_pragmas_simd) / sizeof (*omp_pragmas);
> + const int n_omp_pragmas_simd = ARRAY_SIZE (omp_pragmas_simd);
>   int i;
>
>   for (i = 0; i < n_omp_pragmas_simd; ++i)
>
> Jakub
>

Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]

2024-08-08 Thread Andrew Pinski

On Fri, Aug 2, 2024 at 7:30 AM Jeff Law  wrote:
>
>
>
> On 8/1/24 4:12 AM, Surya Kumari Jangala wrote:
> > lra: emit caller-save register spills before call insn [PR116028]
> >
> > LRA emits insns to save caller-save registers in the
> > inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
> > Basic Block) and traverses the insns in the EBBs in reverse order from
> > the last insn to the first insn. When LRA sees a write to a pseudo (that
> > has been assigned a caller-save register), and there is a read following
> > the write, with an intervening call insn between the write and read,
> > then LRA generates a spill immediately after the write and a restore
> > immediately before the read. The spill is needed because the call insn
> > will clobber the caller-save register.
> >
> > If there is a write insn and a call insn in two separate BBs but
> > belonging to the same EBB, the spill insn gets generated in the BB
> > containing the write insn. If the write insn is in the entry BB, then
> > the spill insn that is generated in the entry BB prevents shrink wrap
> > from happening. This is because the spill insn references the stack
> > pointer and hence the prolog gets generated in the entry BB itself.
> >
> > This patch ensures that the spill insn is generated before the call insn
> > instead of after the write. This is also more efficient as the spill now
> > occurs only in the path containing the call.
> >
> > 2024-08-01  Surya Kumari Jangala  
> >
> > gcc/
> >   PR rtl-optimization/PR116028
> >   * lra-constraints.cc (split_reg): Spill register before call
> >   insn.
> >   (latest_call_insn): New variable.
> >   (inherit_in_ebb): Track the latest call insn.
> >
> > gcc/testsuite/
> >   PR rtl-optimization/PR116028
> >   * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
> >   * gcc.dg/pr10474.c: Remove xfail for powerpc.
> Implementation looks fine.  I would suggest a comment indicating why
> we're inserting before last_call_insn.  Otherwise someone in the future
> would have to find the patch submission to know why we're handling that
> case specially.
>
> OK with that additional comment.

This causes a bootstrap failure on aarch64-linux-gnu; the self-tests
fail at stage 2. It looks like wrong code is produced when compiling
the stage 2 compiler.
I have not looked further than that right now.

Thanks,
Andrew

>
> Thanks,
> jeff

Re: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-08 Thread Andrew Pinski

On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
 wrote:
>
> Hi Richard,
> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx offloading, the 
> following minimal test:
>
> int main()
> {
>   int x;
>   #pragma omp target map(x)
> x = 5;
>   return x;
> }
>
> compiled with -fopenmp -foffload=nvptx-none now fails with:
> gcc: error: unrecognized command-line option '-m64'
> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status 
> compilation terminated.
>
> As mentioned in RFC email, this happens because 
> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler depending 
> on whether
> offload_abi is OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32, and aarch64 backend 
> doesn't recognize these options.
>
> Based on your suggestion in: 
> https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
> The attached patch generates new macro HOST_MULTILIB derived from 
> $enable_as_accelerator_for, and in mkoffload.cc it gates passing -m32/-m64
> to host_compiler on HOST_MULTILIB. I verified that the macro is set to 0 for 
> aarch64 host (and thus avoids above unrecognized command line option error),
> and is set to 1 for x86_64 host.
>
> Does the patch look OK ?

Note I think the usage of the name MULTILIB here is wrong because
aarch64 (and riscv) could have MULTILIB support; just the options are
different. For aarch64, it would be -mabi=ilp32/-mabi=lp64 (for riscv
it is more complex).

This most likely should be something more complex due to the above.
Maybe call it HOST_64_32, but even that seems wrong due to AArch64
having ILP32 support and such.
What about HOST_64ABI_OPTS="-mabi=lp64"/HOST_32ABI_OPTS="-mabi=ilp32"?
But I am not sure if that would be enough to support RISC-V, which
requires two options.
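
A rough sketch of how such macros might be consumed, purely as an
illustration. Everything here is an assumption: the macro values, the
`demo_offload_abi` enum (a stand-in for the real offload ABI enum), and
the `host_abi_options` helper do not exist in mkoffload.cc. Splitting
the string on whitespace is one way to let a single macro carry more
than one option:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical configure-time macros, named as in the suggestion above.
// The defaults are what an aarch64 host might use.
#ifndef HOST_64ABI_OPTS
#define HOST_64ABI_OPTS "-mabi=lp64"
#endif
#ifndef HOST_32ABI_OPTS
#define HOST_32ABI_OPTS "-mabi=ilp32"
#endif

// Stand-in for the offload ABI selection; not the real enum.
enum demo_offload_abi { DEMO_ABI_LP64, DEMO_ABI_ILP32 };

// Split a whitespace-separated option string into separate arguments.
static std::vector<std::string>
split_options (const std::string &text)
{
  std::istringstream in (text);
  std::vector<std::string> opts;
  for (std::string opt; in >> opt; )
    opts.push_back (opt);
  return opts;
}

// Pick the host compiler options matching the requested ABI.
static std::vector<std::string>
host_abi_options (demo_offload_abi abi)
{
  return split_options (abi == DEMO_ABI_LP64
			? HOST_64ABI_OPTS : HOST_32ABI_OPTS);
}
```

With this shape, a host that needs two options could define the macro
as a single string such as "-march=rv64gc -mabi=lp64d" (an example
value, not a tested configuration), and a host with no such options
could define it as an empty string.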

Thanks,
Andrew Pinski

>
> Signed-off-by: Prathamesh Kulkarni 
>
> Thanks,
> Prathamesh
