[PATCH] Use simple_dce_from_worklist in phiprop

2024-05-23 Thread Andrew Pinski
I noticed that phiprop leaves around phi nodes that define
an SSA name which is unused. This just adds a bitmap to mark
those SSA names and then calls simple_dce_from_worklist at
the very end to remove those phi nodes and all of their
dependencies, if there were any. This might allow us to
optimize something earlier, because the removed phi was
taking the address of the variables.
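
As a rough illustration of the worklist idea described above, here is a
minimal standalone sketch in C. It is a toy model, not GCC code: the
struct toy_def type and the toy_* functions are hypothetical names made up
for this example, and each statement is assumed to define exactly one name
(its own index). The real simple_dce_from_worklist works on GIMPLE and
SSA use lists instead.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS 2

/* Toy model of a statement defining one name from up to MAX_OPS operand
   names (-1 marks an unused operand slot).  Statement I defines name I.  */
struct toy_def {
  bool live;            /* Statement is still present.  */
  bool side_effects;    /* Statement must never be removed.  */
  int ops[MAX_OPS];
};

/* Count the uses of NAME among the statements that are still present.  */
static int
toy_num_uses (const struct toy_def *defs, int n, int name)
{
  int uses = 0;
  for (int i = 0; i < n; i++)
    if (defs[i].live)
      for (int j = 0; j < MAX_OPS; j++)
        if (defs[i].ops[j] == name)
          uses++;
  return uses;
}

/* Remove the definition of every name in WORKLIST that has no uses, then
   put the operands of each removed statement back on the worklist.
   Returns the number of statements removed.  */
static int
toy_dce_from_worklist (struct toy_def *defs, int n, bool *worklist)
{
  assert (n > 0);
  int removed = 0;
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int name = 0; name < n; name++)
        {
          if (!worklist[name])
            continue;
          worklist[name] = false;
          if (!defs[name].live || defs[name].side_effects
              || toy_num_uses (defs, n, name) != 0)
            continue;
          defs[name].live = false;
          removed++;
          /* The operands may have just lost their last use.  */
          for (int j = 0; j < MAX_OPS; j++)
            if (defs[name].ops[j] >= 0)
              {
                worklist[defs[name].ops[j]] = true;
                changed = true;
              }
        }
    }
  return removed;
}
```

Seeding the worklist with only the unused phi result is enough in this
model: removing the phi releases its arguments, which are then removed in
turn if nothing else uses them.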

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiprop.cc (phiprop_insert_phi): Add
dce_ssa_names argument. Add the phi's result to it.
(propagate_with_phi): Add dce_ssa_names argument.
Update call to phiprop_insert_phi.
(pass_phiprop::execute): Update call to propagate_with_phi.
Call simple_dce_from_worklist if there was a change.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiprop.cc | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 041521ef106..2a1cdae46d2 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "tree-ssa-loop.h"
 #include "tree-cfg.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates indirect loads through the PHI node for its
address to make the load source possibly non-addressable and to
@@ -132,12 +133,15 @@ phivn_valid_p (struct phiprop_d *phivn, tree name, 
basic_block bb)
 
 static tree
 phiprop_insert_phi (basic_block bb, gphi *phi, gimple *use_stmt,
-   struct phiprop_d *phivn, size_t n)
+   struct phiprop_d *phivn, size_t n,
+   bitmap dce_ssa_names)
 {
   tree res;
   gphi *new_phi = NULL;
   edge_iterator ei;
   edge e;
+  tree phi_result = PHI_RESULT (phi);
+  bitmap_set_bit (dce_ssa_names, SSA_NAME_VERSION (phi_result));
 
   gcc_assert (is_gimple_assign (use_stmt)
  && gimple_assign_rhs_code (use_stmt) == MEM_REF);
@@ -276,7 +280,7 @@ chk_uses (tree, tree *idx, void *data)
 
 static bool
 propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-   size_t n)
+   size_t n, bitmap dce_ssa_names)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -420,9 +424,10 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
goto next;
}
 
- phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
 
- /* Remove old stmt.  The phi is taken care of by DCE.  */
+ /* Remove old stmt.  The phi and possibly all of its dependencies
+will be removed later via simple_dce_from_worklist.  */
  gsi = gsi_for_stmt (use_stmt);
  /* Unlinking the VDEF here is fine as we are sure that we process
 stmts in execution order due to aggregate copies having VDEFs
@@ -442,16 +447,15 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
phiprop_d *phivn,
 is the first load transformation.  */
   else if (!phi_inserted)
{
- res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
  type = TREE_TYPE (res);
 
  /* Remember the value we created for *ptr.  */
  phivn[SSA_NAME_VERSION (ptr)].value = res;
  phivn[SSA_NAME_VERSION (ptr)].vuse = vuse;
 
- /* Remove old stmt.  The phi is taken care of by DCE, if we
-want to delete it here we also have to delete all intermediate
-copies.  */
+ /* Remove old stmt.  The phi and possibly all of its dependencies
+will be removed later via simple_dce_from_worklist.  */
  gsi = gsi_for_stmt (use_stmt);
  gsi_remove (&gsi, true);
 
@@ -514,6 +518,7 @@ pass_phiprop::execute (function *fun)
   gphi_iterator gsi;
   unsigned i;
   size_t n;
+  auto_bitmap dce_ssa_names;
 
   calculate_dominance_info (CDI_DOMINATORS);
 
@@ -531,11 +536,14 @@ pass_phiprop::execute (function *fun)
   if (bb_has_abnormal_pred (bb))
continue;
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n, 
dce_ssa_names);
 }
 
   if (did_something)
-gsi_commit_edge_inserts ();
+{
+  gsi_commit_edge_inserts ();
+  simple_dce_from_worklist (dce_ssa_names);
+}
 
   free (phivn);
 
-- 
2.43.0



Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-23 Thread Andrew Pinski
On Thu, May 23, 2024 at 8:01 AM Manolis Tsamis  wrote:
>
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
>  strb w2, [x1, 1]
>  ldr x0, [x1]  # Expensive store forwarding to larger load.
>
> To:
>
>  ldr x0, [x1]
>  strb w2, [x1]
>  bfi x0, x2, 0, 8
>
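
The reordering quoted above can be modelled in plain C. This is only a
sketch under stated assumptions: it assumes little-endian byte order (so
the byte at offset N lands at bit 8*N of the loaded value), it is
single-threaded and so says nothing about the memory-model question raised
below, and load_le64, bit_insert, store_then_load and load_then_store are
hypothetical helper names, not anything from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 64-bit value from 8 bytes in little-endian order, modelling
   a 64-bit load on a little-endian target.  */
static uint64_t
load_le64 (const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v |= (uint64_t) p[i] << (8 * i);
  return v;
}

/* Model of a bit-field insert: replace WIDTH bits of DST starting at bit
   LSB with the low WIDTH bits of SRC.  */
static uint64_t
bit_insert (uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
  assert (width > 0 && width < 64 && lsb + width <= 64);
  uint64_t mask = ((UINT64_C (1) << width) - 1) << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}

/* Original order: byte store at OFFSET, then the wide load.  */
static uint64_t
store_then_load (uint8_t *mem, unsigned offset, uint8_t byte)
{
  assert (offset < 8);
  mem[offset] = byte;
  return load_le64 (mem);
}

/* Reordered: wide load first, then the store, then patch the loaded
   value with a bit insert at bit position 8 * OFFSET.  */
static uint64_t
load_then_store (uint8_t *mem, unsigned offset, uint8_t byte)
{
  assert (offset < 8);
  uint64_t v = load_le64 (mem);
  mem[offset] = byte;
  return bit_insert (v, byte, 8 * offset, 8);
}
```

Both orders leave memory in the same state and return the same value; the
second one never loads from bytes that were just written.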

Are you sure this is correct with respect to the C11/C++11 memory
models? If not then the pass should be gated with
flag_store_data_races.
Also stores like this start a new "alias set" (I can't remember the
exact term here). So how do you represent the store's aliasing set? Do
you change it? If not, are you sure that will do the right thing?

You didn't document the new option or the new --param (invoke.texi);
this is the bare minimum requirement.
Note you should add documentation for the new pass in the internals
manual (passes.texi) (note most folks forget to update this when
adding a new pass).

Thanks,
Andrew


> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
>   Neoverse-N1:      +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:        +17.5%
>
> PR rtl-optimization/48696
>
> gcc/ChangeLog:
>
> * Makefile.in: Add avoid-store-forwarding.o.
> * common.opt: New option -favoid-store-forwarding.
> * params.opt: New param store-forwarding-max-distance.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> * avoid-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/avoid-store-forwarding-1.c: New test.
> * gcc.dg/avoid-store-forwarding-2.c: New test.
> * gcc.dg/avoid-store-forwarding-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/Makefile.in   |   1 +
>  gcc/avoid-store-forwarding.cc | 554 ++
>  gcc/common.opt|   4 +
>  gcc/params.opt|   4 +
>  gcc/passes.def|   1 +
>  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
>  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
>  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
>  gcc/tree-pass.h   |   1 +
>  9 files changed, 681 insertions(+)
>  create mode 100644 gcc/avoid-store-forwarding.cc
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..be969b1ca1d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1681,6 +1681,7 @@ OBJS = \
> statistics.o \
> stmt.o \
> stor-layout.o \
> +   avoid-store-forwarding.o \
> store-motion.o \
> streamer-hooks.o \
> stringpool.o \
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> new file mode 100644
> index 000..d90627c4872
> --- /dev/null
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -0,0 +1,554 @@
> +/* Avoid store forwarding optimization pass.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by VRULL GmbH.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "alias.h"
> +#include "rtlanal.h"
> +#include "tree-pass.h"
> +#include "cselib.h"
> +#include "predict.h"
> +#include "insn-config.h"
> +#include "expmed.h"
> +#include "recog.h"
> +#include "regset.h"
> +#include "df.h"
> +#include "expr.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "vec.h"
> +
> +/* This pass tries to detect and avoid cases of store forwarding.
> +   On many processors there is a large penalty when smaller stores are
> +   forwarded to larger loads.  The idea used to avoid the stall is to move
> +   the store after the load and in addition emit a bit insert sequence so
> +   the load register has the 

Re: [PATCH] AARCH64: Add Qualcomm oryon-1 core

2024-05-22 Thread Andrew Pinski
On Tue, May 14, 2024 at 10:27 AM Kyrill Tkachov
 wrote:
>
> Hi Andrew,
>
> On Fri, May 3, 2024 at 8:50 PM Andrew Pinski  wrote:
>>
>> This patch adds Qualcomm's new oryon-1 core; this is enough
>> to recognize the core and later on will add the tuning structure.
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-cores.def (oryon-1): New entry.
>> * config/aarch64/aarch64-tune.md: Regenerate.
>> * doc/invoke.texi  (AArch64 Options): Document oryon-1.
>>
>> Signed-off-by: Andrew Pinski 
>> Co-authored-by: Joel Jones 
>> Co-authored-by: Wei Zhao 
>> ---
>>  gcc/config/aarch64/aarch64-cores.def | 5 +
>>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>>  gcc/doc/invoke.texi  | 1 +
>>  3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index f69fc212d56..be60929e400 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, 
>> cortexa57, V8_4A,  (SVE, I8MM, B
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO), 
>> saphira,   0x51, 0xC01, -1)
>>
>> +/* ARMv8.6-A Architecture Processors.  */
>> +
>> +/* Qualcomm ('Q') cores. */
>> +AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, F16), 
>> cortexa72,   0x51, 0x001, -1)
>> +
>>  /* ARMv8-A big.LITTLE implementations.  */
>>
>>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, V8A,  
>> (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> diff --git a/gcc/config/aarch64/aarch64-tune.md 
>> b/gcc/config/aarch64/aarch64-tune.md
>> index abd3c9e0822..ba940f1c890 100644
>> --- a/gcc/config/aarch64/aarch64-tune.md
>> +++ b/gcc/config/aarch64/aarch64-tune.md
>> @@ -1,5 +1,5 @@
>>  ;; -*- buffer-read-only: t -*-
>>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>>  (define_attr "tune"
>> -   
>> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> +   
>> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 9456ced468a..eabe09dc28f 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21323,6 +21323,7 @@ performance of the code.  Permissible values for 
>> this option are:
>>  @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
>>  @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
>>  @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
>> +@samp{oyron-1},
>
>
> Typo in the name.
> LGTM with that fixed.

Thanks, pushed as r15-784-g01cfd601825014.

Thanks,
Andrew

> Thanks,
> Kyrill
>
>>
>>
>>  @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
>>  @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
>>  @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
>> --
>> 2.43.0
>>


Re: [PATCH] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Andrew Pinski
On Wed, May 22, 2024 at 5:28 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold
> > vget_high_* intrinsics to BIT_FIELD_REF and remove the vget_high_*
> > definitions from arm_neon.h to use the new intrinsics framework.
> >
> >   PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc 
> > (AARCH64_SIMD_VGET_HIGH_BUILTINS):
> >   New macro to create definitions for all vget_high intrinsics.
> >   (VGET_HIGH_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_high function codes.
> >   (AARCH64_SIMD_VGET_LOW_BUILTINS): Delete duplicate macro.
> >   (aarch64_general_fold_builtin): Fold vget_high calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_high builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_high): Delete.
> >   (aarch64_vget_hi_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_high_f16): Likewise.
> >   (vget_high_f32): Likewise.
> >   (vget_high_f64): Likewise.
> >   (vget_high_p8): Likewise.
> >   (vget_high_p16): Likewise.
> >   (vget_high_p64): Likewise.
> >   (vget_high_s8): Likewise.
> >   (vget_high_s16): Likewise.
> >   (vget_high_s32): Likewise.
> >   (vget_high_s64): Likewise.
> >   (vget_high_u8): Likewise.
> >   (vget_high_u16): Likewise.
> >   (vget_high_u32): Likewise.
> >   (vget_high_u64): Likewise.
> >   (vget_high_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/vget_high_2.c: New test.
> >   * gcc.target/aarch64/vget_high_2_be.c: New test.
>
> OK, thanks.

Pushed as r15-778-g1d1ef1c22752b3 .

Thanks,
Andrew


>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  59 +++---
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   6 -
> >  gcc/config/aarch64/aarch64-simd.md|  22 
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  .../gcc.target/aarch64/vget_high_2.c  |  30 +
> >  .../gcc.target/aarch64/vget_high_2_be.c   |  31 ++
> >  6 files changed, 104 insertions(+), 149 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 11b888016ed..f8eeccb554d 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -675,6 +675,23 @@ static aarch64_simd_builtin_datum 
> > aarch64_simd_builtin_data[] = {
> >VGET_LOW_BUILTIN(u64) \
> >VGET_LOW_BUILTIN(bf16)
> >
> > +#define AARCH64_SIMD_VGET_HIGH_BUILTINS \
> > +  VGET_HIGH_BUILTIN(f16) \
> > +  VGET_HIGH_BUILTIN(f32) \
> > +  VGET_HIGH_BUILTIN(f64) \
> > +  VGET_HIGH_BUILTIN(p8) \
> > +  VGET_HIGH_BUILTIN(p16) \
> > +  VGET_HIGH_BUILTIN(p64) \
> > +  VGET_HIGH_BUILTIN(s8) \
> > +  VGET_HIGH_BUILTIN(s16) \
> > +  VGET_HIGH_BUILTIN(s32) \
> > +  VGET_HIGH_BUILTIN(s64) \
> > +  VGET_HIGH_BUILTIN(u8) \
> > +  VGET_HIGH_BUILTIN(u16) \
> > +  VGET_HIGH_BUILTIN(u32) \
> > +  VGET_HIGH_BUILTIN(u64) \
> > +  VGET_HIGH_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -717,6 +734,9 @@ typedef struct
> >  #define VGET_LOW_BUILTIN(A) \
> >AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> >
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_HIGH_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -753,6 +773,7 @@ enum aarch64_builtins
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> >AARCH64_SIMD_VGET_LOW_BUILTINS
> > +  AARCH64_SIMD_VGET_HIGH_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -855,26 +876,21 @@ static aarch64_fcmla_laneq_builtin_datum 
> > aarch64_fcmla_lane_builtin_data[] = {
> > false \
> >},
> >
> > -#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > -  VGET_LOW_BUILTIN(f16) \
> > -  VGET_LOW_BUILTIN(f32) \
> > -  VGET_LOW_BUILTIN(f64) \
> > -  VGET_LOW_BUILTIN(p8) \
> > -  VGET_LOW_BUILTIN(p16) \
> > -  VGET_LOW_BUILTIN(p64) \
> > -  VGET_LOW_BUILTIN(s8) \
> > -  VGET_LOW_BUILTIN(s16) \
> > -  VGET_LOW_BUILTIN(s32) \
> > -  VGET_LOW_BUILTIN(s64) \
> > -  VGET_LOW_BUILTIN(u8) \
> > -  VGET_LOW_BUILTIN(u16) \
> > -  VGET_LOW_BUILTIN(u32) \
> > -  VGET_LOW_BUILTIN(u64) \
> > -  VGET_LOW_BUILTIN(bf16)
> > +#undef VGET_HIGH_BUILTIN
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  {"vget_high_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_HIGH_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), 

Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024 at 5:28 AM Li, Pan2  wrote:
>
> Thanks Andrew for comments.
>
>
>
> > I think you need to make sure type and @0's type matches.
>
>
>
> Oh, yes, we need that, will update in v2.
>
>
>
> > Also I don't think you need :c here since you don't match @0 nor @1 more 
> > than once.
>
>
>
> You mean the :c from (IFN_ADD_OVERFLOW:c@2 @0 @1)), right?
>
> My initial idea is to catch both the (IFN_ADD_OVERFLOW @0 @1) and 
> (IFN_ADD_OVERFLOW @1 @0).
>
> It is unnecessary if IFN_ADD_OVERFLOW takes care of this already.

In this case there is a canonical form/order for the operands (at least
there should be).
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))

Since you are matching @2 for the realpart rather than
`(IFN_ADD_OVERFLOW @0 @1)` directly, the :c is not needed; genmatch will
just generate extra matching code that can never be reached.
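
For reference, the two source-level forms that the quoted pattern relates
can be compared in a small standalone C sketch. The function names
sat_add_branch and sat_add_branchless are made up for this example, and
__builtin_add_overflow is a GCC/Clang builtin, so this assumes one of
those compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form: saturate to all-ones when the addition overflows.  */
static uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? UINT64_MAX : ret;
}

/* Branchless form: SUM < X holds exactly when the unsigned addition
   wrapped, and negating that 0/1 value gives 0 or all-ones.  */
static uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;
  return sum | -(uint64_t) (sum < x);
}
```

The two functions agree for all inputs when both operands and the result
have the same unsigned type, which is the case the simplification is
meant to cover.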

Thanks,
Andrew.

>
>
>
> Pan
>
>
>
>
>
> From: Andrew Pinski 
> Sent: Tuesday, May 21, 2024 7:40 PM
> To: Li, Pan2 
> Cc: GCC Patches ; 钟居哲 ; Kito 
> Cheng ; Tamar Christina ; 
> Richard Guenther 
> Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form 
> for unsigned SAT_ADD
>
>
>
>
>
> On Tue, May 21, 2024, 3:55 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..8b9ded98323 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))
>
>
>
> I think you need to make sure type and @0's type matches.
>
>
>
> Also I don't think you need :c here since you don't match @0 nor @1 more than 
> once.
>
>
>
> Thanks,
>
> Andrew
>
>
>
>
>
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1


Re: [PATCH] driver: Use -as/ld as final fallback instead of as/ld for cross

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024 at 5:12 AM YunQiang Su  wrote:
>
> If `find_a_program` cannot find `as/ld` and we are a cross toolchain,
> the final fallback is `as/ld` of system.  In fact, we can have a try
> with -as/ld before fallback to native as/ld.
>
> This patch is derived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff
>
> gcc
> * gcc.cc(execute): Looks for -as/ld before fallback
> to native as/ld.
> ---
>  gcc/gcc.cc | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..8a1bdb5e3e2 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,27 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
> commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> +   && (!strcmp (commands[0].argv[0], "as")
> +   || !strcmp (commands[0].argv[0], "ld")))
> +   {
> + string = XNEWVEC (char, strlen (commands[0].argv[0]) + 2
> + + strlen (DEFAULT_REAL_TARGET_MACHINE));
> + strcpy (string, DEFAULT_REAL_TARGET_MACHINE);
> + strcat (string, "-");
> + strcat (string, commands[0].argv[0]);
> + const char *string_args[] = {string, "--version", NULL};
> + int exit_status = 0;
> + int err = 0;
> + const char *errmsg = pex_one (PEX_SEARCH, string,
> + CONST_CAST (char **, string_args), string,
> + NULL, NULL, &exit_status, &err);

I think this should be handled under find_a_program instead of
execute. That should simplify things slightly.
You should also most likely use concat here instead of
XNEWVEC/strcpy/strcat which will also simplify the code.
Like string = concat (DEFAULT_REAL_TARGET_MACHINE, "-", commands[0].prog);
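
To show what the single call buys over XNEWVEC/strcpy/strcat, here is a
standalone sketch of the same string construction. It does not use
libiberty: prefixed_tool_name is a hypothetical helper written with
malloc and snprintf as a stand-in, and the target string in the usage
note is only an example value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated "<target>-<tool>"
   string, or NULL if allocation fails.  The caller frees the result.  */
static char *
prefixed_tool_name (const char *target, const char *tool)
{
  assert (target != NULL && tool != NULL);
  /* Room for both parts, the '-' separator and the terminating NUL.  */
  size_t len = strlen (target) + 1 + strlen (tool) + 1;
  char *name = malloc (len);
  if (name == NULL)
    return NULL;
  snprintf (name, len, "%s-%s", target, tool);
  return name;
}
```

For example, prefixed_tool_name ("aarch64-linux-gnu", "as") yields
"aarch64-linux-gnu-as". The length computation lives in one place, which
is the main simplification over sizing the buffer by hand at the call site.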

I think this should be done for more than just as/ld but also objcopy
(which is used for -gsplit-dwarf).
Is there a reason why you need to try executing it with "--version"
as an argument here?

Thanks,
Andrew Pinski

> + if (errmsg == NULL && exit_status == 0 && err == 0)
> +   {
> + commands[0].argv[0] = string;
> + commands[0].prog = string;
> +   }
> +   }
>  }
>
>for (n_commands = 1, i = 0; argbuf.iterate (i, ); i++)
> --
> 2.39.2
>


Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024, 3:55 AM  wrote:

> From: Pan Li 
>
> This patch would like to support the __builtin_add_overflow branch form for
> unsigned SAT_ADD.  For example as below:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Different to the branchless version,  we leverage the simplify to
> convert the branch version of SAT_ADD into branchless if and only
> if the backend has supported the IFN_SAT_ADD.  Thus,  the backend has
> the ability to choose branch or branchless implementation of .SAT_ADD.
> For example,  some target can take care of branches code more optimally.
>
> When the target implement the IFN_SAT_ADD for unsigned and before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless,  if and only if backend implement the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..8b9ded98323 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))
>

I think you need to make sure type and @0's type matches.

Also I don't think you need :c here since you don't match @0 nor @1 more
than once.
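
One source-level case where the result type and the operand type differ
is sketched below. This is only an illustration of why such a check could
matter, and an assumption about what the first comment is guarding
against; the function names are invented for this example, and
__builtin_add_overflow is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Operands are 8-bit but the result is 32-bit: the builtin checks for
   overflow against the 32-bit result type, so 200 + 100 does not
   overflow here.  */
static uint32_t
cond_form_wide_result (uint8_t x, uint8_t y)
{
  uint32_t ret;
  return __builtin_add_overflow (x, y, &ret) ? UINT32_MAX : ret;
}

/* A saturating add carried out in the 8-bit operand type, then widened
   to 32 bits.  */
static uint32_t
sat_add_u8_then_widen (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  uint8_t sat = (uint8_t) (sum | (uint8_t) -(sum < x));
  return sat;
}
```

For x = 200 and y = 100 the first function returns 300 while the second
returns 255, so rewriting the first into a saturating add on the operand
type would change the result.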

Thanks,
Andrew


+
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>
>


[PATCH] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-20 Thread Andrew Pinski
The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply a zero_one_valued_p operand by
the truncated (converted) integer constant. That is well defined for all
types except signed 1-bit types, where `a * -1` is produced, which is
undefined. So disable this pattern for 1-bit signed types.

Note that the pattern added in r14-3432-gddd64a6ec3b38e is able to work
around the undefinedness except when `-fsanitize=undefined` is turned on;
this is why I added a testcase for that.
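
The arithmetic behind this can be checked with a small standalone model.
It is a sketch only: a 1-bit signed type is modelled with plain int
values restricted to 0 and -1, truncation is assumed to keep the low bit
as the sign bit (modular truncation), and all function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model truncation of V to a 1-bit signed type, whose only values are
   0 and -1: keep the low bit and treat it as the sign bit.  */
static int
trunc_signed_1bit (int v)
{
  return -(v & 1);
}

/* True if V is representable in a 1-bit signed type.  */
static bool
fits_signed_1bit (int v)
{
  return v == 0 || v == -1;
}

/* Original expression: multiply in int, then truncate.  */
static int
mult_then_truncate (int t, int cst)
{
  assert (t == 0 || t == 1);
  return trunc_signed_1bit (t * cst);
}

/* Narrowed expression: truncate both operands, then multiply.  The
   product is returned in int so the caller can check its range.  */
static int
truncate_then_mult (int t, int cst)
{
  assert (t == 0 || t == 1);
  return trunc_signed_1bit (t) * trunc_signed_1bit (cst);
}
```

With t = 1 and the constant 5, the original expression gives -1, which
fits. The narrowed expression computes -1 * -1 = 1, which is outside the
range of a 1-bit signed type, so the multiplication overflows there.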

OK for trunk and gcc-14 and gcc-13 branches? Bootstrapped and tested on 
x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  |  6 +++--
 .../c-c++-common/ubsan/signed1bitfield-1.c| 25 +++
 .../gcc.c-torture/execute/signed1bitfield-1.c | 23 +
 3 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..35e3d82b131 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
   (mult (convert @1) (convert @2
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c 
b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c 
b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
-- 
2.43.0



RE: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Biener 
> Sent: Sunday, May 19, 2024 11:55 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PHIOPT: Don't transform minmax if
> middle bb contains a phi [PR115143]
> 
> 
> 
> > On 19.05.2024 at 01:12, Andrew Pinski wrote:
> >
> > The problem here is even if last_and_only_stmt returns a
> statement,
> > the bb might still contain a phi node which defines a ssa
> name which
> > is used in that statement so we need to add a check to make
> sure that
> > the phi nodes are empty for the middle bbs in both the
> > `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B`
> cases.
> 
> Is that single arg PHIs or do we have an extra edge into the
> middle BB?  I think that might be unexpected, at least costing
> wise.  Maybe Also to some of the replacement code we have ?

It is only a single arg PHI since we already reject multiple edges in the 
middle BBs for these cases.
It was EVRP that produced the single arg PHI in the original testcase, from
folding a conditional to false. EVRP does not do simple name propagation in
this case, and there is no pass between EVRP and phiopt that cleans up
single arg PHIs.
I added the GIMPLE-based testcases so the tests do not depend on what
previous passes happen to produce.

> 
> > OK for trunk and backport to all open branches since r14-
> 3827-g30e6ee074588ba was backported?
> > Bootstrapped and tested on x86_64-linux-gnu with no
> regressions.
> >
> 
> Ok

Does this include the GCC 13 branch or should I wait until after the GCC 13.3.0 
release?

Thanks,
Andrew Pinski

> 
> Richard
> 
> >PR tree-optimization/115143
> >
> > gcc/ChangeLog:
> >
> >* tree-ssa-phiopt.cc (minmax_replacement): Check for
> empty
> >phi nodes for middle bbs for the case where middle bb is
> not empty.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.c-torture/compile/pr115143-1.c: New test.
> >* gcc.c-torture/compile/pr115143-2.c: New test.
> >* gcc.c-torture/compile/pr115143-3.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> > .../gcc.c-torture/compile/pr115143-1.c| 21
> +
> > .../gcc.c-torture/compile/pr115143-2.c| 30
> +++
> > .../gcc.c-torture/compile/pr115143-3.c| 29
> ++
> > gcc/tree-ssa-phiopt.cc| 12 
> > 4 files changed, 92 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-1.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-2.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-3.c
> >
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > new file mode 100644
> > index 000..5cb119ea432
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > @@ -0,0 +1,21 @@
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +short a, d;
> > +char b;
> > +long c;
> > +unsigned long e, f;
> > +void g(unsigned long h) {
> > +  if (c ? e : b)
> > +if (e)
> > +  if (d) {
> > +a = f ? ({
> > +  unsigned long i = d ? f : 0, j = e ? h : 0;
> > +  i < j ? i : j;
> > +}) : 0;
> > +  }
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > new file mode 100644
> > index 000..05c3bbe9738
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-options "-fgimple" } */
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +unsigned __GIMPLE (ssa,startwith("phiopt")) foo (unsigned
> a, unsigned
> > +b) {
> > +  unsigned j;
> > +  unsigned _23;
> > +  unsigned _12;
> > +
> > +  __BB(2):
> > +  if (a_6(D) != 0u)
> > +goto __BB3;
> >

Re: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Andrew Pinski
On Mon, May 20, 2024 at 2:57 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
> > optimization opportunities for gimple optimizers.
> >
> > While we are here, we also remove the vget_low_* definitions from 
> > arm_neon.h and
> > use the new intrinsics framework.
> >
> > PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS):
> >   New macro to create definitions for all vget_low intrinsics.
> >   (VGET_LOW_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_low function codes.
> >   (aarch64_general_fold_builtin): Fold vget_low calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
> >   (aarch64_vget_lo_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_low_f16): Likewise.
> >   (vget_low_f32): Likewise.
> >   (vget_low_f64): Likewise.
> >   (vget_low_p8): Likewise.
> >   (vget_low_p16): Likewise.
> >   (vget_low_p64): Likewise.
> >   (vget_low_s8): Likewise.
> >   (vget_low_s16): Likewise.
> >   (vget_low_s32): Likewise.
> >   (vget_low_s64): Likewise.
> >   (vget_low_u8): Likewise.
> >   (vget_low_u16): Likewise.
> >   (vget_low_u32): Likewise.
> >   (vget_low_u64): Likewise.
> >   (vget_low_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi
> >   with vget_low_s16.
> >   * gcc.target/aarch64/vget_low_2.c: New test.
> >   * gcc.target/aarch64/vget_low_2_be.c: New test.
>
> Ok, thanks.  I suppose the patch has the side effect of allowing
> vget_low_bf16 to be called without +bf16.  IMO that's the correct
> behaviour though, and is consistent with how we handle reinterprets.

Pushed as r15-697-ga2e4fe5a53cf75cd055f64e745ebd51253e42254 .
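
As a rough illustration of what the fold expresses, here is a sketch using
GCC/Clang generic vector types in place of the arm_neon.h types, so it does
not need an AArch64 target. `model_vget_low_s16` is a hypothetical stand-in
for `vget_low_s16`, which returns lanes 0 to 3 of its input; how those lanes
map to bit positions in a BIT_FIELD_REF depends on endianness and is not
modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Generic vector types standing in for int16x8_t and int16x4_t.  */
typedef int16_t v8hi __attribute__ ((vector_size (16)));
typedef int16_t v4hi __attribute__ ((vector_size (8)));

/* Hypothetical model of vget_low_s16: keep lanes 0..3 of V.  */
v4hi
model_vget_low_s16 (v8hi v)
{
  v4hi low = { v[0], v[1], v[2], v[3] };
  return low;
}

/* Small self-check of the model.  */
void
check_vget_low (void)
{
  v8hi v = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v4hi low = model_vget_low_s16 (v);
  assert (low[0] == 1 && low[1] == 2 && low[2] == 3 && low[3] == 4);
}
```

Once the intrinsic is a plain extraction like this in the IL, the gimple
optimizers can see through it instead of treating it as an opaque builtin
call.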

Thanks,
Andrew

>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  60 ++
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
> >  gcc/config/aarch64/aarch64-simd.md|  23 +---
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
> >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
> >  .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
> >  7 files changed, 124 insertions(+), 132 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 75d21de1401..4afe7c86ae3 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum 
> > aarch64_simd_builtin_data[] = {
> >VREINTERPRET_BUILTINS \
> >VREINTERPRETQ_BUILTINS
> >
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  VGET_LOW_BUILTIN(f32) \
> > +  VGET_LOW_BUILTIN(f64) \
> > +  VGET_LOW_BUILTIN(p8) \
> > +  VGET_LOW_BUILTIN(p16) \
> > +  VGET_LOW_BUILTIN(p64) \
> > +  VGET_LOW_BUILTIN(s8) \
> > +  VGET_LOW_BUILTIN(s16) \
> > +  VGET_LOW_BUILTIN(s32) \
> > +  VGET_LOW_BUILTIN(s64) \
> > +  VGET_LOW_BUILTIN(u8) \
> > +  VGET_LOW_BUILTIN(u16) \
> > +  VGET_LOW_BUILTIN(u32) \
> > +  VGET_LOW_BUILTIN(u64) \
> > +  VGET_LOW_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -697,6 +714,9 @@ typedef struct
> >  #define VREINTERPRET_BUILTIN(A, B, L) \
> >AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
> >
> > +#define VGET_LOW_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -732,6 +752,7 @@ enum aarch64_builtins
> >AARCH64_CRC32_BUILTIN_MAX,
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> > +  AARCH64_SIMD_VGET_LOW_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum 
> > aarch64_fcmla_lane_builtin_data[] = {
> >   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
> >},
> >
> > +#undef VGET_LOW_BUILTIN
> > +#define VGET_LOW_BUILTIN(A) \
> > +  {"vget_low_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> > +   FLAG_AUTO_FP, \
> > +   false \
> > +  },
> > +
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  

Re: [pushed] testsuite, C++, Darwin: Skip cxa_atexit-6, which is not applicable.

2024-05-19 Thread Andrew Pinski
On Sun, May 19, 2024 at 6:38 AM Iain Sandoe  wrote:
>
> As per the analysis in the PR, tested on x86_64, i686 and aarch64 Darwin
> (and on x86_64 linux), pushed to trunk, thanks,
> Iain

Thanks for doing this.

Thanks,
Andrew

>
> --- 8< ---
>
> For Darwin, non-weak functions defined in a TU always bind locally
> and so cxa_atexit-6.C is not applicable here.
>
> PR testsuite/114982
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/cxa_atexit-6.C: Skip for Darwin.
>
> Signed-off-by: Iain Sandoe 
> ---
>  gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C 
> b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> index f6599a3c9f4..e22036067dd 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> @@ -2,10 +2,14 @@
>  /* { dg-require-effective-target fpic } */
>  /* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized -fPIC" 
> } */
>  // { dg-require-effective-target cxa_atexit }
> +/* This test is not appropriate for targets where non-weak functions defined
> +   in the TU always bind locally; see PR114982.  */
> +/* { dg-skip-if "PR114982" { *-*-darwin* } } */
>  /* PR tree-optimization/19661 */
>
>  /* The call to axexit should not be removed as A::~A() cannot be figured if 
> it
> -   is a pure/const function call as the function call g does not bind 
> locally. */
> +   is a pure/const function call for platforms where the function call g does
> +   not bind locally. */
>
>  __attribute__((noinline))
>  void g() {}
> --
> 2.39.2 (Apple Git-143)
>


Re: [to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Andrew Pinski
On Sun, May 19, 2024 at 10:58 AM Jeff Law  wrote:
>
> perl has some internal bitmap code.  One of its implementation
> properties is that if you ask it to set a bit, the bit is first cleared.
>
>
> Unfortunately this is fairly hard to see in gimple/match due to type
> changes in the IL.  But it is easy to see in the code we get from
> combine.  So we just match the relevant cases.


Looking at this from a gimple point of view, the issue is also visible
on x86_64 if you explicitly use `unsigned char`.
We have:
```
  c_8 = (unsigned char) _1;
  _2 = *a_10(D);
  c.0_3 = (signed char) _1;
  _4 = ~c.0_3;
  _12 = (unsigned char) _4;
```
So for this, we could push the no_op cast from `signed char` to
`unsigned char` past the `bit_not` and I think it will fix the issue
on the gimple level.
So something like:
```
/* Push no_op conversion past the bit_not expression if it was single use. */
(simplify
 (convert (bit_not:s @0))
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (bit_not (convert @0))))
```
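
To sanity check that the two forms agree, here is a small C sketch (not
part of any patch) comparing `(unsigned char) ~(signed char) x` against
`~(unsigned char) x` for every 8-bit value. It assumes GCC's modulo
behaviour for the implementation-defined conversion to `signed char`; the
function names are made up for illustration.

```c
#include <assert.h>

/* Form seen in the IL above: bit_not in signed char, then a no-op cast.  */
unsigned char
not_then_cast (unsigned char x)
{
  signed char s = (signed char) x;   /* Implementation-defined for x > 127.  */
  signed char n = (signed char) ~s;  /* ~s is in [-128, 127], so it fits.  */
  return (unsigned char) n;
}

/* Form after pushing the cast past the bit_not.  */
unsigned char
cast_then_not (unsigned char x)
{
  return (unsigned char) ~x;
}

/* Exhaustively compare the two forms over all 8-bit inputs.  */
void
check_bit_not_forms (void)
{
  for (int v = 0; v < 256; v++)
    assert (not_then_cast ((unsigned char) v)
            == cast_then_not ((unsigned char) v));
}
```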

Thanks,
Andrew Pinski

>
>
>
> Regression tested in Ventana's CI system as well as my own.  Waiting on
> the Rivos CI system before moving forward.
>
>
>
> Jeff


Re: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-18 Thread Andrew Pinski
On Sat, May 18, 2024, 9:17 PM  wrote:

> From: Pan Li 
>
> There are several match patterns for SAT related cases, and they contain
> duplicated code checking that dest, op_0 and op_1 have the same tree type,
> aka a ternary tree type match.  Thus, extract one helper function to
> do this and avoid match code duplication.
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (integer_types_ternary_match): New helper
> function to check whether ternary tree types match or not.
> * gimple-match-head.cc (integer_types_ternary_match): Ditto but
> for match.
> * match.pd: Leverage above helper function to avoid code dup.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..b291e34bbe4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>


Even though unsigned might be the cheaper check, you might need to swap the
order back to where it was so you check integral first.

Otherwise this is a nice cleanup. (Note that I can't approve it, though.)

Thanks,
Andrew


>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (gt @0 (plus:c @0 @1
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_2 @0 @1)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && 

[PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-18 Thread Andrew Pinski
The problem here is that even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines an ssa name
used in that statement. So we need to add a check to make sure
that there are no phi nodes in the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` case.
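
For context, the testcase comments in this patch describe the transform as
`a!=0?min(a, b) : 0` into `min(a,b)`. A minimal sketch in plain C, with
hypothetical function names, showing that the two forms compute the same
value for unsigned operands; the ICE was about where `b` is defined, not
about the arithmetic.

```c
#include <assert.h>

/* Plain unsigned minimum.  */
unsigned
umin (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* Source form: the min is only computed when A is nonzero.  */
unsigned
guarded_min (unsigned a, unsigned b)
{
  return a != 0 ? umin (a, b) : 0;
}

/* When A is 0, umin (0, b) is already 0, so the guard is redundant.  */
void
check_guarded_min (void)
{
  const unsigned vals[] = { 0u, 1u, 2u, 7u, 4294967295u };
  for (int i = 0; i < 5; i++)
    for (int j = 0; j < 5; j++)
      assert (guarded_min (vals[i], vals[j]) == umin (vals[i], vals[j]));
}
```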

OK for trunk and backport to all open branches since r14-3827-g30e6ee074588ba 
was backported?
Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 .../gcc.c-torture/compile/pr115143-1.c| 21 +
 .../gcc.c-torture/compile/pr115143-2.c| 30 +++
 .../gcc.c-torture/compile/pr115143-3.c| 29 ++
 gcc/tree-ssa-phiopt.cc| 12 
 4 files changed, 92 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index 000..5cb119ea432
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index 000..05c3bbe9738
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index 000..53c5fb5588e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f166c3132cb..918cf50b589 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1925,6 +1925,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
@@ -1938,6 +1942,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the alt middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (alt_middle_bb)))
+   

Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-18 Thread Andrew Pinski
On Thu, Apr 11, 2024 at 8:07 PM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?


This patch is missing documentation for the new optab.
It should be documented in md.texi, in the `Standard Pattern Names For
Generation` section.
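
For reference when writing that documentation, this is the kind of source
that reaches the builtin. It is only a usage sketch with a hypothetical
helper name, `all_finite`, and says nothing about how a target expands the
new optab.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if every element of V is finite,
   that is, neither an infinity nor a NaN.  */
int
all_finite (const double *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!__builtin_isfinite (v[i]))
      return 0;
  return 1;
}

/* Small self-check covering infinity and NaN.  */
void
check_all_finite (void)
{
  const double ok[] = { 0.0, -1.5, 1e300 };
  const double has_inf[] = { 1.0, __builtin_inf () };
  const double has_nan[] = { __builtin_nan (""), 2.0 };
  assert (all_finite (ok, 3) == 1);
  assert (all_finite (has_inf, 2) == 0);
  assert (all_finite (has_nan, 2) == 0);
}
```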

Thanks,
Andrew


>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index d2786f207b8..5262aa01660 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-05-18 Thread Andrew Pinski
On Fri, Apr 12, 2024 at 1:10 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

This patch is missing documentation for the new optab.
It should be documented in md.texi, in the `Standard Pattern Names For
Generation` section.
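
As a reminder of the semantics the documentation would need to cover, here
is a small sketch with a hypothetical helper, `count_normal`. It relies on
`__builtin_isnormal` being false for zero, subnormals, infinities and NaNs,
and assumes subnormals are not flushed to zero.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: count the elements of V that are normal numbers.  */
size_t
count_normal (const double *v, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (__builtin_isnormal (v[i]))
      count++;
  return count;
}

/* 1.0 and DBL_MIN are normal; the other four values are not.  */
void
check_count_normal (void)
{
  const double v[] = { 1.0, DBL_MIN, 0.0, DBL_MIN / 2.0,
                       __builtin_inf (), __builtin_nan ("") };
  assert (count_normal (v, 6) == 2);
}
```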

Thanks,
Andrew

>
> Thanks
> Gui Haochen
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 3174f52ebe8..defb39de95f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 7:46 PM Tamar Christina 
wrote:

> Hi Victor,
>
> > -----Original Message-----
> > From: Victor Do Nascimento 
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing interface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_dotvnx4sivnx8hi):
> >   renamed to `dot_prod_twoway_vnx8hi'.
> >   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >   update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >   * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >   `optab_for_tree_code' and added new argument.
> >   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >   * optabs-tree.h (optab_for_tree_code_1): New.
> >   * optabs.cc (expand_widen_pattern_expr): Expand support for
> >   DOT_PROD_EXPR patterns.
> >   * optabs.def (udot_prod_twoway_optab): New.
> >   (sdot_prod_twoway_optab): Likewise.
> >   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >   support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md|  2 +-
> >  gcc/optabs-tree.cc| 23 --
> >  gcc/optabs-tree.h |  2 ++
> >  gcc/optabs.cc |  2 +-
> >  gcc/optabs.def|  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
> >  gcc/tree-vect-patterns.cc |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >icode = (e.type_suffix (0).float_p
> >  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >  : e.type_suffix (0).unsigned_p
> > -? 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `otype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew
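
For readers following the description quoted above, here is a minimal scalar sketch of what a 2-way unsigned dot product computes per 32-bit lane, assuming the usual semantics (two 16-bit products widened and added to a 32-bit accumulator). The names `udot_2way_lane' and `sum_of_squares' are hypothetical and only illustrate the loop from the patch description; they are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of a 2-way unsigned dot product:
   two 16-bit products are widened and added to a 32-bit accumulator.  */
static uint32_t
udot_2way_lane (uint32_t acc, const uint16_t a[2], const uint16_t b[2])
{
  /* Cast before multiplying: uint16_t operands promote to int, and
     65535 * 65535 would overflow a 32-bit int.  */
  acc += (uint32_t) a[0] * (uint32_t) b[0];
  acc += (uint32_t) a[1] * (uint32_t) b[1];
  return acc;
}

/* Hypothetical reference version of the loop in the patch description,
   processing two elements per step plus a scalar tail.  */
uint32_t
sum_of_squares (size_t n, const uint16_t *data)
{
  uint32_t sum = 0;
  size_t i = 0;
  for (; i + 2 <= n; i += 2)
    sum = udot_2way_lane (sum, &data[i], &data[i]);
  if (i < n)
    sum += (uint32_t) data[i] * (uint32_t) data[i];
  return sum;
}
```

The point of the sketch is that the operand type (16-bit) and the accumulator type (32-bit) both matter when picking the instruction, which is why a single-mode optab lookup is not enough.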


> [1]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> 

Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; ++i){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>function call due to the possibility of side effects, the
>autovectorizer does not attempt to vectorize loops which make such
>calls.
>
>Given the memory reference passed to `__builtin_prefetch', in the
>absence of assurances about its effect on the passed memory
>location the compiler deems the function unsafe to vectorize,
>marking it as clobbering memory in `vect_find_stmt_data_reference'.
>This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>would be classed as `vect_unused_in_scope', the loop invariant that
>is used in the address of the prefetch is the scalar loop's and not
>the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>because the instruction wasn't vectorized, causing DCE to think the
>value is live, such that we now have both the vector and scalar loop
>invariant actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>not generate faults if its address argument is invalid and does not
>write to memory.  Therefore, it does not alter the internal state
>of the program or its control flow under any circumstance.  As
>such, it is reasonable that the function be marked as not affecting
>memory contents.
>
>To achieve this, we add the necessary logic to
>`get_references_in_stmt' to ensure that builtin functions are
>given the same treatment as internal functions.  If the gimple call
>is to a builtin function and its function code is
>`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedent in the way clobber statements are handled,
>whereby the vectorizer drops these from both the scalar and
>vectorized versions of a given loop, we choose to drop prefetch
>hints in a similar fashion.  This seems appropriate given how
>software prefetch hints are typically ignored by processors across
>architectures, as they seldom lead to performance gain over their
>hardware counterparts.
>
>PR target/114061
>

This should most likely be tree-optimization/114061 since it is a generic
vectorizer issue. Oh, maybe reference the bug # in the summary next time just
for easier reference.

Thanks,
Andrew
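
As a small illustration of the argument quoted above (that a prefetch is only a hint and can be dropped without changing results), the following sketch compares the loop with and without the hint. The function names are hypothetical, and the sketch assumes the caller pads `b' by 8 elements so that forming `&b[i + 8]' stays within the array.

```c
#include <stddef.h>

/* Loop from the patch description, with the prefetch hint kept.
   Assumption: b has at least n + 8 elements, so the address
   computation itself is always in bounds.  */
void
add_with_prefetch (double *restrict a, const double *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = a[i] + b[i];
      /* Hint only: it performs no store and does not change a or b.  */
      __builtin_prefetch (&b[i + 8]);
    }
}

/* The same loop with the hint dropped, as the vectorizer would see it
   after the proposed change.  */
void
add_plain (double *restrict a, const double *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```

Both functions should produce identical contents in `a'; only the memory-system behaviour can differ.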


> gcc/ChangeLog:
>
> * tree-data-ref.cc (get_references_in_stmt): set
> `clobbers_memory' to false for __builtin_prefetch.
> * tree-vect-loop.cc (vect_transform_loop): Drop all
> __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-prefetch-drop.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
>  gcc/tree-data-ref.cc   |  9 +
>  gcc/tree-vect-loop.cc  |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..57723a8c972
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve
> -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m,
> z\[0-9\]+.d, z\[0-9\]+.d" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index f37734b5340..47bfec0f922 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt,
> vec *references)
> clobbers_memory = true;
> break;
>   }
> +
> +  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +   {
> + enum built_in_function fn_type = DECL_FUNCTION_CODE
> (TREE_OPERAND (gimple_call_fn (stmt), 0));
> + if (fn_type == BUILT_IN_PREFETCH)
> +   clobbers_memory = false;
> + else
> +   clobbers_memory = 

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 12:55 PM Oleg Endo  wrote:

>
> On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> > On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > >
> > > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis 
> wrote:
> > > >
> > > > If we consider code like:
> > > >
> > > > if (bar1 == x)
> > > >   return foo();
> > > > if (bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > We would like the ifcombine pass to convert this to:
> > > >
> > > > if (bar1 == x || bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > The ifcombine pass can handle this transformation but it is run very
> early and
> > > > it misses the opportunity because there are two separate blocks for
> foo().
> > > > The pre pass is good at removing duplicate code and blocks and due
> to that
> > > > running ifcombine again after it can increase the number of
> successful
> > > > conversions.
> > >
> > > I do think we should have something similar to re-running
> > > ssa-ifcombine but I think it should be much later, like after the loop
> > > optimizations are done.
> > > Maybe just a simplified version of it (that does the combining and not
> > > the optimizations part) included in isel or pass_optimize_widening_mul
> > > (which itself should most likely become part of isel or renamed since
> > > it handles more than just widening multiply these days).
> >
> > I've long wished we had a (late?) pass that can also undo if-conversion
> > (basically do what RTL expansion would later do).  Maybe
> > gimple-predicate-analysis.cc (what's used by uninit analysis) can
> > represent mixed CFG + if-converted conditions so we can optimize
> > it and code-gen the condition in a more optimal manner much like
> > we have if-to-switch, switch-conversion and switch-expansion.
> >
> > That said, I agree that re-running ifcombine should be later.  And
> there's
> > still the old task of splitting tail-merging from PRE (and possibly
> making
> > it more effective).
>
> Sorry to butt in, but it might be a little bit relevant and caught my
> attention.
>
> I've got this SH patch sitting around
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543
>
> The idea is basically to run an additional loop pass after combine and
> split1.  The main purpose is to hoist constant loads out of loops. Such
> constant loads might be formed (in this particular case) during combine
> transformations.
>
> The patch adds a new file gcc/config/sh/sh_loop.cc, which has some boiler-
> plate code copy pasted from other places to get the loop pass setup and
> going.
>
> Any thoughts on this way of doing it?
>

I have been looking at a similar issue on aarch64 for a few cases, csinc
and nand. What I decided to do for nand was not depend on combine in the
end and create a new infrastructure to expand better to rtl from gimple and
maybe even have target specific pattern matching on the gimple level. So
the constant is not part of the other instruction.

I should have a write up/first draft of an implementation by August time
frame or so. The write up will most likely be earlier.

Thanks,
Andrew
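
For reference, the source-level transformation discussed at the top of this thread can be sketched as two equivalent C functions. `foo' here is a hypothetical side-effect-free stand-in for the function in the quoted example; with a function that has side effects the forms are still equivalent because `foo' is called at most once on every path.

```c
/* Hypothetical stand-in for foo() in the quoted example.  */
static int
foo (void)
{
  return 42;
}

/* Shape before ifcombine: two separate conditions, two calls.  */
int
before_ifcombine (int bar1, int x, int bar2, int y)
{
  if (bar1 == x)
    return foo ();
  if (bar2 != y)
    return foo ();
  return 0;
}

/* Shape after ifcombine: one combined condition, one call site.  */
int
after_ifcombine (int bar1, int x, int bar2, int y)
{
  if (bar1 == x || bar2 != y)
    return foo ();
  return 0;
}
```

The combined form only becomes visible to the pass once the two `return foo ()' blocks have been merged, which is the ordering issue the thread is about.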



>
> Best regards,
> Oleg Endo
>


Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-15 Thread Andrew Pinski
On Thu, May 16, 2024, 4:09 AM Kewen.Lin  wrote:

> Hi,
>
> As the associated test case in PR114846 shows, currently
> with eh_return involved some register restoring for EH
> RETURN DATA in epilogue can clobber the one which is holding
> the return value.  Referring to the existing handlings in
> some other targets, this patch makes eh_return expander
> call one new define_insn_and_split eh_return_internal which
> directly calls rs6000_emit_epilogue with epilogue_type
> EPILOGUE_TYPE_EH_RETURN instead of the previous treating
> normal return with crtl->calls_eh_return specially.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'm going to push this next week if no objections.
>


Thanks for fixing this for powerpc. I hope my patch for aarch64 gets
reviewed soon and it will contain many more testcases. Hopefully someone
will fix the arm target too.

Thanks,
Andrew



> BR,
> Kewen
> -
> PR target/114846
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
> EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
> now, adjust the relevant handlings on it.
> * config/rs6000/rs6000.md (eh_return expander): Append by calling
> gen_eh_return_internal and emit_barrier.
> (eh_return_internal): New define_insn_and_split, call function
> rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr114846.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc   |  7 +++
>  gcc/config/rs6000/rs6000.md | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
>  3 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..bd5d56ba002 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type
> epilogue_type)
>
>rs6000_stack_t *info = rs6000_stack_info ();
>
> -  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
> -epilogue_type = EPILOGUE_TYPE_EH_RETURN;
> -
>int strategy = info->savres_strategy;
>bool using_load_multiple = !!(strategy & REST_MULTIPLE);
>bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
> @@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type
> epilogue_type)
>
>/* In the ELFv2 ABI we need to restore all call-saved CR fields from
>   *separate* slots if the routine calls __builtin_eh_return, so
> - that they can be independently restored by the unwinder.  */
> + that they can be independently restored by the unwinder.  Since
> + it is for CR fields restoring, it should be done for any epilogue
> + types (not EPILOGUE_TYPE_EH_RETURN specific).  */
>if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
>  {
>int i, cr_off = info->ehcr_offset;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..d4120c3b9ce 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14281,6 +14281,8 @@ (define_expand "eh_return"
>""
>  {
>emit_insn (gen_eh_set_lr (Pmode, operands[0]));
> +  emit_jump_insn (gen_eh_return_internal ());
> +  emit_barrier ();
>DONE;
>  })
>
> @@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
>DONE;
>  })
>
> +(define_insn_and_split "eh_return_internal"
> +  [(eh_return)]
> +  ""
> +  "#"
> +  "epilogue_completed"
> +  [(const_int 0)]
> +{
> +  if (!TARGET_SCHED_PROLOG)
> +emit_insn (gen_blockage ());
> +  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
> +  DONE;
> +})
> +
>  (define_insn "prefetch"
>[(prefetch (match_operand 0 "indexed_or_indirect_address" "a")
>  (match_operand:SI 1 "const_int_operand" "n")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114846.c
> b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> new file mode 100644
> index 000..efe2300b73a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target builtin_eh_return } */
> +
> +/* Ensure it runs successfully.  */
> +
> +__attribute__ ((noipa))
> +int f (int *a, long offset, void *handler)
> +{
> +  if (*a == 5)
> +return 5;
> +  __builtin_eh_return (offset, handler);
> +}
> +
> +int main ()
> +{
> +  int t = 5;
> +  if (f (&t, 0, 0) != 5)
> +__builtin_abort ();
> +  return 0;
> +}
> --
> 2.39.3
>


Re: [PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Andrew Pinski
On Wed, May 15, 2024, 12:17 PM Wilco Dijkstra 
wrote:

> Improve costing of ctz - both TARGET_CSSC and vector cases were not
> handled yet.
>
> Passes regress & bootstrap - OK for commit?
>

I should note popcount has a similar issue which I hope to fix next week.
Popcount cost is used during expand so it is very useful to be slightly
more correct.

Thanks,
Andrew
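
As background for the cost numbers in the patch below: without a dedicated count-trailing-zeros instruction, ctz is commonly computed as a bit reverse followed by a count of leading zeros, which is presumably why the non-CSSC scalar case is costed as two instructions (`alu.clz' plus `alu.rev'). A minimal C sketch of that identity, with hypothetical helper names and assuming a 32-bit `unsigned int':

```c
#include <stdint.h>

/* Reverse the bits of a 32-bit value with the usual swap ladder.  */
static uint32_t
bit_reverse32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}

/* ctz (x) == clz (bit_reverse (x)).  x must be nonzero, because
   __builtin_clz is undefined for a zero argument.  */
int
ctz_via_reverse_clz (uint32_t x)
{
  return __builtin_clz (bit_reverse32 (x));
}
```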



> gcc:
> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ
> costing.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index
> fe13c9a0d4863041eb9101882ea57c2094240d16..2a6f76f4008839bf0aa158504430af9b971c
> 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14309,10 +14309,24 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int
> outer ATTRIBUTE_UNUSED,
>return false;
>
>  case CTZ:
> -  *cost = COSTS_N_INSNS (2);
> -
> -  if (speed)
> -   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +  if (VECTOR_MODE_P (mode))
> +   {
> + *cost = COSTS_N_INSNS (3);
> + if (speed)
> +   *cost += extra_cost->vect.alu * 3;
> +   }
> +  else if (TARGET_CSSC)
> +   {
> + *cost = COSTS_N_INSNS (1);
> + if (speed)
> +   *cost += extra_cost->alu.clz;
> +   }
> +  else
> +   {
> + *cost = COSTS_N_INSNS (2);
> + if (speed)
> +   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +   }
>return false;
>
>  case COMPARE:
>
>


[PATCH] tree-cfg: Move the returns_twice check to be last statement only [PR114301]

2024-05-14 Thread Andrew Pinski
When I was checking to make sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false were fixed,
I noticed that the code which was checking if a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparisons, removing the uses of it
can give a small performance improvement. A returns_twice
function call will always end the basic block due to the check in
stmt_can_terminate_bb_p (and others). So checking only the last statement
is a small optimization and will be safe.
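
As a reminder of what a returns_twice call looks like at the source level, here is a minimal sketch using setjmp, the classic example of such a function. The function name `count_returns' is hypothetical; the counter is volatile so that its value is well defined after the longjmp.

```c
#include <setjmp.h>

static jmp_buf env;

/* Count how many times the single setjmp call below returns.  */
int
count_returns (void)
{
  volatile int returns = 0;
  if (setjmp (env) == 0)
    {
      /* First return: the direct call.  */
      returns++;
      longjmp (env, 1);
    }
  else
    {
      /* Second return: control arrives here via longjmp.  */
      returns++;
    }
  return returns;
}
```

The second return is why such a call carries abnormal edges in the CFG and why the block containing it cannot simply be duplicated.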

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114301
gcc/ChangeLog:

* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-cfg.cc | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b2d47b72084..7fb7b92966b 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6495,6 +6495,13 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
&& gimple_call_internal_p (last)
&& gimple_call_internal_unique_p (last))
   return false;
+
+/* Prohibit duplication of returns_twice calls, otherwise associated
+   abnormal edges also need to be duplicated properly.
+   return_twice functions will always be the last statement.  */
+if (is_gimple_call (last)
+   && (gimple_call_flags (last) & ECF_RETURNS_TWICE))
+  return false;
   }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
@@ -6502,15 +6509,12 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
 {
   gimple *g = gsi_stmt (gsi);
 
-  /* Prohibit duplication of returns_twice calls, otherwise associated
-abnormal edges also need to be duplicated properly.
-An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 duplicated as part of its group, or not at all.
 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
 group, so the same holds there.  */
   if (is_gimple_call (g)
- && (gimple_call_flags (g) & ECF_RETURNS_TWICE
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
-- 
2.34.1



Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Andrew Pinski
On Mon, May 13, 2024, 11:41 PM Kees Cook  wrote:

> On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> >
> >
> > On 5/13/24 1:48 PM, Qing Zhao wrote:
> > > > -Warray-bounds is an important option to enable linux kernel to keep
> > > the array out-of-bound errors out of the source tree.
> > >
> > > However, due to the false positive warnings reported in PR109071
> > > (-Warray-bounds false positive warnings due to code duplication from
> > > jump threading), -Warray-bounds=1 cannot be added on by default.
> > >
> > > > Although it's impossible to eliminate all the false positive warnings
> > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > > documentation says "always out of bounds"), we should minimize the
> > > false positive warnings in -Warray-bounds=1.
> > >
> > > The root reason for the false positive warnings reported in PR109071
> is:
> > >
> > > When the thread jump optimization tries to reduce the # of branches
> > > inside the routine, sometimes it needs to duplicate the code and
> > > > split into two conditional paths. for example:
> > >
> > > The original code:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = 0;
> > >*val = sg->vals[index];
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = *val;
> > >
> > >return;
> > > }
> > >
> > > With the thread jump, the above becomes:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >if (index >= 4)
> > >  {
> > >warn ();
> > >*ptr = 0;// Code duplications since "warn" does
> return;
> > >*val = sg->vals[index];  // same this line.
> > > // In this path, since it's under the
> condition
> > > // "index >= 4", the compiler knows the
> value
> > > > // of "index" is larger than 4, therefore
> the
> > > // out-of-bound warning.
> > >warn ();
> > >  }
> > >else
> > >  {
> > >*ptr = 0;
> > >*val = sg->vals[index];
> > >  }
> > >*ptr = *val;
> > >return;
> > > }
> > >
> > > We can see, after the thread jump optimization, the # of branches
> inside
> > > the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
> > > code duplication (which is needed for the correctness of the code), we
> > > got a false positive out-of-bound warning.
> > >
> > > In order to eliminate such false positive out-of-bound warning,
> > >
> > > A. Add one more flag for GIMPLE: is_splitted.
> > > B. During the thread jump optimization, when the basic blocks are
> > > duplicated, mark all the STMTs inside the original and duplicated
> > > basic blocks as "is_splitted";
> > > C. Inside the array bound checker, add the following new heuristic:
> > >
> > > If
> > > 1. the stmt is duplicated and splitted into two conditional paths;
> > > +  2. the warning level < 2;
> > > +  3. the current block is not dominating the exit block
> > > Then not report the warning.
> > >
> > > The false positive warnings are moved from -Warray-bounds=1 to
> > >   -Warray-bounds=2 now.
> > >
> > > Bootstrapped and regression tested on both x86 and aarch64. adjusted
> > >   -Warray-bounds-61.c due to the false positive warnings.
> > >
> > > Let me know if you have any comments and suggestions.
> > This sounds horribly wrong.   In the code above, the warning is correct.
>
> It's not sensible from a user's perspective.
>
> If this doesn't warn:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>*ptr = 0;
>*val = sg->vals[index];
>*ptr = *val;
> }
>
> ... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
> and warnings based on the value range are silenced if they haven't been
> clamped at all. (Otherwise warnings would be produced everywhere: only
> when a limited set of values is known is it useful to produce a warning.)
>
>
> But it makes no sense to warn about:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>if (index >= 4)
>  warn ();
>*ptr = 0;
>*val = sg->vals[index];
>if (index >= 4)
>  warn ();
>*ptr = *val;
> }
>
> Because at "*val = sg->vals[index];" the actual value range tracking for
> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> "if" statements is the range tracking [4,INT_MAX].)
>
> However, in the case where jump threading has split the execution flow
> and produced a copy of "*val = sg->vals[index];" where the value range
> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
> is only for that instance. Reporting it for effectively both (there is
> only 1 source line for the array indexing) is misleading because there
> is nothing the user can do about it -- the compiler created the copy and
> then noticed it had a range it could apply to that array index.
>

"there is 

[PATCH] Match: optimize `a == CST & unary(a)` [PR111487]

2024-05-13 Thread Andrew Pinski
This expands the `a == CST & a` optimization
to handle more than just casts. It adds the optimization
for unary operators.
The patch for binary operators will come later.
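
A source-level sketch of the idea, using hypothetical function names and CST = 1 as in the testcase: whenever `a == 1` is true, `unary(a)` equals `unary(1)`, and whenever it is false the whole expression is 0, so the unary operand can be replaced by the constant.

```c
/* Negate: (a == 1) & -a becomes (a == 1) & -1, i.e. just (a == 1).  */
int neg_before (int a) { return (a == 1) & -a; }
int neg_after (int a) { return (a == 1) & -1; }

/* Bitwise not: ~1 has bit 0 clear, so the result folds to 0.  */
int not_before (int a) { return (a == 1) & ~a; }
int not_after (int a) { return (a == 1) & ~1; }

/* Popcount: popcount (1) is 1, so only the comparison remains.  */
int pop_before (int a) { return (a == 1) & __builtin_popcount ((unsigned) a); }
int pop_after (int a) { return (a == 1) & __builtin_popcount (1u); }
```

Each "after" form contains only constants on the right-hand side, so later folding can remove the unary operation entirely.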

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111487
gcc/ChangeLog:

* match.pd (tcc_int_unary): New operator list.
(`a == CST & unary(a)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/and-unary-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd| 12 
 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c | 61 +
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..3ee28a3d8fc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -57,6 +57,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "cfn-operators.pd"
 
+/* integer unary operators that return the same type. */
+(define_operator_list tcc_int_unary
+ abs absu negate bit_not BSWAP POPCOUNT CTZ CLZ PARITY)
+
 /* Define operand lists for math rounding functions {,i,l,ll}FN,
where the versions prefixed with "i" return an int, those prefixed with
"l" return a long and those prefixed with "ll" return a long long.
@@ -5451,6 +5455,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   @2
   { build_zero_cst (type); }))
 
+/* `(a == CST) & unary(a)` can be simplified to `(a == CST) & unary(CST)`. */
+(simplify
+ (bit_and:c (convert@2 (eq @0 INTEGER_CST@1))
+(convert? (tcc_int_unary @3)))
+ (if (bitwise_equal_p (@0, @3))
+  (with { tree  inner_type = TREE_TYPE (@3); }
+   (bit_and @2 (convert (tcc_int_unary (convert:inner_type @1)))
+
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
x_5 == cstN ? cst4 : cst3
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
new file mode 100644
index 000..c157bc11b00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -fdump-tree-optimized-raw" } */
+/* unary part of PR tree-optimization/111487 */
+
+int abs1(int a)
+{
+  int b = __builtin_abs(a);
+  return (a == 1) & b;
+}
+int absu1(int a)
+{
+  int b;
+  b = a > 0 ? -a:a;
+  b = -b;
+return (a == 1) & b;
+}
+
+int bswap1(int a)
+{
+  int b = __builtin_bswap32(a);
+  return (a == 1) & b;
+}
+
+int ctz1(int a)
+{
+  int b = __builtin_ctz(a);
+  return (a == 1) & b;
+}
+int pop1(int a)
+{
+  int b = __builtin_popcount(a);
+  return (a == 1) & b;
+}
+int neg1(int a)
+{
+  int b = -(a);
+  return (a == 1) & b;
+}
+int not1(int a)
+{
+  int b = ~(a);
+  return (a == 1) & b;
+}
+int partity1(int a)
+{
+  int b = __builtin_parity(a);
+  return (a == 1) & b;
+}
+
+
+/* We should optimize out the unary operator for each.
+   For ctz we can optimize directly to `return 0`.
+   For bswap1 and not1, we can do the same but not until after forwprop1.  */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 7 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "abs_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "absu_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "negate_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "gimple_call <"  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_and_expr,  "  "forwprop1" } } */
-- 
2.34.1



Re: [COMMITTED/13] Fix PR 110386: backprop vs ABSU_EXPR

2024-05-09 Thread Andrew Pinski
On Sun, Oct 1, 2023 at 12:28 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The issue here is that when backprop tries to go
> and strip sign ops, it skips over ABSU_EXPR but
> ABSU_EXPR not only does an ABS, it also changes the
> type to unsigned.
> Since strip_sign_op_1 is only supposed to strip off
> sign changing operands and not ones that change types,
> removing ABSU_EXPR here is correct. We don't handle
> nop conversions so this does not cause any missed optimizations either.
>
> Committed to the GCC 13 branch after bootstrapped and
> tested on x86_64-linux-gnu with no regressions.

And to the GCC 12 and 11 branches too.

Thanks,
Andrew
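
To illustrate the type change mentioned in the quoted description, here is a rough C model of what ABSU_EXPR computes: the absolute value of a signed input, returned in the corresponding unsigned type. The function name `absu' is hypothetical, and the sketch assumes `unsigned' has one more value bit than `int'.

```c
#include <limits.h>

/* Absolute value with an unsigned result.  The negation is done in
   unsigned arithmetic so that INT_MIN is well defined.  */
unsigned
absu (int x)
{
  return x < 0 ? 0u - (unsigned) x : (unsigned) x;
}
```

Because `absu (INT_MIN)' does not fit in `int', the operation is not just a sign change on a value of the same type, which is the distinction the fix relies on.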

>
> PR tree-optimization/110386
>
> gcc/ChangeLog:
>
> * gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr110386-1.c: New test.
> * gcc.c-torture/compile/pr110386-2.c: New test.
>
> (cherry picked from commit 2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f)
> ---
>  gcc/gimple-ssa-backprop.cc   |  1 -
>  gcc/testsuite/gcc.c-torture/compile/pr110386-1.c |  9 +
>  gcc/testsuite/gcc.c-torture/compile/pr110386-2.c | 11 +++
>  3 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
>
> diff --git a/gcc/gimple-ssa-backprop.cc b/gcc/gimple-ssa-backprop.cc
> index 65a65590017..dcb15ed4f61 100644
> --- a/gcc/gimple-ssa-backprop.cc
> +++ b/gcc/gimple-ssa-backprop.cc
> @@ -694,7 +694,6 @@ strip_sign_op_1 (tree rhs)
>  switch (gimple_assign_rhs_code (assign))
>{
>case ABS_EXPR:
> -  case ABSU_EXPR:
>case NEGATE_EXPR:
> return gimple_assign_rhs1 (assign);
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
> new file mode 100644
> index 000..4fcc977ad16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
> @@ -0,0 +1,9 @@
> +
> +int f(int a)
> +{
> +int c = c < 0 ? c : -c;
> +c = -c;
> +unsigned b =  c;
> +unsigned t = b*a;
> +return t*t;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
> new file mode 100644
> index 000..c60e1b6994b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-mavx" } */
> +
> +#include 
> +
> +__m128i do_stuff(__m128i XMM0) {
> +   __m128i ABS0 = _mm_abs_epi32(XMM0);
> +   __m128i MUL0 = _mm_mullo_epi32(ABS0, XMM0);
> +   __m128i MUL1 = _mm_mullo_epi32(MUL0, MUL0);
> +   return MUL1;
> +}
> --
> 2.39.3
>


Re: Ping [PATCH/RFC] target, hooks: Allow a target to trap on unreachable [PR109267].

2024-05-08 Thread Andrew Pinski
On Wed, May 8, 2024 at 12:37 PM Iain Sandoe  wrote:
>
> Hi Folks,
>
> I’d like to land a viable solution to this issue if possible, (it is a show-
> stopper for the aarch64-darwin development branch).
>
> > On 9 Apr 2024, at 14:55, Iain Sandoe  wrote:
> >
> > So far, tested lightly on aarch64-darwin; if this is acceptable then
> > it will be possible to back out of the ad hoc fixes used on x86 and
> > powerpc darwin.
> > Comments welcome, thanks,
>
> @Andrew - you were also (at one stage) talking about some ideas about
> how to handle this in the middle end.
> Is that something you are likely to have time to do?
> Would it still be reasonable to have a target hook to control the behaviour?
> (the implementation below allows one to make the effect per TU)

I won't be able to implement the idea until July at the earliest though.

Thanks,
Andrew
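
For context, a small sketch of the kind of source that leads to the problem described below. The function names are hypothetical. In the first function the unreachable marker only removes a fall-through path; in the second the whole body is unreachable, so a compiler may emit no instructions for it at all, which is the empty-function case the proposed hook is meant to turn into a trap.

```c
/* Both switch cases return, so the code after the switch is never
   executed; the marker tells the compiler so.  */
int
parity_label (int x)
{
  switch (x & 1)
    {
    case 0:
      return 10;
    case 1:
      return 11;
    }
  __builtin_unreachable ();
}

/* Entirely unreachable body: this may compile to an empty function.
   It must never be called; doing so is undefined behaviour.  */
void
never_called (void)
{
  __builtin_unreachable ();
}
```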

>
>
> > Iain
> >
> > --- 8< ---
> >
> >
> > In the PR cited case a target linker cannot handle empty FDEs,
> > arguably this is a linker bug - but in some cases we might still
> > wish to work around it.
> >
> > In the case of Darwin, the ABI does not allow two global symbols
> > to have the same address, so emitting empty functions has the
> > potential (it is almost guaranteed) to break the ABI.
> >
> > This patch allows a target to ask that __builtin_unreachable is
> > expanded in the same way as __builtin_trap (either to a trap
> > instruction or to abort() if there is no such insn).
> >
> > This means that the middle end's use of unreachability for
> > optimisation should not be altered.
> >
> > __builtin_unreachable is currently expanded to a barrier and
> > __builtin_trap is expanded to a trap insn + a barrier so that it
> > seems we should not be unduly affecting RTL optimisations.
> >
> > For Darwin, we enable this by default, but allow it to be disabled
> > per TU using -mno-unreachable-traps.
> >
> >   PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> >   * builtins.cc (expand_builtin_unreachable): Allow for
> >   a target to expand this as a trap.
> >   * config/darwin-protos.h (darwin_unreachable_traps_p): New.
> >   * config/darwin.cc (darwin_unreachable_traps_p): New.
> >   * config/darwin.h (TARGET_UNREACHABLE_SHOULD_TRAP): New.
> >   * config/darwin.opt (munreachable-traps): New.
> >   * doc/invoke.texi: Document -munreachable-traps.
> >   * doc/tm.texi: Regenerate.
> >   * doc/tm.texi.in: Document TARGET_UNREACHABLE_SHOULD_TRAP.
> >   * target.def (TARGET_UNREACHABLE_SHOULD_TRAP): New hook.
> >
> > Signed-off-by: Iain Sandoe 
> > ---
> > gcc/builtins.cc|  7 +++
> > gcc/config/darwin-protos.h |  1 +
> > gcc/config/darwin.cc   |  7 +++
> > gcc/config/darwin.h|  4 
> > gcc/config/darwin.opt  |  4 
> > gcc/doc/invoke.texi|  7 ++-
> > gcc/doc/tm.texi|  5 +
> > gcc/doc/tm.texi.in |  2 ++
> > gcc/target.def | 10 ++
> > 9 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index f8d94c4b435..13f321b6be6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -5929,6 +5929,13 @@ expand_builtin_trap (void)
> > static void
> > expand_builtin_unreachable (void)
> > {
> > +  /* If the target wants a trap in place of the fall-through, use that.  */
> > +  if (targetm.unreachable_should_trap ())
> > +{
> > +  expand_builtin_trap ();
> > +  return;
> > +}
> > +
> >   /* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
> >  to avoid this.  */
> >   gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> > diff --git a/gcc/config/darwin-protos.h b/gcc/config/darwin-protos.h
> > index b67e05264e1..48a32b2ccc2 100644
> > --- a/gcc/config/darwin-protos.h
> > +++ b/gcc/config/darwin-protos.h
> > @@ -124,6 +124,7 @@ extern void darwin_enter_string_into_cfstring_table 
> > (tree);
> > extern void darwin_asm_output_anchor (rtx symbol);
> > extern bool darwin_use_anchors_for_symbol_p (const_rtx symbol);
> > extern bool darwin_kextabi_p (void);
> > +extern bool darwin_unreachable_traps_p (void);
> > extern void darwin_override_options (void);
> > extern void darwin_patch_builtins (void);
> > extern void darwin_rename_builtins (void);
> > diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
> > index dcfccb4952a..018547d09c6 100644
> > --- a/gcc/config/darwin.cc
> > +++ b/gcc/config/darwin.cc
> > @@ -3339,6 +3339,13 @@ darwin_kextabi_p (void) {
> >   return flag_apple_kext;
> > }
> >
> > +/* True, iff we want to map __builtin_unreachable to a trap.  */
> > +
> > +bool
> > +darwin_unreachable_traps_p (void) {
> > +  return darwin_unreachable_traps;
> > +}
> > +
> > void
> > darwin_override_options (void)
> > {
> > diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
> > index d335ffe7345..17f41cf30ef 100644
> > --- a/gcc/config/darwin.h
> > +++ b/gcc/config/darwin.h
> > @@ -1225,6 +1225,10 @@ void add_framework_path (char *);
> > 
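
For readers following the thread, here is a minimal C sketch of the construct under discussion. It is not taken from the patch, and the names `sides`, `no_body` and `check_sides` are made up for illustration. Going by the quoted description, with the proposed hook enabled the `__builtin_unreachable ()` calls below would be expanded like `__builtin_trap ()`, so a function such as `no_body` would presumably no longer be emitted with an empty body.

```c
#include <assert.h>

enum shape { CIRCLE, SQUARE };

/* The switch covers every enumerator, so control never reaches the
   __builtin_unreachable () call for valid inputs.  */
int
sides (enum shape s)
{
  switch (s)
    {
    case CIRCLE:
      return 0;
    case SQUARE:
      return 4;
    }
  __builtin_unreachable ();
}

/* A function whose whole body is unreachable.  With optimization this
   can compile to no instructions at all, which is the "empty function"
   case the description talks about.  It is never called here, since
   calling it would be undefined behaviour.  */
void
no_body (void)
{
  __builtin_unreachable ();
}

/* Only the reachable paths are exercised.  */
void
check_sides (void)
{
  assert (sides (CIRCLE) == 0);
  assert (sides (SQUARE) == 4);
}
```

Calling `check_sides ()` from a `main` exercises only the reachable paths; whether the unreachable point becomes nothing, a barrier or a trap is a code-generation detail that this sketch does not observe.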

Re: [COMMITTED] warn-access: Fix handling of unnamed types [PR109804]

2024-05-08 Thread Andrew Pinski
On Thu, Feb 22, 2024 at 9:28 AM Andrew Pinski  wrote:
>
> This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
> DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
> the code expected newc.u.s_binary.left would be valid.
> So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function
> parameters (DEMANGLE_COMPONENT_FUNCTION_PARAM) and template parameters
> (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
>
> Note the code in the demangler does this when it sets 
> DEMANGLE_COMPONENT_UNNAMED_TYPE:
>   ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
>   ret->u.s_number.number = num;
>
> Committed as obvious after bootstrap/test on x86_64-linux-gnu
> Will commit to other branches in a few days.

Now committed (with the testcase fix backported too) to the GCC 12 branch.

Thanks,
Andrew Pinski

>
> PR tree-optimization/109804
>
> gcc/ChangeLog:
>
> * gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
> DEMANGLE_COMPONENT_UNNAMED_TYPE.
>
> gcc/testsuite/ChangeLog:
>
>     * g++.dg/warn/Wmismatched-new-delete-8.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-ssa-warn-access.cc |  1 +
>  .../g++.dg/warn/Wmismatched-new-delete-8.C| 42 +++
>  2 files changed, 43 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
>
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index cd083ab2237..dedaae27b31 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -1701,6 +1701,7 @@ new_delete_mismatch_p (const demangle_component ,
>
>  case DEMANGLE_COMPONENT_FUNCTION_PARAM:
>  case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
> +case DEMANGLE_COMPONENT_UNNAMED_TYPE:
>return newc.u.s_number.number != delc.u.s_number.number;
>
>  case DEMANGLE_COMPONENT_CHARACTER:
> diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C 
> b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> new file mode 100644
> index 000..0ddc056c6df
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> @@ -0,0 +1,42 @@
> +/* PR tree-optimization/109804 */
> +/* { dg-do compile { target c++11 } } */
> +/* { dg-options "-Wall" } */
> +
> +/* Here we used to ICE in new_delete_mismatch_p because
> +   we didn't handle unnamed types from the demangler 
> (DEMANGLE_COMPONENT_UNNAMED_TYPE). */
> +
> +template 
> +static inline T * construct_at(void *at, ARGS && args)
> +{
> + struct Placeable : T
> + {
> +  Placeable(ARGS && args) : T(args) { }
> +  void * operator new (long unsigned int, void *ptr) { return ptr; }
> +  void operator delete (void *, void *) { }
> + };
> + return new (at) Placeable(static_cast(args));
> +}
> +template 
> +struct Reconstructible
> +{
> +  char _space[sizeof(MT)];
> +  Reconstructible() { }
> +};
> +template 
> +struct Constructible : Reconstructible
> +{
> + Constructible(){}
> +};
> +struct A { };
> +struct B
> +{
> + Constructible a { };
> + B(int) { }
> +};
> +Constructible b { };
> +void f()
> +{
> +  enum { ENUM_A = 1 };
> +  enum { ENUM_B = 1 };
> +  construct_at(b._space, ENUM_B);
> +}
> --
> 2.43.0
>


Re: [COMMITTED/13] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2024-05-08 Thread Andrew Pinski
On Sun, Oct 1, 2023 at 1:23 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The problem here is that, after r6-7425-ga9fee7cdc3c62d0e51730,
> the comparison used to decide whether the transformation could be done
> was using the wrong value. Instead of checking whether the inner value was
> LE (for MIN; GE for MAX) the outer value, it was comparing the inner value
> to the value used in the comparison, which was wrong.
>
> Committed to GCC 13 branch after bootstrapped and tested on x86_64-linux-gnu.

Committed also to GCC 12 and 11 branches.

>
> gcc/ChangeLog:
>
> PR tree-optimization/111331
> * tree-ssa-phiopt.cc (minmax_replacement):
> Fix the LE/GE comparison for the
> `(a CMP CST1) ? max : a` optimization.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/111331
> * gcc.c-torture/execute/pr111331-1.c: New test.
> * gcc.c-torture/execute/pr111331-2.c: New test.
> * gcc.c-torture/execute/pr111331-3.c: New test.
>
> (cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)
> ---
>  .../gcc.c-torture/execute/pr111331-1.c| 17 +
>  .../gcc.c-torture/execute/pr111331-2.c| 19 +++
>  .../gcc.c-torture/execute/pr111331-3.c| 15 +++
>  gcc/tree-ssa-phiopt.cc|  8 
>  4 files changed, 55 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> new file mode 100644
> index 000..4c7f4fdbaa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> @@ -0,0 +1,17 @@
> +int a;
> +int b;
> +int c(int d, int e, int f) {
> +  if (d < e)
> +return e;
> +  if (d > f)
> +return f;
> +  return d;
> +}
> +int main() {
> +  int g = -1;
> +  a = c(b + 30, 29, g + 29);
> +  volatile t = a;
> +  if (t != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> new file mode 100644
> index 000..5c677f2caa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> @@ -0,0 +1,19 @@
> +
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +if (d < 29)
> +  t =  29;
> +else
> +  t = (d > 28) ? 28 : d;
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> new file mode 100644
> index 000..213d9bdd539
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> @@ -0,0 +1,15 @@
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +t = d < 29 ? 29 : ((d > 28) ? 28 : d);
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index a7ab6ce4ad9..c3d78d1400b 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -2270,7 +2270,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND <= LARGER.  */
>   if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
> - bound, larger)))
> + bound, arg_false)))
> return false;
> }
>   else if (operand_equal_for_phi_arg_p (arg_false, smaller)
> @@ -2301,7 +2301,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND >= SMALLER.  */
>   if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
> - bound, smaller)))
> + bound, arg_false)))
> return false;
> }
>   else
> @@ -2341,7 +2341,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND >= LARGER.  */
>   if (!integer_nonzerop (fold_build

Re: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-05-08 Thread Andrew Pinski
On Mon, Mar 11, 2024 at 11:41 PM Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Andrew Pinski (QUIC) 
> > Sent: Sunday, March 10, 2024 7:58 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Andrew Pinski (QUIC) 
> > Subject: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for
> > NaNs [PR95351]
> >
> > The problem here is that merge_truthop_with_opposite_arm would use the
> > type of the result of the comparison rather than the operands of the
> > comparison to figure out if we are honoring NaNs.
> > This fixes that oversight and now we get the correct results in this case.
> >
> > Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
>
> Committed to the GCC 13 branch too.

And the GCC 12 and 11 branches too.


>
> Thanks,
> Andrew
>
> >
> >   PR middle-end/95351
> >
> > gcc/ChangeLog:
> >
> >   * fold-const.cc (merge_truthop_with_opposite_arm): Use
> >   the type of the operands of the comparison and not the type
> >   of the comparison.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/float_opposite_arm-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/fold-const.cc   |  3 ++-
> >  gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index
> > 43105d20be3..299c22bf391 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -6420,7 +6420,6 @@ static tree
> >  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
> >bool rhs_only)
> >  {
> > -  tree type = TREE_TYPE (cmpop);
> >enum tree_code code = TREE_CODE (cmpop);
> >enum tree_code truthop_code = TREE_CODE (op);
> >tree lhs = TREE_OPERAND (op, 0);
> > @@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t
> > loc, tree op, tree cmpop,
> >if (TREE_CODE_CLASS (code) != tcc_comparison)
> >  return NULL_TREE;
> >
> > +  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
> > +
> >if (rhs_code == truthop_code)
> >  {
> >tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop,
> > rhs_only); diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > new file mode 100644
> > index 000..d2dbff35066
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
> > +/* { dg-add-options ieee } */
> > +/* PR middle-end/95351 */
> > +
> > +int Foo(double possiblyNAN, double b, double c) {
> > +return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c)); }
> > +
> > +/* Make sure we don't remove either >/<=  */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0"
> > +"optimized" } } */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } }
> > +*/
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0"
> > +"optimized" } } */
> > --
> > 2.43.0
>
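
A minimal C sketch (not part of the patch) of why the merge must honor NaNs. The function names `reference`, `merged`, `differs_for_nan` and `check_ordered_inputs` are invented for this illustration. `reference` spells out the testcase expression one comparison at a time, and `merged` is what one gets by treating `x > 2.0` as the exact opposite of `x <= 2.0` and dropping it. The two agree for ordered inputs but differ when `x` is a NaN, because both comparisons are then false.

```c
#include <assert.h>
#include <math.h>

/* The testcase expression, one comparison per statement:
   (x <= 2.0) || ((x > 2.0) && (b > c)).  */
int
reference (double x, double b, double c)
{
  int le = x <= 2.0;
  int gt = x > 2.0;
  int bc = b > c;
  return le || (gt && bc);
}

/* The expression after dropping `x > 2.0` as if it were implied by the
   failure of `x <= 2.0`.  This is only valid when NaNs can be ignored.  */
int
merged (double x, double b, double c)
{
  int le = x <= 2.0;
  int bc = b > c;
  return le || bc;
}

/* Nonzero if the two forms disagree for a NaN first argument.  */
int
differs_for_nan (void)
{
  return reference (NAN, 1.0, 0.0) != merged (NAN, 1.0, 0.0);
}

/* For ordered (non-NaN) inputs the two forms agree.  */
void
check_ordered_inputs (void)
{
  assert (reference (1.0, 0.0, 1.0) == merged (1.0, 0.0, 1.0));
  assert (reference (3.0, 1.0, 0.0) == merged (3.0, 1.0, 0.0));
  assert (reference (3.0, 0.0, 1.0) == merged (3.0, 0.0, 1.0));
}
```

The comparisons are kept in separate statements on purpose, so that the sketch describes the intended semantics rather than depending on how a particular compiler folds the combined expression.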


Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2024-05-08 Thread Andrew Pinski
On Sat, Jul 22, 2023 at 8:36 PM Kito Cheng via Gcc-patches
 wrote:
>
> OK for trunk, thanks:)

I have now backported it to 13 branch.

Thanks,
Andrew


>
> On Sun, Jul 23, 2023 at 09:07, Andrew Pinski via Gcc-patches wrote:
>
> > The problem is that -fasynchronous-unwind-tables is on by default for
> > riscv linux. We need to turn it off for crt*.o because it would make
> > __EH_FRAME_BEGIN__ point to .eh_frame data from crtbeginT.o instead of
> > the user-defined object during static linking.
> >
> > This turns it off.
> >
> > OK?
> >
> > libgcc/ChangeLog:
> >
> > * config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
> > (riscv*-*-freebsd*): Likewise.
> > * config/riscv/t-crtstuff: New file.
> > ---
> >  libgcc/config.host | 4 ++--
> >  libgcc/config/riscv/t-crtstuff | 5 +
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >  create mode 100644 libgcc/config/riscv/t-crtstuff
> >
> > diff --git a/libgcc/config.host b/libgcc/config.host
> > index 9d7212028d0..c94d69d84b7 100644
> > --- a/libgcc/config.host
> > +++ b/libgcc/config.host
> > @@ -1304,12 +1304,12 @@ pru-*-*)
> > tm_file="$tm_file pru/pru-abi.h"
> > ;;
> >  riscv*-*-linux*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> > riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff
> > riscv/t-softfp${host_address} t-softfp riscv/t-elf
> > riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> > crtendS.o crtbeginT.o"
> > md_unwind_header=riscv/linux-unwind.h
> > ;;
> >  riscv*-*-freebsd*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> > riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff
> > riscv/t-softfp${host_address} t-softfp riscv/t-elf
> > riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> > crtendS.o crtbeginT.o"
> > ;;
> >  riscv*-*-*)
> > diff --git a/libgcc/config/riscv/t-crtstuff
> > b/libgcc/config/riscv/t-crtstuff
> > new file mode 100644
> > index 000..685d11b3e66
> > --- /dev/null
> > +++ b/libgcc/config/riscv/t-crtstuff
> > @@ -0,0 +1,5 @@
> > +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv
> > linux
> > +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> > +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> > +# during static linking.
> > +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > --
> > 2.39.1
> >
> >


[PATCH] match: `a CMP nonnegative ? a : ABS` simplified to just `ABS` [PR112392]

2024-05-07 Thread Andrew Pinski
We can optimize `a == nonnegative ? a : ABS`, `a > nonnegative ? a : ABS`
and `a >= nonnegative ? a : ABS` into `ABS`. This allows the removal of
some extra comparisons and conditional moves in some cases.
I don't remember where I found this, but it is simple to add, so
let's add it.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note I have a secondary pattern for the equal case as either a or nonnegative
could be used.
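
As a rough illustration of why the fold is valid (this is not part of the patch, and the names `before_fold`, `after_fold`, `count_mismatches` and `check_fold` are made up here): when the bound is non-negative and `a >= bound` holds, `a` is itself non-negative, so both arms of the conditional already equal `ABS (a)`. A minimal C sketch that checks this exhaustively over a small range:

```c
#include <assert.h>
#include <stdlib.h>

/* Source form matched by the new pattern: a >= nonnegative ? a : ABS (a).
   B is an unsigned char, so its value is known to be non-negative.  */
int
before_fold (int a, unsigned char b)
{
  int nonneg = b;
  return a >= nonneg ? a : abs (a);
}

/* What the pattern is expected to simplify it to.  */
int
after_fold (int a)
{
  return abs (a);
}

/* Count the inputs in a small range for which the two forms differ.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int a = -300; a <= 300; a++)
    for (int b = 0; b <= 255; b++)
      if (before_fold (a, (unsigned char) b) != after_fold (a))
        bad++;
  return bad;
}

void
check_fold (void)
{
  assert (count_mismatches () == 0);
}
```

The same argument covers the `==` and `>` forms, since each of them also implies that `a` is non-negative whenever the condition is true.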

PR tree-optimization/112392

gcc/ChangeLog:

* match.pd (`x CMP nonnegative ? x : ABS`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-41.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 03a03c31233..07e743ae464 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (absu:utype @0)))
 @3
 
+/* X >  Positive ? X : ABS(X) -> ABS(X) */
+/* X >= Positive ? X : ABS(X) -> ABS(X) */
+/* X == Positive ? X : ABS(X) -> ABS(X) */
+(for cmp (eq gt ge)
+ (simplify
+  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
+  (if (INTEGRAL_TYPE_P (type))
+   @3)))
+
+/* X == Positive ? Positive : ABS(X) -> ABS(X) */
+(simplify
+ (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
+ (if (INTEGRAL_TYPE_P (type))
+  @3))
+
 /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
new file mode 100644
index 000..9774e283a7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiopt1" } */
+/* PR tree-optimization/112392 */
+
+int feq_1(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return absb;
+  return a > 0 ? a : -a;
+}
+int feq_2(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fgt(int a, unsigned char b)
+{
+  int absb = b;
+  if (a > absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fge(int a, unsigned char b)
+{
+  int absb = b;
+  if (a >= absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
-- 
2.43.0



Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-07 Thread Andrew Pinski
On Tue, May 7, 2024 at 1:45 PM Jeff Law  wrote:
>
>
>
> On 4/30/24 9:21 PM, Andrew Pinski wrote:
> > This adds a few more of what is currently done in phiopt's value_replacement
> > to match. I noticed this when I was hooking up phiopt's value_replacement
> > code to use match and disabling the old code. But this can be done
> > independently of hooking up phiopt's value_replacement, as phiopt
> > is already hooked up for the simplified versions.
> >
> > /* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
> > /* a != 0 ? a * b : 0 -> a * b */
> > /* a != 0 ? a & b : 0 -> a & b */
> >
> > We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
> > uses that form.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> >   PR tree-optimization/114894
> >
> > gcc/ChangeLog:
> >
> >   * match.pd (`a != 0 ? a / b : 0`): New pattern.
> >   (`a != 0 ? a * b : 0`): New pattern.
> >   (`a != 0 ? a & b : 0`): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
> Is there any need to also handle the reversed conditional with the arms
> swapped?  If not, this is fine as-is.  If yes, then fine with the
> obvious generalization.

The answer is yes and no. While the PHI-OPT pass will try both cases,
the other (all?) passes do not. This is something I have been
thinking about trying to solve in a generic way instead of adding many
more patterns here. I will start working on that in the middle of
June.
Most of the time, cond patterns in match are used inside phiopt, so
handling the reversed conditional has not been high on my priority list, but
with VRP, scev and match (itself) producing more cond_expr, we
should fix this once and for all for GCC 15.

Thanks,
Andrew Pinski

>
> jeff
>


[PATCH] Mention that some options are turned on by `-Ofast` in their descriptions [PR97263]

2024-05-06 Thread Andrew Pinski
As was done for -ffast-math in r0-105946-ga570fc16fa8056, we should
document that -Ofast enables -fno-math-errno, -funsafe-math-optimizations,
-ffinite-math-only and -fno-trapping-math in the descriptions of those options.

Note this changes the stronger "should never be" to "is not" for -fno-trapping-math
since we do enable it for -Ofast already.

OK?

gcc/ChangeLog:

PR middle-end/97263
* doc/invoke.texi (fmath-errno): Document it is turned on
with -Ofast.
(funsafe-math-optimizations): Likewise.
(ffinite-math-only): Likewise.
(fno-trapping-math): Likewise and use less strong language.

Signed-off-by: Andrew Pinski 
---
 gcc/doc/invoke.texi | 41 ++---
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..14ff4d25da7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14996,11 +14996,12 @@ with a single instruction, e.g., @code{sqrt}.  A 
program that relies on
 IEEE exceptions for math error handling may want to use this flag
 for speed while maintaining IEEE arithmetic compatibility.
 
-This option is not turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions. It may, however, yield faster code for programs
-that do not require the guarantees of these specifications.
+This option is not turned on by any @option{-O} option  besides
+@option{-Ofast} since it can result in incorrect output for
+programs that depend on an exact implementation of IEEE or
+ISO rules/specifications for math functions. It may, however,
+yield faster code for programs that do not require the guarantees
+of these specifications.
 
 The default is @option{-fmath-errno}.
 
@@ -15017,11 +15018,12 @@ ANSI standards.  When used at link time, it may 
include libraries
 or startup files that change the default FPU control word or other
 similar optimizations.
 
-This option is not turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions. It may, however, yield faster code for programs
-that do not require the guarantees of these specifications.
+This option is not turned on by any @option{-O} option besides
+@option{-Ofast} since it can result in incorrect output
+for programs that depend on an exact implementation of IEEE
+or ISO rules/specifications for math functions. It may, however,
+yield faster code for programs that do not require the guarantees
+of these specifications.
 Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
 @option{-fassociative-math} and @option{-freciprocal-math}.
 
@@ -15061,11 +15063,12 @@ The default is @option{-fno-reciprocal-math}.
 Allow optimizations for floating-point arithmetic that assume
 that arguments and results are not NaNs or +-Infs.
 
-This option is not turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions. It may, however, yield faster code for programs
-that do not require the guarantees of these specifications.
+This option is not turned on by any @option{-O} option besides
+@option{-Ofast} since it can result in incorrect output
+for programs that depend on an exact implementation of IEEE or
+ISO rules/specifications for math functions. It may, however,
+yield faster code for programs that do not require the guarantees
+of these specifications.
 
 The default is @option{-fno-finite-math-only}.
 
@@ -15089,10 +15092,10 @@ underflow, inexact result and invalid operation.  
This option requires
 that @option{-fno-signaling-nans} be in effect.  Setting this option may
 allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.
 
-This option should never be turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions.
+This option is not turned on by any @option{-O} option besides
+@option{-Ofast} since it can result in incorrect output for programs
+that depend on an exact implementation of IEEE or ISO rules/specifications
+for math functions.
 
 The default is @option{-ftrapping-math}.
 
-- 
2.43.0



[COMMITTED] aarch64: Fix gcc.target/aarch64/sve/loop_add_6.c for LLP64 targets

2024-05-06 Thread Andrew Pinski
Even though the aarch64-mingw32 support has not been committed yet,
we should fix some of the testcases. In this case,
gcc.target/aarch64/sve/loop_add_6.c is easy to fix. We should use
__SIZETYPE__ instead of `unsigned long` for the variables that will be
used for pointer plus.
Committed as obvious after a quick test on aarch64-linux-gnu.
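
For background, a small sketch in ordinary C rather than the GIMPLE front end syntax used by the testcase. `__SIZETYPE__` is the spelling used in the test; `size_t` is used below as an assumed stand-in for it, and `element_offset` and `check_offsets` are hypothetical names. On LLP64 targets `unsigned long` is 32 bits wide while pointers are 64 bits wide, so an offset that is going to be added to a pointer is better expressed in the target's size type.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element I in an array of doubles, computed in the
   target's size type instead of `unsigned long`, so the width follows
   the target's size type on both LP64 and LLP64 targets.  */
size_t
element_offset (int i)
{
  size_t index = (size_t) i;
  return index * sizeof (double);
}

void
check_offsets (void)
{
  assert (element_offset (0) == 0);
  assert (element_offset (3) == 3 * sizeof (double));
}
```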

gcc/testsuite/ChangeLog:

PR testsuite/114177
* gcc.target/aarch64/sve/loop_add_6.c: Use __SIZETYPE__ instead
of `unsigned long` for index and offset variables.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/aarch64/sve/loop_add_6.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/loop_add_6.c 
b/gcc/testsuite/gcc.target/aarch64/sve/loop_add_6.c
index e7416ebcded..a530998f54b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/loop_add_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/loop_add_6.c
@@ -5,8 +5,8 @@ double __GIMPLE (ssa, startwith("loop"))
 neg_xi (double *x)
 {
   int i;
-  long unsigned int index;
-  long unsigned int offset;
+  __SIZETYPE__ index;
+  __SIZETYPE__ offset;
   double * xi_ptr;
   double xi;
   double neg_xi;
@@ -20,8 +20,8 @@ neg_xi (double *x)
   res_1 = __PHI (__BB5: 0.0, __BB3: res_2);
   i_4 = __PHI (__BB5: 0, __BB3: i_5);
   ivtmp_6 = __PHI (__BB5: 100U, __BB3: ivtmp_7);
-  index = (long unsigned int) i_4;
-  offset = index * 8UL;
+  index = (__SIZETYPE__ ) i_4;
+  offset = index * _Literal (__SIZETYPE__) 8;
   xi_ptr = x_8(D) + offset;
   xi = *xi_ptr;
   neg_xi = -xi;
-- 
2.43.0



[PATCH] aarch64: Add fcsel to cmov integer and csel to float cmov [PR98477]

2024-05-06 Thread Andrew Pinski
This patch adds an fcsel alternative to the integer cmov and a csel
alternative to the floating-point cmov, so that in more cases we avoid
moving values between the general-purpose and SIMD registers.

PR target/98477

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov_insn[GPI]): Add 'w'
alternative.
(*cmov_insn[GPF]): Add 'r' alternative.
* config/aarch64/iterators.md (wv): New mode attr.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/csel_1.c: New test.
* gcc.target/aarch64/fcsel_2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 13 +++
 gcc/config/aarch64/iterators.md|  4 
 gcc/testsuite/gcc.target/aarch64/csel_1.c  | 27 ++
 gcc/testsuite/gcc.target/aarch64/fcsel_2.c | 20 
 4 files changed, 59 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/csel_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fcsel_2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2bdd443e71d..a6cedd0f1b8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4404,6 +4404,7 @@ (define_insn "*cmov_insn"
  [ r, Ui1 , rZ  ; csel] csinc\t%0, %4, zr, %M1
  [ r, UsM , UsM ; mov_imm ] mov\t%0, -1
  [ r, Ui1 , Ui1 ; mov_imm ] mov\t%0, 1
+ [ w, w   , w   ; fcsel   ] fcsel\t%0, %3, %4, %m1
   }
 )
 
@@ -4464,15 +4465,17 @@ (define_insn "*cmovdi_insn_uxtw"
 )
 
 (define_insn "*cmov_insn"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
+  [(set (match_operand:GPF 0 "register_operand" "=r,w")
(if_then_else:GPF
 (match_operator 1 "aarch64_comparison_operator"
  [(match_operand 2 "cc_register" "") (const_int 0)])
-(match_operand:GPF 3 "register_operand" "w")
-(match_operand:GPF 4 "register_operand" "w")))]
+(match_operand:GPF 3 "register_operand" "r,w")
+(match_operand:GPF 4 "register_operand" "r,w")))]
   "TARGET_FLOAT"
-  "fcsel\\t%0, %3, %4, %m1"
-  [(set_attr "type" "fcsel")]
+  "@
+   csel\t%0, %3, %4, %m1
+   fcsel\\t%0, %3, %4, %m1"
+  [(set_attr "type" "csel,fcsel")]
 )
 
 (define_expand "movcc"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 99cde46f1ba..42303f2ec02 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1147,6 +1147,10 @@ (define_mode_attr e [(CCFP "") (CCFPE "e")])
 ;; 32-bit version and "%x0" in the 64-bit version.
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
+;; For cmov template to be used with fcsel instruction
+(define_mode_attr wv [(QI "s") (HI "s") (SI "s") (DI "d") (SF "s") (DF "d")])
+
+
 ;; The size of access, in bytes.
 (define_mode_attr ldst_sz [(SI "4") (DI "8")])
 ;; Likewise for load/store pair.
diff --git a/gcc/testsuite/gcc.target/aarch64/csel_1.c 
b/gcc/testsuite/gcc.target/aarch64/csel_1.c
new file mode 100644
index 000..5848e5be2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/csel_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ssa-phiopt" } */
+/* PR target/98477 */
+
+/* We should be able to produce csel followed by a store
+   and not move between the GPRs and simd registers. */
+/* Note -fno-ssa-phiopt is needed, otherwise the tree level
+   does the VCE after the cmov, which allows the csel
+   instruction to be used. */
+_Static_assert (sizeof(long long) == sizeof(double));
+void
+foo (int a, double *b, long long c, long long d)
+{
+  double ct;
+  double dt;
+  __builtin_memcpy(&ct, &c, sizeof(long long));
+  __builtin_memcpy(&dt, &d, sizeof(long long));
+  double t = a ? ct : dt;
+  *b = t;
+}
+
+/* { dg-final { scan-assembler-not "\tfcsel\t"  } } */
+/* { dg-final { scan-assembler-times "\tcsel\t" 1 } } */
+/* The store should still happen from the GPRs */
+/* { dg-final { scan-assembler-not "\tstr\td"  } } */
+/* { dg-final { scan-assembler-times "\tstr\tx" 1 } } */
+/* { dg-final { scan-assembler-not "\tfmov\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/fcsel_2.c 
b/gcc/testsuite/gcc.target/aarch64/fcsel_2.c
new file mode 100644
index 000..309e8cbe37f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fcsel_2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* PR target/98477 */
+
+#define vector16 __attribute__((vector_size(16)))
+/* We should be able to produce fcsel followed by a store
+   and not move between the GPRs and simd registers. */
+void
+foo (int a, int *b, vector16 int c, vector16 int d)
+{
+  int t = a ? c[0] : d[0];
+  *b = t;
+}
+
+/* { dg-final { scan-assembler-times "\tfcsel\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tcsel\t" } } */
+/* The store should still happen from the simd register */
+/* { dg-final { scan-assembler-times "\tstr\ts" 1 } } */
+/* { dg-final { scan-assembler-not "\tstr\tw" } } */
+/* { dg-final { scan-assembler-not "\tfmov\t" } } */
-- 
2.43.0



[PATCH v3] aarch64: Fix normal returns inside functions which use eh_returns [PR114843]

2024-05-05 Thread Andrew Pinski
The problem here is that on a normal return path, we still restore the
eh return data registers when we should not.
Instead of one return path in the case of eh_return, this changes over
to use multiple return paths just like a normal function.
On the normal path (non-eh return), we need to skip restoring the eh
return data registers.

This fixes the code generation of _Unwind_RaiseException where the return value
was corrupted.

Note this adds some testcases which might fail on some targets.
I know the following targets will also fail:
arm is recorded as PR 114847.
powerpc is recorded as PR 114846.

Built and tested for aarch64-linux-gnu with no regressions.

Changes in:
* v2: Fix logical error in aarch64_pop_regs which was a premature optimization.
Check regno1 and regno2 independently now.
Also add eh_return-5.c which tests that case.
* v3: Instead of redoing the detection of the eh_return registers, store it
in frame.eh_return_allocated. Also don't consider eh_return data
registers as pop candidates.

Note v2 was not submitted.

PR target/114843

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_expand_epilogue): New
prototype.
* config/aarch64/aarch64.h (EH_RETURN_DATA_REGISTERS_N): New define.
(EH_RETURN_DATA_REGNO): Use EH_RETURN_DATA_REGISTERS_N instead of hard 
coding 4.
(aarch64_frame): Add eh_return_allocated.
* config/aarch64/aarch64.cc (aarch64_restore_callee_saves): Skip
over the eh return data regs if not eh return.
(aarch64_expand_epilogue): New function, pass false.
(aarch64_expand_epilogue): Add was_eh_return argument.
Update calls to aarch64_restore_callee_saves and aarch64_pop_regs.
For eh_returns, update the sp and do an indirect jump.
Don't check EH_RETURN_TAKEN_RTX any more.
* config/aarch64/aarch64.h (EH_RETURN_TAKEN_RTX): Delete.
* config/aarch64/aarch64.md (eh_return): New define_expand.
(eh_return_internal): New pattern for eh_returns.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-3.c: Update testcase.
* gcc.c-torture/execute/eh_return-1.c: New test.
* gcc.c-torture/execute/eh_return-2.c: New test.
* gcc.c-torture/execute/eh_return-3.c: New test.
* gcc.c-torture/execute/eh_return-4.c: New test.
* gcc.c-torture/execute/eh_return-5.c: New test.
* gcc.target/aarch64/eh_return-4.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.cc | 78 ++-
 gcc/config/aarch64/aarch64.h  | 14 ++--
 gcc/config/aarch64/aarch64.md | 24 ++
 .../gcc.c-torture/execute/eh_return-1.c   | 20 +
 .../gcc.c-torture/execute/eh_return-2.c   | 23 ++
 .../gcc.c-torture/execute/eh_return-3.c   | 25 ++
 .../gcc.c-torture/execute/eh_return-4.c   | 25 ++
 .../gcc.c-torture/execute/eh_return-5.c   | 24 ++
 .../gcc.target/aarch64/eh_return-3.c  | 12 ++-
 .../gcc.target/aarch64/eh_return-4.c  | 32 
 11 files changed, 244 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-3.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-4.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-4.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 42639e9efcf..efe86d52873 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -904,6 +904,7 @@ const char * aarch64_gen_far_branch (rtx *, int, const char 
*, const char *);
 const char * aarch64_output_probe_stack_range (rtx, rtx);
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
+void aarch64_expand_epilogue (rtx_call_insn *, bool);
 void aarch64_expand_epilogue (rtx_call_insn *);
 rtx aarch64_ptrue_all (unsigned int);
 opt_machine_mode aarch64_ptrue_all_mode (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 662ff5a9b0c..afbe4eeb340 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7792,6 +7792,7 @@ aarch64_layout_frame (void)
 
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
+#define SLOT_EH_RETURN_REQUIRED (-3)
 
   frame.wb_push_candidate1 = INVALID_REGNUM;
   frame.wb_push_candidate2 = INVALID_REGNUM;
@@ -7805,7 +7806,7 @@ aarch64_layout_frame (void)
   /* ... that includes the eh data registers (if needed)...  */
   if (crtl->calls_eh_return)
 for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; re

[PATCH v3] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-04 Thread Andrew Pinski
In C++ sometimes you have a destructor which is "empty", for
example with unions or with arrays.  The front-end might not know it is empty
either, so this should be done during optimization.
To implement this, I added the check to DCE, where we mark whether a statement
is necessary or not.
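
As a small illustration of the kind of destructor meant here (a sketch only; the
names `Payload`, `Slot` and `touch` are made up for this example and do not come
from the patch or its testcases): a union with a non-trivially-destructible
member has to provide its own destructor, and that destructor is often empty.
The language still treats it as non-trivial, so a function-local static of that
type normally gets its destructor registered at exit even though running it has
no visible effect.

```cpp
#include <type_traits>

// Hypothetical member type with a user-provided but empty destructor.
struct Payload {
  int value;
  ~Payload() {}
};

// A union with a non-trivially-destructible member must supply its own
// destructor; there is nothing for it to do, so the body is empty.
union Slot {
  Payload p;
  char raw[sizeof(Payload)];
  Slot() : raw{} {}
  ~Slot() {}
};

// "Empty" is not the same as "trivial": both destructors are non-trivial.
static_assert(!std::is_trivially_destructible<Payload>::value,
              "Payload has a user-provided destructor");
static_assert(!std::is_trivially_destructible<Slot>::value,
              "Slot has a user-provided destructor");

int touch() {
  // Both statics have non-trivial (yet empty) destructors, which is the
  // situation where the exit-time registration does no useful work.
  static Slot s;
  static Payload arr[4] = {};
  return static_cast<int>(s.raw[0]) + arr[0].value;
}
```

Whether a given registration is actually removed depends on the conditions
checked by the patch (for example, whether the destructor is known to be
pure/const), which this sketch does not try to reproduce.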

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Changes since v1:
  * v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
Add cxa_atexit-5.C testcase for -fPIC case.
  * v3: Fix testcases for the __aeabi_atexit (forgot to do so in v2).

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C | 20 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C | 21 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C | 19 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C | 20 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C | 39 +
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C | 24 
 gcc/tree-ssa-dce.cc  | 58 
 7 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
new file mode 100644
index 000..82ff3d2b778
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to axexit should be removed as A::~A() is a pure/const function 
call
+   and there is no visible effect if A::~A() call does not happen.  */
+
+struct A { 
+A(); 
+~A() {} 
+}; 
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-times "Deleting : 
(?:__cxxabiv1::__cxa_atexit|__aeabiv1::__aeabi_atexit)" 1 "cddce1" } } */
+/* { dg-final { scan-tree-dump-not "__cxa_atexit|__aeabi_atexit" "optimized" } 
} */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
new file mode 100644
index 000..726b6d7f156
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
@@ -0,0 +1,21 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to axexit should be not removed as A::~A() as it marked with 
noipa.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+[[gnu::noipa]] A::~A() {}
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : 
(?:__cxxabiv1::__cxa_atexit|__aeabiv1::__aeabi_atexit)" "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "(?:__cxa_atexit|__aeabi_atexit)" 1 
"optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
new file mode 100644
index 000..42cc7ccb11b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* We should not remove the call to atexit as A::~A is unknown.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : 
(?:__cxxabiv1::__cxa_atexit|__aeabiv1::__aeabi_atexit)" "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "(?:__cxa_atexit|__aeabi_atexit)" 1 
"optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
new file mode 100644
index 0

[PATCH v2] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-04 Thread Andrew Pinski
In C++ sometimes you have a destructor which is "empty", for
example with unions or with arrays.  The front-end might not know it is empty
either, so this should be done during optimization.
To implement this, I added the check to DCE, where we mark whether a statement
is necessary or not.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Changes since v1:
  * v2: Add support for __aeabi_atexit for arm-*eabi. Add extra comments.
Add cxa_atexit-5.C testcase for -fPIC case.

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.
* g++.dg/tree-ssa/cxa_atexit-6.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C | 20 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C | 21 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C | 19 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C | 20 +++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C | 39 +
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C | 24 
 gcc/tree-ssa-dce.cc  | 58 
 7 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
new file mode 100644
index 000..1f5f431c7e4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to axexit should be removed as A::~A() is a pure/const function 
call
+   and there is no visible effect if A::~A() call does not happen.  */
+
+struct A { 
+A(); 
+~A() {} 
+}; 
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-times "Deleting : __cxxabiv1::__cxa_atexit" 1 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-not "__cxa_atexit" "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
new file mode 100644
index 000..4d0656b455c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
@@ -0,0 +1,21 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to axexit should be not removed as A::~A() as it marked with 
noipa.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+[[gnu::noipa]] A::~A() {}
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : __cxxabiv1::__cxa_atexit" 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-times "__cxa_atexit" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
new file mode 100644
index 000..03a19209661
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* We should not remove the call to atexit as A::~A is unknown.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : __cxxabiv1::__cxa_atexit" 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-times "__cxa_atexit" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
new file mode 100644
index 000..b85a7efd16b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
@@ -0,0 +1,20 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-opti

[PATCH] aarch64: Support multiple variants including up to 3

2024-05-04 Thread Andrew Pinski
On some of Qualcomm's SoCs that include the oryon-1 core, the variant
will be different across the cores due to the big.little config. However,
the difference between big and little is not significant enough
to have separate cost/scheduling models for them, and the feature set
is the same across all variants.

Also on some SoCs, there are 3 variants of the core (big.middle.little),
so this increases the support to up to 3 cores and 3 variants
in the original parsing loop, but it does not change the limit of at most
2 different cores.

After this patch and the patch that adds oryon-1, -mcpu=native works
on the SoCs I am working with.
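
The behaviour described above can be sketched roughly as follows. This is a
simplified model and not the driver code: `Detected`, `record_unique`,
`add_cpu`, `finalize` and the sentinel constants are hypothetical names, the
part/variant numbers used with it are arbitrary, and the real function also
checks the implementer and the feature list.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the driver's sentinel values.
constexpr unsigned kInvalidCore = ~0u;
constexpr unsigned kAllVariants = ~0u;
constexpr std::size_t kMaxEntries = 3;

struct Detected {
  unsigned cores[kMaxEntries] = {kInvalidCore, kInvalidCore, kInvalidCore};
  unsigned variants[kMaxEntries] = {kAllVariants, kAllVariants, kAllVariants};
  std::size_t n_cores = 0;
  std::size_t n_variants = 0;
};

// Record VALUE in TABLE unless it is already there.  Returns false when a
// new value would not fit, which the caller treats as "not found".
static bool record_unique(unsigned (&table)[kMaxEntries], std::size_t &count,
                          unsigned value) {
  for (std::size_t i = 0; i < count; ++i)
    if (table[i] == value)
      return true;
  if (count == kMaxEntries)
    return false;
  table[count++] = value;
  return true;
}

// Feed one (part, variant) pair per logical CPU, as the parsing loop would.
bool add_cpu(Detected &d, unsigned part, unsigned variant) {
  return record_unique(d.variants, d.n_variants, variant)
         && record_unique(d.cores, d.n_cores, part);
}

// Final check after parsing: still at most 2 different cores, but one core
// with several variants is accepted and treated as "all variants".
bool finalize(Detected &d) {
  if (d.n_cores == 0 || d.n_cores > 2)
    return false;
  if (d.n_cores == 1 && d.n_variants != 1)
    d.variants[0] = kAllVariants;
  return true;
}
```

With this model, three CPUs reporting the same part but three different
variants are accepted as a single core, while three different parts are still
rejected.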

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Support
3 cores and 3 variants. If there is one core but multiple variants,
then treat the variant as being all.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_25: New file.
* gcc.target/aarch64/cpunative/info_26: New file.
* gcc.target/aarch64/cpunative/native_cpu_25.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_26.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/driver-aarch64.cc  | 14 ++
 .../gcc.target/aarch64/cpunative/info_25  | 17 
 .../gcc.target/aarch64/cpunative/info_26  | 26 +++
 .../aarch64/cpunative/native_cpu_25.c | 11 
 .../aarch64/cpunative/native_cpu_26.c | 11 
 5 files changed, 74 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_25
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_26
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_25.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_26.c

diff --git a/gcc/config/aarch64/driver-aarch64.cc 
b/gcc/config/aarch64/driver-aarch64.cc
index b620351e572..abe6e7df7dc 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -256,9 +256,9 @@ host_detect_local_cpu (int argc, const char **argv)
   bool cpu = false;
   unsigned int i = 0;
   unsigned char imp = INVALID_IMP;
-  unsigned int cores[2] = { INVALID_CORE, INVALID_CORE };
+  unsigned int cores[3] = { INVALID_CORE, INVALID_CORE, INVALID_CORE };
   unsigned int n_cores = 0;
-  unsigned int variants[2] = { ALL_VARIANTS, ALL_VARIANTS };
+  unsigned int variants[3] = { ALL_VARIANTS, ALL_VARIANTS, ALL_VARIANTS };
   unsigned int n_variants = 0;
   bool processed_exts = false;
   aarch64_feature_flags extension_flags = 0;
@@ -314,7 +314,7 @@ host_detect_local_cpu (int argc, const char **argv)
  unsigned cvariant = parse_field (buf);
  if (!contains_core_p (variants, cvariant))
{
-  if (n_variants == 2)
+ if (n_variants == 3)
 goto not_found;
 
   variants[n_variants++] = cvariant;
@@ -326,7 +326,7 @@ host_detect_local_cpu (int argc, const char **argv)
  unsigned ccore = parse_field (buf);
  if (!contains_core_p (cores, ccore))
{
- if (n_cores == 2)
+ if (n_cores == 3)
goto not_found;
 
  cores[n_cores++] = ccore;
@@ -383,11 +383,15 @@ host_detect_local_cpu (int argc, const char **argv)
   /* Weird cpuinfo format that we don't know how to handle.  */
   if (n_cores == 0
   || n_cores > 2
-  || (n_cores == 1 && n_variants != 1)
   || imp == INVALID_IMP
   || !processed_exts)
 goto not_found;
 
+  /* If we have one core type but multiple variants, consider
+ that as one variant with ALL_VARIANTS instead.  */
+  if (n_cores == 1 && n_variants != 1)
+variants[0] = ALL_VARIANTS;
+
   /* Simple case, one core type or just looking for the arch. */
   if (n_cores == 1 || arch)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_25 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
new file mode 100644
index 000..d6e83ccab09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
@@ -0,0 +1,17 @@
+processor  : 0
+BogoMIPS   : 38.40
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp 
asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 
asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 
rng bti ecv afp rpres
+CPU implementer: 0x51
+CPU architecture: 8
+CPU variant: 0x2
+CPU part   : 0x001
+CPU revision   : 1
+
+processor  : 1
+BogoMIPS   : 38.40
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp 
asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 
asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint i8mm bf16 
rng bti ecv afp rpres
+CPU implementer: 0x51
+CPU architecture: 8
+CPU variant

Re: [PATCH 3/3] Add parentheses around DECL_INIT for .original [PR23872]

2024-05-03 Thread Andrew Pinski
On Fri, May 3, 2024 at 4:41 AM Richard Biener
 wrote:
>
> On Thu, May 2, 2024 at 11:40 PM Andrew Pinski  
> wrote:
> >
> > When we have :
> > `void f (int y, int z) { int x = ( z++,y); }`
> >
> > This would have printed the decl's initializer without
> > parentheses, which can cause confusion if you think that is defining
> > another variable rather than the compound expression.
> >
> > This adds parentheses around DECL_INIT if it was a COMPOUND_EXPR.
>
> Looking it seems we'd hit a similar issue for
>
>  foo ((z++,y), 2);
>
> thus in CALL_EXPR context.  Also
>
> int k;
> void foo (int i, int j)
> {
>   k = (i, 2) + j;
> }
>
> dumps as
>
> {
>   k = i, j + 2;;
> }
>
> (ok that's folded to (i, j + 2) but still).
>
> So shouldn't we bite the bullet and wrap all COMPOUND_EXPRs in
> parens instead?  Possibly "tail-calling" the case of
> a, b, c in COMPOUND_EXPR dumping itself?

Let me look into that but it won't be until June due to other things going on.
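
For reference, a small sketch of why the missing parentheses are misleading in
source terms (the function names are made up for this illustration, and `z` is
taken by reference only so the side effect is observable): with parentheses the
comma is the comma operator, without them it separates two declarators.

```cpp
// Parenthesized: a single variable initialized from a comma expression.
// z is incremented and x receives the value of y.
int init_from_comma(int y, int &z) {
  int x = (z++, y);
  return x;
}

// Unparenthesized: the comma separates two declarators, so x receives the
// old value of z and a second, unrelated variable is declared.
int init_two_declarators(int y, int &z) {
  int x = z++, unrelated = y;
  (void)unrelated;
  return x;
}
```

A dump that prints the first form without its parentheses reads like the
second form, which is the confusion being discussed.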

Thanks,
Andrew

>
> Thanks,
> Richard.
>
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * tree-pretty-print.cc (print_declaration): Add parentheses
> > around DECL_INIT if it was a COMPOUND_EXPR.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/tree-pretty-print.cc | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index 825ba74443b..8b766dcd2b8 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -4240,7 +4240,14 @@ print_declaration (pretty_printer *pp, tree t, int 
> > spc, dump_flags_t flags, bool
> >   pp_equal (pp);
> >   pp_space (pp);
> >   if (!(flags & TDF_SLIM))
> > -   dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
> > +   {
> > + bool need_paren = TREE_CODE (DECL_INITIAL (t)) == 
> > COMPOUND_EXPR;
> > + if (need_paren)
> > +   pp_left_paren (pp);
> > + dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
> > + if (need_paren)
> > +   pp_right_paren (pp);
> > +   }
> >   else
> > pp_string (pp, "<<< omitted >>>");
> > }
> > --
> > 2.43.0
> >


Re: [PATCH 2/3] Improve DECL_EXPR printing in .original [PR23872]

2024-05-03 Thread Andrew Pinski
On Fri, May 3, 2024 at 4:36 AM Richard Biener
 wrote:
>
> On Thu, May 2, 2024 at 11:40 PM Andrew Pinski  
> wrote:
> >
> > Right now we don't print that it is a DECL_EXPR, so we get what is
> > basically double output of the decls, which looks confusing.
> > This fixes that.
> > for the simple example:
> > `void foo () { int result = 0;}`
> > This gives:
> > ```
> > {
> >   int result = 0;
> >
> >   DECL_EXPR;
> > }
> > ```
>
> Hmm, I think it would be better if it were
>
>   {
>
> int result = 0;
>   }
>
> so omit the dumping from the BLOCK_VARS(?) when the variable has a DECL_EXPR.
> That more easily lets us spot use-before-DECL_EXPR issues.

Yes, yes that would be definitely better. Let me see if I can figure
that out. I might not be able to get back to this until June though.

>
> So I don't think this patch is an improvement?

Ok, yes I agree.

Thanks,
Andrew

>
> Thanks,
> Richard.
>
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * tree-pretty-print.cc (dump_generic_node ): Print
> > out `DECL_EXPR<...>` around the decl and update the call to
> > print_declaration to pass false for new argument and pass 0
> > for the spacing.
> > (print_declaration): Add argument is_stmt and don't print
> > a semicolon nor the initializer.
> > * tree-pretty-print.h (print_declaration): Add bool argument
> > and default it to true.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/tree-pretty-print.cc | 18 +++---
> >  gcc/tree-pretty-print.h  |  2 +-
> >  2 files changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> > index f9ad8562078..825ba74443b 100644
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -2917,8 +2917,9 @@ dump_generic_node (pretty_printer *pp, tree node, int 
> > spc, dump_flags_t flags,
> >break;
> >
> >  case DECL_EXPR:
> > -  print_declaration (pp, DECL_EXPR_DECL (node), spc, flags);
> > -  is_stmt = false;
> > +  pp_string (pp, "DECL_EXPR<");
> > +  print_declaration (pp, DECL_EXPR_DECL (node), 0, flags, false);
> > +  pp_greater (pp);
> >break;
> >
> >  case COND_EXPR:
> > @@ -4151,10 +4152,11 @@ dump_generic_node (pretty_printer *pp, tree node, 
> > int spc, dump_flags_t flags,
> >return spc;
> >  }
> >
> > -/* Print the declaration of a variable.  */
> > +/* Print the declaration of a variable, T to PP starting with SPC spaces 
> > with FLAGS
> > +   and called as IS_STMT a statement or not.  */
> >
> >  void
> > -print_declaration (pretty_printer *pp, tree t, int spc, dump_flags_t flags)
> > +print_declaration (pretty_printer *pp, tree t, int spc, dump_flags_t 
> > flags, bool is_stmt)
> >  {
> >INDENT (spc);
> >
> > @@ -4162,7 +4164,8 @@ print_declaration (pretty_printer *pp, tree t, int 
> > spc, dump_flags_t flags)
> >  {
> >pp_string(pp, "namelist ");
> >dump_decl_name (pp, t, flags);
> > -  pp_semicolon (pp);
> > +  if (is_stmt)
> > +   pp_semicolon (pp);
> >return;
> >  }
> >
> > @@ -4231,7 +4234,7 @@ print_declaration (pretty_printer *pp, tree t, int 
> > spc, dump_flags_t flags)
> >if (TREE_CODE (t) != FUNCTION_DECL)
> >  {
> >/* Print the initial value.  */
> > -  if (DECL_INITIAL (t))
> > +  if (DECL_INITIAL (t) && is_stmt)
> > {
> >   pp_space (pp);
> >   pp_equal (pp);
> > @@ -4250,7 +4253,8 @@ print_declaration (pretty_printer *pp, tree t, int 
> > spc, dump_flags_t flags)
> >pp_right_bracket (pp);
> >  }
> >
> > -  pp_semicolon (pp);
> > +  if (is_stmt)
> > +pp_semicolon (pp);
> >  }
> >
> >
> > diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
> > index 0da6242629b..660c17410a9 100644
> > --- a/gcc/tree-pretty-print.h
> > +++ b/gcc/tree-pretty-print.h
> > @@ -47,7 +47,7 @@ extern void dump_omp_loop_non_rect_expr (pretty_printer 
> > *, tree, int,
> >  dump_flags_t);
> >  extern void print_omp_context_selector (FILE *, tree, dump_flags_t);
> >  extern int dump_generic_node (pretty_printer *, tree, int, dump_flags_t, 
> > bool);
> > -extern void print_declaration (pretty_printer *, tree, int, dump_flags_t);
> > +extern void print_declaration (pretty_printer *, tree, int, dump_flags_t, 
> > bool = true);
> >  extern int op_code_prio (enum tree_code);
> >  extern int op_prio (const_tree);
> >  extern const char *op_symbol_code (enum tree_code, dump_flags_t = 
> > TDF_NONE);
> > --
> > 2.43.0
> >


Re: [PATCH] c++: 'typename T::X' vs 'struct T::X' lookup [PR109420]

2024-05-03 Thread Andrew Pinski
On Fri, May 3, 2024 at 8:08 AM Patrick Palka  wrote:
>
> Hey Andrew,
>
> On Wed, 5 Apr 2023, Andrew Pinski wrote:
>
> > On Wed, Apr 5, 2023 at 10:32 AM Patrick Palka via Gcc-patches
> >  wrote:
> > >
> > > On Wed, 5 Apr 2023, Patrick Palka wrote:
> > >
> > > > r13-6098-g46711ff8e60d64 made make_typename_type no longer ignore
> > > > non-types during the lookup, unless the TYPENAME_TYPE in question was
> > > > followed by the :: scope resolution operator.  But there is another
> > > > exception to this rule: we need to ignore non-types during the lookup
> > > > also if the TYPENAME_TYPE was named with a tag other than 'typename',
> > > > such as 'struct' or 'enum', as per [dcl.type.elab]/5.
> > > >
> > > > This patch implements this additional exception.  It occurred to me that
> > > > the tf_qualifying_scope flag is probably unnecessary if we'd use the
> > > > scope_type tag more thoroughly, but that requires parser changes that
> > > > are probably too risky at this stage.  (I'm working on addressing the
> > > > FIXME/TODOs here for GCC 14.)
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > > trunk?
> > > >
> > > >   PR c++/109420
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > >   * decl.cc (make_typename_type): Also ignore non-types during
> > > >   the lookup if tag_type is something other than none_type or
> > > >   typename_type.
> > > >   * pt.cc (tsubst) : Pass class_type or
> > > >   enum_type as tag_type to make_typename_type as appropriate
> > > >   instead of always passing typename_type.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * g++.dg/template/typename27.C: New test.
> > > > ---
> > > >  gcc/cp/decl.cc |  9 -
> > > >  gcc/cp/pt.cc   |  9 -
> > > >  gcc/testsuite/g++.dg/template/typename27.C | 19 +++
> > > >  3 files changed, 35 insertions(+), 2 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.dg/template/typename27.C
> > > >
> > > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > > > index 5369714f9b3..a0a20c5accc 100644
> > > > --- a/gcc/cp/decl.cc
> > > > +++ b/gcc/cp/decl.cc
> > > > @@ -4307,7 +4307,14 @@ make_typename_type (tree context, tree name, 
> > > > enum tag_types tag_type,
> > > >   lookup will stop when we hit a dependent base.  */
> > > >if (!dependent_scope_p (context))
> > > >  {
> > > > -  bool want_type = (complain & tf_qualifying_scope);
> > > > +  /* As per [dcl.type.elab]/5 and [temp.res.general]/3, ignore 
> > > > non-types if
> > > > +  the tag corresponds to a class-key or 'enum' (or is scope_type), 
> > > > or if
> > > > +  this typename is followed by :: as per 
> > > > [basic.lookup.qual.general]/1.
> > > > +  TODO: If we'd set the scope_type tag accurately on all 
> > > > TYPENAME_TYPEs
> > > > +  that are followed by :: then we wouldn't need the 
> > > > tf_qualifying_scope
> > > > +  flag.  */
> > > > +  bool want_type = (tag_type != none_type && tag_type != 
> > > > typename_type)
> > > > + || (complain & tf_qualifying_scope);
> > >
> > > Here's v2 which just slightly improves this comment.  I reckon 
> > > [basic.lookup.elab]
> > > is a better reference than [dcl.type.elab]/5 for justifying why the
> > > lookup should be type-only for class-key and 'enum' TYPENAME_TYPEs.
> > >
> > > -- >8 --
> > >
> > > PR c++/109420
> > >
> > > gcc/cp/ChangeLog:
> > >
> > > * decl.cc (make_typename_type): Also ignore non-types during the
> > > lookup if tag_type corresponds to an elaborated-type-specifier.
> > > * pt.cc (tsubst) : Pass class_type or
> > > enum_type as tag_type to make_typename_type as appropriate
> > > instead of always passing typename_type.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.dg/template/typename27.C: New test.
> > > ---
> > >  gcc/c

RE: [PATCH] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-03 Thread Andrew Pinski (QUIC)


> -Original Message-
> From: Jeff Law 
> Sent: Friday, May 3, 2024 7:53 AM
> To: Andrew Pinski (QUIC) ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] DCE __cxa_atexit calls where the function is pure/const
> [PR19661]
> 
> 
> 
> On 5/2/24 3:56 PM, Andrew Pinski wrote:
> > In C++ sometimes you have a deconstructor function which is "empty",
> > like for an example with unions or with arrays.  The front-end might
> > not know it is empty either so this should be done on during
> > optimization.o To implement it I added it to DCE where we mark if a
> statement is necessary or not.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > PR tree-optimization/19661
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-dce.cc (is_cxa_atexit): New function.
> > (is_removable_cxa_atexit_call): New function.
> > (mark_stmt_if_obviously_necessary): Don't mark removable
> > cxa_at_exit calls.
> > (mark_all_reaching_defs_necessary_1): Likewise.
> > (propagate_necessity): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/tree-ssa/cxa_atexit-1.C: New test.
> > * g++.dg/tree-ssa/cxa_atexit-2.C: New test.
> > * g++.dg/tree-ssa/cxa_atexit-3.C: New test.
> > * g++.dg/tree-ssa/cxa_atexit-4.C: New test.
> > * g++.dg/tree-ssa/cxa_atexit-5.C: New test.
> OK

I had 2 issues reported to me before I pushed this, so I am going to fix/check
on them before pushing.
The first one is that the testcase fails on arm-linux-eabi since it uses
__aeabi_atexit rather than __cxa_atexit (I think the order of arguments for that
function is slightly different too).
The second one is making sure the function will bind locally (or the user had
the attribute on the function).
I should have a new patch Monday or Tuesday.

Thanks,
Andrew Pinski

> jeff



[PATCH] AARCH64: Add Qualcomm oryon-1 core

2024-05-03 Thread Andrew Pinski
This patch adds Qualcomm's new oryon-1 core; this is enough
to recognize the core, and the tuning structure will be added later.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (oryon-1): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi  (AArch64 Options): Document oryon-1.

Signed-off-by: Andrew Pinski 
Co-authored-by: Joel Jones 
Co-authored-by: Wei Zhao 
---
 gcc/config/aarch64/aarch64-cores.def | 5 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index f69fc212d56..be60929e400 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, 
V8_4A,  (SVE, I8MM, B
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO), saphira, 
  0x51, 0xC01, -1)
 
+/* ARMv8.6-A Architecture Processors.  */
+
+/* Qualcomm ('Q') cores. */
+AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, F16), 
cortexa72,   0x51, 0x001, -1)
+
 /* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, V8A,  
(CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index abd3c9e0822..ba940f1c890 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..eabe09dc28f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21323,6 +21323,7 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
 @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
+@samp{oryon-1},
 @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
 @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
 @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
-- 
2.43.0



Re: [PATCH] [ranger] Force buffer alignment in Value_Range [PR114912]

2024-05-03 Thread Andrew Pinski
On Fri, May 3, 2024 at 2:24 AM Aldy Hernandez  wrote:
>
> Sparc requires strict alignment and is choking on the byte vector in
> Value_Range.  Is this the right approach, or is there a more canonical
> way of forcing alignment?

I think the suggestion was to change over to use a union and use the
types directly in the union (anonymous unions and unions containing
non-PODs are part of C++11).
That is:
union {
  int_range_max int_range;
  frange float_range;
  unsupported_range un_range;
};
...
m_vrange = new (&int_range) int_range_max ();
...

Also the canonical way of forcing alignment in C++ is to use alignas,
as my patch in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912
did.
Also I suspect the required alignment is not word alignment but rather the
alignment of HOST_WIDE_INT, which is not always the same as the
alignment of a pointer but can be bigger, and that is why it is failing on
sparc (32-bit rather than 64-bit).
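
A minimal sketch of the union approach, using hypothetical stand-in types
(`Range`, `IntRange`, `FloatRange`, `Holder`) rather than the real
`int_range_max`/`frange` classes. Because the members live directly in the
union, the storage should pick up the strictest member alignment without a
cast or an explicit `alignas`:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-ins for the range classes; not GCC's real types.
struct Range {
  virtual ~Range() {}
  virtual int kind() const = 0;
};
struct IntRange : Range {
  std::int64_t lo = 0, hi = 0;   // 64-bit members, like HOST_WIDE_INT
  int kind() const override { return 1; }
};
struct FloatRange : Range {
  double lo = 0.0, hi = 0.0;
  int kind() const override { return 2; }
};

class Holder {
public:
  explicit Holder(bool want_int) {
    // Placement new into a union member makes that member the active one.
    if (want_int)
      m_range = new (&m_int) IntRange();
    else
      m_range = new (&m_float) FloatRange();
  }
  // The union has non-trivial members, so destroy the active one by hand.
  ~Holder() { m_range->~Range(); }
  Holder(const Holder &) = delete;
  Holder &operator=(const Holder &) = delete;

  const Range *get() const { return m_range; }

private:
  Range *m_range;
  // Anonymous union with non-trivial members (allowed since C++11).
  union {
    IntRange m_int;
    FloatRange m_float;
  };
};
```

The trade-off is that the enclosing class has to provide its own constructor
and destructor, since the compiler cannot know which union member is active.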

Thanks,
Andrew Pinski

>
> If this is correct, OK for trunk?
>
> gcc/ChangeLog:
>
> * value-range.h (class Value_Range): Use a union.
> ---
>  gcc/value-range.h | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 934eec9e386..31af7888018 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -740,9 +740,14 @@ private:
>void init (const vrange &);
>
>vrange *m_vrange;
> -  // The buffer must be at least the size of the largest range.
> -  static_assert (sizeof (int_range_max) > sizeof (frange), "");
> -  char m_buffer[sizeof (int_range_max)];
> +  union {
> +// The buffer must be at least the size of the largest range, and
> +// be aligned on a word boundary for strict alignment targets
> +// such as sparc.
> +static_assert (sizeof (int_range_max) > sizeof (frange), "");
> +char m_buffer[sizeof (int_range_max)];
> +void *align;
> +  } u;
>  };
>
>  // The default constructor is uninitialized and must be initialized
> @@ -816,11 +821,11 @@ Value_Range::init (tree type)
>gcc_checking_assert (TYPE_P (type));
>
>if (irange::supports_p (type))
> -m_vrange = new (&m_buffer) int_range_max ();
> +m_vrange = new (&u.m_buffer) int_range_max ();
>else if (frange::supports_p (type))
> -m_vrange = new (&m_buffer) frange ();
> +m_vrange = new (&u.m_buffer) frange ();
>else
> -m_vrange = new (&m_buffer) unsupported_range ();
> +m_vrange = new (&u.m_buffer) unsupported_range ();
>  }
>
>  // Initialize object with a copy of R.
> @@ -829,11 +834,12 @@ inline void
>  Value_Range::init (const vrange &r)
>  {
>if (is_a <irange> (r))
> -m_vrange = new (&m_buffer) int_range_max (as_a <irange> (r));
> +m_vrange = new (&u.m_buffer) int_range_max (as_a <irange> (r));
>else if (is_a <frange> (r))
> -m_vrange = new (&m_buffer) frange (as_a <frange> (r));
> +m_vrange = new (&u.m_buffer) frange (as_a <frange> (r));
>else
> -m_vrange = new (&m_buffer) unsupported_range (as_a <unsupported_range> 
> (r));
> +m_vrange
> +  = new (&u.m_buffer) unsupported_range (as_a <unsupported_range> (r));
>  }
>
>  // Assignment operator.  Copying incompatible types is allowed.  That
> --
> 2.44.0
>


[PATCH] DCE __cxa_atexit calls where the function is pure/const [PR19661]

2024-05-02 Thread Andrew Pinski
In C++ sometimes you have a destructor function which is "empty", for
example with unions or with arrays.  The front-end might not know it is
empty either, so this should be done during optimization.
To implement it I added it to DCE where we mark if a statement is necessary
or not.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/19661

gcc/ChangeLog:

* tree-ssa-dce.cc (is_cxa_atexit): New function.
(is_removable_cxa_atexit_call): New function.
(mark_stmt_if_obviously_necessary): Don't mark removable
cxa_at_exit calls.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/cxa_atexit-1.C: New test.
* g++.dg/tree-ssa/cxa_atexit-2.C: New test.
* g++.dg/tree-ssa/cxa_atexit-3.C: New test.
* g++.dg/tree-ssa/cxa_atexit-4.C: New test.
* g++.dg/tree-ssa/cxa_atexit-5.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C | 20 +
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C | 21 ++
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C | 19 +
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C | 20 +
 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C | 39 +
 gcc/tree-ssa-dce.cc  | 44 
 6 files changed, 163 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-5.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
new file mode 100644
index 000..1f5f431c7e4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-1.C
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to atexit should be removed as A::~A() is a pure/const function 
call
+   and there is no visible effect if A::~A() call does not happen.  */
+
+struct A { 
+A(); 
+~A() {} 
+}; 
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-times "Deleting : __cxxabiv1::__cxa_atexit" 1 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-not "__cxa_atexit" "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
new file mode 100644
index 000..4d0656b455c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-2.C
@@ -0,0 +1,21 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to atexit should not be removed as A::~A() is marked with 
noipa.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+[[gnu::noipa]] A::~A() {}
+ 
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : __cxxabiv1::__cxa_atexit" 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-times "__cxa_atexit" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
new file mode 100644
index 000..03a19209661
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-3.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* We should not remove the call to atexit as A::~A is unknown.  */
+
+struct A { 
+A(); 
+~A();
+}; 
+
+void foo () { 
+  static A a; 
+} 
+
+/* { dg-final { scan-tree-dump-not "Deleting : __cxxabiv1::__cxa_atexit" 
"cddce1" } } */
+/* { dg-final { scan-tree-dump-times "__cxa_atexit" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C 
b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
new file mode 100644
index 000..b85a7efd16b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-4.C
@@ -0,0 +1,20 @@
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized -w" } */
+// { dg-require-effective-target cxa_atexit }
+/* PR tree-optimization/19661 */
+
+/* The call to atexit should be removed as A::~A() is a pure/const function 
call
+   and there is no visible effect if A::~A() call does not happen.  */
+
+struct A { 
+A(); 
+[[gnu::pure]] ~A();
+}

[PATCH 2/3] Improve DECL_EXPR printing in .original [PR23872]

2024-05-02 Thread Andrew Pinski
Right now we don't print that a declaration is a DECL_EXPR, and we get
what is basically double output of the decls, which looks confusing.
This fixes that.
For the simple example:
`void foo () { int result = 0;}`
This gives:
```
{
  int result = 0;

  DECL_EXPR<int result>;
}
```

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-pretty-print.cc (dump_generic_node <case DECL_EXPR>): Print
out `DECL_EXPR<...>` around the decl and update the call to
print_declaration to pass false for new argument and pass 0
for the spacing.
(print_declaration): Add argument is_stmt and don't print
a semicolon nor the initializer.
* tree-pretty-print.h (print_declaration): Add bool argument
and default it to true.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-pretty-print.cc | 18 +++---
 gcc/tree-pretty-print.h  |  2 +-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index f9ad8562078..825ba74443b 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -2917,8 +2917,9 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   break;
 
 case DECL_EXPR:
-  print_declaration (pp, DECL_EXPR_DECL (node), spc, flags);
-  is_stmt = false;
+  pp_string (pp, "DECL_EXPR<");
+  print_declaration (pp, DECL_EXPR_DECL (node), 0, flags, false);
+  pp_greater (pp);
   break;
 
 case COND_EXPR:
@@ -4151,10 +4152,11 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   return spc;
 }
 
-/* Print the declaration of a variable.  */
+/* Print the declaration of a variable, T to PP starting with SPC spaces with 
FLAGS
+   and called as IS_STMT a statement or not.  */
 
 void
-print_declaration (pretty_printer *pp, tree t, int spc, dump_flags_t flags)
+print_declaration (pretty_printer *pp, tree t, int spc, dump_flags_t flags, 
bool is_stmt)
 {
   INDENT (spc);
 
@@ -4162,7 +4164,8 @@ print_declaration (pretty_printer *pp, tree t, int spc, 
dump_flags_t flags)
 {
   pp_string(pp, "namelist ");
   dump_decl_name (pp, t, flags);
-  pp_semicolon (pp);
+  if (is_stmt)
+   pp_semicolon (pp);
   return;
 }
 
@@ -4231,7 +4234,7 @@ print_declaration (pretty_printer *pp, tree t, int spc, 
dump_flags_t flags)
   if (TREE_CODE (t) != FUNCTION_DECL)
 {
   /* Print the initial value.  */
-  if (DECL_INITIAL (t))
+  if (DECL_INITIAL (t) && is_stmt)
{
  pp_space (pp);
  pp_equal (pp);
@@ -4250,7 +4253,8 @@ print_declaration (pretty_printer *pp, tree t, int spc, 
dump_flags_t flags)
   pp_right_bracket (pp);
 }
 
-  pp_semicolon (pp);
+  if (is_stmt)
+pp_semicolon (pp);
 }
 
 
diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index 0da6242629b..660c17410a9 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -47,7 +47,7 @@ extern void dump_omp_loop_non_rect_expr (pretty_printer *, 
tree, int,
 dump_flags_t);
 extern void print_omp_context_selector (FILE *, tree, dump_flags_t);
 extern int dump_generic_node (pretty_printer *, tree, int, dump_flags_t, bool);
-extern void print_declaration (pretty_printer *, tree, int, dump_flags_t);
+extern void print_declaration (pretty_printer *, tree, int, dump_flags_t, bool 
= true);
 extern int op_code_prio (enum tree_code);
 extern int op_prio (const_tree);
 extern const char *op_symbol_code (enum tree_code, dump_flags_t = TDF_NONE);
-- 
2.43.0



[PATCH 3/3] Add parentheses around DECL_INIT for .original [PR23872]

2024-05-02 Thread Andrew Pinski
When we have:
`void f (int y, int z) { int x = ( z++,y); }`

This would have printed the decl's initializer without
parentheses, which can cause confusion if you read it as defining
another variable rather than as a compound expression.

This adds parentheses around DECL_INIT if it is a COMPOUND_EXPR.
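
To see why the parentheses matter, compare the two spellings in plain C++.
This is only an illustration and not part of the patch; `with_parens` and
`without_parens` are names made up here. Without parentheses the comma is a
declarator separator, so an unparenthesized dump reads as though a second
variable were being declared:

```cpp
#include <cassert>

// The initializer is a compound (comma) expression: z is incremented,
// then x is initialized from y.
int with_parens(int y, int &z) {
  int x = (z++, y);
  return x;
}

// Without parentheses the comma separates declarators: this declares
// x (initialized from z++) and a second, uninitialized variable w.
int without_parens(int &z) {
  int x = z++, w;
  w = 0;  // give w a value so it is never read uninitialized
  return x + w;
}
```

With `z == 2` on entry, `with_parens (1, z)` yields `y`, while
`without_parens (z)` yields the old value of `z`; both leave `z` incremented.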

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-pretty-print.cc (print_declaration): Add parentheses
around DECL_INIT if it was a COMPOUND_EXPR.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-pretty-print.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 825ba74443b..8b766dcd2b8 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -4240,7 +4240,14 @@ print_declaration (pretty_printer *pp, tree t, int spc, 
dump_flags_t flags, bool
  pp_equal (pp);
  pp_space (pp);
  if (!(flags & TDF_SLIM))
-   dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
+   {
+ bool need_paren = TREE_CODE (DECL_INITIAL (t)) == COMPOUND_EXPR;
+ if (need_paren)
+   pp_left_paren (pp);
+ dump_generic_node (pp, DECL_INITIAL (t), spc, flags, false);
+ if (need_paren)
+   pp_right_paren (pp);
+   }
  else
pp_string (pp, "<<< omitted >>>");
}
-- 
2.43.0



[PATCH 1/3] Fix printing COMPOUND_EXPR in .original [PR23872]

2024-05-02 Thread Andrew Pinski
Starting with the merge of the openmp branch into the trunk
(r0-73077-g953ff28998b59b), COMPOUND_EXPR started to be printed
as `expr; , expr` which is wrong. This was due to the wrong
conversion of dumping_stmts into `!(flags & TDF_SLIM)`. That is wrong
as we are not dumping stmts at this point (`!(flags & TDF_SLIM)` was always
true for this case as the TDF_SLIM case was handled beforehand). So switch it
to be always false.
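
The intended output shape can be sketched outside GCC with a toy expression
type. `Expr` and `print` below are hypothetical and are not GCC's tree API;
the loop only mirrors the idea of walking the right-nested chain and printing
each operand as an expression, separated by `, `, with no statement
terminator in between:

```cpp
#include <cassert>
#include <string>

// Toy expression node: a leaf has a name and null children,
// a comma node has both children set.
struct Expr {
  const char *leaf;
  const Expr *lhs;
  const Expr *rhs;
};

// Print a right-nested chain of comma nodes as "a, b, c".
std::string print(const Expr &e) {
  if (!e.lhs)
    return e.leaf;
  std::string out = print(*e.lhs) + ", ";
  const Expr *tp = e.rhs;
  // Walk down the right operands while they are still comma nodes.
  for (; tp->lhs; tp = tp->rhs)
    out += print(*tp->lhs) + ", ";
  return out + print(*tp);
}
```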

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/23872
* tree-pretty-print.cc (dump_generic_node <case COMPOUND_EXPR>): Fix
calls to dump_generic_node and also remove unreachable code that is 
testing
`flags & TDF_SLIM`.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/atomic-21.f90: Update testcase for the removal of 
`;`.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gfortran.dg/gomp/atomic-21.f90 |  4 ++--
 gcc/tree-pretty-print.cc | 24 ++--
 2 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/atomic-21.f90 
b/gcc/testsuite/gfortran.dg/gomp/atomic-21.f90
index febcdbbacfb..35099294d7a 100644
--- a/gcc/testsuite/gfortran.dg/gomp/atomic-21.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/atomic-21.f90
@@ -56,7 +56,7 @@ subroutine foobar()
 endif
 
 !  TARGET_EXPR  = #pragma omp atomic capture acq_rel
-!TARGET_EXPR  = NON_LVALUE_EXPR  = 
*TARGET_EXPR  == oo> ? pp : *TARGET_EXPR ;, if 
(TARGET_EXPR )
+!TARGET_EXPR  = NON_LVALUE_EXPR  = 
*TARGET_EXPR  == oo> ? pp : *TARGET_EXPR , if 
(TARGET_EXPR )
 !{
 !  <<< Unknown tree: void_cst >>>
 !}
@@ -66,7 +66,7 @@ subroutine foobar()
 !};
 !
 ! { dg-final { scan-tree-dump-times "TARGET_EXPR  = #pragma omp 
atomic capture acq_rel" 1 "original" } }
-! { dg-final { scan-tree-dump-times "TARGET_EXPR  = 
NON_LVALUE_EXPR  = \\*TARGET_EXPR  
== oo> \\? pp : \\*TARGET_EXPR ;, if \\(TARGET_EXPR 
\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "TARGET_EXPR  = 
NON_LVALUE_EXPR  = \\*TARGET_EXPR  
== oo> \\? pp : \\*TARGET_EXPR , if \\(TARGET_EXPR 
\\)" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "<<< Unknown tree: void_cst >>>" 1 
"original" } }
 ! { dg-final { scan-tree-dump-times "qq = TARGET_EXPR ;" 1 
"original" } }
 
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index c935a7da7d1..f9ad8562078 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -2856,31 +2856,21 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
  }
 
dump_generic_node (pp, TREE_OPERAND (node, 0),
-  spc, flags, !(flags & TDF_SLIM));
-   if (flags & TDF_SLIM)
- newline_and_indent (pp, spc);
-   else
- {
-   pp_comma (pp);
-   pp_space (pp);
- }
+  spc, flags, false);
+   pp_comma (pp);
+   pp_space (pp);
 
   for (tp = &TREE_OPERAND (node, 1);
 TREE_CODE (*tp) == COMPOUND_EXPR;
 tp = &TREE_OPERAND (*tp, 1))
  {
dump_generic_node (pp, TREE_OPERAND (*tp, 0),
-  spc, flags, !(flags & TDF_SLIM));
-   if (flags & TDF_SLIM)
- newline_and_indent (pp, spc);
-   else
- {
-   pp_comma (pp);
-   pp_space (pp);
- }
+  spc, flags, false);
+   pp_comma (pp);
+   pp_space (pp);
  }
 
-   dump_generic_node (pp, *tp, spc, flags, !(flags & TDF_SLIM));
+   dump_generic_node (pp, *tp, spc, flags, false);
   }
   break;
 
-- 
2.43.0



Re: [COMMITTED] Reduce startup costs for Value_Range.

2024-05-01 Thread Andrew Pinski
On Wed, May 1, 2024 at 7:40 PM Ian Lance Taylor  wrote:
>
> On Wed, May 1, 2024 at 12:43 AM Aldy Hernandez  wrote:
> >
> > gcc/ChangeLog:
> >
> > * ipa-fnsummary.cc (evaluate_properties_for_edge): Initialize 
> > Value_Range's.
> > * value-range.h (class Value_Range): Add a buffer and remove
> > m_irange and m_frange.
> > (Value_Range::Value_Range): Call init.
> > (Value_Range::set_type): Same.
> > (Value_Range::init): Use in place new to initialize buffer.
> > (Value_Range::operator=): Tidy.
>
>
> I'm seeing a crash building on sparc-sun-solaris2.11 that may be due
> to this change.  The crash occurs in stage 1, the first time the newly
> built compiler is used.
>
> ./xgcc -B./ -B/var/gcc/iant/install/sparc-sun-solaris2.11/bin/
> -isystem /var/gcc/iant/install/sparc-sun-solaris2.11/include -isystem
> /var/gcc/iant/install/sparc-sun-solaris2.11/sys-include
> -L/var/gcc/iant/bootstrap/gcc/../ld  -xc -nostdinc /dev/null -S -o
> /dev/null -fself-test=../../gcc/gcc/testsuite/selftests
> In function ‘test_fn’:
> cc1: internal compiler error: Bus Error
> 0x1c7db03 crash_signal
> ../../gcc/gcc/toplev.cc:319
> 0x104a82c void wi::copy generic_wide_int >
> >(wide_int_storage&, generic_wide_int false> > const&)
> ../../gcc/gcc/wide-int.h:2191
> 0x1049da3 wide_int_storage&
> wide_int_storage::operator=(wi::hwi_with_prec
> const&)
> ../../gcc/gcc/wide-int.h:1247
> 0x104929b generic_wide_int&
> generic_wide_int::operator=(wi::hwi_with_prec
> const&)
> ../../gcc/gcc/wide-int.h:1002
> 0x104757f irange_bitmask::set_unknown(unsigned int)
> ../../gcc/gcc/value-range.h:163
> 0x1047b6f irange::set_varying(tree_node*)
> ../../gcc/gcc/value-range.h:1067
> 0x1774d1b Value_Range::set_varying(tree_node*)
> ../../gcc/gcc/value-range.h:720
> 0x1aef213 range_cast(vrange&, tree_node*)
> ../../gcc/gcc/range-op.h:248
> 0x1ada517 operator_lshift::op1_range(irange&, tree_node*, irange
> const&, irange const&, relation_trio) const
> ../../gcc/gcc/range-op.cc:2706
> 0x1aeaa6b range_op_lshift_tests
> ../../gcc/gcc/range-op.cc:4750
> 0x1aee20f selftest::range_op_tests()
> ../../gcc/gcc/range-op.cc:4887
> 0x2dfaa37 test_ranges
> ../../gcc/gcc/function-tests.cc:585
> 0x2dfb337 selftest::function_tests_cc_tests()
> ../../gcc/gcc/function-tests.cc:681
> 0x308a027 selftest::run_tests()
> ../../gcc/gcc/selftest-run-tests.cc:108
> 0x1c833ef toplev::run_self_tests()
> ../../gcc/gcc/toplev.cc:2213
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> make: *** [../../gcc/gcc/c/Make-lang.in:153: s-selftest-c] Error 1

This was also reported here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912

The same question applies really: what compiler are you using to
compile GCC with? I suspect this is making a difference. It might also
be that the sparc compiler that both of you are using is generating wrong
code for some more complex C++ code even though it is at -O0.
Adding the destructor to Value_Range might be causing the
structure to become a "non-POD" with different argument passing, which
was broken even at -O0 (this is just a guess).

Thanks,
Andrew Pinski

>
> Ian


[PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-04-30 Thread Andrew Pinski
This adds a few more of what is currently done in phiopt's value_replacement
to match. I noticed this when I was hooking up phiopt's value_replacement
code to use match and disabling the old code. But this can be done
independently of hooking up phiopt's value_replacement, as phiopt
is already hooked up to match for the simplified versions.

/* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
/* a != 0 ? a * b : 0 -> a * b */
/* a != 0 ? a & b : 0 -> a & b */

We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` which
uses that form.
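
The reason these are safe is that when `a` is 0, `a * b`, `a & b` and
`a / b` (for nonzero `b`) are all 0 already, so the conditional adds nothing.
A small self-check in plain C++ (the function names are made up for this
illustration, and the ranges are kept small to stay clear of signed
overflow):

```cpp
#include <cassert>

// Conditional forms of the shape the new patterns match.
int mul_cond(int a, int b) { return a != 0 ? a * b : 0; }
int and_cond(int a, int b) { return a != 0 ? (a & b) : 0; }
int div_cond(int a, int b) { return a != 0 ? a / b : 0; }  // caller ensures b != 0

// Compare each conditional form with its unconditional counterpart
// over a small range of inputs.
bool forms_agree() {
  for (int a = -8; a <= 8; ++a)
    for (int b = -8; b <= 8; ++b) {
      if (mul_cond(a, b) != a * b)
        return false;
      if (and_cond(a, b) != (a & b))
        return false;
      if (b != 0 && div_cond(a, b) != a / b)
        return false;
    }
  return true;
}
```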

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114894

gcc/ChangeLog:

* match.pd (`a != 0 ? a / b : 0`): New pattern.
(`a != 0 ? a * b : 0`): New pattern.
(`a != 0 ? a & b : 0`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-5.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  | 18 +
 .../gcc.dg/tree-ssa/phi-opt-value-5.c | 39 +++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-5.c

diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e6..03a03c31233 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4290,6 +4290,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (eq @0 integer_all_onesp) @1 (op:c@2 @1 @0))
@2))
 
+/* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
+(for op (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (cond (ne @0 integer_zerop) (op@2 @3 @1) integer_zerop )
+   (if (bitwise_equal_p (@0, @3)
+&& tree_expr_nonzero_p (@1))
+@2)))
+
+/* Note we prefer the != case here
+   as (a != 0) * (a * b) will generate that version. */
+/* a != 0 ? a * b : 0 -> a * b */
+/* a != 0 ? a & b : 0 -> a & b */
+(for op (mult bit_and)
+ (simplify
+  (cond (ne @0 integer_zerop) (op:c@2 @1 @3) integer_zerop)
+  (if (bitwise_equal_p (@0, @3))
+   @2)))
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-5.c
new file mode 100644
index 000..8062eb19b11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-value-5.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* PR tree-optimization/114894 */
+/* Phi-OPT should be able to optimize these without sinking being invoked. */
+/* { dg-options "-O -fdump-tree-phiopt2 -fdump-tree-phiopt3 
-fdump-tree-optimized -fno-tree-sink" } */
+
+int fmul1(int a, int b)
+{
+  int c = a * b;
+  if (a != 0)
+return c;
+  return 0;
+}
+
+
+int fand1(int a, int b)
+{
+  int c = a & b;
+  if (a != 0)
+return c;
+  return 0;
+}
+
+
+void g(int);
+
+int fdiv1(int a, int b)
+{
+  int d = b|1;
+  g(d);
+  int c = a / d;
+  return a != 0 ? c : 0;
+}
+
+/* fdiv1 requires until later than phiopt2 to be able to detect that
+   d is non-zero. to be able to remove the conditional.  */
+/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not "goto" "phiopt3" } } */
+/* { dg-final { scan-tree-dump-not "goto" "optimized" } } */
+
-- 
2.43.0



Re: [PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

2024-04-30 Thread Andrew Pinski
On Tue, Apr 30, 2024 at 11:54 AM Jason Merrill  wrote:
>
> On 2/20/24 19:06, Andrew Pinski wrote:
> > After r7-987-gf17a223de829cb, the access for the elements of a vector type 
> > would lose the qualifiers.
> > So if we had `constvector[0]`, the type of the element of the array would 
> > not have const on it.
> > This was due to a missing build_qualified_type for the inner type of the 
> > vector when building the array type.
> > We need to add back the call to build_qualified_type and now the access has 
> > the correct qualifiers. So
> > overload resolution and lvalue/rvalue classification are now done correctly.
> >
> > Note we correctly now reject the testcase gcc.dg/pr83415.c which was 
> > incorrectly accepted after r7-987-gf17a223de829cb.
> >
> > Built and tested for aarch64-linux-gnu.
> >
> >   PR c++/89224
> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-common.cc (convert_vector_to_array_for_subscript): Call 
> > build_qualified_type
> >   for the inner type.
> >
> > gcc/cp/ChangeLog:
> >
> >   * constexpr.cc (cxx_eval_array_reference): Compare main variants
> >   for the vector/array types instead of the types directly.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/torture/vector-subaccess-1.C: New test.
> >   * gcc.dg/pr83415.c: Change warning to error.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >   gcc/c-family/c-common.cc  |  7 +-
> >   gcc/cp/constexpr.cc   |  3 ++-
> >   .../g++.dg/torture/vector-subaccess-1.C   | 23 +++
> >   gcc/testsuite/gcc.dg/pr83415.c|  2 +-
> >   4 files changed, 32 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> >
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index e15eff698df..884dd9043f9 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -8936,6 +8936,7 @@ convert_vector_to_array_for_subscript (location_t loc,
> > if (gnu_vector_type_p (TREE_TYPE (*vecp)))
> >   {
> > tree type = TREE_TYPE (*vecp);
> > +  tree newitype;
> >
> > ret = !lvalue_p (*vecp);
> >
> > @@ -8950,8 +8951,12 @@ convert_vector_to_array_for_subscript (location_t 
> > loc,
> >for function parameters.  */
> > c_common_mark_addressable_vec (*vecp);
> >
> > +  /* Make sure qualifiers are copied from the vector type to the new 
> > element
> > +  of the array type.  */
> > +  newitype = build_qualified_type (TREE_TYPE (type), TYPE_QUALS 
> > (type));
> > +
> > *vecp = build1 (VIEW_CONVERT_EXPR,
> > -   build_array_type_nelts (TREE_TYPE (type),
> > +   build_array_type_nelts (newitype,
> > TYPE_VECTOR_SUBPARTS (type)),
> > *vecp);
> >   }
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index fa346fe01c9..1fe91d16e8e 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -4421,7 +4421,8 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, 
> > tree t,
> > if (!lval
> > && TREE_CODE (ary) == VIEW_CONVERT_EXPR
> > && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (ary, 0)))
> > -  && TREE_TYPE (t) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0))))
> > +  && TYPE_MAIN_VARIANT (TREE_TYPE (t))
> > +   == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 
> > 0)))))
>
> Please add parens around the == expression so the formatting is stable.
ok, I will make that change.

>
> With that change, OK for trunk and release branches.

For the GCC 14 branch, should I wait until after the release? RC1 is
going out today and I am not sure this counts as a show stopper
issue.

Thanks,
Andrew

>
> Jason
>


[COMMITTED] Fix the build: error message `quote`

2024-04-30 Thread Andrew Pinski
The problem here is that the quote mark is an English possessive
apostrophe rather than a quote, but the error message format
detection is too simple to tell the difference, so it warns, and
with -Werror that breaks the build.

Committed as obvious after a quick build.

gcc/ChangeLog:

* tree-cfg.cc (verify_gimple_assign): Remove quote
mark to shut up the warning.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-cfg.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 1c5b7df8541..b2d47b72084 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4842,7 +4842,7 @@ verify_gimple_assign (gassign *stmt)
   tree lhs = gimple_assign_lhs (stmt);
   if (is_gimple_reg (lhs))
{
- error ("nontemporal store's lhs cannot be a gimple register");
+ error ("nontemporal store lhs cannot be a gimple register");
  debug_generic_stmt (lhs);
  return true;
}
-- 
2.43.0



[PATCH] PHIOPT: Value-replacement check undef

2024-04-29 Thread Andrew Pinski
While moving the value replacement part of PHIOPT over
to use match-and-simplify, I ran into the case where
a conditional use of an undef would become
unconditional. This prevents that. I can't remember at this
point what the testcase was though.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (value_replacement): Reject undef variables
so they don't become unconditionally used.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index a2bdcb5eae8..f166c3132cb 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1146,6 +1146,13 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   if (code != NE_EXPR && code != EQ_EXPR)
 return 0;
 
+  /* Do not make conditional undefs unconditional.  */
+  if ((TREE_CODE (arg0) == SSA_NAME
+   && ssa_name_maybe_undef_p (arg0))
+  || (TREE_CODE (arg1) == SSA_NAME
+ && ssa_name_maybe_undef_p (arg1)))
+return false;
+
   /* If the type says honor signed zeros we cannot do this
  optimization.  */
   if (HONOR_SIGNED_ZEROS (arg1))
-- 
2.43.0



[PATCH 2/2] PHI-OPT: speed up value_replacement slightly

2024-04-28 Thread Andrew Pinski
This adds a few early outs to value_replacement that I noticed
while rewriting this to use match-and-simplify but could be committed
separately.
* virtual operands won't change so return early for them
* special case `A ? B : B` as that is already just `B`

Also moves the check for NE/EQ earlier as calculating empty_or_with_defined_p
is an IR walk for a BB and that might be big.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (value_replacement): Move check for
NE/EQ earlier.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f1e07502b02..a2bdcb5eae8 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1131,6 +1131,21 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   enum tree_code code;
   bool empty_or_with_defined_p = true;
 
+  /* Virtual operands don't need to be handled. */
+  if (virtual_operand_p (arg1))
+return 0;
+
+  /* Special case A ? B : B as this will always simplify to B. */
+  if (operand_equal_for_phi_arg_p (arg0, arg1))
+return 0;
+
+  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
+  code = gimple_cond_code (cond);
+
+  /* This transformation is only valid for equality comparisons.  */
+  if (code != NE_EXPR && code != EQ_EXPR)
+return 0;
+
   /* If the type says honor signed zeros we cannot do this
  optimization.  */
   if (HONOR_SIGNED_ZEROS (arg1))
@@ -1161,13 +1176,6 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
empty_or_with_defined_p = false;
 }
 
-  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
-  code = gimple_cond_code (cond);
-
-  /* This transformation is only valid for equality comparisons.  */
-  if (code != NE_EXPR && code != EQ_EXPR)
-return 0;
-
   /* We need to know which is the true edge and which is the false
   edge so that we know if have abs or negative abs.  */
   extract_true_false_edges_from_block (cond_bb, &true_edge, &false_edge);
-- 
2.43.0



[PATCH 1/2] MATCH: change single_non_singleton_phi_for_edges for singleton phis

2024-04-28 Thread Andrew Pinski
I noticed that single_non_singleton_phi_for_edges could
return a phi whose entries are all the same for the edges.
This happens only if there was a single phi in the first place.
Also, gimple_seq_singleton_p walks the sequence to see if it has exactly
one element, so removing that check actually
reduces the number of pointer walks needed.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges):
Remove the special case of gimple_seq_singleton_p.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiopt.cc | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index d1746c4b468..f1e07502b02 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -62,14 +62,6 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge e0, 
edge e1)
 {
   gimple_stmt_iterator i;
   gphi *phi = NULL;
-  if (gimple_seq_singleton_p (seq))
-{
-  phi = as_a <gphi *> (gsi_stmt (gsi_start (seq)));
-  /* Never return virtual phis.  */
-  if (virtual_operand_p (gimple_phi_result (phi)))
-   return NULL;
-  return phi;
-}
   for (i = gsi_start (seq); !gsi_end_p (i); gsi_next (&i))
 {
   gphi *p = as_a <gphi *> (gsi_stmt (i));
-- 
2.43.0



[PATCH] Remove m_nloops field from loop_versioning

2024-04-26 Thread Andrew Pinski
This is a small cleanup of loop_versioning where m_nloops
is only used in the constructor so we can remove the whole
field.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* gimple-loop-versioning.cc (loop_versioning): Remove m_nloops field.
(loop_versioning::loop_versioning): Remove initialization of
m_nloops field and move it to be a local variable.
(loop_versioning::analyze_blocks): Fix formatting.
---
 gcc/gimple-loop-versioning.cc | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 17877f06921..adea207659b 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -322,9 +322,6 @@ private:
   /* An obstack to use for general allocation.  */
   obstack m_obstack;
 
-  /* The number of loops in the function.  */
-  unsigned int m_nloops;
-
   /* The total number of loop version conditions we've found.  */
   unsigned int m_num_conditions;
 
@@ -525,10 +522,10 @@ loop_versioning::name_prop::value_of_expr (tree val, 
gimple *)
 
 loop_versioning::loop_versioning (function *fn)
   : m_fn (fn),
-m_nloops (number_of_loops (fn)),
 m_num_conditions (0),
 m_address_table (31)
 {
+  unsigned m_nloops = number_of_loops (fn);
   bitmap_obstack_initialize (&m_bitmap_obstack);
   gcc_obstack_init (&m_obstack);
 
@@ -1437,7 +1434,7 @@ loop_versioning::analyze_blocks ()
  {
linfo.rejected_p = true;
break;
-   }
+ }
 
  if (!linfo.rejected_p)
{
-- 
2.43.0



[PATCH 1/2] Add verification of gimple_assign_nontemporal_move_p [PR112976]

2024-04-26 Thread Andrew Pinski
Currently the middle-end only knows how to support nontemporal stores
(the undocumented storent optab), so let's verify that the only time
we set nontemporal_move on an assign is when the lhs is not a
gimple reg.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/112976
* tree-cfg.cc (verify_gimple_assign): Verify that
nontemporal moves are stores.
* gimple.h (struct gimple): Note that only
nontemporal stores are supported.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple.h|  3 ++-
 gcc/tree-cfg.cc | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 8a8ca109bbf..bd315ffc2dd 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -236,7 +236,8 @@ struct GTY((desc ("gimple_statement_structure (&%h)"), tag 
("GSS_BASE"),
  for clearing this bit before using it.  */
   unsigned int visited : 1;
 
-  /* Nonzero if this tuple represents a non-temporal move.  */
+  /* Nonzero if this tuple represents a non-temporal move; currently
+ only stores are supported.  */
   unsigned int nontemporal_move: 1;
 
   /* Pass local flags.  These flags are free for any pass to use as
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b1ba33018fd..06a96f96be7 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4837,6 +4837,17 @@ verify_gimple_assign_single (gassign *stmt)
 static bool
 verify_gimple_assign (gassign *stmt)
 {
+  if (gimple_assign_nontemporal_move_p (stmt))
+{
+  tree lhs = gimple_assign_lhs (stmt);
+  if (is_gimple_reg (lhs))
+   {
+ error ("nontemporal store lhs cannot a gimple register");
+ debug_generic_stmt (lhs);
+ return true;
+   }
+}
+
   switch (gimple_assign_rhs_class (stmt))
 {
 case GIMPLE_SINGLE_RHS:
-- 
2.43.0



[PATCH 2/2] Remove support for nontemporal stores with ssa_names on lhs [PR112976]

2024-04-26 Thread Andrew Pinski
When cfgexpand was changed to support expanding from tuple gimple
(r0-95521-g28ed065ef9f345), code was added to support
doing nontemporal stores with an SSA_NAME on the LHS, but that will
never be a nontemporal store.
This patch removes that code and asserts that an assignment with an
SSA_NAME on the LHS is not a nontemporal store when it is expanded.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR middle-end/112976
* cfgexpand.cc (expand_gimple_stmt_1): Remove
support for expanding nontemporal "moves" with
ssa names on the LHS.

Signed-off-by: Andrew Pinski 
---
 gcc/cfgexpand.cc | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index eef565eddb5..cfc5291aa0c 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -4002,17 +4002,16 @@ expand_gimple_stmt_1 (gimple *stmt)
else
  {
rtx target, temp;
-   bool nontemporal = gimple_assign_nontemporal_move_p (assign_stmt);
+   gcc_assert (!gimple_assign_nontemporal_move_p (assign_stmt));
bool promoted = false;
 
target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
  promoted = true;
 
-  /* If we want to use a nontemporal store, force the value to
- register first.  If we store into a promoted register,
- don't directly expand to target.  */
-   temp = nontemporal || promoted ? NULL_RTX : target;
+  /* If we store into a promoted register, don't directly
+ expand to target.  */
+   temp = promoted ? NULL_RTX : target;
temp = expand_expr_real_gassign (assign_stmt, temp,
 GET_MODE (target), EXPAND_NORMAL);
 
@@ -4034,8 +4033,6 @@ expand_gimple_stmt_1 (gimple *stmt)
 
convert_move (SUBREG_REG (target), temp, unsignedp);
  }
-   else if (nontemporal && emit_storent_insn (target, temp))
- ;
else
  {
temp = force_operand (temp, target);
-- 
2.43.0



[PATCH] aarch64: Use cinc for small constants instead of just add [PR112304]

2024-04-26 Thread Andrew Pinski
On many cores, the mov instruction is "free" so the sequence:
cmp w0, #0
cset w0, ne
add w0, w0, 42
is more expensive than just:
cmp w0, #0
mov w1, #42
cinc w0, w1, ne

The reason why we get the add case is that the pattern csinc2_insn
only accepts registers in its predicate, so we don't get a cinc
for a constant operand. The small change of using general_operand as the
predicate instead allows combine to match, and then the mov is generated
when the register allocator checks the constraints.
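To make the two sequences concrete, here is a small C sketch (not part of the patch) that models what each one computes. The helper names `cset_then_add` and `mov_then_cinc` are hypothetical, and `cinc` is modelled simply as "source plus one if the condition holds, else source".

```c
/* Models cmp + cset + add: materialize the condition as 0/1, then add 42.  */
int cset_then_add (int a)
{
  int flag = (a != 0);          /* cset w0, ne */
  return flag + 42;             /* add  w0, w0, 42 */
}

/* Models cmp + mov + cinc: load the constant, then conditionally
   increment it.  */
int mov_then_cinc (int a)
{
  int k = 42;                   /* mov  w1, #42 */
  return (a != 0) ? k + 1 : k;  /* cinc w0, w1, ne */
}
```

Both helpers return 42 for a zero input and 43 otherwise, which is why combine is free to pick either form once the pattern accepts a constant.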

Built and tested on aarch64-linux-gnu with no regressions.

PR target/112304

gcc/ChangeLog:

* config/aarch64/aarch64.md (*csinc2_insn): Change the
predicate of the 1st operand to general_operand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cinc-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md |  2 +-
 gcc/testsuite/gcc.target/aarch64/cinc-2.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cinc-2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a6051ebfc5a..046a249475d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4549,7 +4549,7 @@ (define_insn "aarch64_"
 (define_insn "*csinc2_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (plus:GPI (match_operand 2 "aarch64_comparison_operation" "")
-  (match_operand:GPI 1 "register_operand" "r")))]
+  (match_operand:GPI 1 "general_operand" "r")))]
   ""
   "cinc\\t%0, %1, %m2"
   [(set_attr "type" "csel")]
diff --git a/gcc/testsuite/gcc.target/aarch64/cinc-2.c 
b/gcc/testsuite/gcc.target/aarch64/cinc-2.c
new file mode 100644
index 000..dc68dfed40f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cinc-2.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2" } */
+/* PR target/112304 */
+
+int f(int a)
+{
+return (a!=0)+42;
+}
+
+/* This function should produce a cinc with a mov instead of a cset with an 
add. */
+/* { dg-final { scan-assembler-not "cset\t" } } */
+/* { dg-final { scan-assembler "cinc\t" } } */
-- 
2.43.0



[PATCH] aarch64: Fix normal returns inside functions which use eh_returns [PR114843]

2024-04-26 Thread Andrew Pinski
The problem here is that on a normal return path, we still restore the
eh return data registers when we should not.
Instead of one return path in the case of eh_return, this changes over
to use multiple return paths just like a normal function.
On the normal path (non-eh return), we need to skip restoring the eh
return data registers.

This fixes the code generation of _Unwind_RaiseException where the return value
was corrupted.

Note this adds some testcases which might fail on some targets.
I know the following targets will also fail:
arm is recorded as PR 114847.
powerpc is recorded as PR 114846.

Built and tested for aarch64-linux-gnu with no regressions.

PR target/114843

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_expand_epilogue): New
prototype.
* config/aarch64/aarch64.cc (aarch64_pop_regs): Skip over the
eh return data if not eh return. Generate an add if both registers
were skipped and return true. Return false if we generated a pop.
(aarch64_restore_callee_saves): Skip over the eh return data regs
if not eh return.
(aarch64_expand_epilogue): New function, pass false.
(aarch64_expand_epilogue): Add was_eh_return argument.
Update calls to aarch64_restore_callee_saves and aarch64_pop_regs.
If aarch64_pop_regs returns true, make sure we add the CFA note.
For eh_returns, update the sp and do an indirect jump.
Don't check EH_RETURN_TAKEN_RTX any more.
* config/aarch64/aarch64.h (EH_RETURN_TAKEN_RTX): Delete.
* config/aarch64/aarch64.md (eh_return): New define_expand.
(eh_return_internal): New pattern for eh_returns.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-3.c: Update testcase.
* gcc.c-torture/execute/eh_return-1.c: New test.
* gcc.c-torture/execute/eh_return-2.c: New test.
* gcc.c-torture/execute/eh_return-3.c: New test.
* gcc.c-torture/execute/eh_return-4.c: New test.
* gcc.target/aarch64/eh_return-4.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 gcc/config/aarch64/aarch64.cc | 69 ---
 gcc/config/aarch64/aarch64.h  |  4 +-
 gcc/config/aarch64/aarch64.md | 24 +++
 .../gcc.c-torture/execute/eh_return-1.c   | 20 ++
 .../gcc.c-torture/execute/eh_return-2.c   | 23 +++
 .../gcc.c-torture/execute/eh_return-3.c   | 25 +++
 .../gcc.c-torture/execute/eh_return-4.c   | 25 +++
 .../gcc.target/aarch64/eh_return-3.c  | 11 ++-
 .../gcc.target/aarch64/eh_return-4.c  | 30 
 10 files changed, 198 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-3.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/eh_return-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-4.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 42639e9efcf..efe86d52873 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -904,6 +904,7 @@ const char * aarch64_gen_far_branch (rtx *, int, const char 
*, const char *);
 const char * aarch64_output_probe_stack_range (rtx, rtx);
 const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
+void aarch64_expand_epilogue (rtx_call_insn *, bool);
 void aarch64_expand_epilogue (rtx_call_insn *);
 rtx aarch64_ptrue_all (unsigned int);
 opt_machine_mode aarch64_ptrue_all_mode (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1beec94629d..99dc6f7f1c0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8270,10 +8270,28 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, 
rtx reg, rtx reg2,
afterwards by ADJUSTMENT and writing the appropriate REG_CFA_RESTORE notes
into CFI_OPS.  */
 
-static void
+static bool
 aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment,
- rtx *cfi_ops)
+ rtx *cfi_ops, bool was_eh_return)
 {
+  /* Skip over eh return data if not eh_return. */
+  if (!was_eh_return
+  && EH_RETURN_DATA_REGNO (regno1) != INVALID_REGNUM
+  && (!df_regs_ever_live_p (regno1)
+ || crtl->abi->clobbers_full_reg_p (regno1)))
+{
+  if (regno2 == INVALID_REGNUM
+ || (EH_RETURN_DATA_REGNO (regno2) != INVALID_REGNUM
+ && (!df_regs_ever_live_p (regno2)
+ || crtl->abi->clobbers_full_reg_p (regno2
+   {
+ rtx mem = plus_constant (Pmode, stack_pointer_rtx, adjustment);
+ emit_move_insn (stack_pointer_rtx, mem);
+ return tr

[PATCH] Fix link on gcc-13/changes.html

2024-04-17 Thread Andrew Pinski
Just fixes the link to the manual for the new -nostdlib++ option.

Signed-off-by: Andrew Pinski 
---
 htdocs/gcc-13/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 6930bd58..4384c329 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -369,7 +369,7 @@ You may also want to check out our
   The https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C_002b_002b-Dialect-Options.html#index-Wpessimizing-move;>-Wpessimizing-move
 and https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C_002b_002b-Dialect-Options.html#index-Wredundant-move;>-Wredundant-move
 warnings have been extended to warn in more contexts.
-  The https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Link_Options.html#index-nostdlib_002b_002b;>-nostdlib++
+  The https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Link-Options.html#index-nostdlib_002b_002b;>-nostdlib++
 option has been added, to enable linking with g++
 without implicitly linking in the C++ standard library.
 
-- 
2.43.0



Re: [PATCH 1/2] Driver: Add new -truncate option

2024-04-17 Thread Andrew Pinski
On Wed, Apr 17, 2024 at 5:57 PM Peter Damianov  wrote:
>
> This commit adds a new option to the driver that truncates one file after
> linking.
>
> Tested like so:
>
> $ gcc hello.c -c
> $ du -h hello.o
> 4.0K  hello.o
> $ gcc hello.o -truncate hello
> $ ./a.out
> Hello world
> $ du -h hello.o
> $ 0   hello.o
>
> $ gcc hello.o -truncate
> gcc: error: missing filename after '-truncate'
>
> The motivation for adding this is PR110710. It is used by lto-wrapper to
> truncate files in a shell-independent manner.

I wonder if we should document this option or not. On one hand it is
only supposed to be used by lto, but on the other hand, someone could
use it by accident from the command line and we would get a bug report
saying the file passed to it is now 0 bytes.
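For readers unfamiliar with the call the quoted patch relies on, here is a minimal sketch of what POSIX `truncate (path, 0)` does to an existing file. The helper name `size_after_truncate` is hypothetical and the code is only an illustration, not part of the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write some bytes to PATH, truncate it the same way
   the quoted patch does, and return the resulting size in bytes, or -1 on
   any failure.  */
long size_after_truncate (const char *path)
{
  FILE *f = fopen (path, "w");
  if (!f)
    return -1;
  fputs ("pretend object file contents", f);
  fclose (f);

  /* The file still exists afterwards; only its length becomes zero.  */
  if (truncate (path, 0) != 0)
    return -1;

  struct stat st;
  if (stat (path, &st) != 0)
    return -1;
  return (long) st.st_size;
}
```

The file is kept rather than deleted, which matches the `du -h hello.o` output in the quoted test: the name remains but the contents are gone.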

Thanks,
Andrew Pinski

>
> Signed-off-by: Peter Damianov 
> ---
>  gcc/common.opt |  5 +
>  gcc/gcc.cc | 13 +
>  2 files changed, 18 insertions(+)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index ad348844775..3ede2fa8552 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -422,6 +422,11 @@ Display target specific command line options (including 
> assembler and linker opt
>  -time
>  Driver Alias(time)
>
> +;; Truncate the file specified after linking.
> +;; This option is used by lto-wrapper to reduce the peak disk when linking 
> with
> +;; many .LTRANS units.
> +Driver Separate Undocumented MissingArgError(missing filename after %qs)
> +
>  -verbose
>  Driver Alias(v)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 728332b8153..00017964295 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -2138,6 +2138,10 @@ static int have_E = 0;
>  /* Pointer to output file name passed in with -o. */
>  static const char *output_file = 0;
>
> +/* Pointer to input file name passed in with -truncate.
> +   This file should be truncated after linking. */
> +static const char *totruncate_file = 0;
> +
>  /* This is the list of suffixes and codes (%g/%u/%U/%j) and the associated
> temp file.  If the HOST_BIT_BUCKET is used for %j, no entry is made for
> it here.  */
> @@ -4607,6 +4611,10 @@ driver_handle_option (struct gcc_options *opts,
>save_switch ("-o", 1, &arg, validated, true);
>return true;
>
> +case OPT_truncate:
> +  totruncate_file = arg;
> +  break;
> +
>  case OPT_pie:
>  #ifdef ENABLE_DEFAULT_PIE
>/* -pie is turned on by default.  */
> @@ -9273,6 +9281,11 @@ driver::maybe_run_linker (const char *argv0) const
>option).  */
> error ("%s: linker input file not found: %m", outfiles[i]);
> }
> +
> +  if (totruncate_file != NULL && linker_was_run && !seen_error ())
> +/* Truncate file specified by -truncate.
> +   Used by lto-wrapper to reduce temporary disk-space usage. */
> +truncate(totruncate_file, 0);
>  }
>
>  /* The end of "main".  */
> --
> 2.39.2
>


Request for testing on non-Linux targets; remove special casing of /usr/lib and /lib from the driver

2024-04-16 Thread Andrew Pinski (QUIC)
Hi all,
  The driver currently removes "/lib" and "/usr/lib" from the library path 
that gets passed to the linker because it considers them paths that the 
linker already knows to search. But this is not true for newer linkers; 
mold and lld, for example, don't have a default search path.
This patch removes the special casing to fix building on FreeBSD, where lld is 
used by default, and also to fix riscv-linux-gnu when used in combination with 
mold. I have tested it on x86_64-linux-gnu and it works there, but since the 
code in the driver has been around since 1992, I would like some folks to test 
it on AIX, Mac OS (Darwin) and Solaris, where the ld is not GNU bfd ld, as I 
don't have access to those targets currently.

Thanks,
Andrew Pinski


0001-Don-t-remove-usr-lib-and-lib-from-when-passing-to-th.patch
Description: 0001-Don-t-remove-usr-lib-and-lib-from-when-passing-to-th.patch


Re: [PATCH] build: Check for cargo when building rust language

2024-04-16 Thread Andrew Pinski
On Mon, Apr 8, 2024 at 9:39 AM  wrote:
>
> From: Pierre-Emmanuel Patry 
>
> Hello,
>
> The rust frontend requires cargo to build some of its components,
> but its presence was not checked during configuration.

WHY did this go in right before the release of GCC 14?
I don't get why this is considered temporary and it goes in right
before a release.
That seems broken to me.

Thanks,
Andrew

>
> Best regards,
> Pierre-Emmanuel
>
> --
>
> Prevent rust language from building when cargo is
> missing.
>
> config/ChangeLog:
>
> * acx.m4: Add a macro to check for rust
> components.
>
> ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Emit an error message when cargo
> is missing.
>
> Signed-off-by: Pierre-Emmanuel Patry 
> ---
>  config/acx.m4 |  11 +
>  configure | 117 ++
>  configure.ac  |  18 
>  3 files changed, 146 insertions(+)
>
> diff --git a/config/acx.m4 b/config/acx.m4
> index 7efe98aaf96..3c5fe67342e 100644
> --- a/config/acx.m4
> +++ b/config/acx.m4
> @@ -424,6 +424,17 @@ else
>  fi
>  ])
>
> +# Test for Rust
> +# We require cargo and rustc for some parts of the rust compiler.
> +AC_DEFUN([ACX_PROG_CARGO],
> +[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> +AC_CHECK_TOOL(CARGO, cargo, no)
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi])
> +
>  # Test for D.
>  AC_DEFUN([ACX_PROG_GDC],
>  [AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> diff --git a/configure b/configure
> index 874966fb9f0..46e66e20197 100755
> --- a/configure
> +++ b/configure
> @@ -714,6 +714,7 @@ PGO_BUILD_GEN_CFLAGS
>  HAVE_CXX11_FOR_BUILD
>  HAVE_CXX11
>  do_compare
> +CARGO
>  GDC
>  GNATMAKE
>  GNATBIND
> @@ -5786,6 +5787,104 @@ else
>have_gdc=no
>  fi
>
> +
> +if test -n "$ac_tool_prefix"; then
> +  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a 
> program name with args.
> +set dummy ${ac_tool_prefix}cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$CARGO"; then
> +  ac_cv_prog_CARGO="$CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +CARGO=$ac_cv_prog_CARGO
> +if test -n "$CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CARGO" >&5
> +$as_echo "$CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +
> +fi
> +if test -z "$ac_cv_prog_CARGO"; then
> +  ac_ct_CARGO=$CARGO
> +  # Extract the first word of "cargo", so it can be a program name with args.
> +set dummy cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$ac_ct_CARGO"; then
> +  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_ac_ct_CARGO="cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
> +if test -n "$ac_ct_CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
> +$as_echo "$ac_ct_CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +  if test "x$ac_ct_CARGO" = x; then
> +CARGO="no"
> +  else
> +case $cross_compiling:$ac_tool_warned in
> +yes:)
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not 
> prefixed with host triplet" >&5
> +$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" 
> >&2;}
> +ac_tool_warned=yes ;;
> +esac
> +CARGO=$ac_ct_CARGO
> +  fi
> +else
> +  CARGO="$ac_cv_prog_CARGO"
> +fi
> +
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to compare 
> bootstrapped objects" >&5
>  $as_echo_n "checking how to compare bootstrapped objects... " >&6; }
>  if ${gcc_cv_prog_cmp_skip+:} 

[PATCH] Document that vector_size works with typedefs [PR92880]

2024-04-15 Thread Andrew Pinski
This just adds a clause to make it more obvious that the vector_size
attribute extension works with typedefs.
Note this whole section needs a rewrite to follow a format similar to the
other extensions. But that is for another day.
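As an illustration of the case being documented, here is a short sketch (not taken from the patch) where the base type is itself a typedef. The names `myint`, `v4mi`, `add_vectors` and `sum_lanes` are hypothetical.

```c
/* A typedef'd base type.  */
typedef int myint;

/* vector_size applied through the typedef: 16 bytes of myint elements.  */
typedef myint v4mi __attribute__ ((vector_size (16)));

/* Element-wise addition using GCC's generic vector arithmetic.  */
v4mi add_vectors (v4mi a, v4mi b)
{
  return a + b;
}

/* Sum the lanes of a vector by subscripting it.  */
int sum_lanes (v4mi v)
{
  int total = 0;
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    total += v[i];
  return total;
}
```

The vector type is 16 bytes wide regardless of how the base type is spelled; with a 32-bit `int` it has four lanes.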

OK?


gcc/ChangeLog:

PR c/92880
* doc/extend.texi (Using Vector Instructions): Add that
the base_types could be a typedef of them.

Signed-off-by: Andrew Pinski 
---
 gcc/doc/extend.texi | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7b54a241a7b..e290265d68d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12901,12 +12901,13 @@ typedef int v4si __attribute__ ((vector_size (16)));
 @end smallexample
 
 @noindent
-The @code{int} type specifies the @dfn{base type}, while the attribute 
specifies
-the vector size for the variable, measured in bytes.  For example, the
-declaration above causes the compiler to set the mode for the @code{v4si}
-type to be 16 bytes wide and divided into @code{int} sized units.  For
-a 32-bit @code{int} this means a vector of 4 units of 4 bytes, and the
-corresponding mode of @code{foo} is @acronym{V4SI}.
+The @code{int} type specifies the @dfn{base type} (which can be a
+@code{typedef}), while the attribute specifies the vector size for the
+variable, measured in bytes. For example, the declaration above causes
+the compiler to set the mode for the @code{v4si} type to be 16 bytes wide
+and divided into @code{int} sized units.  For a 32-bit @code{int} this
+means a vector of 4 units of 4 bytes, and the corresponding mode of
+@code{foo} is @acronym{V4SI}.
 
 The @code{vector_size} attribute is only applicable to integral and
 floating scalars, although arrays, pointers, and function return values
-- 
2.43.0



[PATCH v2] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-11 Thread Andrew Pinski
The problem is that the `!a?b:c` pattern will create a COND_EXPR with a 1-bit
signed integer condition, which breaks patterns like `a?~t:t`. This rejects a
signed operand for both patterns.
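The reason a signed 1-bit value is awkward as a condition is that its only values are 0 and -1, never 1. The following sketch (an illustration, not part of the patch) shows this with a bit-field shaped like the one in the new test; the struct and helper names are hypothetical.

```c
#include <limits.h>

struct one_bit
{
  signed a : 1;   /* can hold only 0 and -1 */
};

/* Store "true" or "false" in the field and return what reads back.  */
int read_back (int v)
{
  struct one_bit s;
  s.a = v ? -1 : 0;
  return s.a;
}

/* The expression from the new test: the field is promoted to int and then
   converted to unsigned long before the XOR.  */
unsigned long xor_with_one (int v)
{
  struct one_bit s;
  s.a = v ? -1 : 0;
  return s.a ^ 1UL;
}
```

With the field set to -1, the XOR yields `ULONG_MAX - 1`, so the `< 3` comparison in the test is false and a correctly compiled program must not abort.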

Note for GCC 15, I am going to look at the canonicalization of `a?~t:t` where t
is a constant, since I think keeping it a COND_EXPR might be more canonical and
is what VRP produces from the same IR; if anything, expand should decide which
one is better.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114666

gcc/ChangeLog:

* match.pd (`!a?b:c`): Reject signed types for the condition.
(`a?~t:t`): Likewise.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/bitfld-signed1-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd|  6 +-
 .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 15a1e7350d4..d401e7503e6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5895,7 +5895,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* !A ? B : C -> A ? C : B.  */
  (simplify
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
-  (cnd @0 @2 @1)))
+  /* For CONDs, don't handle signed values here. */
+  (if (cnd == VEC_COND_EXPR
+   || TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (cnd @0 @2 @1))))
 
 /* abs/negative simplifications moved from fold_cond_expr_with_comparison.
 
@@ -7095,6 +7098,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (cond @0 @1 @2)
  (with { bool wascmp; }
   (if (INTEGRAL_TYPE_P (type)
+   && TYPE_UNSIGNED (TREE_TYPE (@0))
&& bitwise_inverted_equal_p (@1, @2, wascmp)
&& (!wascmp || TYPE_PRECISION (type) == 1))
(if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c 
b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
new file mode 100644
index 000..b0ff120ea51
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
@@ -0,0 +1,13 @@
+/* PR tree-optimization/114666 */
+/* We used to miscompile this to be always aborting
+   due to the use of the signed 1bit into the COND_EXPR. */
+
+struct {
+  signed a : 1;
+} b = {-1};
+char c;
+int main()
+{
+  if ((b.a ^ 1UL) < 3)
+__builtin_abort();
+}
-- 
2.43.0



RE: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-11 Thread Andrew Pinski (QUIC)
> -----Original Message-----
> From: Richard Biener 
> Sent: Thursday, April 11, 2024 2:31 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 
> bit
> types [PR114666]
> 
> On Thu, Apr 11, 2024 at 10:43 AM Andrew Pinski
>  wrote:
> >
> > The issue here is that the `a?~t:t` pattern assumed (maybe correctly)
> > that a here was always going to be an unsigned boolean type. This fixes
> > the problem in both patterns by casting the operand to boolean type first.
> >
> > I should note that VRP wanting to produce `a == 0?1:-2` from
> > `((int)a) ^ 1` is a bit odd and is partly the cause of
> > the issue; there seems to be some disconnect on what should be the
> > canonical form. That will be something to look at for GCC 15.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > PR tree-optimization/114666
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`!a?b:c`): Cast `a` to boolean type for cond for
> > gimple.
> > (`a?~t:t`): Cast `a` to boolean type before casting it
> >     to the type.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/bitfld-signed1-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd| 10 +++---
> >  .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
> >  2 files changed, 20 insertions(+), 3 deletions(-)  create mode 100644
> > gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> > 15a1e7350d4..ffc928b656a 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5895,7 +5895,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   /* !A ? B : C -> A ? C : B.  */
> >   (simplify
> >(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> > -  (cnd @0 @2 @1)))
> > +  /* For gimple, make sure the operand to COND is a boolean type,
> > + truth_valued_p will match 1bit integers too. */  (if (GIMPLE &&
> > + cnd == COND_EXPR)
> > +   (cnd (convert:boolean_type_node @0) @2 @1)
> > +   (cnd @0 @2 @1
> 
> This looks "wrong" for GENERIC still?

I tried without the GIMPLE check and ran into the testcase
gcc.dg/torture/builtins-isinf-sign-1.c failing, because the extra convert
blocked seeing that both sides of an equality were the same (I didn't look into
it further than that). So I decided to limit it to GIMPLE only.

> But this is not really part of the fix but deciding we should not have
> signed:1 as
> cond operand?  I'll note that truth_valued_p allows signed:1.
> 
> Maybe as minimal surgery add a TYPE_UNSIGNED (TREE_TYPE (@0)) check here
> instead?

That might work, let me try.

> 
> >  /* abs/negative simplifications moved from
> fold_cond_expr_with_comparison.
> >
> > @@ -7099,8 +7103,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > && (!wascmp || TYPE_PRECISION (type) == 1))
> > (if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
> > || TYPE_PRECISION (type) == 1)
> > -(bit_xor (convert:type @0) @2)
> > -(bit_xor (negate (convert:type @0)) @2)
> > +(bit_xor (convert:type (convert:boolean_type_node @0)) @2)
> > +(bit_xor (negate (convert:type (convert:boolean_type_node @0)))
> > + @2)
> >  #endif
> 
> This looks OK, but then testing TYPE_UNSIGNED (TREE_TYPE (@0)) might be
> better?
> 

Let me do that just like the other pattern.

> Does this all just go downhill from what VRP creates?  That is, would IL
> checking have had a chance detecting it if we say signed:1 are not valid as
> condition?

Yes. So what VRP produces in the testcase is:
`_2 == 0 ? 1 : -2u` (where _2 is the signed 1bit integer).
Now maybe the COND_EXPR should be the canonical form for constants (but that is 
for a different patch I think, I added it to the list of things I should look 
into for GCC 15).
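
To make the arithmetic in this exchange concrete, here is a minimal C sketch, assuming a 1-bit signed bit-field like the one in the testcase. The function names are hypothetical and this is plain C, not match.pd code; it only models the `-(type)a ^ t` rewrite of `a ? ~t : t` with and without the intermediate boolean cast.

```c
#include <stdbool.h>

/* Signed 1-bit bit-field, as in the PR's testcase: it holds 0 or -1. */
struct one_bit { signed a : 1; };

/* The source expression from the testcase. */
unsigned long xor_with_one(struct one_bit s)
{
  return s.a ^ 1UL;
}

/* The conditional form quoted above (a == 0 ? 1 : -2), in unsigned long. */
unsigned long as_cond(struct one_bit s)
{
  return s.a == 0 ? 1UL : -2UL;
}

/* `a ? ~t : t` rewritten as `-(type)a ^ t` without a boolean cast.
   This is only right when `a` is 0 or 1; a signed 1-bit -1 breaks it. */
unsigned long rewrite_without_bool(struct one_bit s, unsigned long t)
{
  return -(unsigned long) s.a ^ t;
}

/* The same rewrite, but casting `a` to a boolean first, so that any
   nonzero value becomes exactly 1 before the negate. */
unsigned long rewrite_with_bool(struct one_bit s, unsigned long t)
{
  return -(unsigned long) (bool) s.a ^ t;
}
```

With `a == -1`, the version without the boolean cast negates an all-ones value and ends up XORing with 1 instead of with all ones, so it no longer computes `~t`.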

> 
> That said, the latter pattern definitely needs guarding/adjustment, I'm not
> sure the former is wrong?  Semantically [VEC_]COND_EXPR is op0 != 0 ? ... : 
> ...

I forgot to mention that only one of the two hunks is needed to fix the bug.

> 
> Richard.
> 
> >  /* Simplify pointer equality compares using PTA.  */ diff --git
> > a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > new file mode 100644
> > index 000..b0ff120ea51
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
> > @@ -0,0 +1,13 @@
> > +/* PR tree-optimization/114666 */
> > +/* We used to miscompile this to be always aborting
> > +   due to the use of the signed 1bit into the COND_EXPR. */
> > +
> > +struct {
> > +  signed a : 1;
> > +} b = {-1};
> > +char c;
> > +int main()
> > +{
> > +  if ((b.a ^ 1UL) < 3)
> > +__builtin_abort();
> > +}
> > --
> > 2.43.0
> >


[PATCH] match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]

2024-04-11 Thread Andrew Pinski
The issue here is that the `a?~t:t` pattern assumed (maybe correctly) that `a`
was always going to be of an unsigned boolean type. This fixes the problem
in both patterns by casting the operand to boolean type first.

I should note that VRP keeps wanting to produce `a == 0?1:-2` from
`((int)a) ^ 1`, which is a bit odd and is partly the cause of the issue; there
seems to be some disconnect on what the canonical form should be. That will be
something to look at for GCC 15.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114666

gcc/ChangeLog:

* match.pd (`!a?b:c`): Cast `a` to boolean type for cond for
gimple.
(`a?~t:t`): Cast `a` to boolean type before casting it
to the type.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/bitfld-signed1-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd| 10 +++---
 .../gcc.c-torture/execute/bitfld-signed1-1.c| 13 +
 2 files changed, 20 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 15a1e7350d4..ffc928b656a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5895,7 +5895,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* !A ? B : C -> A ? C : B.  */
  (simplify
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
-  (cnd @0 @2 @1)))
+  /* For gimple, make sure the operand to COND is a boolean type,
+ truth_valued_p will match 1bit integers too. */
+  (if (GIMPLE && cnd == COND_EXPR)
+   (cnd (convert:boolean_type_node @0) @2 @1)
+   (cnd @0 @2 @1
 
 /* abs/negative simplifications moved from fold_cond_expr_with_comparison.
 
@@ -7099,8 +7103,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (!wascmp || TYPE_PRECISION (type) == 1))
(if ((!TYPE_UNSIGNED (type) && TREE_CODE (type) == BOOLEAN_TYPE)
|| TYPE_PRECISION (type) == 1)
-(bit_xor (convert:type @0) @2)
-(bit_xor (negate (convert:type @0)) @2)
+(bit_xor (convert:type (convert:boolean_type_node @0)) @2)
+(bit_xor (negate (convert:type (convert:boolean_type_node @0))) @2)
 #endif
 
 /* Simplify pointer equality compares using PTA.  */
diff --git a/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
new file mode 100644
index 000..b0ff120ea51
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/bitfld-signed1-1.c
@@ -0,0 +1,13 @@
+/* PR tree-optimization/114666 */
+/* We used to miscompile this to be always aborting
+   due to the use of the signed 1bit into the COND_EXPR. */
+
+struct {
+  signed a : 1;
+} b = {-1};
+char c;
+int main()
+{
+  if ((b.a ^ 1UL) < 3)
+__builtin_abort();
+}
-- 
2.43.0



Re: [PATCH] libgfortran: Disable gthreads weak symbols for glibc 2.34

2024-04-09 Thread Andrew Pinski
On Tue, Apr 9, 2024, 10:07 H.J. Lu  wrote:

> Since Glibc 2.34 all pthreads symbols are defined directly in libc not
> libpthread, and since Glibc 2.32 we have used __libc_single_threaded to
> avoid unnecessary locking in single-threaded programs. This means there
> is no reason to avoid linking to libpthread now, and so no reason to use
> weak symbols defined in gthr-posix.h for all the pthread_xxx functions.
>


First, you forgot to cc fortran@. Second, the issue is in gthr-posix.h, which
should be fixed instead of libgfortran, since the issue will also be seen
with libobjc and the other users of gthr.

Note that the fix for libstdc++ was also done in the wrong location and
should have been done once and for all in gthr-posix.h.


Thanks,
Andrew
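
As a rough illustration of doing the check once at the header level, here is a sketch in C. It is not the contents of gthr-posix.h: `SKETCH_GTHREAD_USE_WEAK` and `sketch_gthread_use_weak` are hypothetical names, and the fallback value of 1 for non-glibc systems is an assumption made only for this example.

```c
#include <stdlib.h>   /* On glibc this pulls in <features.h>. */

/* Decide once, for every user of the header, whether weak references
   to the pthread functions are still wanted.  Nested #if is used so
   that __GLIBC_PREREQ is never expanded where it is not defined. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 34)
#  define SKETCH_GTHREAD_USE_WEAK 0   /* pthread symbols live in libc */
# endif
#endif

#ifndef SKETCH_GTHREAD_USE_WEAK
# define SKETCH_GTHREAD_USE_WEAK 1    /* assumed default for this sketch */
#endif

/* Expose the decision so callers (or tests) can inspect it. */
int sketch_gthread_use_weak(void)
{
  return SKETCH_GTHREAD_USE_WEAK;
}
```

A check like this in the shared header would apply to libgfortran, libobjc, and libstdc++ alike, rather than being repeated in each library's configure script.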


> Also add prune_warnings to libgomp.exp to prune glibc static link warning:
>
> .*: warning: Using 'dlopen' in statically linked applications requires at
> runtime the shared libraries from the glibc version used for linking
>
> libgfortran/
>
> PR libgfortran/114646
> * acinclude.m4: Define GTHREAD_USE_WEAK 0 for glibc 2.34 or
> above on Linux.
> * configure: Regenerated.
>
> libgomp/
>
> PR libgfortran/114646
> * testsuite/lib/libgomp.exp (prune_warnings): New.
> * testsuite/libgomp.fortran/pr114646-1.f90: New test.
> * testsuite/libgomp.fortran/pr114646-2.f90: Likewise.
> ---
>  libgfortran/acinclude.m4  | 14 +
>  libgfortran/configure | 29 +++
>  libgomp/testsuite/lib/libgomp.exp | 14 +
>  .../testsuite/libgomp.fortran/pr114646-1.f90  | 11 +++
>  .../testsuite/libgomp.fortran/pr114646-2.f90  | 22 ++
>  5 files changed, 90 insertions(+)
>  create mode 100644 libgomp/testsuite/libgomp.fortran/pr114646-1.f90
>  create mode 100644 libgomp/testsuite/libgomp.fortran/pr114646-2.f90
>
> diff --git a/libgfortran/acinclude.m4 b/libgfortran/acinclude.m4
> index a73207e5465..f4642494c4f 100644
> --- a/libgfortran/acinclude.m4
> +++ b/libgfortran/acinclude.m4
> @@ -92,6 +92,20 @@ void foo (void);
>AC_DEFINE(GTHREAD_USE_WEAK, 0,
> [Define to 0 if the target shouldn't use #pragma weak])
>;;
> +*-*-linux*)
> +  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
> +#include 
> +#if !__GLIBC_PREREQ(2, 34)
> +#error glibc version is too old
> +#endif
> +]], [[]])],
> +   libgfor_cv_use_pragma_weak=no,
> +   libgfor_cv_use_pragma_weak=yes)
> +  if test $libgfor_cv_use_pragma_weak = no; then
> +AC_DEFINE(GTHREAD_USE_WEAK, 0,
> + [Define to 0 if the target shouldn't use #pragma weak])
> +  fi
> +  ;;
>esac])
>
>  dnl Check whether target effectively supports weakref
> diff --git a/libgfortran/configure b/libgfortran/configure
> index 774dd52fc95..1f477256b75 100755
> --- a/libgfortran/configure
> +++ b/libgfortran/configure
> @@ -31057,6 +31057,35 @@ $as_echo "#define SUPPORTS_WEAK 1" >>confdefs.h
>
>  $as_echo "#define GTHREAD_USE_WEAK 0" >>confdefs.h
>
> +  ;;
> +*-*-linux*)
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +#include 
> +#if !__GLIBC_PREREQ(2, 34)
> +#error glibc version is too old
> +#endif
> +
> +int
> +main ()
> +{
> +
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgfor_cv_use_pragma_weak=no
> +else
> +  libgfor_cv_use_pragma_weak=yes
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +  if test $libgfor_cv_use_pragma_weak = no; then
> +
> +$as_echo "#define GTHREAD_USE_WEAK 0" >>confdefs.h
> +
> +  fi
>;;
>esac
>
> diff --git a/libgomp/testsuite/lib/libgomp.exp
> b/libgomp/testsuite/lib/libgomp.exp
> index cab926a798b..9cfa6d7b31d 100644
> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -54,6 +54,20 @@ set dg-do-what-default run
>
>  set libgomp_compile_options ""
>
> +# Prune messages that aren't useful.
> +
> +proc prune_warnings { text } {
> +
> +verbose "prune_warnings: entry: $text" 2
> +
> +# Ignore warning from -static: warning: Using 'dlopen' in statically
> linked applications requires at runtime the shared libraries from the glibc
> version used for linking
> +regsub -all "(^|\n)\[^\n\]*: warning: Using 'dlopen' in statically
> linked\[^\n\]*" $text "" text
> +
> +verbose "prune_warnings: exit: $text" 2
> +
> +return $text
> +}
> +
>  #
>  # libgomp_init
>  #
> diff --git a/libgomp/testsuite/libgomp.fortran/pr114646-1.f90
> b/libgomp/testsuite/libgomp.fortran/pr114646-1.f90
> new file mode 100644
> index 000..a48e6103343
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/pr114646-1.f90
> @@ -0,0 +1,11 @@
> +! PR libgfortran/114646
> +! { dg-do run }
> +! { dg-additional-options "-static" }
> +
> +!$OMP PARALLEL
> +!$OMP CRITICAL
> + write(6,*) "Hello world"
> +!$OMP END CRITICAL
> 

Re: [PATCH] build: Check for cargo when building rust language

2024-04-09 Thread Andrew Pinski
On Mon, Apr 8, 2024 at 9:39 AM  wrote:
>
> From: Pierre-Emmanuel Patry 
>
> Hello,
>
> The rust frontend requires cargo to build some of its components,
> but its presence was not checked during configuration.

NOTE: cargo itself is a huge security hole. If anything, we should place
all of the required dependencies, at the specific versions that have
been tested, on gcc.gnu.org (with md5 sums) and download those instead
of depending on some random downloads via cargo. Talk about a broken
supply chain when things are downloading things randomly off the
internet.

If there is a way to cache and use those specific versions using
cargo, that should be done, but I suspect cargo does not work that way.
Also, any time someone says this is a temporary measure, it is NOT, and
we should never treat it as such unless you already have a patch to
remove it.


Thanks,
Andrew Pinski


>
> Best regards,
> Pierre-Emmanuel
>
> --
>
> Prevent rust language from building when cargo is
> missing.
>
> config/ChangeLog:
>
> * acx.m4: Add a macro to check for rust
> components.
>
> ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Emit an error message when cargo
> is missing.
>
> Signed-off-by: Pierre-Emmanuel Patry 
> ---
>  config/acx.m4 |  11 +
>  configure | 117 ++
>  configure.ac  |  18 
>  3 files changed, 146 insertions(+)
>
> diff --git a/config/acx.m4 b/config/acx.m4
> index 7efe98aaf96..3c5fe67342e 100644
> --- a/config/acx.m4
> +++ b/config/acx.m4
> @@ -424,6 +424,17 @@ else
>  fi
>  ])
>
> +# Test for Rust
> +# We require cargo and rustc for some parts of the rust compiler.
> +AC_DEFUN([ACX_PROG_CARGO],
> +[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> +AC_CHECK_TOOL(CARGO, cargo, no)
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi])
> +
>  # Test for D.
>  AC_DEFUN([ACX_PROG_GDC],
>  [AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> diff --git a/configure b/configure
> index 874966fb9f0..46e66e20197 100755
> --- a/configure
> +++ b/configure
> @@ -714,6 +714,7 @@ PGO_BUILD_GEN_CFLAGS
>  HAVE_CXX11_FOR_BUILD
>  HAVE_CXX11
>  do_compare
> +CARGO
>  GDC
>  GNATMAKE
>  GNATBIND
> @@ -5786,6 +5787,104 @@ else
>have_gdc=no
>  fi
>
> +
> +if test -n "$ac_tool_prefix"; then
> +  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a 
> program name with args.
> +set dummy ${ac_tool_prefix}cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$CARGO"; then
> +  ac_cv_prog_CARGO="$CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found 
> $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +CARGO=$ac_cv_prog_CARGO
> +if test -n "$CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CARGO" >&5
> +$as_echo "$CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +
> +fi
> +if test -z "$ac_cv_prog_CARGO"; then
> +  ac_ct_CARGO=$CARGO
> +  # Extract the first word of "cargo", so it can be a program name with args.
> +set dummy cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$ac_ct_CARGO"; then
> +  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_ex

RE: [PATCH] Another ICE after conflicting types of redeclaration [PR110682]

2024-04-08 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Friday, March 22, 2024 10:50 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH] Another ICE after conflicting types of redeclaration
> [PR110682]
> 
> This is another one of these ICE-after-error issues with the gimplifier and a
> fallout from r12-3278-g823685221de986af.
> The problem here is that STRIP_USELESS_TYPE_CONVERSION will leave
> around a NON_LVALUE_EXPR whose operand is an error operand.
> Since the gimplifier assumes non-lvalue expressions have been removed, there
> was an ICE.
> 
> This fixes the issue by checking whether there is a NON_LVALUE_EXPR with an
> error operand and handling it the same as if it were an error operand itself.


Ping?

Thanks,
Andrew
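
The shape of the added check can be sketched with a toy tree model in C. Everything here (`toy_tree`, `toy_error_operand_p`, `toy_should_bail_out`) is hypothetical and much simpler than GCC's real `tree` and `error_operand_p`; in particular, treating a null pointer as an error is a simplification made only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a few tree codes; not GCC's real enum. */
enum toy_code { TOY_ERROR_MARK, TOY_NON_LVALUE_EXPR, TOY_INTEGER_CST };

struct toy_tree {
  enum toy_code code;
  const struct toy_tree *op0;   /* first operand, or NULL */
};

/* Toy version of an "is this an error operand?" predicate.  A null
   pointer counts as an error here purely to keep the sketch safe. */
static bool toy_error_operand_p(const struct toy_tree *t)
{
  return t == NULL || t->code == TOY_ERROR_MARK;
}

/* Bail out if the expression is an error, or if it is a non-lvalue
   wrapper that was left around an error operand. */
bool toy_should_bail_out(const struct toy_tree *expr)
{
  return toy_error_operand_p(expr)
         || (expr->code == TOY_NON_LVALUE_EXPR
             && toy_error_operand_p(expr->op0));
}
```

The second half of the condition is the new part: without it, a wrapper around an error would pass the first test and reach code that assumes such wrappers no longer exist.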


> 
> gcc/ChangeLog:
> 
>   PR c/110682
>   * gimplify.cc (gimplify_expr): Add check if there is
>   a non-lvalue with an error operand.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/110682
>   * gcc.dg/redecl-27.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimplify.cc  |  6 +-
>  gcc/testsuite/gcc.dg/redecl-27.c | 14 ++
>  2 files changed, 19 insertions(+), 1 deletion(-)  create mode 100644
> gcc/testsuite/gcc.dg/redecl-27.c
> 
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index d64bbf3ffbd..001b4af68b9
> 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -17686,7 +17686,11 @@ gimplify_expr (tree *expr_p, gimple_seq
> *pre_p, gimple_seq *post_p,
>save_expr = *expr_p;
> 
>/* Die, die, die, my darling.  */
> -  if (error_operand_p (save_expr))
> +  if (error_operand_p (save_expr)
> +   /* The above strip useless type conversion might not strip out
> +  a conversion from an error so handle that case here.  */
> +   || (TREE_CODE (save_expr) == NON_LVALUE_EXPR
> +   && error_operand_p (TREE_OPERAND (save_expr, 0
>   {
> ret = GS_ERROR;
> break;
> diff --git a/gcc/testsuite/gcc.dg/redecl-27.c b/gcc/testsuite/gcc.dg/redecl-
> 27.c
> new file mode 100644
> index 000..93f577e64ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/redecl-27.c
> @@ -0,0 +1,14 @@
> +/* We used to ICE while gimplifying the body of f
> +   due to a NON_LVALUE_EXPR still being there.
> +   PR c/110682*/
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +struct a {
> +  const signed char b;
> +};
> +
> +void f(volatile struct a *c) { /* { dg-note "" } */
> +  c - 0 % c->b;
> +  struct a c = {1}; /* { dg-error "redeclared as different kind of
> +symbol" } */ }
> --
> 2.43.0



Re: [PATCH/RFC] On the use of -funreachable-traps to deal with PR 109627

2024-04-08 Thread Andrew Pinski
On Mon, Apr 8, 2024 at 4:04 PM Iain Sandoe  wrote:
>
> Hi
>
> PR 109627 is about functions that have had their bodies completely elided, 
> but still have the wrappers for EH frames (either .cfi_xxx or LFSxx/LFExx).

I was thinking about how to fix this once and for all. The easiest
method I could think of was: if __builtin_unreachable is the only thing
in the CFG, expand it as __builtin_trap.
Then it should just work.

It should not be too hard to add that check in expand_gimple_basic_block
and handle it that way.

What do you think of that? I can code this up for GCC 15 if you want.

Thanks,
Andrew Pinski
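
For readers following along, the difference being discussed can be sketched in C as below. `foo` is the reduced case from the quoted mail; `foo_as_trap` and `checked_select` are hypothetical names added only so the example has something safe to call. This shows the source-level equivalent of the idea, not the proposed change to expand.

```c
/* The reduced case from the PR: the body may expand to no
   instructions at all, leaving a function with zero code extent. */
void foo(void)
{
  __builtin_unreachable();
}

/* Roughly what expanding a lone unreachable as a trap amounts to:
   the function now contains at least one real instruction. */
void foo_as_trap(void)
{
  __builtin_trap();
}

/* A caller that only reaches the trap on an invalid selector, so the
   valid paths can be exercised without aborting. */
int checked_select(int selector)
{
  if (selector == 0 || selector == 1)
    return selector;
  foo_as_trap();
  return -1;   /* not reached: the trap does not return */
}
```

Comparing the assembly of `foo` and `foo_as_trap` (for example with `-O2 -S`) is one way to see whether a target emits anything for the unreachable-only body.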

>
> These are causing issues for some linkers because such functions result in 
> FDEs with a 0 code extent.
>
> The simplest representation of this is (from PR109527)
>
> void foo () { __builtin_unreachable (); }
>
> The solution (so far) is to detect this case during final lowering and 
> replace the unreachable (which is expanded to nothing, at least for the 
> targets I’ve dealt with) by a trap; this results in two positive improvements 
> (1) the FDE is now finite-sized so the linker consumes it and (2) actually 
> the trap is considerably more user-friendly UB than falling through to some 
> other arbitrary place.
>
> I was looking into using -funreachable-traps to do this for aarch64 Darwin - 
> because the ad-hoc solutions that were applied to X86 and PPC are not easily 
> usable for aarch64.
>
> -funreachable-traps was added for similar reasons (helping make missing 
> returns less unexpected) in r13-1204-gd68d3664253696 by Jason (and then there 
> have been further improvements resulting in the use of __builtin_unreachable 
> trap () from Jakub)
>
> As I read the commit message for r13-1204, I would expect -funreachable-traps 
> to work for the simple case above, but it does not.  I think that is because 
> the incremental patch below is needed.  however, I am not sure if there was 
> some reason this was not done at the time?
>
> PR 109627 is currently a show-stopper for the aarch64-darwin branch since 
> libgomp and libgm2 fail to bootstrap - and other workarounds (e.g. 
> -D__builtin_unreachable=__builtin_trap) do not work for m2 (since it does not 
> use the C preprocessor by default).
>
> Setting -funreachable-traps either per affected file, or globally for a 
> target resolves the issue in a neater manner.
>
> Any guidance / comments would be most welcome - if the direction seems sane, 
> I can repost this patch formally.
>
> (I have tested quite widely on Darwin and on a small number of Linux cases 
> too)
>
> thanks
> Iain
>
> * I will note that applying this does result in some regressions in several 
> contracts test cases - but they also regress for -fsanitize=undefined 
> -fsanitise-traps (not yet clear if that’s expected or we’ve uncovered a bug 
> in the contracts impl.).
>
> --
>
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..e2d26e45744 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -5931,7 +5931,8 @@ expand_builtin_unreachable (void)
>  {
>/* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
>   to avoid this.  */
> -  gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> +  gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE)
> +  && !flag_unreachable_traps);
>emit_barrier ();
>  }
>
> @@ -10442,7 +10443,7 @@ fold_builtin_0 (location_t loc, tree fndecl)
>
>  case BUILT_IN_UNREACHABLE:
>/* Rewrite any explicit calls to __builtin_unreachable.  */
> -  if (sanitize_flags_p (SANITIZE_UNREACHABLE))
> +  if (sanitize_flags_p (SANITIZE_UNREACHABLE) || flag_unreachable_traps)
> return build_builtin_unreachable (loc);
>break;
>
> 


Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-04-05 Thread Andrew Pinski
On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  wrote:
>
> If we consider code like:
>
> if (bar1 == x)
>   return foo();
> if (bar2 != y)
>   return foo();
> return 0;
>
> We would like the ifcombine pass to convert this to:
>
> if (bar1 == x || bar2 != y)
>   return foo();
> return 0;
>
> The ifcombine pass can handle this transformation but it is run very early and
> it misses the opportunity because there are two separate blocks for foo().
> The pre pass is good at removing duplicate code and blocks and due to that
> running ifcombine again after it can increase the number of successful
> conversions.

I do think we should have something similar to re-running
ssa-ifcombine but I think it should be much later, like after the loop
optimizations are done.
Maybe just a simplified version of it (that does the combining and not
the optimizations part) included in isel or pass_optimize_widening_mul
(which itself should most likely become part of isel or renamed since
it handles more than just widening multiply these days).


Thanks,
Andrew Pinski
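
The two forms from the quoted description can be written out as a small C sketch to check that they agree. The function names and the body of `foo` are hypothetical; only the control-flow shapes come from the mail.

```c
/* Stand-in for the shared callee in the quoted example. */
static int foo(void)
{
  return 42;
}

/* Shape before combining: two separate tests, each with its own
   call to foo(). */
int before_ifcombine(int bar1, int x, int bar2, int y)
{
  if (bar1 == x)
    return foo();
  if (bar2 != y)
    return foo();
  return 0;
}

/* Shape after combining: one merged condition and a single call. */
int after_ifcombine(int bar1, int x, int bar2, int y)
{
  if (bar1 == x || bar2 != y)
    return foo();
  return 0;
}
```

The merge only becomes visible to ifcombine once the two `foo()` blocks have been unified, which is why the pass ordering matters in this thread.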


>
> PR 102793
>
> gcc/ChangeLog:
>
> * common.opt: -ftree-ifcombine option, enabled by default.
> * doc/invoke.texi: Document.
> * passes.def: Re-run ssa-ifcombine after pre.
> * tree-ssa-ifcombine.cc: Make ifcombine cloneable. Add gate function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/20030922-2.c: Change flag to -fno-tree-ifcombine.
> * gcc.dg/uninit-pred-6_c.c: Remove inconsistent check.
> * gcc.target/aarch64/pr102793.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/common.opt  |  4 +++
>  gcc/doc/invoke.texi |  5 
>  gcc/passes.def  |  1 +
>  gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c  |  2 +-
>  gcc/testsuite/gcc.dg/uninit-pred-6_c.c  |  4 ---
>  gcc/testsuite/gcc.target/aarch64/pr102793.c | 30 +
>  gcc/tree-ssa-ifcombine.cc   |  5 
>  7 files changed, 46 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102793.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index ad348844775..e943202bcf1 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3163,6 +3163,10 @@ ftree-phiprop
>  Common Var(flag_tree_phiprop) Init(1) Optimization
>  Enable hoisting loads from conditional pointers.
>
> +ftree-ifcombine
> +Common Var(flag_tree_ifcombine) Init(1) Optimization
> +Merge some conditional branches to simplify control flow.
> +
>  ftree-pre
>  Common Var(flag_tree_pre) Optimization
>  Enable SSA-PRE optimization on trees.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e2edf7a6c13..8d2ff6b4512 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13454,6 +13454,11 @@ This flag is enabled by default at @option{-O1} and 
> higher.
>  Perform hoisting of loads from conditional pointers on trees.  This
>  pass is enabled by default at @option{-O1} and higher.
>
> +@opindex ftree-ifcombine
> +@item -ftree-ifcombine
> +Merge some conditional branches to simplify control flow.  This pass
> +is enabled by default at @option{-O1} and higher.
> +
>  @opindex fhoist-adjacent-loads
>  @item -fhoist-adjacent-loads
>  Speculatively hoist loads from both branches of an if-then-else if the
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1cbbd413097..1765b476131 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -270,6 +270,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_lim);
>NEXT_PASS (pass_walloca, false);
>NEXT_PASS (pass_pre);
> +  NEXT_PASS (pass_tree_ifcombine);
>NEXT_PASS (pass_sink_code, false /* unsplit edges */);
>NEXT_PASS (pass_sancov);
>NEXT_PASS (pass_asan);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> index 16c79da9521..66c9f481a2f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030922-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-dom2 -fdisable-tree-ifcombine" } */
> +/* { dg-options "-O1 -fdump-tree-dom2 -fno-tree-ifcombine" } */
>
>  struct rtx_def;
>  typedef struct rtx_def *rtx;
> diff --git a/gcc/testsuite/gcc.dg/uninit-pred-6_c.c 
> b/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> index f60868dad23..2d8e6501a45 100644
> --- a/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> +++ b/gcc/testsuite/gcc.dg/uninit-pred-6_c.c
> @@ -20,10 +20,6 @@ int foo (int n, int l, int m,

Re: [PATCH v3] tree-profile: Disable indirect call profiling for IFUNC resolvers

2024-04-03 Thread Andrew Pinski
On Wed, Apr 3, 2024 at 8:32 AM Peter Bergner  wrote:
>
> On 4/3/24 7:40 AM, H.J. Lu wrote:
> > We can't profile indirect calls to IFUNC resolvers nor their callees as
> > it requires TLS which hasn't been set up yet when the dynamic linker is
> > resolving IFUNC symbols.
> >
> > Add an IFUNC resolver caller marker to cgraph_node and set it if the
> > function is called by an IFUNC resolver.  Disable indirect call profiling
> > for IFUNC resolvers and their callees.
>
> The IFUNC resolvers on Power do not use TLS, so isn't this a little too
> conservative?  Should this be triggered via a target hook so architectures
> that don't use TLS in their IFUNC resolvers could still profile them?

I think you misunderstood the patch/situation. Most ifunc resolvers
don't use TLS at all; what is happening here is that the profiler
(-fprofile-generate) is adding TLS usage to the ifunc resolver, which
then causes issues. And the use of TLS causes a PLT call to be inside
the ifunc resolver, which causes all the fun stuff.

This is not about ifunc resolvers using TLS directly in code but rather
indirectly via -fprofile-generate.

Thanks,
Andrew Pinski
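
For context, a minimal IFUNC resolver looks roughly like the C sketch below. It assumes an ELF target with GNU IFUNC support (for example GCC on x86_64 Linux with glibc); the names `add_one`, `add_one_generic`, and `resolve_add_one` are hypothetical. Note that the resolver itself touches no TLS; the problem described above arises only when instrumentation adds such accesses.

```c
/* Requires an ELF target with GNU IFUNC support. */

/* The one implementation this sketch ever selects. */
static int add_one_generic(int x)
{
  return x + 1;
}

/* The resolver runs while the dynamic linker is processing
   relocations, so it should stay as simple as possible: here it
   just returns the address of the chosen implementation. */
static int (*resolve_add_one(void))(int)
{
  return add_one_generic;
}

/* Calls to add_one are bound to whatever the resolver returned. */
int add_one(int x) __attribute__((ifunc("resolve_add_one")));
```

Building a file like this with `-fprofile-generate` is where the extra, compiler-inserted profiling code in the resolver would come from.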


>
> Peter
>
>


[COMMITTED] Use fatal_error instead of internal_error for when ZSTD is not enabled

2024-03-28 Thread Andrew Pinski
This changes the internal error to a fatal error for the case where ZSTD
support is not enabled but the section was compressed with ZSTD.

Committed as approved after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* lto-compress.cc (lto_end_uncompression): Use
fatal_error instead of internal_error when ZSTD
is not enabled.

Signed-off-by: Andrew Pinski 
---
 gcc/lto-compress.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto-compress.cc b/gcc/lto-compress.cc
index c167ac967aa..bebf0277ef6 100644
--- a/gcc/lto-compress.cc
+++ b/gcc/lto-compress.cc
@@ -408,7 +408,7 @@ lto_end_uncompression (struct lto_compression_stream *stream,
 }
 #endif
   if (compression == ZSTD)
-internal_error ("compiler does not support ZSTD LTO compression");
+fatal_error (UNKNOWN_LOCATION, "compiler does not support ZSTD LTO compression");
 
   lto_uncompression_zlib (stream);
 }
-- 
2.43.0



Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-28 Thread Andrew Pinski
On Thu, Mar 28, 2024 at 3:15 PM Дилян Палаузов
 wrote:
>
> Hello Ian,
>
> when I add in gcc/go/config-lang.in the line
>   boot_language=yes
>
> then on stage3 x86_64-pc-linux-gnu/libbacktrace is compiled before 
> x86_64-pc-linux-gnu/libgo and this error is gone.
>
> But then Makefile.def has
>   target_modules = { module= libatomic; bootstrap=true; lib_path=.libs; };
>
> and in x86_64-pc-linux-gnu libatomic is not compiled before 
> x86_64-pc-linux-gnu/libgo .  Linking the latter fails
>
> make[2]: Entering directory '/git/gcc/build/x86_64-pc-linux-gnu/libgo'
> /bin/sh ./libtool --tag=CC --mode=link /git/gcc/build/./gcc/xgcc 
> -B/git/gcc/build/./gcc/ -B/usr/local/x86_64-pc-linux-gnu/bin/ …long text… 
> golang.org/x/sys/cpu_gccgo_x86.lo ../libbacktrace/libbacktrace.la 
> ../libatomic/libatomic_convenience.la ../libffi/libffi_convenience.la 
> -lpthread -lm
> ./libtool: line 5195: cd: ../libatomic/.libs: No such file or directory
> libtool: link: cannot determine absolute directory name of 
> `../libatomic/.libs'
>
> So either lib_path=.libs interferes (when gcc/go/config-lang.in contains 
> “boot_language=yes”), I have made the semi-serial build, trying to save a lot 
> of time waiting to get on stage3, somehow wrong, or libatomic must be 
> mentioned in gcc/go/config-lang.in . I have the feeling that ./configure 
> --enable-langugage=all works, because gcc/d/config-lang.in contains 
> boot_language=yes, and then in some way libphobos or d depend on libatomic.
>
> That said bootstrap=true might only be relevant when boot_langugages=yes is 
> present.
>
> In addition gcc/go/config-lang.in:boot_language=yes implies that on stage2 
> (thus in prev-x86_64-pc-linux-gnu/) libbacktrace is built, which I do not 
> want this, as libbacktrace is needed only by libgo on stage3.
>
> Can someone explain, why is libbacktrace built once in the built-root, as 
> stage1-libbacktrace, prev-libbacktrace and libbacktrace (for stage3) and once 
> again in stage1-x86_64-pc-linux-gnu/libbacktrace, 
> prev-x86_64-pc-linux-gnu/libbacktrace/ and in 
> x86_64-pc-linux-gnu/libbacktrace ? My precise question is why libbacktrace is 
> built once in the build-root directory and once in the x86_64-pc-linux-gnu 
> directory?

Because it is both a target library and a host library. Take a cross
compiler that is being built on, say, host A and targeting target B.
It will be built as a host library to be included as part of
cc1/cc1plus/etc. and as a target library that will be used by
libsanitizer (and libgo). The GCC build does not use the target
library to link cc1/cc1plus; it only uses the host library version.
Does that make sense now?

Thanks,
Andrew Pinski

>
> Kind regards Дилян
>
>
> On 26 March 2024 at 16:37:40 UTC, Ian Lance Taylor wrote:
>>
>> On Tue, Mar 26, 2024 at 9:33 AM Дилян Палаузов
>>  wrote:
>>>
>>>
>>>  Makefile.def contains already:
>>>
>>>  host_modules= { module= libbacktrace; bootstrap=true; }; // since 
>>> eff02e4f84 - "libbacktrace/: * Initial implementation" year 2012
>>>
>>>  host_modules= { module= libcpp; bootstrap=true; }; // since 
>>> 4f4e53dd8517c0b2 - year 2004
>>
>>
>> Yes.  I was just trying to answer your question.
>>
>> Ian
>>
>>> On 25 March 2024 at 23:59:52 UTC, Ian Lance Taylor wrote:
>>>>
>>>>
>>>>  On Sat, Mar 23, 2024 at 4:32 AM Дилян Палаузов
>>>>   wrote:
>>>>>
>>>>>
>>>>>
>>>>>   Can the build experts say what needs to be changed?  The dependencies I 
>>>>> added are missing in the build configuration (@if gcc-bootstrap).
>>>>>
>>>>>   I cannot say if libbacktrace should or should not be a bootstrap=true 
>>>>> module.
>>>>
>>>>
>>>>
>>>>  I don't count as a build expert these days, but since GCC itself links
>>>>  against libbacktrace, my understanding is that the libbacktrace
>>>>  host_module should be bootstrap=true, just like, say, libcpp.
>>>>
>>>>  Ian


Re: [PATCH v1] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Andrew Pinski
On Sun, Mar 24, 2024 at 8:46 PM Chenghui Pan  wrote:
>
> The current documentation of the Tree SSA passes contains many parts that
> have not been updated for many years.
>
> This patch removes some info that is outdated and no longer exists in the
> current GCC codebase, and fixes some wrong code location descriptions
> based on the current codebase status and ChangeLogs.


This improves the situation for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=951 (and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13756 ). Maybe the patch
should reference both of those as well.

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * doc/passes.texi: Correction of Tree SSA Passes info.
> ---
>  gcc/doc/passes.texi | 70 -
>  1 file changed, 6 insertions(+), 64 deletions(-)
>
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index b50d3d5635b..068036acb7d 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -450,17 +450,6 @@ The following briefly describes the Tree optimization 
> passes that are
>  run after gimplification and what source files they are located in.
>
>  @itemize @bullet
> -@item Remove useless statements
> -
> -This pass is an extremely simple sweep across the gimple code in which
> -we identify obviously dead code and remove it.  Here we do things like
> -simplify @code{if} statements with constant conditions, remove
> -exception handling constructs surrounding code that obviously cannot
> -throw, remove lexical bindings that contain no variables, and other
> -assorted simplistic cleanups.  The idea is to get rid of the obvious
> -stuff quickly rather than wait until later when it's more work to get
> -rid of it.  This pass is located in @file{tree-cfg.cc} and described by
> -@code{pass_remove_useless_stmts}.
>
>  @item OpenMP lowering
>
> @@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
>
>  If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
>  parallel regions into their own functions to be invoked by the thread
> -library.  The pass is located in @file{omp-low.cc} and is described by
> +library.  The pass is located in @file{omp-expand.cc} and is described by
>  @code{pass_expand_omp}.
>
>  @item Lower control flow
> @@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and 
> creates all of
>  the edges that connect them.  It is located in @file{tree-cfg.cc} and
>  is described by @code{pass_build_cfg}.
>
> -@item Find all referenced variables
> -
> -This pass walks the entire function and collects an array of all
> -variables referenced in the function, @code{referenced_vars}.  The
> -index at which a variable is found in the array is used as a UID
> -for the variable within this function.  This data is needed by the
> -SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
> -and is described by @code{pass_referenced_vars}.
> -
>  @item Enter static single assignment form
>
>  This pass rewrites the function such that it is in SSA form.  After
> @@ -562,15 +542,6 @@ variables that are used once into the expression that 
> uses them and
>  seeing if the result can be simplified.  It is located in
>  @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
>
> -@item Copy Renaming
> -
> -This pass attempts to change the name of compiler temporaries involved in
> -copy operations such that SSA->normal can coalesce the copy away.  When 
> compiler
> -temporaries are copies of user variables, it also renames the compiler
> -temporary to the user variable resulting in better use of user symbols.  It 
> is
> -located in @file{tree-ssa-copyrename.c} and is described by
> -@code{pass_copyrename}.
> -
>  @item PHI node optimizations
>
>  This pass recognizes forms of PHI inputs that can be represented as
> @@ -585,8 +556,7 @@ The resulting may-alias, must-alias, and escape analysis 
> information
>  is used to promote variables from in-memory addressable objects to
>  non-aliased variables that can be renamed into SSA form.  We also
>  update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
> -aggregates so that we get fewer false kills.  The pass is located
> -in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
> +aggregates so that we get fewer false kills.
>
>  Interprocedural points-to information is located in
>  @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
> @@ -604,7 +574,7 @@ is described by @code{pass_ipa_tree_profile}.
>  This pass implements series of heuristics to guess propababilities
>  of branches.  The resulting predictions are turned into edge profile
>  by propagating branches across the control flow graphs.
> -The pass is located in @file{tree-profile.cc} and is described by
> +The pass is located in @file{predict.cc} and is described by
>  @code{pass_profile}.
>
>  @item Lower complex arithmetic
> @@ -653,7 +623,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
>  @item Full redundancy elimination
>
>  This is a simpler form of PRE 

[PATCH] Another ICE after conflicting types of redeclaration [PR110682]

2024-03-22 Thread Andrew Pinski
This is another one of these ICE-after-error issues with the
gimplifier and fallout from r12-3278-g823685221de986af.
The problem here is that STRIP_USELESS_TYPE_CONVERSION will
leave around a NON_LVALUE_EXPR which wraps an error mark node.
Since the gimplifier assumes non-lvalue expressions have been
removed, there was an ICE.

This fixes the issue by checking for a NON_LVALUE_EXPR
that has an error operand and handling it the same way as
an error operand.

gcc/ChangeLog:

PR c/110682
* gimplify.cc (gimplify_expr): Add check if there is
a non-lvalue with an error operand.

gcc/testsuite/ChangeLog:

PR c/110682
* gcc.dg/redecl-27.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimplify.cc  |  6 +-
 gcc/testsuite/gcc.dg/redecl-27.c | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-27.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index d64bbf3ffbd..001b4af68b9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -17686,7 +17686,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
   save_expr = *expr_p;
 
   /* Die, die, die, my darling.  */
-  if (error_operand_p (save_expr))
+  if (error_operand_p (save_expr)
+ /* The above strip useless type conversion might not strip out
+a conversion from an error so handle that case here.  */
+ || (TREE_CODE (save_expr) == NON_LVALUE_EXPR
+ && error_operand_p (TREE_OPERAND (save_expr, 0))))
{
  ret = GS_ERROR;
  break;
diff --git a/gcc/testsuite/gcc.dg/redecl-27.c b/gcc/testsuite/gcc.dg/redecl-27.c
new file mode 100644
index 00000000000..93f577e64ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-27.c
@@ -0,0 +1,14 @@
+/* We used to ICE while gimplifying the body of f
+   due to a NON_LVALUE_EXPR still being there.
+   PR c/110682*/
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+struct a {
+  const signed char b;
+};
+
+void f(volatile struct a *c) { /* { dg-note "" } */
+  c - 0 % c->b;
+  struct a c = {1}; /* { dg-error "redeclared as different kind of symbol" } */
+}
-- 
2.43.0



[PATCH] Another ICE after conflicting types of redeclaration [PR109619]

2024-03-21 Thread Andrew Pinski
This is another one of these ICE-after-error issues with the
gimplifier and fallout from r12-3278-g823685221de986af.
This case happens when we are trying to fold memcpy/memmove.
There is already code that tries to catch ERROR_MARKs passed as arguments
to the builtins, so we just need to change it to use error_operand_p,
which also checks whether the type of the expression is an error mark.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR c/109619
* builtins.cc (fold_builtin_1): Use error_operand_p
instead of checking against ERROR_MARK.
(fold_builtin_2): Likewise.
(fold_builtin_3): Likewise.

gcc/testsuite/ChangeLog:

PR c/109619
* gcc.dg/redecl-26.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/builtins.cc  | 12 ++--
 gcc/testsuite/gcc.dg/redecl-26.c | 14 ++
 2 files changed, 20 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-26.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eda8bea9c4b..bb74b5cbcd6 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10461,7 +10461,7 @@ fold_builtin_1 (location_t loc, tree expr, tree fndecl, 
tree arg0)
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK)
+  if (error_operand_p (arg0))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0))
@@ -10601,8 +10601,8 @@ fold_builtin_2 (location_t loc, tree expr, tree fndecl, 
tree arg0, tree arg1)
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK
-  || TREE_CODE (arg1) == ERROR_MARK)
+  if (error_operand_p (arg0)
+  || error_operand_p (arg1))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type, arg0, arg1))
@@ -10693,9 +10693,9 @@ fold_builtin_3 (location_t loc, tree fndecl,
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
   enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
 
-  if (TREE_CODE (arg0) == ERROR_MARK
-  || TREE_CODE (arg1) == ERROR_MARK
-  || TREE_CODE (arg2) == ERROR_MARK)
+  if (error_operand_p (arg0)
+  || error_operand_p (arg1)
+  || error_operand_p (arg2))
 return NULL_TREE;
 
   if (tree ret = fold_const_call (as_combined_fn (fcode), type,
diff --git a/gcc/testsuite/gcc.dg/redecl-26.c b/gcc/testsuite/gcc.dg/redecl-26.c
new file mode 100644
index 00000000000..5f8889c4c39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-26.c
@@ -0,0 +1,14 @@
+/* We used to ICE while folding memcpy and memmove.
+   PR c/109619. */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int *a1, *a2;
+
+void foo(__SIZE_TYPE__ a3) /* { dg-note "" }  */
+{
+  __builtin_memcpy(a1, a2, a3);
+  __builtin_memmove(a1, a2, a3);
+  int *a3; /* { dg-error "redeclared as different kind of symbol" } */
+}
+
-- 
2.43.0
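
As a rough illustration of why the switch to error_operand_p matters,
here is a minimal sketch using a toy node type. Everything named toy_*
is hypothetical and only stands in for GCC's tree nodes; the new check
is modelled on the description above (the node itself, or its type, may
be the error mark) and is not GCC's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC tree codes; not the real enum tree_code.  */
enum toy_code { TOY_ERROR_MARK, TOY_INTEGER_TYPE, TOY_PARM_DECL };

/* Toy stand-in for a tree node: a code plus an optional type.  */
struct toy_node
{
  enum toy_code code;
  const struct toy_node *type;
};

const struct toy_node toy_error_mark = { TOY_ERROR_MARK, NULL };
const struct toy_node toy_int_type = { TOY_INTEGER_TYPE, NULL };

/* Old-style check: only the node's own code is inspected.  */
bool
toy_is_error_mark (const struct toy_node *t)
{
  return t->code == TOY_ERROR_MARK;
}

/* New-style check: the node itself or its type may be the error mark.  */
bool
toy_error_operand_p (const struct toy_node *t)
{
  return t == &toy_error_mark
         || (t != NULL && t->type == &toy_error_mark);
}

/* Self-check covering the interesting cases.  */
void
toy_check (void)
{
  /* Hypothetical operand that is a valid node with an erroneous type.  */
  const struct toy_node bad_parm = { TOY_PARM_DECL, &toy_error_mark };
  const struct toy_node good_parm = { TOY_PARM_DECL, &toy_int_type };

  assert (!toy_is_error_mark (&bad_parm));   /* old check misses it */
  assert (toy_error_operand_p (&bad_parm));  /* new check catches it */
  assert (!toy_error_operand_p (&good_parm));
  assert (toy_error_operand_p (&toy_error_mark));
}
```

The case that matters is bad_parm: its own code is not an error mark, so
a check on the code alone lets it through to the folder.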



Re: [PATCH] cpp: new built-in __EXP_COUNTER__

2024-03-21 Thread Andrew Pinski
On Thu, Mar 21, 2024, 17:20 Kaz Kylheku  wrote:

> On 2024-03-20 16:34, rep.dot@gmail.com wrote:
> > On 19 March 2024 18:27:13 CET, Kaz Kylheku  wrote:
> >>On 2024-03-18 00:30, Jonathan Wakely wrote:
> >>> I don't have an opinion on the implementation, or the proposal itself,
> >>> except that the implementation seems surprisingly simple, which is
> >>> nice.
> >>
> >>Hi Jonathan,
> >>
> >>Here is an updated patch.
> >>
> >>It rebased cleanly over more than 16000 newer commits, suggesting
> >>that the area in the cpp code is "still waters", which is good.
> >>
> >>I made the documentation change not to recommend using #if, but
> >>#ifdef.
> >>
> >>I got rid of the ChangeLog changes, and also tried to pay more
> >>attention to the log message format, where the ChangeLog pieces
> >>are specified.
> >>
> >>In the first test case, I had to adjust the expected warning text
> >>for two lines.
> >>
> >
> > Please forgive the bike shedding, but __EXP_COUNTER__ would lead me into
> thinking about exponents or thereabouts.
> > __MACRO_EXPANSION_COUNTER__ is more what your patch is about, IMHO?
> Maybe you could come up with a more descriptive name, please?
> >
> > And, while I can see what could possibly be done with that, I'm not
> really convinced that it would be a wise idea to (unilaterally) support
> that idea. Don't you think that this would encourage producing more
> spaghetti code?
> >
> > Just curious about real world motivating examples I guess.
> > cheers
>
> Hi, (Bernhard?)
>
> Concerns about naming are very important; not bike shedding at all.
> I changed the patch to use __EXPANSION_NUMBER__. I didn't include MACRO
> because I hope it's clear that in preprocessing, we are expanding
> macros. The parent symbol is now called __PARENT_EXPANSION_NUMBER__.
>
> I dropped the COUNTER terminology because the existing __COUNTER__
> is a symbol whose value changes each time it is mentioned.
> These symbols are not like that; they capture a fixed value in
> a scope and behave like ordinary macros.
>
> In doing the renaming, I noticed that from the beginning I've already
> been calling the internal value in the macro context macro->exp_number,
> because it's not a counter.
>
> The focus of this feature isn't to enable some new "earth-shattering"
> techniques, but to improve certain situations in existing macros.
>
> For instance, suppose we have a macro that expands to some block
> of code in which there is an internal goto. If we have it
>
>   #define MAC(...) { ... goto __label; ... __label: ; }
>
> then this cannot be used twice in the same function; labels have
> function scope.



In this case, why can't you use GCC's existing extension for declaring a
local label?
https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Local-Labels.html

This extension has been around for over 20 years, specifically for that use
case.

Thanks,
Andrew
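
For reference, a minimal sketch of the local-label extension applied to
this kind of macro. FIND_INDEX and sum_of_indices are hypothetical names
made up for the illustration; only __label__ itself is the GNU C
extension described at the link above.

```c
#include <assert.h>

/* Search ARR[0..N-1] for KEY and store its index (or -1) in OUT.
   __label__ declares `found' local to the enclosing block, so every
   expansion of the macro gets its own label.  The declaration has to
   come first in the block, before other declarations.  */
#define FIND_INDEX(arr, n, key, out)          \
  do {                                        \
    __label__ found;                          \
    int idx_;                                 \
    for (idx_ = 0; idx_ < (n); idx_++)        \
      if ((arr)[idx_] == (key))               \
        goto found;                           \
    idx_ = -1;                                \
   found:                                     \
    (out) = idx_;                             \
  } while (0)

/* Two expansions in one function, on the same source line.  */
int
sum_of_indices (const int *arr, int n, int key1, int key2)
{
  int a, b;
  assert (n >= 0);
  FIND_INDEX (arr, n, key1, a); FIND_INDEX (arr, n, key2, b);
  return a + b;
}
```

Because each label is scoped to its own block, neither __LINE__ pasting
nor a new preprocessor symbol is needed to keep the labels apart.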



> If we make it
>
>   #define MAC(...) { ... goto CAT(__label, __LINE__); ... CAT(__label,
> __LINE__): ; }
>
> we now can use MAC two or more times in the same function, but not in
> the same line of code.
>
> With __EXPANSION_NUMBER__ it is doable. Given this program:
>
>   #define xcat(A, B) A ## B
>   #define cat(A, B) xcat(A, B)
>   #define lab(PREFIX) cat(PREFIX, __PARENT_EXPANSION_NUMBER__)
>
>   #define MAC { goto lab(foo); /*...*/ lab(foo): ; }
>
>   MAC MAC MAC
>
> We get the preprocessed output (with -E):
>
>   { goto foo3; foo3: ; } { goto foo10; foo10: ; } { goto foo17; foo17: ; }
>
> There are issues with relying on __LINE__ to produce different values
> when it is referenced in code generated by a macro.
>
> The following program prints the same value 12 three times; even though
> PRINT seems to be referenced on different physical lines in the
> PRINT3 macro replacement text. __LINE__ references the line where
> the top-level expansion of PRINT3 occurs, not where PRINT occurs.
>
> #include <stdio.h>
>
> #define PRINT (printf("%d\n", __LINE__))
>
> #define PRINT3 do { \
>   PRINT;\
>   PRINT;\
>   PRINT;\
> } while (0)
>
> int main()
> {
>   PRINT3;
>   return 0;
> }
>


Re: [PATCH v2 00/13] Add aarch64-w64-mingw32 target

2024-03-18 Thread Andrew Pinski
On Mon, Mar 18, 2024 at 3:59 PM Fangrui Song  wrote:
>
> On Mon, Mar 18, 2024 at 3:10 PM Evgeny Karpov
>  wrote:
> >
> >
> > Monday, March 18, 2024 2:34 PM
> > Christophe Lyon wrote:
> >
> > > > I had a look at the v2 series, and besides a minor comment on patch #8,
> > > > ISTM that all the comments you received about v1 have been addressed,
> > > > indeed.
> > >
> > > > While unit testing for the x86_64-w64-mingw32 target is still in
> > > > progress, the first 4 patches do not obviously change other targets,
> > > > including aarch64-linux-gnu.
> > > > Could they be merged once stage 1 starts, or could it be done even
> > > > now?
> > >
> > > What would be the benefit of committing only the first 4 patches?
> > > (whether now or when stage 1 reopens)
> > >
> > > Thanks,
> > >
> > > Christophe
> >
> > > Work on obtaining regression test results for x86_64-w64-mingw32 was still
> > > in progress at that moment.
> > > The first 4 patches do not obviously change other targets, so it was safe
> > > to merge them.
> > > Now, based on the regression test results
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647967.html,
> > > it appears safe to merge the entire series.
> >
> > Regards,
> > Evgeny
>
> LLVM has had aarch64 mingw ABI support for a long time. Does this
> patch series introduce a different ABI?
> If yes, do you have a summary?
>
> Does the patch need any adaptation on the LLVM side, or should a
> different target triple be picked?
> I have always been wondering what "32" in "x86_64-w64-mingw32" means.

It has always been mingw32; the name comes from the Win32 API, which dates
from when Windows also had a 16-bit API.
The API has always been called Win32, and Microsoft did not rename it to
Win64 for 64-bit targets.

Thanks,
Andrew Pinski

>
> https://github.com/llvm/llvm-project/pull/78908 even introduced the
> first use of the triple "arm64ec-w64-mingw32" into llvm-project.
>
>
> --
> 宋方睿


Re: C/C++ frontend patches ping

2024-03-14 Thread Andrew Pinski
On Thu, Mar 14, 2024 at 9:36 PM Andi Kleen  wrote:
>
>
> musttail support for C/C++
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643867.html
>
>
> Support constexpr for asm statements in C++
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643933.html


Both of these were posted long after the start of stage 3 and close
to the beginning of stage 4, and since they are both new features I
really doubt they will be reviewed until stage 1 opens, which will
be in about a month or so.

Thanks,
Andrew Pinski


RE: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-03-12 Thread Andrew Pinski (QUIC)
> -----Original Message-----
> From: Andrew Pinski (QUIC) 
> Sent: Sunday, March 10, 2024 7:58 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for
> NaNs [PR95351]
> 
> The problem here is that merge_truthop_with_opposite_arm would use the
> type of the result of the comparison rather than the operands of the
> comparison to figure out if we are honoring NaNs.
> This fixes that oversight and now we get the correct results in this case.
> 
> Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

Committed to the GCC 13 branch too.

Thanks,
Andrew

> 
>   PR middle-end/95351
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (merge_truthop_with_opposite_arm): Use
>   the type of the operands of the comparison and not the type
>   of the comparison.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/float_opposite_arm-1.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/fold-const.cc   |  3 ++-
>  gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/float_opposite_arm-1.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 43105d20be3..299c22bf391 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -6420,7 +6420,6 @@ static tree
>  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
>bool rhs_only)
>  {
> -  tree type = TREE_TYPE (cmpop);
>enum tree_code code = TREE_CODE (cmpop);
>enum tree_code truthop_code = TREE_CODE (op);
>tree lhs = TREE_OPERAND (op, 0);
> @@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t
> loc, tree op, tree cmpop,
>if (TREE_CODE_CLASS (code) != tcc_comparison)
>  return NULL_TREE;
> 
> +  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
> +
>if (rhs_code == truthop_code)
>  {
>       tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, rhs_only);
> diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> new file mode 100644
> index 00000000000..d2dbff35066
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
> +/* { dg-add-options ieee } */
> +/* PR middle-end/95351 */
> +
> +int Foo(double possiblyNAN, double b, double c)
> +{
> +return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c));
> +}
> +
> +/* Make sure we don't remove either >/<=  */
> +
> +/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
> +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */
> +
> +/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */
> +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */
> --
> 2.43.0



RE: [Committed] Reject -fno-multiflags [PR114314]

2024-03-12 Thread Andrew Pinski (QUIC)
> -----Original Message-----
> From: Andrew Pinski (QUIC) 
> Sent: Monday, March 11, 2024 8:59 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [Committed] Reject -fno-multiflags [PR114314]
> 
> When -fmultiflags option support was added in r13-3693-g6b1a2474f9e422,
> it accidentally allowed -fno-multiflags, which would then be passed on to cc1.
> This fixes that oversight.
> 
> Committed as obvious after bootstrap/test on x86_64-linux-gnu.

Note I committed this to the GCC 13 branch too.

Thanks,
Andrew

> 
> gcc/ChangeLog:
> 
>   PR driver/114314
>   * common.opt (fmultiflags): Add RejectNegative.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/common.opt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 51c4a17da83..1ad0169bd6f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2295,7 +2295,7 @@ Common Var(flag_move_loop_stores) Optimization
>  Move stores out of loops.
> 
>  fmultiflags
> -Common Driver
> +Common Driver RejectNegative
>  Building block for specs-based multilib-aware TFLAGS.
> 
>  fdce
> --
> 2.43.0



[Committed] Reject -fno-multiflags [PR114314]

2024-03-11 Thread Andrew Pinski
When -fmultiflags option support was added in r13-3693-g6b1a2474f9e422,
it accidentally allowed -fno-multiflags, which would then be passed on to cc1.
This fixes that oversight.

Committed as obvious after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

PR driver/114314
* common.opt (fmultiflags): Add RejectNegative.

Signed-off-by: Andrew Pinski 
---
 gcc/common.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 51c4a17da83..1ad0169bd6f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2295,7 +2295,7 @@ Common Var(flag_move_loop_stores) Optimization
 Move stores out of loops.
 
 fmultiflags
-Common Driver
+Common Driver RejectNegative
 Building block for specs-based multilib-aware TFLAGS.
 
 fdce
-- 
2.43.0



Re: [RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers

2024-03-10 Thread Andrew Pinski
On Sun, Mar 10, 2024 at 2:09 PM Jeff Law  wrote:
>
>
>
> On 3/10/24 3:05 PM, Andrew Pinski wrote:
> > On Sun, Mar 10, 2024 at 2:04 PM Jeff Law  wrote:
> >>
> >> Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false
> >> positive triggered by loop unrolling.
> >>
> >> As I speculated a couple years ago, we could eliminate the comparisons
> >> against bogus pointers.  Consider:
> >>
> >>> [local count: 30530247]:
> >>>if (last_12 !=   [(void *)"aa" + 3B])
> >>>  goto ; [54.59%]
> >>>else
> >>>  goto ; [45.41%]
> >>
> >>
> >> That's a valid comparison as ISO allows us to generate, but not
> >> dereference, a pointer one element past the end of the object.
> >>
> >> But +4B is a bogus pointer.  So given an EQ comparison against that
> >> pointer we could always return false and for NE always return true.
> >>
> >> VRP and DOM seem to be the most natural choices for this kind of
> >> optimization on the surface.  However DOM is actually not viable because
> >> the out-of-bounds pointer warning pass is run at the end of VRP.  So
> >> we've got to take care of this prior to the end of VRP.
> >>
> >>
> >>
> >> I haven't done a bootstrap or regression test with this.  But if it
> >> looks reasonable I can certainly push on it further. I have confirmed it
> >> does eliminate the tests and shuts up the bogus warning.
> >>
> >> The downside is this would also shut up valid warnings if user code did
> >> this kind of test.
> >>
> >> Comments/Suggestions?
> >
> > ENOPATCH
> Yea, realized it as I pushed the send button.  Then t-bird crashed,
> repeatedly.
>
> Attached this time..


One minor comment on it

The comment:
> return true for EQ and false for NE.

Seems to be the opposite of what the code does:
> return code == EQ_EXPR ? boolean_false_node : boolean_true_node;

Thanks,
Andrew


>
> jeff
>


[COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-03-10 Thread Andrew Pinski
The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/95351

gcc/ChangeLog:

* fold-const.cc (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/float_opposite_arm-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/fold-const.cc   |  3 ++-
 gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/float_opposite_arm-1.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 43105d20be3..299c22bf391 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -6420,7 +6420,6 @@ static tree
 merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
 bool rhs_only)
 {
-  tree type = TREE_TYPE (cmpop);
   enum tree_code code = TREE_CODE (cmpop);
   enum tree_code truthop_code = TREE_CODE (op);
   tree lhs = TREE_OPERAND (op, 0);
@@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, 
tree cmpop,
   if (TREE_CODE_CLASS (code) != tcc_comparison)
 return NULL_TREE;
 
+  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
+
   if (rhs_code == truthop_code)
 {
   tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, 
rhs_only);
diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
new file mode 100644
index 00000000000..d2dbff35066
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+/* PR middle-end/95351 */
+
+int Foo(double possiblyNAN, double b, double c)
+{
+return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c));
+}
+
+/* Make sure we don't remove either >/<=  */
+
+/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */
+
+/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */
-- 
2.43.0
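
To make the NaN issue concrete, here is a small sketch of why `x > 2.0`
cannot be treated as the exact negation of `x <= 2.0` for floating-point
operands. The function names are made up for the illustration, and
wrongly_merged_form only approximates the effect of dropping the second
comparison; it is not taken from fold-const.cc. The comparisons in
original_form are stored in separate variables so that the sketch should
not depend on whether the compiler used to build it already has this fix.

```c
#include <assert.h>
#include <math.h>

/* The expression from the test case, with each comparison evaluated
   on its own.  */
int
original_form (double x, double b, double c)
{
  int le = x <= 2.0;
  int gt = x > 2.0;
  return le || (gt && (b > c));
}

/* What the result looks like if `x > 2.0' is dropped on the assumption
   that it is always true once `x <= 2.0' is false.  */
int
wrongly_merged_form (double x, double b, double c)
{
  return (x <= 2.0) || (b > c);
}

/* Self-check: the two forms agree for ordinary numbers, not for NaN.  */
void
check_forms (void)
{
  assert (original_form (3.0, 2.0, 1.0)
          == wrongly_merged_form (3.0, 2.0, 1.0));
  assert (original_form (1.0, 1.0, 2.0)
          == wrongly_merged_form (1.0, 1.0, 2.0));
  /* With NaN both comparisons are false, so only the original form
     yields 0.  */
  assert (original_form (NAN, 2.0, 1.0) == 0);
  assert (wrongly_merged_form (NAN, 2.0, 1.0) == 1);
}
```

The result of a comparison has an integer type, so asking whether that
type honors NaNs always says no; the operand type is the one that
carries the floating-point semantics.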



Re: [RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers

2024-03-10 Thread Andrew Pinski
On Sun, Mar 10, 2024 at 2:04 PM Jeff Law  wrote:
>
> Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false
> positive triggered by loop unrolling.
>
> As I speculated a couple years ago, we could eliminate the comparisons
> against bogus pointers.  Consider:
>
> >[local count: 30530247]:
> >   if (last_12 !=   [(void *)"aa" + 3B])
> > goto ; [54.59%]
> >   else
> > goto ; [45.41%]
>
>
> That's a valid comparison as ISO allows us to generate, but not
> dereference, a pointer one element past the end of the object.
>
> But +4B is a bogus pointer.  So given an EQ comparison against that
> pointer we could always return false and for NE always return true.
>
> VRP and DOM seem to be the most natural choices for this kind of
> optimization on the surface.  However DOM is actually not viable because
> the out-of-bounds pointer warning pass is run at the end of VRP.  So
> we've got to take care of this prior to the end of VRP.
>
>
>
> I haven't done a bootstrap or regression test with this.  But if it
> looks reasonable I can certainly push on it further. I have confirmed it
> does eliminate the tests and shuts up the bogus warning.
>
> The downside is this would also shut up valid warnings if user code did
> this kind of test.
>
> Comments/Suggestions?

ENOPATCH

>
> Jeff
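
The pointer rule this relies on can be sketched in plain C as follows.
The function names are hypothetical; the string matches the "aa"
literal in the dump above, where +3B is the one-past-the-end address
and +4B would be out of bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements in [first, last) using only pointer comparison;
   `last' is compared against but never dereferenced.  */
size_t
count_until (const char *first, const char *last)
{
  assert (first != NULL && last != NULL);
  size_t n = 0;
  for (const char *p = first; p != last; p++)
    n++;
  return n;
}

size_t
demo_one_past_end (void)
{
  static const char s[] = "aa";    /* 3 bytes: 'a', 'a', '\0' */
  const char *end = s + sizeof s;  /* s + 3: valid to form and compare */
  /* Forming s + 4 would already be undefined behaviour, which is why a
     comparison against it can be assumed never to succeed.  */
  return count_until (s, end);
}
```

Here demo_one_past_end counts the three bytes of the array, including
the terminating NUL.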


Re: [PATCH] testsuite: xfail test for arm

2024-03-09 Thread Andrew Pinski
On Sat, Mar 9, 2024 at 1:07 AM Torbjörn SVENSSON
 wrote:
>
> I don't know if this affects other targets than arm-none-eabi, so I
> used arm-*-*. If you think it should be *-*-* or some other target
> selector, please let me know what to use instead.
>
> Ok for releases/gcc-13?

Most likely this should be short_enums instead of arm*-*-* (I think the old
non-EABI arm targets didn't use short enums), since the fix
r14-6517-gb7e4a4c626e applies when -fshort-enums is used.
Also, if you are adding a dg-bogus to the branch, it might make sense
to do the same on the trunk (obviously without the xfail part).
It also makes sense to add a reference to r14-6517-gb7e4a4c626e next to the
dg-bogus in the source.

Thanks,
Andrew Pinski
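
A rough sketch of how the directive could look with that selector. The
reduced types and the function name objt_conn_sketch are hypothetical
and only show where the directive goes; this reduction is not claimed
to reproduce the original warning.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type { OBJ_TYPE_NONE, OBJ_TYPE_CONN };

struct connection
{
  enum obj_type obj_type;
  int flags;
};

/* Map a pointer to the obj_type member back to its containing struct.
   The bogus -Waddress-of-packed-member warning seen with -fshort-enums
   was fixed on trunk by r14-6517-gb7e4a4c626e.  */
struct connection *
objt_conn_sketch (enum obj_type *t)
{
  assert (t != NULL);
  return (struct connection *) ((char *) t - offsetof (struct connection, obj_type)); /* { dg-bogus "may result in an unaligned pointer value" "" { xfail short_enums } } */
}
```

On the trunk the same line would carry the dg-bogus without the
`{ xfail short_enums }` part.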

>
> --
>
> On arm-none-eabi, the test case fails with
> .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: 
> converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct 
> connection' pointer (alignment 4) may result in an unaligned pointer value 
> [-Waddress-of-packed-member]
>
> The error was fixed in basepoints/gcc-14-6517-gb7e4a4c626e, but it
> was considered too big a change to be backported, and thus the
> failing test is marked xfail in GCC 13.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
> Added dg-bogus with xfail on offending line for arm-*-*.
>
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git 
> a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
>  
> b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> index 2a9c715c32c..461d5f1199c 100644
> --- 
> a/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> +++ 
> b/gcc/testsuite/gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
> @@ -60,7 +60,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
>  }
>  static inline struct connection *__objt_conn(enum obj_type *t)
>  {
> - return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
> *)0)->obj_type)));
> + return ((struct connection *)(((void *)(t)) - ((long)&((struct connection 
> *)0)->obj_type))); /* { dg-bogus "may result in an unaligned pointer value" 
> "" { xfail arm-*-* } } */
>  }
>  static inline struct connection *objt_conn(enum obj_type *t)
>  {
> --
> 2.25.1
>


RE: [PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

2024-03-06 Thread Andrew Pinski (QUIC)
> -----Original Message-----
> From: Andrew Pinski (QUIC) 
> Sent: Tuesday, February 20, 2024 7:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH] c++/c-common: Fix convert_vector_to_array_for_subscript
> for qualified vector types [PR89224]
> 
> After r7-987-gf17a223de829cb, accesses to the elements of a vector type
> would lose the qualifiers.
> So if we had `constvector[0]`, the type of the element of the array would not
> have const on it.
> This was due to a missing build_qualified_type for the inner type of the
> vector when building the array type.
> We need to add back the call to build_qualified_type so that the access has
> the correct qualifiers. Overload resolution, and whether the access is an
> lvalue or an rvalue, are then handled correctly.
>
> Note we now correctly reject the testcase gcc.dg/pr83415.c, which was
> incorrectly accepted after r7-987-gf17a223de829cb.


Ping?

> 
> Built and tested for aarch64-linux-gnu.
> 
>   PR c++/89224
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (convert_vector_to_array_for_subscript): Call
>   build_qualified_type for the inner type.
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_array_reference): Compare main variants
>   for the vector/array types instead of the types directly.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/torture/vector-subaccess-1.C: New test.
>   * gcc.dg/pr83415.c: Change warning to error.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/c-family/c-common.cc  |  7 +-
>  gcc/cp/constexpr.cc   |  3 ++-
>  .../g++.dg/torture/vector-subaccess-1.C   | 23 +++
>  gcc/testsuite/gcc.dg/pr83415.c|  2 +-
>  4 files changed, 32 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index e15eff698df..884dd9043f9 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -8936,6 +8936,7 @@ convert_vector_to_array_for_subscript (location_t loc,
>if (gnu_vector_type_p (TREE_TYPE (*vecp)))
>  {
>tree type = TREE_TYPE (*vecp);
> +  tree newitype;
> 
>ret = !lvalue_p (*vecp);
> 
> @@ -8950,8 +8951,12 @@ convert_vector_to_array_for_subscript
> (location_t loc,
>for function parameters.  */
>c_common_mark_addressable_vec (*vecp);
> 
> +  /* Make sure qualifiers are copied from the vector type to the new element
> +  of the array type.  */
> +  newitype = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
> +
>*vecp = build1 (VIEW_CONVERT_EXPR,
> -   build_array_type_nelts (TREE_TYPE (type),
> +   build_array_type_nelts (newitype,
> TYPE_VECTOR_SUBPARTS (type)),
> *vecp);
>  }
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index fa346fe01c9..1fe91d16e8e 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -4421,7 +4421,8 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
>if (!lval
>&& TREE_CODE (ary) == VIEW_CONVERT_EXPR
>&& VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (ary, 0)))
> -  && TREE_TYPE (t) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0))))
> +  && TYPE_MAIN_VARIANT (TREE_TYPE (t))
> +   == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0)))))
>  ary = TREE_OPERAND (ary, 0);
> 
>tree oldidx = TREE_OPERAND (t, 1);
> diff --git a/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> new file mode 100644
> index 000..0c8958a4e03
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
> @@ -0,0 +1,23 @@
> +/* PR c++/89224 */
> +
> +/* The access of `vector[i]` has the same qualifiers as the original
> +   vector which was missing. */
> +
> +typedef __attribute__((vector_size(16))) unsigned char  Int8x8_t;
> +
> +template <class T>
> +void g(T &) {
> +__builtin_abort();
> +}
> +template <class T>
> +void g(const T &) {
> +  __builtin_exit(0);
> +}
> +void f(const Int8x8_t x) {
> +  g(x[0]);
> +}
> +int main(void)
> +{
> +Int8x8_t x ={};
> +f(x);
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr83415.c b/gcc/testsuite/gcc.dg/pr83415.c
> index 5934c16d97c..2fc85031505 100644
> --- a/gcc/testsuite/gcc.dg/pr83415.c
> +++ b/gcc/testsuite/gcc.dg/pr83415.c
> @@ -7,6 +7,6 @@ int
>  main (int argc, short *argv[])
>  {
>int i = argc;
> -  y[i] = 7 - i; /* { dg-warning "read-only" } */
> +  y[i] = 7 - i; /* { dg-error "read-only" } */
>return 0;
>  }
> --
> 2.43.0
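
For readers following the PR89224 discussion above, here is a minimal sketch of the overload behaviour the new test relies on. The names `which`, `pick_for_const_array`, `pick_for_const_vector` and `v16qi` are made up for illustration and are not part of the patch. The plain-array case shows the reference behaviour (elements of a const array are const); the vector case uses the GNU vector extension and is expected to pick the same overload only on a compiler that carries the fix, so no result is claimed for it here.

```cpp
#include <cassert>  // only needed for the checks shown in the note below

// Two overloads that differ only in the constness of the referenced element.
template <class T> int which(T &) { return 0; }        // chosen for a non-const element
template <class T> int which(const T &) { return 1; }  // chosen for a const element

// Reference behaviour: subscripting a const array yields a const lvalue,
// so partial ordering selects the `const T &` overload.
int pick_for_const_array() {
  const unsigned char a[16] = {};
  return which(a[0]);
}

// GNU vector extension (assumption: a compiler that supports vector_size).
typedef unsigned char v16qi __attribute__((vector_size(16)));

// With the qualifiers propagated to the element type this should behave
// like the array case; a compiler without the fix may return 0 instead.
int pick_for_const_vector(const v16qi v) {
  return which(v[0]);
}
```

A check such as `assert(pick_for_const_array() == 1);` holds on any conforming compiler; the vector variant is the case the patch is about.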



Re: [committed] d: Fix gdc -O2 -mavx generates misaligned vmovdqa instruction [PR114171]

2024-03-03 Thread Andrew Pinski
On Sat, Mar 2, 2024 at 5:51 PM Iain Buclaw  wrote:
>
> Hi,
>
> This patch fixes a wrong code issue in the D front-end where lowered
> struct comparisons would reinterpret fields with a different (usually
> bigger) alignment than the original.  Use `build_aligned_type' to
> preserve the alignment when casting away for such comparisons.
>
> Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
> to mainline, and backported to releases/gcc-13, releases/gcc-12, and
> releases/gcc-11.
>
> Regards,
> Iain.
> ---
> PR d/114171
>
> gcc/d/ChangeLog:
>
> * d-codegen.cc (lower_struct_comparison): Keep alignment of original
> type in reinterpret cast for comparison.
>
> gcc/testsuite/ChangeLog:
>
> * gdc.dg/torture/pr114171.d: New test.
> ---
>  gcc/d/d-codegen.cc  |  1 +
>  gcc/testsuite/gdc.dg/torture/pr114171.d | 29 +
>  2 files changed, 30 insertions(+)
>  create mode 100644 gcc/testsuite/gdc.dg/torture/pr114171.d
>
> diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
> index 5bc233928aa..43d7739f8fc 100644
> --- a/gcc/d/d-codegen.cc
> +++ b/gcc/d/d-codegen.cc
> @@ -1006,6 +1006,7 @@ lower_struct_comparison (tree_code code, 
> StructDeclaration *sd,
>   if (tmode == NULL_TREE)
> tmode = make_unsigned_type (GET_MODE_BITSIZE (mode.require 
> ()));
>
> + tmode = build_aligned_type (tmode, TYPE_ALIGN (stype));

You might also need to build a may_alias variant, or otherwise make sure
the access uses the correct alias set.
I have not checked whether the D front-end does anything special for
alias sets, so I am not sure if that is needed, but I suspect it is.

Thanks,
Andrew Pinski


>   t1ref = build_vconvert (tmode, t1ref);
>   t2ref = build_vconvert (tmode, t2ref);
>
> diff --git a/gcc/testsuite/gdc.dg/torture/pr114171.d 
> b/gcc/testsuite/gdc.dg/torture/pr114171.d
> new file mode 100644
> index 000..0f9ffcab916
> --- /dev/null
> +++ b/gcc/testsuite/gdc.dg/torture/pr114171.d
> @@ -0,0 +1,29 @@
> +// { dg-do run }
> +// { dg-additional-options "-mavx" { target avx_runtime } }
> +// { dg-skip-if "needs gcc/config.d" { ! d_runtime } }
> +import gcc.builtins;
> +
> +struct S1
> +{
> +string label;
> +}
> +
> +struct S2
> +{
> +ulong pad;
> +S1 label;
> +}
> +
> +pragma(inline, false)
> +auto newitem()
> +{
> +void *p = __builtin_malloc(S2.sizeof);
> +__builtin_memset(p, 0, S2.sizeof);
> +return cast(S2*) p;
> +}
> +
> +int main()
> +{
> +auto bn = newitem();
> +return bn.label is S1.init ? 0 : 1;
> +}
> --
> 2.40.1
>
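
To illustrate the may_alias remark above in user-level terms, here is a rough sketch, not the D front-end's code. It assumes GCC-style attributes; `Pair`, `u64_view` and `same_bits` are hypothetical names. The typedef plays the role of the wider comparison type: `aligned(4)` keeps the alignment of the original struct (the analogue of `build_aligned_type`), and `may_alias` tells the compiler that accesses through it may alias objects of other types.

```cpp
#include <cassert>  // only needed for the checks shown in the note below
#include <cstdint>

// An 8-byte struct that is only 4-byte aligned.
struct Pair {
  std::uint32_t a;
  std::uint32_t b;
};
static_assert(sizeof(Pair) == sizeof(std::uint64_t), "sketch assumes no padding");

// Wider integer view of the struct: alignment lowered to match Pair, and
// marked may_alias so the reinterpreting loads are not subject to
// type-based alias analysis.
typedef std::uint64_t u64_view __attribute__((aligned(4), may_alias));

// Compare two Pair objects with a single integer comparison.
bool same_bits(const Pair &x, const Pair &y) {
  return *reinterpret_cast<const u64_view *>(&x)
         == *reinterpret_cast<const u64_view *>(&y);
}
```

For example, `same_bits(Pair{1, 2}, Pair{1, 2})` is true and `same_bits(Pair{1, 2}, Pair{1, 3})` is false. Without the lowered alignment, a compiler would be entitled to assume 8-byte alignment for the load, which is the kind of assumption behind the misaligned vmovdqa in the PR.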


RE: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Andrew Pinski (QUIC)
> -----Original Message-----
> From: Maxim Kuvyrkov 
> Sent: Thursday, February 29, 2024 9:46 AM
> To: Andrew Pinski (QUIC) 
> Cc: Evgeny Karpov ; Andrew Pinski
> ; Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
> Walker ; Ron Riddle
> ; Radek Barton 
> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
> environments for AArch64
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary
> of any links or attachments, and do not enable macros.
> 
> > On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>  wrote:
> >
> >
> >
> >> -----Original Message-----
> >> From: Evgeny Karpov 
> >> Sent: Thursday, February 29, 2024 8:46 AM
> >> To: Andrew Pinski 
> >> Cc: Richard Sandiford ; gcc-
> >> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
> >> ; m...@harmstone.com; Zac Walker
> >> ; Ron Riddle ;
> >> Radek Barton ; Andrew Pinski (QUIC)
> >> 
> >> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
> >> for AArch64
> >>
> >> Wednesday, February 28, 2024 2:00 AM
> >> Andrew Pinski wrote:
> >>
> >>> What does this mean with respect to C++ exceptions? Or you using
> >>> SJLJ exceptions support or the dwarf unwinding ones without SEH
> support?
> >>> I am not sure if SJLJ exceptions is well tested any more in GCC either.
> >>>
> >>> Also I have a question if you ran the full GCC/G++ testsuites and
> >>> what were the results?
> >>> If you did run it, did you use a cross compiler or the native
> >>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
> though)?
> >>
> >> As mentioned in the cover letter and the thread, the current
> >> contribution covers only the C scope.
> >> Exception handling is fully disabled for now.
> >> There is an experimental build with C++ and SEH, however, it is not
> >> included in the plan for the current contribution.
> >>
> >> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
> build
> >>
> >>> If you run using a cross compiler, did you use ssh or some other
> >>> route to run the applications?
> >>>
> >>> Thanks,
> >>> Andrew Pinski
> >>
> >> GitHub Actions are used to cross-compile toolchains, packages and
> >> tests, and execute tests on Windows Arm64.
> >
> > This does not answer my question because what you are running is just
> > simple testcases and not the FULL GCC testsuite.
> > So again, have you run the GCC testsuite, and do you have a dejagnu board
> > to be able to execute the binaries?
> > I think that without running the GCC testsuite to find all of the known
> > failures, you are going to run into many issues.
> > The GCC testsuite includes many tests for ABI corner cases and many
> > features that you will most likely not think about testing using your
> > simple testcases.
> > In fact I suspect there will be some aarch64 testcases which will need
> > to be modified for the Windows ABI, which you have not done yet.
> 
> Hi Andrew,
> 
> We (Linaro) have a prototype CI loop setup for testing
> aarch64-w64-mingw32, and we have results for gcc-c and libatomic -- see [1].
> 
> The results are far from clean, but that's expected.  This patch series aims
> at enabling C hello-world only, and subsequent patch series will improve the
> state of the port.
> 
> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-build/6/artifact/artifacts/sumfiles/

Looking at these results, this port is not in any shape or form to be
upstreamed right now. Even a simple -g will cause failures.
Note we don't need a clean testsuite run, but the patch series does not even
enable hello world, because -g cannot be used.

Thanks,
Andrew Pinski

> 
> Thanks,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org



RE: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Andrew Pinski (QUIC)


> -----Original Message-----
> From: Evgeny Karpov 
> Sent: Thursday, February 29, 2024 8:46 AM
> To: Andrew Pinski 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
> ; m...@harmstone.com; Zac Walker
> ; Ron Riddle ; Radek
> Barton ; Andrew Pinski (QUIC)
> 
> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
> for AArch64
> 
> Wednesday, February 28, 2024 2:00 AM
> Andrew Pinski wrote:
> 
> > What does this mean with respect to C++ exceptions? Or you using SJLJ
> > exceptions support or the dwarf unwinding ones without SEH support?
> > I am not sure if SJLJ exceptions is well tested any more in GCC either.
> >
> > Also I have a question if you ran the full GCC/G++ testsuites and what
> > were the results?
> > If you did run it, did you use a cross compiler or the native
> > compiler? Did you do a bootstrap (GCC uses C++ but no exceptions though)?
> 
> As mentioned in the cover letter and the thread, the current contribution
> covers only the C scope.
> Exception handling is fully disabled for now.
> There is an experimental build with C++ and SEH, however, it is not included 
> in
> the plan for the current contribution.
> 
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build
> 
> > If you run using a cross compiler, did you use ssh or some other route
> > to run the applications?
> >
> > Thanks,
> > Andrew Pinski
> 
> GitHub Actions are used to cross-compile toolchains, packages and tests, and
> execute tests on Windows Arm64.

This does not answer my question because what you are running is just simple
testcases and not the FULL GCC testsuite.
So again, have you run the GCC testsuite, and do you have a dejagnu board to be
able to execute the binaries?
I think that without running the GCC testsuite to find all of the known
failures, you are going to run into many issues.
The GCC testsuite includes many tests for ABI corner cases and many features
that you will most likely not think about testing using your simple testcases.
In fact I suspect there will be some aarch64 testcases which will need
to be modified for the Windows ABI, which you have not done yet.


Thanks,
Andrew Pinski

> 
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
> build/actions/runs/7929205044
> 
> Regards,
> Evgeny


Re: [COMMITTED] aarch64: Fix memtag builtins vs GC [PR108174]

2024-02-28 Thread Andrew Pinski
On Wed, Feb 28, 2024 at 11:14 PM Andrew Pinski  wrote:
>
> The memtag builtins were being GC'ed away so we end up
> with a crash sometimes (maybe even wrong code).
> This fixes that issue by adding GTY on the variable/struct
> aarch64_memtag_builtin_data.
>
> Committed as obvious after a build/test for aarch64-linux-gnu.

Also committed to the GCC 13 branch.

Thanks,
Andrew

>
> PR target/108174
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc (aarch64_memtag_builtin_data):
> Make static and mark with GTY.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/acle/memtag_4.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64-builtins.cc   |  2 +-
>  gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c | 16 
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 277904f6d14..75d21de1401 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1840,7 +1840,7 @@ aarch64_init_prefetch_builtin (void)
>  }
>
>  /* Initialize the memory tagging extension (MTE) builtins.  */
> -struct
> +static GTY(()) struct GTY(())
>  {
>tree ftype;
>enum insn_code icode;
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c
> new file mode 100644
> index 000..1e209ffc25a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv9-a+memtag  --param ggc-min-expand=0 --param ggc-min-heapsize=0" } */
> +/* PR target/108174 */
> +/* Check to make sure that the builtin functions are not GC'ed away. */
> +#include "arm_acle.h"
> +
> +void g(void)
> +{
> +  const char *c;
> +  __arm_mte_increment_tag(c , 0 );
> +}
> +void h(void)
> +{
> +  const char *c;
> +  __arm_mte_increment_tag( c,0);
> +}
> --
> 2.43.0
>


[COMMITTED] aarch64: Fix memtag builtins vs GC [PR108174]

2024-02-28 Thread Andrew Pinski
The memtag builtins were being GC'ed away so we end up
with a crash sometimes (maybe even wrong code).
This fixes that issue by adding GTY on the variable/struct
aarch64_memtag_builtin_data.

Committed as obvious after a build/test for aarch64-linux-gnu.

PR target/108174

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_memtag_builtin_data): Make
static and mark with GTY.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/memtag_4.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-builtins.cc   |  2 +-
 gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 277904f6d14..75d21de1401 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1840,7 +1840,7 @@ aarch64_init_prefetch_builtin (void)
 }
 
 /* Initialize the memory tagging extension (MTE) builtins.  */
-struct
+static GTY(()) struct GTY(())
 {
   tree ftype;
   enum insn_code icode;
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c 
b/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c
new file mode 100644
index 000..1e209ffc25a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv9-a+memtag  --param ggc-min-expand=0 --param ggc-min-heapsize=0" } */
+/* PR target/108174 */
+/* Check to make sure that the builtin functions are not GC'ed away. */
+#include "arm_acle.h"
+
+void g(void)
+{
+  const char *c;
+  __arm_mte_increment_tag(c , 0 );
+}
+void h(void)
+{
+  const char *c;
+  __arm_mte_increment_tag( c,0);
+}
-- 
2.43.0
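
As background on why the missing GTY marker matters, here is a toy mark-and-sweep sketch. It is not GCC's ggc implementation: `ToyGC`, `Node`, `add_root` and `survives_collection` are hypothetical names used only to show the idea that an object reachable solely from a global the collector does not know about is treated as garbage. Registering the root plays the role that the GTY marker plays for `aarch64_memtag_builtin_data`.

```cpp
#include <cassert>  // only needed for the checks shown in the note below
#include <vector>

// Toy stand-in for a GC-managed object (not GCC's real `tree`).
struct Node {
  bool marked = false;
  bool live = true;
};

class ToyGC {
public:
  ~ToyGC() {
    for (Node *n : heap_) delete n;
  }

  Node *alloc() {
    heap_.push_back(new Node);
    return heap_.back();
  }

  // Analogue of marking a variable with GTY: tell the collector where a
  // root pointer lives.
  void add_root(Node **slot) { roots_.push_back(slot); }

  // Mark everything reachable from registered roots, then "free" the rest.
  void collect() {
    for (Node *n : heap_) n->marked = false;
    for (Node **slot : roots_)
      if (*slot) (*slot)->marked = true;
    for (Node *n : heap_)
      if (!n->marked) n->live = false;  // a real collector would reuse this memory
  }

private:
  std::vector<Node *> heap_;
  std::vector<Node **> roots_;
};

// Returns whether an object held only by `slot` survives a collection.
bool survives_collection(bool registered_as_root) {
  ToyGC gc;
  Node *slot = gc.alloc();  // plays the role of the `ftype` field
  if (registered_as_root) gc.add_root(&slot);
  gc.collect();
  return slot->live;
}
```

In this sketch `survives_collection(true)` is true and `survives_collection(false)` is false. The new test's `--param ggc-min-expand=0 --param ggc-min-heapsize=0` options serve a similar purpose to calling `collect()` eagerly: they make collections frequent enough for the unrooted case to show up.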



Re: [PATCH 1/3] Change 'v1' float and int code to fall back to v0

2024-02-28 Thread Andrew Pinski
On Wed, Feb 28, 2024 at 5:35 PM Tom Tromey  wrote:
>
> >>>>> "Andrew" == Andrew Pinski  writes:
>
> Andrew> I don't know how to update the script server side after it is
> Andrew> committed in git. the checker script is located in git though:
>
> Thanks, I didn't realize it was there.
>
> Could you check in your patch?
> IMO it seems obvious.

Pushed as r14-9230-g5ff49272bf4eb6
(https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646819.html).
I noticed the bug components list there needs other cleanup too,
but I will leave that for another time.

Thanks,
Andrew Pinski

>
> Tom


[COMMITTED] Add libcc1 to bug components

2024-02-28 Thread Andrew Pinski
As found by Tom Tromey in
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646807.html
libcc1 is not listed as a bug component in the checker script even though
it exists in bugzilla.
This fixes that oversight.

Committed as obvious after testing using git gcc-verify on a patch.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (bug_components): Add libcc1.

Signed-off-by: Andrew Pinski 
---
 contrib/gcc-changelog/git_commit.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 87bec4e00f5..87ecb9e1a17 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -105,6 +105,7 @@ bug_components = {
 'java',
 'jit',
 'libbacktrace',
+'libcc1',
 'libf2c',
 'libffi',
 'libfortran',
-- 
2.34.1
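
For context, the kind of check that produced the "invalid PR component" error can be sketched as follows. This is an illustration in C++, not the logic of git_commit.py (which is Python); the component list is an abbreviated, made-up subset, and `kBugComponents` and `pr_component_is_valid` are hypothetical names.

```cpp
#include <cassert>  // only needed for the checks shown in the note below
#include <regex>
#include <set>
#include <string>

// Abbreviated, illustrative subset of known bug components.
const std::set<std::string> kBugComponents = {
    "c++", "d", "jit", "libbacktrace", "libcc1", "libf2c", "libffi", "target"};

// Returns false when the subject names a PR whose component is unknown.
// Subjects without any "PR component/number" reference are accepted.
bool pr_component_is_valid(const std::string &subject) {
  static const std::regex pr_re(R"(PR ([a-z0-9+-]+)/([0-9]+))");
  std::smatch m;
  if (!std::regex_search(subject, m, pr_re)) return true;
  return kBugComponents.count(m[1].str()) != 0;
}
```

With "libcc1" in the set, a subject like "Fix PR libcc1/113977" passes; remove it from the set and the same subject is rejected, which mirrors the failure Tom reported.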



Re: [PATCH 1/3] Change 'v1' float and int code to fall back to v0

2024-02-28 Thread Andrew Pinski
On Wed, Feb 28, 2024 at 3:26 PM Jeff Law  wrote:
>
>
>
> On 2/28/24 15:57, Tom Tromey wrote:
> >>>>>> "Jeff" == Jeff Law  writes:
>
> > I could not push this because:
> >
> > remote: *** ChangeLog format failed:
> > remote: *** ERR: invalid PR component in subject: "Fix PR libcc1/113977"
> >
> > I guess this script isn't in sync with the components in bugzilla.
> >
> > I don't know how to fix this.
> Me neither, but I can suggest a hacky workaround.  Change the component
> in bugzilla to something the pre-commit hooks understand, push the fix,
> then change the component back a little while later and adjust the
> ChangeLog after it gets generated overnight.  Ugly as sin.

I don't know how to update the script server side after it is
committed in git. The checker script is located in git, though:
```
[apinski@xeond2 contrib]$ git diff gcc-changelog/git_commit.py
diff --git a/contrib/gcc-changelog/git_commit.py
b/contrib/gcc-changelog/git_commit.py
index 87bec4e00f5..4a3720de7fb 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -109,6 +109,7 @@ bug_components = {
 'libffi',
 'libfortran',
 'libgcc',
+'libcc1',
 'libgcj',
 'libgomp',
 'libitm',
```

Thanks,
Andrew Pinski

>
> jeff

