date:20240219

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Thomas Schwinge wrote:

> Hi!
> 
> On 2024-02-19T17:31:20+0100, I wrote:
> > On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
> >> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
> >>> On 2024-02-16T14:53:04+0100, I wrote:
> >>> > On 2024-02-16T12:41:06+0000, Andrew Stubbs wrote:
> >>> >> On 16/02/2024 12:26, Richard Biener wrote:
> >>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
> >>>  On 16/02/2024 10:17, Richard Biener wrote:
> >>> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >>> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote:
> >>> >>> I've committed this patch
> >>> >>
> >>> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> >>> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
> >>> >> RDNA3/gfx1100
> >>> >> support builds on top of, and that's what I'm currently working on
> >>> >> getting proper GCC/GCN target (not offloading) results for.
> >>> >>
> >>> >> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
> >>> >> simple,
> >>> >> and hopefully representative for other SLP execution test FAILs
> >>> >> (regressions compared to my earlier non-gfx1100 testing).
> >>> >>
> >>> >>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
> >>> >>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> >>> >>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
> >>> >>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model
> >>> >>   -fno-common
> >>> >>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
> >>> >>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
> >>> >>   source-gcc/newlib/libc/include
> >>> >>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
> >>> >>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
> >>> >>   setarch,--addr-no-randomize -fdump-tree-all-all
> >>> >>   -fdump-ipa-all-all
> >>> >>   -fdump-rtl-all-all -save-temps -march=gfx1100
> >>> >>
> >>> >> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
> >>> >> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), 
> >>> >> so I
> >>> >> suppose will also exhibit the same failure mode, once again?
> >>> >>
> >>> >> Compared to '-march=gfx90a', the differences begin in
> >>> >> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 
> >>> >> 'a-bb-slp-cond-1.s'.
> >>> >>
> >>> >> Changed like:
> >>> >>
> >>> >>   @@ -38,10 +38,10 @@ int main ()
> >>> >>#pragma GCC novector
> >>> >>  for (i = 1; i < N; i++)
> >>> >>if (a[i] != i%4 + 1)
> >>> >>   -  abort ();
> >>> >>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
> >>> >>
> >>> >>  if (a[0] != 5)
> >>> >>   -abort ();
> >>> >>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
> >>> >>
> >>> >> ..., we see:
> >>> >>
> >>> >>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >>> >>   40 5 != 1
> >>> >>   41 6 != 2
> >>> >>   42 7 != 3
> >>> >>   43 8 != 4
> >>> >>   44 5 != 1
> >>> >>   45 6 != 2
> >>> >>   46 7 != 3
> >>> >>   47 8 != 4
> >>> >>
> >>> >> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
> >>> >> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
> >>> >> scribbled zero values over these (vector lane masking issue, 
> >>> >> perhaps?),
> >>> >> or some other code generation issue?
> >>> >
> >>>  [...], I must be doing something different because 
> >>>  vect/bb-slp-cond-1.c
> >>>  passes for me, on gfx1100.
> >>> >
> >>> > That's strange.  I've looked at your log file (looks good), and used 
> >>> > your
> >>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
> >>> >
> >>> > $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> >>> > GCN Kernel Aborted
> >>> > Kernel aborted
> >>> >
> >>> > Andrew, later on, please try what happens when you put an unconditional
> >>> > 'abort' call into a test case?
> >>> 
> >>> Andrew, any luck with that yet?
> >>> 
> >>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
> >>> execution test failure mentioned above (manual compilation and
> >>> 'gcn-run')?
> >>
> >> No, when manually compiling/running the testcase it works fine for me.
> >
> > I've updated my GCC master branch sources, but it still fails for me:
> >
> > $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
> > source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 
> > --sysroot=install/amdgcn-amdhsa -isystem 
> > build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem 
> > source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ 
> > -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize
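
For readers following the index arithmetic quoted earlier in this thread,
here is a minimal scalar sketch of what the test appears to check.  It is
reconstructed only from the quoted diff and the 'gcn-run' log; the array
length N = 128, the stride of 4, the initial values, and the names foo_ref
and count_mismatches are assumptions for illustration, not the testsuite
source.  It shows that zeroed inputs at a[40..47] (i = 10..11 for a stride
of 4) would produce exactly the 5, 6, 7, 8 values seen in the log.

```cpp
constexpr int N = 128;     // assumed array length (not stated in the thread)
constexpr int STRIDE = 4;  // assumed stride: elements 40..47 are i = 10..11

// Scalar reference for the conditional stores: a non-zero input selects
// 1..4, a zero input selects 5..8.  The name foo_ref is hypothetical.
constexpr void
foo_ref (int *a, int stride)
{
  for (int i = 0; i < N / stride; i++, a += stride)
    {
      a[0] = a[0] ? 1 : 5;
      a[1] = a[1] ? 2 : 6;
      a[2] = a[2] ? 3 : 7;
      a[3] = a[3] ? 4 : 8;
    }
}

// Runs the reference and counts failures of the two checks from the quoted
// diff.  With CLOBBER set, a[40..47] are zeroed first, modelling the
// suspected "scribbled zero values" scenario.
constexpr int
count_mismatches (bool clobber)
{
  int a[N] = {};
  for (int i = 0; i < N; i++)
    a[i] = i;
  if (clobber)
    for (int k = 40; k <= 47; k++)
      a[k] = 0;

  foo_ref (a, STRIDE);

  int bad = 0;
  for (int i = 1; i < N; i++)
    if (a[i] != i % 4 + 1)
      bad++;
  if (a[0] != 5)
    bad++;
  return bad;
}

// Evaluated at compile time (C++14 or later).
static_assert (count_mismatches (false) == 0, "clean input passes");
static_assert (count_mismatches (true) == 8, "zeroed a[40..47] gives 8 failures");
```

Under these assumptions the eight failing elements match the log one for
one, which is consistent with the inputs having been read as zero rather
than with the select itself being wrong.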

Re: [PATCH] ipa: Convert lattices from pure array to vector (PR 113476)

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Martin Jambor wrote:

> On Tue, Feb 13 2024, Martin Jambor wrote:
> > On Mon, Feb 12 2024, Jan Hubicka wrote:
> >>> Believe it or not, even though I have re-worked the internals of the
> >>> lattices completely, the array itself is older than my involvement with
> >>> GCC (or at least with ipa-cp.c ;-).
> >>> 
> >>> So it being an array and not a vector is historical coincidence, as far
> >>> as I am concerned :-).  But that may be the reason, or because vector
> >>> macros at that time looked scary, or perhaps the initialization by
> >>> XCNEWVEC zeroing everything out was considered attractive (I kind of
> >>> like that but constructors would probably be cleaner), I don't know.
> >>
> >> If your class is no longer a POD, then the clearing before construction
> >> is dead and GCC may optimize it out.  So fixing this may solve some
> >> surprises in the foreseeable future when we will try to compile older GCC's
> >> with newer ones.
> >>
> >
> > That's a good point.  I'll prepare a patch converting the whole thing to
> > use constructors and vectors.
> >
> 
> In PR 113476 we have discovered that ipcp_param_lattices is no longer
> a POD and should be destructed.  In a follow-up discussion it
> transpired that their initialization done by memsetting their backing
> memory to zero is also invalid because now any write there before
> construction can be considered dead.  Plus that having them in an
> array is a little bit old-school and does not get the extra checking
> offered by vector along with automatic construction and destruction
> when necessary.
> 
> So this patch converts the array to a vector.  That however means that
> ipcp_param_lattices cannot be just a forward declared type but must be
> known to all code that deals with ipa_node_params and thus to all code
> that includes ipa-prop.h.  Therefore I have moved ipcp_param_lattices
> and the type it depends on to a new header ipa-cp.h which now
> ipa-prop.h depends on.  Because we have the (IMHO not a very wise)
> rule that headers don't include what they need themselves, I had to
> add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
> many files, which made the patch rather ugly.
> 
> Bootstrapped and tested on x86_64-linux.  I also had it checked by our
> script which builds more than a hundred cross-compilers, so other
> targets are hopefully also fine.
> 
> OK for master?

LGTM.

Thanks,
Richard.
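
As an illustration of the point made in the quoted description, a minimal
sketch of why zero-filling storage stops being a valid initialization once
a type is no longer a POD.  The type param_lattice and the function
make_lattices are hypothetical stand-ins, and std::vector is used here in
place of GCC's own vec; this is not the code from the patch.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a lattice type; not GCC's ipcp_param_lattices.
struct param_lattice
{
  int values_count = 0;     // default member initializers replace the memset
  bool bottom = false;
  std::vector<int> values;  // non-trivial member, so the type is not a POD
};

// Zero-filling raw storage and then treating it as a param_lattice would
// skip the constructor, and a compiler may regard stores made before
// construction as dead.  The traits below confirm the type needs real
// construction and destruction.
static_assert (!std::is_trivially_default_constructible<param_lattice>::value,
	       "param_lattice needs its constructor to run");
static_assert (!std::is_trivially_destructible<param_lattice>::value,
	       "param_lattice needs its destructor to run");

// Value-constructs COUNT lattices; the vector destroys them automatically.
inline std::vector<param_lattice>
make_lattices (std::size_t count)
{
  return std::vector<param_lattice> (count);
}
```

Every element returned by make_lattices starts with values_count == 0 and
bottom == false because its constructor ran, not because the memory
happened to be zero.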

> Martin
> 
> 
> gcc/lto/ChangeLog:
> 
> 2024-02-16  Martin Jambor  
> 
>   * lto-common.cc: Include sreal.h and ipa-cp.h.
>   * lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher.
>   * lto.cc: Include sreal.h and ipa-cp.h.
> 
> gcc/ChangeLog:
> 
> 2024-02-16  Martin Jambor  
> 
>   * ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
>   initializers in the constructor.
>   (ipa_node_params::~ipa_node_params): Release lattices as a vector.
>   * ipa-cp.h: New file.
>   * ipa-cp.cc: Include sreal.h and ipa-cp.h.
>   (ipcp_value_source): Move to ipa-cp.h.
>   (ipcp_value_base): Likewise.
>   (ipcp_value): Likewise.
>   (ipcp_lattice): Likewise.
>   (ipcp_agg_lattice): Likewise.
>   (ipcp_bits_lattice): Likewise.
>   (ipcp_vr_lattice): Likewise.
>   (ipcp_param_lattices): Likewise.
>   (ipa_get_parm_lattices): Remove assert that lattices is non-NULL.
>   (ipa_value_from_jfunc): Adjust a check for empty lattices.
>   (ipa_context_from_jfunc): Likewise.
>   (ipa_agg_value_from_jfunc): Likewise.
>   (merge_agg_lats_step): Do not memset new aggregate lattices to zero.
>   (ipcp_propagate_stage): Allocate lattices in a vector as opposed to
>   just in contiguous memory.
>   (ipcp_store_vr_results): Adjust a check for empty lattices.
>   * auto-profile.cc: Include sreal.h and ipa-cp.h.
>   * cgraph.cc: Likewise.
>   * cgraphclones.cc: Likewise.
>   * cgraphunit.cc: Likewise.
>   * config/aarch64/aarch64.cc: Likewise.
>   * config/i386/i386-builtins.cc: Likewise.
>   * config/i386/i386-expand.cc: Likewise.
>   * config/i386/i386-features.cc: Likewise.
>   * config/i386/i386-options.cc: Likewise.
>   * config/i386/i386.cc: Likewise.
>   * config/rs6000/rs6000.cc: Likewise.
>   * config/s390/s390.cc: Likewise.
>   * gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
>   files to be included in gtype-desc.cc.
>   * gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
>   * ipa-devirt.cc: Likewise.
>   * ipa-fnsummary.cc: Likewise.
>   * ipa-icf.cc: Likewise.
>   * ipa-inline-analysis.cc: Likewise.
>   * ipa-inline-transform.cc: Likewise.
>   * ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
>   * ipa-modref.cc: Include sreal.h and ipa-cp.h.
>   * ipa-param-manipulation.cc: Likewise.
>   * ipa-predicate.cc: Likewise.
>   * ipa-profile.cc: Likewise.
>   * ipa-prop.cc:

Re: [PATCH] RISC-V: Fix CTZ unnecessary sign extension [PR #106888]

2024-02-19 Thread Jeff Law





On 2/19/24 21:26, Alexandre Oliva wrote:

> This backport for gcc-13 is required for pr90838.c to get the expected
> count of andi instructions on riscv64-elf, rather than fail because of
> two extra andi insns in functions where it is not necessary.  (On
> riscv32-elf, it passes).  Regstrapped on x86_64-linux-gnu, along with
> other backports, and tested manually on riscv64-elf.  Ok to install?
>
> From: Raphael Moreira Zinsly
>
> Changes since v1:
> - Remove subreg from operand 1.
>
> -- >8 --
>
> We were not able to match the CTZ sign extend pattern on RISC-V
> because it gets optimized to zero extend and/or to ANDI patterns.
> For the ANDI case, combine scrambles the RTL and generates the
> extension by using subregs.
>
> gcc/ChangeLog:
> PR target/106888
> * config/riscv/bitmanip.md
> (<bitmanip_optab>disi2): Match with any_extend.
> (<bitmanip_optab>disi2_sext): New pattern to match
> with sign extend using an ANDI instruction.
>
> gcc/testsuite/ChangeLog:
> PR target/106888
> * gcc.target/riscv/pr106888.c: New test.
> * gcc.target/riscv/zbbw.c: Check for ANDI.

In general, shouldn't backports be focused on correctness issues?  It's
unclear what the motivation is for backporting this change into gcc-13.

Not objecting, just trying to understand at this stage.
Jeff

Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]

2024-02-19 Thread Alexandre Oliva

On Feb 16, 2024, Jason Merrill  wrote:

> So, for stage2+, let's add just prev- libgcc.

I'm pretty sure this will break bootstrap-lean where libgcc_s isn't a
system library, and we're building post-bootstrap host tools :-(
We need the current stage lib after the prev stage is removed.


I also doubt that TARGET_LIB_PATH was defined and used for no reason.
My hunch is that bootstrap options and/or targets that don't have these
libraries as system libraries will break in some obscure way without it.
But I don't have the bandwidth to track down the history behind their
inclusion.


I insist that the entire approach of choosing the same set of target
library directories regardless of the freshness relationship between
e.g. a system libstdc++ and the one we're building can't possibly be an
overall improvement; it's only trading problems in some scenarios (where
we're building an older libstdc++) for problems in other scenarios
(where we're building a newer libstdc++).  The latter is unfortunately
far more likely, which is reason enough for the current arrangement, but
libstdc++ problems will likely only hit if the gap between system and
being-built libraries is large enough (say, new symbols in the newer
libstdc++ used by the compiler, but not available in the system
library).


I'm really uncomfortable with this change, especially at this stage.
I'd much rather have a relatively obscure workaround for this relatively
obscure problem, while keeping the defaults that have accumulated lots
of testing on lots of configurations.

An idea that occurred to me is to have some configure option or just a
make variable that would be prepended to RPATH_ENVVAR, so that it would
preempt TARGET_LIB_PATH.  That would be a far more conservative change,
that I think we could make even at this stage.  WDYT?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

[PATCH v8 24/24] libstdc++: Optimize std::is_invocable compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_invocable
by dispatching to the new __is_invocable built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_invocable): Use __is_invocable
built-in trait.
* testsuite/20_util/is_invocable/incomplete_args_neg.cc: Handle
the new error from __is_invocable.
* testsuite/20_util/is_invocable/incomplete_neg.cc: Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits| 6 ++
 .../testsuite/20_util/is_invocable/incomplete_args_neg.cc   | 1 +
 .../testsuite/20_util/is_invocable/incomplete_neg.cc| 1 +
 3 files changed, 8 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1577042a5b8..51d6df5ca66 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3233,9 +3233,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using invoke_result_t = typename invoke_result<_Fn, _Args...>::type;
 
   /// std::is_invocable
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_invocable)
+  template<typename _Fn, typename... _ArgTypes>
+struct is_invocable
+: public __bool_constant<__is_invocable(_Fn, _ArgTypes...)>
+#else
   template<typename _Fn, typename... _ArgTypes>
 struct is_invocable
 : __is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, void>::type
+#endif
 {
   static_assert(std::__is_complete_or_unbounded(__type_identity<_Fn>{}),
"_Fn must be a complete class or an unbounded array");
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc 
b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
index a575750f9e9..9619129b817 100644
--- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_args_neg.cc
@@ -18,6 +18,7 @@
 // <http://www.gnu.org/licenses/>.
 
 // { dg-error "must be a complete class" "" { target *-*-* } 0 }
+// { dg-prune-output "invalid use of incomplete type" }
 
 #include <type_traits>
 
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc 
b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
index 05848603555..b478ebce815 100644
--- a/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/incomplete_neg.cc
@@ -18,6 +18,7 @@
 // <http://www.gnu.org/licenses/>.
 
 // { dg-error "must be a complete class" "" { target *-*-* } 0 }
+// { dg-prune-output "invalid use of incomplete type" }
 
 #include <type_traits>
 
-- 
2.43.2
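
The dispatch described in this patch can be sketched outside libstdc++ as
follows.  This is a hedged illustration, not the library code: the name
my_is_invocable is hypothetical, and __has_builtin is used directly in
place of the library's _GLIBCXX_USE_BUILTIN_TRAIT macro.  On compilers
without the __is_invocable built-in the sketch falls back to
std::is_invocable, which needs C++17.

```cpp
#include <type_traits>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_builtin(__is_invocable)
// Built-in path: the compiler answers the query directly, with no
// template instantiations behind it.
template<typename Fn, typename... Args>
  struct my_is_invocable
  : std::integral_constant<bool, __is_invocable(Fn, Args...)>
  { };
#else
// Fallback path: defer to the existing library implementation.
template<typename Fn, typename... Args>
  struct my_is_invocable
  : std::is_invocable<Fn, Args...>
  { };
#endif

// Both paths are expected to agree on these queries.
static_assert (my_is_invocable<int (*)(int), int>::value,
	       "function pointer called with a matching argument");
static_assert (!my_is_invocable<int (*)(int), void *>::value,
	       "void* does not convert to int");
static_assert (!my_is_invocable<int>::value,
	       "int is not callable");
```

Keeping the two definitions behind one name means callers do not need to
know which path was taken.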

[PATCH v8 23/24] c++: Implement __is_invocable built-in trait

2024-02-19 Thread Ken Matsui

This patch implements built-in trait for std::is_invocable.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_invocable.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.
* cp-tree.h (build_invoke): New function.
* method.cc (build_invoke): New function.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
* g++.dg/ext/is_invocable1.C: New test.
* g++.dg/ext/is_invocable2.C: New test.
* g++.dg/ext/is_invocable3.C: New test.
* g++.dg/ext/is_invocable4.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |   6 +
 gcc/cp/cp-trait.def  |   1 +
 gcc/cp/cp-tree.h |   2 +
 gcc/cp/method.cc | 132 +
 gcc/cp/semantics.cc  |   4 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
 gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
 gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
 gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
 gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
 10 files changed, 720 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 23ea66d9c12..c87b126fdb1 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FUNCTION:
   inform (loc, "  %qT is not a function", t1);
   break;
+case CPTK_IS_INVOCABLE:
+  if (!t2)
+inform (loc, "  %qT is not invocable", t1);
+  else
+inform (loc, "  %qT is not invocable by %qE", t1, t2);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 85056c8140b..6cb2b55f4ea 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
+DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 334c11396c2..261d3a71faa 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7334,6 +7334,8 @@ extern tree get_copy_assign   (tree);
 extern tree get_default_ctor   (tree);
 extern tree get_dtor   (tree, tsubst_flags_t);
 extern tree build_stub_object  (tree);
+extern tree build_invoke   (tree, const_tree,
+tsubst_flags_t);
 extern tree strip_inheriting_ctors (tree);
 extern tree inherited_ctor_binfo   (tree);
 extern bool base_ctor_omit_inherited_parms (tree);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 98c10e6a8b5..8c7352abcde 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1928,6 +1928,138 @@ build_trait_object (tree type)
   return build_stub_object (type);
 }
 
+/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
+   given is not invocable, returns error_mark_node.  */
+
+tree
+build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
+{
+  if (fn_type == error_mark_node || arg_types == error_mark_node)
+return error_mark_node;
+
+  gcc_assert (TYPE_P (fn_type));
+  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
+
+  /* Access check is required to determine if the given is invocable.  */
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
+  /* INVOKE is an unevaluated context.  */
+  cp_unevaluated cp_uneval_guard;
+
+  bool is_ptrdatamem;
+  bool is_ptrmemfunc;
+  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
+{
+  tree deref_fn_type = TREE_TYPE (fn_type);
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (deref_fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (deref_fn_type);
+
+  /* Dereference fn_type if it is a pointer to member.  */
+  if (is_ptrdatamem || is_ptrmemfunc)
+   fn_type = deref_fn_type;
+}
+  else
+{
+  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
+  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
+}
+
+  if (is_ptrdatamem && TREE_VEC_LENGTH (arg_types) != 1)
+/* Only a pointer to data member with one argument is invocable.  */
+return error_mark_node;
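
The pointer-to-member handling at the start of build_invoke mirrors the
INVOKE rules that std::is_invocable already exposes.  A small sketch of
those rules using only the standard trait (C++17); the struct point and
its members are made up for the example and are not from the patch or its
tests.

```cpp
#include <type_traits>

// Hypothetical class used only to form pointer-to-member types.
struct point
{
  int x;
  int scaled (int factor) const;
};

// A pointer to data member is invocable with exactly one argument,
// the object (or a pointer to it).
static_assert (std::is_invocable<int point::*, point &>::value,
	       "data member with an object argument");
static_assert (std::is_invocable<int point::*, point *>::value,
	       "data member with a pointer-to-object argument");
static_assert (!std::is_invocable<int point::*>::value,
	       "data member with no object");
static_assert (!std::is_invocable<int point::*, point &, int>::value,
	       "data member with an extra argument");
static_assert (!std::is_invocable<int point::*, int>::value,
	       "data member with an unrelated object type");

// A reference to a pointer to member is classified like the pointer itself.
static_assert (std::is_invocable<int point::*&, point &>::value,
	       "reference to pointer to data member");

// A pointer to member function takes the object plus the call arguments.
static_assert (std::is_invocable<int (point::*)(int) const,
				 const point &, int>::value,
	       "member function with object and argument");
```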

[PATCH] libcpp: Stabilize the location for macros restored after PCH load [PR105608]

2024-02-19 Thread Alexandre Oliva

This backport for gcc-13 is the second of two required for the
g++.dg/pch/line-map-3.C test to stop hitting a variant of the known
problem mentioned in that testcase: on riscv64-elf and riscv32-elf,
after restoring the PCH, the location of the macros is mentioned as if
they were on line 3 rather than 2, so even the existing xfails fail.  I
think this might be too much to backport, and I'm ready to use an xfail
instead, but since this would bring more predictability, I thought I'd
ask whether you'd find this backport acceptable.

Regstrapped on x86_64-linux-gnu, along with other backports, and tested
manually on riscv64-elf.  Ok to install?

From: Lewis Hyatt 

libcpp currently lacks the infrastructure to assign correct locations to
macros that were defined prior to loading a PCH and then restored
afterwards. While I plan to address that fully for GCC 15, this patch
improves things by using at least a valid location, even if it's not the
best one. Without this change, libcpp uses pfile->directive_line as the
location for the restored macros, but this location_t applies to the old
line map, not the one that was just restored from the PCH, so the resulting
location is unpredictable and depends on what was stored in the line maps
before. With this change, all restored macros get assigned locations at the
line of the #include that triggered the PCH restore. A future patch will
store the actual file name and line number of each definition and then
synthesize locations in the new line map pointing to the right place.

gcc/c-family/ChangeLog:

PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Adjust line map so that libcpp
assigns a location to restored macros which is the same location
that triggered the PCH include.

libcpp/ChangeLog:

PR preprocessor/105608
* pch.cc (cpp_read_state): Set a valid location for restored
macros.

(cherry picked from commit 019dc63819befb2b82077fb2d76b5dd670946f36)
---
 gcc/c-family/c-pch.cc |   23 +++
 libcpp/pch.cc |9 -
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/gcc/c-family/c-pch.cc b/gcc/c-family/c-pch.cc
index 9ee6f1790023c..d60972ba93084 100644
--- a/gcc/c-family/c-pch.cc
+++ b/gcc/c-family/c-pch.cc
@@ -318,6 +318,7 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   struct save_macro_data *smd;
   expanded_location saved_loc;
   bool saved_trace_includes;
+  int cpp_result;
 
   timevar_push (TV_PCH_RESTORE);
 
@@ -343,20 +344,26 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   cpp_set_line_map (pfile, line_table);
   rebuild_location_adhoc_htab (line_table);
   line_table->trace_includes = saved_trace_includes;
-  linemap_add (line_table, LC_ENTER, 0, saved_loc.file, saved_loc.line);
+
+  /* Set the current location to the line containing the #include (or the
+ #pragma GCC pch_preprocess) for the purpose of assigning locations to any
+ macros that are about to be restored.  */
+  linemap_add (line_table, LC_ENTER, 0, saved_loc.file,
+  saved_loc.line > 1 ? saved_loc.line - 1 : saved_loc.line);
 
   timevar_push (TV_PCH_CPP_RESTORE);
-  if (cpp_read_state (pfile, name, f, smd) != 0)
-{
-  fclose (f);
-  timevar_pop (TV_PCH_CPP_RESTORE);
-  goto end;
-}
-  timevar_pop (TV_PCH_CPP_RESTORE);
+  cpp_result = cpp_read_state (pfile, name, f, smd);
 
+  /* Set the current location to the line following the #include, where we
+ were prior to processing the PCH.  */
+  linemap_line_start (line_table, saved_loc.line, 0);
 
+  timevar_pop (TV_PCH_CPP_RESTORE);
   fclose (f);
 
+  if (cpp_result != 0)
+goto end;
+
   /* Give the front end a chance to take action after a PCH file has
  been loaded.  */
   if (lang_post_pch_load)
diff --git a/libcpp/pch.cc b/libcpp/pch.cc
index a9f4ff19bf1e1..17e423f44b801 100644
--- a/libcpp/pch.cc
+++ b/libcpp/pch.cc
@@ -838,7 +838,14 @@ cpp_read_state (cpp_reader *r, const char *name, FILE *f,
  != NULL)
{
  _cpp_clean_line (r);
- if (!_cpp_create_definition (r, h, 0))
+
+ /* ??? Using r->line_table->highest_line is not ideal here, but we
+do need to use some location that is relative to the new line
+map just loaded, not the old one that was in effect when these
+macros were lexed.  The proper fix is to remember the file name
+and line number where each macro was defined, and then add
+these locations into the new line map.  See PR105608.  */
+ if (!_cpp_create_definition (r, h, r->line_table->highest_line))
abort ();
  _cpp_pop_buffer (r);
}
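
The line adjustment in the c-pch.cc hunk can be read as a small pure
function.  The sketch below only restates the expression from the diff;
the function name include_line_for is hypothetical, and the
interpretation of saved_loc.line as the line following the #include is
taken from the comments in the patch.

```cpp
// SAVED_LINE is the line following the #include that triggered the PCH
// restore.  Returns the line used for the restored macros: the #include
// line itself, clamped so it never drops below line 1.
constexpr int
include_line_for (int saved_line)
{
  return saved_line > 1 ? saved_line - 1 : saved_line;
}

// An #include on line 2 leaves saved_line at 3; macros are placed on line 2.
static_assert (include_line_for (3) == 2, "macros land on the #include line");
// Line 1 has no predecessor, so it is kept as is.
static_assert (include_line_for (1) == 1, "line 1 is kept");
```

The first assertion corresponds to the "line 3 rather than 2" symptom
described in the backport request above.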


[PATCH] libcpp: Improve location for macro names [PR66290]

2024-02-19 Thread Alexandre Oliva

This backport for gcc-13 is the first of two required for the
g++.dg/pch/line-map-3.C test to stop hitting a variant of the known
problem mentioned in that testcase: on riscv64-elf and riscv32-elf,
after restoring the PCH, the location of the macros is mentioned as if
they were on line 3 rather than 2, so even the existing xfails fail.  I
think this might be too much to backport, and I'm ready to use an xfail
instead, but since this would bring more predictability, I thought I'd
ask whether you'd find this backport acceptable.

Regstrapped on x86_64-linux-gnu, along with other backports, and tested
manually on riscv64-elf.  Ok to install?

From: Lewis Hyatt 

When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.

libcpp/ChangeLog:

PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.

gcc/testsuite/ChangeLog:

PR c++/66290
* c-c++-common/cpp/macro-ranges.c: New test.
* c-c++-common/cpp/line-2.c: Adapt to check for column information
on macro-related libcpp warnings.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/pr58844-1.c: Likewise.
* c-c++-common/cpp/pr58844-2.c: Likewise.
* c-c++-common/cpp/warning-zero-location.c: Likewise.
* c-c++-common/pragma-diag-14.c: Likewise.
* c-c++-common/pragma-diag-15.c: Likewise.
* g++.dg/modules/macro-2_d.C: Likewise.
* g++.dg/modules/macro-4_d.C: Likewise.
* g++.dg/modules/macro-4_e.C: Likewise.
* g++.dg/spellcheck-macro-ordering.C: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/Wunused.c: Likewise.
* gcc.dg/cpp/redef2.c: Likewise.
* gcc.dg/cpp/redef3.c: Likewise.
* gcc.dg/cpp/redef4.c: Likewise.
* gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-11.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.

(cherry picked from commit 4f3be7cbebce8ec9e0c5d9340b2772581454b862)
---
 gcc/testsuite/c-c++-common/cpp/line-2.c|2 
 gcc/testsuite/c-c++-common/cpp/line-3.c|2 
 gcc/testsuite/c-c++-common/cpp/macro-arg-count-1.c |4 
 gcc/testsuite/c-c++-common/cpp/macro-ranges.c  |   52 ++
 gcc/testsuite/c-c++-common/cpp/pr58844-1.c |4 
 gcc/testsuite/c-c++-common/cpp/pr58844-2.c |4 
 .../c-c++-common/cpp/warning-zero-location.c   |2 
 gcc/testsuite/c-c++-common/pragma-diag-14.c|2 
 gcc/testsuite/c-c++-common/pragma-diag-15.c|2 
 gcc/testsuite/g++.dg/modules/macro-2_d.C   |4 
 gcc/testsuite/g++.dg/modules/macro-4_d.C   |4 
 gcc/testsuite/g++.dg/modules/macro-4_e.C   |2 
 gcc/testsuite/g++.dg/spellcheck-macro-ordering.C   |2 
 gcc/testsuite/gcc.dg/builtin-redefine.c|   18 -
 gcc/testsuite/gcc.dg/cpp/Wunused.c |6 
 gcc/testsuite/gcc.dg/cpp/redef2.c  |   20 -
 gcc/testsuite/gcc.dg/cpp/redef3.c  |   14 -
 gcc/testsuite/gcc.dg/cpp/redef4.c  |  520 ++--
 gcc/testsuite/gcc.dg/cpp/ucnid-11-utf8.c   |   12 
 gcc/testsuite/gcc.dg/cpp/ucnid-11.c|   12 
 gcc/testsuite/gcc.dg/cpp/undef2.c  |6 
 gcc/testsuite/gcc.dg/cpp/warn-redefined-2.c|   10 
 gcc/testsuite/gcc.dg/cpp/warn-redefined.c  |   10 
 gcc/testsuite/gcc.dg/cpp/warn-unused-macros-2.c|2 
 gcc/testsuite/gcc.dg/cpp/warn-unused-macros.c  |2 
 libcpp/directives.cc   |   13 -
 libcpp/internal.h  |2 
 libcpp/macro.cc|   12 
 libcpp/pch.cc  |2 
 29 files changed, 405 insertions(+), 342 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/macro-ranges.c

diff --git a/gcc/testsuite/c-c++-common/cpp/line-2.c

[PATCH] RISC-V: Fix CTZ unnecessary sign extension [PR #106888]

2024-02-19 Thread Alexandre Oliva

This backport for gcc-13 is required for pr90838.c to get the expected
count of andi instructions on riscv64-elf, rather than fail because of
two extra andi insns in functions where it is not necessary.  (On
riscv32-elf, it passes).  Regstrapped on x86_64-linux-gnu, along with
other backports, and tested manually on riscv64-elf.  Ok to install?

From: Raphael Moreira Zinsly 

Changes since v1:
- Remove subreg from operand 1.

-- >8 --

We were not able to match the CTZ sign extend pattern on RISC-V
because it gets optimized to zero extend and/or to ANDI patterns.
For the ANDI case, combine scrambles the RTL and generates the
extension by using subregs.

gcc/ChangeLog:
PR target/106888
* config/riscv/bitmanip.md
(<bitmanip_optab>disi2): Match with any_extend.
(<bitmanip_optab>disi2_sext): New pattern to match
with sign extend using an ANDI instruction.

gcc/testsuite/ChangeLog:
PR target/106888
* gcc.target/riscv/pr106888.c: New test.
* gcc.target/riscv/zbbw.c: Check for ANDI.

(cherry picked from commit 9000da00dd70988f30d43806bae33b22ee6b9904)
---
 gcc/config/riscv/bitmanip.md  |   13 -
 gcc/testsuite/gcc.target/riscv/pr106888.c |   12 
 gcc/testsuite/gcc.target/riscv/zbbw.c |1 +
 3 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr106888.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7aa591689ba87..cc55ca133c3fe 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -246,13 +246,24 @@ (define_insn "*<bitmanip_optab>si2"
 
 (define_insn "*<bitmanip_optab>disi2"
   [(set (match_operand:DI 0 "register_operand" "=r")
-(sign_extend:DI
+(any_extend:DI
   (clz_ctz_pcnt:SI (match_operand:SI 1 "register_operand" "r"))))]
   "TARGET_64BIT && TARGET_ZBB"
   "<bitmanip_insn>w\t%0,%1"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "SI")])
 
+;; A SImode clz_ctz_pcnt may be extended to DImode via subreg.
+(define_insn "*<bitmanip_optab>disi2_sext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI
+  (clz_ctz_pcnt:SI (match_operand:SI 1 "register_operand" "r")) 0)
+  (match_operand:DI 2 "const_int_operand")))]
+  "TARGET_64BIT && TARGET_ZBB && ((INTVAL (operands[2]) & 0x3f) == 0x3f)"
+  "<bitmanip_insn>w\t%0,%1"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
 (define_insn "*<bitmanip_optab>di2"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (clz_ctz_pcnt:DI (match_operand:DI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/riscv/pr106888.c 
b/gcc/testsuite/gcc.target/riscv/pr106888.c
new file mode 100644
index 0000000000000..77fb8e5b79c6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106888.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64" } */
+
+int
+ctz (int i)
+{
+  int res = __builtin_ctz (i);
+  return res&0xffff;
+}
+
+/* { dg-final { scan-assembler-times "ctzw" 1 } } */
+/* { dg-final { scan-assembler-not "andi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbbw.c 
b/gcc/testsuite/gcc.target/riscv/zbbw.c
index 709743c3b6807..f7b2b63853f40 100644
--- a/gcc/testsuite/gcc.target/riscv/zbbw.c
+++ b/gcc/testsuite/gcc.target/riscv/zbbw.c
@@ -23,3 +23,4 @@ popcount (int i)
 /* { dg-final { scan-assembler-times "clzw" 1 } } */
 /* { dg-final { scan-assembler-times "ctzw" 1 } } */
 /* { dg-final { scan-assembler-times "cpopw" 1 } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

[PATCH] RISC-V: Fix error combine of pred_mov pattern

2024-02-19 Thread Alexandre Oliva

This backport is the second of two required for the pr111935 testcase,
already backported to gcc-13, to pass on riscv64-elf and riscv32-elf.
The V_VLS mode iterator, used in the original patch, is not available in
gcc-13, and I thought that would be too much to backport (and maybe so
are these two patches, WDYT?), so I changed it to V, to match the
preexisting gcc-13 pattern.  Comments also needed manual adjustment.
Regstrapped on x86_64-linux-gnu, along with other backports, and tested
manually on riscv64-elf.  Ok to install?

From: Lehua Ding 

This patch fixes PR110943, which produces wrong code because combine
incorrectly merges some pred_mov patterns. Consider this code:

```

void foo9 (void *base, void *out, size_t vl)
{
int64_t scalar = *(int64_t*)(base + 100);
vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
*(vint64m2_t*)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t 
*)out_4(D)]+0 S[32, 32] A128])
(reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 
{*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 
S[32, 32] A128])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combine changes the semantics of insn 14. I split the @pred_mov pattern
and restrict the condition of @pred_mov.

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
New predicate.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::function_expander):
force_reg mem target operand.
* config/riscv/vector.md (@pred_mov): Wrapper.
(*pred_mov): Remove imm -> reg pattern.
(*pred_broadcast_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr110943.c: New test.

(cherry picked from commit 973eb0deb467c79cc21f265a710a81054cfd3e8c)

Dropped from backport:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Adjust.

This backport is a prerequisite for gcc.target/riscv/rvv/base/pr111935.c
that was backported from gcc-14 to gcc-13 upstream, presumably without
realizing that the test didn't pass in gcc-13.
---
 gcc/config/riscv/predicates.md |5 +
 gcc/config/riscv/riscv-vector-builtins.cc  |9 ++
 gcc/config/riscv/vector.md |   98 +++-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr110943.c |   33 +++
 4 files changed, 101 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110943.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 1707c80cba256..0600824695ed8 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -280,6 +280,11 @@ (define_predicate "vector_const_0_operand"
   (and (match_code "const_vector")
(match_test "satisfies_constraint_Wc0 (op)")))
 
+(define_predicate "vector_const_int_or_double_0_operand"
+  (and (match_code "const_vector")
+   (match_test "satisfies_constraint_vi (op)
+|| satisfies_constraint_Wc0 (op)")))
+
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
(and (match_code "const_vector")
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 01cea23d3e687..60ad59814cd5d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2935,7 +2935,14 @@ function_expander::function_expander (const 
function_instance ,

[PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

2024-02-19 Thread Alexandre Oliva

This backport is the first of two required for the pr111935 testcase,
already backported to gcc-13, to pass on riscv64-elf and riscv32-elf.
The V_VLS mode iterator, used in the original patch, is not available in
gcc-13, and I thought that would be too much to backport (and maybe so
are these two patches, WDYT?), so I changed it to V, to match the
preexisting gcc-13 pattern.  Regstrapped on x86_64-linux-gnu, along with
other backports, and tested manually on riscv64-elf.  Ok to install?

From: Lehua Ding 

Hi,

This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
pattern to optimize the special case where the scalar operand is zero.

Currently, a broadcast pattern whose scalar operand is an imm is
converted from vmv.s.x to vmv.v.i, and the mask operand is
converted from 00..01 to 11..11. The conversion has both advantages and
disadvantages; after discussing them offline with Juzhe, we chose
not to do this transform.

Before:

  Advantages: The vsetvli info required by vmv.s.x has better compatibility,
  since vmv.s.x only requires SEW and VLEN be zero or one. That means there
  are more opportunities to combine it with other vsetvl infos in the vsetvl
  pass.

  Disadvantages: For a non-zero scalar imm, one more `li rd, imm` instruction
  is needed.

After:

  Advantages: No `li rd, imm` instruction is needed, since vmv.v.i supports an
  imm operand.

  Disadvantages: The mirror image of the advantages above. Worse compatibility
  means more vsetvl instructions are needed.

Consider the C code below and the asm after autovec.
There is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converting vmv.s.x to vmv.v.i.

```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
int sum = 0;
for (int i = 0; i < n; i++)
  sum += a[i] * b[i];

return sum;
}
```

asm (Before):

```
foo1:
ble a3,zero,.L7
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L6:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L6
vsetvli a2,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs  v1,v1,v2
vmv.x.s a0,v1
ret
.L7:
li  a0,0
ret
```

asm (After):

```
foo1:
ble a3,zero,.L4
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,0
vsetvli a2,zero,e32,m1,ta,ma
vredsum.vs  v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li  a0,0
ret
```

Best,
Lehua

Co-Authored-By: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_0_operand): New.
* config/riscv/vector.md (*pred_broadcast_zero): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

(cherry picked from commit 86d80395cf3c8832b669135b1ca7ea8258790c19)
---
 gcc/config/riscv/predicates.md |4 ++
 gcc/config/riscv/vector.md |   43 ++--
 .../gcc.target/riscv/rvv/base/scalar_move-5.c  |   20 -
 .../gcc.target/riscv/rvv/base/scalar_move-6.c  |   22 --
 4 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc594354..1707c80cba256 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -276,6 +276,10 @@ (define_predicate "reg_or_int_operand"
   (ior (match_operand 0 "register_operand")
(match_operand 0 "const_int_operand")))
 
+(define_predicate "vector_const_0_operand"
+  (and (match_code "const_vector")
+   (match_test "satisfies_constraint_Wc0 (op)")))
+
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
(and (match_code "const_vector")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index db3a972832aea..fb0caab8da360 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1217,23 +1217,24 @@ (define_expand "@pred_broadcast"
  (match_operand:V 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
-  /* Handle vmv.s.x instruction which has memory scalar.  */
-  if (satisfies_constraint_Wdm (operands[3]) || riscv_vector::simm5_p 
(operands[3])
-  || rtx_equal_p (operands[3], CONST0_RTX (mode)))
+  /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar.  */
+  if (satisfies_constraint_Wdm (operands[3]))
 {
   if (satisfies_constraint_Wb1 (operands[1]))
-{
-

[PATCH] RISC-V: Fix riscv/arch-19.c with different ISA spec version

2024-02-19 Thread Alexandre Oliva

This testcase is failing with riscv64-elf and riscv32-elf in the gcc-13
branch, if configured to use an assembler that supports -misa-spec; with
an assembler that doesn't, the test passes both with and without the
following backport from the trunk, so I'd like to install it in gcc-13.
Regstrapped on x86_64-linux-gnu, along with other backports, and tested
manually on riscv64-elf.  Ok to install?

From: Kito Cheng 

In newer ISA specs, F implies zicsr; add it to the -march option to
avoid different test results with different default -misa-spec versions.

gcc/testsuite/

* gcc.target/riscv/arch-19.c: Add -misa-spec.

(cherry picked from commit 9fde76a3be8e1717d9d38492c40675e742611e45)
---
 gcc/testsuite/gcc.target/riscv/arch-19.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/arch-19.c 
b/gcc/testsuite/gcc.target/riscv/arch-19.c
index b042e1a49fe6f..95204ede26a69 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-19.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-19.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64if_zfinx -mabi=lp64" } */
+/* { dg-options "-march=rv64if_zicsr_zfinx -mabi=lp64" } */
 int foo() {}
-/* { dg-error "'-march=rv64if_zfinx': z\\*inx conflicts with floating-point 
extensions" "" { target *-*-* } 0 } */
+/* { dg-error "'-march=rv64if_zicsr_zfinx': z\\*inx conflicts with 
floating-point extensions" "" { target *-*-* } 0 } */


Re: LoongArch: Backport r14-4674 "LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP."?

2024-02-19 Thread chenglulu




On 2024/2/9 at 4:08 PM, Xi Ruoyao wrote:

On Fri, 2024-02-09 at 00:02 +0800, chenglulu wrote:

On 2024/2/7 at 12:23 AM, Xi Ruoyao wrote:

Hi Lulu,

I'm proposing to backport r14-4674 "LoongArch: Delete macro definition
ASM_OUTPUT_ALIGN_WITH_NOP." to releases/gcc-12 and releases/gcc-13.  The
reasons:

1. Strictly speaking, the old ASM_OUTPUT_ALIGN_WITH_NOP macro may cause
a correctness issue.  For example, a developer may use -falign-
functions=16 and then use the low 4 bits of a function pointer to encode
some metainfo.  Then ASM_OUTPUT_ALIGN_WITH_NOP causes the functions not
really aligned to a 16 bytes boundary, causing some breakage.

2. With Binutils-2.42,  ASM_OUTPUT_ALIGN_WITH_NOP can cause illegal
opcodes.  For example:

.globl _start
_start:
.balign 32
nop
nop
nop
addi.d $a0, $r0, 1
.balign 16,54525952,4
addi.d $a0, $a0, 1

is assembled and linked to:

0220 <_start>:
   220: 0340    nop
   224: 0340    nop
   228: 0340    nop
   22c: 02c00404    li.d    $a0, 1
   230:     .word   0x   # <== OOPS!
   234: 02c00484    addi.d  $a0, $a0, 1

Arguably this is a bug in GAS (it should at least error out for the
unsupported case where .balign 16,54525952,4 appears with -mrelax; I'd
prefer it to support the 3-operand .align directive even with -mrelax for
reasons I've given in [1]).  But we can at least work it around by
removing ASM_OUTPUT_ALIGN_WITH_NOP to allow using GCC 13.3 with Binutils
2.42.

3. Without ASM_OUTPUT_ALIGN_WITH_NOP, GCC just outputs something like
".align 5" which works as expected since Binutils-2.38.

4. GCC < 14 does not have a default setting of -falign-*, so changing
this won't affect anyone who does not specify -falign-* explicitly.
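
The breakage described in reason 1 can be sketched in C. The example below is a minimal illustration, assuming a GCC-compatible compiler and a target where function addresses carry no flag bits of their own; the names `tag_fn`, `untag_fn`, `fn_tag` and `answer` are hypothetical. The `aligned` attribute stands in for building with -falign-functions=16. If the assembler silently produced a weaker alignment, the low bits of the address would no longer be zero and the tag would corrupt the pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*fn_t) (void);

/* The low 4 bits are free only if every function is 16-byte aligned.  */
#define TAG_MASK ((uintptr_t) 0xf)

/* Stand-in for a function compiled with -falign-functions=16.  */
__attribute__ ((aligned (16))) static int
answer (void)
{
  return 42;
}

/* Store a 4-bit tag in the low bits of a function address.  */
static uintptr_t
tag_fn (fn_t fn, unsigned int tag)
{
  uintptr_t bits = (uintptr_t) fn;
  /* This is the assumption that a misaligned function would break.  */
  assert ((bits & TAG_MASK) == 0);
  return bits | (tag & TAG_MASK);
}

/* Recover the callable pointer by clearing the tag bits.  */
static fn_t
untag_fn (uintptr_t tagged)
{
  return (fn_t) (tagged & ~TAG_MASK);
}

/* Recover the tag.  */
static unsigned int
fn_tag (uintptr_t tagged)
{
  return (unsigned int) (tagged & TAG_MASK);
}
```

A caller would write `uintptr_t t = tag_fn (answer, 5);` and later call `untag_fn (t) ()`; both steps rely on the alignment request being honored exactly.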

[1]:https://github.com/loongson-community/discussions/issues/41#issuecomment-1925872603

Is it OK to backport r14-4674 into releases/gcc-12 and releases/gcc-13
then?


Ok, I agree with you.

Thanks!

Oops, with Binutils-2.41 GAS will fail to assemble some conditional
branches if we do this :(.

Not sure what to do (maybe backporting both this and a simplified
version of PR112330 fix?)  Let's reconsider after the holiday...

To solve this problem, r14-5434 would also need to be backported on top of
r14-4674.

But I took a look, and r14-5434 modifies quite a lot of files.

So, if we don't worry about performance and just want to ensure there is no
problem with binutils, I think we can make the following modification:

  -/* "nop" instruction 54525952 (andi $r0,$r0,0) is
  -   used for padding.  */
  +/* ".align num,,4" will insert "nop"(andi $r0,$r0,0) into padding by
  +   default.  */
   #define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
  -  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
  +  fprintf (STREAM, "\t.align\t%d,,4\n", (LOG))

What do you think of it?

Re: [PATCH v23 32/33] c++: Implement __is_invocable built-in trait

2024-02-19 Thread Ken Matsui

On Mon, Oct 23, 2023 at 2:23 PM Jason Merrill  wrote:
>
> On 10/20/23 17:37, Patrick Palka wrote:
> > On Fri, 20 Oct 2023, Patrick Palka wrote:
> >
> >> On Fri, 20 Oct 2023, Patrick Palka wrote:
> >>
> >>> On Fri, 20 Oct 2023, Ken Matsui wrote:
> >>>
>  This patch implements built-in trait for std::is_invocable.
> >>>
> >>> Nice!  My email client unfortunately ate my first review attempt, so
> >>> apologies for my brevity this time around.
> >>>
>  gcc/cp/ChangeLog:
> 
> * cp-trait.def: Define __is_invocable.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
> (is_invocable_p): New function.
> * method.h: New file to export build_trait_object in method.cc.
>
> Given how much larger semantics.cc is than method.cc, maybe let's put
> is_invocable_p in method.cc instead?  And in general declarations can go
> in cp-tree.h.
>
>  diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
>  index 7cccbae5287..cc2e400531a 100644
>  --- a/gcc/cp/semantics.cc
>  +++ b/gcc/cp/semantics.cc
>  @@ -45,6 +45,10 @@ along with GCC; see the file COPYING3.  If not see
>    #include "gomp-constants.h"
>    #include "predict.h"
>    #include "memmodel.h"
>  +#include "method.h"
>  +
>  +#include "print-tree.h"
>  +#include "tree-pretty-print.h"
> 
>    /* There routines provide a modular interface to perform many parsing
>   operations.  They may therefore be used during actual parsing, or
>  @@ -11714,6 +11718,133 @@ classtype_has_nothrow_assign_or_copy_p (tree 
>  type, bool assign_p)
>  return saw_copy;
>    }
> 
>  +/* Return true if FN_TYPE is invocable with the given ARG_TYPES.  */
>  +
>  +static bool
>  +is_invocable_p (tree fn_type, tree arg_types)
> >
> > (Sorry for the spam)  We'll eventually want to implement a built-in for
> > invoke_result, so perhaps we should preemptively factor out the bulk
> > of this function into a 'build_INVOKE' helper function that returns the
> > built tree?
> >
>  +{
>  +  /* ARG_TYPES must be a TREE_VEC.  */
>  +  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
>  +
>  +  /* Access check is required to determine if the given is invocable.  
>  */
>  +  deferring_access_check_sentinel acs (dk_no_deferred);
>  +
>  +  /* std::is_invocable is an unevaluated context.  */
>  +  cp_unevaluated cp_uneval_guard;
>  +
>  +  bool is_ptrdatamem;
>  +  bool is_ptrmemfunc;
>  +  if (TREE_CODE (fn_type) == REFERENCE_TYPE)
>  +{
>  +  tree deref_fn_type = TREE_TYPE (fn_type);
>  +  is_ptrdatamem = TYPE_PTRDATAMEM_P (deref_fn_type);
>  +  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (deref_fn_type);
>  +
>  +  /* Dereference fn_type if it is a pointer to member.  */
>  +  if (is_ptrdatamem || is_ptrmemfunc)
>  +  fn_type = deref_fn_type;
>  +}
>  +  else
>  +{
>  +  is_ptrdatamem = TYPE_PTRDATAMEM_P (fn_type);
>  +  is_ptrmemfunc = TYPE_PTRMEMFUNC_P (fn_type);
>  +}
>  +
>  +  if (is_ptrdatamem && TREE_VEC_LENGTH (arg_types) != 1)
>  +/* A pointer to data member with non-one argument is not invocable. 
>   */
>  +return false;
>  +
>  +  if (is_ptrmemfunc && TREE_VEC_LENGTH (arg_types) == 0)
>  +/* A pointer to member function with no arguments is not invocable. 
>   */
>  +return false;
>  +
>  +  /* Construct an expression of a pointer to member.  */
>  +  tree datum;
>  +  if (is_ptrdatamem || is_ptrmemfunc)
>  +{
>  +  tree datum_type = TREE_VEC_ELT (arg_types, 0);
>  +
>  +  /* Dereference datum.  */
>  +  if (CLASS_TYPE_P (datum_type))
>  +  {
>  +bool is_refwrap = false;
>  +
>  +tree datum_decl = TYPE_NAME (TYPE_MAIN_VARIANT (datum_type));
>  +if (decl_in_std_namespace_p (datum_decl))
>  +  {
>  +tree name = DECL_NAME (datum_decl);
>  +if (name && (id_equal (name, "reference_wrapper")))
>  +  {
>  +/* Handle std::reference_wrapper.  */
>  +is_refwrap = true;
>  +datum_type = cp_build_reference_type (datum_type, false);
>
> Why do you change datum_type from std::reference_wrapper<...> to
> std::reference_wrapper<...>&?
>
>  +  }
>  +  }
>  +
>  +datum = build_trait_object (datum_type);
>  +
>  +/* If datum_type was not std::reference_wrapper, check if it has
>  +   operator*() overload.  If datum_type was std::reference_wrapper,
>  +   avoid dereferencing the datum twice.  */
>  +if (!is_refwrap)
>  +  if (get_class_binding (datum_type, get_identifier ("operator*")))
> >>>
> >>> We probably should

Re: [PATCH] c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-19 Thread Jakub Jelinek

On Tue, Feb 20, 2024 at 12:12:11AM +, Jason Merrill wrote:
> On 2/19/24 02:55, Jakub Jelinek wrote:
> > On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote:
> > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > which takes the newval by value, which presumably uses an 80-bit FP
> > > register and so the padding bits become indeterminate again.
> > 
> > The problem is that __atomic_{,compare_}exchange lowering if it has
> > a supported atomic 1/2/4/8/16 size emits code like:
> >_3 = *p2;
> >_4 = VIEW_CONVERT_EXPR (_3);
> 
> Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working on
> lvalues, not rvalues, based on the documentation describing it as roughly
> *(type2 *)

It works on both, if it is on the lhs, it obviously needs to be on an lvalue
and is something like
VIEW_CONVERT_EXPR (mem) = something;
and if it is on rhs, it can be on an rvalue, just reinterpret bits of
something as something else, so more like
((union { typeof (val) x; I_type y; }) (val)).y
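
The rvalue case can be sketched in standalone C. This is only an illustration of "reinterpret the bits of a value", assuming a 32-bit IEEE 754 `float`; the function names are hypothetical and nothing here is GCC-internal code. The union version mirrors the expression above, and the `memcpy` version is the equivalent byte-wise formulation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
		"sketch assumes a 32-bit float");

/* Reinterpret the bits of VAL through a union, like the rvalue
   VIEW_CONVERT_EXPR analogy above.  */
static uint32_t
float_bits_union (float val)
{
  union { float x; uint32_t y; } u;
  u.x = val;
  return u.y;
}

/* The same reinterpretation expressed as a byte copy.  */
static uint32_t
float_bits_memcpy (float val)
{
  uint32_t bits;
  memcpy (&bits, &val, sizeof bits);
  return bits;
}
```

Both helpers receive the value after it has already been loaded as a `float`, which is exactly the step that can disturb padding or unusual bit patterns for wider types.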

> Now I see that gimplify_expr does what you describe, and I wonder what the
> advantage of that is.  That seems to go back to richi's r206420 for PR59471.
> r270579 for PR limited it to cases where the caller wants an rvalue; perhaps
> it should also be avoided when the operand is an INDIRECT_REF?

Strangely we don't even try to optimize it, even at -O2 that
  _3 = *p2_2(D);
  _4 = VIEW_CONVERT_EXPR (_3);
stays around until optimized.  I believe VIEW_CONVERT_EXPR is valid
around a memory reference, so it could be either
  _4 = VIEW_CONVERT_EXPR (*p2_2(D));
or the MEM_REF with aliasing info from original pointer and type from VCE.
For optimizations, guess it is a matter of writing some match.pd rule, but
we can't rely on it for the atomics.

Doing it in the gimplifier is possible, but not sure how to do that easily,
given that the VIEW_CONVERT_EXPR argument can be either lvalue or rvalue
and we need to gimplify it first.
The first part is exactly what forces it into a separate SSA_NAME for the
load vs. VIEW_CONVERT_EXPR around it:

case VIEW_CONVERT_EXPR:
  if ((fallback & fb_rvalue)
  && is_gimple_reg_type (TREE_TYPE (*expr_p))
  && is_gimple_reg_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0))))
{
  ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
   post_p, is_gimple_val, fb_rvalue);
  recalculate_side_effects (*expr_p);
  break;
}
  /* Fallthru.  */

case ARRAY_REF:
case ARRAY_RANGE_REF:
case REALPART_EXPR:
case IMAGPART_EXPR:
case COMPONENT_REF:
  ret = gimplify_compound_lval (expr_p, pre_p, post_p,
fallback ? fallback : fb_rvalue);
  break;

but if we do the gimplify_compound_lval, we'd actually force it to be
addressable (with possible errors if that isn't possible) etc.
Having just a special-case of VIEW_CONVERT_EXPR of INDIRECT_REF and
do gimplify_compound_lval in that case might be wrong I think if
e.g. INDIRECT_REF operand is ADDR_EXPR, shouldn't we cancel *& in that case
and still not force it into memory?

The INDIRECT_REF: case is:
  {
bool volatilep = TREE_THIS_VOLATILE (*expr_p);
bool notrap = TREE_THIS_NOTRAP (*expr_p);
tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));

*expr_p = fold_indirect_ref_loc (input_location, *expr_p);
if (*expr_p != save_expr)
  {
ret = GS_OK;
break;
  }

ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
 is_gimple_reg, fb_rvalue);
if (ret == GS_ERROR)
  break;

recalculate_side_effects (*expr_p);
*expr_p = fold_build2_loc (input_location, MEM_REF,
   TREE_TYPE (*expr_p),
   TREE_OPERAND (*expr_p, 0),
   build_int_cst (saved_ptr_type, 0));
TREE_THIS_VOLATILE (*expr_p) = volatilep;
TREE_THIS_NOTRAP (*expr_p) = notrap;
ret = GS_OK;
break;
  }
so I think if we want to special case VIEW_CONVERT_EXPR on INDIRECT_REF,
we should basically copy and adjust that to the start of the case
VIEW_CONVERT_EXPR:.  In particular, if fold_indirect_ref_loc did something
and it isn't a different INDIRECT_REF or something addressable, do what it
does now (i.e. possibly treat as lvalue), otherwise gimplify the INDIRECT_REF
operand and build a MEM_REF, but with the type of the VIEW_CONVERT_EXPR but
still saved_ptr_type of the INDIRECT_REF.

Though, that all still feels like an optimization rather than guarantee
which is what we need for the atomics.

Re: [PATCH] c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

2024-02-19 Thread Jason Merrill


On 2/19/24 02:55, Jakub Jelinek wrote:

On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote:

Ah, although __atomic_compare_exchange only takes pointers, the
compiler replaces that with a call to __atomic_compare_exchange_n
which takes the newval by value, which presumably uses an 80-bit FP
register and so the padding bits become indeterminate again.


The problem is that __atomic_{,compare_}exchange lowering if it has
a supported atomic 1/2/4/8/16 size emits code like:
   _3 = *p2;
   _4 = VIEW_CONVERT_EXPR (_3);


Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working 
on lvalues, not rvalues, based on the documentation describing it as 
roughly *(type2 *)


Now I see that gimplify_expr does what you describe, and I wonder what 
the advantage of that is.  That seems to go back to richi's r206420 for 
PR59471.  r270579 for PR limited it to cases where the caller wants an 
rvalue; perhaps it should also be avoided when the operand is an 
INDIRECT_REF?



so if long double or some small struct etc. has some carefully filled
padding bits, those bits can be lost on the assignment.  The library call
for __atomic_{,compare_}exchange would actually work because it would
load the value from memory using integral type or memcpy.
E.g. on
void
foo (long double *a, long double *b, long double *c)
{
   __atomic_compare_exchange (a, b, c, false, __ATOMIC_RELAXED, 
__ATOMIC_RELAXED);
}
we end up with -O0 with:
fldt(%rax)
fstpt   -48(%rbp)
movq-48(%rbp), %rax
movq-40(%rbp), %rdx
i.e. load *c from memory into 387 register, store it back to uninitialized
stack slot (the padding bits are now random in there) and then load a
__uint128_t (pair of GPR regs).  The problem is that we first load it using
whatever type the pointer points to and then VIEW_CONVERT_EXPR that value:
   p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
   p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
The following patch fixes that by creating a MEM_REF instead, with the
I_type type, but with the original pointer type on the second argument for
aliasing purposes, so we actually preserve the padding bits that way.
I've done that for types which may have padding and also for
non-integral/pointer types, because I fear even on floating point types
like double or float which don't have padding bits the copying through
floating point could misbehave in presence of sNaNs or unsupported bit
combinations.
With this patch instead of the above assembly we emit
movq8(%rax), %rdx
movq(%rax), %rax
I had to add support for MEM_REF in pt.cc, though with the assumption
that it has been already originally created with non-dependent
types/operands (which is the case here for the __atomic*exchange lowering).
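
The idea behind loading through an integral type or memcpy can be sketched in plain C. The example below is a minimal illustration under stated assumptions: `struct padded`, `copy_representation` and `same_representation` are hypothetical names, and the struct is merely a convenient type that typically has padding bytes. A byte-wise copy transfers the whole object representation, padding included, which a load through the object's own type does not guarantee.

```c
#include <assert.h>
#include <string.h>

/* Typically has padding bytes between C and I.  */
struct padded
{
  char c;
  int i;
};

/* Copy the full object representation, padding bytes included.  */
static void
copy_representation (struct padded *dst, const struct padded *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Compare every byte of the two objects, padding included.  */
static int
same_representation (const struct padded *a, const struct padded *b)
{
  return memcmp (a, b, sizeof *a) == 0;
}
```

Filling an object with a known byte pattern via `memset`, copying it with `copy_representation`, and comparing with `same_representation` shows every byte surviving, which is the property the compare-exchange needs for its expected and desired values.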

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-19  Jakub Jelinek  

gcc/c-family/
* c-common.cc (resolve_overloaded_atomic_exchange): For
non-integral/pointer types or types which may need padding
instead of setting p1 to VIEW_CONVERT_EXPR (*p1), set it to
MEM_REF with p1 and (typeof (p1)) 0 operands and I_type type.
(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
* g++.dg/ext/atomic-5.C: New test.

--- gcc/c-family/c-common.cc.jj 2024-02-16 17:33:43.995739790 +0100
+++ gcc/c-family/c-common.cc2024-02-17 11:11:34.029474214 +0100
@@ -7794,8 +7794,23 @@ resolve_overloaded_atomic_exchange (loca
p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
(*params)[0] = p0;
/* Convert new value to required type, and dereference it.  */
-  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
-  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+  if ((!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (p1)))
+   && !POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (p1))))
+  || clear_padding_type_may_have_padding_p (TREE_TYPE (TREE_TYPE (p1))))
+{
+  /* If *p1 type can have padding or may involve floating point which
+could e.g. be promoted to wider precision and demoted afterwards,
+state of padding bits might not be preserved.  */
+  build_indirect_ref (loc, p1, RO_UNARY_STAR);
+  p1 = build2_loc (loc, MEM_REF, I_type,
+  build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
+  build_zero_cst (TREE_TYPE (p1)));
+}
+  else
+{
+  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
+  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+}
(*params)[1] = p1;
  
/* Move memory model to the 3rd position, and end param list.  */

@@ -7874,8 +7889,23 @@ resolve_overloaded_atomic_compare_exchan
(*params)[1] = p1;
  
/* Convert desired value to required type, and dereference it.  */

-  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
-  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
+  if ((!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (p2)))
+   &&

[pushed] analyzer: fix -Wanalyzer-va-arg-type-mismatch false +ve on int types [PR111289]

2024-02-19 Thread David Malcolm

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9076-g5651ad62b08096.

gcc/analyzer/ChangeLog:
PR analyzer/111289
* varargs.cc (representable_in_integral_type_p): New.
(va_arg_compatible_types_p): Add "arg_sval" param.  Handle integer
types.
(kf_va_arg::impl_call_pre): Pass arg_sval to
va_arg_compatible_types_p.

gcc/testsuite/ChangeLog:
PR analyzer/111289
* c-c++-common/analyzer/stdarg-pr111289-int.c: New test.
* c-c++-common/analyzer/stdarg-pr111289-ptr.c: New test.
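
The integer rule added to va_arg_compatible_types_p can be sketched outside the analyzer. The helper below is a hypothetical stand-in, not the analyzer's code, and assumes `long long` is wider than `int`: a known value may pass between `int` and `unsigned int` through `va_arg` only if it is representable in both types.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical illustration: true if VALUE is representable both as
   'int' and as 'unsigned int', i.e. it lies in [0, INT_MAX].  */
static bool
fits_int_and_unsigned (long long value)
{
  bool fits_signed = value >= INT_MIN && value <= INT_MAX;
  bool fits_unsigned = value >= 0 && value <= (long long) UINT_MAX;
  return fits_signed && fits_unsigned;
}
```

For the `0600` mode argument in the new test this returns true, which corresponds to the case where the warning is now suppressed.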

Signed-off-by: David Malcolm 
---
 gcc/analyzer/varargs.cc   | 38 --
 .../analyzer/stdarg-pr111289-int.c| 69 +++
 .../analyzer/stdarg-pr111289-ptr.c| 39 +++
 3 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-ptr.c

diff --git a/gcc/analyzer/varargs.cc b/gcc/analyzer/varargs.cc
index ac0e1cc1af3..3348121a0ef 100644
--- a/gcc/analyzer/varargs.cc
+++ b/gcc/analyzer/varargs.cc
@@ -950,13 +950,43 @@ public:
   }
 };
 
-/* Return true if it's OK to copy a value from ARG_TYPE to LHS_TYPE via
+static bool
+representable_in_integral_type_p (const svalue &sval, const_tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (tree cst = sval.maybe_get_constant ())
+return wi::fits_to_tree_p (wi::to_wide (cst), type);
+
+  return true;
+}
+
+/* Return true if it's OK to copy ARG_SVAL from ARG_TYPE to LHS_TYPE via
va_arg (where argument promotion has already happened).  */
 
 static bool
-va_arg_compatible_types_p (tree lhs_type, tree arg_type)
+va_arg_compatible_types_p (tree lhs_type, tree arg_type, const svalue &arg_sval)
 {
-  return compat_types_p (arg_type, lhs_type);
+  if (compat_types_p (arg_type, lhs_type))
+return true;
+
+  /* It's OK if both types are integer types, where one is signed and the
+ other type the corresponding unsigned type, when the value is
+ representable in both types.  */
+  if (INTEGRAL_TYPE_P (lhs_type)
+  && INTEGRAL_TYPE_P (arg_type)
+  && TYPE_UNSIGNED (lhs_type) != TYPE_UNSIGNED (arg_type)
+  && TYPE_PRECISION (lhs_type) == TYPE_PRECISION (arg_type)
+  && representable_in_integral_type_p (arg_sval, lhs_type)
+  && representable_in_integral_type_p (arg_sval, arg_type))
+return true;
+
+  /* It's OK if one type is a pointer to void and the other is a
+ pointer to a character type.
+ This is handled by compat_types_p.  */
+
+  /* Otherwise the types are not compatible.  */
+  return false;
 }
 
 /* If AP_SVAL is a pointer to a var_arg_region, return that var_arg_region.
@@ -1022,7 +1052,7 @@ kf_va_arg::impl_call_pre (const call_details &cd) const
{
  tree lhs_type = cd.get_lhs_type ();
  tree arg_type = arg_sval->get_type ();
- if (va_arg_compatible_types_p (lhs_type, arg_type))
+ if (va_arg_compatible_types_p (lhs_type, arg_type, *arg_sval))
cd.maybe_set_lhs (arg_sval);
  else
{
diff --git a/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c 
b/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
new file mode 100644
index 000..33d83169c3e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-int.c
@@ -0,0 +1,69 @@
+#include 
+#include 
+#include 
+
+typedef unsigned int mode_t;
+
+extern void openat (int, const char *, int, mode_t);
+
+/* Signed vs unsigned of same integral type.  */
+
+static void
+test_1 (char const *name, ...)
+{
+  va_list arg;
+  va_start (arg, name);
+
+  mode_t mode = va_arg (arg, mode_t); /* { dg-bogus 
"-Wanalyzer-va-arg-type-mismatch" } */
+
+  va_end (arg);
+  openat (-42, name, 0, mode);
+}
+
+void
+call_test_1 ()
+{
+  test_1 ("nonexist.ent/", 0600);
+}
+
+/* Not the same size: small enough for int promotion.  */
+
+int16_t global_2;
+
+static void
+test_2 (char const *name, ...)
+{
+  va_list arg;
+  va_start (arg, name);
+
+  global_2 = va_arg (arg, int16_t); /* { dg-warning "promoted to 'int'" } */
+
+  va_end (arg);
+}
+
+void
+call_test_2 ()
+{
+  test_2 ("nonexist.ent/", 42);
+}
+
+/* Not the same size: too big for int promotion.  */
+
+long long global_3;
+
+static void
+test_3 (char const *name, ...)
+{
+  va_list arg;
+  va_start (arg, name);
+
+  global_3 = va_arg (arg, long long); /* { dg-warning "'va_arg' expected 'long long int' but received 'int' for variadic argument 1 of 'arg'" } */
+
+  va_end (arg);
+}
+
+void
+call_test_3 ()
+{
+  test_3 ("nonexist.ent/", 42);
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-ptr.c b/gcc/testsuite/c-c++-common/analyzer/stdarg-pr111289-ptr.c
new file mode 100644
index 000..7bdbf256d59
---

[pushed] analyzer, testsuite: add regression test [PR110520]

2024-02-19 Thread David Malcolm

Tested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9075-geb37ea529745c3.

gcc/testsuite/ChangeLog:
PR analyzer/110520
* c-c++-common/analyzer/null-deref-pr110520.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/c-c++-common/analyzer/null-deref-pr110520.c | 8 
 1 file changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/null-deref-pr110520.c

diff --git a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr110520.c b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr110520.c
new file mode 100644
index 000..b57027689ee
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr110520.c
@@ -0,0 +1,8 @@
+#include "analyzer-decls.h"
+
+int main(void) {
+char buf[] = "0";
+int *ptr = (int *)(__builtin_strlen(buf) - 1);
+__analyzer_eval((__builtin_strlen(buf)) == 1); /* { dg-warning "TRUE" } */
+*ptr = 10086; /* { dg-warning "dereference of NULL 'ptr'" } */
+}
-- 
2.26.3

[PATCH] rs6000: Update instruction counts due to combine changes [PR112103]

2024-02-19 Thread Peter Bergner

rs6000: Update instruction counts due to combine changes [PR112103]

The PR91865 combine fix changed instruction counts slightly for rlwinm-0.c.
Adjust expected instruction counts accordingly.

This passed on both powerpc64le-linux and powerpc64-linux running the
testsuite in both 32-bit and 64-bit modes.  Ok for trunk?

FYI, I will open a new bug to track the removal of the superfluous
insns detected in PR112103.


Peter


gcc/testsuite/
PR target/112103
* gcc.target/powerpc/rlwinm-0.c: Adjust expected instruction counts.

diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
index 4f4fca2d8ef..a10d9174306 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
@@ -4,10 +4,10 @@
 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */

[patch] OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

2024-02-19 Thread Tobias Burnus


While waiting for some testing to finish, I got distracted and added the
very low-hanging OpenACC 3.3 fruits, i.e. those Fortran routines that directly
map to their C counterpart.

Comments, remarks?

Tobias
OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

These routines map directly to their C counterparts and are now
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)

libgomp/ChangeLog:

	* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
	routines that simply map to their C counterpart.
	* openacc.f90 (openacc_internal, openacc): Add them.
	* openacc_lib.h: Likewise.
	* testsuite/libgomp.fortran/acc_host_device_ptr.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.

 libgomp/libgomp.texi   | 171 -
 libgomp/openacc.f90| 101 ++--
 libgomp/openacc_lib.h  |  94 ++-
 .../libgomp.fortran/acc_host_device_ptr.f90|  43 ++
 .../testsuite/libgomp.oacc-fortran/acc-memcpy.f90  |  47 ++
 5 files changed, 399 insertions(+), 57 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index f57190f203c..d7da799a922 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2157,8 +2157,6 @@ dimensions.
 Running this routine in a @code{target} region is not supported except on
 the initial device.
 
-
-
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect_async(void *dst,}
@@ -4684,7 +4682,6 @@ returns @code{false}.
 @item   @tab @code{logical acc_on_device}
 @end multitable
 
-
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
 3.2.17.
@@ -4696,17 +4693,24 @@ returns @code{false}.
 @section @code{acc_malloc} -- Allocate device memory.
 @table @asis
 @item @emph{Description}
-This function allocates @var{len} bytes of device memory. It returns
+This function allocates @var{bytes} of device memory. It returns
 the device address of the allocated memory.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function acc_malloc(bytes)}
+@item   @tab @code{integer(c_size_t), value :: bytes}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.18.
+3.2.18.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.16.
 @end table
 
 
@@ -4715,16 +4719,23 @@ the device address of the allocated memory.
 @section @code{acc_free} -- Free device memory.
 @table @asis
 @item @emph{Description}
-Free previously allocated device memory at the device address @code{a}.
+Free previously allocated device memory at the device address @code{data_dev}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@item @emph{Prototype}: @tab @code{void acc_free(d_void *data_dev);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_free(data_dev)}
+@item   @tab @code{type(c_ptr), value :: data_dev}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.19.
+3.2.19.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.17.
 @end table
 
 
@@ -5092,17 +5103,26 @@ array element and @var{len} specifies the length in bytes.
 @table @asis
 @item @emph{Description}
 This function maps previously allocated device and host memory. The device
-memory is specified with the device address @var{d}. The host memory is
-specified with the host address @var{h} and a length of @var{len}.
+memory is specified with the device address @var{data_dev}. The host memory is
+specified with the host address @var{data_arg} and a length of @var{bytes}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@item @emph{Prototype}: @tab @code{void acc_map_data(h_void *data_arg, d_void *data_dev, size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_map_data(data_arg, data_dev, bytes)}
+@item   @tab @code{type(*), dimension(*) :: data_arg}
+@item   @tab @code{type(c_ptr), value :: data_dev}
+@item   @tab

[Committed] analyzer: Fix maybe_undo_optimize_bit_field_compare vs non-scalar types [PR113983]

2024-02-19 Thread Andrew Pinski

After r14-6419-g4eaaf7f5a378e8, maybe_undo_optimize_bit_field_compare would ICE
on a vector CST.  This function should only handle integer types, so
reject non-integral types early on (as it did for non-char types before
r14-6419-g4eaaf7f5a378e8).

Committed as obvious after a build and test on aarch64-linux-gnu with no
regressions.

PR analyzer/113983

gcc/analyzer/ChangeLog:

* region-model-manager.cc (maybe_undo_optimize_bit_field_compare): Reject
non-integral types.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/torture/vector-extract-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/analyzer/region-model-manager.cc   |  3 +++
 .../gcc.dg/analyzer/torture/vector-extract-1.c | 14 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 62f808a81c2..21e13b48025 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -602,6 +602,9 @@ maybe_undo_optimize_bit_field_compare (tree type,
   tree cst,
   const svalue *arg1)
 {
+  if (!INTEGRAL_TYPE_P (type))
+return NULL;
+
   const binding_map &map = compound_sval->get_map ();
   unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (cst);
   /* If "mask" is a contiguous range of set bits, see if the
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c b/gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c
new file mode 100644
index 000..5b878e6e4e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c
@@ -0,0 +1,14 @@
+/* PR analyzer/113983  */
+
+/* maybe_undo_optimize_bit_field_compare used to ICE on this
+   because it was not checking for only integer types. */
+
+typedef int __attribute__((__vector_size__(8))) V;
+int i;
+
+V
+foo(void)
+{
+  V v = (V){};
+  return (0, 0) * (i & v);
+}
-- 
2.43.0

Re: Typo on GCC 14 porting_to page

2024-02-19 Thread Jonathan Wakely

On Mon, 19 Feb 2024 at 20:24, peter0x44 wrote:
>
> I was reading the GCC 14 porting to page and I noticed:
>
>   Alternatively, projects using using Autoconf could enable
> AC_USE_SYSTEM_EXTENSIONS.
>
> "using using" should be "using".
>
> I read over the rest and didn't notice anything else wrong.
>
> Thanks,
> Peter D.

Thanks. I've fixed it with the attached patch, pushed to wwwdocs.
commit d472f751802f95635997649ea7ec71e4f725aa50
Author: Jonathan Wakely 
Date:   Mon Feb 19 20:44:12 2024 +

Fix "using using" typo

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index 901a1653..35274691 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -146,7 +146,7 @@ standard C mode, which can result in implicit function declarations.
 To address this, the -std=c11 option can be
 dropped, -std=gnu11 can be used instead,
 or -std=c11 -D_DEFAULT_SOURCE can be used re-enable
-common extensions.  Alternatively, projects using using Autoconf
+common extensions.  Alternatively, projects using Autoconf
 could enable AC_USE_SYSTEM_EXTENSIONS.

Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]

2024-02-19 Thread Iain Sandoe




> On 16 Feb 2024, at 21:05, Jason Merrill  wrote:
> 
> On 2/14/24 18:33, Iain Sandoe wrote:
>>> On 14 Feb 2024, at 22:59, Iain Sandoe  wrote:
 On 12 Feb 2024, at 19:59, Jason Merrill  wrote:
 
 On 2/10/24 07:30, Iain Sandoe wrote:
>> On 10 Feb 2024, at 12:07, Jason Merrill  wrote:
>> 
>> On 2/10/24 05:46, Iain Sandoe wrote:
 On 9 Feb 2024, at 23:21, Iain Sandoe  wrote:
 
 
 
> On 9 Feb 2024, at 10:56, Iain Sandoe  wrote:
>> On 8 Feb 2024, at 21:44, Jason Merrill  wrote:
>> 
>> On 2/8/24 12:55, Paolo Bonzini wrote:
>>> On 2/8/24 18:16, Jason Merrill wrote:
>> 
> 
> Hmm.  In stage 1, when we build with the system gcc, I'd think we 
> want the just-built gnat1 to find the system libgcc.
> 
> In stage 2, when we build with the stage 1 gcc, we want the 
> just-built gnat1 to find the stage 1 libgcc.
> 
> In neither case do we want it to find the libgcc from the current 
> stage.
> 
> So it seems to me that what we want is for stage2+ 
> LD_LIBRARY_PATH to include the TARGET_LIB_PATH from the previous 
> stage.  Something like the below, on top of the earlier patch.
> 
> Does this make sense?  Does it work on Darwin?
 
 Oops, that was broken, please consider this one instead:
>>> Yes, this one makes sense (and the current code would not work 
>>> since it lacks the prev- prefix on TARGET_LIB_PATH).
>> 
>> Indeed, that seems like evidence that the only element of 
>> TARGET_LIB_PATH that has been useful in HOST_EXPORTS is the prev- 
>> part of HOST_LIB_PATH_gcc.
>> 
>> So, here's another patch that just includes that for post-stage1:
>> <0001-build-drop-target-libs-from-LD_LIBRARY_PATH-PR105688.patch>
> 
> Hmm this still fails for me with gnat1 being unable to find libgcc_s.
> It seems I have to add the PREV_HOST_LIB_PATH_gcc to HOST_LIB_PATH 
> for it to succeed so,
> presumably, the post stage1 exports are not being forwarded to that 
> build.  I’ll try to analyze what
> exactly is failing.
 
 The fail is occurring in the target libada build; so, I suppose, one 
 might say it’s reasonable that it
 requires this host path to be added to the target exports since it’s a 
 host library used during target
 builds (or do folks expect the host exports to be made for target lib 
 builds as well?)
 
 Appending the prev-gcc directory to the HOST_LIB_PATH fixes this
>>> Hmm this is still not right, in this case, I think it should actually 
>>> be the “just built” directory;
>>> - if we have a tool that depends on host libraries (that happen to be 
>>> also target ones),
>>>  then those libraries have to be built before the tool so that they can 
>>> be linked to it.
>>>  (we specially copy libgcc* and the CRTs to gcc/ to allow for this case)
>>> - there is no prev-gcc in cross and --disable-bootstrap builds, but the 
>>> tool will still be
>>>   linked to the just-built host libraries (which will also be 
>>> installed).
>>> So, I think we have to add HOST_LIB_PATH_gcc to HOST_LIB_PATH
>>> and HOST_PREV_LIB_PATH_gcc to POSTSTAGE1_HOST_EXPORTS (as per this 
>>> patch).
>> 
>> I don't follow.  In a cross build, host libraries are a different 
>> architecture from target libraries, and certainly can't be linked into 
>> host binaries.
>> 
>> In a disable-bootstrap build, even before my change TARGET_LIB_PATH 
>> isn't added to RPATH_ENVVAR, since that has been guarded with @if 
>> gcc-bootstrap.
>> 
>> So in a bootstrap build, it shouldn't be needed for stage1 either.  And 
>> for stage2, the one we need is from stage1, that matches the compiler 
>> we're building host tools with.
>> 
>> What am I missing?
> nothing, I was off on a tangent about the cross/non-bootstrap, sorry 
> about that.
> However, when doing target builds (the previous point) it seems we do 
> have to make provision for gnat1 to find libgcc_s, and, at present, it 
> seems that only the target exports are active.
 
 Ah, I see: When building target libraries in stage2, we run the stage2 
 compiler that needs the stage1 libgcc_s, but we don't have the 
 HOST_EXPORTS because we're building target code, so we also need to get 
 the libgcc path into TARGET_EXPORTS.
 
 Since TARGET_LIB_PATH is only added when gcc-bootstrap, I guess the 
 previous libgcc is the only piece needed in TARGET_EXPORTS as well.  So, 
 how about this version of the patch?
>>> 
>>> I tested this one on an affected

[PATCH wwwdocs COMMITTED] gcc-14: Fix unintentional error in -Wimplicit-int example

2024-02-19 Thread Florian Weimer

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index bbbaa25a..901a1653 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -92,7 +92,7 @@ below).  In the example below, the type of s should be
 
   write_string (fd, s)
   {
-write (1, s, strlen (s));
+write (fd, s, strlen (s));
   }
 
 
@@ -103,7 +103,7 @@ disregarding error handling and short writes):
   void
   write_string (int fd, const char *s)
   {
-write (1, s, strlen (s));
+write (fd, s, strlen (s));
   }

Re: [PATCH] Fortran: fix passing array component to polymorphic argument [PR105658]

2024-02-19 Thread Harald Anlauf


Hi Peter,

On 2/19/24 16:19, Peter Hill wrote:

Hi Harald,

Thanks for your help, please see the updated and signed-off patch below.


great!  This is fine, and I'll commit it tomorrow unless others
have further comments.


It also occurred to me that array temporaries aren't _required_ here
(for arrays of derived type components), but in the general case with
a type with differently sized components, the stride wouldn't be a
multiple of the component's type's size. Is it possible in principle
to have an arbitrary stride?


It is possible to have an arbitrary (fixed, non-unit) stride,
but it is not always taken advantage of.

If you take the last version of the testcase and compile with
option -fdump-tree-original, you can see that the cases commented
with "no temp needed" actually create a suitable descriptor.
E.g.

call print_poly (uu(2,2::2))

becomes:

{
  struct __class__STAR_1_0t class.28;
  struct array01_integer(kind=4) parm.29;

  class.28._vptr = (struct __vtype__STAR * {ref-all}) &__vtab_INTEGER_4_;

  parm.29.span = 4;
  parm.29.dtype = {.elem_len=4, .version=0, .rank=1, .type=1};
  parm.29.dim[0].lbound = 1;
  parm.29.dim[0].ubound = 3;
  parm.29.dim[0].stride = 10;
  parm.29.data = (void *) [6];
  parm.29.offset = -10;
  class.28._data = parm.29;
  class.28._len = 0;
  print_poly ();
}

Since we know that 'uu' is a contiguous array, we can calculate
the stride (10) for the 1-d section.
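
The arithmetic behind those descriptor fields can be sketched in C++. This is only an illustration: `Desc1D`, `row_section` and `elem` are hypothetical names, the struct is a simplification and not gfortran's real descriptor layout, and the 5 x 6 shape used in the usage note is an assumption that happens to be consistent with the stride of 10 and the data index 6 in the dump.

```cpp
#include <cassert>
#include <cstddef>

// Simplified 1-d descriptor.  Field names follow the dump, but this is an
// illustrative struct, not gfortran's actual descriptor type.
struct Desc1D
{
  int* data;               // address of the first selected element
  std::ptrdiff_t offset;   // makes index LBOUND land on data[0]
  std::ptrdiff_t lbound, ubound;
  std::ptrdiff_t stride;   // distance between section elements, in elements
};

// Descriptor for the section uu(row, first_col::col_step) of a contiguous,
// column-major array with NROWS x NCOLS elements (all indices 1-based).
inline Desc1D row_section(int* uu, std::ptrdiff_t nrows, std::ptrdiff_t ncols,
                          std::ptrdiff_t row, std::ptrdiff_t first_col,
                          std::ptrdiff_t col_step)
{
  Desc1D d;
  d.data = uu + (row - 1) + (first_col - 1) * nrows;
  d.stride = col_step * nrows;   // one column step skips NROWS elements
  d.lbound = 1;
  d.ubound = (ncols - first_col) / col_step + 1;
  d.offset = -d.lbound * d.stride;
  return d;
}

// Element I of the section, addressed relative to DATA.
inline int& elem(const Desc1D& d, std::ptrdiff_t i)
{
  assert(i >= d.lbound && i <= d.ubound);
  return d.data[d.offset + i * d.stride];
}
```

Under the assumed shape, `row_section(uu, 5, 6, 2, 2, 2)` gives a data pointer at linear index 6, a stride of 10, an upper bound of 3 and an offset of -10, matching the values in the dump. No temporary is needed because every selected element is reachable with a fixed element stride.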

The case of the section of the character array is quite similar,
but the variant with the substring reference would need further
work to avoid the temporary.  (It would be possible.)

But as you say, the general case, which may involve types/classes,
does not map to a simple descriptor.

Thanks for your patch!

Harald

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-19 Thread Segher Boessenkool

Hi!

On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote:
> As PR109987 and its duplicated bugs show, -mno-power8-vector
> (and -mno-power9-vector) cause some problems and as Segher
> pointed out in [1] they are workaround options, so this patch
> is to remove -m{no,}-power{8,9}-options.

Excellent :-)

> Like what we did
> for option -mdirect-move before, this patch still keep the
> corresponding internal flags and they are automatically set
> based on -mcpu.

Yup.  That makes the code nicer, and it is what we already have anyway!

> The test suite update takes some efforts,

Yeah :-/

> it consists of some aspects:
>   - effective target powerpc_p{8,9}vector_ok are removed
> and replaced with powerpc_vsx_ok.

So all such testcases already arrange to have p8 or p9 some other way?

>   - Some cases having -mpower{8,9}-vector are updated with
> -mvsx, some of them already have -mdejagnu-cpu.  For
> those that don't have -mdejagnu-cpu, if -mdejagnu-cpu
> is needed for the test point, then it's appended;
> otherwise, add additional-options -mdejagnu-cpu=power{8,9}
> if has_arch_pwr{8,9} isn't satisfied.

Yeah it's a judgement call every time.

>   - Some test cases are updated with explicit -mvsx.
>   - Some test cases with those two option mixed are adjusted
> to keep the test points, like -mpower8-vector
> -mno-power9-vector are updated with -mdejagnu-cpu=power8
> -mvsx etc.

-mcpu=power8 implies -mvsx already.

>   - Some test cases with -mno-power{8,9}-vector are updated
> by replacing -mno-power{8,9}-vector with -mno-vsx, or
> just removing it.

Okay.

>   - For some cases, we don't always specify -mdejagnu-cpu to
> avoid to restrict the testing coverage, it would check
> has_arch_pwr{8,9} and appended that as need.

That is in general how all tests should be.  Very sometimes we want to
test for a specific CPU, for a regression test that exhibited just on a
certain CPU for example.  But we should never have a -mcpu= (or a
-mpowerN-vector nastiness thing) to test things on a new CPU!  Just do a
testsuite run with *that* CPU.  Not many years from now, *all* CPUs
will have those new instructions anyway, so let's not put noise in the
testcases that will be irrelevant soon.

>   - For vect test cases run, it doesn't specify -mcpu=power9
> for power10 and up.
> 
> Bootstrapped and regtested on:
>   - powerpc64-linux-gnu P7/P8/P9 {-m32,-m64}
>   - powerpc64le-linux-gnu P8/P9/P10

In general it is nice to test 970 as the lowest vector thing we have,
and/or p4 as a target without anything vector, as well.  But I expect
those will just work for this patch :-)

> Although it's stage4 now, as the discussion in PR113115 we
> are still eager to neuter these two options.

It is mostly a testsuite patch, and testcase patches are fine (and much
wanted!) in stage 4.  The actual compiler options remain, and behaviour
does not change for anyone who used the option as intended.

Okay for trunk.  Thanks!  Comments below:

>   * config/rs6000/rs6000.opt: Make option power{8,9}-vector as
>   WarnRemoved.

Do we want this, or do we want it silent?  Should we remove the options
later, if we now warn for it?

>  (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
> -  "@internal Like @code{wa}, if @option{-mpower9-vector} and @option{-m64} are
> -   used; otherwise, @code{NO_REGS}.")
> +  "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile
> +   @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.")

"if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are
used".  How clumsy.  Maybe we should make the patterns that use "we"
work without mtvsrdd as well?  Hrm, they will still require 64-bit GPRs
of course, unless we can do something tricky.

We do not need the special constraint at all of course (we can add these
conditions to all patterns that use it: all *two* patterns).  So maybe
that's what we should do :-)

> -If you use the ISA 3.0 instruction set (@option{-mpower9-vector} or
> -@option{-mcpu=power9}) on a 64-bit system, the IEEE 128-bit floating
> -point support will also enable the generation of ISA 3.0 IEEE 128-bit
> -floating point instructions.  Otherwise, if you do not specify to
> -generate ISA 3.0 instructions or you are targeting a 32-bit big endian
> -system, IEEE 128-bit floating point will be done with software
> -emulation.
> +If you use the ISA 3.0 instruction set (@option{-mcpu=power9}) on a
> +64-bit system, the IEEE 128-bit floating point support will also enable
> +the generation of ISA 3.0 IEEE 128-bit floating point instructions.
> +Otherwise, if you do not specify to generate ISA 3.0 instructions or you
> +are targeting a 32-bit big endian system, IEEE 128-bit floating point
> +will be done with software emulation.

You do not need to reformat documentation source code: it is
automatically formatted in all output formats and all viewers :-)

> diff --git

Re: [PATCH][_GLIBCXX_DEBUG] Fix std::__niter_base behavior

2024-02-19 Thread François Dumont

Turns out that 23_containers/vector/erasure.cc was showing the problem 
in _GLIBCXX_DEBUG mode.

I had only run 25_algorithms tests in _GLIBCXX_DEBUG mode.

This is what I'm testing; I'll let you know tomorrow morning if all is
successful.

Of course feel free to do or ask for a revert instead.

François

On 19/02/2024 09:21, Jonathan Wakely wrote:

On Mon, 19 Feb 2024, 08:12 Jonathan Wakely,  wrote:

On Mon, 19 Feb 2024, 07:08 Stephan Bergmann, 
wrote:

On 2/17/24 15:14, François Dumont wrote:
> Thanks for the link, tested and committed.

I assume this is the cause for the below failure now,

Yes, the new >= C++11 overload of __niter_base recursively unwraps
multiple layers of wrapping, so that a safe iterator wrapping a
normal iterator wrapping a pointer is unwrapped to just a pointer.
But then __niter_wrap doesn't restore both layers.

Actually that's not the problem. __niter_wrap would restore both 
layers, except that it uses __niter_base itself:

>   347 |     { return __from + (__res - std::__niter_base(__from)); }
>       |  ~~~^~~~

And it seems to be getting called with the wrong types. Maybe that's 
just a bug in std::erase or maybe __niter_wrap needs adjusting.

I'll check in a couple of hours if François doesn't get to it first.

I have to wonder how this wasn't caught by existing tests though.
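
To make the type mismatch concrete, here is a toy C++ sketch of the wrap/unwrap round trip. It is not libstdc++ code: `Wrapped`, `niter_base`, `niter_wrap` and `check_niter_wrap` are illustrative stand-ins, and only a single wrapper layer is modelled. The point it shows is that unwrapping the result operand as well keeps the subtraction a plain pointer difference, whether the caller passes a raw pointer or a still-wrapped iterator.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for one "wrapped" iterator layer (illustrative only).
struct Wrapped
{
  int* ptr;
  Wrapped operator+(std::ptrdiff_t n) const { return Wrapped{ptr + n}; }
};

// Hypothetical unwrapping helpers: a raw pointer passes through unchanged,
// a Wrapped value is stripped down to its pointer.
inline int* niter_base(int* p) { return p; }
inline int* niter_base(Wrapped w) { return w.ptr; }

// Re-wrap RES so that it has the same type as FROM.  Because both operands
// are unwrapped first, the difference is always pointer minus pointer.
template<typename From, typename To>
inline From niter_wrap(From from, To res)
{ return from + (niter_base(res) - niter_base(from)); }

// Usage: the result may arrive fully unwrapped or still wrapped.
inline void check_niter_wrap()
{
  int a[5] = {0, 1, 2, 3, 4};
  Wrapped first{a};
  assert(niter_wrap(first, a + 3).ptr == a + 3);          // raw result
  assert(niter_wrap(first, Wrapped{a + 3}).ptr == a + 3); // wrapped result
}
```

With only the second operand unwrapped, the second call in `check_niter_wrap` would try to subtract a pointer from a `Wrapped`, which this toy type does not support; that is the shape of the error quoted above.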

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 0f73da13172..d534e02871f 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -344,7 +344,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX20_CONSTEXPR
 inline _From
 __niter_wrap(_From __from, _To __res)
-{ return __from + (__res - std::__niter_base(__from)); }
+{ return __from + (std::__niter_base(__res) - std::__niter_base(__from)); }

   // No need to wrap, iterator already has the right type.
   template

New Spanish PO file for 'cpplib' (version 14.1-b20240218)

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/es.po

(This file, 'cpplib-14.1-b20240218.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-14.1-b20240218.es.po'

2024-02-19 Thread Translation Project Robot



cpplib-14.1-b20240218.es.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-14.1-b20240218.ro.po'

2024-02-19 Thread Translation Project Robot



cpplib-14.1-b20240218.ro.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

New Romanian PO file for 'cpplib' (version 14.1-b20240218)

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Romanian team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/ro.po

(This file, 'cpplib-14.1-b20240218.ro.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

GCN: Restore lost 'gfx90a' target CPU definition (was: [Patch] GCN: Add pre-initial support for gfx1100)

2024-02-19 Thread Thomas Schwinge

Hi!

On 2024-01-07T20:20:19+0100, Tobias Burnus  wrote:
> --- a/gcc/config/gcn/gcn.h
> +++ b/gcc/config/gcn/gcn.h
> @@ -30,6 +30,8 @@
>   builtin_define ("__CDNA2__");  \
>else if (TARGET_RDNA2) 
>   \
>   builtin_define ("__RDNA2__");  \
> +  else if (TARGET_RDNA3) 
>   \
> + builtin_define ("__RDNA3__");  \
>if (TARGET_FIJI)   
>   \
>   {  \
> builtin_define ("__fiji__"); \
> @@ -41,11 +43,13 @@
>   builtin_define ("__gfx906__"); \
>else if (TARGET_GFX908)
>   \
>   builtin_define ("__gfx908__"); \
> -  else if (TARGET_GFX90a)
>   \
> - builtin_define ("__gfx90a__"); \
> +  else if (TARGET_GFX1030)   
>   \
> + builtin_define ("__gfx1030");  \
> +  else if (TARGET_GFX1100)   
>   \
> + builtin_define ("__gfx1100__");\
>} while (0)

Supposedly it wasn't intentional that we lost gfx90a here -- I've pushed
to master branch commit 159174f25716c18a74a915cb01b9a28024ea7a3d
"GCN: Restore lost '__gfx90a__' target CPU definition", see attached.
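
User code that keys off these predefined macros shows why the lost definition matters: without `__gfx90a__`, a chain like the one below would silently take its fallback branch when compiling for gfx90a. This is a minimal sketch; `gcn_device_name` is a hypothetical helper, and it only tests macro names that appear in the patch.

```cpp
// Name of the GCN device this translation unit is compiled for, derived
// from the macros set up by TARGET_CPU_CPP_BUILTINS.  When none of them is
// defined (for example on a non-GCN host), "unknown" is returned.
inline const char* gcn_device_name()
{
#if defined(__gfx908__)
  return "gfx908";
#elif defined(__gfx90a__)
  return "gfx90a";
#elif defined(__gfx1100__)
  return "gfx1100";
#else
  return "unknown";
#endif
}
```

The `gcc_unreachable ()` safeguards added by the fix-up catch the same kind of omission on the compiler side, so that a device without a matching macro fails loudly instead of defining nothing.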


Grüße
 Thomas


From 159174f25716c18a74a915cb01b9a28024ea7a3d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 8 Feb 2024 23:27:19 +0100
Subject: [PATCH] GCN: Restore lost '__gfx90a__' target CPU definition

Also, add some safeguards for the future.

Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0
"GCN: Add pre-initial support for gfx1100".

	gcc/
	* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Restore lost
	'__gfx90a__' target CPU definition.  Add some safeguards for the future.
---
 gcc/config/gcn/gcn.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index a17f16aacc40..c314c7b4ae8e 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -32,6 +32,8 @@
 	builtin_define ("__RDNA2__");  \
   else if (TARGET_RDNA3)   \
 	builtin_define ("__RDNA3__");  \
+  else \
+	gcc_unreachable ();\
   if (TARGET_FIJI) \
 	{  \
 	  builtin_define ("__fiji__"); \
@@ -43,10 +45,14 @@
 	builtin_define ("__gfx906__"); \
   else if (TARGET_GFX908)  \
 	builtin_define ("__gfx908__"); \
+  else if (TARGET_GFX90a)  \
+	builtin_define ("__gfx90a__"); \
   else if (TARGET_GFX1030) \
 	builtin_define ("__gfx1030");  \
   else if (TARGET_GFX1100) \
 	builtin_define ("__gfx1100__");\
+  else \
+	gcc_unreachable ();\
   } while (0)
 
 #define ASSEMBLER_DIALECT (TARGET_RDNA2_PLUS ? 1 : 0)
-- 
2.43.0

Contents of PO file 'cpplib-14.1-b20240218.fr.po'

2024-02-19 Thread Translation Project Robot



cpplib-14.1-b20240218.fr.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

New French PO file for 'cpplib' (version 14.1-b20240218)

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/fr.po

(This file, 'cpplib-14.1-b20240218.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

New Ukrainian PO file for 'cpplib' (version 14.1-b20240218)

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/uk.po

(This file, 'cpplib-14.1-b20240218.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-14.1-b20240218.uk.po'

2024-02-19 Thread Translation Project Robot



cpplib-14.1-b20240218.uk.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

New template for 'cpplib' made available

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'cpplib' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/cpplib-14.1-b20240218.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/snapshots/14-20240218/gcc-14-20240218.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

New template for 'gcc' made available

2024-02-19 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-14.1-b20240218.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/snapshots/14-20240218/gcc-14-20240218.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge

Hi!

On 2024-02-19T17:31:20+0100, I wrote:
> On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
>> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>>> On 2024-02-16T14:53:04+0100, I wrote:
>>> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>>> >> On 16/02/2024 12:26, Richard Biener wrote:
>>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>>  On 16/02/2024 10:17, Richard Biener wrote:
>>> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>>> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs  
>>> >> wrote:
>>> >>> I've committed this patch
>>> >>
>>> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>>> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
>>> >> RDNA3/gfx1100
>>> >> support builds on top of, and that's what I'm currently working on
>>> >> getting proper GCC/GCN target (not offloading) results for.
>>> >>
>>> >> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
>>> >> simple,
>>> >> and hopefully representative for other SLP execution test FAILs
>>> >> (regressions compared to my earlier non-gfx1100 testing).
>>> >>
>>> >>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>> >>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>> >>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>> >>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
>>> >> -fno-common
>>> >>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>> >>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>> >>   source-gcc/newlib/libc/include
>>> >>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>> >>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>> >>   setarch,--addr-no-randomize -fdump-tree-all-all 
>>> >> -fdump-ipa-all-all
>>> >>   -fdump-rtl-all-all -save-temps -march=gfx1100
>>> >>
>>> >> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>>> >> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), 
>>> >> so I
>>> >> suppose will also exhibit the same failure mode, once again?
>>> >>
>>> >> Compared to '-march=gfx90a', the differences begin in
>>> >> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>> >>
>>> >> Changed like:
>>> >>
>>> >>   @@ -38,10 +38,10 @@ int main ()
>>> >>#pragma GCC novector
>>> >>  for (i = 1; i < N; i++)
>>> >>if (a[i] != i%4 + 1)
>>> >>   -  abort ();
>>> >>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>> >>
>>> >>  if (a[0] != 5)
>>> >>   -abort ();
>>> >>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>> >>
>>> >> ..., we see:
>>> >>
>>> >>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>> >>   40 5 != 1
>>> >>   41 6 != 2
>>> >>   42 7 != 3
>>> >>   43 8 != 4
>>> >>   44 5 != 1
>>> >>   45 6 != 2
>>> >>   46 7 != 3
>>> >>   47 8 != 4
>>> >>
>>> >> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>>> >> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>>> >> scribbled zero values over these (vector lane masking issue, 
>>> >> perhaps?),
>>> >> or some other code generation issue?
>>> >
>>>  [...], I must be doing something different because vect/bb-slp-cond-1.c
>>>  passes for me, on gfx1100.
>>> >
>>> > That's strange.  I've looked at your log file (looks good), and used your
>>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>>> >
>>> > $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
>>> > GCN Kernel Aborted
>>> > Kernel aborted
>>> >
>>> > Andrew, later on, please try what happens when you put an unconditional
>>> > 'abort' call into a test case?
>>> 
>>> Andrew, any luck with that yet?
>>> 
>>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
>>> execution test failure mentioned above (manual compilation and
>>> 'gcn-run')?
>>
>> No, when manually compiling/running the testcase it works fine for me.
>
> I've updated my GCC master branch sources, but it still fails for me:
>
> $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
> source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 
> --sysroot=install/amdgcn-amdhsa -isystem 
> build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem 
> source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ 
> -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize 
> -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
> -save-temps
> $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> GCN Kernel Aborted
> Kernel aborted
>
> Strange.
>
> In 'bb-slp-cond-1.tar.xz' I'm attaching the files I've built.  Could you
> please

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge

Hi!

On 2024-02-19T11:52:55+0100, Richard Biener  wrote:
> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> On 2024-02-16T14:53:04+0100, I wrote:
>> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>> >> On 16/02/2024 12:26, Richard Biener wrote:
>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>>  On 16/02/2024 10:17, Richard Biener wrote:
>> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs  
>> >> wrote:
>> >>> I've committed this patch
>> >>
>> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
>> >> RDNA3/gfx1100
>> >> support builds on top of, and that's what I'm currently working on
>> >> getting proper GCC/GCN target (not offloading) results for.
>> >>
>> >> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
>> >> simple,
>> >> and hopefully representative for other SLP execution test FAILs
>> >> (regressions compared to my earlier non-gfx1100 testing).
>> >>
>> >>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>> >>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>> >>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>> >>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
>> >> -fno-common
>> >>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>> >>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>> >>   source-gcc/newlib/libc/include
>> >>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>> >>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>> >>   setarch,--addr-no-randomize -fdump-tree-all-all 
>> >> -fdump-ipa-all-all
>> >>   -fdump-rtl-all-all -save-temps -march=gfx1100
>> >>
>> >> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> >> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so 
>> >> I
>> >> suppose will also exhibit the same failure mode, once again?
>> >>
>> >> Compared to '-march=gfx90a', the differences begin in
>> >> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>> >>
>> >> Changed like:
>> >>
>> >>   @@ -38,10 +38,10 @@ int main ()
>> >>#pragma GCC novector
>> >>  for (i = 1; i < N; i++)
>> >>if (a[i] != i%4 + 1)
>> >>   -  abort ();
>> >>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>> >>
>> >>  if (a[0] != 5)
>> >>   -abort ();
>> >>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>> >>
>> >> ..., we see:
>> >>
>> >>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>> >>   40 5 != 1
>> >>   41 6 != 2
>> >>   42 7 != 3
>> >>   43 8 != 4
>> >>   44 5 != 1
>> >>   45 6 != 2
>> >>   46 7 != 3
>> >>   47 8 != 4
>> >>
>> >> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> >> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>> >> scribbled zero values over these (vector lane masking issue, 
>> >> perhaps?),
>> >> or some other code generation issue?
>> >
>>  [...], I must be doing something different because vect/bb-slp-cond-1.c
>>  passes for me, on gfx1100.
>> >
>> > That's strange.  I've looked at your log file (looks good), and used your
>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>> >
>> > $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
>> > GCN Kernel Aborted
>> > Kernel aborted
>> >
>> > Andrew, later on, please try what happens when you put an unconditional
>> > 'abort' call into a test case?
>> 
>> Andrew, any luck with that yet?
>> 
>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
>> execution test failure mentioned above (manual compilation and
>> 'gcn-run')?
>
> No, when manually compiling/running the testcase it works fine for me.

I've updated my GCC master branch sources, but it still fails for me:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 
--sysroot=install/amdgcn-amdhsa -isystem 
build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem 
source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ 
-Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize 
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
-save-temps
$ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
GCN Kernel Aborted
Kernel aborted

Strange.

In 'bb-slp-cond-1.tar.xz' I'm attaching the files I've built.  Could you
please compare those to yours and try 'gcn-run gfx1030/a.out'?


Grüße
 Thomas


> Didn't yet get to try the .exp files
>
> Richard.
>
>> 
>> Grüße
>>  Thomas
>> 
>> 
>> >>> I

Re: [PATCH] IBM Z: Preserve exceptions in autovec-*-signaling-eq.c tests

2024-02-19 Thread Andreas Krebbel

On 2/19/24 13:39, Ilya Leoshkevich wrote:
> DSE, DCE, and other passes are removing redundant signaling comparisons
> from these tests, but the whole point is to check that GCC knows how to
> emit them.  Use -fno-delete-dead-exceptions to prevent that.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/s390/zvector/autovec-double-signaling-eq.c:
>   Preserve exceptions.
> * gcc.target/s390/zvector/autovec-float-signaling-eq.c:
>   Likewise.

Ok. Thanks!

Andreas

> ---
>  .../gcc.target/s390/zvector/autovec-double-signaling-eq.c   | 2 +-
>  .../gcc.target/s390/zvector/autovec-float-signaling-eq.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git 
> a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c 
> b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
> index 3645d3cc393..b23568e06b4 100644
> --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
> +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
> -fnon-call-exceptions" } */
> +/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
> -fnon-call-exceptions -fno-delete-dead-exceptions" } */
>  
>  #include "autovec.h"
>  
> diff --git 
> a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c 
> b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
> index d98aa0c494e..cd25d10c577 100644
> --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
> +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
> -fnon-call-exceptions" } */
> +/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
> -fnon-call-exceptions -fno-delete-dead-exceptions" } */
>  
>  #include "autovec.h"
>

Re: [committed] bitint: Fix testism where __seg_gs was being used for all targets

2024-02-19 Thread Andre Vieira (lists)





On 19/02/2024 16:17, Jakub Jelinek wrote:

On Mon, Feb 19, 2024 at 04:13:29PM +, Andre Vieira (lists) wrote:

Replaced uses of __seg_gs with the MACRO SEG defined in the testcase to pick
(if any) the right __seg_{gs,fs} keyword based on target.

gcc/testsuite/ChangeLog:

* gcc.dg/bitint-86.c (__seg_gs): Replace with SEG MACRO.


ChangeLog should be
* gcc.dg/bitint-86.c (foo, bar, baz): Replace __seg_gs with SEG.
Otherwise, LGTM.
Sorry for forgetting to do that myself.


Jakub



That makes sense ... but I already pushed it upstream, thought it was 
obvious. Apologies for the ChangeLog mistake :(

Re: [committed] bitint: Fix testism where __seg_gs was being used for all targets

2024-02-19 Thread Jakub Jelinek

On Mon, Feb 19, 2024 at 04:13:29PM +, Andre Vieira (lists) wrote:
> Replaced uses of __seg_gs with the MACRO SEG defined in the testcase to pick
> (if any) the right __seg_{gs,fs} keyword based on target.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/bitint-86.c (__seg_gs): Replace with SEG MACRO.

ChangeLog should be
* gcc.dg/bitint-86.c (foo, bar, baz): Replace __seg_gs with SEG.
Otherwise, LGTM.
Sorry for forgetting to do that myself.

Jakub

Re: [PATCH] c++: compound-requirement partial substitution [PR113966]

2024-02-19 Thread Jason Merrill


On 2/19/24 09:39, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

When partially substituting a requires-expr, we don't want to perform
any additional checks beyond the substitution itself, so as to minimize
checking requirements out of order.  So when partially substituting
a compound-requirement don't check its return-type-requirement.  Don't
check the noexcept condition either since we can't do that on templated
trees.

PR c++/113966

gcc/cp/ChangeLog:

* constraint.cc (tsubst_compound_requirement): Don't check
the noexcept condition or the return-type-requirement when
partially substituting.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend17.C: New test.
---
  gcc/cp/constraint.cc   |  5 +++--
  gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C | 15 +++
  2 files changed, 18 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d9569013bd3..49de3211d4c 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2134,7 +2134,8 @@ tsubst_compound_requirement (tree t, tree args, sat_info info)
  
/* Check the noexcept condition.  */

bool noexcept_p = COMPOUND_REQ_NOEXCEPT_P (t);
-  if (noexcept_p && !expr_noexcept_p (expr, quiet.complain))
+  if (noexcept_p && !processing_template_decl
+  && !expr_noexcept_p (expr, quiet.complain))
  {
if (info.diagnose_unsatisfaction_p ())
inform (loc, "%qE is not %<noexcept%>", expr);
@@ -2148,7 +2149,7 @@ tsubst_compound_requirement (tree t, tree args, sat_info info)
  return error_mark_node;
  
/* Check expression against the result type.  */

-  if (type)
+  if (type && !processing_template_decl)
  {
if (tree placeholder = type_uses_auto (type))
{
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C b/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C
new file mode 100644
index 000..9b5091f14a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C
@@ -0,0 +1,15 @@
+// PR c++/113966
+// { dg-do compile { target c++20 } }
+
+template<class T> concept C = T::value;
+
+template<class T>
+struct A {
+  template<class U> requires U::value || requires { { T() } -> C; }
+  friend void f(A, U) { }
+
+  template<class U> requires requires { { g(U()) } noexcept; }
+  friend void f(A, U, U) { }
+};
+
+template struct A<int>;

[committed] bitint: Fix testism where __seg_gs was being used for all targets

2024-02-19 Thread Andre Vieira (lists)

Replaced uses of __seg_gs with the MACRO SEG defined in the testcase to 
pick (if any) the right __seg_{gs,fs} keyword based on target.


gcc/testsuite/ChangeLog:

* gcc.dg/bitint-86.c (__seg_gs): Replace with SEG MACRO.

diff --git a/gcc/testsuite/gcc.dg/bitint-86.c b/gcc/testsuite/gcc.dg/bitint-86.c
index 4e5761a203bc39150540326df9c0d88544bb02ef..10a2392b6f530ae165252bdac750061e92d53131 100644
--- a/gcc/testsuite/gcc.dg/bitint-86.c
+++ b/gcc/testsuite/gcc.dg/bitint-86.c
@@ -15,14 +15,14 @@ struct T { struct S b[4]; };
 #endif
 
 void
-foo (__seg_gs struct T *p)
+foo (SEG struct T *p)
 {
   struct S s;
   p->b[0] = s;
 }
 
 void
-bar (__seg_gs struct T *p, _BitInt(710) x, int y, double z)
+bar (SEG struct T *p, _BitInt(710) x, int y, double z)
 {
   p->b[0].a = x + 42;
   p->b[1].a = x << y;
@@ -31,7 +31,7 @@ bar (__seg_gs struct T *p, _BitInt(710) x, int y, double z)
 }
 
 int
-baz (__seg_gs struct T *p, _BitInt(710) x, _BitInt(710) y)
+baz (SEG struct T *p, _BitInt(710) x, _BitInt(710) y)
 {
   return __builtin_add_overflow (x, y, &p->b[1].a);
 }

Re: [PATCH] Fortran: fix passing array component to polymorphic argument [PR105658]

2024-02-19 Thread Peter Hill

Hi Harald,

Thanks for your help, please see the updated and signed-off patch below.

> (I am not entirely sure whether we need to exclude pointer and
> allocatable attributes here explicitly, given the constraints
> in F2023:15.5.2.6, but other may have an opinion, too.
> The above should be safe anyway.)

I've included them in the patch here, but it does seem to work fine
without checking those attributes here -- and invalid code is still
caught with that change.

It also occurred to me that array temporaries aren't _required_ here
(for arrays of derived type components), but in the general case with
a type with differently sized components, the stride wouldn't be a
multiple of the component's type's size. Is it possible in principle
to have an arbitrary stride?

Cheers,
Peter

>From 907a104facfc7f35f48ebcfa9ef5f8f5430d4d3c Mon Sep 17 00:00:00 2001
From: Peter Hill 
Date: Thu, 15 Feb 2024 16:58:33 +
Subject: [PATCH] Fortran: fix passing array component ref to polymorphic
 procedures

 PR fortran/105658

gcc/fortran/ChangeLog

* trans-expr.cc (gfc_conv_intrinsic_to_class): When passing an
array component reference of intrinsic type to a procedure
with an unlimited polymorphic dummy argument, a temporary
should be created.

gcc/testsuite/ChangeLog

* gfortran.dg/PR105658.f90: New test.

Signed-off-by: Peter Hill 
---
 gcc/fortran/trans-expr.cc  |  9 +
 gcc/testsuite/gfortran.dg/PR105658.f90 | 50 ++
 2 files changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/PR105658.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index a0593b76f18..004081aa6c3 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -1019,6 +1019,14 @@ gfc_conv_intrinsic_to_class (gfc_se *parmse, gfc_expr *e,
   tmp = gfc_typenode_for_spec (&class_ts);
   var = gfc_create_var (tmp, "class");

+  /* Force a temporary for component or substring references */
+  if (unlimited_poly
+  && class_ts.u.derived->components->attr.dimension
+  && !class_ts.u.derived->components->attr.allocatable
+  && !class_ts.u.derived->components->attr.class_pointer
+  && is_subref_array (e))
+parmse->force_tmp = 1;
+
   /* Set the vptr.  */
   ctree = gfc_class_vptr_get (var);

@@ -6439,6 +6447,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   CLASS object for the unlimited polymorphic formal.  */
gfc_find_vtab (&e->ts);
gfc_init_se (&parmse, se);
+
gfc_conv_intrinsic_to_class (&parmse, e, fsym->ts);

  }
diff --git a/gcc/testsuite/gfortran.dg/PR105658.f90 b/gcc/testsuite/gfortran.dg/PR105658.f90
new file mode 100644
index 000..8aacecf806e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR105658.f90
@@ -0,0 +1,50 @@
+! { dg-do compile }
+! { dg-options "-Warray-temporaries" }
+! Test fix for incorrectly passing array component to unlimited polymorphic procedure
+
+module test_PR105658_mod
+   implicit none
+   type :: foo
+ integer :: member1
+ integer :: member2
+   end type foo
+contains
+   subroutine print_poly(array)
+ class(*), dimension(:), intent(in) :: array
+ select type(array)
+ type is (integer)
+   print*, array
+ type is (character(*))
+   print *, array
+ end select
+   end subroutine print_poly
+
+   subroutine do_print(thing)
+ type(foo), dimension(3), intent(in) :: thing
+ type(foo), parameter :: y(3) = [foo(1,2),foo(3,4),foo(5,6)]
+ integer :: i, j, uu(5,6)
+
+ call print_poly(thing%member1)   ! { dg-warning "array temporary" }
+ call print_poly(y%member2)   ! { dg-warning "array temporary" }
+ call print_poly(y(1::2)%member2) ! { dg-warning "array temporary" }
+
+ ! The following array sections work without temporaries
+ uu = reshape([(((10*i+j),i=1,5),j=1,6)],[5,6])
+ print *, uu(2,2::2)
+ call print_poly (uu(2,2::2)) ! no temp needed!
+ print *, uu(1::2,6)
+ call print_poly (uu(1::2,6)) ! no temp needed!
+   end subroutine do_print
+
+   subroutine do_print2(thing2)
+ class(foo), dimension(:), intent(in) :: thing2
+ call print_poly (thing2% member2) ! { dg-warning "array temporary" }
+   end subroutine do_print2
+
+   subroutine do_print3 ()
+ character(3) :: c(3) = ["abc","def","ghi"]
+ call print_poly (c(1::2))  ! no temp needed!
+ call print_poly (c(1::2)(2:3)) ! { dg-warning "array temporary" }
+   end subroutine do_print3
+
+end module test_PR105658_mod
-- 
2.43.0

[PATCH] ipa: Convert lattices from pure array to vector (PR 113476)

2024-02-19 Thread Martin Jambor

On Tue, Feb 13 2024, Martin Jambor wrote:
> On Mon, Feb 12 2024, Jan Hubicka wrote:
>>> Believe it or not, even though I have re-worked the internals of the
>>> lattices completely, the array itself is older than my involvement with
>>> GCC (or at least with ipa-cp.c ;-).
>>> 
>>> So it being an array and not a vector is historical coincidence, as far
>>> as I am concerned :-).  But that may be the reason, or because vector
>>> macros at that time looked scary, or perhaps the initialization by
>>> XCNEWVEC zeroing everything out was considered attractive (I kind of
>>> like that but constructors would probably be cleaner), I don't know.
>>
>> If your class is no longer a POD, then the clearing before construction
>> is dead and GCC may optimize it out.  So fixing this may avoid some
>> surprises in the foreseeable future when we try to compile older GCCs
>> with newer ones.
>>
>
> That's a good point.  I'll prepare a patch converting the whole thing to
> use constructors and vectors.
>

In PR 113476 we have discovered that ipcp_param_lattices is no longer
a POD and should be destructed.  In a follow-up discussion it
transpired that their initialization done by memsetting their backing
memory to zero is also invalid because now any write there before
construction can be considered dead.  In addition, keeping them in an
array is somewhat old-school and forgoes the extra checking offered by
vector, along with automatic construction and destruction when
necessary.

So this patch converts the array to a vector.  That however means that
ipcp_param_lattices cannot be just a forward declared type but must be
known to all code that deals with ipa_node_params and thus to all code
that includes ipa-prop.h.  Therefore I have moved ipcp_param_lattices
and the type it depends on to a new header ipa-cp.h which now
ipa-prop.h depends on.  Because we have the (IMHO not a very wise)
rule that headers don't include what they need themselves, I had to
add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
many files, which made the patch rather ugly.

Bootstrapped and tested on x86_64-linux.  I also had it checked by our
script which builds more than a hundred cross-compilers, so other
targets are hopefully also fine.

OK for master?

Martin
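
The point about zeroing memory before construction can be illustrated with a small C++ sketch.  It is an illustration only: 'Lattice' and 'make_lattices' are hypothetical names, and 'std::vector' is swapped in for GCC's own vector type, which the patch actually uses.  The idea shown is that each element's initial state comes from its constructor (here, default member initializers) rather than from a memset of the backing storage, which a compiler may treat as a dead store for a non-POD type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-parameter lattice; not GCC's
// ipcp_param_lattices.
struct Lattice
{
  int values_count = 0;     // initial state set by the constructor,
  bool bottom = false;      // not by zeroing raw memory
  std::vector<int> values;  // non-trivial member: the type is not a POD
};

// Build one default-constructed lattice per parameter.  The vector runs
// the constructors on creation and the destructors when it goes away.
std::vector<Lattice> make_lattices (std::size_t param_count)
{
  return std::vector<Lattice> (param_count);
}
```

A caller would simply write 'auto lats = make_lattices (count);' and never touch the storage before the elements exist.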

gcc/lto/ChangeLog:

2024-02-16  Martin Jambor  

* lto-common.cc: Include sreal.h and ipa-cp.h.
* lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher.
* lto.cc: Include sreal.h and ipa-cp.h.

gcc/ChangeLog:

2024-02-16  Martin Jambor  

* ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
initializers in the constructor.
(ipa_node_params::~ipa_node_params): Release lattices as a vector.
* ipa-cp.h: New file.
* ipa-cp.cc: Include sreal.h and ipa-cp.h.
(ipcp_value_source): Move to ipa-cp.h.
(ipcp_value_base): Likewise.
(ipcp_value): Likewise.
(ipcp_lattice): Likewise.
(ipcp_agg_lattice): Likewise.
(ipcp_bits_lattice): Likewise.
(ipcp_vr_lattice): Likewise.
(ipcp_param_lattices): Likewise.
(ipa_get_parm_lattices): Remove assert that lattices is non-NULL.
(ipa_value_from_jfunc): Adjust a check for empty lattices.
(ipa_context_from_jfunc): Likewise.
(ipa_agg_value_from_jfunc): Likewise.
(merge_agg_lats_step): Do not memset new aggregate lattices to zero.
(ipcp_propagate_stage): Allocate lattices in a vector as opposed to
just in contiguous memory.
(ipcp_store_vr_results): Adjust a check for empty lattices.
* auto-profile.cc: Include sreal.h and ipa-cp.h.
* cgraph.cc: Likewise.
* cgraphclones.cc: Likewise.
* cgraphunit.cc: Likewise.
* config/aarch64/aarch64.cc: Likewise.
* config/i386/i386-builtins.cc: Likewise.
* config/i386/i386-expand.cc: Likewise.
* config/i386/i386-features.cc: Likewise.
* config/i386/i386-options.cc: Likewise.
* config/i386/i386.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* config/s390/s390.cc: Likewise.
* gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
files to be included in gtype-desc.cc.
* gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
* ipa-devirt.cc: Likewise.
* ipa-fnsummary.cc: Likewise.
* ipa-icf.cc: Likewise.
* ipa-inline-analysis.cc: Likewise.
* ipa-inline-transform.cc: Likewise.
* ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
* ipa-modref.cc: Include sreal.h and ipa-cp.h.
* ipa-param-manipulation.cc: Likewise.
* ipa-predicate.cc: Likewise.
* ipa-profile.cc: Likewise.
* ipa-prop.cc: Likewise.
(ipa_node_params_t::duplicate): Assert new lattices remain empty
instead of setting them to NULL.
* ipa-pure-const.cc: Include sreal.h and ipa-cp.h.

Re: veclower: improve selection of vector mode when lowering [PR 112787]

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Andre Vieira (lists) wrote:

> Hi all,
> 
> OK to backport this to gcc-12 and gcc-13? Patch applies cleanly, bootstrapped
> and regression tested on aarch64-unknown-linux-gnu. Only change is in the
> testcase as I had to use -march=armv9-a because -march=armv8-a+sve conflicts
> with -mcpu=neoverse-n2 in previous gcc versions.

Yes.

Thanks,
Richard.

> Kind Regards,
> Andre
> 
> On 20/12/2023 14:30, Richard Biener wrote:
> > On Wed, 20 Dec 2023, Andre Vieira (lists) wrote:
> > 
> >> Thanks, fully agree with all comments.
> >>
> >> gcc/ChangeLog:
> >>
> >>  PR target/112787
> >>  * tree-vect-generic (type_for_widest_vector_mode): Change function
> >>  to use original vector type and check widest vector mode has at
> >>  most
> >>   the same number of elements.
> >>  (get_compute_type): Pass original vector type rather than the element
> >>  type to type_for_widest_vector_mode and remove now obsolete check
> >>  for the number of elements.
> > 
> > OK.
> > 
> > Richard.
> > 
> >> On 07/12/2023 07:45, Richard Biener wrote:
> >>> On Wed, 6 Dec 2023, Andre Vieira (lists) wrote:
> >>>
>  Hi,
> 
>  This patch addresses the issue reported in PR target/112787 by improving
>  the
>  compute type selection.  We do this by not considering types with more
>  elements
>  than the type we are lowering since we'd reject such types anyway.
> 
>  gcc/ChangeLog:
> 
>    PR target/112787
>    * tree-vect-generic (type_for_widest_vector_mode): Add a parameter to
>    control maximum amount of elements in resulting vector mode.
>    (get_compute_type): Restrict vector_compute_type to a mode no wider
>    than the original compute type.
> 
>  gcc/testsuite/ChangeLog:
> 
>    * gcc.target/aarch64/pr112787.c: New test.
> 
>  Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
>  x86_64-pc-linux-gnu.
> 
>  Is this OK for trunk?
> >>>
> >>> @@ -1347,7 +1347,7 @@ optimize_vector_constructor (gimple_stmt_iterator
> >>> *gsi)
> >>>   TYPE, or NULL_TREE if none is found.  */
> >>>
> >>> Can you improve the function comment?  It also doesn't mention OP ...
> >>>
> >>>static tree
> >>> -type_for_widest_vector_mode (tree type, optab op)
> >>> +type_for_widest_vector_mode (tree type, optab op, poly_int64 max_nunits =
> >>> 0)
> >>>{
> >>>  machine_mode inner_mode = TYPE_MODE (type);
> >>>  machine_mode best_mode = VOIDmode, mode;
> >>> @@ -1371,7 +1371,9 @@ type_for_widest_vector_mode (tree type, optab op)
> >>>  FOR_EACH_MODE_FROM (mode, mode)
> >>>if (GET_MODE_INNER (mode) == inner_mode
> >>>   && maybe_gt (GET_MODE_NUNITS (mode), best_nunits)
> >>> -   && optab_handler (op, mode) != CODE_FOR_nothing)
> >>> +   && optab_handler (op, mode) != CODE_FOR_nothing
> >>> +   && (known_eq (max_nunits, 0)
> >>> +   || known_lt (GET_MODE_NUNITS (mode), max_nunits)))
> >>>
> >>> max_nunits suggests that known_le would be appropriate instead.
> >>>
> >>> I see the only other caller with similar "problems":
> >>>
> >>>   }
> >>> /* Can't use get_compute_type here, as
> >>> supportable_convert_operation
> >>>doesn't necessarily use an optab and needs two arguments.  */
> >>> tree vec_compute_type
> >>>   = type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab);
> >>> if (vec_compute_type
> >>> && VECTOR_MODE_P (TYPE_MODE (vec_compute_type))
> >>> && subparts_gt (arg_type, vec_compute_type))
> >>>
> >>> so please do not default to 0 but adjust this one as well.  It also
> >>> seems you then can remove the subparts_gt guards on both
> >>> vec_compute_type uses.
> >>>
> >>> I think the API would be cleaner if we'd pass the original vector type
> >>> we can then extract TYPE_VECTOR_SUBPARTS from, avoiding the extra arg.
> >>>
> >>> No?
> >>>
> >>> Thanks,
> >>> Richard.
> >>
> > 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] c++: compound-requirement partial substitution [PR113966]

2024-02-19 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

When partially substituting a requires-expr, we don't want to perform
any additional checks beyond the substitution itself, so as to minimize
checking requirements out of order.  So when partially substituting
a compound-requirement don't check its return-type-requirement.  Don't
check the noexcept condition either since we can't do that on templated
trees.

PR c++/113966

gcc/cp/ChangeLog:

* constraint.cc (tsubst_compound_requirement): Don't check
the noexcept condition or the return-type-requirement when
partially substituting.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend17.C: New test.
---
 gcc/cp/constraint.cc   |  5 +++--
 gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C | 15 +++
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d9569013bd3..49de3211d4c 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2134,7 +2134,8 @@ tsubst_compound_requirement (tree t, tree args, sat_info 
info)
 
   /* Check the noexcept condition.  */
   bool noexcept_p = COMPOUND_REQ_NOEXCEPT_P (t);
-  if (noexcept_p && !expr_noexcept_p (expr, quiet.complain))
+  if (noexcept_p && !processing_template_decl
+  && !expr_noexcept_p (expr, quiet.complain))
 {
   if (info.diagnose_unsatisfaction_p ())
inform (loc, "%qE is not %<noexcept%>", expr);
@@ -2148,7 +2149,7 @@ tsubst_compound_requirement (tree t, tree args, sat_info 
info)
 return error_mark_node;
 
   /* Check expression against the result type.  */
-  if (type)
+  if (type && !processing_template_decl)
 {
   if (tree placeholder = type_uses_auto (type))
{
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C
new file mode 100644
index 000..9b5091f14a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend17.C
@@ -0,0 +1,15 @@
+// PR c++/113966
+// { dg-do compile { target c++20 } }
+
+template concept C = T::value;
+
+template
+struct A {
+  template requires U::value || requires { { T() } -> C; }
+  friend void f(A, U) { }
+
+  template requires requires { { g(U()) } noexcept; }
+  friend void f(A, U, U) { }
+};
+
+template struct A;
-- 
2.44.0.rc1.15.g4fc51f00ef

Re: veclower: improve selection of vector mode when lowering [PR 112787]

2024-02-19 Thread Andre Vieira (lists)


Hi all,

OK to backport this to gcc-12 and gcc-13? Patch applies cleanly, 
bootstrapped and regression tested on aarch64-unknown-linux-gnu. Only 
change is in the testcase as I had to use -march=armv9-a because 
-march=armv8-a+sve conflicts with -mcpu=neoverse-n2 in previous gcc 
versions.


Kind Regards,
Andre

On 20/12/2023 14:30, Richard Biener wrote:

On Wed, 20 Dec 2023, Andre Vieira (lists) wrote:


Thanks, fully agree with all comments.

gcc/ChangeLog:

PR target/112787
* tree-vect-generic (type_for_widest_vector_mode): Change function
 to use original vector type and check widest vector mode has at most
 the same number of elements.
(get_compute_type): Pass original vector type rather than the element
 type to type_for_widest_vector_mode and remove now obsolete check
 for the number of elements.


OK.

Richard.


On 07/12/2023 07:45, Richard Biener wrote:

On Wed, 6 Dec 2023, Andre Vieira (lists) wrote:


Hi,

This patch addresses the issue reported in PR target/112787 by improving
the
compute type selection.  We do this by not considering types with more
elements
than the type we are lowering since we'd reject such types anyway.

gcc/ChangeLog:

  PR target/112787
  * tree-vect-generic (type_for_widest_vector_mode): Add a parameter to
  control maximum amount of elements in resulting vector mode.
  (get_compute_type): Restrict vector_compute_type to a mode no wider
  than the original compute type.

gcc/testsuite/ChangeLog:

  * gcc.target/aarch64/pr112787.c: New test.

Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
x86_64-pc-linux-gnu.

Is this OK for trunk?


@@ -1347,7 +1347,7 @@ optimize_vector_constructor (gimple_stmt_iterator
*gsi)
  TYPE, or NULL_TREE if none is found.  */

Can you improve the function comment?  It also doesn't mention OP ...

   static tree
-type_for_widest_vector_mode (tree type, optab op)
+type_for_widest_vector_mode (tree type, optab op, poly_int64 max_nunits =
0)
   {
 machine_mode inner_mode = TYPE_MODE (type);
 machine_mode best_mode = VOIDmode, mode;
@@ -1371,7 +1371,9 @@ type_for_widest_vector_mode (tree type, optab op)
 FOR_EACH_MODE_FROM (mode, mode)
   if (GET_MODE_INNER (mode) == inner_mode
  && maybe_gt (GET_MODE_NUNITS (mode), best_nunits)
-   && optab_handler (op, mode) != CODE_FOR_nothing)
+   && optab_handler (op, mode) != CODE_FOR_nothing
+   && (known_eq (max_nunits, 0)
+   || known_lt (GET_MODE_NUNITS (mode), max_nunits)))

max_nunits suggests that known_le would be appropriate instead.

I see the only other caller with similar "problems":

  }
/* Can't use get_compute_type here, as supportable_convert_operation
   doesn't necessarily use an optab and needs two arguments.  */
tree vec_compute_type
  = type_for_widest_vector_mode (TREE_TYPE (arg_type), mov_optab);
if (vec_compute_type
&& VECTOR_MODE_P (TYPE_MODE (vec_compute_type))
&& subparts_gt (arg_type, vec_compute_type))

so please do not default to 0 but adjust this one as well.  It also
seems you then can remove the subparts_gt guards on both
vec_compute_type uses.

I think the API would be cleaner if we'd pass the original vector type
we can then extract TYPE_VECTOR_SUBPARTS from, avoiding the extra arg.

No?

Thanks,
Richard.
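
To make the known_lt versus known_le point in the review above concrete, here is a minimal sketch using plain integers instead of poly_int64 and machine modes.  The names vec_mode, example_modes and widest_mode_at_most are hypothetical and are not taken from tree-vect-generic; the sketch assumes C++14 or later for the constexpr loop.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vector mode: its element count and whether
// the target has a handler for the operation being lowered.
struct vec_mode
{
  unsigned nunits;
  bool supported;
};

// Return the index of the widest supported mode that has at most
// MAX_NUNITS elements, or -1 if there is none.  The bound is inclusive,
// which is what a parameter called "max" suggests.
constexpr int
widest_mode_at_most (const vec_mode *modes, std::size_t n, unsigned max_nunits)
{
  int best = -1;
  unsigned best_nunits = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (modes[i].supported
	&& modes[i].nunits > best_nunits
	&& modes[i].nunits <= max_nunits)
      {
	best = static_cast<int> (i);
	best_nunits = modes[i].nunits;
      }
  return best;
}

// Four candidate modes; the widest one is not supported.
constexpr vec_mode example_modes[] = {
  { 2, true }, { 4, true }, { 8, true }, { 16, false }
};

// With an inclusive bound, a 4-element vector can use the 4-element mode.
static_assert (widest_mode_at_most (example_modes, 4, 4) == 1,
	       "inclusive bound picks the 4-element mode");
```

With a strict comparison the same call would fall back to the 2-element mode, which is presumably not what a caller lowering a 4-element vector wants.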

Re: [PATCH] libgcc: fix Win32 CV abnormal spurious wakeups in timed wait [PR113850]

2024-02-19 Thread Jonathan Yong


On 2/19/24 13:48, Matteo Italia wrote:

Il 17/02/24 01:24, Jonathan Yong ha scritto:

On 2/10/24 10:10, Matteo Italia wrote:

Il 09/02/24 15:18, Matteo Italia ha scritto:

The Win32 threading model uses __gthr_win32_abs_to_rel_time to convert
the timespec used in gthreads to specify the absolute time for end of
the condition variables timed wait to a milliseconds value relative to
"now" to pass to the Win32 SleepConditionVariableCS function.

Unfortunately, the conversion is incorrect, as, due to a typo, it
returns the relative time _in seconds_, so SleepConditionVariableCS
receives a timeout value 1000 times shorter than it should be,
resulting in a huge amount of spurious wakeups in calls such as
std::condition_variable::wait_for or wait_until.

Re-reading the commit message I found a few typos, and it was 
generally a bit more obscure than I like; reworded it now, hope it's 
better.


Thanks, pushed to master and 13.x branches.

Great, thank you! Do I need to change the status of the Bugzilla entry
to RESOLVED, or is it going to be closed automatically at the next
release, or something else?


Closed as resolved, thanks.
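
For readers following the thread, the conversion described in the commit message can be sketched roughly as below.  This is only an illustration and not the libgcc code: abs_time and abs_to_rel_ms are hypothetical stand-ins for the gthreads timespec and __gthr_win32_abs_to_rel_time, and "now" is passed in explicitly so the result can be checked at compile time (C++14 or later assumed).

```cpp
#include <cstdint>

// Hypothetical stand-in for the absolute timespec used by gthreads.
struct abs_time
{
  std::int64_t sec;   // whole seconds
  std::int64_t nsec;  // nanoseconds, 0 .. 999999999
};

// Convert an absolute deadline into a timeout in milliseconds relative to
// NOW.  Returns 0 if the deadline has already passed.
constexpr std::uint32_t
abs_to_rel_ms (abs_time deadline, abs_time now)
{
  const std::int64_t diff_ns
    = (deadline.sec - now.sec) * 1000000000LL + (deadline.nsec - now.nsec);
  if (diff_ns <= 0)
    return 0;
  // Round up so that the wait does not end before the deadline.
  const std::int64_t ms = (diff_ns + 999999) / 1000000;
  // Stay below 0xFFFFFFFF, which Win32 treats as INFINITE.
  const std::int64_t max_ms = 0xFFFFFFFELL;
  return static_cast<std::uint32_t> (ms > max_ms ? max_ms : ms);
}

// A deadline 1.5 s in the future must yield 1500, not 1 or 2.
static_assert (abs_to_rel_ms (abs_time{ 10, 0 }, abs_time{ 8, 500000000 })
	       == 1500, "milliseconds, not seconds");
```

Returning seconds here instead of milliseconds gives exactly the symptom reported in the PR: the wait function is handed a value a thousand times too small and returns early over and over.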

[PATCH v7 05/22] c++: Implement __is_pointer built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::is_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_pointer.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_pointer.
* g++.dg/ext/is_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 ++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 gcc/testsuite/g++.dg/ext/is_pointer.C| 51 
 5 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 91ace54cac1..22cabd97cb6 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3827,6 +3827,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_POD:
   inform (loc, "  %qT is not a POD type", t1);
   break;
+case CPTK_IS_POINTER:
+  inform (loc, "  %qT is not a pointer", t1);
+  break;
 case CPTK_IS_POLYMORPHIC:
   inform (loc, "  %qT is not a polymorphic type", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index e9347453829..18e2d0f3480 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, 
"__is_nothrow_convertible", 2)
 DEFTRAIT_EXPR (IS_OBJECT, "__is_object", 1)
 DEFTRAIT_EXPR (IS_POINTER_INTERCONVERTIBLE_BASE_OF, 
"__is_pointer_interconvertible_base_of", 2)
 DEFTRAIT_EXPR (IS_POD, "__is_pod", 1)
+DEFTRAIT_EXPR (IS_POINTER, "__is_pointer", 1)
 DEFTRAIT_EXPR (IS_POLYMORPHIC, "__is_polymorphic", 1)
 DEFTRAIT_EXPR (IS_REFERENCE, "__is_reference", 1)
 DEFTRAIT_EXPR (IS_SAME, "__is_same", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 41c25f43d27..9dcdb06191a 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12502,6 +12502,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_POD:
   return pod_type_p (type1);
 
+case CPTK_IS_POINTER:
+  return TYPE_PTR_P (type1);
+
 case CPTK_IS_POLYMORPHIC:
   return CLASS_TYPE_P (type1) && TYPE_POLYMORPHIC_P (type1);
 
@@ -12701,6 +12704,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_OBJECT:
+case CPTK_IS_POINTER:
 case CPTK_IS_REFERENCE:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index b2e2f2f694d..96b7a89e4f1 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -125,6 +125,9 @@
 #if !__has_builtin (__is_pod)
 # error "__has_builtin (__is_pod) failed"
 #endif
+#if !__has_builtin (__is_pointer)
+# error "__has_builtin (__is_pointer) failed"
+#endif
 #if !__has_builtin (__is_polymorphic)
 # error "__has_builtin (__is_polymorphic) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_pointer.C 
b/gcc/testsuite/g++.dg/ext/is_pointer.C
new file mode 100644
index 000..d6e39565950
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_pointer.C
@@ -0,0 +1,51 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(!__is_pointer(int));
+SA(__is_pointer(int*));
+SA(__is_pointer(int**));
+
+SA(__is_pointer(const int*));
+SA(__is_pointer(const int**));
+SA(__is_pointer(int* const));
+SA(__is_pointer(int** const));
+SA(__is_pointer(int* const* const));
+
+SA(__is_pointer(volatile int*));
+SA(__is_pointer(volatile int**));
+SA(__is_pointer(int* volatile));
+SA(__is_pointer(int** volatile));
+SA(__is_pointer(int* volatile* volatile));
+
+SA(__is_pointer(const volatile int*));
+SA(__is_pointer(const volatile int**));
+SA(__is_pointer(const int* volatile));
+SA(__is_pointer(volatile int* const));
+SA(__is_pointer(int* const volatile));
+SA(__is_pointer(const int** volatile));
+SA(__is_pointer(volatile int** const));
+SA(__is_pointer(int** const volatile));
+SA(__is_pointer(int* const* const volatile));
+SA(__is_pointer(int* volatile* const volatile));
+SA(__is_pointer(int* const volatile* const volatile));
+
+SA(!__is_pointer(int&));
+SA(!__is_pointer(const int&));
+SA(!__is_pointer(volatile int&));
+SA(!__is_pointer(const volatile int&));
+
+SA(!__is_pointer(int&&));
+SA(!__is_pointer(const int&&));
+SA(!__is_pointer(volatile int&&));
+SA(!__is_pointer(const volatile int&&));
+
+SA(!__is_pointer(int[3]));
+SA(!__is_pointer(const int[3]));
+SA(!__is_pointer(volatile int[3]));
+SA(!__is_pointer(const volatile int[3]));
+
+SA(!__is_pointer(int(int)));
+SA(__is_pointer(int(*const)(int)));
+SA(__is_pointer(int(*volatile)(int)));
+SA(__is_pointer(int(*const

[PATCH v7 13/22] c++: Implement __remove_all_extents built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::remove_all_extents.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_all_extents.
* semantics.cc (finish_trait_type): Handle
CPTK_REMOVE_ALL_EXTENTS.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__remove_all_extents.
* g++.dg/ext/remove_all_extents.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  3 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 gcc/testsuite/g++.dg/ext/remove_all_extents.C | 16 
 4 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_all_extents.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 577c96d579b..933c8bcbe68 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -98,6 +98,7 @@ DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
+DEFTRAIT_TYPE (REMOVE_ALL_EXTENTS, "__remove_all_extents", 1)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
 DEFTRAIT_TYPE (REMOVE_EXTENT, "__remove_extent", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 58696225fc4..078424dac23 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12785,6 +12785,9 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree 
type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_REMOVE_ALL_EXTENTS:
+  return strip_array_types (type1);
+
 case CPTK_REMOVE_CV:
   return cv_unqualified (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 5d5cbe3b019..85b74bd676b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -176,6 +176,9 @@
 #if !__has_builtin (__reference_converts_from_temporary)
 # error "__has_builtin (__reference_converts_from_temporary) failed"
 #endif
+#if !__has_builtin (__remove_all_extents)
+# error "__has_builtin (__remove_all_extents) failed"
+#endif
 #if !__has_builtin (__remove_cv)
 # error "__has_builtin (__remove_cv) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_all_extents.C 
b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
new file mode 100644
index 000..60ade2ade7f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_all_extents.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__remove_all_extents(int), int));
+SA(__is_same(__remove_all_extents(int[2]), int));
+SA(__is_same(__remove_all_extents(int[2][3]), int));
+SA(__is_same(__remove_all_extents(int[][3]), int));
+SA(__is_same(__remove_all_extents(const int[2][3]), const int));
+SA(__is_same(__remove_all_extents(ClassType), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[2][3]), ClassType));
+SA(__is_same(__remove_all_extents(ClassType[][3]), ClassType));
+SA(__is_same(__remove_all_extents(const ClassType[2][3]), const ClassType));
-- 
2.43.2
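
As a point of comparison for the built-in above, the same transformation can be written in library code by peeling one array extent at a time.  The sketch below is an assumption-laden illustration, not part of the patch: strip_extents is a hypothetical name, and it only relies on standard <type_traits> facilities so that it compiles without the new built-in.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical library-only equivalent of the trait: non-array types are
// returned unchanged.
template<typename T>
  struct strip_extents
  { using type = T; };

// Bounded array: drop one extent and recurse on the element type.
template<typename T, std::size_t N>
  struct strip_extents<T[N]>
  { using type = typename strip_extents<T>::type; };

// Array of unknown bound: same recursion.
template<typename T>
  struct strip_extents<T[]>
  { using type = typename strip_extents<T>::type; };

static_assert (std::is_same<strip_extents<int[2][3]>::type, int>::value,
	       "all extents are removed");
static_assert (std::is_same<strip_extents<const int[2][3]>::type,
			    const int>::value,
	       "cv-qualification of the element type is kept");
```

Each array dimension costs one class template instantiation in this form, which is the kind of work a single built-in call avoids.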

[PATCH v7 20/22] libstdc++: Optimize std::decay compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::decay
by dispatching to the new __decay built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (decay): Use __decay built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 18a5e4de2d3..2f4c8dd3b21 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2316,6 +2316,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @cond undocumented
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__decay)
+  template
+struct decay
+{ using type = __decay(_Tp); };
+#else
   // Decay trait for arrays and functions, used for perfect forwarding
   // in make_pair, make_tuple, etc.
   template
@@ -2347,6 +2352,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct decay<_Tp&&>
 { using type = typename __decay_selector<_Tp>::type; };
+#endif
 
   /// @cond undocumented
 
-- 
2.43.2
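
For context on what the library trait computes regardless of how it is implemented, the following checks spell out the standard behaviour of std::decay.  They use only the public <type_traits> interface and make no assumption about whether the __decay built-in is available; the alias decay_of is a hypothetical helper added for brevity.

```cpp
#include <type_traits>

// Hypothetical shorthand so the checks below stay readable in C++11.
template<typename T>
  using decay_of = typename std::decay<T>::type;

// References and top-level cv-qualifiers are dropped.
static_assert (std::is_same<decay_of<const int&>, int>::value,
	       "const int& decays to int");

// Arrays decay to a pointer to their element type.
static_assert (std::is_same<decay_of<int[3]>, int*>::value,
	       "int[3] decays to int*");

// Function types decay to a pointer to function.
static_assert (std::is_same<decay_of<int (int)>, int (*) (int)>::value,
	       "int(int) decays to int(*)(int)");
```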

[PATCH v7 09/22] c++: Implement __add_pointer built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::add_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_ADD_POINTER.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __add_pointer.
* g++.dg/ext/add_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  9 ++
 gcc/testsuite/g++.dg/ext/add_pointer.C   | 39 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_pointer.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 05514a51c21..63f879287ce 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -48,6 +48,7 @@
 #define DEFTRAIT_TYPE_DEFAULTED
 #endif
 
+DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 1794e83baa2..635441a7a90 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12776,6 +12776,15 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
 
   switch (kind)
 {
+case CPTK_ADD_POINTER:
+  if (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE))
+   return type1;
+  if (TYPE_REF_P (type1))
+   type1 = TREE_TYPE (type1);
+  return build_pointer_type (type1);
+
 case CPTK_REMOVE_CV:
   return cv_unqualified (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/add_pointer.C 
b/gcc/testsuite/g++.dg/ext/add_pointer.C
new file mode 100644
index 000..c405cdd0feb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_pointer.C
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_pointer(int), int*));
+SA(__is_same(__add_pointer(int*), int**));
+SA(__is_same(__add_pointer(const int), const int*));
+SA(__is_same(__add_pointer(int&), int*));
+SA(__is_same(__add_pointer(ClassType*), ClassType**));
+SA(__is_same(__add_pointer(ClassType), ClassType*));
+SA(__is_same(__add_pointer(void), void*));
+SA(__is_same(__add_pointer(const void), const void*));
+SA(__is_same(__add_pointer(volatile void), volatile void*));
+SA(__is_same(__add_pointer(const volatile void), const volatile void*));
+
+void f1();
+using f1_type = decltype(f1);
+using pf1_type = decltype(&f1);
+SA(__is_same(__add_pointer(f1_type), pf1_type));
+
+void f2() noexcept; // PR libstdc++/78361
+using f2_type = decltype(f2);
+using pf2_type = decltype(&f2);
+SA(__is_same(__add_pointer(f2_type), pf2_type));
+
+using fn_type = void();
+using pfn_type = void(*)();
+SA(__is_same(__add_pointer(fn_type), pfn_type));
+
+SA(__is_same(__add_pointer(void() &), void() &));
+SA(__is_same(__add_pointer(void() & noexcept), void() & noexcept));
+SA(__is_same(__add_pointer(void() const), void() const));
+SA(__is_same(__add_pointer(void(...) &), void(...) &));
+SA(__is_same(__add_pointer(void(...) & noexcept), void(...) & noexcept));
+SA(__is_same(__add_pointer(void(...) const), void(...) const));
+
+SA(__is_same(__add_pointer(void() __restrict), void() __restrict));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index b1430e9bd8b..9d861398bae 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -2,6 +2,9 @@
 // { dg-do compile }
 // Verify that __has_builtin gives the correct answer for C++ built-ins.
 
+#if !__has_builtin (__add_pointer)
+# error "__has_builtin (__add_pointer) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.2
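
The three branches in the CPTK_ADD_POINTER case above correspond to the rules of std::add_pointer, and they can be checked against the library trait.  The snippet below is only an illustration using std::add_pointer rather than the new built-in; the function name f1 mirrors the test in the patch, and the alias add_ptr is hypothetical.

```cpp
#include <type_traits>

void f1 ();

// Hypothetical shorthand for the library trait.
template<typename T>
  using add_ptr = typename std::add_pointer<T>::type;

// Ordinary object types simply gain a pointer.
static_assert (std::is_same<add_ptr<int>, int*>::value, "object type");

// A reference is stripped before the pointer is added.
static_assert (std::is_same<add_ptr<int&>, int*>::value, "reference");

// Plain function types become pointers to function.
static_assert (std::is_same<add_ptr<decltype (f1)>, decltype (&f1)>::value,
	       "function type");

// Function types with cv- or ref-qualifiers cannot be pointed to, so they
// are returned unchanged.
static_assert (std::is_same<add_ptr<void () const>, void () const>::value,
	       "cv-qualified function type");
static_assert (std::is_same<add_ptr<void () &>, void () &>::value,
	       "ref-qualified function type");
```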

[PATCH v7 18/22] libstdc++: Optimize std::add_rvalue_reference compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of
std::add_rvalue_reference by dispatching to the new
__add_rvalue_reference built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_rvalue_reference): Use
__add_rvalue_reference built-in trait.
(__add_rvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 17bf47d59d3..18a5e4de2d3 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1185,6 +1185,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template
+struct __add_rvalue_reference_helper
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template
 struct __add_rvalue_reference_helper
 { using type = _Tp; };
@@ -1192,6 +1197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __add_rvalue_reference_helper<_Tp, __void_t<_Tp&&>>
 { using type = _Tp&&; };
+#endif
 
   template
 using __add_rval_ref_t = typename __add_rvalue_reference_helper<_Tp>::type;
@@ -1748,9 +1754,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_rvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_rvalue_reference)
+  template
+struct add_rvalue_reference
+{ using type = __add_rvalue_reference(_Tp); };
+#else
   template
 struct add_rvalue_reference
 { using type = __add_rval_ref_t<_Tp>; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_reference
-- 
2.43.2
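
As a reminder of the behaviour both the built-in and the fallback have to provide, the checks below exercise std::add_rvalue_reference on the cases that make the fallback need a helper in the first place: reference collapsing and non-referenceable types.  Only the standard library trait is used here; add_rref is a hypothetical alias.

```cpp
#include <type_traits>

// Hypothetical shorthand for the library trait.
template<typename T>
  using add_rref = typename std::add_rvalue_reference<T>::type;

// Object types gain an rvalue reference.
static_assert (std::is_same<add_rref<int>, int&&>::value, "int -> int&&");

// Reference collapsing: adding && to an lvalue reference leaves it alone.
static_assert (std::is_same<add_rref<int&>, int&>::value, "int& stays int&");

// Non-referenceable types are returned unchanged.
static_assert (std::is_same<add_rref<void>, void>::value, "void stays void");
static_assert (std::is_same<add_rref<void () const>, void () const>::value,
	       "qualified function type stays as is");
```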

[PATCH v7 21/22] c++: Implement __rank built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::rank.

gcc/cp/ChangeLog:

* cp-trait.def: Define __rank.
* constraint.cc (diagnose_trait_expr): Handle CPTK_RANK.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __rank.
* g++.dg/ext/rank.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  | 23 ---
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/rank.C  | 24 
 5 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/rank.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 62b264e4757..a9b6e7416fa 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3869,6 +3869,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_VOLATILE:
   inform (loc, "  %qT is not a volatile type", t1);
   break;
+case CPTK_RANK:
+  inform (loc, "  %qT cannot yield a rank", t1);
+  break;
 case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
   inform (loc, "  %qT is not a reference that binds to a temporary "
  "object of type %qT (direct-initialization)", t1, t2);
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 2d1cb7c227c..85056c8140b 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -99,6 +99,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, 
"__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNBOUNDED_ARRAY, "__is_unbounded_array", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
+DEFTRAIT_EXPR (RANK, "__rank", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_ALL_EXTENTS, "__remove_all_extents", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 45dc509855a..7242db75248 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12550,6 +12550,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_DEDUCIBLE:
   return type_targs_deducible_from (type1, type2);
 
+/* __rank is handled in finish_trait_expr. */
+case CPTK_RANK:
+
 #define DEFTRAIT_TYPE(CODE, NAME, ARITY) \
 case CPTK_##CODE:
 #include "cp-trait.def"
@@ -12622,7 +12625,10 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
   if (processing_template_decl)
 {
   tree trait_expr = make_node (TRAIT_EXPR);
-  TREE_TYPE (trait_expr) = boolean_type_node;
+  if (kind == CPTK_RANK)
+   TREE_TYPE (trait_expr) = size_type_node;
+  else
+   TREE_TYPE (trait_expr) = boolean_type_node;
   TRAIT_EXPR_TYPE1 (trait_expr) = type1;
   TRAIT_EXPR_TYPE2 (trait_expr) = type2;
   TRAIT_EXPR_KIND (trait_expr) = kind;
@@ -12714,6 +12720,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_UNBOUNDED_ARRAY:
 case CPTK_IS_UNION:
 case CPTK_IS_VOLATILE:
+case CPTK_RANK:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
@@ -12745,8 +12752,18 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
   gcc_unreachable ();
 }
 
-  tree val = (trait_expr_value (kind, type1, type2)
- ? boolean_true_node : boolean_false_node);
+  tree val;
+  if (kind == CPTK_RANK)
+{
+  size_t rank = 0;
+  for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
+   ++rank;
+  val = build_int_cst (size_type_node, rank);
+}
+  else
+val = (trait_expr_value (kind, type1, type2)
+  ? boolean_true_node : boolean_false_node);
+
   return maybe_wrap_with_location (val, loc);
 }
 
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 3aca273aad6..7f7b27f7aa7 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -179,6 +179,9 @@
 #if !__has_builtin (__is_volatile)
 # error "__has_builtin (__is_volatile) failed"
 #endif
+#if !__has_builtin (__rank)
+# error "__has_builtin (__rank) failed"
+#endif
 #if !__has_builtin (__reference_constructs_from_temporary)
 # error "__has_builtin (__reference_constructs_from_temporary) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/rank.C b/gcc/testsuite/g++.dg/ext/rank.C
new file mode 100644
index 000..28894184387
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/rank.C
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__rank(int) == 0);
+SA(__rank(int[2]) == 1);
+SA(__rank(int[][4]) == 2);
+SA(__rank(int[2][2][4][4][6][6]) == 6);
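
The loop in finish_trait_expr above counts how many nested ARRAY_TYPE layers a type has.  The same count can be expressed in library code, which may help when reading the test expectations.  The sketch below is an illustration only: array_rank is a hypothetical name and nothing in it depends on the __rank built-in.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical library-only counterpart: non-array types have rank 0.
template<typename T>
  struct array_rank : std::integral_constant<std::size_t, 0> { };

// Each bounded array layer adds one to the rank of its element type.
template<typename T, std::size_t N>
  struct array_rank<T[N]>
  : std::integral_constant<std::size_t, 1 + array_rank<T>::value> { };

// Arrays of unknown bound count as one layer as well.
template<typename T>
  struct array_rank<T[]>
  : std::integral_constant<std::size_t, 1 + array_rank<T>::value> { };

static_assert (array_rank<int>::value == 0, "scalar");
static_assert (array_rank<int[][4]>::value == 2, "two dimensions");
```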

[PATCH v7 06/22] libstdc++: Optimize std::is_pointer compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_pointer
by dispatching to the new __is_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_pointer): Use
__is_pointer built-in trait.  Optimize its implementation.
* include/std/type_traits (is_pointer): Likewise.
(is_pointer_v): Likewise.

Co-authored-by: Jonathan Wakely 
Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/cpp_type_traits.h | 31 ++-
 libstdc++-v3/include/std/type_traits| 44 +
 2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 59f1a1875eb..210a9ea00da 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   //
   // Pointer types
   //
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+  template
+struct __is_pointer : __truth_type<_IsPtr>
+{
+  enum { __value = _IsPtr };
+};
+#else
   template
 struct __is_pointer
 {
@@ -377,6 +384,28 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   typedef __true_type __type;
 };
 
+  template
+struct __is_pointer<_Tp* const>
+{
+  enum { __value = 1 };
+  typedef __true_type __type;
+};
+
+  template
+struct __is_pointer<_Tp* volatile>
+{
+  enum { __value = 1 };
+  typedef __true_type __type;
+};
+
+  template
+struct __is_pointer<_Tp* const volatile>
+{
+  enum { __value = 1 };
+  typedef __true_type __type;
+};
+#endif
+
   //
   // An arithmetic type is an integer type or a floating point type
   //
@@ -387,7 +416,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 
   //
   // A scalar type is an arithmetic type or a pointer type
-  // 
+  //
   template
 struct __is_scalar
 : public __traitor<__is_arithmetic<_Tp>, __is_pointer<_Tp> >
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 60cd22b6f15..6407738a726 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
 #endif
 
-  template
-struct __is_pointer_helper
+  /// is_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+  template
+struct is_pointer
+: public __bool_constant<__is_pointer(_Tp)>
+{ };
+#else
+  template
+struct is_pointer
 : public false_type { };
 
   template
-struct __is_pointer_helper<_Tp*>
+struct is_pointer<_Tp*>
 : public true_type { };
 
-  /// is_pointer
   template
-struct is_pointer
-: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
-{ };
+struct is_pointer<_Tp* const>
+: public true_type { };
+
+  template
+struct is_pointer<_Tp* volatile>
+: public true_type { };
+
+  template
+struct is_pointer<_Tp* const volatile>
+: public true_type { };
+#endif
 
   /// is_lvalue_reference
   template
@@ -3264,8 +3278,22 @@ template 
   inline constexpr bool is_array_v<_Tp[_Num]> = true;
 #endif
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+template 
+  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
+#else
 template 
-  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
+  inline constexpr bool is_pointer_v = false;
+template 
+  inline constexpr bool is_pointer_v<_Tp*> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* const> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
+#endif
+
 template 
   inline constexpr bool is_lvalue_reference_v = false;
 template 
-- 
2.43.2

[PATCH v7 15/22] c++: Implement __add_lvalue_reference built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::add_lvalue_reference.
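
For readers unfamiliar with the trait, here is a minimal sketch of the
behaviour the built-in is expected to match. It is not part of the patch:
it goes through the portable std::add_lvalue_reference, and the alias
add_lref_t and the struct ClassType are names made up for this
illustration.

```cpp
#include <type_traits>

// Hypothetical helper alias, used only in this illustration.
template <typename T>
using add_lref_t = typename std::add_lvalue_reference<T>::type;

struct ClassType { };

// Object and function types gain an lvalue reference.
static_assert(std::is_same<add_lref_t<int>, int&>::value, "int -> int&");
static_assert(std::is_same<add_lref_t<ClassType>, ClassType&>::value,
              "class -> reference to class");
static_assert(std::is_same<add_lref_t<int(int)>, int(&)(int)>::value,
              "function -> reference to function");

// Reference collapsing: adding & to T&& yields T&.
static_assert(std::is_same<add_lref_t<int&&>, int&>::value, "int&& -> int&");

// Non-referenceable types come back unchanged.
static_assert(std::is_same<add_lref_t<void>, void>::value, "void stays void");
static_assert(std::is_same<add_lref_t<bool(int) const>, bool(int) const>::value,
              "cv-qualified function type stays as-is");
```

The last two assertions correspond to the early "return type1" path in
the semantics.cc hunk of this patch.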

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_lvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_LVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_lvalue_reference.
* g++.dg/ext/add_lvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 +++
 .../g++.dg/ext/add_lvalue_reference.C | 21 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_lvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 933c8bcbe68..9a27dca4ea3 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -48,6 +48,7 @@
 #define DEFTRAIT_TYPE_DEFAULTED
 #endif
 
+DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 078424dac23..05f5b62f9df 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12776,6 +12776,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
 
   switch (kind)
 {
+case CPTK_ADD_LVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/false);
+
 case CPTK_ADD_POINTER:
   if (FUNC_OR_METHOD_TYPE_P (type1)
  && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
diff --git a/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
new file mode 100644
index 000..8fe1e0300e5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_lvalue_reference.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_lvalue_reference(int), int&));
+SA(__is_same(__add_lvalue_reference(int&), int&));
+SA(__is_same(__add_lvalue_reference(const int), const int&));
+SA(__is_same(__add_lvalue_reference(int*), int*&));
+SA(__is_same(__add_lvalue_reference(ClassType&), ClassType&));
+SA(__is_same(__add_lvalue_reference(ClassType), ClassType&));
+SA(__is_same(__add_lvalue_reference(int(int)), int(&)(int)));
+SA(__is_same(__add_lvalue_reference(int&&), int&));
+SA(__is_same(__add_lvalue_reference(ClassType&&), ClassType&));
+SA(__is_same(__add_lvalue_reference(void), void));
+SA(__is_same(__add_lvalue_reference(const void), const void));
+SA(__is_same(__add_lvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_lvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_lvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_lvalue_reference(bool(int)), bool(&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 85b74bd676b..3fca9cfabcc 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -2,6 +2,9 @@
 // { dg-do compile }
 // Verify that __has_builtin gives the correct answer for C++ built-ins.
 
+#if !__has_builtin (__add_lvalue_reference)
+# error "__has_builtin (__add_lvalue_reference) failed"
+#endif
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
-- 
2.43.2

[PATCH v7 03/22] c++: Implement __is_volatile built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::is_volatile.
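
As a small illustration (not part of the patch), the trait only looks at
the top-level qualifier. The sketch below uses the portable
std::is_volatile; the helper top_level_volatile is a hypothetical name.

```cpp
#include <type_traits>

// Hypothetical wrapper around the standard trait, for readability only.
template <typename T>
constexpr bool top_level_volatile() { return std::is_volatile<T>::value; }

// The qualifier on the type itself is detected.
static_assert(top_level_volatile<volatile int>(), "volatile int");
static_assert(top_level_volatile<const volatile int>(), "const volatile int");

// A pointer to volatile is not itself volatile; a volatile pointer is.
static_assert(!top_level_volatile<volatile int*>(), "pointer to volatile int");
static_assert(top_level_volatile<int* volatile>(), "volatile pointer to int");

// References are never volatile-qualified.
static_assert(!top_level_volatile<volatile int&>(), "reference to volatile int");
```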

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_volatile.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_VOLATILE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_volatile.
* g++.dg/ext/is_volatile.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_volatile.C   | 20 
 5 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_volatile.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8b7833d6cae..91ace54cac1 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3860,6 +3860,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_UNION:
   inform (loc, "  %qT is not a union", t1);
   break;
+case CPTK_IS_VOLATILE:
+  inform (loc, "  %qT is not a volatile type", t1);
+  break;
 case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
   inform (loc, "  %qT is not a reference that binds to a temporary "
  "object of type %qT (direct-initialization)", t1, t2);
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 36faed9c0b3..e9347453829 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -92,6 +92,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
"__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0d08900492b..41c25f43d27 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12532,6 +12532,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_UNION:
   return type_code1 == UNION_TYPE;
 
+case CPTK_IS_VOLATILE:
+  return CP_TYPE_VOLATILE_P (type1);
+
 case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
   return ref_xes_from_temporary (type1, type2, /*direct_init=*/true);
 
@@ -12702,6 +12705,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
 case CPTK_IS_UNION:
+case CPTK_IS_VOLATILE:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index e3640faeb96..b2e2f2f694d 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -158,6 +158,9 @@
 #if !__has_builtin (__is_union)
 # error "__has_builtin (__is_union) failed"
 #endif
+#if !__has_builtin (__is_volatile)
+# error "__has_builtin (__is_volatile) failed"
+#endif
 #if !__has_builtin (__reference_constructs_from_temporary)
 # error "__has_builtin (__reference_constructs_from_temporary) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_volatile.C 
b/gcc/testsuite/g++.dg/ext/is_volatile.C
new file mode 100644
index 000..80a1cfc880d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_volatile.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+using cClassType = const ClassType;
+using vClassType = volatile ClassType;
+using cvClassType = const volatile ClassType;
+
+// Positive tests.
+SA(__is_volatile(volatile int));
+SA(__is_volatile(const volatile int));
+SA(__is_volatile(vClassType));
+SA(__is_volatile(cvClassType));
+
+// Negative tests.
+SA(!__is_volatile(int));
+SA(!__is_volatile(const int));
+SA(!__is_volatile(ClassType));
+SA(!__is_volatile(cClassType));
-- 
2.43.2

[PATCH v7 22/22] libstdc++: Optimize std::rank compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::rank
by dispatching to the new __rank built-in trait.
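
To make the cost being avoided concrete, here is a rough sketch of the
recursive, partial-specialization style that the non-built-in branch
uses. The name rank_sketch is hypothetical and not part of the patch;
the point is that each array dimension needs one more class template
instantiation, whereas a built-in can answer directly.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: a non-array type has rank 0.
template <typename T>
struct rank_sketch : std::integral_constant<std::size_t, 0> { };

// Bounded array: one more dimension than the element type.
template <typename T, std::size_t N>
struct rank_sketch<T[N]>
  : std::integral_constant<std::size_t, 1 + rank_sketch<T>::value> { };

// Unbounded array: likewise.
template <typename T>
struct rank_sketch<T[]>
  : std::integral_constant<std::size_t, 1 + rank_sketch<T>::value> { };

static_assert(rank_sketch<int>::value == 0, "scalar");
static_assert(rank_sketch<int[2][3]>::value == 2, "two dimensions");
static_assert(rank_sketch<int[][3][4]>::value == std::rank<int[][3][4]>::value,
              "agrees with std::rank");
```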

libstdc++-v3/ChangeLog:

* include/std/type_traits (rank): Use __rank built-in trait.
(rank_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 2f4c8dd3b21..1577042a5b8 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1473,6 +1473,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// rank
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__rank)
+  template<typename _Tp>
+struct rank
+: public integral_constant<std::size_t, __rank(_Tp)> { };
+#else
   template<typename>
 struct rank
 : public integral_constant<std::size_t, 0> { };
@@ -1484,6 +1489,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct rank<_Tp[]>
 : public integral_constant<std::size_t, 1 + rank<_Tp>::value> { };
+#endif
 
   /// extent
   template
@@ -3579,12 +3585,17 @@ template 
 template 
   inline constexpr size_t alignment_of_v = alignment_of<_Tp>::value;
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__rank)
+template <typename _Tp>
+  inline constexpr size_t rank_v = __rank(_Tp);
+#else
 template <typename _Tp>
   inline constexpr size_t rank_v = 0;
 template <typename _Tp, size_t _Size>
   inline constexpr size_t rank_v<_Tp[_Size]> = 1 + rank_v<_Tp>;
 template <typename _Tp>
   inline constexpr size_t rank_v<_Tp[]> = 1 + rank_v<_Tp>;
+#endif
 
 template 
   inline constexpr size_t extent_v = 0;
-- 
2.43.2

[PATCH v7 16/22] libstdc++: Optimize std::add_lvalue_reference compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of
std::add_lvalue_reference by dispatching to the new
__add_lvalue_reference built-in trait.
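
The dispatch pattern can be sketched outside libstdc++ roughly as
follows. This is an assumption-laden illustration, not the library code:
SKETCH_HAS_ADD_LVALUE_REFERENCE, my_add_lvalue_reference and
sketch_voider are hypothetical names, and the feature test uses plain
__has_builtin rather than libstdc++'s own _GLIBCXX_USE_BUILTIN_TRAIT
macro, whose definition is not shown in this thread.

```cpp
#include <type_traits>

// Hypothetical feature-test macro for this sketch only.
#if defined(__has_builtin)
# if __has_builtin(__add_lvalue_reference)
#  define SKETCH_HAS_ADD_LVALUE_REFERENCE 1
# endif
#endif

#ifdef SKETCH_HAS_ADD_LVALUE_REFERENCE
// Built-in path: the compiler computes the type directly.
template <typename T>
struct my_add_lvalue_reference
{ using type = __add_lvalue_reference(T); };
#else
// Fallback path: detect whether T& is a valid type via SFINAE.
template <typename...>
struct sketch_voider { using type = void; };

template <typename T, typename = void>
struct my_add_lvalue_reference
{ using type = T; };

template <typename T>
struct my_add_lvalue_reference<T, typename sketch_voider<T&>::type>
{ using type = T&; };
#endif

static_assert(std::is_same<my_add_lvalue_reference<int>::type, int&>::value,
              "int -> int&");
static_assert(std::is_same<my_add_lvalue_reference<void>::type, void>::value,
              "void stays void");
```

Both branches are meant to give the same answers; only the amount of
template machinery the compiler has to instantiate differs.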

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_lvalue_reference): Use
__add_lvalue_reference built-in trait.
(__add_lvalue_reference_helper): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 34475e6279a..17bf47d59d3 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1157,6 +1157,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template
+struct __add_lvalue_reference_helper
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template
 struct __add_lvalue_reference_helper
 { using type = _Tp; };
@@ -1164,6 +1169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __add_lvalue_reference_helper<_Tp, __void_t<_Tp&>>
 { using type = _Tp&; };
+#endif
 
   template
 using __add_lval_ref_t = typename __add_lvalue_reference_helper<_Tp>::type;
@@ -1731,9 +1737,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// add_lvalue_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_lvalue_reference)
+  template<typename _Tp>
+struct add_lvalue_reference
+{ using type = __add_lvalue_reference(_Tp); };
+#else
   template<typename _Tp>
 struct add_lvalue_reference
 { using type = __add_lval_ref_t<_Tp>; };
+#endif
 
   /// add_rvalue_reference
   template
-- 
2.43.2

[PATCH v7 19/22] c++: Implement __decay built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::decay.
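
For reference, the decay rules (strip the reference, then turn arrays
into element pointers and functions into function pointers, otherwise
drop cv-qualifiers) can be checked with the portable std::decay. This is
only an illustrative sketch and not part of the patch; decay_sketch_t is
a hypothetical alias.

```cpp
#include <type_traits>

// Hypothetical alias, used only in this illustration.
template <typename T>
using decay_sketch_t = typename std::decay<T>::type;

// Reference removed, then top-level cv-qualifiers dropped.
static_assert(std::is_same<decay_sketch_t<const int&>, int>::value,
              "const int& -> int");

// Arrays decay to a pointer to the element type.
static_assert(std::is_same<decay_sketch_t<int[4]>, int*>::value,
              "int[4] -> int*");

// Functions decay to a pointer to function.
static_assert(std::is_same<decay_sketch_t<void()>, void(*)()>::value,
              "void() -> void(*)()");

// A cv-qualified function type cannot be pointed to, so it is unchanged.
static_assert(std::is_same<decay_sketch_t<void() const>, void() const>::value,
              "void() const stays as-is");
```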

gcc/cp/ChangeLog:

* cp-trait.def: Define __decay.
* semantics.cc (finish_trait_type): Handle CPTK_DECAY.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __decay.
* g++.dg/ext/decay.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  | 12 
 gcc/testsuite/g++.dg/ext/decay.C | 39 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 4 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/decay.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 173818adf79..2d1cb7c227c 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -51,6 +51,7 @@
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
 DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
+DEFTRAIT_TYPE (DECAY, "__decay", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 19d6f87a9ea..45dc509855a 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12801,6 +12801,18 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
return type1;
   return cp_build_reference_type (type1, /*rval=*/true);
 
+case CPTK_DECAY:
+  if (TYPE_REF_P (type1))
+   type1 = TREE_TYPE (type1);
+
+  if (TREE_CODE (type1) == ARRAY_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, TREE_TYPE (type1), type2,
+ complain);
+  else if (TREE_CODE (type1) == FUNCTION_TYPE)
+   return finish_trait_type (CPTK_ADD_POINTER, type1, type2, complain);
+  else
+   return cv_unqualified (type1);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/decay.C b/gcc/testsuite/g++.dg/ext/decay.C
new file mode 100644
index 000..cf224b7452c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/decay.C
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+// class ClassType { };
+
+// Positive tests.
+using test1_type = __decay(bool);
+SA(__is_same(test1_type, bool));
+
+// NB: DR 705.
+using test2_type = __decay(const int);
+SA(__is_same(test2_type, int));
+
+using test3_type = __decay(int[4]);
+SA(__is_same(test3_type, __remove_extent(int[4])*));
+
+using fn_type = void ();
+using test4_type = __decay(fn_type);
+SA(__is_same(test4_type, __add_pointer(fn_type)));
+
+using cfn_type = void () const;
+using test5_type = __decay(cfn_type);
+SA(__is_same(test5_type, cfn_type));
+
+// SA(__is_same(__add_rvalue_reference(int), int&&));
+// SA(__is_same(__add_rvalue_reference(int&&), int&&));
+// SA(__is_same(__add_rvalue_reference(int&), int&));
+// SA(__is_same(__add_rvalue_reference(const int), const int&&));
+// SA(__is_same(__add_rvalue_reference(int*), int*&&));
+// SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+// SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+// SA(__is_same(__add_rvalue_reference(void), void));
+// SA(__is_same(__add_rvalue_reference(const void), const void));
+// SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+// SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+// SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const &&));
+// SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index c2503c5d82b..3aca273aad6 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -11,6 +11,9 @@
 #if !__has_builtin (__add_rvalue_reference)
 # error "__has_builtin (__add_rvalue_reference) failed"
 #endif
+#if !__has_builtin (__decay)
+# error "__has_builtin (__decay) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.2

[PATCH v7 07/22] c++: Implement __is_unbounded_array built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::is_unbounded_array.
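
As a point of comparison, a library-only version of the trait can be
sketched with one partial specialization. The name
is_unbounded_array_sketch is hypothetical and the code is not taken from
the patch; it is written this way so that it does not require the C++20
std::is_unbounded_array.

```cpp
#include <type_traits>

// Primary template: not an array of unknown bound.
template <typename T>
struct is_unbounded_array_sketch : std::false_type { };

// Only T[] (no extent in the outermost dimension) matches.
template <typename T>
struct is_unbounded_array_sketch<T[]> : std::true_type { };

static_assert(is_unbounded_array_sketch<int[]>::value, "int[]");
static_assert(is_unbounded_array_sketch<int[][3]>::value, "int[][3]");
static_assert(!is_unbounded_array_sketch<int[2]>::value, "int[2]");
static_assert(!is_unbounded_array_sketch<int[2][3]>::value, "int[2][3]");
```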

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_unbounded_array.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_UNBOUNDED_ARRAY.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_unbounded_array.
* g++.dg/ext/is_unbounded_array.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 gcc/testsuite/g++.dg/ext/is_unbounded_array.C | 37 +++
 5 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_unbounded_array.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 22cabd97cb6..62b264e4757 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3860,6 +3860,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_TRIVIALLY_COPYABLE:
   inform (loc, "  %qT is not trivially copyable", t1);
   break;
+case CPTK_IS_UNBOUNDED_ARRAY:
+  inform (loc, "  %qT is not an unbounded array", t1);
+  break;
 case CPTK_IS_UNION:
   inform (loc, "  %qT is not a union", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 18e2d0f3480..05514a51c21 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -92,6 +92,7 @@ DEFTRAIT_EXPR (IS_TRIVIAL, "__is_trivial", 1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
+DEFTRAIT_EXPR (IS_UNBOUNDED_ARRAY, "__is_unbounded_array", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 9dcdb06191a..1794e83baa2 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12532,6 +12532,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_TRIVIALLY_COPYABLE:
   return trivially_copyable_p (type1);
 
+case CPTK_IS_UNBOUNDED_ARRAY:
+  return array_of_unknown_bound_p (type1);
+
 case CPTK_IS_UNION:
   return type_code1 == UNION_TYPE;
 
@@ -12708,6 +12711,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_REFERENCE:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
+case CPTK_IS_UNBOUNDED_ARRAY:
 case CPTK_IS_UNION:
 case CPTK_IS_VOLATILE:
   break;
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 96b7a89e4f1..b1430e9bd8b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -158,6 +158,9 @@
 #if !__has_builtin (__is_trivially_copyable)
 # error "__has_builtin (__is_trivially_copyable) failed"
 #endif
+#if !__has_builtin (__is_unbounded_array)
+# error "__has_builtin (__is_unbounded_array) failed"
+#endif
 #if !__has_builtin (__is_union)
 # error "__has_builtin (__is_union) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_unbounded_array.C 
b/gcc/testsuite/g++.dg/ext/is_unbounded_array.C
new file mode 100644
index 000..283a74e1a0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_unbounded_array.C
@@ -0,0 +1,37 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+class ClassType { };
+class IncompleteClass;
+union IncompleteUnion;
+
+SA_TEST_CATEGORY(__is_unbounded_array, int[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, int[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, int[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, int[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, IncompleteClass[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, IncompleteClass[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array,

[PATCH v7 04/22] libstdc++: Optimize std::is_volatile compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_volatile
by dispatching to the new __is_volatile built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_volatile): Use __is_volatile
built-in trait.
(is_volatile_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 6e9ebfb8a18..60cd22b6f15 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -851,6 +851,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// is_volatile
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_volatile)
+  template<typename _Tp>
+struct is_volatile
+: public __bool_constant<__is_volatile(_Tp)>
+{ };
+#else
   template<typename>
 struct is_volatile
 : public false_type { };
@@ -858,6 +864,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct is_volatile<_Tp volatile>
 : public true_type { };
+#endif
 
   /// is_trivial
   template
@@ -3356,10 +3363,15 @@ template 
   inline constexpr bool is_function_v<_Tp&&> = false;
 #endif
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_volatile)
+template <typename _Tp>
+  inline constexpr bool is_volatile_v = __is_volatile(_Tp);
+#else
 template <typename _Tp>
   inline constexpr bool is_volatile_v = false;
 template <typename _Tp>
   inline constexpr bool is_volatile_v<volatile _Tp> = true;
+#endif
 
 template 
   inline constexpr bool is_trivial_v = __is_trivial(_Tp);
-- 
2.43.2

[PATCH v7 02/22] libstdc++: Optimize std::is_const compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_const
by dispatching to the new __is_const built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_const): Use __is_const built-in
trait.
(is_const_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 21402fd8c13..6e9ebfb8a18 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -835,6 +835,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Type properties.
 
   /// is_const
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
+  template<typename _Tp>
+struct is_const
+: public __bool_constant<__is_const(_Tp)>
+{ };
+#else
   template<typename>
 struct is_const
 : public false_type { };
@@ -842,6 +848,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct is_const<_Tp const>
 : public true_type { };
+#endif
 
   /// is_volatile
   template
@@ -3327,10 +3334,15 @@ template 
   inline constexpr bool is_member_pointer_v = is_member_pointer<_Tp>::value;
 #endif
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
+template <typename _Tp>
+  inline constexpr bool is_const_v = __is_const(_Tp);
+#else
 template <typename _Tp>
   inline constexpr bool is_const_v = false;
 template <typename _Tp>
   inline constexpr bool is_const_v<const _Tp> = true;
+#endif
 
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function)
 template 
-- 
2.43.2

[PATCH v7 01/22] c++: Implement __is_const built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::is_const.
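
For comparison, the library-only form of the trait is a primary
template plus one partial specialization on "T const". The sketch below
is an illustration under that assumption, not code from the patch;
is_const_sketch is a hypothetical name.

```cpp
#include <type_traits>

// Primary template: not const-qualified.
template <typename T>
struct is_const_sketch : std::false_type { };

// Matches any type whose top-level qualifiers include const.
template <typename T>
struct is_const_sketch<T const> : std::true_type { };

static_assert(is_const_sketch<const int>::value, "const int");
static_assert(is_const_sketch<const volatile int>::value, "const volatile int");

// Pointer to const is not const; a const pointer is.
static_assert(!is_const_sketch<const int*>::value, "pointer to const int");
static_assert(is_const_sketch<int* const>::value, "const pointer to int");

// References are never const-qualified.
static_assert(!is_const_sketch<const int&>::value, "reference to const int");
```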

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_const.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONST.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_const.
* g++.dg/ext/is_const.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_const.C  | 20 
 5 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_const.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d9569013bd3..8b7833d6cae 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3766,6 +3766,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_CLASS:
   inform (loc, "  %qT is not a class", t1);
   break;
+case CPTK_IS_CONST:
+  inform (loc, "  %qT is not a const type", t1);
+  break;
 case CPTK_IS_CONSTRUCTIBLE:
   if (!t2)
 inform (loc, "  %qT is not default constructible", t1);
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 394f006f20f..36faed9c0b3 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -64,6 +64,7 @@ DEFTRAIT_EXPR (IS_ASSIGNABLE, "__is_assignable", 2)
 DEFTRAIT_EXPR (IS_BASE_OF, "__is_base_of", 2)
 DEFTRAIT_EXPR (IS_BOUNDED_ARRAY, "__is_bounded_array", 1)
 DEFTRAIT_EXPR (IS_CLASS, "__is_class", 1)
+DEFTRAIT_EXPR (IS_CONST, "__is_const", 1)
 DEFTRAIT_EXPR (IS_CONSTRUCTIBLE, "__is_constructible", -1)
 DEFTRAIT_EXPR (IS_CONVERTIBLE, "__is_convertible", 2)
 DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 57840176863..0d08900492b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12446,6 +12446,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_CLASS:
   return NON_UNION_CLASS_TYPE_P (type1);
 
+case CPTK_IS_CONST:
+  return CP_TYPE_CONST_P (type1);
+
 case CPTK_IS_CONSTRUCTIBLE:
   return is_xible (INIT_EXPR, type1, type2);
 
@@ -12688,6 +12691,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_ARRAY:
 case CPTK_IS_BOUNDED_ARRAY:
 case CPTK_IS_CLASS:
+case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
 case CPTK_IS_FUNCTION:
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 02b4b4d745d..e3640faeb96 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -71,6 +71,9 @@
 #if !__has_builtin (__is_class)
 # error "__has_builtin (__is_class) failed"
 #endif
+#if !__has_builtin (__is_const)
+# error "__has_builtin (__is_const) failed"
+#endif
 #if !__has_builtin (__is_constructible)
 # error "__has_builtin (__is_constructible) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_const.C 
b/gcc/testsuite/g++.dg/ext/is_const.C
new file mode 100644
index 000..8a0e8df72a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_const.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+using cClassType = const ClassType;
+using vClassType = volatile ClassType;
+using cvClassType = const volatile ClassType;
+
+// Positive tests.
+SA(__is_const(const int));
+SA(__is_const(const volatile int));
+SA(__is_const(cClassType));
+SA(__is_const(cvClassType));
+
+// Negative tests.
+SA(!__is_const(int));
+SA(!__is_const(volatile int));
+SA(!__is_const(ClassType));
+SA(!__is_const(vClassType));
-- 
2.43.2

[PATCH v7 12/22] libstdc++: Optimize std::remove_extent compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::remove_extent
by dispatching to the new __remove_extent built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_extent): Use __remove_extent
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 6346d1daee2..73ddce351fd 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2092,6 +2092,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Array modifications.
 
   /// remove_extent
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__remove_extent)
+  template<typename _Tp>
+struct remove_extent
+{ using type = __remove_extent(_Tp); };
+#else
   template<typename _Tp>
 struct remove_extent
 { using type = _Tp; };
@@ -2103,6 +2108,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct remove_extent<_Tp[]>
 { using type = _Tp; };
+#endif
 
   /// remove_all_extents
   template
-- 
2.43.2

[PATCH v7 14/22] libstdc++: Optimize std::remove_all_extents compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of
std::remove_all_extents by dispatching to the new __remove_all_extents
built-in trait.
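
As a reminder of how this trait differs from std::remove_extent, here
is a short sketch using the portable standard traits. It is not part of
the patch; the aliases strip_one_t and strip_all_t are hypothetical
names.

```cpp
#include <type_traits>

// Hypothetical aliases, used only in this illustration.
template <typename T>
using strip_one_t = typename std::remove_extent<T>::type;
template <typename T>
using strip_all_t = typename std::remove_all_extents<T>::type;

// remove_extent peels off only the outermost dimension...
static_assert(std::is_same<strip_one_t<int[2][3]>, int[3]>::value,
              "int[2][3] -> int[3]");
// ...while remove_all_extents goes down to the element type.
static_assert(std::is_same<strip_all_t<int[2][3]>, int>::value,
              "int[2][3] -> int");

// Unbounded outer dimensions are handled the same way.
static_assert(std::is_same<strip_all_t<int[][3]>, int>::value,
              "int[][3] -> int");

// Non-array types are returned unchanged.
static_assert(std::is_same<strip_all_t<int*>, int*>::value, "int* unchanged");
```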

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_all_extents): Use
__remove_all_extents built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 73ddce351fd..34475e6279a 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2111,6 +2111,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// remove_all_extents
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__remove_all_extents)
+  template<typename _Tp>
+struct remove_all_extents
+{ using type = __remove_all_extents(_Tp); };
+#else
   template<typename _Tp>
 struct remove_all_extents
 { using type = _Tp; };
@@ -2122,6 +2127,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct remove_all_extents<_Tp[]>
 { using type = typename remove_all_extents<_Tp>::type; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_extent
-- 
2.43.2

[PATCH v7 17/22] c++: Implement __add_rvalue_reference built-in trait

2024-02-19 Thread Ken Matsui

This patch implements a built-in trait for std::add_rvalue_reference.
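
One place where these semantics are visible is std::declval, whose
return type is specified in terms of add_rvalue_reference. The sketch
below is only an illustration and not part of the patch; add_rref_t is
a hypothetical alias.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical alias, used only in this illustration.
template <typename T>
using add_rref_t = typename std::add_rvalue_reference<T>::type;

// Object types gain an rvalue reference.
static_assert(std::is_same<add_rref_t<int>, int&&>::value, "int -> int&&");

// Reference collapsing: adding && to T& still yields T&.
static_assert(std::is_same<add_rref_t<int&>, int&>::value, "int& stays int&");

// void is not referenceable and is returned unchanged.
static_assert(std::is_same<add_rref_t<void>, void>::value, "void stays void");

// std::declval<T>() has exactly this return type.
static_assert(std::is_same<decltype(std::declval<int>()), add_rref_t<int>>::value,
              "declval<int>() is int&&");
static_assert(std::is_same<decltype(std::declval<int&>()), add_rref_t<int&>>::value,
              "declval<int&>() is int&");
```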

gcc/cp/ChangeLog:

* cp-trait.def: Define __add_rvalue_reference.
* semantics.cc (finish_trait_type): Handle
CPTK_ADD_RVALUE_REFERENCE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__add_rvalue_reference.
* g++.dg/ext/add_rvalue_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  8 
 .../g++.dg/ext/add_rvalue_reference.C | 20 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +++
 4 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/add_rvalue_reference.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 9a27dca4ea3..173818adf79 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -50,6 +50,7 @@
 
 DEFTRAIT_TYPE (ADD_LVALUE_REFERENCE, "__add_lvalue_reference", 1)
 DEFTRAIT_TYPE (ADD_POINTER, "__add_pointer", 1)
+DEFTRAIT_TYPE (ADD_RVALUE_REFERENCE, "__add_rvalue_reference", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_ASSIGN, "__has_nothrow_assign", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_CONSTRUCTOR, "__has_nothrow_constructor", 1)
 DEFTRAIT_EXPR (HAS_NOTHROW_COPY, "__has_nothrow_copy", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 05f5b62f9df..19d6f87a9ea 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12793,6 +12793,14 @@ finish_trait_type (cp_trait_kind kind, tree type1, 
tree type2,
type1 = TREE_TYPE (type1);
   return build_pointer_type (type1);
 
+case CPTK_ADD_RVALUE_REFERENCE:
+  if (VOID_TYPE_P (type1)
+ || (FUNC_OR_METHOD_TYPE_P (type1)
+ && (type_memfn_quals (type1) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (type1) != REF_QUAL_NONE)))
+   return type1;
+  return cp_build_reference_type (type1, /*rval=*/true);
+
 case CPTK_REMOVE_ALL_EXTENTS:
   return strip_array_types (type1);
 
diff --git a/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C 
b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
new file mode 100644
index 000..c92fe6bfa17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/add_rvalue_reference.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__add_rvalue_reference(int), int&&));
+SA(__is_same(__add_rvalue_reference(int&&), int&&));
+SA(__is_same(__add_rvalue_reference(int&), int&));
+SA(__is_same(__add_rvalue_reference(const int), const int&&));
+SA(__is_same(__add_rvalue_reference(int*), int*&&));
+SA(__is_same(__add_rvalue_reference(ClassType&&), ClassType&&));
+SA(__is_same(__add_rvalue_reference(ClassType), ClassType&&));
+SA(__is_same(__add_rvalue_reference(int(int)), int(&&)(int)));
+SA(__is_same(__add_rvalue_reference(void), void));
+SA(__is_same(__add_rvalue_reference(const void), const void));
+SA(__is_same(__add_rvalue_reference(bool(int) const), bool(int) const));
+SA(__is_same(__add_rvalue_reference(bool(int) &), bool(int) &));
+SA(__is_same(__add_rvalue_reference(bool(int) const &&), bool(int) const &&));
+SA(__is_same(__add_rvalue_reference(bool(int)), bool(&&)(int)));
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 3fca9cfabcc..c2503c5d82b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -8,6 +8,9 @@
 #if !__has_builtin (__add_pointer)
 # error "__has_builtin (__add_pointer) failed"
 #endif
+#if !__has_builtin (__add_rvalue_reference)
+# error "__has_builtin (__add_rvalue_reference) failed"
+#endif
 #if !__has_builtin (__builtin_addressof)
 # error "__has_builtin (__builtin_addressof) failed"
 #endif
-- 
2.43.2
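
For readers following the series: the cases handled in finish_trait_type
above (void, and function types with cv- or ref-qualifiers, are returned
unchanged; everything else goes through reference collapsing) match the
behaviour of std::add_rvalue_reference. A minimal sketch, assuming only the
standard library trait rather than the new built-in, so that it also builds
with compilers that lack __add_rvalue_reference:

```cpp
#include <type_traits>

// Ordinary object type: an rvalue reference is added.
static_assert(std::is_same<std::add_rvalue_reference<int>::type,
                           int&&>::value, "");
// Reference collapsing: T& && stays T&.
static_assert(std::is_same<std::add_rvalue_reference<int&>::type,
                           int&>::value, "");
// void is not referenceable, so it is returned unchanged.
static_assert(std::is_same<std::add_rvalue_reference<void>::type,
                           void>::value, "");
// A cv-qualified function type is not referenceable either.
static_assert(std::is_same<std::add_rvalue_reference<bool(int) const>::type,
                           bool(int) const>::value, "");
// An unqualified function type is referenceable.
static_assert(std::is_same<std::add_rvalue_reference<int(int)>::type,
                           int(&&)(int)>::value, "");
```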

[PATCH v7 10/22] libstdc++: Optimize std::add_pointer compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of std::add_pointer
by dispatching to the new __add_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (add_pointer): Use __add_pointer
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index c4585a23df9..6346d1daee2 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2149,6 +2149,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 #endif
 
+  /// add_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__add_pointer)
+  template<typename _Tp>
+    struct add_pointer
+    { using type = __add_pointer(_Tp); };
+#else
   template<typename _Tp, typename = void>
     struct __add_pointer_helper
     { using type = _Tp; };
@@ -2157,7 +2163,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __add_pointer_helper<_Tp, __void_t<_Tp*>>
 { using type = _Tp*; };
 
-  /// add_pointer
   template<typename _Tp>
     struct add_pointer
     : public __add_pointer_helper<_Tp>
@@ -2170,6 +2175,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
     struct add_pointer<_Tp&&>
     { using type = _Tp*; };
+#endif
 
 #if __cplusplus > 201103L
   /// Alias template for remove_pointer
-- 
2.43.2
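
As a quick reminder of what both branches of the #if above have to produce,
here is a small sketch written against the public std::add_pointer trait
only (it makes no assumption about which branch the library picked):

```cpp
#include <type_traits>

// Object types get a pointer added.
static_assert(std::is_same<std::add_pointer<int>::type, int*>::value, "");
// References are stripped first, so both yield int*.
static_assert(std::is_same<std::add_pointer<int&>::type, int*>::value, "");
static_assert(std::is_same<std::add_pointer<int&&>::type, int*>::value, "");
// Plain function types become function pointers.
static_assert(std::is_same<std::add_pointer<int(int)>::type,
                           int(*)(int)>::value, "");
// A cv-qualified function type cannot be pointed to; it is left unchanged.
static_assert(std::is_same<std::add_pointer<bool(int) const>::type,
                           bool(int) const>::value, "");
```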

[PATCH v7 11/22] c++: Implement __remove_extent built-in trait

2024-02-19 Thread Ken Matsui

This patch implements built-in trait for std::remove_extent.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_extent.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_EXTENT.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_extent.
* g++.dg/ext/remove_extent.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  5 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/remove_extent.C | 16 
 4 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_extent.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 63f879287ce..577c96d579b 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -100,6 +100,7 @@ DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_tempo
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
+DEFTRAIT_TYPE (REMOVE_EXTENT, "__remove_extent", 1)
 DEFTRAIT_TYPE (REMOVE_POINTER, "__remove_pointer", 1)
 DEFTRAIT_TYPE (REMOVE_REFERENCE, "__remove_reference", 1)
 DEFTRAIT_TYPE (TYPE_PACK_ELEMENT, "__type_pack_element", -1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 635441a7a90..58696225fc4 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12793,6 +12793,11 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree type2,
         type1 = TREE_TYPE (type1);
       return cv_unqualified (type1);
 
+    case CPTK_REMOVE_EXTENT:
+      if (TREE_CODE (type1) == ARRAY_TYPE)
+        type1 = TREE_TYPE (type1);
+      return type1;
+
     case CPTK_REMOVE_POINTER:
       if (TYPE_PTR_P (type1))
         type1 = TREE_TYPE (type1);
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 9d861398bae..5d5cbe3b019 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -182,6 +182,9 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__remove_extent)
+# error "__has_builtin (__remove_extent) failed"
+#endif
 #if !__has_builtin (__remove_pointer)
 # error "__has_builtin (__remove_pointer) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_extent.C b/gcc/testsuite/g++.dg/ext/remove_extent.C
new file mode 100644
index 000..6183aca5a48
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_extent.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+class ClassType { };
+
+SA(__is_same(__remove_extent(int), int));
+SA(__is_same(__remove_extent(int[2]), int));
+SA(__is_same(__remove_extent(int[2][3]), int[3]));
+SA(__is_same(__remove_extent(int[][3]), int[3]));
+SA(__is_same(__remove_extent(const int[2]), const int));
+SA(__is_same(__remove_extent(ClassType), ClassType));
+SA(__is_same(__remove_extent(ClassType[2]), ClassType));
+SA(__is_same(__remove_extent(ClassType[2][3]), ClassType[3]));
+SA(__is_same(__remove_extent(ClassType[][3]), ClassType[3]));
+SA(__is_same(__remove_extent(const ClassType[2]), const ClassType));
-- 
2.43.2
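
The front-end change above strips exactly one array dimension (one
TREE_TYPE step on an ARRAY_TYPE). For comparison, a library-only sketch of
the same rule; my_remove_extent is a hypothetical name used just for this
illustration and is not part of the patch:

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: non-array types are returned unchanged.
template<typename T>
  struct my_remove_extent { using type = T; };

// Array of unknown bound: strip the outermost dimension.
template<typename T>
  struct my_remove_extent<T[]> { using type = T; };

// Array of known bound: strip the outermost dimension.
template<typename T, std::size_t N>
  struct my_remove_extent<T[N]> { using type = T; };

// Only one level is removed, unlike std::remove_all_extents.
static_assert(std::is_same<my_remove_extent<int[2][3]>::type,
                           int[3]>::value, "");
static_assert(std::is_same<std::remove_all_extents<int[2][3]>::type,
                           int>::value, "");
// The sketch agrees with the standard trait on the tested shapes.
static_assert(std::is_same<my_remove_extent<const int[2]>::type,
                           std::remove_extent<const int[2]>::type>::value, "");
```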

[PATCH v7 08/22] libstdc++: Optimize std::is_unbounded_array compilation performance

2024-02-19 Thread Ken Matsui

This patch optimizes the compilation performance of
std::is_unbounded_array by dispatching to the new
__is_unbounded_array built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_unbounded_array_v): Use
__is_unbounded_array built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 6407738a726..c4585a23df9 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3706,11 +3706,16 @@ template
   /// True for a type that is an array of unknown bound.
   /// @ingroup variable_templates
   /// @since C++20
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__is_unbounded_array)
+  template<typename _Tp>
+    inline constexpr bool is_unbounded_array_v = __is_unbounded_array(_Tp);
+# else
   template<typename _Tp>
     inline constexpr bool is_unbounded_array_v = false;
 
   template<typename _Tp>
     inline constexpr bool is_unbounded_array_v<_Tp[]> = true;
+# endif
 
   /// True for a type that is an array of known bound.
   /// @since C++20
-- 
2.43.2
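
The fallback branch kept in the #else above relies on a partial
specialization for T[]. A stand-alone sketch of that pattern, placed in a
hypothetical namespace "sketch" so it does not clash with the C++20
library trait and also compiles as C++14:

```cpp
namespace sketch {
  // Primary variable template: not an array of unknown bound.
  template<typename T>
    constexpr bool is_unbounded_array_v = false;

  // Partial specialization: matches only arrays of unknown bound.
  template<typename T>
    constexpr bool is_unbounded_array_v<T[]> = true;
}

static_assert(sketch::is_unbounded_array_v<int[]>, "");
static_assert(!sketch::is_unbounded_array_v<int[2]>, "");
static_assert(!sketch::is_unbounded_array_v<int>, "");
// Only the outermost bound matters.
static_assert(sketch::is_unbounded_array_v<int[][3]>, "");
```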

Re: [PATCH] libgcc: fix Win32 CV abnormal spurious wakeups in timed wait [PR113850]

2024-02-19 Thread Matteo Italia


On 17/02/24 01:24, Jonathan Yong wrote:
> On 2/10/24 10:10, Matteo Italia wrote:
>> On 09/02/24 15:18, Matteo Italia wrote:
>>> The Win32 threading model uses __gthr_win32_abs_to_rel_time to convert
>>> the timespec that gthreads uses to specify the absolute end time of a
>>> condition variable's timed wait into a millisecond value relative to
>>> "now", which is passed to the Win32 SleepConditionVariableCS function.
>>>
>>> Unfortunately, the conversion is incorrect: due to a typo, it
>>> returns the relative time _in seconds_, so SleepConditionVariableCS
>>> receives a timeout value 1000 times shorter than it should be, resulting
>>> in a huge number of spurious wakeups in calls such as
>>> std::condition_variable::wait_for or wait_until.
>>
>> Re-reading the commit message I found a few typos, and it was
>> generally a bit more obscure than I like; reworded it now, hope it's
>> better.
>
> Thanks, pushed to master and 13.x branches.

Great, thank you! Do I need to change the status of the Bugzilla entry
to RESOLVED, or will it be closed automatically at the next release,
or something else?
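
To illustrate the unit mix-up described in the quoted commit message, here
is a simplified sketch. It is not the libgcc code: time_point, rel_ms and
rel_seconds_bug are hypothetical names, and the real function works on
Win32 time values rather than on this toy struct.

```cpp
#include <cstdint>

// Toy absolute time: seconds plus nanoseconds since some epoch.
struct time_point { std::int64_t sec; std::int64_t nsec; };

// Signed distance from `now` to `deadline`, in nanoseconds.
constexpr std::int64_t delta_ns(time_point deadline, time_point now) {
  return (deadline.sec - now.sec) * 1000000000LL + (deadline.nsec - now.nsec);
}

// Intended result: relative timeout in milliseconds, rounded up,
// clamped to zero when the deadline has already passed.
constexpr std::uint32_t rel_ms(time_point deadline, time_point now) {
  return delta_ns(deadline, now) <= 0
           ? 0u
           : static_cast<std::uint32_t>((delta_ns(deadline, now) + 999999)
                                        / 1000000);
}

// The class of bug described: the same distance, but expressed in seconds.
// Passing this where milliseconds are expected shortens the wait ~1000x.
constexpr std::uint32_t rel_seconds_bug(time_point deadline, time_point now) {
  return delta_ns(deadline, now) <= 0
           ? 0u
           : static_cast<std::uint32_t>(delta_ns(deadline, now) / 1000000000);
}

// A 2.5 s wait: 2500 ms when converted correctly, but "2" with the bug.
static_assert(rel_ms({10, 500000000}, {8, 0}) == 2500, "");
static_assert(rel_seconds_bug({10, 500000000}, {8, 0}) == 2, "");
```

A caller that loops until its deadline still terminates with the short
timeout, but it wakes up far more often than necessary, which matches the
spurious wakeups reported in the PR.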

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> >> I suppose that's better than the first version when a block has a
> >> large number of dominance frontiers.  But I can't remember whether
> >> that was the case in PR98863.  I have a feeling that I tried the above
> >> as part of the PR, since it's the obvious way of applying liveness
> >> once per block rather than once per edge.  But I think it was more
> >> just sheer weight of numbers.
> >
> > Note at least this (see below for the actual patch for trunk) is
> > in the linear part, so it shouldn't trigger any quadraticness.
> > It speeds up the testcase the same as the original one.
> 
> But some of the "quadraticness" is from O(nblocks*nregisters) or
> O(nedges*nregisters), rather than through iteration.

Yeah, that's of course what DF_LR already is subject to, though,
so we hardly make that worse here.

> >> I wonder whether we could create a new DEF bitmap on-the-fly while
> >> creating the defs and uses, restricting it to potential PHI registers.
> >> The test for potential PHI registers is constant time, and we could
> >> build the bitmaps as tree views before converting to list views.
> >> Can give it a go if that sounds OK.
> >
> > Do that within DF itself?  Not sure about that.
> 
> No, in rtl-ssa.  It would be convenient to have a version of
> bitmap_ior_and_into in which the final bitmap is an sbitmap though.
> 
> I.e., schematically:
> 
>   EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
>   if (bitmap_ior_and_into ([b2], b1_def,
>bi.potential_phi_regs)
>   && !bitmap_empty_p ([b2]))
> // Propagate the (potential) new phi node definitions in B2.
> bitmap_set_bit (worklist, b2);
> 
> with a new overload to make that possible.
> 
> > Anyway, I think the below is small enough that it would also qualify
> > for backporting.  Unless there's a problem with it, of course.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> I still think this is likely to reintroduce the problem in PR98863.
> I won't object further though, so yeah, please go ahead.

The core issue there was memory-usage, that's unlikely going to be
worse with this patch.

I've pushed it to trunk and will keep an eye on our testers that
cover WRF.

Thanks,
Richard.

> Richard
> 
> > Thanks,
> > Richard.
> >
> > From d6f57f72fbbdb832b33e4ba5ccbf60a8d90ea264 Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Mon, 19 Feb 2024 11:10:50 +0100
> > Subject: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time
> >  hog
> > To: gcc-patches@gcc.gnu.org
> >
> > The following tries to address the PHI insertion compile-time hog in
> > RTL fwprop observed with the PR54052 testcase where the loop computing
> > the "unfiltered" set of variables possibly needing PHI nodes for each
> > block exhibits quadratic compile-time and memory-use.
> >
> > It does so by pruning the local DEFs with LR_OUT of the block, removing
> > regs that can never be LR_IN (defined by this block) in the dominance
> > frontier.
> >
> > PR rtl-optimization/54052
> > * rtl-ssa/blocks.cc (function_info::place_phis): Filter
> > local defs by LR_OUT.
> > ---
> >  gcc/rtl-ssa/blocks.cc | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> > index 8996443e8d5..cf4224e61ec 100644
> > --- a/gcc/rtl-ssa/blocks.cc
> > +++ b/gcc/rtl-ssa/blocks.cc
> > @@ -645,7 +645,12 @@ function_info::place_phis (build_info )
> >if (bitmap_empty_p ([b1]))
> > continue;
> >  
> > -  bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> > +  // Defs in B1 that are possibly in LR_IN in the dominance frontier
> > +  // blocks.
> > +  auto_bitmap b1_def;
> > +  bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def,
> > +              DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
> > +
> >bitmap_iterator bmi;
> >unsigned int b2;
> >EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
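
For readers who want the idea of the pushed patch without the DF
machinery, here is a toy sketch of the pruning step. It models register
sets as 64-bit masks rather than GCC bitmaps; regset, bit and
phi_candidates are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Toy register set: bit i set means register i is a member.
using regset = std::uint64_t;

constexpr regset bit(unsigned r) { return regset{1} << r; }

// Registers a block can contribute as PHI candidates to its dominance
// frontier: those it defines AND that are still live on exit.  A register
// that is dead at the end of the block cannot be live-in at a frontier
// block because of this block's definition.
constexpr regset phi_candidates(regset def, regset live_out) {
  return def & live_out;
}

// The block defines r0..r3, but only r1 and r5 are live out: only r1
// needs to be propagated to the frontier blocks.
static_assert(phi_candidates(bit(0) | bit(1) | bit(2) | bit(3),
                             bit(1) | bit(5)) == bit(1), "");
```

The AND is done once per block, before the loop over frontier blocks, which
is why it sits in the linear part of the algorithm.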

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford

Richard Biener  writes:
>> I suppose that's better than the first version when a block has a
>> large number of dominance frontiers.  But I can't remember whether
>> that was the case in PR98863.  I have a feeling that I tried the above
>> as part of the PR, since it's the obvious way of applying liveness
>> once per block rather than once per edge.  But I think it was more
>> just sheer weight of numbers.
>
> Note at least this (see below for the actual patch for trunk) is
> in the linear part, so it shouldn't trigger any quadraticness.
> It speeds up the testcase the same as the original one.

But some of the "quadraticness" is from O(nblocks*nregisters) or
O(nedges*nregisters), rather than through iteration.

>> I wonder whether we could create a new DEF bitmap on-the-fly while
>> creating the defs and uses, restricting it to potential PHI registers.
>> The test for potential PHI registers is constant time, and we could
>> build the bitmaps as tree views before converting to list views.
>> Can give it a go if that sounds OK.
>
> Do that within DF itself?  Not sure about that.

No, in rtl-ssa.  It would be convenient to have a version of
bitmap_ior_and_into in which the final bitmap is an sbitmap though.

I.e., schematically:

  EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
if (bitmap_ior_and_into ([b2], b1_def,
 bi.potential_phi_regs)
&& !bitmap_empty_p ([b2]))
  // Propagate the (potential) new phi node definitions in B2.
  bitmap_set_bit (worklist, b2);

with a new overload to make that possible.
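
As background for the schematic above: GCC's bitmap_ior_and_into computes
A |= (B & C) and returns whether A changed, which is what drives the
worklist. A toy model of that contract on 64-bit masks (regset, ior_and and
ior_and_into below are illustration-only names, not the GCC API):

```cpp
#include <cstdint>

// Toy register set: bit i set means register i is a member.
using regset = std::uint64_t;

// Pure form of the update: a | (b & c).
constexpr regset ior_and(regset a, regset b, regset c) {
  return a | (b & c);
}

// In-place form: A |= (B & C), returning true only if A gained a bit.
// A caller would push the block on its worklist when this returns true.
inline bool ior_and_into(regset &a, regset b, regset c) {
  const regset updated = ior_and(a, b, c);
  const bool changed = updated != a;
  a = updated;
  return changed;
}

// Only the bits common to B and C are merged into A.
static_assert(ior_and(0x1, 0x6, 0x4) == 0x5, "");
// Merging bits that are already present changes nothing.
static_assert(ior_and(0x5, 0x6, 0x4) == 0x5, "");
```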

> Anyway, I think the below is small enough that it would also qualify
> for backporting.  Unless there's a problem with it, of course.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?

I still think this is likely to reintroduce the problem in PR98863.
I won't object further though, so yeah, please go ahead.

Richard

> Thanks,
> Richard.
>
> From d6f57f72fbbdb832b33e4ba5ccbf60a8d90ea264 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Mon, 19 Feb 2024 11:10:50 +0100
> Subject: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time
>  hog
> To: gcc-patches@gcc.gnu.org
>
> The following tries to address the PHI insertion compile-time hog in
> RTL fwprop observed with the PR54052 testcase where the loop computing
> the "unfiltered" set of variables possibly needing PHI nodes for each
> block exhibits quadratic compile-time and memory-use.
>
> It does so by pruning the local DEFs with LR_OUT of the block, removing
> regs that can never be LR_IN (defined by this block) in the dominance
> frontier.
>
>   PR rtl-optimization/54052
>   * rtl-ssa/blocks.cc (function_info::place_phis): Filter
>   local defs by LR_OUT.
> ---
>  gcc/rtl-ssa/blocks.cc | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 8996443e8d5..cf4224e61ec 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -645,7 +645,12 @@ function_info::place_phis (build_info )
>if (bitmap_empty_p ([b1]))
>   continue;
>  
> -  bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> +  // Defs in B1 that are possibly in LR_IN in the dominance frontier
> +  // blocks.
> +  auto_bitmap b1_def;
> +  bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def,
> +              DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
> +
>bitmap_iterator bmi;
>unsigned int b2;
>EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)

Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-19 Thread Richard Earnshaw (lists)

On 19/02/2024 10:58, Tamar Christina wrote:
>> -Original Message-
>> From: Tamar Christina
>> Sent: Thursday, February 15, 2024 11:05 AM
>> To: Richard Earnshaw (lists) ; gcc-
>> patc...@gcc.gnu.org
>> Cc: nd ; Marcus Shawcroft ; Kyrylo
>> Tkachov ; Richard Sandiford
>> 
>> Subject: RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
>>
>>> -Original Message-
>>> From: Richard Earnshaw (lists) 
>>> Sent: Thursday, February 15, 2024 11:01 AM
>>> To: Tamar Christina ; gcc-patches@gcc.gnu.org
>>> Cc: nd ; Marcus Shawcroft ;
>> Kyrylo
>>> Tkachov ; Richard Sandiford
>>> 
>>> Subject: Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
>>>
>>> On 15/02/2024 10:57, Tamar Christina wrote:
>>>> Hi All,
>>>>
>>>> This test has never worked on AArch64 since the day it was committed.  It
>>>> has a number of issues that prevent it from working on AArch64:
>>>>
>>>> 1. IEEE does not require that FP operations raise a SIGFPE,
>>>>    only that an exception is raised somehow.
>>>>
>>>> 2. Most Arm-designed cores don't raise SIGFPE and instead set a status
>>>>    register, and some partner cores raise a SIGILL instead.
>>>>
>>>> 3. The way it checks for feenableexcept doesn't really work for AArch64.
>>>>
>>>> As such this test doesn't seem to really provide much value on AArch64, so
>>>> we should just xfail it.
>>>>
>>>> Regtested on aarch64-none-linux-gnu and no issues.
>>>>
>>>> Ok for master?
>>>
>>> Wouldn't it be better to just skip the test.  XFAIL just adds clutter to
>>> verbose output and suggests that someday the tools might be fixed for
>>> this case.
>>>
>>> Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a
>>> guard for the test.
>>
> 
> It looks like this is similar to 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 so
> I'll just similarly skip it.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> index 205c47f38007d06116289c19d6b23cf3bf83bd48..e29d8c678e6e51c3f2e5dac53c7703bb18a99ac4 100644
> --- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> +++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
> @@ -1,5 +1,5 @@
>  ! { dg-do run }
> -!
> +! { dg-skip-if "PR libfortran/78314" { aarch64*-*-gnu* arm*-*-gnueabi arm*-*-gnueabihf } }
>  ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
>  
> Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK, but please give the fortran maintainers 24hrs to comment before pushing.

R.

> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   PR fortran/107071
>   * gfortran.dg/ieee/modes_1.f90: skip aarch64, arm.
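
The distinction drawn in the quoted rationale (an IEEE exception being
"raised" by setting a status flag, as opposed to delivering SIGFPE) can be
observed with the standard <cfenv> interface. A minimal sketch, assuming
IEEE 754 hardware in its default non-trapping mode; the function names are
made up for this example:

```cpp
#include <cassert>
#include <cfenv>

// Returns true if a division by zero left FE_DIVBYZERO set in the
// floating-point status flags.  No signal is involved at any point.
inline bool division_by_zero_sets_flag() {
  std::feclearexcept(FE_ALL_EXCEPT);
  // volatile keeps the compiler from folding the division at compile time.
  volatile double one = 1.0;
  volatile double zero = 0.0;
  volatile double quotient = one / zero;
  (void) quotient;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}

// Usage: in the default environment the program keeps running and the
// exception is visible only through the flag.
inline void check_flag_model() {
  assert(division_by_zero_sets_flag());
}
```

Turning such a flag into a trap is a separate, optional capability (the
feenableexcept check mentioned above), which is what the test depends on.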

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Mon, 19 Feb 2024, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > The following tries to address the PHI insertion compile-time hog in
> >> > RTL fwprop observed with the PR54052 testcase where the loop computing
> >> > the "unfiltered" set of variables possibly needing PHI nodes for each
> >> > block exhibits quadratic compile-time and memory-use.
> >> >
> >> > Instead of only pruning the set of candidate regs by LR_IN in the
> >> > second worklist loop do this when computing "unfiltered" already.
> >> >
> >> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> >
> >> > I'll note that in PR98863 you say in comment#39
> >> >
> >> > "Just to give an update on this: I have a patch that reduces the
> >> > amount of memory consumed by fwprop so that it no longer seems
> >> > to be outlier.  However, it involves doing more bitmap operations.
> >> > In this testcase we have a larger number of registers that
> >> > seem to be live but unused across a large region of code,
> >> > so bitmap ANDs with the live in sets are expensive and hit
> >> > the worst-case O(nblocks*nregisters).  I'm still trying to find
> >> > a way of reducing the effect of that."
> >> >
> >> > suggesting that the very AND operation I'm introducing below
> >> > was an actual problem.  It's just not very clear what testcase
> >> > this was on (the PR hasn't one, it just talks about WRF with LTO
> >> > and then some individual TUs of it).
> >> 
> >> Yeah, like you say, I think this kind of AND was exactly the problem.
> >> If the DEF set is much smaller than the IN set, we can spend a lot of
> >> compile time (and cache) iterating over the leading elements of the
> >> IN set.  So this could be trading one hog for another.
> >> 
> >> Could we use some heuristic to choose between the two?  If the IN set
> >> is "sensible", do the AND, otherwise keep it as-is?
> >
> > Not sure how, I don't think DF caches the set sizes (we could
> > compute them, of course).  But I just made an experiment and
> > using DF_LR_OUT instead of DF_LR_BB_INFO->def improves compile-time
> > as well.  So incremental ontop of the posted:
> >
> > diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> > index 9d1cd1b0365..6fd08602c1b 100644
> > --- a/gcc/rtl-ssa/blocks.cc
> > +++ b/gcc/rtl-ssa/blocks.cc
> > @@ -645,12 +645,11 @@ function_info::place_phis (build_info )
> >if (bitmap_empty_p ([b1]))
> > continue;
> >  
> > > -  bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> > +  bitmap b1_def = DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1));
> >bitmap_iterator bmi;
> >unsigned int b2;
> >EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
> > -   if (bitmap_ior_and_into ([b2], b1_def,
> > -DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
> > +   if (bitmap_ior_into ([b2], b1_def)
> > && !bitmap_empty_p ([b2]))
> >   // Propagate the (potential) new phi node definitions in B2.
> >   bitmap_set_bit (worklist, b2);
> >
> > of course that's too big (including live-through), but we could
> > prune by DF_LR_OUT like
> >
> > diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> > index 9d1cd1b0365..6a4dd05908f 100644
> > --- a/gcc/rtl-ssa/blocks.cc
> > +++ b/gcc/rtl-ssa/blocks.cc
> > @@ -645,12 +645,13 @@ function_info::place_phis (build_info )
> >if (bitmap_empty_p ([b1]))
> > continue;
> >  
> > > -  bitmap b1_def = &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def;
> > > +  auto_bitmap b1_def;
> > > +  bitmap_and (b1_def, &DF_LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, b1))->def,
> > > +              DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
> >bitmap_iterator bmi;
> >unsigned int b2;
> >EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
> > -   if (bitmap_ior_and_into ([b2], b1_def,
> > -DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
> > +   if (bitmap_ior_into ([b2], b1_def)
> > && !bitmap_empty_p ([b2]))
> >   // Propagate the (potential) new phi node definitions in B2.
> >   bitmap_set_bit (worklist, b2);
> >
> > so for the testcase it seems we have a lot of local defined but
> > not "global" used defs.
> >
> > Would that work for you or am I missing something?
> 
> I suppose that's better than the first version when a block has a
> large number of dominance frontiers.  But I can't remember whether
> that was the case in PR98863.  I have a feeling that I tried the above
> as part of the PR, since it's the obvious way of applying liveness
> once per block rather than once per edge.  But I think it was more
> just sheer weight of numbers.

Note at least this (see below for the actual patch for trunk) is
in the linear part, so it shouldn't trigger any quadraticness.
It speeds up the testcase the same as the original one.

> I wonder

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Tamar Christina

> -Original Message-
> From: Li, Pan2 
> Sent: Monday, February 19, 2024 12:59 PM
> To: Tamar Christina ; Richard Biener
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Tamar for comments and explanations.
> 
> > I think we should actually do an indirect optab here, because the IFN can
> > be used to replace the general representation of saturating arithmetic.
> 
> > e.g. the __builtin_add_overflow case in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> > is inefficient on all targets and so the IFN can always expand to something
> > that's more efficient like the branchless version add_sat2.
> 
> > I think this is why you suggested a new tree code below, but we don't
> > really need tree-codes for this. It can be done cleaner using the same way
> > as DEF_INTERNAL_INT_EXT_FN
> 
> Yes, the backend could choose a branchless code-gen (of course we always
> hate branches for performance) or, even better, a single saturation insn
> if there is one.
> Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.
> 
> > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the
> > sign should be determined by the types at expansion time.  i.e. there
> > should only be .SAT_ADD.
> 
> Got it. My initial idea came from the fact that we may have two insns for
> saturating add, since these insns mostly need to be signed or unsigned,
> for example slt/sltu in riscv scalar. But I am not very clear about a
> scenario like this: during define_expand in the backend, we hit the standard
> name sat_add_3, but can we tell whether it is signed or not there? AFAIK, we
> only have QI, HI, SI and DI.

Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs,
one for when it's signed and one for when it's unsigned, and the right one is 
picked
automatically during expansion.  But in GIMPLE you'd only have one IFN.
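
As a loose source-level analogy for "one IFN, two optabs" (and only an
analogy: none of the names below exist in GCC), a single sat_add entry
point can pick the signed or unsigned implementation from the operand type
alone:

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Unsigned flavour: a wrapped sum smaller than an operand means overflow.
template<typename T>
constexpr T sat_add_unsigned(T x, T y) {
  return static_cast<T>(x + y) < x ? std::numeric_limits<T>::max()
                                   : static_cast<T>(x + y);
}

// Signed flavour: test against the limits before adding, so the addition
// itself never overflows.
template<typename T>
constexpr T sat_add_signed(T x, T y) {
  return y > 0
    ? (x > std::numeric_limits<T>::max() - y ? std::numeric_limits<T>::max()
                                             : static_cast<T>(x + y))
    : (x < std::numeric_limits<T>::min() - y ? std::numeric_limits<T>::min()
                                             : static_cast<T>(x + y));
}

// Single entry point: the signedness of T selects the implementation,
// much like the type selects the optab at expansion time.
template<typename T>
constexpr T sat_add(T x, T y) {
  static_assert(std::is_integral<T>::value, "integer types only");
  return std::is_signed<T>::value ? sat_add_signed(x, y)
                                  : sat_add_unsigned(x, y);
}

static_assert(sat_add<std::uint8_t>(200, 100) == 255, "");
static_assert(sat_add<std::int8_t>(100, 100) == 127, "");
static_assert(sat_add<std::int8_t>(-100, -100) == -128, "");
```

The caller never spells out "signed" or "unsigned"; that information is
carried by the type.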

> Maybe I will have the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I
> will keep you posted.

Awesome, Thanks!

Tamar
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Monday, February 19, 2024 4:55 PM
> To: Li, Pan2 ; Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks for doing this!
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Monday, February 19, 2024 8:42 AM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > ; kito.ch...@gmail.com; Tamar Christina
> > 
> > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > Thanks Richard for comments.
> >
> > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > > the corresponding ssadd/usadd optabs.  There's not much documentation
> > > unfortunately besides the use of gen_*_fixed_libfunc usage where the
> > > comment suggests this is used for fixed-point operations.  It looks like
> > > arm uses fractional/accumulator modes for this but for example bfin has
> > > ssaddsi3.
> >
> > I found the related description of the plus family in the GCC internals
> > doc, but it doesn't mention anything about mode m here.
> >
> > (plus:m x y)
> > (ss_plus:m x y)
> > (us_plus:m x y)
> > These three expressions all represent the sum of the values represented by x
> > and y carried out in machine mode m. They differ in their behavior on
> > overflow of integer modes. plus wraps round modulo the width of m; ss_plus
> > saturates at the maximum signed value representable in m; us_plus saturates
> > at the maximum unsigned value.
> >
> > > The natural thing is to use direct optab internal functions (that's what
> > > you basically did, but you added a new optab, IMO without good reason).
> 
> I think we should actually do an indirect optab here, because the IFN can be
> used to replace the general representation of saturating arithmetic.
> 
> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something
> that's more efficient like the branchless version add_sat2.
> 
> I think this is why you suggested a new tree code below, but we don't really
> need tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN.
> 
> >
> > That makes sense to me, I will try to leverage US_PLUS instead here.
> >
> > > More GIMPLE-like would be to let the types involved decide whether
> > > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > > and if we don't map 1:1 to optabs then instead use tree codes like
> > > S_PLUS_EXPR (mimicing RTL here).
> >
> > Sorry I don't get the point here for GIMPLE-like way. For the .SAT_ADDU, I 
> > add
> one
> > restriction
> > like

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Li, Pan2

Thanks Tamar for comments and explanations.

> I think we should actually do an indirect optab here, because the IFN can be
> used to replace the general representation of saturating arithmetic.

> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something
> that's more efficient like the branchless version add_sat2.

> I think this is why you suggested a new tree code below, but we don't really
> need tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN

Yes, the backend could choose a branchless code-gen (of course we always
hate branches for performance) or, even better, a single saturation insn
if there is one.
Good to learn about DEF_INTERNAL_INT_EXT_FN; I will give it a try.

> Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the
> sign should be determined by the types at expansion time.  i.e. there should
> only be .SAT_ADD.

Got it. My initial idea came from the fact that we may have two insns for
saturating add, since these insns mostly need to be signed or unsigned,
for example slt/sltu in riscv scalar. But I am not very clear about a
scenario like this: during define_expand in the backend, we hit the standard
name sat_add_3, but can we tell whether it is signed or not there? AFAIK, we
only have QI, HI, SI and DI.
Maybe I will have the answer after trying DEF_INTERNAL_SIGNED_OPTAB_FN; I
will keep you posted.

Pan

-Original Message-
From: Tamar Christina  
Sent: Monday, February 19, 2024 4:55 PM
To: Li, Pan2 ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

Thanks for doing this!

> -Original Message-
> From: Li, Pan2 
> Sent: Monday, February 19, 2024 8:42 AM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com; Tamar Christina
> 
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Richard for comments.
> 
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> I find the related description about plus family in GCC internals doc but it 
> doesn't
> mention
> anything about mode m here.
> 
> (plus:m x y)
> (ss_plus:m x y)
> (us_plus:m x y)
> These three expressions all represent the sum of the values represented by x
> and y carried out in machine mode m. They diff er in their behavior on 
> overflow
> of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> at the maximum signed value representable in m; us_plus saturates at the
> maximum unsigned value.
> 
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).

I think we should actually do an indirect optab here, because the IFN can be 
used
to replace the general representation of saturating arithmetic.

e.g. the __builtin_add_overflow case in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
is inefficient on all targets, so the IFN can always expand to something
that's more efficient, like the branchless version add_sat2.
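
For illustration only, here is a minimal sketch of the two shapes being
contrasted for an unsigned 32-bit saturating add. It is not copied from the
patch or from PR112600; the function names sat_addu_builtin and
sat_addu_branchless are made up for this example, and the branchless form is
only assumed to be in the spirit of add_sat2.

```cpp
#include <cstdint>

// Form 1: detect wrap-around with the GCC/Clang overflow builtin and
// clamp to the maximum value with an explicit branch.
inline uint32_t sat_addu_builtin (uint32_t x, uint32_t y)
{
  uint32_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT32_MAX;
  return sum;
}

// Form 2: branchless.  If the addition wrapped, (sum < x) is true, so
// the negated value is all-ones and the OR clamps to UINT32_MAX.
// Otherwise the mask is zero and the sum is returned unchanged.
inline uint32_t sat_addu_branchless (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -static_cast<uint32_t> (sum < x);
}
```

Both functions return the same value for every input; the point of the IFN
would be to let the compiler pick the cheaper expansion regardless of which
form the user wrote.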

I think this is why you suggested a new tree code below, but we don't really
need tree codes for this.  It can be done more cleanly in the same way as
DEF_INTERNAL_INT_EXT_FN.

> 
> That makes sense to me, I will try to leverage US_PLUS instead here.
> 
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicing RTL here).
> 
> Sorry, I don't get the point of the GIMPLE-like way here. For .SAT_ADDU, I
> added one restriction like unsigned_p (type) in match.pd. It looks like we
> have a better way here.
> 

Richard means that there shouldn't be separate .SAT_ADDU and .SAT_ADDS functions,
and that the sign should be determined by the types at expansion time, i.e. there
should only be .SAT_ADD.

So instead of this

+DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)

You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
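
As a source-level analogy only (this is not GCC internals code), the idea of
one operation whose signedness comes from the operand types can be sketched
with two C++ overloads of a single hypothetical sat_add. Clamping the signed
variant at both the minimum and the maximum is an assumption of this sketch.

```cpp
#include <cstdint>
#include <limits>

// Unsigned operands: clamp at the maximum representable value.
inline uint32_t sat_add (uint32_t x, uint32_t y)
{
  uint32_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return std::numeric_limits<uint32_t>::max ();
  return sum;
}

// Signed operands: overflow is only possible when both operands have
// the same sign, so the sign of x tells us which bound to clamp to.
inline int32_t sat_add (int32_t x, int32_t y)
{
  int32_t sum;
  if (!__builtin_add_overflow (x, y, &sum))
    return sum;
  return x < 0 ? std::numeric_limits<int32_t>::min ()
               : std::numeric_limits<int32_t>::max ();
}
```

The caller writes sat_add (a, b) in both cases; overload resolution on the
operand types plays the role that the type-based optab selection would play
at expansion time.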

Regards,
Tamar

> > Any other opinions?  Anyone knows more about fixed-point and RTL/modes?
> 
> AFAIK, the scalar part of the riscv backend doesn't have fixed-point but the
> vector part does. They share the same modes as vector integers, for example
> RVVM1SI in vector-iterators.md. Kito and Juzhe can correct me if I have
> misunderstood anything.
> 
> Pan
> 
> -Original Message-
>

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford

Richard Biener  writes:
> On Mon, 19 Feb 2024, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > The following tries to address the PHI insertion compile-time hog in
>> > RTL fwprop observed with the PR54052 testcase where the loop computing
>> > the "unfiltered" set of variables possibly needing PHI nodes for each
>> > block exhibits quadratic compile-time and memory-use.
>> >
>> > Instead of only pruning the set of candidate regs by LR_IN in the
>> > second worklist loop do this when computing "unfiltered" already.
>> >
>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >
>> > I'll note that in PR98863 you say in comment#39
>> >
>> > "Just to give an update on this: I have a patch that reduces the
>> > amount of memory consumed by fwprop so that it no longer seems
>> > to be outlier.  However, it involves doing more bitmap operations.
>> > In this testcase we have a larger number of registers that
>> > seem to be live but unused across a large region of code,
>> > so bitmap ANDs with the live in sets are expensive and hit
>> > the worst-case O(nblocks * nregisters).  I'm still trying to find
>> > a way of reducing the effect of that."
>> >
>> > suggesting that the very AND operation I'm introducing below
>> > was an actual problem.  It's just not very clear what testcase
>> > this was on (the PR hasn't one, it just talks about WRF with LTO
>> > and then some individual TUs of it).
>> 
>> Yeah, like you say, I think this kind of AND was exactly the problem.
>> If the DEF set is much smaller than the IN set, we can spend a lot of
>> compile time (and cache) iterating over the leading elements of the
>> IN set.  So this could be trading one hog for another.
>> 
>> Could we use some heuristic to choose between the two?  If the IN set
>> is "sensible", do the AND, otherwise keep it as-is?
>
> Not sure how, I don't think DF caches the set sizes (we could
> compute them, of course).  But I just made an experiment and
> using DF_LR_OUT instead of DF_LR_BB_INFO->def improves compile-time
> as well.  So, incrementally on top of the posted patch:
>
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 9d1cd1b0365..6fd08602c1b 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -645,12 +645,11 @@ function_info::place_phis (build_info )
>if (bitmap_empty_p ([b1]))
> continue;
>  
> -  bitmap b1_def = _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
> b1))->def;
> +  bitmap b1_def = DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1));
>bitmap_iterator bmi;
>unsigned int b2;
>EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
> -   if (bitmap_ior_and_into ([b2], b1_def,
> -DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
> +   if (bitmap_ior_into ([b2], b1_def)
> && !bitmap_empty_p ([b2]))
>   // Propagate the (potential) new phi node definitions in B2.
>   bitmap_set_bit (worklist, b2);
>
> of course that's too big (including live-through), but we could
> prune by DF_LR_OUT like
>
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index 9d1cd1b0365..6a4dd05908f 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -645,12 +645,13 @@ function_info::place_phis (build_info )
>if (bitmap_empty_p ([b1]))
> continue;
>  
> -  bitmap b1_def = _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
> b1))->def;
> +  auto_bitmap b1_def;
> +  bitmap_and (b1_def, _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
> b1))->def,
> + DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
>bitmap_iterator bmi;
>unsigned int b2;
>EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
> -   if (bitmap_ior_and_into ([b2], b1_def,
> -DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
> +   if (bitmap_ior_into ([b2], b1_def)
> && !bitmap_empty_p ([b2]))
>   // Propagate the (potential) new phi node definitions in B2.
>   bitmap_set_bit (worklist, b2);
>
> so for the testcase it seems we have a lot of locally defined but
> not "globally" used defs.
>
> Would that work for you or am I missing something?

I suppose that's better than the first version when a block has a
large number of dominance frontiers.  But I can't remember whether
that was the case in PR98863.  I have a feeling that I tried the above
as part of the PR, since it's the obvious way of applying liveness
once per block rather than once per edge.  But I think it was more
just sheer weight of numbers.
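
To make the two pruning strategies concrete, here is a toy sketch using
64-bit masks instead of GCC bitmaps. Everything in it (regset, block_info,
per_edge, per_block) is hypothetical and only models the initial seeding of
the "unfiltered" sets, not the worklist propagation or the real rtl-ssa code.

```cpp
#include <cstdint>
#include <vector>

using regset = uint64_t;  // one bit per (hypothetical) register number

struct block_info
{
  regset def;                      // registers defined in the block
  regset live_in;                  // registers live on entry
  regset live_out;                 // registers live on exit
  std::vector<unsigned> frontier;  // dominance frontier (block indices)
};

// Per-edge pruning: AND with the live-in set of every frontier block.
std::vector<regset> per_edge (const std::vector<block_info> &blocks)
{
  std::vector<regset> unfiltered (blocks.size (), 0);
  for (const block_info &b1 : blocks)
    for (unsigned b2 : b1.frontier)
      unfiltered[b2] |= b1.def & blocks[b2].live_in;
  return unfiltered;
}

// Per-block pruning: AND with the block's own live-out set once, then
// do a plain OR into each frontier block.
std::vector<regset> per_block (const std::vector<block_info> &blocks)
{
  std::vector<regset> unfiltered (blocks.size (), 0);
  for (const block_info &b1 : blocks)
    {
      regset pruned = b1.def & b1.live_out;
      for (unsigned b2 : b1.frontier)
        unfiltered[b2] |= pruned;
    }
  return unfiltered;
}
```

The per-block form does one AND per block instead of one per frontier edge,
but it can leave more candidates in a frontier block, because a register
that is live out of b1 need not be live into b2.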

I wonder whether we could create a new DEF bitmap on-the-fly while
creating the defs and uses, restricting it to potential PHI registers.
The test for potential PHI registers is constant time, and we could
build the bitmaps as tree views before converting to list views.
Can give it a go if that sounds OK.

Thanks,
Richard

[Patch] libgomp: Device load_image - minor num-funcs/vars check improvement

2024-02-19 Thread Tobias Burnus

When debugging a linker issue, leading to a mismatch in the number of 
host/device functions, I was surprised by seeing one additional entry. 
Well, it turned out to be due to the ICV variable.


This patch makes it more consistent. The "+1" is returned since 
r12-2769-g0bac793ed6bad2 (for the on-device omp_get_device_num), 
extended in r13-2545-g9f2fca56593a2b for a struct to support more ICV 
variables on the devices [to handle OMP_..._DEV environment variables].


As the value is returned unconditionally, it makes sense to use it both 
for the expected-value diagnostic and for the condition further below.
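
The resulting logic can be sketched in isolation as follows. This is a
simplified model, not libgomp code: the addr_pair struct here is a stand-in
with just start and end fields, and load_status and check_image are names
invented for the example.

```cpp
#include <cstdint>

// Stand-in for the table entries returned by the plugin.
struct addr_pair
{
  uintptr_t start;
  uintptr_t end;
};

enum class load_status { ok_with_icv, ok_without_icv, count_mismatch };

// The plugin always reports one extra entry for the ICV struct, so the
// expected count is num_funcs + num_vars + 1.  Whether the ICV variable
// was actually found is then read from the last entry: start == 0 means
// it is absent.
load_status check_image (const addr_pair *table, unsigned num_target_entries,
                         unsigned num_funcs, unsigned num_vars)
{
  if (num_target_entries != num_funcs + num_vars + 1)
    return load_status::count_mismatch;

  const addr_pair &icv = table[num_funcs + num_vars];
  return icv.start != 0 ? load_status::ok_with_icv
                        : load_status::ok_without_icv;
}
```

With this shape, a table that lacks the trailing entry is rejected up front
instead of being tolerated, which is the tightening described above.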


Comments, suggestions, remarks?

Tobias

PS: An alternative would be to make the plugin's value depend on whether
the data was loaded. But that would make the number-of-entries assert
weaker and might cause corner-case issues when a slightly older libgomp
plugin is used with the updated libgomp.so. Thus, I have settled on the
attached variant.

libgomp: Device load_image - improve minor num-funcs/vars check

The run-time library loads the offload functions and variables and optionally
the ICV variable and returns the number of loaded items, which has to match
the host side. The plugin returns "+1" (since GCC 12) for the ICV variable
entry, independently of whether it was loaded or not, but the var's value
(start == end == 0) can be used to detect when this failed.

Thus, we can tighten the assert check - which this commit does together with
making the output less surprising - and simplify the condition further below.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): If ICV variable
	is not available, decrement other_count and thus the return value.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Extend fatal-error message;
	simplify a condition.

 libgomp/target.c | 78 +---
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index 1367e9cce6c..456a9147154 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2355,15 +2355,14 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 num_ind_funcs
   ? (uint64_t *) host_ind_func_table : NULL);
 
-  if (num_target_entries != num_funcs + num_vars
-  /* "+1" due to the additional ICV struct.  */
-  && num_target_entries != num_funcs + num_vars + 1)
+  /* The "+1" is due to the additional ICV struct.  */
+  if (num_target_entries != num_funcs + num_vars + 1)
 {
   gomp_mutex_unlock (>lock);
   if (is_register_lock)
 	gomp_mutex_unlock (_lock);
   gomp_fatal ("Cannot map target functions or variables"
-		  " (expected %u, have %u)", num_funcs + num_vars,
+		  " (expected %u + %u + 1, have %u)", num_funcs, num_vars,
 		  num_target_entries);
 }
 
@@ -2447,48 +2446,41 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   array++;
 }
 
-  /* Last entry is for a ICVs variable.
- Tolerate case where plugin does not return those entries.  */
-  if (num_funcs + num_vars < num_target_entries)
+  /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
+  struct addr_pair *icv_var = _table[num_funcs + num_vars];
+  if (icv_var->start != 0)
 {
-  struct addr_pair *var = _table[num_funcs + num_vars];
-
-  /* Start address will be non-zero for the ICVs variable if
-	 the variable was found in this image.  */
-  if (var->start != 0)
+  /* The index of the devicep within devices[] is regarded as its
+	 'device number', which is different from the per-device type
+	 devicep->target_id.  */
+  int dev_num = (int) (devicep - [0]);
+  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
+  size_t var_size = icv_var->end - icv_var->start;
+  if (var_size != sizeof (struct gomp_offload_icvs))
 	{
-	  /* The index of the devicep within devices[] is regarded as its
-	 'device number', which is different from the per-device type
-	 devicep->target_id.  */
-	  int dev_num = (int) (devicep - [0]);
-	  struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
-	  size_t var_size = var->end - var->start;
-	  if (var_size != sizeof (struct gomp_offload_icvs))
-	{
-	  gomp_mutex_unlock (>lock);
-	  if (is_register_lock)
-		gomp_mutex_unlock (_lock);
-	  gomp_fatal ("offload plugin managed 'icv struct' not of expected "
-			  "format");
-	}
-	  /* Copy the ICVs variable to place on device memory, hereby
-	 actually designating its device number into effect.  */
-	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
-			  var_size, false, NULL);
-	  splay_tree_key k = >key;
-	  k->host_start = (uintptr_t) icvs;
-	  k->host_end =
-	k->host_start + (size_mask & sizeof (struct gomp_offload_icvs));
-	  k->tgt = tgt;
-	  k->tgt_offset = var->start;
-	  k->refcount = REFCOUNT_INFINITY;

[PATCH] IBM Z: Preserve exceptions in autovec-*-signaling-eq.c tests

2024-02-19 Thread Ilya Leoshkevich

DSE, DCE, and other passes are removing redundant signaling comparisons
from these tests, but the whole point is to check that GCC knows how to
emit them.  Use -fno-delete-dead-exceptions to prevent that.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/autovec-double-signaling-eq.c:
Preserve exceptions.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c:
Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c   | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c 
b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index 3645d3cc393..b23568e06b4 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
-fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
-fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c 
b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index d98aa0c494e..cd25d10c577 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
-fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions 
-fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.43.2

Re: CI for "Option handling: add documentation URLs"

2024-02-19 Thread Mark Wielaard

On Sun, 2024-02-18 at 23:58 +0100, Mark Wielaard wrote:
> So I think the regenerate-opt-urls check does work as intended. So
> let's automate it, because it looks like nobody regenerated the
> url.opts after updating the documentation.
> 
> But we should first apply this diff. Could you double check it is
> sane/correct?

And then I forgot to attach the diff. Attached now.
Hopefully it is identical for you after doing
  make html && cd gcc && make regenerate-opt-urls
(It is for me having now done it on a debian and fedora x86_64 setup.)

Cheers,

Mark
diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 5365c8e2bc5..9f97dc61a77 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -88,6 +88,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wabsolute-value)
 Waddress
 UrlSuffix(gcc/Warning-Options.html#index-Waddress)
 
+Waddress-of-packed-member
+UrlSuffix(gcc/Warning-Options.html#index-Waddress-of-packed-member)
+
 Waligned-new
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Waligned-new)
 
@@ -115,6 +118,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Walloc-zero)
 Walloca-larger-than=
 UrlSuffix(gcc/Warning-Options.html#index-Walloca-larger-than_003d) LangUrlSuffix_D(gdc/Warnings.html#index-Walloca-larger-than)
 
+Warith-conversion
+UrlSuffix(gcc/Warning-Options.html#index-Warith-conversion)
+
 Warray-bounds=
 UrlSuffix(gcc/Warning-Options.html#index-Warray-bounds)
 
@@ -122,13 +128,10 @@ Warray-compare
 UrlSuffix(gcc/Warning-Options.html#index-Warray-compare)
 
 Warray-parameter
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)
 
 Warray-parameter=
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
-
-Wzero-length-bounds
-UrlSuffix(gcc/Warning-Options.html#index-Wzero-length-bounds)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)
 
 Wassign-intercept
 UrlSuffix(gcc/Objective-C-and-Objective-C_002b_002b-Dialect-Options.html#index-Wassign-intercept)
@@ -148,9 +151,6 @@ UrlSuffix(gcc/Warning-Options.html#index-Wbool-compare)
 Wbool-operation
 UrlSuffix(gcc/Warning-Options.html#index-Wbool-operation)
 
-Wframe-address
-UrlSuffix(gcc/Warning-Options.html#index-Wframe-address)
-
 Wbuiltin-declaration-mismatch
 UrlSuffix(gcc/Warning-Options.html#index-Wbuiltin-declaration-mismatch) LangUrlSuffix_D(gdc/Warnings.html#index-Wbuiltin-declaration-mismatch)
 
@@ -217,6 +217,12 @@ UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wcatch-value)
 Wchar-subscripts
 UrlSuffix(gcc/Warning-Options.html#index-Wchar-subscripts)
 
+Wclass-conversion
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wclass-conversion)
+
+Wclass-memaccess
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wclass-memaccess)
+
 Wclobbered
 UrlSuffix(gcc/Warning-Options.html#index-Wclobbered)
 
@@ -298,6 +304,12 @@ UrlSuffix(gcc/Warning-Options.html#index-Wdiscarded-qualifiers)
 Wdiv-by-zero
 UrlSuffix(gcc/Warning-Options.html#index-Wdiv-by-zero)
 
+Wdouble-promotion
+UrlSuffix(gcc/Warning-Options.html#index-Wdouble-promotion)
+
+Wduplicate-decl-specifier
+UrlSuffix(gcc/Warning-Options.html#index-Wduplicate-decl-specifier)
+
 Wduplicated-branches
 UrlSuffix(gcc/Warning-Options.html#index-Wduplicated-branches)
 
@@ -307,6 +319,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wduplicated-cond)
 Weffc++
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Weffc_002b_002b)
 
+Welaborated-enum-base
+UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Welaborated-enum-base)
+
 Wempty-body
 UrlSuffix(gcc/Warning-Options.html#index-Wempty-body)
 
@@ -328,12 +343,18 @@ UrlSuffix(gcc/Warning-Options.html#index-Werror) LangUrlSuffix_D(gdc/Warnings.ht
 Wexceptions
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wexceptions)
 
+Wexpansion-to-defined
+UrlSuffix(gcc/Warning-Options.html#index-Wexpansion-to-defined)
+
 Wextra
 UrlSuffix(gcc/Warning-Options.html#index-Wextra) LangUrlSuffix_D(gdc/Warnings.html#index-Wextra)
 
 Wextra-semi
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wextra-semi)
 
+Wflex-array-member-not-at-end
+UrlSuffix(gcc/Warning-Options.html#index-Wflex-array-member-not-at-end)
+
 Wfloat-conversion
 UrlSuffix(gcc/Warning-Options.html#index-Wfloat-conversion)
 
@@ -355,6 +376,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wformat-nonliteral)
 Wformat-overflow
 UrlSuffix(gcc/Warning-Options.html#index-Wformat-overflow)
 
+Wformat-overflow=
+UrlSuffix(gcc/Warning-Options.html#index-Wformat-overflow)
+
 Wformat-security
 UrlSuffix(gcc/Warning-Options.html#index-Wformat-security)
 
@@ -364,6 +388,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wformat-signedness)
 Wformat-truncation
 UrlSuffix(gcc/Warning-Options.html#index-Wformat-truncation)
 
+Wformat-truncation=
+UrlSuffix(gcc/Warning-Options.html#index-Wformat-truncation)
+
 Wformat-y2k
 UrlSuffix(gcc/Warning-Options.html#index-Wformat-y2k)
 
@@ -373,14 +400,8 @@ UrlSuffix(gcc/Warning-Options.html#index-Wformat-zero-length)
 Wformat=

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following tries to address the PHI insertion compile-time hog in
> > RTL fwprop observed with the PR54052 testcase where the loop computing
> > the "unfiltered" set of variables possibly needing PHI nodes for each
> > block exhibits quadratic compile-time and memory-use.
> >
> > Instead of only pruning the set of candidate regs by LR_IN in the
> > second worklist loop do this when computing "unfiltered" already.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > I'll note that in PR98863 you say in comment#39
> >
> > "Just to give an update on this: I have a patch that reduces the
> > amount of memory consumed by fwprop so that it no longer seems
> > to be outlier.  However, it involves doing more bitmap operations.
> > In this testcase we have a larger number of registers that
> > seem to be live but unused across a large region of code,
> > so bitmap ANDs with the live in sets are expensive and hit
> > the worst-case O(nblocks * nregisters).  I'm still trying to find
> > a way of reducing the effect of that."
> >
> > suggesting that the very AND operation I'm introducing below
> > was an actual problem.  It's just not very clear what testcase
> > this was on (the PR hasn't one, it just talks about WRF with LTO
> > and then some individual TUs of it).
> 
> Yeah, like you say, I think this kind of AND was exactly the problem.
> If the DEF set is much smaller than the IN set, we can spend a lot of
> compile time (and cache) iterating over the leading elements of the
> IN set.  So this could be trading one hog for another.
> 
> Could we use some heuristic to choose between the two?  If the IN set
> is "sensible", do the AND, otherwise keep it as-is?

Not sure how, I don't think DF caches the set sizes (we could
compute them, of course).  But I just made an experiment and
using DF_LR_OUT instead of DF_LR_BB_INFO->def improves compile-time
as well.  So, incrementally on top of the posted patch:

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index 9d1cd1b0365..6fd08602c1b 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -645,12 +645,11 @@ function_info::place_phis (build_info )
   if (bitmap_empty_p ([b1]))
continue;
 
-  bitmap b1_def = _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
b1))->def;
+  bitmap b1_def = DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1));
   bitmap_iterator bmi;
   unsigned int b2;
   EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
-   if (bitmap_ior_and_into ([b2], b1_def,
-DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
+   if (bitmap_ior_into ([b2], b1_def)
&& !bitmap_empty_p ([b2]))
  // Propagate the (potential) new phi node definitions in B2.
  bitmap_set_bit (worklist, b2);

of course that's too big (including live-through), but we could
prune by DF_LR_OUT like

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index 9d1cd1b0365..6a4dd05908f 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -645,12 +645,13 @@ function_info::place_phis (build_info )
   if (bitmap_empty_p ([b1]))
continue;
 
-  bitmap b1_def = _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
b1))->def;
+  auto_bitmap b1_def;
+  bitmap_and (b1_def, _LR_BB_INFO (BASIC_BLOCK_FOR_FN (m_fn, 
b1))->def,
+ DF_LR_OUT (BASIC_BLOCK_FOR_FN (m_fn, b1)));
   bitmap_iterator bmi;
   unsigned int b2;
   EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
-   if (bitmap_ior_and_into ([b2], b1_def,
-DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
+   if (bitmap_ior_into ([b2], b1_def)
&& !bitmap_empty_p ([b2]))
  // Propagate the (potential) new phi node definitions in B2.
  bitmap_set_bit (worklist, b2);

so for the testcase it seems we have a lot of locally defined but
not "globally" used defs.

Would that work for you or am I missing something?

Thanks,
Richard.

[committed] d: Add UTF BOM tests to gdc.dg testsuite

2024-02-19 Thread Iain Buclaw

Hi,

This patch checks in a few combinations of UTF BOM/no-BOM tests to the
gdc.dg testsuite.

Some of these are part of the upstream DMD `gdc.test' testsuite, but
they had been omitted because they get mangled by the lib/gdc-utils.exp
helpers when parsing and staging the tests. Translate them over to the
gdc.dg testsuite instead.
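
For readers unfamiliar with the encodings involved, the byte-order marks
that these tests exercise can be recognised from the first bytes of a file.
The sketch below is a generic illustration and is not how the D front end
does it; the bom enum and detect_bom function are invented for the example.

```cpp
#include <cstddef>

enum class bom { none, utf8, utf16be, utf16le, utf32be, utf32le };

// Classify the byte-order mark at the start of a buffer of N bytes.
// The UTF-32 checks come first because the UTF-32LE mark (FF FE 00 00)
// begins with the UTF-16LE mark (FF FE).
bom detect_bom (const unsigned char *p, std::size_t n)
{
  if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
    return bom::utf32be;
  if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
    return bom::utf32le;
  if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    return bom::utf8;
  if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
    return bom::utf16be;
  if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
    return bom::utf16le;
  return bom::none;
}
```

One known limitation of this ordering: a UTF-16LE file whose first character
after the mark is U+0000 would be classified as UTF-32LE.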

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/testsuite/ChangeLog:

* gdc.dg/bom_UTF16BE.d: New test.
* gdc.dg/bom_UTF16LE.d: New test.
* gdc.dg/bom_UTF32BE.d: New test.
* gdc.dg/bom_UTF32LE.d: New test.
* gdc.dg/bom_UTF8.d: New test.
* gdc.dg/bom_characters.d: New test.
* gdc.dg/bom_error_UTF8.d: New test.
* gdc.dg/bom_infer_UTF16BE.d: New test.
* gdc.dg/bom_infer_UTF16LE.d: New test.
* gdc.dg/bom_infer_UTF32BE.d: New test.
* gdc.dg/bom_infer_UTF32LE.d: New test.
* gdc.dg/bom_infer_UTF8.d: New test.
---
 gcc/testsuite/gdc.dg/bom_UTF16BE.d   | Bin 0 -> 300 bytes
 gcc/testsuite/gdc.dg/bom_UTF16LE.d   | Bin 0 -> 300 bytes
 gcc/testsuite/gdc.dg/bom_UTF32BE.d   | Bin 0 -> 556 bytes
 gcc/testsuite/gdc.dg/bom_UTF32LE.d   | Bin 0 -> 556 bytes
 gcc/testsuite/gdc.dg/bom_UTF8.d  |  11 +++
 gcc/testsuite/gdc.dg/bom_characters.d| Bin 0 -> 780 bytes
 gcc/testsuite/gdc.dg/bom_error_UTF8.d|  11 +++
 gcc/testsuite/gdc.dg/bom_infer_UTF16BE.d | Bin 0 -> 298 bytes
 gcc/testsuite/gdc.dg/bom_infer_UTF16LE.d | Bin 0 -> 298 bytes
 gcc/testsuite/gdc.dg/bom_infer_UTF32BE.d | Bin 0 -> 552 bytes
 gcc/testsuite/gdc.dg/bom_infer_UTF32LE.d | Bin 0 -> 552 bytes
 gcc/testsuite/gdc.dg/bom_infer_UTF8.d|  11 +++
 12 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/bom_UTF16BE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_UTF16LE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_UTF32BE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_UTF32LE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_UTF8.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_characters.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_error_UTF8.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_infer_UTF16BE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_infer_UTF16LE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_infer_UTF32BE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_infer_UTF32LE.d
 create mode 100644 gcc/testsuite/gdc.dg/bom_infer_UTF8.d

diff --git a/gcc/testsuite/gdc.dg/bom_UTF16BE.d 
b/gcc/testsuite/gdc.dg/bom_UTF16BE.d
new file mode 100644
index 
..f18cec9b1e597b96e27c8650bac552ccb5bde54e
GIT binary patch
literal 300
zcmY+9%?iRm41~Y4;5#fmwc_7dkG_T%-L6{cAGM-Te06np+kz5ynCv8z}60;*=6SPcuE2mmY*
zoRV8mEEf(^4KwD#Wr*a*d-Nz&=Xor5KeG#H)Z^oSLL^tGjq`BVL)eI??A0Hszu$c9
QZB*OpLchIXJ*eUOFJQ(j-2eap

literal 0
HcmV?d1

diff --git a/gcc/testsuite/gdc.dg/bom_UTF16LE.d 
b/gcc/testsuite/gdc.dg/bom_UTF16LE.d
new file mode 100644
index 
..e79a4ddbce1617edc38aa2b8aec61698d200461e
GIT binary patch
literal 300
zcmY+9%?iRm41~Y4;5#fmwW8Lu9{U$vtW9aoV
z%Mu$SJVOU(A{5r$q~yP3U5qaGLk69TzfZCv=f9>P{UW3T=|{ln%{
RZ>!o)7rN~Yn^7IE{{oAqEZqPA

literal 0
HcmV?d1

diff --git a/gcc/testsuite/gdc.dg/bom_UTF32BE.d 
b/gcc/testsuite/gdc.dg/bom_UTF32BE.d
new file mode 100644
index 
..eaf3b04b458fe51bf8b3726af5970ded60f19ad6
GIT binary patch
literal 556
zcmaLUxvD}j5P;#-KE+_A;#O;UU?I!3`G_#8+4UUrr$0!r~*zoS9^jlhW&*QtDNL
z@d3vfmgrRT17lzc=Q|v+#ujq~o~Xg^=DE)mWsdO)Hn7e;FBrmF8Nb80^Aq-H;15j<
zV6Hv*?-}nP0{itgX%cn}0^GF}}hW)&^Q=SMx4o=GkSh)i8#A*_}0JB|!
z?ZI#62JFlHJicIUZR+3rGg6K56~?eO&_cVKcNr7U?+z3BGxE_epWjwZ>k3U|I
uuje|s_U1eIj!OUIR?Y3%xbI!U`^iMzHA1qD)

literal 0
HcmV?d1

diff --git a/gcc/testsuite/gdc.dg/bom_UTF8.d b/gcc/testsuite/gdc.dg/bom_UTF8.d
new file mode 100644
index 000..f3e8af4eb38
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/bom_UTF8.d
@@ -0,0 +1,11 @@
+// { dg-do compile }
+module object;
+
+extern(C):
+int printf(const char *, ...);
+
+int main()
+{
+printf("hello world\n");
+return 0;
+}
diff --git a/gcc/testsuite/gdc.dg/bom_characters.d 
b/gcc/testsuite/gdc.dg/bom_characters.d
new file mode 100644
index 
..4b42b4c611ba7b2746db7df9f3ba19c760952ae9
GIT binary patch
literal 780
zcmbW##v3kP)nCCW)IL=_nmD43E|lBDznNEhmHjQV!*%7S~nKoyZSzRHXYqMsV_nyRa
iM|{5-WY~psx#)gdVZWN$q}}zS{5#*~AAZbl|LhCNt~kg5

literal 0
HcmV?d1

diff --git a/gcc/testsuite/gdc.dg/bom_error_UTF8.d 
b/gcc/testsuite/gdc.dg/bom_error_UTF8.d
new file mode 100644
index 000..0e47e59bda3
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/bom_error_UTF8.d
@@ -0,0 +1,11 @@
+// { dg-do compile }
+module object; // { dg-error "character 0xfeff is not a valid token" }
+
+extern(C):
+int printf(const char *, ...);
+
+int main()
+{
+printf("hello world\n");
+

RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]

2024-02-19 Thread Tamar Christina

> -Original Message-
> From: Tamar Christina
> Sent: Thursday, February 15, 2024 11:05 AM
> To: Richard Earnshaw (lists) ; gcc-
> patc...@gcc.gnu.org
> Cc: nd ; Marcus Shawcroft ; Kyrylo
> Tkachov ; Richard Sandiford
> 
> Subject: RE: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
> 
> > -Original Message-
> > From: Richard Earnshaw (lists) 
> > Sent: Thursday, February 15, 2024 11:01 AM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: nd ; Marcus Shawcroft ;
> Kyrylo
> > Tkachov ; Richard Sandiford
> > 
> > Subject: Re: [PATCH]AArch64: xfail modes_1.f90 [PR107071]
> >
> > On 15/02/2024 10:57, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This test has never worked on AArch64 since the day it was committed.  It 
> > > has
> > > a number of issues that prevent it from working on AArch64:
> > >
> > > 1.  IEEE does not require that FP operations raise a SIGFPE for FP 
> > > operations,
> > >     only that an exception is raised somehow.
> > >
> > > 2. Most Arm designed cores don't raise SIGFPE and instead set a status 
> > > register
> > >    and some partner cores raise a SIGILL instead.
> > >
> > > 3. The way it checks for feenableexcept doesn't really work for AArch64.
> > >
> > > As such this test doesn't seem to really provide much value on AArch64 so 
> > > we
> > > should just xfail it.
> > >
> > > Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > Wouldn't it be better to just skip the test.  XFAIL just adds clutter to 
> > verbose
> output
> > and suggests that someday the tools might be fixed for this case.
> >
> > Better still would be a new dg-requires fp_exceptions_raise_sigfpe as a 
> > guard for
> > the test.
> 

It looks like this is similar to 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 so
I'll just similarly skip it.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90 
b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
index 
205c47f38007d06116289c19d6b23cf3bf83bd48..e29d8c678e6e51c3f2e5dac53c7703bb18a99ac4
 100644
--- a/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
+++ b/gcc/testsuite/gfortran.dg/ieee/modes_1.f90
@@ -1,5 +1,5 @@
 ! { dg-do run }
-!
+! { dg-skip-if "PR libfortran/78314" { aarch64*-*-gnu* arm*-*-gnueabi 
arm*-*-gnueabihf } }
 ! Test IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES
 
Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR fortran/107071
* gfortran.dg/ieee/modes_1.f90: skip aarch64, arm.


rb18274.patch
Description: rb18274.patch

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford

Richard Biener  writes:
> The following tries to address the PHI insertion compile-time hog in
> RTL fwprop observed with the PR54052 testcase where the loop computing
> the "unfiltered" set of variables possibly needing PHI nodes for each
> block exhibits quadratic compile-time and memory-use.
>
> Instead of only pruning the set of candidate regs by LR_IN in the
> second worklist loop do this when computing "unfiltered" already.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> I'll note that in PR98863 you say in comment#39
>
> "Just to give an update on this: I have a patch that reduces the
> amount of memory consumed by fwprop so that it no longer seems
> to be outlier.  However, it involves doing more bitmap operations.
> In this testcase we have a larger number of registers that
> seem to be live but unused across a large region of code,
> so bitmap ANDs with the live in sets are expensive and hit
> the worst-case O(nblocks * nregisters).  I'm still trying to find
> a way of reducing the effect of that."
>
> suggesting that the very AND operation I'm introducing below
> was an actual problem.  It's just not very clear what testcase
> this was on (the PR hasn't one, it just talks about WRF with LTO
> and then some individual TUs of it).

Yeah, like you say, I think this kind of AND was exactly the problem.
If the DEF set is much smaller than the IN set, we can spend a lot of
compile time (and cache) iterating over the leading elements of the
IN set.  So this could be trading one hog for another.

Could we use some heuristic to choose between the two?  If the IN set
is "sensible", do the AND, otherwise keep it as-is?

> Indeed the patch doesn't do anything about quadraticness but
> it seems to effectively reduce the size of 'unfiltered' (but
> bringing in LR_IN into the workset).

Yeah, this is always going to be O(blocks * registers) in the worst case.

[Was just in the process of replying to the bugzilla ticket when
this patch arrived :) ]

Thanks,
Richard

[PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-02-19 Thread Ajit Agarwal

Hello All:

Changes in V3 since V2 patch.

The following changes are made in this patch.

a) Remove commented asserted code in rtl-ssa/changes.cc
b) Handle such code in rs6000-vecload-fusion.cc.

Same as V2:

Common infrastructure using generic code for load store fusion on the rs6000
target.

Generic code is implemented and defined so that it can be used in
target-specific code for the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The code uses pure virtual functions to interface with the target code.

The target-specific code is added in rs6000-vecload-fusion.cc, which uses the
generic code.
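As a rough illustration of the "generic driver plus pure virtual target hooks" split described above, here is a minimal sketch.  Every name in it (mem_access, pair_fusion_sketch, the two hooks, example_target_fusion) is hypothetical and chosen for this example only; the real interface lives in pair-fusion-base.h and is not reproduced here:

```cpp
#include <cstdint>

// Hypothetical description of one memory access; not a type from the patch.
struct mem_access
{
  std::int64_t offset;  // byte offset from a common base register
  unsigned size;        // access size in bytes
  bool is_load;
};

// Generic side: decides whether two accesses are adjacent, then asks the
// target whether such a pair may actually be fused.
class pair_fusion_sketch
{
public:
  virtual ~pair_fusion_sketch () = default;

  // Target hooks (illustrative names).
  virtual bool pair_operand_size_ok (unsigned size) const = 0;
  virtual bool pair_offset_ok (std::int64_t offset) const = 0;

  // Target-independent check built on top of the hooks.
  bool can_fuse (const mem_access &a, const mem_access &b) const
  {
    if (a.is_load != b.is_load || a.size != b.size)
      return false;
    if (b.offset - a.offset != static_cast<std::int64_t> (a.size))
      return false;
    return pair_operand_size_ok (a.size) && pair_offset_ok (a.offset);
  }
};

// A made-up target: 16-byte accesses whose first offset is 32-byte aligned.
class example_target_fusion : public pair_fusion_sketch
{
public:
  bool pair_operand_size_ok (unsigned size) const override
  { return size == 16; }

  bool pair_offset_ok (std::int64_t offset) const override
  { return offset % 32 == 0; }
};
```

The point of the split is that can_fuse never needs to know which target it runs on; a second target would only supply its own overrides of the two hooks.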

Bootstrapped and regtested on powerpc64-linux-gnu.

Also ran the SPEC CPU2017 benchmarks.

rs6000: Load store fusion for rs6000 target using common infrastructure

Common infrastructure using generic code for load store fusion on the rs6000
target.

Generic code is implemented and defined so that it can be used in
target-specific code for the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The code uses pure virtual functions to interface with the target code.

The target-specific code is added in rs6000-vecload-fusion.cc, which uses the
generic code.

2024-02-19  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New vecload fusion pass
before pass_early_remat.
* config/rs6000/rs6000-vecload-fusion.cc: Add new pass.
Add target specific implementation using pure virtual
functions.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
fusion pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload fusion
pass.
* config/rs6000/t-rs6000: Add new rule.
* pair-fusion-base.h: Generic header code for load store fusion
that can be shared across different architectures.
* pair-fusion-common.cc: Generic source code for load store
fusion that can be shared across different architectures.
* pair-fusion.cc: Generic implementation of pair_fusion class
defined in pair-fusion-base.h
* Makefile.in: Add new executable pair-fusion.o and
pair-fusion-common.o.
* rtl-ssa/accesses.h: Moved set_is_live_out_use as public
from private.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload-fusion.C: New test.
* g++.target/powerpc/vecload-fusion_1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/Makefile.in   |2 +
 gcc/config.gcc|4 +-
 gcc/config/rs6000/rs6000-passes.def   |4 +-
 gcc/config/rs6000/rs6000-protos.h |1 +
 gcc/config/rs6000/rs6000-vecload-fusion.cc|  701 ++
 gcc/config/rs6000/rs6000.cc   |1 +
 gcc/config/rs6000/t-rs6000|5 +
 gcc/pair-fusion-base.h|  586 
 gcc/pair-fusion-common.cc | 1202 
 gcc/pair-fusion.cc| 1225 +
 gcc/rtl-ssa/accesses.h|2 +-
 .../g++.target/powerpc/vecload-fusion.C   |   15 +
 .../g++.target/powerpc/vecload-fusion_1.C |   22 +
 .../gcc.target/powerpc/mma-builtin-1.c|4 +-
 14 files changed, 3768 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-fusion.cc
 create mode 100644 gcc/pair-fusion-base.h
 create mode 100644 gcc/pair-fusion-common.cc
 create mode 100644 gcc/pair-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload-fusion.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload-fusion_1.C

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..df5061ddfe7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1563,6 +1563,8 @@ OBJS = \
ipa-strub.o \
ipa.o \
ira.o \
+   pair-fusion-common.o \
+   pair-fusion.o \
ira-build.o \
ira-costs.o \
ira-conflicts.o \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index a0f9c672308..d696d826df8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-vecload-fusion.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -555,7 +555,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Richard Biener

On Mon, 19 Feb 2024, Thomas Schwinge wrote:

> Hi!
> 
> On 2024-02-16T14:53:04+0100, I wrote:
> > On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
> >> On 16/02/2024 12:26, Richard Biener wrote:
> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>  On 16/02/2024 10:17, Richard Biener wrote:
> > On Fri, 16 Feb 2024, Thomas Schwinge wrote:
> >> On 2023-10-20T12:51:03+0100, Andrew Stubbs  
> >> wrote:
> >>> I've committed this patch
> >>
> >> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
> >> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later 
> >> RDNA3/gfx1100
> >> support builds on top of, and that's what I'm currently working on
> >> getting proper GCC/GCN target (not offloading) results for.
> >>
> >> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably 
> >> simple,
> >> and hopefully representative for other SLP execution test FAILs
> >> (regressions compared to my earlier non-gfx1100 testing).
> >>
> >>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
> >>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> >>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
> >>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
> >> -fno-common
> >>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
> >>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
> >>   source-gcc/newlib/libc/include
> >>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
> >>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
> >>   setarch,--addr-no-randomize -fdump-tree-all-all 
> >> -fdump-ipa-all-all
> >>   -fdump-rtl-all-all -save-temps -march=gfx1100
> >>
> >> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
> >> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
> >> suppose will also exhibit the same failure mode, once again?
> >>
> >> Compared to '-march=gfx90a', the differences begin in
> >> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
> >>
> >> Changed like:
> >>
> >>   @@ -38,10 +38,10 @@ int main ()
> >>#pragma GCC novector
> >>  for (i = 1; i < N; i++)
> >>if (a[i] != i%4 + 1)
> >>   -  abort ();
> >>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
> >>
> >>  if (a[0] != 5)
> >>   -abort ();
> >>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
> >>
> >> ..., we see:
> >>
> >>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
> >>   40 5 != 1
> >>   41 6 != 2
> >>   42 7 != 3
> >>   43 8 != 4
> >>   44 5 != 1
> >>   45 6 != 2
> >>   46 7 != 3
> >>   47 8 != 4
> >>
> >> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
> >> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
> >> scribbled zero values over these (vector lane masking issue, perhaps?),
> >> or some other code generation issue?
> >
>  [...], I must be doing something different because vect/bb-slp-cond-1.c
>  passes for me, on gfx1100.
> >
> > That's strange.  I've looked at your log file (looks good), and used your
> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
> >
> > $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> > GCN Kernel Aborted
> > Kernel aborted
> >
> > Andrew, later on, please try what happens when you put an unconditional
> > 'abort' call into a test case?
> 
> Andrew, any luck with that yet?
> 
> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
> execution test failure mentioned above (manual compilation and
> 'gcn-run')?

No, when manually compiling/running the testcase it works fine for me.
I didn't get to try the .exp files yet.

Richard.

> 
> Grüße
>  Thomas
> 
> 
> >>> I didn't try to run it - when doing make check-gcc fails to using
> >>> gcn-run for test invocation
> >
> > Note, that for such individual test cases, invoking the compiler and then
> > 'gcn-run' manually would seem easiest?
> >
> >>> what's the trick to make it do that?
> >
> > I tell you've probably not done much "embedded" or simulator testing of
> > GCC targets?  ;-P
> >
> >> There's a config file for nvptx here: 
> >> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
> >
> > Yes, and I have pending some updates to that one, to be finished once
> > I've generally got my testing set up again, to a sufficient degree...
> >
> >> You can probably make the obvious adjustments. I think Thomas has a GCN 
> >> version with a few more features.
> >
> > Right.  I'm attaching my current 'amdgcn-amdhsa-run.exp'.
> >
> > I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
> > (as Andrew also noted

[PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Biener

The following tries to address the PHI insertion compile-time hog in
RTL fwprop observed with the PR54052 testcase where the loop computing
the "unfiltered" set of variables possibly needing PHI nodes for each
block exhibits quadratic compile-time and memory-use.

Instead of only pruning the set of candidate regs by LR_IN in the
second worklist loop, do this already when computing "unfiltered".
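To make the difference between the two set operations concrete, here is a small model using std::set in place of GCC bitmaps.  ior_into and ior_and_into below only mimic the documented semantics of bitmap_ior_into (A |= B) and bitmap_ior_and_into (A |= B & C), each returning whether A changed; they are a sketch and not the GCC implementations:

```cpp
#include <set>

// Stand-in for a GCC bitmap of register numbers.
using regset = std::set<unsigned>;

// Model of bitmap_ior_into: DST |= SRC.  Returns true if DST changed.
bool ior_into (regset &dst, const regset &src)
{
  bool changed = false;
  for (unsigned reg : src)
    changed |= dst.insert (reg).second;
  return changed;
}

// Model of bitmap_ior_and_into: DST |= (A & B).  Returns true if DST changed.
bool ior_and_into (regset &dst, const regset &a, const regset &b)
{
  bool changed = false;
  for (unsigned reg : a)
    if (b.count (reg) != 0)
      changed |= dst.insert (reg).second;
  return changed;
}
```

With defs {1, 2, 3, 4} and a live-in set {2, 4, 9}, the plain union records all four registers for the block, while the filtered variant records only 2 and 4, which is why 'unfiltered' stays smaller.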

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'll note that in PR98863 you say in comment#39

"Just to give an update on this: I have a patch that reduces the
amount of memory consumed by fwprop so that it no longer seems
to be outlier.  However, it involves doing more bitmap operations.
In this testcase we have a larger number of registers that
seem to be live but unused across a large region of code,
so bitmap ANDs with the live in sets are expensive and hit
the worst-case O(nblocks*nregisters).  I'm still trying to find
a way of reducing the effect of that."

suggesting that the very AND operation I'm introducing below
was an actual problem.  It's just not very clear what testcase
this was on (the PR hasn't one, it just talks about WRF with LTO
and then some individual TUs of it).

Indeed, the patch doesn't do anything about the quadratic behavior, but
it seems to effectively reduce the size of 'unfiltered' (while
bringing LR_IN into the working set).

For the PR54052 testcase this improves compile-time from 1420s
to 64s and slightly reduces peak memory use (I would have expected
more but didn't do any statistics on the 'unfiltered' bitmaps
themselves).

OK for trunk and branches if tests go well?

Thanks,
Richard.

PR rtl-optimization/54052
* rtl-ssa/blocks.cc (function_info::place_phis): Filter
unfiltered by LR_IN.
---
 gcc/rtl-ssa/blocks.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index 8996443e8d5..9d1cd1b0365 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -649,7 +649,8 @@ function_info::place_phis (build_info )
   bitmap_iterator bmi;
   unsigned int b2;
   EXECUTE_IF_SET_IN_BITMAP ([b1], 0, b2, bmi)
-   if (bitmap_ior_into ([b2], b1_def)
+   if (bitmap_ior_and_into ([b2], b1_def,
+DF_LR_IN (BASIC_BLOCK_FOR_FN (m_fn, b2)))
&& !bitmap_empty_p ([b2]))
  // Propagate the (potential) new phi node definitions in B2.
  bitmap_set_bit (worklist, b2);
-- 
2.35.3

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-19 Thread Thomas Schwinge

Hi!

On 2024-02-16T14:53:04+0100, I wrote:
> On 2024-02-16T12:41:06+, Andrew Stubbs  wrote:
>> On 16/02/2024 12:26, Richard Biener wrote:
>>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
 On 16/02/2024 10:17, Richard Biener wrote:
> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> On 2023-10-20T12:51:03+0100, Andrew Stubbs  wrote:
>>> I've committed this patch
>>
>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>> support builds on top of, and that's what I'm currently working on
>> getting proper GCC/GCN target (not offloading) results for.
>>
>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>> and hopefully representative for other SLP execution test FAILs
>> (regressions compared to my earlier non-gfx1100 testing).
>>
>>   $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>>   source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>>   --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>>   -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common
>>   -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem
>>   build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>>   source-gcc/newlib/libc/include
>>   -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>>   -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>>   setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all
>>   -fdump-rtl-all-all -save-temps -march=gfx1100
>>
>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>> suppose will also exhibit the same failure mode, once again?
>>
>> Compared to '-march=gfx90a', the differences begin in
>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>>
>> Changed like:
>>
>>   @@ -38,10 +38,10 @@ int main ()
>>#pragma GCC novector
>>  for (i = 1; i < N; i++)
>>if (a[i] != i%4 + 1)
>>   -  abort ();
>>   +  __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>>
>>  if (a[0] != 5)
>>   -abort ();
>>   +__builtin_printf("%d %d != %d\n", 0, a[0], 5);
>>
>> ..., we see:
>>
>>   $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>>   40 5 != 1
>>   41 6 != 2
>>   42 7 != 3
>>   43 8 != 4
>>   44 5 != 1
>>   45 6 != 2
>>   46 7 != 3
>>   47 8 != 4
>>
>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> 'a[i * stride + 0..3] != 0'.  So, either some earlier iteration has
>> scribbled zero values over these (vector lane masking issue, perhaps?),
>> or some other code generation issue?
>
 [...], I must be doing something different because vect/bb-slp-cond-1.c
 passes for me, on gfx1100.
>
> That's strange.  I've looked at your log file (looks good), and used your
> toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>
> $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
> GCN Kernel Aborted
> Kernel aborted
>
> Andrew, later on, please try what happens when you put an unconditional
> 'abort' call into a test case?

Andrew, any luck with that yet?

Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
execution test failure mentioned above (manual compilation and
'gcn-run')?


Grüße
 Thomas


>>> I didn't try to run it - when doing make check-gcc fails to using
>>> gcn-run for test invocation
>
> Note, that for such individual test cases, invoking the compiler and then
> 'gcn-run' manually would seem easiest?
>
>>> what's the trick to make it do that?
>
> I tell you've probably not done much "embedded" or simulator testing of
> GCC targets?  ;-P
>
>> There's a config file for nvptx here: 
>> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
>
> Yes, and I have pending some updates to that one, to be finished once
> I've generally got my testing set up again, to a sufficient degree...
>
>> You can probably make the obvious adjustments. I think Thomas has a GCN 
>> version with a few more features.
>
> Right.  I'm attaching my current 'amdgcn-amdhsa-run.exp'.
>
> I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
> (as Andrew also noted privately) -- likewise, at least in part, for
> GCC/nvptx, which is where I copied all that from.  (Will revise later;
> not relevant for this discussion, here.)
>
> Similar to what I've recently added to libgomp, there is 'flock'ing here,
> so that you may use 'make -j[...] check' for (partial) parallelism, but
> still all execution testing runs serialized.  I found this to greatly
> help denoise the test results.  (Not ideal, of course,

Re: [PATCH] arm: Fixed C23 call compatibility with arm-none-eabi

2024-02-19 Thread Andrew Pinski

On Mon, Feb 19, 2024 at 1:14 AM Torbjörn SVENSSON
 wrote:
>
> Ok for trunk and releases/gcc-13?
> Regtested on top of 945cb8490cb for arm-none-eabi, without any regression.
>
> Backporting to releases/gcc-13 will change -std=c23 to -std=c2x.
>
> --
>
> In commit 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a, -std=c23 support was
> introduced to support functions without any named arguments.  For
> arm-none-eabi, this is not as simple as placing all arguments on the
> stack.  Align the caller to use r0, r1, r2 and r3 for arguments even for
> functions without any named arguments, as specified in the AAPCS.
>
> Verify in the generic test case that the arguments are in the right
> order, and add ARM-specific test cases.
>
> gcc/ChangeLog:
>
> * calls.h: Added the type of the function to function_arg_info.
> * calls.cc: Save the type of the function.
> * config/arm/arm.cc: Check in the AAPCS layout function if
> function has no named args.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/c23-stdarg-split-1a.c: Detect out of order
> arguments.
> * gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.

It is almost always better to add a new testcase for the expanded idea
of the test rather than modifying the original.

Thanks,
Andrew Pinski

> * gcc.target/arm/aapcs/align_vaarg3.c: New test.
> * gcc.target/arm/aapcs/align_vaarg4.c: New test.
>
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  gcc/calls.cc  |  2 +-
>  gcc/calls.h   | 20 --
>  gcc/config/arm/arm.cc | 13 ---
>  .../gcc.dg/torture/c23-stdarg-split-1a.c  |  4 +-
>  .../gcc.dg/torture/c23-stdarg-split-1b.c  | 15 +---
>  .../gcc.target/arm/aapcs/align_vaarg3.c   | 37 +++
>  .../gcc.target/arm/aapcs/align_vaarg4.c   | 31 
>  7 files changed, 102 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg4.c
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 01f44734743..a1cc283b952 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1376,7 +1376,7 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>  with those made by function.cc.  */
>
>/* See if this argument should be passed by invisible reference.  */
> -  function_arg_info arg (type, argpos < n_named_args);
> +  function_arg_info arg (type, fntype, argpos < n_named_args);
>if (pass_by_reference (args_so_far_pnt, arg))
> {
>   const bool callee_copies
> diff --git a/gcc/calls.h b/gcc/calls.h
> index 464a4e34e33..88836559ebe 100644
> --- a/gcc/calls.h
> +++ b/gcc/calls.h
> @@ -35,24 +35,33 @@ class function_arg_info
>  {
>  public:
>function_arg_info ()
> -: type (NULL_TREE), mode (VOIDmode), named (false),
> +: type (NULL_TREE), fntype (NULL_TREE), mode (VOIDmode), named (false),
>pass_by_reference (false)
>{}
>
>/* Initialize an argument of mode MODE, either before or after promotion.  
> */
>function_arg_info (machine_mode mode, bool named)
> -: type (NULL_TREE), mode (mode), named (named), pass_by_reference (false)
> +: type (NULL_TREE), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>
>/* Initialize an unpromoted argument of type TYPE.  */
>function_arg_info (tree type, bool named)
> -: type (type), mode (TYPE_MODE (type)), named (named),
> +: type (type), fntype (NULL_TREE), mode (TYPE_MODE (type)), named 
> (named),
>pass_by_reference (false)
>{}
>
> +  /* Initialize an unpromoted argument of type TYPE with a known function 
> type
> + FNTYPE.  */
> +  function_arg_info (tree type, tree fntype, bool named)
> +: type (type), fntype (fntype), mode (TYPE_MODE (type)), named (named),
> +pass_by_reference (false)
> +  {}
> +
>/* Initialize an argument with explicit properties.  */
>function_arg_info (tree type, machine_mode mode, bool named)
> -: type (type), mode (mode), named (named), pass_by_reference (false)
> +: type (type), fntype (NULL_TREE), mode (mode), named (named),
> +pass_by_reference (false)
>{}
>
>/* Return true if the gimple-level type is an aggregate.  */
> @@ -96,6 +105,9 @@ public:
>   libgcc support functions).  */
>tree type;
>
> +  /* The type of the function that has this argument, or null if not known.  
> */
> +  tree fntype;
> +
>/* The mode of the argument.  Depending on context, this might be
>   the mode of the argument type or the mode after promotion.  */
>machine_mode mode;
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 1cd69268ee9..98e149e5b7e 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -7006,7 +7006,7 @@ aapcs_libcall_value

Re: [PATCH v2] testsuite, arm: Fix testcase arm/pr112337.c to check for the options first

2024-02-19 Thread Saurabh Jha



On 2/9/2024 2:57 PM, Richard Earnshaw (lists) wrote:

On 30/01/2024 17:07, Saurabh Jha wrote:

Hey,

Previously, this test was added to fix this bug: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112337. However, it did not check 
the compilation options before using them, leading to errors.

This patch fixes the test by first checking whether it can use the options 
before using them.

Tested for arm-none-eabi and found no regressions. The output of check-gcc with 
RUNTESTFLAGS="arm.exp=*" changed like this:

Before:
# of expected passes  5963
# of unexpected failures  64

After:
# of expected passes  5964
# of unexpected failures  63

Ok for master?

Regards,
Saurabh

gcc/testsuite/ChangeLog:

 * gcc.target/arm/pr112337.c: Check whether we can use the compilation 
options before using them.

My apologies for missing this earlier.  It didn't show up in patchwork. That's 
most likely because the attachment is a binary blob instead of text/plain.  
That also means that the Linaro CI system hasn't seen this patch either.  
Please can you fix your mailer to add plain text patch files.

-/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } */
+/* { dg-require-effective-target arm_hard_ok } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-options "-O2 -mfloat-abi=hard" } */
+/* { dg-add-options arm_v8_1m_mve } */

This is moving in the right direction, but it adds more than necessary now: 
checking for, and adding -mfloat-abi=hard is not necessary any more as 
arm_v8_1m_mve_ok will work out what float-abi flags are needed to make the 
options work. (What's more, it will prevent the test from running if the base 
configuration of the compiler is incompatible with the hard float ABI, which is 
more than we need.)

So please can you re-spin removing the hard-float check and removing that from 
dg-options.

Thanks,
R.


Hi Richard,

Agreed with your comments. Please find the patch with the suggested 
changes attached.


Regards,

Saurabh

From 1c92c94074449929f40cea99a6450bcde3aec12f Mon Sep 17 00:00:00 2001
From: Saurabh Jha 
Date: Tue, 30 Jan 2024 15:03:36 +
Subject: [PATCH] Fix testcase pr112337.c to check the options first

---
 gcc/testsuite/gcc.target/arm/pr112337.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr112337.c 
b/gcc/testsuite/gcc.target/arm/pr112337.c
index 5dacf0aa4f8..10b7881b9f9 100644
--- a/gcc/testsuite/gcc.target/arm/pr112337.c
+++ b/gcc/testsuite/gcc.target/arm/pr112337.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv8.1-m.main+fp.dp+mve.fp -mfloat-abi=hard" } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
 
 #pragma GCC arm "arm_mve_types.h"
 int32x4_t h(void *p) { return __builtin_mve_vldrwq_sv4si(p); }
-- 
2.43.0

Re: [PATCH] aarch64, acle header: Cast uint64_t pointers to DIMode.

2024-02-19 Thread Richard Sandiford

Iain Sandoe  writes:
>> On 15 Feb 2024, at 18:05, Richard Sandiford  
>> wrote:
>> 
>> Iain Sandoe  writes:
 On 5 Feb 2024, at 14:56, Iain Sandoe  wrote:
 
 Tested on aarch64-linux,darwin and a cross from aarch64-darwin to linux,
 OK for trunk, or some alternative is needed?
>>> 
>>> Hmm.. apparently, this fails the linaro pre-commit CI for g++ with:
>>> error: invalid conversion from 'long int*' to 'long unsigned int*' 
>>> [-fpermissive]
>>> 
>>> So, I guess some alternative is needed, advice welcome,
>> 
>> The builtins are registered with:
>> 
>> static void
>> aarch64_init_rng_builtins (void)
>> {
>>  tree unsigned_ptr_type = build_pointer_type (unsigned_intDI_type_node);
>>  ...
>> 
>> Does it work if you change unsigned_intDI_type_node to
>> get_typenode_from_name (UINT64_TYPE)?
>
> Yes, that works fine; tested on aarch64-linux and aarch64-darwin.
>
> revised, as below,
> OK for trunk?
> Iain
>
>
> Subject: [PATCH] aarch64: Register rng builtins with uint64_t pointers.
>
> Currently, these are registered as unsigned_intDI_type_node which is not
> necessarily the same type definition as uint64_t.  On platforms where these
> differ, that causes failures when consuming the arm_acle.h header.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (aarch64_init_rng_builtins):
>   Register these builtins with a pointer to uint64_t rather than unsigned
>   DI mode.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-builtins.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index e211a7271ba..1330558f109 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1759,7 +1759,8 @@ aarch64_init_tme_builtins (void)
>  static void
>  aarch64_init_rng_builtins (void)
>  {
> -  tree unsigned_ptr_type = build_pointer_type (unsigned_intDI_type_node);
> +  tree unsigned_ptr_type
> += build_pointer_type (get_typenode_from_name (UINT64_TYPE));
>tree ftype
>  = build_function_type_list (integer_type_node, unsigned_ptr_type, NULL);
>aarch64_builtin_decls[AARCH64_BUILTIN_RNG_RNDR]
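
The type-identity point in the quoted commit message can be illustrated in plain C++.  This is only a sketch: fake_rndr is a made-up stand-in for a builtin with a fixed pointer parameter type, and which concrete types uint64_t and the unsigned DI type map to on each platform is not stated in the thread:

```cpp
#include <cstdint>
#include <type_traits>

// Two integer types can have the same width and still be distinct types,
// so pointers to them do not convert implicitly.
static_assert (!std::is_same<unsigned long, unsigned long long>::value,
               "unsigned long and unsigned long long are distinct types");

// Made-up stand-in for a builtin registered with one specific 64-bit type.
inline int fake_rndr (unsigned long long *out)
{
  *out = 42;
  return 0;
}

// True only on platforms where uint64_t is spelled unsigned long long,
// i.e. where a uint64_t * could be passed to fake_rndr without a cast.
constexpr bool uint64_ptr_accepted
  = std::is_same<std::uint64_t *, unsigned long long *>::value;
```

Registering the builtin with whatever type uint64_t names on the target, as the revised patch does, avoids depending on that platform-specific spelling.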

[PATCH] Do not emulate vectors containing floats.

2024-02-19 Thread Juergen Christ

Fixes various test failures on s390x.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_operation): Don't emulate floating
  point vectors

Signed-off-by: Juergen Christ 

Regtested and bootstrapped on x86_64-pc-linux-gnu and
s390x-ibm-linux-gnu.  Okay for trunk?
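
As an illustration of why integer-style emulation cannot simply be applied to float lanes, here is a small standalone sketch.  It is not taken from the patch, the failure mode on s390x is not spelled out above, and the helper names (pack, lane0, word_add) are made up; it assumes float is IEEE-754 binary32:

```cpp
#include <cstdint>
#include <cstring>

// Pack two floats into one 64-bit word, lane 0 in the low half.
std::uint64_t pack (float lo, float hi)
{
  std::uint32_t l, h;
  std::memcpy (&l, &lo, sizeof l);
  std::memcpy (&h, &hi, sizeof h);
  return (static_cast<std::uint64_t> (h) << 32) | l;
}

// Extract lane 0 as a float again.
float lane0 (std::uint64_t w)
{
  const std::uint32_t l = static_cast<std::uint32_t> (w);
  float f;
  std::memcpy (&f, &l, sizeof f);
  return f;
}

// "Emulated" vector add: a plain integer add on the packed word, which
// operates on the float bit patterns rather than on their values.
std::uint64_t word_add (std::uint64_t a, std::uint64_t b)
{
  return a + b;
}
```

Adding the packed representation of 1.0f to itself this way does not yield 2.0f in lane 0, because the exponent and mantissa fields are summed as if they were one integer.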

---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 09749ae38174..4164f254fd6e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
 those through even when the mode isn't word_mode.  For
 ops we have to lower the lowering code assumes we are
 dealing with word_mode.  */
-  if (((code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+  if (FLOAT_MODE_P (vec_mode)
+ || (((code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
|| !target_support_p)
   && maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
  /* Check only during analysis.  */
-- 
2.39.3

[PATCH] arm: Fixed C23 call compatibility with arm-none-eabi

2024-02-19 Thread Torbjörn SVENSSON

Ok for trunk and releases/gcc-13?
Regtested on top of 945cb8490cb for arm-none-eabi, without any regression.

Backporting to releases/gcc-13 will change -std=c23 to -std=c2x.

--

In commit 4fe34cdcc80ac225b80670eabc38ac5e31ce8a5a, -std=c23 support was
introduced to support functions without any named arguments.  For
arm-none-eabi, this is not as simple as placing all arguments on the
stack.  Align the caller to use r0, r1, r2 and r3 for arguments even for
functions without any named arguments, as specified in the AAPCS.
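
For readers less familiar with the AAPCS, the rule being relied on can be modelled roughly as follows.  This is a deliberately simplified sketch: it assumes every argument is a single 32-bit word and ignores doubleword alignment, VFP registers and composite types, and assign_arg_locations is a made-up helper, not GCC code:

```cpp
#include <string>
#include <vector>

// Simplified model of AAPCS core-register argument passing: the first four
// word-sized arguments go in r0-r3, the rest go on the stack in order.
std::vector<std::string> assign_arg_locations (unsigned n_args)
{
  std::vector<std::string> locs;
  for (unsigned i = 0; i < n_args; ++i)
    {
      if (i < 4)
        locs.push_back ("r" + std::to_string (i));
      else
        locs.push_back ("stack+" + std::to_string ((i - 4) * 4));
    }
  return locs;
}
```

In this model a call with six int arguments uses r0 to r3 and two stack slots regardless of whether the callee declares any named parameters, which is the behavior the caller side is being aligned with.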

Verify in the generic test case that the arguments are in the right
order, and add ARM-specific test cases.

gcc/ChangeLog:

* calls.h: Added the type of the function to function_arg_info.
* calls.cc: Save the type of the function.
* config/arm/arm.cc: Check in the AAPCS layout function if
function has no named args.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/c23-stdarg-split-1a.c: Detect out of order
arguments.
* gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.
* gcc.target/arm/aapcs/align_vaarg3.c: New test.
* gcc.target/arm/aapcs/align_vaarg4.c: New test.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/calls.cc  |  2 +-
 gcc/calls.h   | 20 --
 gcc/config/arm/arm.cc | 13 ---
 .../gcc.dg/torture/c23-stdarg-split-1a.c  |  4 +-
 .../gcc.dg/torture/c23-stdarg-split-1b.c  | 15 +---
 .../gcc.target/arm/aapcs/align_vaarg3.c   | 37 +++
 .../gcc.target/arm/aapcs/align_vaarg4.c   | 31 
 7 files changed, 102 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/align_vaarg4.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 01f44734743..a1cc283b952 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -1376,7 +1376,7 @@ initialize_argument_information (int num_actuals 
ATTRIBUTE_UNUSED,
 with those made by function.cc.  */
 
   /* See if this argument should be passed by invisible reference.  */
-  function_arg_info arg (type, argpos < n_named_args);
+  function_arg_info arg (type, fntype, argpos < n_named_args);
   if (pass_by_reference (args_so_far_pnt, arg))
{
  const bool callee_copies
diff --git a/gcc/calls.h b/gcc/calls.h
index 464a4e34e33..88836559ebe 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -35,24 +35,33 @@ class function_arg_info
 {
 public:
   function_arg_info ()
-: type (NULL_TREE), mode (VOIDmode), named (false),
+: type (NULL_TREE), fntype (NULL_TREE), mode (VOIDmode), named (false),
   pass_by_reference (false)
   {}
 
   /* Initialize an argument of mode MODE, either before or after promotion.  */
   function_arg_info (machine_mode mode, bool named)
-: type (NULL_TREE), mode (mode), named (named), pass_by_reference (false)
+: type (NULL_TREE), fntype (NULL_TREE), mode (mode), named (named),
+pass_by_reference (false)
   {}
 
   /* Initialize an unpromoted argument of type TYPE.  */
   function_arg_info (tree type, bool named)
-: type (type), mode (TYPE_MODE (type)), named (named),
+: type (type), fntype (NULL_TREE), mode (TYPE_MODE (type)), named (named),
   pass_by_reference (false)
   {}
 
+  /* Initialize an unpromoted argument of type TYPE with a known function type
+ FNTYPE.  */
+  function_arg_info (tree type, tree fntype, bool named)
+: type (type), fntype (fntype), mode (TYPE_MODE (type)), named (named),
+pass_by_reference (false)
+  {}
+
   /* Initialize an argument with explicit properties.  */
   function_arg_info (tree type, machine_mode mode, bool named)
-: type (type), mode (mode), named (named), pass_by_reference (false)
+: type (type), fntype (NULL_TREE), mode (mode), named (named),
+pass_by_reference (false)
   {}
 
   /* Return true if the gimple-level type is an aggregate.  */
@@ -96,6 +105,9 @@ public:
  libgcc support functions).  */
   tree type;
 
+  /* The type of the function that has this argument, or null if not known.  */
+  tree fntype;
+
   /* The mode of the argument.  Depending on context, this might be
  the mode of the argument type or the mode after promotion.  */
   machine_mode mode;
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 1cd69268ee9..98e149e5b7e 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -7006,7 +7006,7 @@ aapcs_libcall_value (machine_mode mode)
numbers referred to here are those in the AAPCS.  */
 static void
 aapcs_layout_arg (CUMULATIVE_ARGS *pcum, machine_mode mode,
- const_tree type, bool named)
+ const_tree type, bool named, const_tree fntype)
 {
   int nregs, nregs2;
   int ncrn;
@@ -7018,8 +7018,9 @@ aapcs_layout_arg (CUMULATIVE_ARGS *pcum, machine_mode 
mode,
   pcum->aapcs_arg_processed = true;
 
   /*

Re: [PATCH] libstdc++, Darwin: Handle a linker warning [PR112397].

2024-02-19 Thread Jonathan Wakely

On Sun, 18 Feb 2024, 16:15 Iain Sandoe,  wrote:

> Tested on i686-darwin9, x86_64-darwin14,17,19,21,23, x86_64-linux,
> aarch64-linux-gnu,
>
> OK for trunk?
> eventual back-ports?
>

Yup, ok for all.


> thanks
> Iain
>
> --- 8< ---
>
> Darwin's linker warns when we make a direct branch to code that is
> in a weak definition (citing that if a different implementation of
> the weak function is chosen by the dynamic linker this would be an
> error).
>
> As the analysis in the PR shows, this can happen when we have hot/
> cold partitioning and there is an error path that is primarily cold
> but makes use of epilogue code in the hot section.  In this simple
> case, we can easily deduce that the code is in fact safe; however
> that is not something we can realistically implement in the linker.
>
> Since the user-replaceable allocators are implemented using weak
> definitions, this is a warning that is frequently flagged up in both
> the testsuite and end-user code.
>
> The chosen solution here is to suppress the hot/cold partitioning for
> these cases (it is unlikely to impact performance much c.f. the
> actual allocation).
>
> PR target/112397
>
> libstdc++-v3/ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Detect if we are building for Darwin.
> * libsupc++/Makefile.am: If we are building for Darwin, then
> suppress hot/cold partitioning for the array allocators.
> * libsupc++/Makefile.in: Regenerated.
>
> Signed-off-by: Iain Sandoe 
> Co-authored-by: Jonathan Wakely 
> ---
>  libstdc++-v3/configure | 35 +++---
>  libstdc++-v3/configure.ac  |  6 +
>  libstdc++-v3/libsupc++/Makefile.am |  8 +++
>  libstdc++-v3/libsupc++/Makefile.in |  6 +
>  4 files changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
> index c68cac4f345..37396bd6ebb 100644
> --- a/libstdc++-v3/configure.ac
> +++ b/libstdc++-v3/configure.ac
> @@ -109,6 +109,12 @@ ACX_LT_HOST_FLAGS
>  AC_SUBST(enable_shared)
>  AC_SUBST(enable_static)
>  AM_CONDITIONAL([ENABLE_DARWIN_AT_RPATH], [test x$enable_darwin_at_rpath = xyes])
> +os_is_darwin=no
> +case ${host_os} in
> +  darwin*) os_is_darwin=yes ;;
> +  *) ;;
> +esac
> +AM_CONDITIONAL([OS_IS_DARWIN], [test x${os_is_darwin} = xyes])
>
>  if test "$enable_vtable_verify" = yes; then
>predep_objects_CXX="${predep_objects_CXX} ${glibcxx_builddir}/../libgcc/vtv_start.o"
> diff --git a/libstdc++-v3/libsupc++/Makefile.am b/libstdc++-v3/libsupc++/Makefile.am
> index d0e1618507e..e151ce7a1fe 100644
> --- a/libstdc++-v3/libsupc++/Makefile.am
> +++ b/libstdc++-v3/libsupc++/Makefile.am
> @@ -132,6 +132,14 @@ atomicity_file = ${glibcxx_srcdir}/$(ATOMICITY_SRCDIR)/atomicity.h
>  atomicity.cc: ${atomicity_file}
> $(LN_S) ${atomicity_file} ./atomicity.cc || true
>
> +if OS_IS_DARWIN
> +# See PR 112397
> +new_opvnt.lo: new_opvnt.cc
> +   $(LTCXXCOMPILE) -fno-reorder-blocks-and-partition -I. -c $<
> +new_opvnt.o: new_opvnt.cc
> +   $(CXXCOMPILE) -fno-reorder-blocks-and-partition -I. -c $<
> +endif
> +
>  # AM_CXXFLAGS needs to be in each subdirectory so that it can be
>  # modified in a per-library or per-sub-library way.  Need to manually
>  # set this option because CONFIG_CXXFLAGS has to be after
> --
> 2.39.2 (Apple Git-143)
>
>

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Tamar Christina

Thanks for doing this!

> -Original Message-
> From: Li, Pan2 
> Sent: Monday, February 19, 2024 8:42 AM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com; Tamar Christina
> 
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Richard for comments.
> 
> > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > the corresponding ssadd/usadd optabs.  There's not much documentation
> > unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> > suggests this is used for fixed-point operations.  It looks like arm uses
> > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> 
> I found the related description of the plus family in the GCC internals doc, but
> it doesn't mention anything about mode m here.
> 
> (plus:m x y)
> (ss_plus:m x y)
> (us_plus:m x y)
> These three expressions all represent the sum of the values represented by x
> and y carried out in machine mode m. They differ in their behavior on overflow
> of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> at the maximum signed value representable in m; us_plus saturates at the
> maximum unsigned value.
> 
> > The natural thing is to use direct optab internal functions (that's what you
> > basically did, but you added a new optab, IMO without good reason).

I think we should actually do an indirect optab here, because the IFN can be
used to replace the general representation of saturating arithmetic.

e.g. the __builtin_add_overflow case in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
is inefficient on all targets, and so the IFN can always expand to something
that's more efficient, like the branchless version add_sat2.
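
To make the comparison concrete, here is a minimal sketch of the two shapes being discussed, specialised to uint8_t. The function names are hypothetical, and the branchless variant simply follows the (x + y) | -((x + y) < x) pattern quoted from the patch description; the actual add_sat2 in the PR may be written differently.

```c
#include <stdint.h>
#include <stdbool.h>

/* Saturating unsigned add written with __builtin_add_overflow (a GCC/Clang
   builtin).  This is the general form the IFN could replace.  */
uint8_t add_sat_overflow_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT8_MAX;	/* Overflow: clamp to the maximum value.  */
  return sum;
}

/* Branchless form: if the truncated sum wrapped around, it is smaller than
   x, so (sum < x) is 1 and its negation is an all-ones mask.  OR-ing the
   mask into the sum yields the maximum value exactly on overflow.  */
uint8_t add_sat_branchless_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);
  return (uint8_t) (sum | mask);
}
```

Both functions give 255 for the uint8_t inputs listed in the patch description, e.g. (1, 254), (1, 255) and (255, 255), and the plain sum when no overflow occurs.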

I think this is why you suggested a new tree code below, but we don't really
need tree codes for this. It can be done more cleanly in the same way as
DEF_INTERNAL_INT_EXT_FN.

> 
> That makes sense to me, I will try to leverage US_PLUS instead here.
> 
> > More GIMPLE-like would be to let the types involved decide whether
> > it's signed or unsigned saturation.  That's actually what I'd prefer here
> > and if we don't map 1:1 to optabs then instead use tree codes like
> > S_PLUS_EXPR (mimicing RTL here).
> 
> Sorry, I don't get the point here about the GIMPLE-like way. For .SAT_ADDU, I
> added one restriction like unsigned_p (type) in match.pd. Looks like we have a
> better way here.
> 

Richard means that there shouldn't be separate .SAT_ADDU and .SAT_ADDS, and that
the sign should be determined by the types at expansion time, i.e. there should
only be .SAT_ADD.

i.e. instead of this

+DEF_INTERNAL_OPTAB_FN (SAT_ADDU, ECF_CONST | ECF_NOTHROW, sat_addu, binary)

You should use DEF_INTERNAL_SIGNED_OPTAB_FN.
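
As a rough sketch of that suggestion (not taken from the patch): DEF_INTERNAL_SIGNED_OPTAB_FN in internal-fn.def takes a signed and an unsigned optab and chooses between them based on the signedness of the selected operand type. Assuming the existing ssadd/usadd optabs mentioned earlier in the thread are reused, and with the name SAT_ADD and the flag set chosen here purely for illustration, the entry might look like:

```c
/* Hypothetical internal-fn.def entry, a sketch only.  Assumes the existing
   ssadd/usadd optabs are reused; "first" selects the first operand's type
   to decide between the signed and unsigned optab.  */
DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
			      ssadd, usadd, binary)
```

With a single entry of this shape there would be no need for a separate sat_addu optab.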

Regards,
Tamar

> > Any other opinions?  Anyone knows more about fixed-point and RTL/modes?
> 
> AFAIK, the scalar part of the riscv backend doesn't have fixed-point, but the
> vector part does. They share the same mode as vector integer, for example
> RVVM1SI in vector-iterators.md. Kito and Juzhe can correct me if I have
> misunderstood anything.
> 
> Pan
> 
> -Original Message-
> From: Richard Biener 
> Sent: Monday, February 19, 2024 3:36 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> On Sat, Feb 17, 2024 at 11:30 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the middle-end presentation for the
> > unsigned saturation add.  Aka set the result of add to the max
> > when overflow.  It will take the pattern similar as below.
> >
> > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADDU (1, 254)   => 255.
> > * SAT_ADDU (1, 255)   => 255.
> > * SAT_ADDU (2, 255)   => 255.
> > * SAT_ADDU (255, 255) => 255.
> >
> > The patch also implement the SAT_ADDU in the riscv backend as
> > the sample.  Given below example:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-19 Thread Li, Pan2

> There was a discussion about this back in 2021:
> https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html
> Including a reference to the much older discussion from JSM about
> fixed-point types and lowering and such:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-05/msg00846.html

Thanks Andrew, I will go thru for more details.

Pan

-Original Message-
From: Andrew Pinski  
Sent: Monday, February 19, 2024 4:31 PM
To: Richard Biener 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

On Sun, Feb 18, 2024 at 11:37 PM Richard Biener
 wrote:
>
> On Sat, Feb 17, 2024 at 11:30 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch would like to add the middle-end presentation for the
> > unsigned saturation add.  Aka set the result of add to the max
> > when overflow.  It will take the pattern similar as below.
> >
> > SAT_ADDU (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADDU (1, 254)   => 255.
> > * SAT_ADDU (1, 255)   => 255.
> > * SAT_ADDU (2, 255)   => 255.
> > * SAT_ADDU (255, 255) => 255.
> >
> > The patch also implement the SAT_ADDU in the riscv backend as
> > the sample.  Given below example:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADDU (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > Then we will have the middle-end representation like .SAT_ADDU after
> > this patch.
>
> I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> the corresponding ssadd/usadd optabs.  There's not much documentation
> unfortunately besides the use of gen_*_fixed_libfunc usage where the comment
> suggests this is used for fixed-point operations.  It looks like arm uses
> fractional/accumulator modes for this but for example bfin has ssaddsi3.
>
> So the question is whether the fixed-point case can be distinguished from
> the integer case based on mode.
>
> There's also FIXED_POINT_TYPE on the GENERIC/GIMPLE side and
> no special tree operator codes for them.  So compared to what appears
> to be the case on RTL we'd need a way to represent saturating integer
> operations on GIMPLE.
>
> The natural thing is to use direct optab internal functions (that's what you
> basically did, but you added a new optab, IMO without good reason).
> More GIMPLE-like would be to let the types involved decide whether
> it's signed or unsigned saturation.  That's actually what I'd prefer here
> and if we don't map 1:1 to optabs then instead use tree codes like
> S_PLUS_EXPR (mimicing RTL here).
>
> Any other opinions?  Anyone knows more about fixed-point and RTL/modes?

There was a discussion about this back in 2021:
https://gcc.gnu.org/pipermail/gcc/2021-May/236015.html

Including a reference to the much older discussion from JSM about
fixed-point types and lowering and such:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-05/msg00846.html

I am not 100% sure how much of this applies here though.

I have not looked fully into either thread to get a sense of what was
decided in the end.

Thanks,
Andrew

>
> Richard.
>
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-protos.h (riscv_expand_saturation_addu):
> > New func decl for the SAT_ADDU expand.
> > * config/riscv/riscv.cc (riscv_expand_saturation_addu): New func
> > impl for the SAT_ADDU expand.
> > * config/riscv/riscv.md (sat_addu_3): New pattern to impl
> > the standard name SAT_ADDU.
> > * doc/md.texi: Add doc for SAT_ADDU.
> > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADDU.
> > * internal-fn.def (SAT_ADDU): Add SAT_ADDU.
> > * match.pd: Add simplify pattern patch for SAT_ADDU.
> > * optabs.def (OPTAB_D): Add sat_addu_optab.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/sat_addu-1.c: New test.
> > * gcc.target/riscv/sat_addu-2.c: New test.
> >
