[pushed] c++: fix TTP level reduction cache

2023-05-02 Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We try to cache the result of reduce_template_parm_level so that when we
reduce the same parm multiple times we get the same result, but this wasn't
working for template template parms because in that case TYPE is a
TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
level mismatch that we're trying to adjust for.  So in that case compare the
template parms of the template template parms instead.

The result can be seen in nontype12.C, where we previously gave three
duplicate errors on line 7 and now give only one because subsequent
substitutions use the cache.
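The failure mode can be modelled outside GCC.  The sketch below is a hypothetical toy (ToyParm, reduce_level, match_whole and match_inner are illustrative names, not GCC's trees or API).  A cached descendant always sits at a lower level than the parm it came from, so a check that compares the whole parm, level included, can never hit.  Comparing only the inner template parameter lists lets repeated reductions reuse the cached result.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy model, not GCC's tree representation.  A template
// template parm is reduced to its level plus the list of its own
// template parameters (encoded here as plain ints).
struct ToyParm {
  int level;
  std::vector<int> inner_parms;
  ToyParm *descendant;  // cache: result of the most recent reduction
};

// Owns every reduced parm created by reduce_level.
static std::vector<std::unique_ptr<ToyParm>> pool;

// Old-style check: compares everything, including the level, which by
// construction always differs between a parm and its reduced descendant.
static bool match_whole(const ToyParm &orig, const ToyParm &cached) {
  return orig.level == cached.level && orig.inner_parms == cached.inner_parms;
}

// Fixed-style check: compare only the inner template parameter lists.
static bool match_inner(const ToyParm &orig, const ToyParm &cached) {
  return orig.inner_parms == cached.inner_parms;
}

// Reduce P by LEVELS, reusing the cached descendant when the check allows.
ToyParm *reduce_level(ToyParm &p, int levels, bool use_fixed_check) {
  ToyParm *d = p.descendant;
  bool hit = d != nullptr && d->level == p.level - levels
             && (use_fixed_check ? match_inner(p, *d) : match_whole(p, *d));
  if (!hit) {
    pool.push_back(std::unique_ptr<ToyParm>(
        new ToyParm{p.level - levels, p.inner_parms, nullptr}));
    p.descendant = pool.back().get();
  }
  return p.descendant;
}
```

In the toy, the whole-parm check allocates a fresh descendant on every call, while the inner-parms check returns the cached one, mirroring the effect described above.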

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Fix comparison of
template template parm to cached version.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Check for duplicate error.
---
 gcc/cp/pt.cc  | 7 ++-
 gcc/testsuite/g++.dg/template/nontype12.C | 3 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 471fc20bc5b..5446b5058b7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4550,7 +4550,12 @@ reduce_template_parm_level (tree index, tree type, int levels, tree args,
   if (TEMPLATE_PARM_DESCENDANTS (index) == NULL_TREE
   || (TEMPLATE_PARM_LEVEL (TEMPLATE_PARM_DESCENDANTS (index))
  != TEMPLATE_PARM_LEVEL (index) - levels)
-  || !same_type_p (type, TREE_TYPE (TEMPLATE_PARM_DESCENDANTS (index
+  || !(TREE_CODE (type) == TEMPLATE_TEMPLATE_PARM
+  ? (comp_template_parms
+ (DECL_TEMPLATE_PARMS (TYPE_NAME (type)),
+  DECL_TEMPLATE_PARMS (TEMPLATE_PARM_DECL
+   (TEMPLATE_PARM_DESCENDANTS (index)
+  : same_type_p (type, TREE_TYPE (TEMPLATE_PARM_DESCENDANTS (index)
 {
   tree orig_decl = TEMPLATE_PARM_DECL (index);
 
diff --git a/gcc/testsuite/g++.dg/template/nontype12.C b/gcc/testsuite/g++.dg/template/nontype12.C
index e37cf8f7646..6642ffd0a13 100644
--- a/gcc/testsuite/g++.dg/template/nontype12.C
+++ b/gcc/testsuite/g++.dg/template/nontype12.C
@@ -4,7 +4,8 @@
 template struct A
 {
   template int foo();// { dg-error "double" "" { target c++17_down } }
-  template class> int bar();// { dg-error "double" "" { target c++17_down } }
+  template class> int bar();// { dg-bogus {double.*C:7:[^\n]*double} }
+  // { dg-error "double" "" { target c++17_down } .-1 }
   template struct X; // { dg-error "double" "" { target c++17_down } }
 };
 

base-commit: d7cb9720ed54687bd1135c5e6ef90776a9db0bd5
-- 
2.31.1



[PATCH 1/2] Factor out copy_phi_args from gimple_duplicate_sese_tail and remove_forwarder_block.

2023-05-02 Andrew Pinski via Gcc-patches
While improving replace_phi_edge_with_variable for the diamond-shaped bb
case, I needed a new function, copy_phi_args.  When I searched for
similar code, I noticed that both gimple_duplicate_sese_tail and
remove_forwarder_block already have the code I need.  So I decided it
would be best to factor it out into a new function in a common area
and call it from those two places (and use it for phiopt too).
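Conceptually, each PHI node holds one argument per incoming edge, indexed by the edge's position in the destination block's predecessor list (dest_idx), and copy_phi_args duplicates the argument that flows in along one edge so that it also flows in along another.  The toy model below sketches that idea; ToyPhi, ToyEdge and toy_copy_phi_args are hypothetical, and strings stand in for SSA names.  The real function additionally unshares the tree with unshare_expr and carries over the argument's source location.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for GIMPLE edges and PHI nodes.
struct ToyEdge {
  int dest_idx;  // index of this edge among the destination's predecessors
};

struct ToyPhi {
  std::string result;               // e.g. "x_3"
  std::map<int, std::string> args;  // dest_idx -> incoming value
};

// For each PHI in the block, copy the argument associated with SRC_E
// so that TGT_E supplies the same value.
void toy_copy_phi_args(std::vector<ToyPhi> &phis, const ToyEdge &src_e,
                       const ToyEdge &tgt_e) {
  for (ToyPhi &phi : phis) {
    std::string def = phi.args.at(src_e.dest_idx);
    phi.args[tgt_e.dest_idx] = def;
  }
}
```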

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-cfg.cc (copy_phi_args): New function.
(gimple_duplicate_sese_tail): Use it instead of
doing it inline.
* tree-cfg.h (copy_phi_args): New declaration.
* tree-cfgcleanup.cc (remove_forwarder_block): Use
copy_phi_args instead of implementing it inline.
---
 gcc/tree-cfg.cc| 29 ++---
 gcc/tree-cfg.h |  1 +
 gcc/tree-cfgcleanup.cc | 10 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 4927fc0a8d9..3f24d9c5b1c 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6802,6 +6802,23 @@ bb_part_of_region_p (basic_block bb, basic_block* bbs, unsigned n_region)
   return false;
 }
 
+/* For each PHI in BB, copy the PHI argument associated with SRC_E to TGT_E.  */
+
+void
+copy_phi_args (basic_block bb, edge src_e, edge tgt_e)
+{
+  gphi_iterator gsi;
+  int src_indx = src_e->dest_idx;
+
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gphi *phi = gsi.phi ();
+  tree def = gimple_phi_arg_def (phi, src_indx);
+  location_t locus = gimple_phi_arg_location (phi, src_indx);
+  add_phi_arg (phi, unshare_expr (def), tgt_e, locus);
+}
+}
+
 /* Duplicates REGION consisting of N_REGION blocks.  The new blocks
are stored to REGION_COPY in the same order in that they appear
in REGION, if REGION_COPY is not NULL.  ENTRY is the entry to
@@ -6847,9 +6864,6 @@ gimple_duplicate_sese_tail (edge entry, edge exit,
   gimple_stmt_iterator gsi;
   edge sorig, snew;
   basic_block exit_bb;
-  gphi_iterator psi;
-  gphi *phi;
-  tree def;
   class loop *target, *aloop, *cloop;
 
   gcc_assert (EDGE_COUNT (exit->src->succs) == 2);
@@ -6947,14 +6961,7 @@ gimple_duplicate_sese_tail (edge entry, edge exit,
gcc_assert (single_succ_edge (region_copy[i]));
e = redirect_edge_and_branch (single_succ_edge (region_copy[i]), exit_bb);
PENDING_STMT (e) = NULL;
-   for (psi = gsi_start_phis (exit_bb);
-!gsi_end_p (psi);
-gsi_next (&psi))
- {
-   phi = psi.phi ();
-   def = PHI_ARG_DEF (phi, nexits[0]->dest_idx);
- add_phi_arg (phi, def, e, gimple_phi_arg_location_from_edge (phi, e));
- }
+   copy_phi_args (exit_bb, nexits[0], e);
   }
   e = redirect_edge_and_branch (nexits[1], nexits[0]->dest);
   PENDING_STMT (e) = NULL;
diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
index 9b56a68fe9d..7ec3f812a76 100644
--- a/gcc/tree-cfg.h
+++ b/gcc/tree-cfg.h
@@ -113,6 +113,7 @@ extern basic_block gimple_switch_default_bb (function *, gswitch *);
 extern edge gimple_switch_edge (function *, gswitch *, unsigned);
 extern edge gimple_switch_default_edge (function *, gswitch *);
 extern bool cond_only_block_p (basic_block);
+extern void copy_phi_args (basic_block, edge, edge);
 
 /* Return true if the LHS of a call should be removed.  */
 
diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
index 42b25312122..f3582e5ce52 100644
--- a/gcc/tree-cfgcleanup.cc
+++ b/gcc/tree-cfgcleanup.cc
@@ -612,15 +612,7 @@ remove_forwarder_block (basic_block bb)
{
  /* Create arguments for the phi nodes, since the edge was not
 here before.  */
- for (gphi_iterator psi = gsi_start_phis (dest);
-  !gsi_end_p (psi);
-  gsi_next (&psi))
-   {
- gphi *phi = psi.phi ();
- location_t l = gimple_phi_arg_location_from_edge (phi, succ);
- tree def = gimple_phi_arg_def (phi, succ->dest_idx);
- add_phi_arg (phi, unshare_expr (def), s, l);
-   }
+ copy_phi_args (dest, succ, s);
}
 }
 
-- 
2.39.1



[PATCH 2/2] PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb

2023-05-02 Andrew Pinski via Gcc-patches
While looking at the differences between what minmax_replacement
and match_simplify_replacement do, I noticed that they sometimes
chose different edges to remove.  I decided we should be able to do
better and remove both empty basic blocks in the case of
match_simplify_replacement, as that moves the statements.
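For context, a "diamond" here is the CFG shape produced by an if/else whose two arms rejoin: the condition block branches to two blocks that both fall through to a join block, where a PHI selects the result.  In the sketch below (the function name diamond_min is illustrative), once the copies in each arm are propagated into the PHI, the arms are roughly empty forwarder blocks and phiopt can replace the whole diamond with a MIN_EXPR.  This is meant only as an approximation of the kind of shape the patch is concerned with.

```cpp
#include <cassert>

// Source-level diamond: cond block -> {then arm, else arm} -> join block.
// After SSA copy propagation both arms are empty and the join block holds
// roughly  c = PHI <a (then), b (else)>,  which phiopt can turn into MIN_EXPR.
int diamond_min(int a, int b) {
  int c;
  if (a < b)
    c = a;
  else
    c = b;
  return c;
}
```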

This also updates the testcases as now match_simplify_replacement
will remove the unused MIN/MAX_EXPR and they were checking for
those.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle
diamond form bb with forwarder only empty blocks better.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-15.c: Update test.
* gcc.dg/tree-ssa/minmax-16.c: Update test.
* gcc.dg/tree-ssa/minmax-3.c: Update test.
* gcc.dg/tree-ssa/minmax-4.c: Update test.
* gcc.dg/tree-ssa/minmax-5.c: Update test.
* gcc.dg/tree-ssa/minmax-8.c: Update test.
---
 gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c |  3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c |  9 +++
 gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c  |  2 +-
 gcc/tree-ssa-phiopt.cc| 32 ++-
 7 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
index 8a39871c938..6731f91e6c3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
@@ -30,5 +30,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
+/* There should only be two MIN_EXPR left, the 3rd one was removed. */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
 /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
index 623b12b3f74..094364e6424 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
@@ -25,11 +25,8 @@ main (void)
   return 0;
 }
 
-/* After phiopt1, there really should be only 3 MIN_EXPR in the IR (including debug statements).
-   But the way phiopt does not cleanup the CFG all the time, the PHI might still reference the
-   alternative bb's moved statement.
-   Note in the end, we do dce the statement and other debug statements to end up with only 2 MIN_EXPR.
-   So check that too. */
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 4 "phiopt1" } } */
+/* After phiopt1, will be only 2 MIN_EXPR in the IR (including debug statements). */
+/* xk will only have the final result so the extra debug info does not change anything. */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
 /* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
index 2af10776346..521afe3e4d9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
@@ -25,5 +25,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
 /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
index 973f39bfed3..49e27185b5e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
@@ -26,4 +26,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
-/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
index 34e4e720511..194c881cc98 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
@@ -25,5 +25,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
 /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c
index 0160e573fef..d5cb53145ea 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c
@@ -26,4 +26,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
-/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 

[PATCH v2 1/1] libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]

2023-05-02 Kefu Chai via Gcc-patches
This patch always sets _M_string_length in the constructor specialized
for a range of input iterators, as used in cases like istringstream.

We copy from the source range into the local buffer, and then reallocate
to a larger one if necessary.  When disposing of the old buffer, the old
buffer could be either the local buffer or an allocated buffer.
_M_is_local() is used to tell whether the buffer is the local one.  In
addition to comparing the buffer address with the local buffer, this
function also performs a sanity check that _M_string_length is not
greater than _S_local_capacity; if the check fails,
__builtin_unreachable() is called.  But we failed to set
_M_string_length in this constructor, which is specialized for
std::input_iterator.  So, if UBSan is enabled when compiling the source,
the uninitialized data in _M_string_length may happen to be greater than
_S_local_capacity, and the application aborts with a runtime error or
exception emitted by UBSan.

In this change, to avoid the false alarm, _M_string_length is
initialized to zero before doing anything else, so that _M_is_local()
doesn't see an uninitialized value.

This issue only surfaces when constructing a string from a range of
input iterators and the uninitialized _M_string_length is greater than
_S_local_capacity, i.e., 15.
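A minimal example of code that reaches the affected constructor: std::istreambuf_iterator is an input iterator, so building a std::string from a pair of them goes through basic_string(InputIterator, InputIterator, const Alloc&), which starts in the local buffer and reallocates once the input exceeds its capacity.  The helper name read_all is illustrative, and the 40-character input in the check is simply chosen to be longer than the 15-character local capacity mentioned above.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything from IN by constructing the string from a pair of
// input iterators (std::istreambuf_iterator has input_iterator_tag).
std::string read_all(std::istream &in) {
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

Without the fix, whether UBSan complains depends on whatever value happens to be left in _M_string_length, so running this code under UBSan may or may not reproduce the report.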

libstdc++-v3/ChangeLog:

PR libstdc++/109703
* include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
Initialize _M_string_length.

Signed-off-by: Kefu Chai 
Co-authored-by: Jonathan Wakely 
---
 libstdc++-v3/include/bits/basic_string.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 8247ee6bdc6..b16b2898b62 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -760,7 +760,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
_GLIBCXX20_CONSTEXPR
 basic_string(_InputIterator __beg, _InputIterator __end,
 const _Alloc& __a = _Alloc())
-   : _M_dataplus(_M_local_data(), __a)
+   : _M_dataplus(_M_local_data(), __a), _M_string_length(0)
{
 #if __cplusplus >= 201103L
  _M_construct(__beg, __end, std::__iterator_category(__beg));
-- 
2.40.1



[PATCH v2 0/1] Set _M_string_length before calling _M_dispose()

2023-05-02 Kefu Chai via Gcc-patches
Hi Jonathan,

Thank you for your review and suggestion. The change looks great!
Assigning a value with an immediate zero is indeed much faster.

in v2:

* revised the commit message a little; I found it a bit difficult
  to parse when re-reading it.
* associated the commit with PR libstdc++/109703, as I had just filed
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109706, which turned out
  to be a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109703

The rest of the v2 patch is identical to the one attached in your reply.

Would you please take another look?

Kefu Chai (1):
  libstdc++: Set _M_string_length before calling _M_dispose() [PR109703]

 libstdc++-v3/include/bits/basic_string.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.40.1



Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Andrew Pinski via Gcc-patches
On Tue, May 2, 2023 at 5:38 PM Kito Cheng via Gcc-patches
 wrote:
>
> > >>> Pushed to trunk, thanks for catching that, that's definitely should
> > >>> use log2 no matter C++03 or C++11,
> > >>> but I think GCC allows the usage of C++11 according to
> > >>> https://gcc.gnu.org/install/prerequisites.html :P
> > >> Yes, we should be able to use C++11.  I'd like to get that to C++17 at
> > >> some point, but I think the biggest problem is the desire to support
> > >> bootstrapping on something like centos7/rhel7.
> > >
> > > At least we have auto and range based for loop, I am satisfied with
> > > that enough.
> >
> > Indeed, gcc 4.9 already support C++11 but for some reason std::log2 fail with
> > it. Probably because gcc 4.9 is the last gcc release with C++03 used by default.
> > I wasn't able to reproduce the build issue (without my patch) with gcc 10 or 11.
>
> Anyway I think your fix is reasonable since we include math.h not cmath.h here,
> so use without std:: is rightway, but don't know why it is build-able
> with newer GCC,
> I guess that might be included by some other header indirectly.

No, libstdc++ was specifically fixed in GCC 6 to do the correct thing;
see PR 60401 and PR 14608 (r6-6392-g96e19adabc80146648825).
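The portable spelling discussed in this thread can be sketched as follows: when a file includes <math.h> rather than <cmath>, the C function is declared in the global namespace, so calling log2 unqualified works with old and new libstdc++ alike, whereas std::log2 is only guaranteed to be declared after including <cmath>.  The helper bits_for is hypothetical and not taken from the RISC-V backend.

```cpp
#include <cassert>
#include <math.h>

// With <math.h> included, use the unqualified C function; std::log2 is
// only guaranteed to be declared by <cmath>.
double bits_for(double n) {
  return log2(n);
}
```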

Thanks,
Andrew Pinski

>
> > I'm fine with new prerequisites for the next gcc release, I checked the release
> > note about that. I noticed that all toolchains I've built with gcc 13.1 for
> > other cpu target where successful with the same setup (docker image).
> >
> > Thanks!
> >
> > Best regards,
> > Romain
> >
> > >
> > >
> > >>
> > >> jeff
> >


Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Kito Cheng via Gcc-patches
> >>> Pushed to trunk, thanks for catching that, that's definitely should
> >>> use log2 no matter C++03 or C++11,
> >>> but I think GCC allows the usage of C++11 according to
> >>> https://gcc.gnu.org/install/prerequisites.html :P
> >> Yes, we should be able to use C++11.  I'd like to get that to C++17 at
> >> some point, but I think the biggest problem is the desire to support
> >> bootstrapping on something like centos7/rhel7.
> >
> > At least we have auto and range based for loop, I am satisfied with
> > that enough.
>
> Indeed, gcc 4.9 already support C++11 but for some reason std::log2 fail with
> it. Probably because gcc 4.9 is the last gcc release with C++03 used by default.
> I wasn't able to reproduce the build issue (without my patch) with gcc 10 or 11.

Anyway, I think your fix is reasonable: since we include math.h, not cmath,
here, using log2 without std:: is the right way.  I don't know why it
builds with newer GCC, though; I guess cmath might be included indirectly
by some other header.

> I'm fine with new prerequisites for the next gcc release, I checked the release
> note about that. I noticed that all toolchains I've built with gcc 13.1 for
> other cpu target where successful with the same setup (docker image).
>
> Thanks!
>
> Best regards,
> Romain
>
> >
> >
> >>
> >> jeff
>


[PATCH] Add stats to simple_dce_from_worklist

2023-05-02 Andrew Pinski via Gcc-patches
While looking to move substitute_and_fold_engine
over to use simple_dce_from_worklist, I noticed
that we don't record the stats of the removed stmts/phis.
So this does that.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist): Record
stats on removed number of statements and phis.
---
 gcc/tree-ssa-dce.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 1fd88e8ee37..86a33e1a40d 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2099,6 +2099,8 @@ make_pass_cd_dce (gcc::context *ctxt)
 void
 simple_dce_from_worklist (bitmap worklist)
 {
+  long phiremoved = 0;
+  long stmtremoved = 0;
   while (! bitmap_empty_p (worklist))
 {
   /* Pop item.  */
@@ -2144,12 +2146,20 @@ simple_dce_from_worklist (bitmap worklist)
}
   gimple_stmt_iterator gsi = gsi_for_stmt (t);
   if (gimple_code (t) == GIMPLE_PHI)
-   remove_phi_node (&gsi, true);
+   {
+ remove_phi_node (&gsi, true);
+ phiremoved++;
+   }
   else
{
  unlink_stmt_vdef (t);
  gsi_remove (&gsi, true);
  release_defs (t);
+ stmtremoved++;
}
 }
+  statistics_counter_event (cfun, "PHIs removed",
+   phiremoved);
+  statistics_counter_event (cfun, "Statements removed",
+   stmtremoved);
 }
-- 
2.39.1



[PATCH] c++: wrong std::is_convertible with cv-qual fn [PR109680]

2023-05-02 Marek Polacek via Gcc-patches
This PR points out that std::is_convertible has given the wrong answer
in

  static_assert (!std::is_convertible_v , "");

since r13-2822 implemented __is_{,nothrow_}convertible.

std::is_convertible uses the imaginary

  To test() { return std::declval<From>(); }

to do its job.  Here, From is 'int () const'.  std::declval is defined as:

  template<typename T>
  typename std::add_rvalue_reference<T>::type declval() noexcept;

std::add_rvalue_reference is defined as "If T is a function type that
has no cv- or ref- qualifier or an object type, provides a member typedef
type which is T&&, otherwise type is T."

In our case, T is cv-qualified, so the result is T, so we end up with

  int () const declval() noexcept;

which is invalid.  In other words, this is pretty much like:

  using T = int () const;
  T fn1(); // bad, fn returning a fn
  T& fn2(); // bad, cannot declare reference to qualified function type
  T* fn3(); // bad, cannot declare pointer to qualified function type

  using U = int ();
  U fn4(); // bad, fn returning a fn
  U& fn5(); // OK
  U* fn6(); // OK

I think is_convertible_helper needs to simulate std::declval better.
I wouldn't be surprised if other type traits needed a similar fix.
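The add_rvalue_reference rule quoted above can be checked directly with the standard traits.  The sketch below (the aliases QualFn and PlainFn are illustrative) shows that a cv-qualified function type comes back unchanged, so std::declval<int () const>() would have to return a function, whereas an unqualified function type becomes an rvalue reference to function, for which the imaginary test() is fine.  It deliberately does not assert std::is_convertible on the cv-qualified case, since that is exactly the answer this patch corrects.

```cpp
#include <cassert>
#include <type_traits>

using QualFn = int () const;  // cv-qualified function type: not referenceable
using PlainFn = int ();       // ordinary function type

// add_rvalue_reference yields T itself for a cv-qualified function type...
static_assert(std::is_same<std::add_rvalue_reference<QualFn>::type, QualFn>::value,
              "cv-qualified function type is returned unchanged");

// ...but T&& for an unqualified function type.
static_assert(std::is_same<std::add_rvalue_reference<PlainFn>::type, PlainFn &&>::value,
              "unqualified function type becomes an rvalue reference");

// With a usable declval<From>(), a function converts to a function pointer.
static_assert(std::is_convertible<PlainFn, PlainFn *>::value,
              "int () converts to int (*) ()");
```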

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

I've tested the new test with G++12 and clang++ as well (with
std::is_convertible).

PR c++/109680

gcc/cp/ChangeLog:

* method.cc (is_convertible_helper): Correct simulating std::declval.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_convertible6.C: New test.
---
 gcc/cp/method.cc   | 20 
 gcc/testsuite/g++.dg/ext/is_convertible6.C | 16 
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible6.C

diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 00eae56eb5b..38eb7520312 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2245,6 +2245,26 @@ is_convertible_helper (tree from, tree to)
 {
   if (VOID_TYPE_P (from) && VOID_TYPE_P (to))
 return integer_one_node;
+  /* std::is_{,nothrow_}convertible test whether the imaginary function
+ definition
+
+   To test() { return std::declval<From>(); }
+
+ is well-formed.  A function can't return a function...  */
+  if (FUNC_OR_METHOD_TYPE_P (to)
+  /* ...neither can From be a function with cv-/ref-qualifiers:
+std::declval is defined as
+
+ template<typename T>
+ typename std::add_rvalue_reference<T>::type declval() noexcept;
+
+   and std::add_rvalue_reference yields T when T is a function with
+   cv- or ref-qualifiers, making the definition ill-formed.
+   ??? Should we check this in other uses of build_stub_object too?  */
+  || (FUNC_OR_METHOD_TYPE_P (from)
+ && (type_memfn_quals (from) != TYPE_UNQUALIFIED
+ || type_memfn_rqual (from) != REF_QUAL_NONE)))
+return error_mark_node;
   cp_unevaluated u;
   tree expr = build_stub_object (from);
   deferring_access_check_sentinel acs (dk_no_deferred);
diff --git a/gcc/testsuite/g++.dg/ext/is_convertible6.C b/gcc/testsuite/g++.dg/ext/is_convertible6.C
new file mode 100644
index 000..180582663e8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_convertible6.C
@@ -0,0 +1,16 @@
+// PR c++/109680
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(!__is_convertible(int () const, int (*)()));
+SA(!__is_convertible(int (*)(), int () const));
+
+SA( __is_convertible(int (), int (*)()));
+SA(!__is_convertible(int (*)(), int ()));
+
+SA( __is_convertible(int (int), int (*) (int)));
+SA(!__is_convertible(int (*) (int), int (int)));
+
+SA(!__is_convertible(int (int) const, int (*) (int)));
+SA(!__is_convertible(int (*) (int), int (int) const));

base-commit: 33020780a9699f1146eeed61783cec89fde337a0
-- 
2.40.1



[pushed] c++: simplify member template substitution

2023-05-02 Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X) \
template  struct X##1; \
template  struct X##2; \
template  struct X##3; \
template  struct X##4; \
template  struct X##5; \
template  struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template  struct A
  {
NC3(am)
  };
  template  void sink(Ts...);
  template  void g()
  {
sink(A()...);
  }
  template  void f()
  {
g<__integer_pack(I)...>();
  }
  int main()
  {
f<1000>();
  }

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
---
 gcc/cp/pt.cc | 36 +---
 1 file changed, 9 insertions(+), 27 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3f1cf139bbd..471fc20bc5b 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -12285,21 +12285,13 @@ instantiate_class_template (tree type)
   Ignore it; it will be regenerated when needed.  */
continue;
 
- bool class_template_p = (TREE_CODE (t) != ENUMERAL_TYPE
-  && TYPE_LANG_SPECIFIC (t)
-  && CLASSTYPE_IS_TEMPLATE (t));
+ /* If the member is a class template, we've
+already substituted its type.  */
+ if (CLASS_TYPE_P (t)
+ && CLASSTYPE_IS_TEMPLATE (t))
+   continue;
 
- /* If the member is a class template, then -- even after
-substitution -- there may be dependent types in the
-template argument list for the class.  We increment
-PROCESSING_TEMPLATE_DECL so that dependent_type_p, as
-that function will assume that no types are dependent
-when outside of a template.  */
- if (class_template_p)
-   ++processing_template_decl;
  tree newtag = tsubst (t, args, tf_error, NULL_TREE);
- if (class_template_p)
-   --processing_template_decl;
  if (newtag == error_mark_node)
continue;
 
@@ -12307,19 +12299,6 @@ instantiate_class_template (tree type)
{
  tree name = TYPE_IDENTIFIER (t);
 
- if (class_template_p)
-   /* Unfortunately, lookup_template_class sets
-  CLASSTYPE_IMPLICIT_INSTANTIATION for a partial
-  instantiation (i.e., for the type of a member
-  template class nested within a template class.)
-  This behavior is required for
-  maybe_process_partial_specialization to work
-  correctly, but is not accurate in this case;
-  the TAG is not an instantiation of anything.
-  (The corresponding TEMPLATE_DECL is an
-  instantiation, but the TYPE is not.) */
-   CLASSTYPE_USE_TEMPLATE (newtag) = 0;
-
  /* Now, install the tag.  We don't use pushtag
 because that does too much work -- creating an
 implicit typedef, which we've already done.  */
@@ -14750,7 +14729,10 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
   /* For a partial specialization, we need to keep pointing to
 the primary template.  */
   if (!DECL_TEMPLATE_SPECIALIZATION (t))
-   CLASSTYPE_TI_TEMPLATE (inner) = r;
+   {
+ CLASSTYPE_TI_TEMPLATE (inner) = r;
+ CLASSTYPE_USE_TEMPLATE (inner) = 0;
+   }
 
   DECL_TI_ARGS (r) = CLASSTYPE_TI_ARGS (inner);
   inner = TYPE_MAIN_DECL (inner);

base-commit: f9861511a1fa0f9e386f3f7bcee84b6e3ca3c579
-- 
2.31.1



Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.

2023-05-02 Segher Boessenkool
On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote:
>   * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
>   of the ld and lwa instructions which use the DS encoding instead of D.
>   Use the YZ constraint for these loads.  Handle prefixed loads better.

Don't use tabs in the middle of a line.

"Handle prefixed loads better" is not what the patch does, and/or is so
vague as to be useless.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -56,7 +56,7 @@ sub mode_to_ldst_char
>  sub gen_ld_cmpi_p10
>  {
>  my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> - $mempred, $ccmode, $np, $extend, $resultmode);
> + $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
>LMODE: foreach $lmode ('DI','SI','HI','QI') {
>$ldst = mode_to_ldst_char($lmode);
>$clobbermode = $lmode;
> @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
>CCMODE: foreach $ccmode ('CC','CCUNS') {
> $np = "NON_PREFIXED_D";
> $mempred = "non_update_memory_operand";
> +   $constraint = "m";
> if ( $ccmode eq 'CC' ) {
> next CCMODE if $lmode eq 'QI';
> -   if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> +   if ( $lmode eq 'HI' ) {
> +   $np = "NON_PREFIXED_D";
> +   $mempred = "non_update_memory_operand";
> +   $echr = "a";
> +   } elsif ( $lmode eq 'SI' ) {
> +   # ld and lwa are both DS-FORM.
> +   $np = "NON_PREFIXED_DS";
> +   $mempred = "lwa_operand";
> +   $echr = "a";
> +   $constraint = "YZ";
> +   } elsif ( $lmode eq 'DI' ) {
> # ld and lwa are both DS-FORM.
> $np = "NON_PREFIXED_DS";
> $mempred = "ds_form_mem_operand";
> +   $echr = "";
> +   $constraint = "YZ";
> }
> $cmpl = "";
> -   $echr = "a";
> $constpred = "const_m1_to_1_operand";
> } else {
> if ( $lmode eq 'DI' ) {
> # ld is DS-form, but lwz is not.
> $np = "NON_PREFIXED_DS";
> $mempred = "ds_form_mem_operand";
> +   $constraint = "YZ";
> }
> $cmpl = "l";
> $echr = "z";
> @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
>  
> print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
> print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
> -   print "(compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
> +   print "(compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
> if ($ccmode eq 'CCUNS') { print "   "; }
> print "(match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
> if ($result eq 'clobber') {
> @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
> print "  \"\"\n";
> print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
> print "   (set_attr \"cost\" \"8\")\n";
> +
> +   if ($extend eq "sign") {
> +   print "   (set_attr \"sign_extend\" \"yes\")\n";
> +   }
> +
> print "   (set_attr \"length\" \"8\")])\n";
> print "\n";
>}

This already was a 90-line function that did too many things.  Now it is
bigger and does more things, and the patch is unintelligible.

Please factor things first.  There are many more instances of terrible Perl
code style here (like all of the quoting), but where to start :-/

I once again spent many hours trying to review this, and once again
failed.  Please write better code, and please make better patches.

> index ec783803820..7d6c94aee5b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
> (eq_attr "maybe_prefixed" "no"))
>(const_string "no")
>  
> -  (eq_attr "type" "load,fpload,vecload")
> +  (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")

Don't duplicate vecload.

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> @@ -0,0 +1,25 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */

The power10_ok selector still is terribly broken (it allows only some
variants of 64-bit Linux and nothing more, to start with).  Do we still
need it in any case?

Same for powerpc_prefixed_addr.  Is there any supported target that does
not have a working assembler?

What is -fstack-protector here for?  That should be documented, or
better, it should just be removed if possible.

> -/* { dg-final { scan-assembler-times 

Re: [PATCH] PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb

2023-05-02 Andrew Pinski via Gcc-patches
On Tue, May 2, 2023 at 5:26 AM Richard Biener via Gcc-patches
 wrote:
>
> On Sun, Apr 30, 2023 at 11:14 PM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > While looking at differences between what minmax_replacement
> > and match_simplify_replacement does. I noticed that they sometimes
> > chose different edges to remove. I decided we should be able to do
> > better and be able to remove both empty basic blocks in the
> > case of match_simplify_replacement as that moves the statements.
> >
> > This also updates the testcases as now match_simplify_replacement
> > will remove the unused MIN/MAX_EXPR and they were checking for
> > those.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-phiopt.cc (copy_phi_args): New function.
> > (replace_phi_edge_with_variable): Handle diamond form bb
> > with forwarder only empty blocks better.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/minmax-15.c: Update test.
> > * gcc.dg/tree-ssa/minmax-16.c: Update test.
> > * gcc.dg/tree-ssa/minmax-3.c: Update test.
> > * gcc.dg/tree-ssa/minmax-4.c: Update test.
> > * gcc.dg/tree-ssa/minmax-5.c: Update test.
> > * gcc.dg/tree-ssa/minmax-8.c: Update test.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c |  3 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c |  9 ++--
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c  |  2 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c  |  2 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c  |  2 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c  |  2 +-
> >  gcc/tree-ssa-phiopt.cc| 51 ++-
> >  7 files changed, 59 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> > index 8a39871c938..6731f91e6c3 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> > @@ -30,5 +30,6 @@ main (void)
> >return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> > +/* There should only be two MIN_EXPR left, the 3rd one was removed. */
> > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> >  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> > index 623b12b3f74..094364e6424 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> > @@ -25,11 +25,8 @@ main (void)
> >return 0;
> >  }
> >
> > -/* After phiopt1, there really should be only 3 MIN_EXPR in the IR (including debug statements).
> > -   But the way phiopt does not cleanup the CFG all the time, the PHI might still reference the
> > -   alternative bb's moved statement.
> > -   Note in the end, we do dce the statement and other debug statements to end up with only 2 MIN_EXPR.
> > -   So check that too. */
> > -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 4 "phiopt1" } } */
> > +/* After phiopt1, will be only 2 MIN_EXPR in the IR (including debug statements). */
> > +/* xk will only have the final result so the extra debug info does not change anything. */
> > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> >  /* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
> >  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > index 2af10776346..521afe3e4d9 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> > @@ -25,5 +25,5 @@ main (void)
> >return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> >  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > index 973f39bfed3..49e27185b5e 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> > @@ -26,4 +26,4 @@ main (void)
> >  }
> >
> >  /* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
> > -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> > index 34e4e720511..194c881cc98 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> > @@ -25,5 +25,5 @@ main (void)
> >return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 

Re: [PATCH v4 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-05-02 Thread Peter Bergner via Gcc-patches
On 4/28/23 6:49 PM, Hans-Peter Nilsson wrote:
> On Fri, 28 Apr 2023, Jeff Law wrote:
>> So while what Ajit has done is a step forward, at some point the actual
>> details of the ABI need to be described in a way that can be checked and
>> consumed by REE.
> 
> IIRC I also commented and suggested a few target macros that 
> *should* have helped to that effect.  Ajit, I suggest you see my 
> previous reply in this or a related conversation.

To be specific, Hans-Peter said earlier:


On 3/30/23 7:01 PM, Hans-Peter Nilsson wrote:
> Pardon the arm-chair development mode but it sounds like 
> re-inventing the TARGET_PROMOTE_* hooks...
> 
> Maybe just hook up TARGET_PROMOTE_FUNCTION_MODE to ree.c (as 
> "you" already define it for "rs6000")?


Peter




Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Segher Boessenkool
Hi!

On Tue, May 02, 2023 at 05:20:49PM +0100, Roger Sayle wrote:
> On 02 May 2023 14:49, Segher Boessenkool wrote:
> Then combine inserts an additional copy:

Combine makes sure a pseudo-to-pseudo move remains.  Without that,
combine will seize part of RA's job, and butcher it.  It has always done
that, but the 2-2 combine patches made it clearer than before.

> (insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
>  (nil))
> (insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
> (reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
>  (nil))
> (insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
> (reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
>  (nil))

> I don't think it's a problem/fault with the machine description, but
> how the clobber above is being interpreted by the different register
> allocators.

Insn 17 is trivially dead code and should have been removed.  It
arguably should not have been created in the first place.  Why do we do
that, what is the purpose?

> Back in August 2022, I submitted an x86_64 backend patch that used a
> peephole2 to eliminate this type of (subreg-lower) generated clobber:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599419.html
> but Uros was hesitant to accept this change, without an RTL expert
> explaining why the clobbers were being generated in the first place.

The clobber makes sure any previous value in the register is regarded as
dead, but since insns 18 and 19 clearly write to the whole register anyway,
this is just extra work to later undo.  Or not undo, which is the problem here :-/

> Like my x86_64 patch above, this issue could also probably be cleaned
> up by a peephole2 for xstormy16 that straddled/removed the clobber.

Or simply not create such useless clobbers in the first place?

> My simplistic summary is that when a backend lowers a multi-word
> instruction with a splitter, it (typically) does so without introducing
> a clobber.  If the clobbers generated by the middle-end's automatic
> lowering confuse the register allocator, then the splitter approach is more efficient.
> Ideally, such clobbers should be avoided (if possible) and/or LRA
> improved to understand that they don't actually cause an allocno
> conflict.  [but I'm in no way a register allocation expert].

Yup, I agree with all that.  We should not create dead code, and not be
confused by dead code either :-)


Segher


Re: [PATCH] MATCH: Port CLRSB part of builtin_zero_pattern

2023-05-02 Thread Andrew Pinski via Gcc-patches
On Tue, May 2, 2023 at 5:24 AM Richard Biener via Gcc-patches
 wrote:
>
> On Sun, Apr 30, 2023 at 11:13 PM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > This ports the clrsb builtin part of builtin_zero_pattern
> > to match.pd. A simple pattern to port.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
> > pattern.
> > ---
> >  gcc/match.pd | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0e782cde71d..bf918ba70ce 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7787,6 +7787,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(cond (ne @0 integer_zerop@1) (func@4 (convert? @2)) integer_zerop@3)
> >@4))
> >
> > +/* a != 0 ? FUN(a) : CST -> Fun(a) for some CLRSB builtins
> > +   where CST is precision-1. */
> > +(for func (CLRSB)
> > + (simplify
> > +  (cond (ne @0 integer_zerop@1) (func@5 (convert?@4 @2)) INTEGER_CST@3)
>
> As you don't seem to use @2 why not match (func@5 @4) only?

Thanks for catching this; @2 should really have been @0, otherwise we
get the wrong answer in general.
I fixed the other patterns I just added for this same issue too.

This is what I committed instead:
/* a != 0 ? FUN(a) : CST -> Fun(a) for some CLRSB builtins
   where CST is precision-1. */
(for func (CLRSB)
 (simplify
  (cond (ne @0 integer_zerop@1) (func@4 (convert?@3 @0)) INTEGER_CST@2)
  (if (wi::to_widest (@2) == TYPE_PRECISION (TREE_TYPE (@3)) - 1)
   @4)))
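For illustration only: the committed pattern is sound because GCC documents `__builtin_clrsb` as having no special case for 0, so `__builtin_clrsb (0)` already returns the precision minus one, and the `a != 0` guard is redundant. A minimal C sketch of that identity for `int`, assuming `int` has no padding bits (so `sizeof (int) * CHAR_BIT` is its precision); the function names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* Shape before the match.pd simplification: the constant arm is the
   precision of int minus one (assumes no padding bits in int).  */
static int clrsb_guarded (int a)
{
  return a != 0 ? __builtin_clrsb (a) : (int) (sizeof (int) * CHAR_BIT - 1);
}

/* Shape after the simplification: the guard is dropped.  */
static int clrsb_plain (int a)
{
  return __builtin_clrsb (a);
}

/* Return true if both shapes agree on a few interesting inputs.  */
static bool clrsb_forms_agree (void)
{
  static const int samples[] = { 0, 1, -1, 42, -42, INT_MAX, INT_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (clrsb_guarded (samples[i]) != clrsb_plain (samples[i]))
      return false;
  return true;
}
```

Only the `a == 0` case actually exercises the constant arm; the other samples just confirm the two shapes coincide elsewhere.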

Thanks,
Andrew

>
> Otherwise LGTM.
>
> > +  (if (wi::to_widest (@3) == TYPE_PRECISION (TREE_TYPE (@4)) - 1)
> > +   @5)))
> > +
> >  #if GIMPLE
> > /* a != 0 ? CLZ(a) : CST -> .CLZ(a) where CST is the result of the internal function for 0. */
> >  (for func (CLZ)
> > --
> > 2.31.1
> >


[COMMITTED] tree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patterns

2023-05-02 Thread Andrew Pinski via Gcc-patches
I accidentally messed up these patterns so that the operand compared
against 0 and the argument of the function were not required to be the
same, when they need to be.

I committed this as obvious after a bootstrap/test on x86_64-linux-gnu.
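To illustrate why the operand compared against 0 must be the builtin's own argument, here is a small C sketch using GCC's `__builtin_popcount` (the function names are hypothetical). With an unrelated `y`, dropping the guard changes the result whenever `x == 0` and `y != 0`; with the same operand the fold is safe because `popcount (0)` is 0:

```c
#include <stdbool.h>

/* Shape the buggy patterns accepted: the value tested against zero (x)
   is unrelated to the builtin's argument (y).  */
static int guarded_unrelated (unsigned x, unsigned y)
{
  return x ? __builtin_popcount (y) : 0;
}

/* Shape the fixed patterns require: the same operand in both places.  */
static int guarded_same (unsigned x)
{
  return x ? __builtin_popcount (x) : 0;
}

/* Folding "x ? popcount (y) : 0" to "popcount (y)" is wrong:
   for x == 0, y == 5 the guarded form gives 0, popcount (5) gives 2.  */
static bool unrelated_fold_is_wrong (void)
{
  return guarded_unrelated (0, 5) != __builtin_popcount (5);
}

/* Folding "x ? popcount (x) : 0" to "popcount (x)" is safe,
   because popcount (0) == 0.  */
static bool same_operand_fold_is_safe (unsigned x)
{
  return guarded_same (x) == __builtin_popcount (x);
}
```

This is the same situation the new phi-opt-25b.c test guards against, where each function tests `x` but calls the builtin on `y`.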

PR tree-optimization/109702

gcc/ChangeLog:

* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-25b.c: New test.
---
 gcc/match.pd| 16 ++---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c | 70 +
 2 files changed, 78 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0e782cde71d..b14b7017c9a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7784,14 +7784,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* a != 0 ? FUN(a) : 0 -> Fun(a) for some builtin functions. */
 (for func (POPCOUNT BSWAP FFS PARITY)
  (simplify
-  (cond (ne @0 integer_zerop@1) (func@4 (convert? @2)) integer_zerop@3)
-  @4))
+  (cond (ne @0 integer_zerop@1) (func@3 (convert? @0)) integer_zerop@2)
+  @3))
 
 #if GIMPLE
 /* a != 0 ? CLZ(a) : CST -> .CLZ(a) where CST is the result of the internal function for 0. */
 (for func (CLZ)
  (simplify
-  (cond (ne @0 integer_zerop@1) (func (convert?@4 @2)) INTEGER_CST@3)
+  (cond (ne @0 integer_zerop@1) (func (convert?@3 @0)) INTEGER_CST@2)
   (with { int val;
  internal_fn ifn = IFN_LAST;
  if (direct_internal_fn_supported_p (IFN_CLZ, type, OPTIMIZE_FOR_BOTH)
@@ -7799,13 +7799,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
val) == 2)
ifn = IFN_CLZ;
}
-   (if (ifn == IFN_CLZ && wi::to_widest (@3) == val)
-(IFN_CLZ @4)
+   (if (ifn == IFN_CLZ && wi::to_widest (@2) == val)
+(IFN_CLZ @3)
 
 /* a != 0 ? CTZ(a) : CST -> .CTZ(a) where CST is the result of the internal function for 0. */
 (for func (CTZ)
  (simplify
-  (cond (ne @0 integer_zerop@1) (func (convert?@4 @2)) INTEGER_CST@3)
+  (cond (ne @0 integer_zerop@1) (func (convert?@3 @0)) INTEGER_CST@2)
   (with { int val;
  internal_fn ifn = IFN_LAST;
  if (direct_internal_fn_supported_p (IFN_CTZ, type, OPTIMIZE_FOR_BOTH)
@@ -7813,8 +7813,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
val) == 2)
ifn = IFN_CTZ;
}
-   (if (ifn == IFN_CTZ && wi::to_widest (@3) == val)
-(IFN_CTZ @4)
+   (if (ifn == IFN_CTZ && wi::to_widest (@2) == val)
+(IFN_CTZ @3)
 #endif
 
 /* Common POPCOUNT/PARITY simplifications.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
new file mode 100644
index 000..698a20f7a56
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
@@ -0,0 +1,70 @@
+/* PR tree-optimization/109702 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Test to make sure unrelated arguments and comparisons
+   don't get optimized incorrectly. */
+
+unsigned short test_bswap16(unsigned short x, unsigned short y)
+{
+  return x ? __builtin_bswap16(y) : 0;
+}
+
+unsigned int test_bswap32(unsigned int x, unsigned int y)
+{
+  return x ? __builtin_bswap32(y) : 0;
+}
+
+unsigned long long test_bswap64(unsigned long long x, unsigned long long y)
+{
+  return x ? __builtin_bswap64(y) : 0;
+}
+
+int test_clrsb(int x, int y)
+{
+  return x ? __builtin_clrsb(y) : (__SIZEOF_INT__*8-1);
+}
+
+int test_clrsbl(long x, long y)
+{
+  return x ? __builtin_clrsbl(y) : (__SIZEOF_LONG__*8-1);
+}
+
+int test_clrsbll(long long x, long long y)
+{
+  return x ? __builtin_clrsbll(y) : (__SIZEOF_LONG_LONG__*8-1);
+}
+
+int test_parity(unsigned int x, unsigned int y)
+{
+  return x ? __builtin_parity(y) : 0;
+}
+
+int test_parityl(unsigned long x, unsigned long y)
+{
+  return x ? __builtin_parityl(y) : 0;
+}
+
+int test_parityll(unsigned long long x, unsigned long long y)
+{
+  return x ? __builtin_parityll(y) : 0;
+}
+
+int test_popcount(unsigned int x, unsigned int y)
+{
+  return x ? __builtin_popcount(y) : 0;
+}
+
+int test_popcountl(unsigned long x, unsigned long y)
+{
+  return x ? __builtin_popcountl(y) : 0;
+}
+
+int test_popcountll(unsigned long long x, unsigned long long y)
+{
+  return x ? __builtin_popcountll(y) : 0;
+}
+
+/* 4 types of functions, each with 3 types and there are 2 goto each */
+/* { dg-final { scan-tree-dump-times "goto " 24 "optimized" } } */
+
-- 
2.31.1



[PATCH] c++: satisfaction of non-dep member alias template-id

2023-05-02 Thread Patrick Palka via Gcc-patches
constraints_satisfied_p already carefully checks dependence of template
arguments before proceeding with satisfaction, so the dependence check
in instantiate_alias_template is unnecessary and overly conservative.
Getting rid of it allows us to check satisfaction ahead of time in more
cases as in the below testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (instantiate_alias_template): Exit early upon
error from coerce_template_parms.  Remove dependence test
guarding constraints_satisfied_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-alias6.C: New test.
---
 gcc/cp/pt.cc |  6 +++---
 gcc/testsuite/g++.dg/cpp2a/concepts-alias6.C | 15 +++
 2 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-alias6.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3f1cf139bbd..930291917f2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -22178,11 +22178,11 @@ instantiate_alias_template (tree tmpl, tree args, tsubst_flags_t complain)
 
   args = coerce_template_parms (DECL_TEMPLATE_PARMS (tmpl),
args, tmpl, complain);
+  if (args == error_mark_node)
+return args;
 
   /* FIXME check for satisfaction in check_instantiated_args.  */
-  if (flag_concepts
-  && !any_dependent_template_arguments_p (args)
-  && !constraints_satisfied_p (tmpl, args))
+  if (!constraints_satisfied_p (tmpl, args))
 {
   if (complain & tf_error)
{
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-alias6.C b/gcc/testsuite/g++.dg/cpp2a/concepts-alias6.C
new file mode 100644
index 000..4acd57d36e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-alias6.C
@@ -0,0 +1,15 @@
+// Verify we can check satisfaction of non-dependent member alias
+// template-ids whose constraints don't depend on outer template
+// arguments ahead of time.
+// { dg-do compile { target c++20 } }
+
+template
+struct A {
+  template requires (N > 0)
+  using at = T;
+
+  void f() {
+using ty1 = at<0>; // { dg-error "constraint" }
+using ty2 = at<1>;
+  }
+};
-- 
2.40.1.459.g48d89b51b3



Re: [PATCH] target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64

2023-05-02 Thread Andrew Pinski via Gcc-patches
On Tue, May 2, 2023 at 5:23 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Andrew Pinski via Gcc-patches  writes:
> > There is no canonical form for this case defined. So the aarch64 backend needs
> > a pattern to match both of these forms.
> >
> > The forms are:
> > (set (reg/i:SI 0 x0)
> > (if_then_else:SI (eq (reg:CC 66 cc)
> > (const_int 0 [0]))
> > (reg:SI 97)
> > (const_int -1 [0x])))
> > and
> > (set (reg/i:SI 0 x0)
> > (ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
> > (const_int 0 [0])))
> > (reg:SI 102)))
> >
> > Currently the aarch64 backend matches the first form so this
> > patch adds an insn_and_split to match the second form and
> > convert it to the first form.
> >
> > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
> >
> >   PR target/109657
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.md (*cmov_insn_m1): New
> >   insn_and_split pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/csinv-2.c: New test.
> > ---
> >  gcc/config/aarch64/aarch64.md  | 20 +
> >  gcc/testsuite/gcc.target/aarch64/csinv-2.c | 26 ++
> >  2 files changed, 46 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/csinv-2.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index e1a2b265b20..57fe5601350 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -4194,6 +4194,26 @@ (define_insn "*cmovsi_insn_uxtw"
> >[(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
> >  )
> >
> > +;; There are two canonical forms for `cmp ? -1 : a`.
> > +;; This is the second form and is here to help combine.
> > +;; Support `-(cmp) | a` into `cmp ? -1 : a` to be canonical in the backend.
> > +(define_insn_and_split "*cmov_insn_m1"
> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
> > +(ior:GPI
> > +  (neg:GPI
> > +   (match_operator:GPI 1 "aarch64_comparison_operator"
> > +[(match_operand 2 "cc_register" "") (const_int 0)]))
> > +  (match_operand 3 "register_operand" "r")))]
> > +  ""
> > +  "#"
> > +  "&& true"
> > +  [(set (match_dup 0)
> > + (if_then_else:GPI (match_dup 1)
> > +  (const_int -1) (match_dup 3)))]
>
> Sorry for the nit, but the formatting of the last two lines looks odd IMO.
> How about:
>
> (if_then_else:GPI (match_dup 1) (const_int -1) (match_dup 3))...
>
> or:
>
> (if_then_else:GPI (match_dup 1)
>   (const_int -1)
>   (match_dup 3))...
>
> OK with that change, thanks.

I committed the second form, as I think it is easier to read than
putting everything on one line.

Thanks,
Andrew

>
> Richard
>
> > +  {}
> > +  [(set_attr "type" "csel")]
> > +)
> > +
> >  (define_insn "*cmovdi_insn_uxtw"
> >[(set (match_operand:DI 0 "register_operand" "=r")
> >   (if_then_else:DI
> > diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-2.c b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
> > new file mode 100644
> > index 000..89132acb713
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* PR target/109657: (a ? -1 : 0) | b could be better */
> > +
> > +/* Both functions should have the same assembly of:
> > +   cmp w1, 0
> > +   csinv   w0, w0, wzr, eq
> > +
> > +   We should not get:
> > +   cmp w1, 0
> > +   csetm   w1, ne
> > +   orr w0, w1, w0
> > + */
> > +/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]" 2 } } */
> > +/* { dg-final { scan-assembler-not "csetm\tw\[0-9\]" } } */
> > +unsigned b(unsigned a, unsigned b)
> > +{
> > +  if(b)
> > +return -1;
> > +  return a;
> > +}
> > +unsigned b1(unsigned a, unsigned b)
> > +{
> > +unsigned t = b ? -1 : 0;
> > +return a | t;
> > +}
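The new `*cmov_insn_m1` pattern relies on `-(b != 0) | a` and `b ? -1 : a` producing the same value. A minimal C sketch of that arithmetic identity for `unsigned` (function names are hypothetical; this models the values only, not the RTL or the generated code):

```c
#include <stdbool.h>

/* First form: select all-ones when b is nonzero, else a.  */
static unsigned select_form (unsigned a, unsigned b)
{
  return b ? ~0u : a;
}

/* Second form: -(b != 0) is all-ones when b is nonzero and 0 otherwise,
   so OR-ing it into a yields the same value as the select.  */
static unsigned ior_neg_form (unsigned a, unsigned b)
{
  unsigned mask = -(unsigned) (b != 0);
  return mask | a;
}

/* Return true if both forms agree on every pair of sample inputs.  */
static bool forms_agree (void)
{
  static const unsigned samples[] = { 0u, 1u, 7u, 0x80000000u, ~0u };
  const unsigned n = sizeof samples / sizeof samples[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      if (select_form (samples[i], samples[j])
          != ior_neg_form (samples[i], samples[j]))
        return false;
  return true;
}
```

Because the values are identical, splitting the IOR/NEG shape into the IF_THEN_ELSE shape lets the existing csinv pattern handle both.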


Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Thread Romain Naour via Gcc-patches
Hi Kito,

Le 02/05/2023 à 17:51, Kito Cheng a écrit :
>>> Pushed to trunk, thanks for catching that, that definitely should
>>> use log2 no matter C++03 or C++11,
>>> but I think GCC allows the usage of C++11 according to
>>> https://gcc.gnu.org/install/prerequisites.html :P
>> Yes, we should be able to use C++11.  I'd like to get that to C++17 at
>> some point, but I think the biggest problem is the desire to support
>> bootstrapping on something like centos7/rhel7.
> 
> At least we have auto and range based for loop, I am satisfied with
> that enough.

Indeed, gcc 4.9 already supports C++11, but for some reason std::log2 fails
with it, probably because gcc 4.9 is the last gcc release that uses C++03 by default.
I wasn't able to reproduce the build issue (without my patch) with gcc 10 or 11.

I'm fine with the new prerequisites for the next gcc release; I checked the release
notes about that. I noticed that all the toolchains I've built with gcc 13.1 for
other CPU targets were successful with the same setup (docker image).

Thanks!

Best regards,
Romain

> 
> 
>>
>> jeff



Re: [PATCH] doc: Describe behaviour of enums with fixed underlying type

2023-05-02 Thread Jonathan Wakely via Gcc-patches
On Thu, 27 Apr 2023 at 16:58, Marek Polacek wrote:

> On Thu, Apr 27, 2023 at 12:16:34PM +0100, Jonathan Wakely via Gcc-patches
> wrote:
> > C2x adds the ability to give an enumeration type a fixed underlying
> > type, as C++ already has. The -fshort-enums option alters the compiler's
> > choice of underlying type, but when it's fixed the compiler can't
> > choose.
> >
> > Similarly, for C++, -fstrict-enums has no effect with a fixed underlying
> > type, because every value of the underlying type is a valid value of the
> > enumeration type.
> >
> > This caused confusion recently: https://gcc.gnu.org/PR109532
> >
> > OK for trunk?
>
> LGTM.
>

That's an ack from a C front end reviewer (thanks!). Do I need an ack from a
C++/docs/global reviewer too, or can I push?



>
> > -- >8 --
> >
> > gcc/ChangeLog:
> >
> >   * doc/invoke.texi (Code Gen Options): Note that -fshort-enums
> >   is ignored for a fixed underlying type.
> >   (C++ Dialect Options): Likewise for -fstrict-enums.
> > ---
> >  gcc/doc/invoke.texi | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 2f40c58b21c..0f91464f8c0 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -3495,6 +3495,8 @@ defined in the C++ standard; basically, a value that can be
> >  represented in the minimum number of bits needed to represent all the
> >  enumerators).  This assumption may not be valid if the program uses a
> >  cast to convert an arbitrary integer value to the enumerated type.
> > +This option has no effect for an enumeration type with a fixed underlying
> > +type.
> >
> >  @opindex fstrong-eval-order
> >  @item -fstrong-eval-order
> > @@ -18303,6 +18305,8 @@ Use it to conform to a non-default application binary interface.
> >  Allocate to an @code{enum} type only as many bytes as it needs for the
> >  declared range of possible values.  Specifically, the @code{enum} type
> >  is equivalent to the smallest integer type that has enough room.
> > +This option has no effect for an enumeration type with a fixed underlying
> > +type.
> >
> >  @strong{Warning:} the @option{-fshort-enums} switch causes GCC to generate
> >  code that is not binary compatible with code generated without that switch.
> > --
> > 2.40.0
> >
>
> Marek
>
>


Re: [committed] RISCV: Inline subword atomic ops

2023-05-02 Thread Patrick O'Neill

Is this OK for a backport to GCC-13 as well?

(with the whitespace fixes/changelog revision squashed into it)

Patrick

On 4/26/23 10:01, Patrick O'Neill wrote:

Committed - I had to reformat the changelog so it would push and resolve a
trivial merge conflict in riscv.opt.

---

RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and for the
-mno-inline-atomics flag.
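As a rough illustration of the subword masking idea described above, here is a portable C sketch that performs an 8-bit atomic add using only a 32-bit compare-and-swap on the aligned word containing the byte. It is not the sequence the patch emits in sync.md: it uses GCC's `__atomic` builtins, assumes a little-endian byte layout (as on RISC-V), and the helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically add VAL to the byte at P and return
   the old byte, like __atomic_fetch_add.  Assumes little-endian byte
   order and that the aligned 32-bit word containing P may be accessed
   as a uint32_t (e.g. P points into a uint32_t object).  */
static uint8_t subword_fetch_add_u8 (uint8_t *p, uint8_t val)
{
  uintptr_t addr = (uintptr_t) p;
  /* Aligned word that contains the byte.  */
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  /* Bit position of the byte inside that word (little-endian).  */
  unsigned shift = (unsigned) (addr & 3) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  uint32_t desired;
  do
    {
      /* Compute the new byte and splice it into the untouched bytes.  */
      uint32_t old_byte = (old & mask) >> shift;
      uint32_t new_byte = (old_byte + val) & 0xff;
      desired = (old & ~mask) | (new_byte << shift);
    }
  /* On failure, OLD is refreshed with the current word and we retry.  */
  while (!__atomic_compare_exchange_n (word, &old, desired, true,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return (uint8_t) ((old & mask) >> shift);
}
```

The neighbouring bytes are rewritten with the values just read, so a concurrent change to any of them makes the compare-and-swap fail and the loop retry.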

2023-04-18 Patrick O'Neill 

gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.

libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.

gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.

Signed-off-by: Patrick O'Neill 
Signed-off-by: Palmer Dabbelt 


[Committed 11/11] RISC-V: Table A.6 conformance tests

2023-05-02 Thread Patrick O'Neill
Updated the amo/load/store/fence tests to use check-function-bodies to
ensure ordering. This is especially important for Load/Store where
we want to ensure the correct fence is emitted in the correct spot.

Compare exchange & subword amo ops still use scan-assembler-times.

The change to check-function-bodies was pre-approved by Jeff Law.

Committed.

Patrick

---

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-27 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill 
---
 .../gcc.target/riscv/amo-table-a-6-amo-add-1.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-2.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-3.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-4.c | 15 +++
 .../gcc.target/riscv/amo-table-a-6-amo-add-5.c | 15 +++
 .../riscv/amo-table-a-6-compare-exchange-1.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-2.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-3.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-4.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-5.c   |  9 +
 .../riscv/amo-table-a-6-compare-exchange-6.c   | 10 ++
 .../riscv/amo-table-a-6-compare-exchange-7.c   |  9 +
 .../gcc.target/riscv/amo-table-a-6-fence-1.c   | 14 ++
 .../gcc.target/riscv/amo-table-a-6-fence-2.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-3.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-4.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-fence-5.c   | 15 +++
 .../gcc.target/riscv/amo-table-a-6-load-1.c| 16 
 .../gcc.target/riscv/amo-table-a-6-load-2.c| 17 +
 .../gcc.target/riscv/amo-table-a-6-load-3.c| 18 ++
 .../gcc.target/riscv/amo-table-a-6-store-1.c   | 16 
 .../gcc.target/riscv/amo-table-a-6-store-2.c   | 17 +
 .../riscv/amo-table-a-6-store-compat-3.c   | 18 ++
 .../riscv/amo-table-a-6-subword-amo-add-1.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-2.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-3.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-4.c|  9 +
 .../riscv/amo-table-a-6-subword-amo-add-5.c|  9 +
 28 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-amo-add-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/amo-table-a-6-compare-exchange-2.c
 create mode 100644 

[pushed] c++: less invalidate_class_lookup_cache

2023-05-02 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In the testcase below, we push_to_top_level to instantiate f and g, and they
can both use the previous_class_level cache from instantiating A.
Wiping the cache in pop_from_top_level is not helpful; we'll do that in
pushclass if needed.

  template  struct A
  {
int i;
void f() { i = 42; }
void g() { i = 24; }
  };

  int main()
  {
A a;
a.f();
a.g();
  }

gcc/cp/ChangeLog:

* name-lookup.cc (pop_from_top_level): Don't
invalidate_class_lookup_cache.
---
 gcc/cp/name-lookup.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 7c61bc3bf61..8fd5733c1e2 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -8205,9 +8205,6 @@ pop_from_top_level (void)
 
   auto_cond_timevar tv (TV_NAME_LOOKUP);
 
-  /* Clear out class-level bindings cache.  */
-  if (previous_class_level)
-invalidate_class_lookup_cache ();
   pop_class_stack ();
 
   release_tree_vector (current_lang_base);

base-commit: bc24c51c0ccd64617864897ad071c98004ffc0a4
-- 
2.31.1



[PATCH 2/2] c++: look for empty base at specific offset [PR109678]

2023-05-02 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

While looking at the empty base handling for 109678, it occurred to me that
we ought to be able to look for an empty base at a specific offset, not just
in general.
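A simplified C model of the idea, under loudly stated assumptions: the `base_node` structure below is hypothetical and is not GCC's binfo API, and the model has no virtual bases, so every nested base is laid out at or after its parent's offset. It mirrors the two pruning rules in the `dfs_lookup_base` hunk: skip subtrees laid out after the wanted offset, and skip a subobject of the right type at the wrong offset:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a base subobject: its type name, its offset in
   the complete object, and a NULL-terminated list of its direct bases.  */
struct base_node
{
  const char *type;
  long offset;
  const struct base_node *const *bases;
};

/* Depth-first search for a subobject of type WANT at offset WANT_OFF,
   or at any offset when WANT_OFF is -1.  */
static const struct base_node *
find_base_at (const struct base_node *n, const char *want, long want_off)
{
  if (want_off != -1)
    {
      if (n->offset > want_off)
        return NULL;  /* Laid out later: don't look inside.  */
      if (n->offset != want_off && strcmp (n->type, want) == 0)
        return NULL;  /* Right type, wrong offset.  */
    }
  if (strcmp (n->type, want) == 0)
    return n;
  for (const struct base_node *const *b = n->bases; b && *b; b++)
    {
      const struct base_node *r = find_base_at (*b, want, want_off);
      if (r)
        return r;
    }
  return NULL;
}

/* Made-up layout: C has base D at offset 0 (which has base E at 0)
   and a second E subobject at offset 8.  */
static const struct base_node example_e_at_0 = { "E", 0, NULL };
static const struct base_node *const example_d_bases[]
  = { &example_e_at_0, NULL };
static const struct base_node example_d_at_0 = { "D", 0, example_d_bases };
static const struct base_node example_e_at_8 = { "E", 8, NULL };
static const struct base_node *const example_c_bases[]
  = { &example_d_at_0, &example_e_at_8, NULL };
static const struct base_node example_complete = { "C", 0, example_c_bases };
```

In this model, asking for `E` at offset 8 skips the `E` inside `D`, and asking for `E` at offset 4 finds nothing, whereas an offset of -1 behaves like the old any-offset lookup.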

PR c++/109678

gcc/cp/ChangeLog:

* cp-tree.h (lookup_base): Add offset parm.
* constexpr.cc (cxx_fold_indirect_ref_1): Pass it.
* search.cc (struct lookup_base_data_s): Add offset.
(dfs_lookup_base): Handle it.
(lookup_base): Pass it.
---
 gcc/cp/cp-tree.h|  3 ++-
 gcc/cp/constexpr.cc |  2 +-
 gcc/cp/search.cc| 25 ++---
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c9c4cd6f32f..406a5508ce7 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7515,7 +7515,8 @@ extern tree build_if_nonnull  (tree, tree, tsubst_flags_t);
 extern tree get_parent_with_private_access (tree decl, tree binfo);
 extern bool accessible_base_p  (tree, tree, bool);
 extern tree lookup_base (tree, tree, base_access,
-base_kind *, tsubst_flags_t);
+base_kind *, tsubst_flags_t,
+HOST_WIDE_INT = -1);
 extern tree dcast_base_hint(tree, tree);
 extern int accessible_p(tree, tree, bool);
 extern int accessible_in_template_p(tree, tree);
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 37d1c444c9e..70dd6cf4d90 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5452,7 +5452,7 @@ cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, location_t loc, tree type,
 which is likely to be a waste of time (109678).  */
   if (is_empty_class (type)
  && CLASS_TYPE_P (optype)
- && DERIVED_FROM_P (type, optype))
+ && lookup_base (optype, type, ba_any, NULL, tf_none, off))
{
  if (empty_base)
*empty_base = true;
diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 3f521b3bd72..cd80f285ac9 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -56,6 +56,7 @@ static tree dfs_get_pure_virtuals (tree, void *);
 
 struct lookup_base_data_s
 {
+  HOST_WIDE_INT offset; /* Offset we want, or -1 if any.  */
   tree t;  /* type being searched.  */
   tree base;   /* The base type we're looking for.  */
   tree binfo;  /* Found binfo.  */
@@ -74,6 +75,22 @@ dfs_lookup_base (tree binfo, void *data_)
 {
   struct lookup_base_data_s *data = (struct lookup_base_data_s *) data_;
 
+  if (data->offset != -1)
+{
+  /* We're looking for the type at a particular offset.  */
+  int comp = compare_tree_int (BINFO_OFFSET (binfo), data->offset);
+  if (comp > 0)
+   /* Don't bother looking into bases laid out later; even if they
+  do virtually inherit from the base we want, we can get there
+  by another path.  */
+   return dfs_skip_bases;
+  else if (comp != 0
+  && SAME_BINFO_TYPE_P (BINFO_TYPE (binfo), data->base))
+   /* Right type, wrong offset.  */
+   return dfs_skip_bases;
+  /* Fall through.  */
+}
+
   if (SAME_BINFO_TYPE_P (BINFO_TYPE (binfo), data->base))
 {
   if (!data->binfo)
@@ -190,7 +207,7 @@ accessible_base_p (tree t, tree base, bool consider_local_p)
 /* Lookup BASE in the hierarchy dominated by T.  Do access checking as
ACCESS specifies.  Return the binfo we discover.  If KIND_PTR is
non-NULL, fill with information about what kind of base we
-   discovered.
+   discovered.  If OFFSET is other than -1, only match at that offset.
 
If the base is inaccessible, or ambiguous, then error_mark_node is
returned.  If the tf_error bit of COMPLAIN is not set, no error
@@ -198,7 +215,8 @@ accessible_base_p (tree t, tree base, bool consider_local_p)
 
 tree
 lookup_base (tree t, tree base, base_access access,
-base_kind *kind_ptr, tsubst_flags_t complain)
+base_kind *kind_ptr, tsubst_flags_t complain,
+HOST_WIDE_INT offset /* = -1 */)
 {
   tree binfo;
   tree t_binfo;
@@ -246,8 +264,9 @@ lookup_base (tree t, tree base, base_access access,
   data.base = base;
   data.binfo = NULL_TREE;
   data.ambiguous = data.via_virtual = false;
-  data.repeated_base = CLASSTYPE_REPEATED_BASE_P (t);
+  data.repeated_base = (offset == -1) && CLASSTYPE_REPEATED_BASE_P (t);
   data.want_any = access == ba_any;
+  data.offset = offset;
 
   dfs_walk_once (t_binfo, dfs_lookup_base, NULL, &data);
   binfo = data.binfo;
-- 
2.31.1
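The pruning rules in the dfs_lookup_base hunk above can be modelled on a toy base lattice. This is a minimal sketch under stated assumptions: ToyBinfo and find_base_at are hypothetical stand-ins rather than GCC's binfo API, offsets are absolute from the most-derived object (as BINFO_OFFSET is), and there are no virtual bases, so every base of a subobject is laid out at or after that subobject.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a base lattice (not GCC's binfo API): each node
// is one base subobject with its type name and absolute byte offset.
struct ToyBinfo {
  std::string type;
  long offset;
  std::vector<ToyBinfo> bases;
};

// Return true if a subobject of type `want` exists at `offset`, or at any
// offset when offset == -1, pruning the walk the way the patch describes.
bool find_base_at(const ToyBinfo& b, const std::string& want, long offset) {
  if (offset != -1) {
    if (b.offset > offset)
      return false;  // laid out later than the wanted offset: skip subtree
    if (b.offset != offset && b.type == want)
      return false;  // right type, wrong offset: skip subtree
  }
  if (b.type == want)
    return true;  // offset matches, or any offset is acceptable
  for (const ToyBinfo& base : b.bases)
    if (find_base_at(base, want, offset))
      return true;
  return false;
}
```

With an offset, an E found at the wrong position no longer counts as a match, which is also why the patch only consults CLASSTYPE_REPEATED_BASE_P when no offset is given.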



[PATCH 1/2] c++: std::variant slow to compile [PR109678]

2023-05-02 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Here, when dealing with a class with a complex subobject structure, we would
try and fail to find the relevant FIELD_DECL for an empty base before giving
up.  And we would do this at each level, in a combinatorially problematic
way.  Instead, we should check for an empty base first.

PR c++/109678

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/variant1.C: New test.
---
 gcc/cp/constexpr.cc   | 23 +++--
 gcc/testsuite/g++.dg/cpp1z/variant1.C | 47 +++
 2 files changed, 60 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/variant1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index d1097764b10..37d1c444c9e 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5446,6 +5446,19 @@ cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, location_t loc, tree type,
  return ret;
  }
  }
+
+  /* Handle conversion to an empty base class, which is represented with a
+NOP_EXPR.  Do this before spelunking into the non-empty subobjects,
+which is likely to be a waste of time (109678).  */
+  if (is_empty_class (type)
+ && CLASS_TYPE_P (optype)
+ && DERIVED_FROM_P (type, optype))
+   {
+ if (empty_base)
+   *empty_base = true;
+ return op;
+   }
+
   for (tree field = TYPE_FIELDS (optype);
   field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL
@@ -5468,16 +5481,6 @@ cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, location_t loc, tree type,
  return ret;
  }
  }
-  /* Also handle conversion to an empty base class, which
-is represented with a NOP_EXPR.  */
-  if (is_empty_class (type)
- && CLASS_TYPE_P (optype)
- && DERIVED_FROM_P (type, optype))
-   {
- if (empty_base)
-   *empty_base = true;
- return op;
-   }
 }
 
   return NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/cpp1z/variant1.C b/gcc/testsuite/g++.dg/cpp1z/variant1.C
new file mode 100644
index 000..9b18cc233ca
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/variant1.C
@@ -0,0 +1,47 @@
+// PR c++/109678
+// With the bug, compiling this testcase takes more than the typical timeout.
+// { dg-do compile { target c++17 } }
+
+#include <variant>
+
+struct A {};
+struct B {};
+struct C {};
+struct D {};
+struct E {};
+struct F {};
+struct G {};
+struct H {};
+struct I {};
+struct J {};
+struct K {};
+struct L {};
+struct M {};
+struct N {};
+struct O {};
+struct P {};
+struct Q {};
+struct R {};
+struct S {};
+struct T {};
+struct U {};
+struct V {};
+struct W {
+// gcc13 + compiler explorer = 2ms 
+// gcc12.2 + compiler explorer =   400ms
+int i;
+};
+struct X {};
+struct Y {};
+struct Z {};
+
+using Foo = std::variant<A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z>;
+
+struct Bar {
+Foo f;
+static Bar dummy() {
+// issue is triggered by this initialization
+return {Z{}};
+// return {A{}}; // would be very quick
+}
+};

base-commit: bc24c51c0ccd64617864897ad071c98004ffc0a4
-- 
2.31.1



[Committed 10/11] RISC-V: Weaken atomic loads

2023-05-02 Thread Patrick O'Neill



On 4/28/23 11:04, Jeff Law wrote:



On 4/27/23 10:23, Patrick O'Neill wrote:

This change brings atomic loads in line with table A.6 of the ISA
manual.
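For reference, here is a sketch of the load rows of Table A.6 as I recall them, written as a lookup from C++ ordering to RISC-V instruction sequence. The helper name riscv_a6_load is hypothetical, a 32-bit `lw` stands in for `l{b|h|w|d}`, and the strings should be checked against the ISA manual and sync.md rather than taken as authoritative.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: instruction sequence that Table A.6 of the RISC-V ISA
// manual lists for a 32-bit atomic load with the given C++ memory order.
// Returns an empty string for orders that are not valid on a load.
std::string riscv_a6_load(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed:
      return "lw";
    case std::memory_order_consume:  // GCC promotes consume to acquire
    case std::memory_order_acquire:
      return "lw; fence r,rw";
    case std::memory_order_seq_cst:
      return "fence rw,rw; lw; fence r,rw";
    default:  // release and acq_rel are not valid orders for a load
      return "";
  }
}
```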

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (atomic_load): Implement atomic
load mapping.

OK.
jeff


Committed.

Patrick



[Committed 06/11] RISC-V: Strengthen atomic stores

2023-05-02 Thread Patrick O'Neill



On 4/28/23 10:40, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.
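A sketch of what "strictly stronger than table A.6" means for stores, as I read the description: Table A.6 maps a seq_cst store to `fence rw,w; sw`, and the A.7-compatible variant is assumed here to add a trailing `fence rw,rw`. That trailing fence is the extra cost Jeff asks about. The helper riscv_store and its a7_compatible flag are hypothetical; check sync.md for the templates that were actually committed.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: instruction sequence for a 32-bit atomic store.
// With a7_compatible, a seq_cst store gets a trailing full fence so it also
// interoperates with code compiled for the Table A.7 load mapping.
std::string riscv_store(std::memory_order order, bool a7_compatible) {
  switch (order) {
    case std::memory_order_relaxed:
      return "sw";
    case std::memory_order_release:
      return "fence rw,w; sw";
    case std::memory_order_seq_cst:
      return a7_compatible ? "fence rw,w; sw; fence rw,rw" : "fence rw,w; sw";
    default:  // consume, acquire and acq_rel are not valid for a store
      return "";
  }
}
```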

2023-04-27 Patrick O'Neill 

PR 89835

Should be "PR target/89835"



gcc/ChangeLog:

* config/riscv/sync.md:

Needs some text here :-)


I'm not objecting to this patch, but I think we've got an open 
question about whether or not this approach is too expensive for 
existing or soon arriving implementations.


If the decision on that topic is to just pay the cost, then this patch 
is fine.  If we decide to make compatibility optional to avoid the 
additional cost, then this will need suitable adjustments.


Jeff


Acked in Patchworks meeting:
https://inbox.sourceware.org/gcc-patches/c53ac2b2-4edf-34c6-a935-3b31644c9...@rivosinc.com/

Updated changelog and committed:

    PR target/89835

gcc/ChangeLog:

    * config/riscv/sync.md (atomic_store): Use simple store
    instruction in combination with fence(s).

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/pr89835.c: New test.

Patrick



[Committed 09/11] RISC-V: Weaken mem_thread_fence

2023-05-02 Thread Patrick O'Neill



On 4/28/23 11:00, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.
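Relaxing the fence by memory model amounts to a small lookup. Here is a sketch of the Table A.6 fence rows as I recall them, using a hypothetical helper name. Relaxed is assumed to need no instruction at all.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: fence instruction that Table A.6 lists for
// std::atomic_thread_fence with the given memory order.
std::string riscv_a6_fence(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed:
      return "";  // no ordering requested, no fence emitted
    case std::memory_order_consume:  // treated as acquire
    case std::memory_order_acquire:
      return "fence r,rw";
    case std::memory_order_release:
      return "fence rw,w";
    case std::memory_order_acq_rel:
      return "fence.tso";
    case std::memory_order_seq_cst:
      return "fence rw,rw";
  }
  return "fence rw,rw";  // not reached for valid orders; stay conservative
}
```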

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.

OK
jeff


Committed.

Patrick



[Committed 08/11] RISC-V: Weaken LR/SC pairs

2023-05-02 Thread Patrick O'Neill



On 4/28/23 10:56, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.

This change brings LR/SC ops in line with table A.6 of the ISA manual.
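A minimal sketch of the two ideas above, under stated assumptions: union_orders is a hypothetical stand-in for riscv_union_memmodels (not its actual code), and lr_suffix/sc_suffix model the `.aq`/`.rl` bits that Table A.6 lists for LR/SC loops, roughly what the %I and %J operand flags are described as printing. Combining a release success order with an acquire failure order has to yield acq_rel, because neither input alone covers both directions.

```cpp
#include <atomic>
#include <cassert>
#include <string>

static bool has_acquire(std::memory_order m) {
  return m == std::memory_order_consume || m == std::memory_order_acquire ||
         m == std::memory_order_acq_rel || m == std::memory_order_seq_cst;
}

static bool has_release(std::memory_order m) {
  return m == std::memory_order_release || m == std::memory_order_acq_rel ||
         m == std::memory_order_seq_cst;
}

// Hypothetical: weakest order that enforces the ordering of both a and b.
std::memory_order union_orders(std::memory_order a, std::memory_order b) {
  if (a == std::memory_order_seq_cst || b == std::memory_order_seq_cst)
    return std::memory_order_seq_cst;
  bool acq = has_acquire(a) || has_acquire(b);
  bool rel = has_release(a) || has_release(b);
  if (acq && rel) return std::memory_order_acq_rel;
  if (acq) return std::memory_order_acquire;
  if (rel) return std::memory_order_release;
  return std::memory_order_relaxed;
}

// Ordering bits on the LR of an LR/SC loop (Table A.6, as I recall it).
std::string lr_suffix(std::memory_order m) {
  if (m == std::memory_order_seq_cst) return ".aqrl";
  return has_acquire(m) ? ".aq" : "";
}

// Ordering bits on the SC of an LR/SC loop.
std::string sc_suffix(std::memory_order m) {
  return has_release(m) ? ".rl" : "";
}
```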

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.

OK.

Note for the future.  Operands don't have to appear in-order in a 
define_insn.  So the kind of reordering you did here may not have been 
strictly necessary.   As you found out, when you renumber the 
operands, you have to adjust the assembly template, which can be error 
prone. Knowing that, I checked them pretty closely and they look right 
to me.




Jeff


Committed.

Patrick



Re: [PATCH v5 05/11] RISC-V: Add AMO release bits

2023-05-02 Thread Patrick O'Neill



On 4/28/23 10:34, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This patch sets the relevant .rl bits on amo operations.
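With the `.rl` bit available, AMOs can carry their own ordering instead of relying on separate fences. Here is a sketch of the AMO rows of Table A.6 as I recall them, using a hypothetical helper that returns the suffix appended to an instruction such as `amoadd.w`.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical helper: ordering suffix for an AMO (e.g. amoadd.w) under the
// given memory order, following Table A.6 as I recall it.
std::string riscv_a6_amo_suffix(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed:
      return "";
    case std::memory_order_consume:
    case std::memory_order_acquire:
      return ".aq";
    case std::memory_order_release:
      return ".rl";
    case std::memory_order_acq_rel:
    case std::memory_order_seq_cst:
      return ".aqrl";
  }
  return ".aqrl";  // not reached for valid orders
}
```

For example, a seq_cst fetch-add on a 32-bit value would then be spelled `"amoadd.w" + riscv_a6_amo_suffix(std::memory_order_seq_cst)`, i.e. `amoadd.w.aqrl`.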

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): change behavior
of %A to include release bits.

Capitalize "change" in the ChangeLog entry.  OK with that nit fixed.

jeff


Capitalized "change" and committed.

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_print_operand): Change behavior
    of %A to include release bits.

Patrick



[Committed 07/11] RISC-V: Eliminate AMO op fences

2023-05-02 Thread Patrick O'Neill



On 4/28/23 10:43, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.
OK.  Though note this depends on a resolution of patch #6.  You could 
potentially leave the %F support in riscv_print_operand and install 
the rest of this patch while we settle the question around #6.


Jeff


Committed.

Patrick



[Committed 04/11] RISC-V: Enforce atomic compare_exchange SEQ_CST

2023-05-02 Thread Patrick O'Neill



On 4/28/23 10:23, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change FENCE/LR.aq/SC.aq into
sequentially consistent LR.aqrl/SC.rl pair.
OK.  Note that generally you should note which pattern you're changing 
in a ChangeLog entry, similar to how we note the function being 
changed.  So something like this might be better:


* config/riscv/sync.md (atomic_cas_value_strong): ...

Jeff


Edited ChangeLog and committed:

gcc/ChangeLog:

    * config/riscv/sync.md (atomic_cas_value_strong): Change
    FENCE/LR.aq/SC.aq into sequentially consistent LR.aqrl/SC.rl
    pair.

Patrick



[Committed 03/11] RISC-V: Enforce subword atomic LR/SC SEQ_CST

2023-05-02 Thread Patrick O'Neill



On 4/27/23 09:22, Patrick O'Neill wrote:

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/sync.md: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill 

Acked by Jeff Law in reply to [PATCH v5 02/11]
https://inbox.sourceware.org/gcc-patches/ce2e7d52-046d-90d7-baa2-11b299d50...@gmail.com/

Committed.

Patrick


[Committed 02/11] RISC-V: Enforce Libatomic LR/SC SEQ_CST

2023-05-02 Thread Patrick O'Neill



On 4/28/23 09:50, Jeff Law wrote:



On 4/27/23 10:22, Patrick O'Neill wrote:

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill 

libgcc/ChangeLog:

* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.
OK.  When you install this, make sure you also install #3 of the kit 
which mirrors these changes for the inline subword atomics.


jeff


Committed.


Patrick


[Committed 01/11] RISC-V: Eliminate SYNC memory models

2023-05-02 Thread Patrick O'Neill

On 4/28/23 09:23, Jeff Law wrote:

On 4/27/23 10:22, Patrick O'Neill wrote:

Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().

2023-04-27 Patrick O'Neill 

gcc/ChangeLog:

* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.
OK.  Not sure if you want to commit it now or wait for the full set to 
get ACK'd (since there are some questions on the trailing sync approach).


Jeff


Committed.

Patrick



Re: [PATCH 1/2] c++: potentiality of templated memfn call [PR109480]

2023-05-02 Thread Patrick Palka via Gcc-patches
On Tue, 2 May 2023, Patrick Palka wrote:

> On Tue, 2 May 2023, Jason Merrill wrote:
> 
> > On 5/1/23 15:59, Patrick Palka wrote:
> > > Here we're incorrectly deeming the templated call a.g() inside b's
> > > initializer as potentially constant, despite g being non-constexpr,
> > > which leads to us wastefully instantiating the initializer ahead of time
> > > and triggering a bug in access checking deferral (which will get fixed
> > > in the subsequent patch).
> > > 
> > > This patch fixes this by calling get_fns earlier during potentiality
> > > checking so that we also handle the templated form of a member function
> > > call (whose overall callee is a COMPONENT_REF) when checking if the called
> > > function is constexpr etc.
> > > 
> > >   PR c++/109480
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * constexpr.cc (potential_constant_expression_1) :
> > >   Reorganize to call get_fns sooner.  Remove dead store to 'fun'.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the
> > >   expected "without object" diagnostic isn't replaced by a
> > >   "call to non-constexpr function" diagnostic.
> > >   * g++.dg/template/non-dependent25.C: New test.
> > > ---
> > >   gcc/cp/constexpr.cc | 16 
> > >   gcc/testsuite/g++.dg/cpp0x/noexcept59.C |  2 +-
> > >   gcc/testsuite/g++.dg/template/non-dependent25.C | 14 ++
> > >   3 files changed, 23 insertions(+), 9 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent25.C
> > > 
> > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > > index d1097764b10..29d872d0a5e 100644
> > > --- a/gcc/cp/constexpr.cc
> > > +++ b/gcc/cp/constexpr.cc
> > > @@ -9132,6 +9132,10 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> > >   if (fun && is_overloaded_fn (fun))
> > > {
> > > + if (!RECUR (fun, true))
> > > +   return false;
> > > + fun = get_fns (fun);
> > > +
> > >   if (TREE_CODE (fun) == FUNCTION_DECL)
> > > {
> > >   if (builtin_valid_in_constant_expr_p (fun))
> > > @@ -9167,7 +9171,8 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> > >  expression the address will be folded away, so look
> > >  through it now.  */
> > >   if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fun)
> > > - && !DECL_CONSTRUCTOR_P (fun))
> > > + && !DECL_CONSTRUCTOR_P (fun)
> > > + && !processing_template_decl)
> > 
> > I don't see any rationale for this hunk?
> 
> Now that we call get_fns earlier, we can reach this code path with a
> templated non-static memfn call, but the code that follows assumes
> non-templated form.
> 
> I tried teaching it to handle the templated form too, but there are
> apparently two different templated forms for non-static memfn calls,
> one with a COMPONENT_REF callee and one with an ordinary BASELINK
> callee (without an implicit object argument).  In the former the implicit
> object argument is inside the COMPONENT_REF (and is a reference instead
> of a pointer), and in the latter we don't even have an implicit object
> argument to inspect.
> 
> FWIW I think which form we use depends on whether we know if the called
> function is a member of the current instantiation, e.g.
> 
>   struct A { void f(); };
> 
>   template<class T> struct B;
> 
>   template<class T>
>   struct C : B<T> {
>     void g();
> 
>     void h() {
>       A::f(); // templated form has BASELINK callee, no object arg
>       C::g(); // templated form has COMPONENT_REF callee
>     }
>   };
> 
> So it seemed best to punt on templated non-static memfn calls here for
> now and treat that as a separate enhancement.

And I'm not even sure if the code path in question is necessary at all
anymore: disabling it outright doesn't cause any regressions in the testsuite.
It seems effectively equivalent to the body of the loop over the args a few
lines later:

  for (; i < nargs; ++i)
    {
      tree x = get_nth_callarg (t, i);
      /* In a template, reference arguments haven't been converted to
         REFERENCE_TYPE and we might not even know if the parameter
         is a reference, so accept lvalue constants too.  */
      bool rv = processing_template_decl ? any : rval;
      /* Don't require an immediately constant value, as constexpr
         substitution might not use the value of the argument.  */
      bool sub_now = false;
      if (!potential_constant_expression_1 (x, rv, strict,
                                            sub_now, fundef_p, flags,
                                            jump_target))
        return false;
    }

> 
> > 
> > > {
> > >   tree x = get_nth_callarg (t, 0);
> > >   if (is_this_parameter (x))
> > > @@ -9182,16 +9187,11 @@ 

Re: [PATCH 1/2] c++: potentiality of templated memfn call [PR109480]

2023-05-02 Thread Patrick Palka via Gcc-patches
On Tue, 2 May 2023, Jason Merrill wrote:

> On 5/1/23 15:59, Patrick Palka wrote:
> > Here we're incorrectly deeming the templated call a.g() inside b's
> > initializer as potentially constant, despite g being non-constexpr,
> > which leads to us wastefully instantiating the initializer ahead of time
> > and triggering a bug in access checking deferral (which will get fixed
> > in the subsequent patch).
> > 
> > This patch fixes this by calling get_fns earlier during potentiality
> > checking so that we also handle the templated form of a member function
> > call (whose overall callee is a COMPONENT_REF) when checking if the called
> > function is constexpr etc.
> > 
> > PR c++/109480
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (potential_constant_expression_1) :
> > Reorganize to call get_fns sooner.  Remove dead store to 'fun'.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the
> > expected "without object" diagnostic isn't replaced by a
> > "call to non-constexpr function" diagnostic.
> > * g++.dg/template/non-dependent25.C: New test.
> > ---
> >   gcc/cp/constexpr.cc | 16 
> >   gcc/testsuite/g++.dg/cpp0x/noexcept59.C |  2 +-
> >   gcc/testsuite/g++.dg/template/non-dependent25.C | 14 ++
> >   3 files changed, 23 insertions(+), 9 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent25.C
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index d1097764b10..29d872d0a5e 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -9132,6 +9132,10 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> > if (fun && is_overloaded_fn (fun))
> >   {
> > +   if (!RECUR (fun, true))
> > + return false;
> > +   fun = get_fns (fun);
> > +
> > if (TREE_CODE (fun) == FUNCTION_DECL)
> >   {
> > if (builtin_valid_in_constant_expr_p (fun))
> > @@ -9167,7 +9171,8 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> >expression the address will be folded away, so look
> >through it now.  */
> > if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fun)
> > -   && !DECL_CONSTRUCTOR_P (fun))
> > +   && !DECL_CONSTRUCTOR_P (fun)
> > +   && !processing_template_decl)
> 
> I don't see any rationale for this hunk?

Now that we call get_fns earlier, we can reach this code path with a
templated non-static memfn call, but the code that follows assumes
non-templated form.

I tried teaching it to handle the templated form too, but there are
apparently two different templated forms for non-static memfn calls,
one with a COMPONENT_REF callee and one with an ordinary BASELINK
callee (without an implicit object argument).  In the former the implicit
object argument is inside the COMPONENT_REF (and is a reference instead
of a pointer), and in the latter we don't even have an implicit object
argument to inspect.

FWIW I think which form we use depends on whether we know if the called
function is a member of the current instantiation, e.g.

  struct A { void f(); };

  template<class T> struct B;

  template<class T>
  struct C : B<T> {
    void g();

    void h() {
      A::f(); // templated form has BASELINK callee, no object arg
      C::g(); // templated form has COMPONENT_REF callee
    }
  };

So it seemed best to punt on templated non-static memfn calls here for
now and treat that as a separate enhancement.

> 
> >   {
> > tree x = get_nth_callarg (t, 0);
> > if (is_this_parameter (x))
> > @@ -9182,16 +9187,11 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> > i = 1;
> >   }
> >   }
> > -   else
> > - {
> > -   if (!RECUR (fun, true))
> > - return false;
> > -   fun = get_first_fn (fun);
> > - }
> > +
> > +   fun = OVL_FIRST (fun);
> > /* Skip initial arguments to base constructors.  */
> > if (DECL_BASE_CONSTRUCTOR_P (fun))
> >   i = num_artificial_parms_for (fun);
> > -   fun = DECL_ORIGIN (fun);
> >   }
> > else if (fun)
> > {
> > diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept59.C b/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
> > index c752601ba09..1dc826d3111 100644
> > --- a/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
> > +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
> > @@ -3,7 +3,7 @@
> > template  class A
> >   {
> > -  void e ();
> > +  constexpr bool e () { return true; };
> > bool f (int() noexcept(this->e())); // { dg-error "this" }
> > bool g (int() noexcept(e())); // { dg-error "without object" }
> >   };
> > diff --git a/gcc/testsuite/g++.dg/template/non-dependent25.C
> > 

Re: [PATCH] c++: Fix up VEC_INIT_EXPR gimplification after r12-7069

2023-05-02 Thread Jason Merrill via Gcc-patches

On 5/2/23 11:19, Jakub Jelinek wrote:

Hi!

During patch backporting, I've noticed that while most cp_walk_tree calls
with the cp_fold_r callback were changed from &pset to cp_fold_data
&data, the VEC_INIT_EXPR gimplification had not been, so it still passes
just the address of a hash_set, and if during the folding we ever touch
data->flags, we use uninitialized data there.

The following patch changes it to do the same thing as cp_fold_function
because the VEC_INIT_EXPR gimplifications will happen on function bodies
only.
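The bug class here is a type-erased `void *` callback argument whose real type no longer matches what the callback assumes. Here is a minimal sketch with hypothetical stand-ins (FoldData, fold_callback and walk are not GCC's types): the walker forwards an opaque pointer, and the callback reinterprets it as a struct that has a `flags` member. Passing the address of a bare set instead would compile just as happily, but reading `flags` through it would be undefined behavior, which is why every caller has to construct the full struct.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for cp_fold_data: the visited set plus extra flags.
struct FoldData {
  std::set<const void*> visited;  // plays the role of the hash_set
  int flags;                      // state that a bare set does not carry
  explicit FoldData(int f) : flags(f) {}
};

// Callback that assumes its opaque argument points to a FoldData.
int fold_callback(const void* node, void* data_) {
  auto* data = static_cast<FoldData*>(data_);
  data->visited.insert(node);
  return data->flags;
}

// Minimal walker that, like cp_walk_tree, forwards an untyped data pointer.
int walk(const int* nodes, int n, int (*cb)(const void*, void*), void* data) {
  int last = 0;
  for (int i = 0; i < n; ++i)
    last = cb(&nodes[i], data);
  return last;
}
```

Usage: construct the full struct, as the patch now does with `cp_fold_data data (ff_genericize | ff_mce_false)`, and pass its address.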

Ok for trunk if it passes bootstrap/regtest?


OK.


2023-05-02  Jakub Jelinek  

* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than &pset to cp_walk_tree with cp_fold_r.

--- gcc/cp/cp-gimplify.cc.jj	2023-03-16 22:01:02.295090975 +0100
+++ gcc/cp/cp-gimplify.cc   2023-05-02 17:05:03.079652427 +0200
@@ -57,6 +57,13 @@ enum fold_flags {
  
  using fold_flags_t = int;
  
+struct cp_fold_data
+{
+  hash_set<tree> pset;
+  fold_flags_t flags;
+  cp_fold_data (fold_flags_t flags): flags (flags) {}
+};
+
+
  /* Forward declarations.  */
  
  static tree cp_genericize_r (tree *, int *, void *);
@@ -505,8 +512,8 @@ cp_gimplify_expr (tree *expr_p, gimple_s
*expr_p = expand_vec_init_expr (NULL_TREE, *expr_p,
tf_warning_or_error);
  
-	hash_set<tree> pset;
-   cp_walk_tree (expr_p, cp_fold_r, &pset, NULL);
+   cp_fold_data data (ff_genericize | ff_mce_false);
+   cp_walk_tree (expr_p, cp_fold_r, &data, NULL);
cp_genericize_tree (expr_p, false);
copy_if_shared (expr_p);
ret = GS_OK;
@@ -1029,13 +1036,6 @@ struct cp_genericize_data
   in fold-const, we need to perform this before transformation to
   GIMPLE-form.  */
  
-struct cp_fold_data
-{
-  hash_set<tree> pset;
-  fold_flags_t flags;
-  cp_fold_data (fold_flags_t flags): flags (flags) {}
-};
-
  static tree
  cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
  {

Jakub





Re: [PATCH 2/2] c++: non-dep init folding and access checking [PR109480]

2023-05-02 Thread Jason Merrill via Gcc-patches

On 5/1/23 15:59, Patrick Palka wrote:

enforce_access currently inspects processing_template_decl to determine
whether to defer the given access check until instantiation time.  But
using this flag is unreliable because it gets cleared during e.g.
non-dependent initializer folding, and can lead to premature access
check failures as in the below testcase.  It seems better to inspect
current_template_parms instead.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


PR c++/109480

gcc/cp/ChangeLog:

* semantics.cc (enforce_access): Check current_template_parms
instead of processing_template_decl when determining whether
to defer the access check.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent25a.C: New test.
---
  gcc/cp/semantics.cc |  2 +-
  .../g++.dg/template/non-dependent25a.C  | 17 +
  2 files changed, 18 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent25a.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 9ba316ab3be..474da71bff6 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -346,7 +346,7 @@ enforce_access (tree basetype_path, tree decl, tree diag_decl,
  }
  
   tree cs = current_scope ();
-  if (processing_template_decl
+  if (current_template_parms
&& (CLASS_TYPE_P (cs) || TREE_CODE (cs) == FUNCTION_DECL))
  if (tree template_info = get_template_info (cs))
{
diff --git a/gcc/testsuite/g++.dg/template/non-dependent25a.C b/gcc/testsuite/g++.dg/template/non-dependent25a.C
new file mode 100644
index 000..902e537ec09
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent25a.C
@@ -0,0 +1,17 @@
+// PR c++/109480
+// A version of non-dependent25.C where b's initializer is a constant
+// expression.
+// { dg-do compile { target c++11 } }
+
+template<class T>
+struct A {
+  void f() {
+constexpr A a;
+const bool b = a.g(); // { dg-bogus "private" }
+  }
+
+private:
+  constexpr bool g() const { return true; }
+};
+
+template struct A<int>;




Re: [PATCH 1/2] c++: potentiality of templated memfn call [PR109480]

2023-05-02 Thread Jason Merrill via Gcc-patches

On 5/1/23 15:59, Patrick Palka wrote:

Here we're incorrectly deeming the templated call a.g() inside b's
initializer as potentially constant, despite g being non-constexpr,
which leads to us wastefully instantiating the initializer ahead of time
and triggering a bug in access checking deferral (which will get fixed
in the subsequent patch).

This patch fixes this by calling get_fns earlier during potentiality
checking so that we also handle the templated form of a member function
call (whose overall callee is a COMPONENT_REF) when checking if the called
function is constexpr etc.

PR c++/109480

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
Reorganize to call get_fns sooner.  Remove dead store to 'fun'.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept59.C: Make e() constexpr so that the
expected "without object" diagnostic isn't replaced by a
"call to non-constexpr function" diagnostic.
* g++.dg/template/non-dependent25.C: New test.
---
  gcc/cp/constexpr.cc | 16 
  gcc/testsuite/g++.dg/cpp0x/noexcept59.C |  2 +-
  gcc/testsuite/g++.dg/template/non-dependent25.C | 14 ++
  3 files changed, 23 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent25.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index d1097764b10..29d872d0a5e 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9132,6 +9132,10 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
  
  	if (fun && is_overloaded_fn (fun))
  {
+   if (!RECUR (fun, true))
+ return false;
+   fun = get_fns (fun);
+
if (TREE_CODE (fun) == FUNCTION_DECL)
  {
if (builtin_valid_in_constant_expr_p (fun))
@@ -9167,7 +9171,8 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
   expression the address will be folded away, so look
   through it now.  */
if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fun)
-   && !DECL_CONSTRUCTOR_P (fun))
+   && !DECL_CONSTRUCTOR_P (fun)
+   && !processing_template_decl)


I don't see any rationale for this hunk?


  {
tree x = get_nth_callarg (t, 0);
if (is_this_parameter (x))
@@ -9182,16 +9187,11 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
i = 1;
  }
  }
-   else
- {
-   if (!RECUR (fun, true))
- return false;
-   fun = get_first_fn (fun);
- }
+
+   fun = OVL_FIRST (fun);
/* Skip initial arguments to base constructors.  */
if (DECL_BASE_CONSTRUCTOR_P (fun))
  i = num_artificial_parms_for (fun);
-   fun = DECL_ORIGIN (fun);
  }
else if (fun)
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept59.C b/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
index c752601ba09..1dc826d3111 100644
--- a/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept59.C
@@ -3,7 +3,7 @@
  
  template  class A
  {
-  void e ();
+  constexpr bool e () { return true; };
bool f (int() noexcept(this->e())); // { dg-error "this" }
bool g (int() noexcept(e())); // { dg-error "without object" }
  };
diff --git a/gcc/testsuite/g++.dg/template/non-dependent25.C b/gcc/testsuite/g++.dg/template/non-dependent25.C
new file mode 100644
index 000..a2f9801e11f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent25.C
@@ -0,0 +1,14 @@
+// PR c++/109480
+
+template<class T>
+struct A {
+  void f() {
+A a;
+const bool b = a.g();
+  }
+
+private:
+  bool g() const;
+};
+
+template struct A<int>;




Re: [PATCH] c++: Move -Wdangling-reference to -Wextra [PR109642]

2023-05-02 Thread Jason Merrill via Gcc-patches

On 5/1/23 19:54, Marek Polacek wrote:

Sadly, -Wdangling-reference generates false positives for std::span-like
user classes, and it seems imprudent to attempt to improve the heuristic
in GCC 13.  Let's move the warning to -Wextra, that will hopefully
reduce the number of false positives the users have been seeing with 13.

I'm leaving the warning in -Wall in 14 where I think I can write code
to detect std::span-like classes.
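To make the false positive concrete, here is a sketch with a hypothetical non-owning view class (IntView and view_of are illustrative, not from the PRs). The reference binds to an element of the caller's vector, not to the IntView temporary, so nothing dangles. A heuristic that only sees "a reference-returning call whose object argument is a temporary" cannot tell this apart from a real dangling case, and may warn here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical span-like view: it refers to elements it does not own.
class IntView {
 public:
  IntView(const int* data, std::size_t size) : data_(data), size_(size) {}
  const int& operator[](std::size_t i) const { return data_[i]; }
  std::size_t size() const { return size_; }

 private:
  const int* data_;
  std::size_t size_;
};

IntView view_of(const std::vector<int>& v) { return IntView(v.data(), v.size()); }

int first_element(const std::vector<int>& v) {
  // The IntView temporary dies at the end of this full-expression, but r
  // refers into v, which outlives it, so r does not dangle.
  const int& r = view_of(v)[0];
  return r;
}
```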

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for 13.2?


OK.


PR c++/109642
PR c++/109640
PR c++/109671

gcc/c-family/ChangeLog:

* c.opt (Wdangling-reference): Move from -Wall to -Wextra.

gcc/ChangeLog:

* doc/invoke.texi: Document that -Wdangling-reference is
enabled by -Wextra.
---
  gcc/c-family/c.opt  | 2 +-
  gcc/doc/invoke.texi | 3 ++-
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..a75038930ae 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -560,7 +560,7 @@ C ObjC C++ ObjC++ Joined RejectNegative UInteger Var(warn_dangling_pointer) Warn
  Warn for uses of pointers to auto variables whose lifetime has ended.
  
  Wdangling-reference
-C++ ObjC++ Var(warn_dangling_reference) Warning LangEnabledBy(C++ ObjC++, Wall)
+C++ ObjC++ Var(warn_dangling_reference) Warning LangEnabledBy(C++ ObjC++, Wextra)
  Warn when a reference is bound to a temporary whose lifetime has ended.
  
  Wdate-time
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..36ed1591440 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3781,7 +3781,7 @@ where @code{std::minmax} returns @code{std::pair}, and
  both references dangle after the end of the full expression that contains
  the call to @code{std::minmax}.
  
-This warning is enabled by @option{-Wall}.
+This warning is enabled by @option{-Wextra}.
  
  @opindex Wdelete-non-virtual-dtor
  @opindex Wno-delete-non-virtual-dtor
@@ -6126,6 +6126,7 @@ name is still supported, but the newer name is more descriptive.)
  
  @gccoptlist{-Wclobbered
  -Wcast-function-type
+-Wdangling-reference @r{(C++ only)}
  -Wdeprecated-copy @r{(C++ only)}
  -Wempty-body
  -Wenum-conversion @r{(C only)}




Re: [PATCH] libstdc++: Regenerate baseline_symbols.txt files for Linux

2023-05-02 Thread Jakub Jelinek via Gcc-patches
On Tue, May 02, 2023 at 05:50:17PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, May 02, 2023 at 04:42:52PM +0100, Jonathan Wakely wrote:
> > On Tue, 2 May 2023 at 09:45, Jakub Jelinek  wrote:
> > 
> > > Hi!
> > >
> > > The following patch regenerates the ABI files (I've only changed the
> > > Linux files which were updated recently (last month)).
> > >
> > > Tested on x86_64-linux, ok for trunk and later 13.2?
> > >
> > 
> > OK, thanks.
> > 
> > I currently get:
> > FAIL: libstdc++-abi/abi_check
> > on powerpc64le for old glibc, with the _Float128 overloads for
> > std::from_chars and std::to_chars.
> 
> I'll try to regenerate it from latest Fedora build for ppc64le.

Strange.  The powerpc64-linux right before my commit matches exactly
Fedora 39 13.1.1 ppc64le build (except the usual 2 TLS lines).

Jakub



[RFC,patch] Linker plugin - extend API for offloading corner case (aka: LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook [GCC PR109128])

2023-05-02 Thread Tobias Burnus

See also https://gcc.gnu.org/PR109128 (+ description in the patch log)

The linker plugin API was designed to handle LTO: the compiler (i.e. GCC's
lto-plugin) can claim an input file if it finds LTO code. In that case, the
symbols inside that file are ignored by 'ld'.

However, GCC also uses LTO for offloading: code designated to run on a
non-host device (GPUs) is saved in a special section in LTO format. This code
is then compiled for offloading, but otherwise the file is not claimed,
leaving its symbols for 'ld' to process, unless that file also uses real,
host-side LTO.

This mostly works, but there is a corner case (see the PR for an example)
where 'ld' calls GCC's lto-plugin but does not actually use the symbols of
that file. That is fine in principle, but if the file contains offloading
code there is a problem: to match host and device functions, a table is
created on each side, and the two tables must match. When lto-plugin's
offload code processes a file whose host code ld does not link, the tables
no longer match and offloading fails.

It turned out (kudos to Joseph for debugging and writing the patches) that
in this case ld.bfd does not actually regard the file as used; it merely
offers it to lto-plugin while it is still deciding whether it needs symbols
from that file.

To get this working, the current API is insufficient.

Possible solutions:
* Tell lto-plugin whether 'ld' actually needs symbols from a file or merely
  offers the file in case lto-plugin wants to claim it
  => That's implemented in the attached patch.
* Make it possible to "claim" a file without discarding the ld-visible symbols
* Ask the linker later whether the file/some symbols are actually used
* something else ...


What this patch does:
* It adds a new API callback (LDPT_REGISTER_CLAIM_FILE_HOOK_V2) that takes an
  additional boolean argument stating whether ld.bfd intends to use the file
  (or symbols from it) or is merely asking the plugin in case it wants to
  claim it.
* On the ld.bfd side, it wires this up.
* On the GCC lto-plugin side, it uses the new API if it is available and
  otherwise falls back to the existing API.

The way the linker plugin handling is written, it works fine at runtime if
only one side supports the new hook. (Except, of course, that fixing the
issue requires both sides to support it.)
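The compatibility behaviour can be sketched in C++, assuming (as described above) that the v2 handler receives one extra boolean telling the plugin whether the linker already knows it will use the file. `InputFile`, `PluginStatus`, `num_offload_files` and both handler bodies are hypothetical simplifications, not the lto-plugin code or the real plugin-api.h signatures. The v1 entry point simply forwards with the flag set, so a plugin that supports the new hook still behaves as before when the linker only provides the old one.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the plugin-api.h types.
enum class PluginStatus { Ok, Err };

struct InputFile {
  const char* name;
  bool has_lto;      // object carries host-side LTO IR
  bool has_offload;  // object carries offload (device) LTO sections
};

// Number of objects whose offload data the plugin has recorded.
int num_offload_files = 0;

// "v2" handler: known_used says whether the linker has committed to using
// the file, or is only asking while it decides.
PluginStatus claim_file_handler_v2(const InputFile* file, int* claimed,
                                   bool known_used) {
  assert(claimed != nullptr);
  *claimed = 0;
  if (file == nullptr) return PluginStatus::Err;
  // Record offload data only for files the linker will actually link;
  // recording it on a speculative query would make the host and device
  // tables disagree.
  if (file->has_offload && known_used) ++num_offload_files;
  // Claim the file only if it contains host-side LTO code.
  if (file->has_lto) *claimed = 1;
  return PluginStatus::Ok;
}

// "v1" handler, used when the linker only offers the old hook: it cannot
// tell whether the file is used, so it assumes it is (previous behaviour).
PluginStatus claim_file_handler(const InputFile* file, int* claimed) {
  return claim_file_handler_v2(file, claimed, true);
}
```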

Regarding those patches: Are they OK for mainline? Any comments, better
approaches, or suggestions?

Tobias

PS: Attached are the Binutils ld side and the GCC lto-plugin side of the
patch set.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From cb5bf8fad2e653e44fcae8f4ba0ce6eb5f928cb3 Mon Sep 17 00:00:00 2001
From: Joseph Myers 
Date: Tue, 2 May 2023 17:10:01 +
Subject: [PATCH] Implement LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook
 [PR109128]

This is one part of the fix for PR109128, along with a corresponding
binutils linker change.  Without this patch, what happens in the
linker, when an unused object in a .a file has offload data, is that
elf_link_is_defined_archive_symbol calls bfd_link_plugin_object_p,
which ends up calling the plugin's claim_file_handler, which then
records the object as one with offload data. That is, the linker never
decides to use the object in the first place, but use of this _p
interface (called as part of trying to decide whether to use the
object) results in the plugin deciding to use its offload data (and a
consequent mismatch in the offload data present at runtime).

The new hook allows the linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid recording the object as one
with offload data.

	PR middle-end/109128

	include/
	* plugin-api.h (ld_plugin_claim_file_handler_v2)
	(ld_plugin_register_claim_file_v2)
	(LDPT_REGISTER_CLAIM_FILE_HOOK_V2): New.
	(struct ld_plugin_tv): Add tv_register_claim_file_v2.

	lto-plugin/
	* lto-plugin.c (register_claim_file_v2): New.
	(claim_file_handler_v2): New.
	(claim_file_handler): Wrap claim_file_handler_v2.
	(onload): Handle LDPT_REGISTER_CLAIM_FILE_HOOK_V2.
---
 include/plugin-api.h| 16 
 lto-plugin/lto-plugin.c | 31 ---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/plugin-api.h b/include/plugin-api.h
index 379828ba854..395d5bcc598 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -260,6 +260,13 @@ enum ld_plugin_status
 (*ld_plugin_claim_file_handler) (
   const struct ld_plugin_input_file *file, int *claimed);
 
+/* The plugin library's "claim file" handler, version 2.  */
+

RE: [PATCH 13/22] arm: [MVE intrinsics] rework vorrq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 13/22] arm: [MVE intrinsics] rework vorrq
> 
> Implement vorrq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc
> (FUNCTION_WITH_RTX_M_N_NO_N_F): New.
>   (vorrq): New.
>   * config/arm/arm-mve-builtins-base.def (vorrq): New.
>   * config/arm/arm-mve-builtins-base.h (vorrq): New.
>   * config/arm/arm-mve-builtins.cc
>   (function_instance::has_inactive_argument): Handle vorrq.
>   * config/arm/arm_mve.h (vorrq): Remove.
>   (vorrq_m_n): Remove.
>   (vorrq_m): Remove.
>   (vorrq_x): Remove.
>   (vorrq_u8): Remove.
>   (vorrq_s8): Remove.
>   (vorrq_u16): Remove.
>   (vorrq_s16): Remove.
>   (vorrq_u32): Remove.
>   (vorrq_s32): Remove.
>   (vorrq_n_u16): Remove.
>   (vorrq_f16): Remove.
>   (vorrq_n_s16): Remove.
>   (vorrq_n_u32): Remove.
>   (vorrq_f32): Remove.
>   (vorrq_n_s32): Remove.
>   (vorrq_m_n_s16): Remove.
>   (vorrq_m_n_u16): Remove.
>   (vorrq_m_n_s32): Remove.
>   (vorrq_m_n_u32): Remove.
>   (vorrq_m_s8): Remove.
>   (vorrq_m_s32): Remove.
>   (vorrq_m_s16): Remove.
>   (vorrq_m_u8): Remove.
>   (vorrq_m_u32): Remove.
>   (vorrq_m_u16): Remove.
>   (vorrq_m_f32): Remove.
>   (vorrq_m_f16): Remove.
>   (vorrq_x_s8): Remove.
>   (vorrq_x_s16): Remove.
>   (vorrq_x_s32): Remove.
>   (vorrq_x_u8): Remove.
>   (vorrq_x_u16): Remove.
>   (vorrq_x_u32): Remove.
>   (vorrq_x_f16): Remove.
>   (vorrq_x_f32): Remove.
>   (__arm_vorrq_u8): Remove.
>   (__arm_vorrq_s8): Remove.
>   (__arm_vorrq_u16): Remove.
>   (__arm_vorrq_s16): Remove.
>   (__arm_vorrq_u32): Remove.
>   (__arm_vorrq_s32): Remove.
>   (__arm_vorrq_n_u16): Remove.
>   (__arm_vorrq_n_s16): Remove.
>   (__arm_vorrq_n_u32): Remove.
>   (__arm_vorrq_n_s32): Remove.
>   (__arm_vorrq_m_n_s16): Remove.
>   (__arm_vorrq_m_n_u16): Remove.
>   (__arm_vorrq_m_n_s32): Remove.
>   (__arm_vorrq_m_n_u32): Remove.
>   (__arm_vorrq_m_s8): Remove.
>   (__arm_vorrq_m_s32): Remove.
>   (__arm_vorrq_m_s16): Remove.
>   (__arm_vorrq_m_u8): Remove.
>   (__arm_vorrq_m_u32): Remove.
>   (__arm_vorrq_m_u16): Remove.
>   (__arm_vorrq_x_s8): Remove.
>   (__arm_vorrq_x_s16): Remove.
>   (__arm_vorrq_x_s32): Remove.
>   (__arm_vorrq_x_u8): Remove.
>   (__arm_vorrq_x_u16): Remove.
>   (__arm_vorrq_x_u32): Remove.
>   (__arm_vorrq_f16): Remove.
>   (__arm_vorrq_f32): Remove.
>   (__arm_vorrq_m_f32): Remove.
>   (__arm_vorrq_m_f16): Remove.
>   (__arm_vorrq_x_f16): Remove.
>   (__arm_vorrq_x_f32): Remove.
>   (__arm_vorrq): Remove.
>   (__arm_vorrq_m_n): Remove.
>   (__arm_vorrq_m): Remove.
>   (__arm_vorrq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |   9 +
>  gcc/config/arm/arm-mve-builtins-base.def |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   1 +
>  gcc/config/arm/arm-mve-builtins.cc   |   3 +
>  gcc/config/arm/arm_mve.h | 559 ---
>  5 files changed, 15 insertions(+), 559 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 51fed8f671f..499a1ef9f0e 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -98,10 +98,19 @@ namespace arm_mve {
>  UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
>  -1, -1, -1))
> 
> +  /* Helper for builtins with RTX codes, _m predicated and _n overrides.  */
> +#define FUNCTION_WITH_RTX_M_N_NO_N_F(NAME, RTX, UNSPEC) FUNCTION  \
> +  (NAME, unspec_based_mve_function_exact_insn,  \
> +   (RTX, RTX, RTX,   \
> +UNSPEC##_N_S, UNSPEC##_N_U, -1,  \
> +UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
> +UNSPEC##_M_N_S, UNSPEC##_M_N_U, -1))
> +
>  FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
>  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
>  FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
>  FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
> +FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
>  FUNCTION_WITH_RTX_M_N (vsubq, MINUS, VSUBQ)
>  FUNCTION (vuninitializedq, vuninitializedq_impl,)
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index a933c9fc91e..c3f8c0f0eeb 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -22,6 +22,7 @@ DEF_MVE_FUNCTION (vaddq, 
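The 12 code slots that `FUNCTION_WITH_RTX_M_N_NO_N_F` passes to `unspec_based_mve_function_exact_insn` can be modelled as a flat table, in the order the macro lists them: unpredicated (S, U, F), `_n` (S, U, F), `_m` (S, U, F) and `_m_n` (S, U, F), with -1 marking a combination that has no instruction (the float `_n` forms of vorrq). The enums, `CodeTable`, and the integers standing in for `IOR` and the `VORRQ_*` unspecs are hypothetical; this only sketches the layout, not the framework's data structure.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the macro's slot order.
enum class Kind { S = 0, U = 1, F = 2 };
enum class Form { None = 0, N = 1, M = 2, MN = 3 };

struct CodeTable {
  std::array<int, 12> codes;

  int lookup(Form form, Kind kind) const {
    const std::size_t idx = static_cast<std::size_t>(form) * 3 +
                            static_cast<std::size_t>(kind);
    assert(idx < codes.size());
    return codes[idx];
  }
  // -1 means "no instruction for this form/type combination".
  bool supported(Form form, Kind kind) const { return lookup(form, kind) != -1; }
};

// Placeholder values standing in for the real rtx code and unspecs.
constexpr int IOR = 100;
constexpr int VORRQ_N_S = 1, VORRQ_N_U = 2, VORRQ_M_S = 3, VORRQ_M_U = 4,
              VORRQ_M_F = 5, VORRQ_M_N_S = 6, VORRQ_M_N_U = 7;

// Same shape as FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ).
const CodeTable vorrq_codes{{IOR, IOR, IOR,
                             VORRQ_N_S, VORRQ_N_U, -1,
                             VORRQ_M_S, VORRQ_M_U, VORRQ_M_F,
                             VORRQ_M_N_S, VORRQ_M_N_U, -1}};
```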

Re: [committed] Enable LRA on several ports

2023-05-02 Thread Jeff Law via Gcc-patches




On 5/1/23 21:24, Hans-Peter Nilsson via Gcc-patches wrote:




There may be
minor code quality regressions or there may be minor code quality
improvements -- I'm leaving that for the port maintainers to own going
forward.


Right; I noticed performance regressions, and didn't want to
commit anything that knowingly degraded performance.  I did
follow the traces but fell into the rabbit-hole of
rtx_costs.  That's the main reason I didn't push a "-mlra"
option or remove the TARGET_LRA_P for CRIS.  (My story and I
stick to it.)

But thanks I guess, it saves me a commit, but (to all!)
please sync check_effective_target_lra for targets you
"convert".  ...oops, it's just CRIS and hppa there (wot, not
converted?)
Well, I'd say that my plan would be to deprecate any target that is not 
converted by the end of this development cycle.  So the change keeps 
cris from falling into that bucket.


I wasn't aware of check_effective_target_lra.  Thanks.  I'll make sure 
to get that updated as we go forward.


jeff


RE: [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 12/22] arm: [MVE intrinsics] add binary_orrq shape
> 
> This patch adds the binary_orrq shape description.
> 
> MODE_n intrinsics use a set of predicates (preds_m_or_none) different
> from the MODE_none ones, so we explicitly reference preds_m_or_none from
> the shape, thus we need to make it a global array.
> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_orrq): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_orrq): New.
>   * config/arm/arm-mve-builtins.cc (preds_m_or_none): Remove
> static.
>   * config/arm/arm-mve-builtins.h (preds_m_or_none): Declare.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 61 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  gcc/config/arm/arm-mve-builtins.cc|  2 +-
>  gcc/config/arm/arm-mve-builtins.h |  3 ++
>  4 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index e69faae4e2c..83410bbc51a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -397,6 +397,67 @@ struct binary_opt_n_def : public overloaded_base<0>
>  };
>  SHAPE (binary_opt_n)
> 
> +/* _t vfoo[t0](_t, _t)
> +   _t vfoo[_n_t0](_t, _t)
> +
> +   Where the _n form has only supports s16/s32/u16/u32 types as for vorrq.

Delete the "has" in this sentence.
Ok otherwise.
Thanks,
Kyrill

> +
> +   Example: vorrq.
> +   int16x8_t [__arm_]vorrq[_s16](int16x8_t a, int16x8_t b)
> +   int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
> +   int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
> +   int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
> +   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)  */
> +struct binary_orrq_def : public overloaded_base<0>
> +{
> +  bool
> +  explicit_mode_suffix_p (enum predication_index pred, enum mode_suffix_index mode) const override
> +  {
> +return (mode == MODE_n
> + && pred == PRED_m);
> +  }
> +
> +  bool
> +  skip_overload_p (enum predication_index pred, enum mode_suffix_index mode) const override
> +  {
> +switch (mode)
> +  {
> +  case MODE_none:
> + return false;
> +
> + /* For MODE_n, share the overloaded instance with MODE_none, except for PRED_m.  */
> +  case MODE_n:
> + return pred != PRED_m;
> +
> +  default:
> + gcc_unreachable ();
> +  }
> +  }
> +
> +  void
> +  build (function_builder , const function_group_info ,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
> +build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +build_16_32 (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace, false, preds_m_or_none);
> +  }
> +
> +  tree
> +  resolve (function_resolver ) const override
> +  {
> +unsigned int i, nargs;
> +type_suffix_index type;
> +if (!r.check_gp_argument (2, i, nargs)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
> +  return error_mark_node;
> +
> +return r.finish_opt_n_resolution (i, 0, type);
> +  }
> +};
> +SHAPE (binary_orrq)
> +
>  /* [xN]_t vfoo_t0().
> 
> Example: vuninitializedq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index b00ee5eb57a..618b3226050 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -36,6 +36,7 @@ namespace arm_mve
> 
>  extern const function_shape *const binary;
>  extern const function_shape *const binary_opt_n;
> +extern const function_shape *const binary_orrq;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
> index e409a029346..c74e890bd3d 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -285,7 +285,7 @@ static const predication_index preds_none[] = { PRED_none, NUM_PREDS };
> 
>  /* Used by functions that have the m (merging) predicated form, and in addition have an unpredicated form.  */
> -static const predication_index preds_m_or_none[] = {
> +const predication_index preds_m_or_none[] = {
>PRED_m, PRED_none, NUM_PREDS
>  };
> 
> diff --git a/gcc/config/arm/arm-mve-builtins.h b/gcc/config/arm/arm-mve-builtins.h
> index a20d2fb5d86..c9b51a0c77b 100644
> --- 
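The two hooks in `binary_orrq_def` amount to a small decision table, sketched below in C++ with hypothetical `Pred` and `Mode` enums standing in for `predication_index` and `mode_suffix_index`. Following the examples in the shape comment, `vorrq (a, imm)` reuses the overloaded `vorrq` entry point, while the merging form keeps its own `vorrq_m_n` name.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the values binary_orrq cares about.
enum class Pred { None, M, X };
enum class Mode { None, N };

// Mirrors explicit_mode_suffix_p: "_n" is spelled out in the overloaded
// name only for the merging-predicated form (vorrq_m_n).
bool explicit_mode_suffix(Pred pred, Mode mode) {
  return mode == Mode::N && pred == Pred::M;
}

// Mirrors skip_overload_p: MODE_n shares the overloaded entry point with
// MODE_none, except for PRED_m.
bool skip_overload(Pred pred, Mode mode) {
  switch (mode) {
    case Mode::None:
      return false;
    case Mode::N:
      return pred != Pred::M;
  }
  assert(false && "unexpected mode");
  return true;
}
```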

RE: [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 11/22] arm: [MVE intrinsics] rework vandq veorq
> 
> Implement vandq, veorq using the new MVE builtins framework.
> 

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M):
> New.
>   (vandq,veorq): New.
>   * config/arm/arm-mve-builtins-base.def (vandq, veorq): New.
>   * config/arm/arm-mve-builtins-base.h (vandq, veorq): New.
>   * config/arm/arm_mve.h (vandq): Remove.
>   (vandq_m): Remove.
>   (vandq_x): Remove.
>   (vandq_u8): Remove.
>   (vandq_s8): Remove.
>   (vandq_u16): Remove.
>   (vandq_s16): Remove.
>   (vandq_u32): Remove.
>   (vandq_s32): Remove.
>   (vandq_f16): Remove.
>   (vandq_f32): Remove.
>   (vandq_m_s8): Remove.
>   (vandq_m_s32): Remove.
>   (vandq_m_s16): Remove.
>   (vandq_m_u8): Remove.
>   (vandq_m_u32): Remove.
>   (vandq_m_u16): Remove.
>   (vandq_m_f32): Remove.
>   (vandq_m_f16): Remove.
>   (vandq_x_s8): Remove.
>   (vandq_x_s16): Remove.
>   (vandq_x_s32): Remove.
>   (vandq_x_u8): Remove.
>   (vandq_x_u16): Remove.
>   (vandq_x_u32): Remove.
>   (vandq_x_f16): Remove.
>   (vandq_x_f32): Remove.
>   (__arm_vandq_u8): Remove.
>   (__arm_vandq_s8): Remove.
>   (__arm_vandq_u16): Remove.
>   (__arm_vandq_s16): Remove.
>   (__arm_vandq_u32): Remove.
>   (__arm_vandq_s32): Remove.
>   (__arm_vandq_m_s8): Remove.
>   (__arm_vandq_m_s32): Remove.
>   (__arm_vandq_m_s16): Remove.
>   (__arm_vandq_m_u8): Remove.
>   (__arm_vandq_m_u32): Remove.
>   (__arm_vandq_m_u16): Remove.
>   (__arm_vandq_x_s8): Remove.
>   (__arm_vandq_x_s16): Remove.
>   (__arm_vandq_x_s32): Remove.
>   (__arm_vandq_x_u8): Remove.
>   (__arm_vandq_x_u16): Remove.
>   (__arm_vandq_x_u32): Remove.
>   (__arm_vandq_f16): Remove.
>   (__arm_vandq_f32): Remove.
>   (__arm_vandq_m_f32): Remove.
>   (__arm_vandq_m_f16): Remove.
>   (__arm_vandq_x_f16): Remove.
>   (__arm_vandq_x_f32): Remove.
>   (__arm_vandq): Remove.
>   (__arm_vandq_m): Remove.
>   (__arm_vandq_x): Remove.
>   (veorq_m): Remove.
>   (veorq_x): Remove.
>   (veorq_u8): Remove.
>   (veorq_s8): Remove.
>   (veorq_u16): Remove.
>   (veorq_s16): Remove.
>   (veorq_u32): Remove.
>   (veorq_s32): Remove.
>   (veorq_f16): Remove.
>   (veorq_f32): Remove.
>   (veorq_m_s8): Remove.
>   (veorq_m_s32): Remove.
>   (veorq_m_s16): Remove.
>   (veorq_m_u8): Remove.
>   (veorq_m_u32): Remove.
>   (veorq_m_u16): Remove.
>   (veorq_m_f32): Remove.
>   (veorq_m_f16): Remove.
>   (veorq_x_s8): Remove.
>   (veorq_x_s16): Remove.
>   (veorq_x_s32): Remove.
>   (veorq_x_u8): Remove.
>   (veorq_x_u16): Remove.
>   (veorq_x_u32): Remove.
>   (veorq_x_f16): Remove.
>   (veorq_x_f32): Remove.
>   (__arm_veorq_u8): Remove.
>   (__arm_veorq_s8): Remove.
>   (__arm_veorq_u16): Remove.
>   (__arm_veorq_s16): Remove.
>   (__arm_veorq_u32): Remove.
>   (__arm_veorq_s32): Remove.
>   (__arm_veorq_m_s8): Remove.
>   (__arm_veorq_m_s32): Remove.
>   (__arm_veorq_m_s16): Remove.
>   (__arm_veorq_m_u8): Remove.
>   (__arm_veorq_m_u32): Remove.
>   (__arm_veorq_m_u16): Remove.
>   (__arm_veorq_x_s8): Remove.
>   (__arm_veorq_x_s16): Remove.
>   (__arm_veorq_x_s32): Remove.
>   (__arm_veorq_x_u8): Remove.
>   (__arm_veorq_x_u16): Remove.
>   (__arm_veorq_x_u32): Remove.
>   (__arm_veorq_f16): Remove.
>   (__arm_veorq_f32): Remove.
>   (__arm_veorq_m_f32): Remove.
>   (__arm_veorq_m_f16): Remove.
>   (__arm_veorq_x_f16): Remove.
>   (__arm_veorq_x_f32): Remove.
>   (__arm_veorq): Remove.
>   (__arm_veorq_m): Remove.
>   (__arm_veorq_x): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
>  gcc/config/arm/arm-mve-builtins-base.def |   4 +
>  gcc/config/arm/arm-mve-builtins-base.h   |   2 +
>  gcc/config/arm/arm_mve.h | 862 ---
>  4 files changed, 16 insertions(+), 862 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index 48b09bffd0c..51fed8f671f 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -90,7 +90,17 @@ namespace arm_mve {
>  UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
>  UNSPEC##_M_N_S, UNSPEC##_M_N_U, UNSPEC##_M_N_F))
> 
> +  /* Helper for builtins with RTX codes, and _m predicated overrides.  */
> +#define FUNCTION_WITH_RTX_M(NAME, RTX, UNSPEC) 

RE: [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 10/22] arm: [MVE intrinsics] factorize vandq veorq vorrq vbicq
> 
> Factorize vandq, veorq, vorrq, vbicq so that they use the same
> parameterized names.
> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
>   * config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC)
>   (MVE_FP_M_BINARY_LOGIC): New.
>   (MVE_INT_M_N_BINARY_LOGIC): New.
>   (MVE_INT_N_BINARY_LOGIC): New.
>   (mve_insn): Add vand, veor, vorr, vbic.
>   * config/arm/mve.md (mve_vandq_m_)
>   (mve_veorq_m_, mve_vorrq_m_)
>   (mve_vbicq_m_): Merge into ...
>   (@mve_q_m_): ... this.
>   (mve_vandq_m_f, mve_veorq_m_f,
> mve_vorrq_m_f)
>   (mve_vbicq_m_f): Merge into ...
>   (@mve_q_m_f): ... this.
>   (mve_vorrq_n_)
>   (mve_vbicq_n_): Merge into ...
>   (@mve_q_n_): ... this.
>   (mve_vorrq_m_n_, mve_vbicq_m_n_):
> Merge
>   into ...
>   (@mve_q_m_n_): ... this.
> ---
>  gcc/config/arm/iterators.md |  32 +++
>  gcc/config/arm/mve.md   | 161 +---
>  2 files changed, 51 insertions(+), 142 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index d3bef594775..b0ea1af77d2 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -339,24 +339,48 @@ (define_int_iterator MVE_INT_M_BINARY   [
>VSUBQ_M_S VSUBQ_M_U
>])
> 
> +(define_int_iterator MVE_INT_M_BINARY_LOGIC   [
> +  VANDQ_M_S VANDQ_M_U
> +  VBICQ_M_S VBICQ_M_U
> +  VEORQ_M_S VEORQ_M_U
> +  VORRQ_M_S VORRQ_M_U
> +  ])
> +
>  (define_int_iterator MVE_INT_M_N_BINARY [
>VADDQ_M_N_S VADDQ_M_N_U
>VMULQ_M_N_S VMULQ_M_N_U
>VSUBQ_M_N_S VSUBQ_M_N_U
>])
> 
> +(define_int_iterator MVE_INT_M_N_BINARY_LOGIC [
> +  VBICQ_M_N_S VBICQ_M_N_U
> +  VORRQ_M_N_S VORRQ_M_N_U
> +  ])
> +
>  (define_int_iterator MVE_INT_N_BINARY   [
>VADDQ_N_S VADDQ_N_U
>VMULQ_N_S VMULQ_N_U
>VSUBQ_N_S VSUBQ_N_U
>])
> 
> +(define_int_iterator MVE_INT_N_BINARY_LOGIC   [
> +  VBICQ_N_S VBICQ_N_U
> +  VORRQ_N_S VORRQ_N_U
> +  ])
> +
>  (define_int_iterator MVE_FP_M_BINARY   [
>VADDQ_M_F
>VMULQ_M_F
>VSUBQ_M_F
>])
> 
> +(define_int_iterator MVE_FP_M_BINARY_LOGIC   [
> +  VANDQ_M_F
> +  VBICQ_M_F
> +  VEORQ_M_F
> +  VORRQ_M_F
> +  ])
> +
>  (define_int_iterator MVE_FP_M_N_BINARY [
>VADDQ_M_N_F
>VMULQ_M_N_F
> @@ -379,9 +403,17 @@ (define_int_attr mve_insn [
>(VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
>(VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
>(VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
> +  (VANDQ_M_S "vand") (VANDQ_M_U "vand") (VANDQ_M_F "vand")
> +  (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
> +  (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F "vbic")
> +  (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
> +  (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
>(VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
>(VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
>(VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
> +  (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
> +  (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
> +  (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
>(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
>(VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
>(VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index ccb3cf23304..fbae1d3791f 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1805,21 +1805,6 @@ (define_insn "mve_vbicq_f"
>[(set_attr "type" "mve_move")
>  ])
> 
> -;;
> -;; [vbicq_n_s, vbicq_n_u])
> -;;
> -(define_insn "mve_vbicq_n_"
> -  [
> -   (set (match_operand:MVE_5 0 "s_register_operand" "=w")
> - (unspec:MVE_5 [(match_operand:MVE_5 1 "s_register_operand" "0")
> -(match_operand:SI 2 "immediate_operand" "i")]
> -  VBICQ_N))
> -  ]
> -  "TARGET_HAVE_MVE"
> -  "vbic.i%#   %q0, %2"
> -  
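The factorization relies on an int iterator listing several unspecs plus an int attribute (`mve_insn`) mapping each unspec to its mnemonic, so one parameterized pattern replaces the hand-written vand/vbic/veor/vorr ones. A rough C++ model of that expansion, using the `MVE_INT_M_BINARY_LOGIC` entries and `mve_insn` values from the patch; `expand` is a hypothetical simplification, not how the machine-description generators are implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Iterator contents and attribute values copied from the patch.
const std::vector<std::string> MVE_INT_M_BINARY_LOGIC = {
    "VANDQ_M_S", "VANDQ_M_U", "VBICQ_M_S", "VBICQ_M_U",
    "VEORQ_M_S", "VEORQ_M_U", "VORRQ_M_S", "VORRQ_M_U"};

const std::map<std::string, std::string> mve_insn = {
    {"VANDQ_M_S", "vand"}, {"VANDQ_M_U", "vand"},
    {"VBICQ_M_S", "vbic"}, {"VBICQ_M_U", "vbic"},
    {"VEORQ_M_S", "veor"}, {"VEORQ_M_U", "veor"},
    {"VORRQ_M_S", "vorr"}, {"VORRQ_M_U", "vorr"}};

// "Instantiate" one pattern per iterator element, pairing each unspec with
// the mnemonic its attribute value selects.
std::vector<std::pair<std::string, std::string>>
expand(const std::vector<std::string>& iterator,
       const std::map<std::string, std::string>& attr) {
  std::vector<std::pair<std::string, std::string>> out;
  for (const auto& unspec : iterator) {
    auto it = attr.find(unspec);
    assert(it != attr.end() && "every iterator element needs an attribute value");
    out.emplace_back(unspec, it->second);
  }
  return out;
}
```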

RE: [PATCH 09/22] arm: [MVE intrinsics] add binary shape

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 09/22] arm: [MVE intrinsics] add binary shape
> 
> This patch adds the binary shape description.

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index 033b304060a..e69faae4e2c 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,33 @@ struct overloaded_base : public function_shape
>}
>  };
> 
> +/* _t vfoo[_t0](_t, _t)
> +
> +   i.e. the standard shape for binary operations that operate on
> +   uniform types.
> +
> +   Example: vandq.
> +   int8x16_t [__arm_]vandq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vandq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vandq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)  */
> +struct binary_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder , const function_group_info ,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver ) const override
> +  {
> +return r.resolve_uniform (2);
> +  }
> +};
> +SHAPE (binary)
> +
>  /* _t vfoo[_t0](_t, _t)
> _t vfoo[_n_t0](_t, _t)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index 43798fdde57..b00ee5eb57a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -34,6 +34,7 @@ namespace arm_mve
>namespace shapes
>{
> 
> +extern const function_shape *const binary;
>  extern const function_shape *const binary_opt_n;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> --
> 2.34.1
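A rough model of what the `binary` shape's `resolve_uniform (2)` call means for the unpredicated form: both vector arguments must have the same type suffix, and that suffix selects the instance (for example `vandq_s8`). `resolve_uniform_binary` and the string-based suffixes are hypothetical, and the sketch ignores the inactive and predicate arguments of the `_m` and `_x` forms.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Returns the resolved instance name, or nothing if the call is invalid.
std::optional<std::string>
resolve_uniform_binary(const std::string& base,
                       const std::vector<std::string>& arg_suffixes) {
  assert(!base.empty());
  if (arg_suffixes.size() != 2)
    return std::nullopt;  // wrong number of vector arguments
  if (arg_suffixes[0] != arg_suffixes[1])
    return std::nullopt;  // mixed types, e.g. s8 with u8
  return base + "_" + arg_suffixes[0];
}
```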



RE: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 08/22] arm: [MVE intrinsics] rework vaddq vmulq vsubq
> 
> Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
> 
> 2022-09-08  Christophe Lyon 
> 
>   gcc/
> 
>   * config/arm/arm-mve-builtins-base.cc
> (FUNCTION_WITH_RTX_M_N):
>   New.
>   (vaddq, vmulq, vsubq): New.
>   * config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
>   * config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
>   * config/arm/arm_mve.h (vaddq): Remove.
>   (vaddq_m): Remove.
>   (vaddq_x): Remove.
>   (vaddq_n_u8): Remove.
>   (vaddq_n_s8): Remove.
>   (vaddq_n_u16): Remove.
>   (vaddq_n_s16): Remove.
>   (vaddq_n_u32): Remove.
>   (vaddq_n_s32): Remove.
>   (vaddq_n_f16): Remove.
>   (vaddq_n_f32): Remove.
>   (vaddq_m_n_s8): Remove.
>   (vaddq_m_n_s32): Remove.
>   (vaddq_m_n_s16): Remove.
>   (vaddq_m_n_u8): Remove.
>   (vaddq_m_n_u32): Remove.
>   (vaddq_m_n_u16): Remove.
>   (vaddq_m_s8): Remove.
>   (vaddq_m_s32): Remove.
>   (vaddq_m_s16): Remove.
>   (vaddq_m_u8): Remove.
>   (vaddq_m_u32): Remove.
>   (vaddq_m_u16): Remove.
>   (vaddq_m_f32): Remove.
>   (vaddq_m_f16): Remove.
>   (vaddq_m_n_f32): Remove.
>   (vaddq_m_n_f16): Remove.
>   (vaddq_s8): Remove.
>   (vaddq_s16): Remove.
>   (vaddq_s32): Remove.
>   (vaddq_u8): Remove.
>   (vaddq_u16): Remove.
>   (vaddq_u32): Remove.
>   (vaddq_f16): Remove.
>   (vaddq_f32): Remove.
>   (vaddq_x_s8): Remove.
>   (vaddq_x_s16): Remove.
>   (vaddq_x_s32): Remove.
>   (vaddq_x_n_s8): Remove.
>   (vaddq_x_n_s16): Remove.
>   (vaddq_x_n_s32): Remove.
>   (vaddq_x_u8): Remove.
>   (vaddq_x_u16): Remove.
>   (vaddq_x_u32): Remove.
>   (vaddq_x_n_u8): Remove.
>   (vaddq_x_n_u16): Remove.
>   (vaddq_x_n_u32): Remove.
>   (vaddq_x_f16): Remove.
>   (vaddq_x_f32): Remove.
>   (vaddq_x_n_f16): Remove.
>   (vaddq_x_n_f32): Remove.
>   (__arm_vaddq_n_u8): Remove.
>   (__arm_vaddq_n_s8): Remove.
>   (__arm_vaddq_n_u16): Remove.
>   (__arm_vaddq_n_s16): Remove.
>   (__arm_vaddq_n_u32): Remove.
>   (__arm_vaddq_n_s32): Remove.
>   (__arm_vaddq_m_n_s8): Remove.
>   (__arm_vaddq_m_n_s32): Remove.
>   (__arm_vaddq_m_n_s16): Remove.
>   (__arm_vaddq_m_n_u8): Remove.
>   (__arm_vaddq_m_n_u32): Remove.
>   (__arm_vaddq_m_n_u16): Remove.
>   (__arm_vaddq_m_s8): Remove.
>   (__arm_vaddq_m_s32): Remove.
>   (__arm_vaddq_m_s16): Remove.
>   (__arm_vaddq_m_u8): Remove.
>   (__arm_vaddq_m_u32): Remove.
>   (__arm_vaddq_m_u16): Remove.
>   (__arm_vaddq_s8): Remove.
>   (__arm_vaddq_s16): Remove.
>   (__arm_vaddq_s32): Remove.
>   (__arm_vaddq_u8): Remove.
>   (__arm_vaddq_u16): Remove.
>   (__arm_vaddq_u32): Remove.
>   (__arm_vaddq_x_s8): Remove.
>   (__arm_vaddq_x_s16): Remove.
>   (__arm_vaddq_x_s32): Remove.
>   (__arm_vaddq_x_n_s8): Remove.
>   (__arm_vaddq_x_n_s16): Remove.
>   (__arm_vaddq_x_n_s32): Remove.
>   (__arm_vaddq_x_u8): Remove.
>   (__arm_vaddq_x_u16): Remove.
>   (__arm_vaddq_x_u32): Remove.
>   (__arm_vaddq_x_n_u8): Remove.
>   (__arm_vaddq_x_n_u16): Remove.
>   (__arm_vaddq_x_n_u32): Remove.
>   (__arm_vaddq_n_f16): Remove.
>   (__arm_vaddq_n_f32): Remove.
>   (__arm_vaddq_m_f32): Remove.
>   (__arm_vaddq_m_f16): Remove.
>   (__arm_vaddq_m_n_f32): Remove.
>   (__arm_vaddq_m_n_f16): Remove.
>   (__arm_vaddq_f16): Remove.
>   (__arm_vaddq_f32): Remove.
>   (__arm_vaddq_x_f16): Remove.
>   (__arm_vaddq_x_f32): Remove.
>   (__arm_vaddq_x_n_f16): Remove.
>   (__arm_vaddq_x_n_f32): Remove.
>   (__arm_vaddq): Remove.
>   (__arm_vaddq_m): Remove.
>   (__arm_vaddq_x): Remove.
>   (vmulq): Remove.
>   (vmulq_m): Remove.
>   (vmulq_x): Remove.
>   (vmulq_u8): Remove.
>   (vmulq_n_u8): Remove.
>   (vmulq_s8): Remove.
>   (vmulq_n_s8): Remove.
>   (vmulq_u16): Remove.
>   (vmulq_n_u16): Remove.
>   (vmulq_s16): Remove.
>   (vmulq_n_s16): Remove.
>   (vmulq_u32): Remove.
>   (vmulq_n_u32): Remove.
>   (vmulq_s32): Remove.
>   (vmulq_n_s32): Remove.
>   (vmulq_n_f16): Remove.
>   (vmulq_f16): Remove.
>   (vmulq_n_f32): Remove.
>   (vmulq_f32): Remove.
>   (vmulq_m_n_s8): Remove.
>   (vmulq_m_n_s32): Remove.
>   (vmulq_m_n_s16): Remove.
>   (vmulq_m_n_u8): Remove.
>   (vmulq_m_n_u32): Remove.
>   (vmulq_m_n_u16): Remove.
>   (vmulq_m_s8): Remove.
>   (vmulq_m_s32): Remove.
>   (vmulq_m_s16): Remove.
>  

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-05-02 Thread Jeff Law via Gcc-patches




On 4/29/23 19:40, Kito Cheng wrote:

Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#(if_then_else:VNx2BI (unspec:VNx2BI [
#(const_vector:VNx2BI repeat [
#(const_int 1 [0x1])
#])  # all-1 mask
#(reg:DI 143)  # AVL reg, or vector length
#(const_int 2 [0x2]) # mask policy
#(const_int 0 [0])   # avl type
#(reg:SI 66 vl)
#(reg:SI 67 vtype)
#] UNSPEC_VPREDICATE)
#(geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#(reg/v:VNx2QI 137 [ v1 ]))
#(unspec:VNx2BI [
#(reg:SI 0 zero)
#] UNSPEC_VUNDEF))) # maskoff and tail operand
# (expr_list:REG_DEAD (reg:DI 143)
#(expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#(nil
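To make the simplification concrete: for an unsigned `>=`, every lane compared with itself is true, so over the active lanes (index below vl) `geu (v1, v1)` yields the same bits that vmset would. The C++ sketch below models only that per-lane fact. The 16-lane width and the names `Vec`, `Mask`, `cmp_geu_self` and `active_lanes_all_set` are hypothetical. Lanes at or beyond vl are deliberately left out because their value depends on the tail policy (they are UNSPEC_VUNDEF in the RTL above).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model: 16 lanes of unsigned 8-bit elements.
constexpr std::size_t kLanes = 16;
using Vec = std::array<std::uint8_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Unsigned >= of each active lane [0, vl) against itself.  Tail lanes
// are simply left false here as a placeholder; in the RTL above they
// are UNSPEC_VUNDEF, i.e. "don't care".
Mask cmp_geu_self(const Vec &v, std::size_t vl) {
  Mask m{};
  for (std::size_t i = 0; i < vl && i < kLanes; ++i)
    m[i] = v[i] >= v[i];  // x >= x holds for every unsigned x
  return m;
}

// True if every active lane is set, i.e. the active part of the mask
// is exactly what vmset would produce.
bool active_lanes_all_set(const Mask &m, std::size_t vl) {
  for (std::size_t i = 0; i < vl && i < kLanes; ++i)
    if (!m[i]) return false;
  return true;
}
```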

And the split pattern only matches when the tail/maskoff elements are undefined:

(define_split
  [(set (match_operand:VB 0 "register_operand")
        (if_then_else:VB
          (unspec:VB
            [(match_operand:VB 1 "vector_all_trues_mask_operand")
             (match_operand 4 "vector_length_operand")
             (match_operand 5 "const_int_operand")
             (match_operand 6 "const_int_operand")
             (reg:SI VL_REGNUM)
             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
          (match_operand:VB 3 "vector_move_operand")
          (match_operand:VB 2 "vector_undef_operand")))] # maskoff and tail operand, only match undef value

Then it turns into vmset, and also discards the mask policy operand (since
an undef maskoff means "don't care", IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
(if_then_else:VNx2BI (unspec:VNx2BI [
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])  # all-1 mask
(reg:DI 143) # AVL reg, or vector length
(const_int 2 [0x2]) # mask policy
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:VNx2BI repeat [
(const_int 1 [0x1])
])# all-1
(unspec:VNx2BI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) # still vundef
 (expr_list:REG_DEAD (reg:DI 143)
(nil)))

Right.  My concern is that when we call relational_result it's going to
return -1 (as a vector of bools), which bubbles up through the call
chain.  If that doesn't match the actual register state after the
instruction (irrespective of the tail policy), then we have the
potential to generate incorrect code.


For example, if there's a subsequent instruction that tries to set a
vector register to -1, it could just copy from the destination of the 
vmset to the new target.  But if the vmset didn't set all the bits to 1, 
then the code is wrong.
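Jeff's concern can be sketched with a toy mask register. The model below assumes VLMAX = 16 and a hypothetical `vmset_tu` that sets only the active lanes and leaves the tail as it was, which is one permitted outcome for a mask destination under the vector spec's tail-agnostic rules. The point is that an all-ones result over [0, vl) does not imply that the whole register equals -1 unless vl covers every lane.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical 16-lane mask register (VLMAX = 16).
constexpr std::size_t kVlmax = 16;
using MaskReg = std::bitset<kVlmax>;

// vmset modelled with the tail left undisturbed: lanes [0, vl) become 1,
// lanes [vl, VLMAX) keep their previous contents.
MaskReg vmset_tu(MaskReg dest, std::size_t vl) {
  for (std::size_t i = 0; i < vl && i < kVlmax; ++i) dest.set(i);
  return dest;
}

// Reusing the vmset destination as a full-width -1 is only correct when
// every bit, tail included, is 1.
bool safe_to_reuse_as_all_ones(const MaskReg &r) { return r.all(); }
```

With vl = 8 on a previously zeroed register only half the bits are 1, so a later -1 materialization must not be satisfied by copying that register.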


With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.


Jeff


Re: [PATCH] RISC-V: Name newly added flags in changelog

2023-05-02 Thread Patrick O'Neill

On 5/2/23 08:50, Jeff Law wrote:

On 5/1/23 10:10, Patrick O'Neill wrote:

This patch fixes the changelog to explicitly name the added command line
flags introduced in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html

2023-05-01 Patrick O'Neill 

gcc/ChangeLog:

* ChangeLog: Name the flags added by the patch in the changelog.
OK.  You *might* not actually need a ChangeLog entry for this. The 
hooks *might* have a special case for fixing up the ChangeLog entries.


jeff

Committed.

I tried without a ChangeLog and it worked. Thanks!

Patrick


Re: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq

2023-05-02 Thread Christophe Lyon via Gcc-patches




On 5/2/23 18:19, Kyrylo Tkachov wrote:




-Original Message-
From: Christophe Lyon 
Sent: Tuesday, April 18, 2023 2:46 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw ; Richard Sandiford

Cc: Christophe Lyon 
Subject: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq

In order to avoid using a huge switch when generating all the
intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
function taking the builtin code as parameter (e.g. mve_q_n (VADDQ_S,
)
This is achieved by using the new mve_insn iterator.

Having done that, it becomes easier to share similar patterns, to
avoid useless/error-prone code duplication.
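The idea of replacing a per-intrinsic switch with one generator keyed on the builtin code can be sketched in ordinary C++. In the patch this is done at build time with .md iterators and the mve_insn attribute; the runtime `std::map`, the `Unspec` enum and the `mve_q_n_mnemonic` helper below are hypothetical stand-ins used only to show the shape of the idea.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a few of the builtin (unspec) codes.
enum class Unspec { VADDQ_N_S, VADDQ_N_U, VMULQ_N_S, VMULQ_N_U,
                    VSUBQ_N_S, VSUBQ_N_U };

// Plays the role of the mve_insn attribute: builtin code -> mnemonic stem.
const std::map<Unspec, std::string> kMveInsn = {
    {Unspec::VADDQ_N_S, "vadd"}, {Unspec::VADDQ_N_U, "vadd"},
    {Unspec::VMULQ_N_S, "vmul"}, {Unspec::VMULQ_N_U, "vmul"},
    {Unspec::VSUBQ_N_S, "vsub"}, {Unspec::VSUBQ_N_U, "vsub"},
};

// One generator parameterized by the builtin code instead of one switch
// arm per intrinsic.  The "<stem>.<type>" output format is illustrative.
std::string mve_q_n_mnemonic(Unspec code, const std::string &type) {
  auto it = kMveInsn.find(code);
  if (it == kMveInsn.end())
    throw std::invalid_argument("unknown builtin code");
  return it->second + "." + type;
}
```

Adding another operation then means adding table entries rather than new switch arms.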


Nice!
Ok but...



2022-09-08  Christophe Lyon  

gcc/ChangeLog:

* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
iterators.
* config/arm/mve.md
(mve_vsubq_n_f, mve_vaddq_n_f, mve_vmulq_n_f): Factorize into ...
(@mve_q_n_f): ... this.
(mve_vaddq_n_, mve_vmulq_n_)
(mve_vsubq_n_): Factorize into ...
(@mve_q_n_): ... this.
(mve_vaddq, mve_vmulq, mve_vsubq): Factorize into ...
(mve_q): ... this.
(mve_vaddq_f, mve_vmulq_f, mve_vsubq_f): Factorize into ...
(mve_q_f): ... this.
(mve_vaddq_m_, mve_vmulq_m_)
(mve_vsubq_m_): Factorize into ...
(@mve_q_m_): ... this,
(mve_vaddq_m_n_, mve_vmulq_m_n_)
(mve_vsubq_m_n_): Factorize into ...
(@mve_q_m_n_): ... this.
(mve_vaddq_m_f, mve_vmulq_m_f, mve_vsubq_m_f): Factorize into ...
(@mve_q_m_f): ... this.
(mve_vaddq_m_n_f, mve_vmulq_m_n_f)
(mve_vsubq_m_n_f): Factorize into ...
(@mve_q_m_n_f): ... this.
---
  gcc/config/arm/iterators.md |  57 +++
  gcc/config/arm/mve.md   | 317 +---
  2 files changed, 99 insertions(+), 275 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 39895ad62aa..d3bef594775 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -330,6 +330,63 @@ (define_code_iterator FCVT [unsigned_float float])
  ;; Saturating addition, subtraction
  (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])

+;; MVE integer binary operations.
+(define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
+
+(define_int_iterator MVE_INT_M_BINARY   [
+VADDQ_M_S VADDQ_M_U
+VMULQ_M_S VMULQ_M_U
+VSUBQ_M_S VSUBQ_M_U
+])
+
+(define_int_iterator MVE_INT_M_N_BINARY [
+VADDQ_M_N_S VADDQ_M_N_U
+VMULQ_M_N_S VMULQ_M_N_U
+VSUBQ_M_N_S VSUBQ_M_N_U
+])
+
+(define_int_iterator MVE_INT_N_BINARY   [
+VADDQ_N_S VADDQ_N_U
+VMULQ_N_S VMULQ_N_U
+VSUBQ_N_S VSUBQ_N_U
+])
+
+(define_int_iterator MVE_FP_M_BINARY   [
+VADDQ_M_F
+VMULQ_M_F
+VSUBQ_M_F
+])
+
+(define_int_iterator MVE_FP_M_N_BINARY [
+VADDQ_M_N_F
+VMULQ_M_N_F
+VSUBQ_M_N_F
+])
+
+(define_int_iterator MVE_FP_N_BINARY   [
+VADDQ_N_F
+VMULQ_N_F
+VSUBQ_N_F
+])
+
+(define_code_attr mve_addsubmul [
+(minus "vsub")
+(mult "vmul")
+(plus "vadd")
+])
+
+(define_int_attr mve_insn [
+(VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
+(VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
+(VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
+(VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
+(VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
+(VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
+(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
+(VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
+(VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
+])
+
  ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
  ;; a stack pointer operand.  The minus operation is a candidate for an rsub
  ;; and hence only plus is supported.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ab688396f97..5167fbc6add 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -668,21 +668,6 @@ (define_insn "mve_vpnotv16bi"
[(set_attr "type" "mve_move")
  ])

-;;
-;; [vsubq_n_f])
-;;
-(define_insn "mve_vsubq_n_f"
-  [
-   (set 

RE: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Roger Sayle


On 02 May 2023 14:49, Segher Boessenkool wrote:
> On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> > On 02 May 2023 13:40, Paul Koning wrote:
> > > > On May 1, 2023, at 7:37 PM, Roger Sayle
> > > > 
> > > wrote:
> > > > The shiftsi.c regression on xstormy16 is fixed by adding
> > > > -fno-split-wide-types.
> > > > In fact, if all the regression tests pass, I'd suggest that
> > > > flag_split_wide_types = false should be the default on xstormy16
> > > > now that we've moved to LRA.  And if this works for xstormy16, it
> > > > might be useful to other targets for the LRA transition; it's a
> > > > difference in behaviour between reload and LRA that could
> > > > potentially affect multiple targets.
> > >
> > > Is there documentation for that flag?
> >
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> >
> > Interestingly, there's a recent-ish blog describing how
> > -fno-split-wide-types reduces executable size on AVR:
> > https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-size.html
> > and its interaction with (AVR) register allocation is seen in
> > PR middle-end/35860.
> 
> But what causes the problem?  Is something missing somewhere, or do we get too
> high register pressure?
> 
> There also is -fsplit-wide-types-early, which might help.
> 
> In principle it always is good to describe a machine model as close as possible to
> what the machine actually does.  What gets in the way here?

I can describe what GCC is doing, but I don't claim to understand how the
register allocators make the choices they do (nor where the fix should be).

The problematic test case is:
unsigned long foo(unsigned long x) { return x << 1; }
which with (the old) reload (and -fno-split-wide-types) is generated as:
foo:
add r2,r2
adc r3,r3
ret
but with (the new) LRA is generated as:
mov r6,r2
mov r7,r3
mov r2,r6
mov r3,r7
add r2,r6
adc r3,r7
ret
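For reference, the two-instruction reload sequence computes `x << 1` as `x + x` on a 32-bit value held in a 16-bit register pair, with the carry out of the low half fed into the high half. The C++ sketch below models that. It assumes, as the subreg dump further down suggests, that r2 holds the low half and r3 the high half; `RegPair` and `shift_left_1` are hypothetical names.

```cpp
#include <cstdint>

// A 32-bit value split across a pair of 16-bit registers.
struct RegPair {
  std::uint16_t lo;  // r2 (subreg offset 0)
  std::uint16_t hi;  // r3 (subreg offset 2)
};

// add r2,r2 ; adc r3,r3 : doubling done in place, with the carry out of
// the low half added into the high half.
RegPair shift_left_1(RegPair x) {
  std::uint32_t lo_sum = std::uint32_t{x.lo} + x.lo;          // add r2,r2
  unsigned carry = lo_sum >> 16;                              // carry flag
  std::uint32_t hi_sum = std::uint32_t{x.hi} + x.hi + carry;  // adc r3,r3
  return {static_cast<std::uint16_t>(lo_sum),
          static_cast<std::uint16_t>(hi_sum)};
}
```

Because both halves are updated in place, no register-to-register moves are needed, which is what the `"%0"` tie in the addsi3 constraints allows.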

where the register allocator has failed to notice (and take advantage of)
the fact that, per the insn's constraints, the SImode addition can be
performed "in-place":

;; Addition
(define_insn_and_split "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "%0")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))
   (clobber (reg:BI CARRY_REG))]
  ""
  "#"
  "reload_completed"
  [(pc)]
  { xstormy16_expand_arith (SImode, PLUS, operands[0], operands[1],
                            operands[2]);
    DONE;
  }
  [(set_attr "length" "4")])


Before combine, we have simplicity itself (a two-instruction function):
(insn 2 4 3 2 (set (reg/v:SI 26 [ x ])
(reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 7 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
(nil)))
(insn 6 3 11 2 (parallel [
(set (reg:SI 28)
(plus:SI (reg/v:SI 26 [ x ])
(reg/v:SI 26 [ x ])))
(clobber (reg:BI 16 carry))
]) "../../shiftsi.c":4:52 35 {addsi3}
 (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
(expr_list:REG_UNUSED (reg:BI 16 carry)
(nil

Then combine inserts an additional copy:
(insn 14 4 2 2 (set (reg:SI 29)
(reg:SI 2 r2 [ x ])) "../../shiftsi.c":4:41 -1
 (expr_list:REG_DEAD (reg:SI 2 r2 [ x ])
(nil)))
(insn 2 14 3 2 (set (reg/v:SI 26 [ x ])
(reg:SI 29)) "../../shiftsi.c":4:41 7 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 29)
(nil)))
(insn 6 3 11 2 (parallel [
(set (reg:SI 28)
(plus:SI (reg/v:SI 26 [ x ])
(reg/v:SI 26 [ x ])))
(clobber (reg:BI 16 carry))
]) "../../shiftsi.c":4:52 35 {addsi3}
 (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
(expr_list:REG_UNUSED (reg:BI 16 carry)
(nil

Then, just before reload, subreg3 "splits the wide types" to produce:
(insn 15 4 16 2 (set (reg:HI 30)
(reg:HI 2 r2 [ x ])) "../../shiftsi.c":4:41 6 {movhi_internal}
 (nil))
(insn 16 15 17 2 (set (reg:HI 31 [+2 ])
(reg:HI 3 r3 [ x+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
 (nil))
(insn 17 16 18 2 (clobber (reg/v:SI 26 [ x ])) "../../shiftsi.c":4:41 -1
 (nil))
(insn 18 17 19 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 0)
(reg:HI 30)) "../../shiftsi.c":4:41 6 {movhi_internal}
 (nil))
(insn 19 18 3 2 (set (subreg:HI (reg/v:SI 26 [ x ]) 2)
(reg:HI 31 [+2 ])) "../../shiftsi.c":4:41 6 {movhi_internal}
 (nil))
(insn 6 3 11 2 (parallel [
(set (reg:SI 28)
(plus:SI (reg/v:SI 26 [ x ])
(reg/v:SI 26 [ x ])))
(clobber (reg:BI 16 carry))
]) "../../shiftsi.c":4:52 35 {addsi3}
 (expr_list:REG_DEAD (reg/v:SI 26 [ x ])
(expr_list:REG_UNUSED (reg:BI 16 carry)
(nil

Given this input, with the movhi_internals and the clobber, reload
was able to do 

RE: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 07/22] arm: [MVE intrinsics] factorize vadd vsubq vmulq
> 
> In order to avoid using a huge switch when generating all the
> intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
> function taking the builtin code as parameter (e.g. mve_q_n (VADDQ_S,
> )
> This is achieved by using the new mve_insn iterator.
> 
> Having done that, it becomes easier to share similar patterns, to
> avoid useless/error-prone code duplication.
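The RTX-code side of the factorization works the same way: a code iterator instantiates one template once per code, and a code attribute supplies the per-code spelling. The sketch below mimics MVE_INT_BINARY_RTX and mve_addsubmul in C++. The `RtxCode` enum and the `mve_<mve_addsubmul>q` name template are assumptions for illustration, not text copied from mve.md.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-ins for the codes in MVE_INT_BINARY_RTX.
enum class RtxCode { PLUS, MINUS, MULT };

// Mirrors (define_code_attr mve_addsubmul ...): RTX code -> stem.
std::string mve_addsubmul(RtxCode c) {
  switch (c) {
    case RtxCode::MINUS: return "vsub";
    case RtxCode::MULT:  return "vmul";
    case RtxCode::PLUS:  return "vadd";
  }
  return "";  // not reached for valid codes
}

// A code iterator instantiates one template per code; here the assumed
// template "mve_<mve_addsubmul>q" yields one pattern name per code.
std::vector<std::string> expand_pattern_names() {
  std::vector<std::string> names;
  for (RtxCode c : {RtxCode::PLUS, RtxCode::MINUS, RtxCode::MULT})
    names.push_back("mve_" + mve_addsubmul(c) + "q");
  return names;
}
```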

Nice!
Ok but...

> 
> 2022-09-08  Christophe Lyon  
> 
> gcc/ChangeLog:
> 
>   * config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
>   (MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
>   (MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
>   iterators.
>   * config/arm/mve.md
>   (mve_vsubq_n_f, mve_vaddq_n_f, mve_vmulq_n_f): Factorize into ...
>   (@mve_q_n_f): ... this.
>   (mve_vaddq_n_, mve_vmulq_n_)
>   (mve_vsubq_n_): Factorize into ...
>   (@mve_q_n_): ... this.
>   (mve_vaddq, mve_vmulq, mve_vsubq): Factorize into ...
>   (mve_q): ... this.
>   (mve_vaddq_f, mve_vmulq_f, mve_vsubq_f): Factorize into ...
>   (mve_q_f): ... this.
>   (mve_vaddq_m_, mve_vmulq_m_)
>   (mve_vsubq_m_): Factorize into ...
>   (@mve_q_m_): ... this,
>   (mve_vaddq_m_n_, mve_vmulq_m_n_)
>   (mve_vsubq_m_n_): Factorize into ...
>   (@mve_q_m_n_): ... this.
>   (mve_vaddq_m_f, mve_vmulq_m_f, mve_vsubq_m_f): Factorize into ...
>   (@mve_q_m_f): ... this.
>   (mve_vaddq_m_n_f, mve_vmulq_m_n_f)
>   (mve_vsubq_m_n_f): Factorize into ...
>   (@mve_q_m_n_f): ... this.
> ---
>  gcc/config/arm/iterators.md |  57 +++
>  gcc/config/arm/mve.md   | 317 +---
>  2 files changed, 99 insertions(+), 275 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 39895ad62aa..d3bef594775 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -330,6 +330,63 @@ (define_code_iterator FCVT [unsigned_float float])
>  ;; Saturating addition, subtraction
>  (define_code_iterator SSPLUSMINUS [ss_plus ss_minus])
> 
> +;; MVE integer binary operations.
> +(define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
> +
> +(define_int_iterator MVE_INT_M_BINARY   [
> +  VADDQ_M_S VADDQ_M_U
> +  VMULQ_M_S VMULQ_M_U
> +  VSUBQ_M_S VSUBQ_M_U
> +  ])
> +
> +(define_int_iterator MVE_INT_M_N_BINARY [
> +  VADDQ_M_N_S VADDQ_M_N_U
> +  VMULQ_M_N_S VMULQ_M_N_U
> +  VSUBQ_M_N_S VSUBQ_M_N_U
> +  ])
> +
> +(define_int_iterator MVE_INT_N_BINARY   [
> +  VADDQ_N_S VADDQ_N_U
> +  VMULQ_N_S VMULQ_N_U
> +  VSUBQ_N_S VSUBQ_N_U
> +  ])
> +
> +(define_int_iterator MVE_FP_M_BINARY   [
> +  VADDQ_M_F
> +  VMULQ_M_F
> +  VSUBQ_M_F
> +  ])
> +
> +(define_int_iterator MVE_FP_M_N_BINARY [
> +  VADDQ_M_N_F
> +  VMULQ_M_N_F
> +  VSUBQ_M_N_F
> +  ])
> +
> +(define_int_iterator MVE_FP_N_BINARY   [
> +  VADDQ_N_F
> +  VMULQ_N_F
> +  VSUBQ_N_F
> +  ])
> +
> +(define_code_attr mve_addsubmul [
> +  (minus "vsub")
> +  (mult "vmul")
> +  (plus "vadd")
> +  ])
> +
> +(define_int_attr mve_insn [
> +  (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
> +  (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
> +  (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
> +  (VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul") (VMULQ_M_N_F "vmul")
> +  (VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F "vmul")
> +  (VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F "vmul")
> +  (VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub") (VSUBQ_M_N_F "vsub")
> +  (VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F "vsub")
> +  (VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F "vsub")
> +  ])
> +
>  ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
>  ;; a stack pointer operand.  The minus operation is a candidate for an rsub
>  ;; and hence only plus is supported.
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index ab688396f97..5167fbc6add 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -668,21 +668,6 @@ (define_insn "mve_vpnotv16bi"
>[(set_attr 

RE: [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 06/22] arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
> 
> Introduce a function that will be used to build intrinsics which use
> RTX codes for the non-predicated, no-mode version, and UNSPECS
> otherwise.
> 
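The selection rule described here, and spelled out in the class comment below, can be sketched as a plain function. This is a simplified, hypothetical model: it returns the name of the member that would be consulted rather than building RTL. It also assumes that the `_x` forms reuse the `_m` unspecs; the member list (which has no separate `_x` fields) suggests this, but the quoted text does not state it.

```cpp
#include <string>

// Predication forms of an MVE intrinsic.
enum class Pred { NONE, M, X };

// Returns the name of the unspec_based_mve_function_base member that the
// rule would consult.  TY is "sint", "uint" or "fp".  Assumption: the _x
// forms reuse the _m unspecs, since the class has no separate _x fields.
std::string select_member(Pred pred, bool n_suffix, const std::string &ty) {
  if (pred == Pred::NONE && !n_suffix)
    return "m_code_for_" + ty;  // plain form: a standard RTX code
  std::string name = "m_unspec_for_";
  if (pred != Pred::NONE) name += "m_";
  if (n_suffix) name += "n_";
  return name + ty;
}
```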

Ok.
Thanks,
Kyrill

> 2022-09-08  Christophe Lyon 
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-mve-builtins-functions.h (class
>   unspec_based_mve_function_base): New.
>   (class unspec_based_mve_function_exact_insn): New.
> ---
>  gcc/config/arm/arm-mve-builtins-functions.h | 186 
>  1 file changed, 186 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
> index dff01999bcd..6d992b270b0 100644
> --- a/gcc/config/arm/arm-mve-builtins-functions.h
> +++ b/gcc/config/arm/arm-mve-builtins-functions.h
> @@ -39,6 +39,192 @@ public:
>}
>  };
> 
> +/* An incomplete function_base for functions that have an associated
> +   rtx_code for signed integers, unsigned integers and floating-point
> +   values for the non-predicated, non-suffixed intrinsic, and unspec
> +   codes, with separate codes for signed integers, unsigned integers
> +   and floating-point values.  The class simply records information
> +   about the mapping for derived classes to use.  */
> +class unspec_based_mve_function_base : public function_base
> +{
> +public:
> +  CONSTEXPR unspec_based_mve_function_base (rtx_code code_for_sint,
> + rtx_code code_for_uint,
> + rtx_code code_for_fp,
> + int unspec_for_n_sint,
> + int unspec_for_n_uint,
> + int unspec_for_n_fp,
> + int unspec_for_m_sint,
> + int unspec_for_m_uint,
> + int unspec_for_m_fp,
> + int unspec_for_m_n_sint,
> + int unspec_for_m_n_uint,
> + int unspec_for_m_n_fp)
> +: m_code_for_sint (code_for_sint),
> +  m_code_for_uint (code_for_uint),
> +  m_code_for_fp (code_for_fp),
> +  m_unspec_for_n_sint (unspec_for_n_sint),
> +  m_unspec_for_n_uint (unspec_for_n_uint),
> +  m_unspec_for_n_fp (unspec_for_n_fp),
> +  m_unspec_for_m_sint (unspec_for_m_sint),
> +  m_unspec_for_m_uint (unspec_for_m_uint),
> +  m_unspec_for_m_fp (unspec_for_m_fp),
> +  m_unspec_for_m_n_sint (unspec_for_m_n_sint),
> +  m_unspec_for_m_n_uint (unspec_for_m_n_uint),
> +  m_unspec_for_m_n_fp (unspec_for_m_n_fp)
> +  {}
> +
> +  /* The rtx code to use for signed, unsigned integers and
> + floating-point values respectively.  */
> +  rtx_code m_code_for_sint;
> +  rtx_code m_code_for_uint;
> +  rtx_code m_code_for_fp;
> +
> +  /* The unspec code associated with signed-integer, unsigned-integer
> + and floating-point operations respectively.  It covers the cases
> + with the _n suffix, and/or the _m predicate.  */
> +  int m_unspec_for_n_sint;
> +  int m_unspec_for_n_uint;
> +  int m_unspec_for_n_fp;
> +  int m_unspec_for_m_sint;
> +  int m_unspec_for_m_uint;
> +  int m_unspec_for_m_fp;
> +  int m_unspec_for_m_n_sint;
> +  int m_unspec_for_m_n_uint;
> +  int m_unspec_for_m_n_fp;
> +};
> +
> +/* Map the function directly to CODE (UNSPEC, M) where M is the vector
> +   mode associated with type suffix 0, except when there is no
> +   predicate and no _n suffix, in which case we use the appropriate
> +   rtx_code.  This is useful when the basic operation is mapped to a
> +   standard RTX code and all other versions use different unspecs.  */
> +class unspec_based_mve_function_exact_insn : public unspec_based_mve_function_base
> +{
> +public:
> +  CONSTEXPR unspec_based_mve_function_exact_insn (rtx_code code_for_sint,
> +   rtx_code code_for_uint,
> +   rtx_code code_for_fp,
> +   int unspec_for_n_sint,
> +   int unspec_for_n_uint,
> +   int unspec_for_n_fp,
> +   int unspec_for_m_sint,
> +   int unspec_for_m_uint,
> +   int unspec_for_m_fp,
> +   int unspec_for_m_n_sint,
> +   int unspec_for_m_n_uint,
> +   

RE: [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 05/22] arm: [MVE intrinsics] add binary_opt_n shape
> 
> This patch adds the binary_opt_n shape description.
> 

Ok.
Thanks,
Kyrill

>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_opt_n): New.
>   * config/arm/arm-mve-builtins-shapes.h (binary_opt_n): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 32 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ce476aa196e..033b304060a 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,38 @@ struct overloaded_base : public function_shape
>}
>  };
> 
> +/* _t vfoo[_t0](_t, _t)
> +   _t vfoo[_n_t0](_t, _t)
> +
> +   i.e. the standard shape for binary operations that operate on
> +   uniform types.
> +
> +   Example: vaddq.
> +   int8x16_t [__arm_]vaddq[_s8](int8x16_t a, int8x16_t b)
> +   int8x16_t [__arm_]vaddq[_n_s8](int8x16_t a, int8_t b)
> +   int8x16_t [__arm_]vaddq_m[_s8](int8x16_t inactive, int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_m[_n_s8](int8x16_t inactive, int8x16_t a, int8_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_x[_s8](int8x16_t a, int8x16_t b, mve_pred16_t p)
> +   int8x16_t [__arm_]vaddq_x[_n_s8](int8x16_t a, int8_t b, mve_pred16_t p)  */
> +struct binary_opt_n_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder , const function_group_info ,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +build_all (b, "v0,v0,v0", group, MODE_none, preserve_user_namespace);
> +build_all (b, "v0,v0,s0", group, MODE_n, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver ) const override
> +  {
> +return r.resolve_uniform_opt_n (2);
> +  }
> +};
> +SHAPE (binary_opt_n)
> +
>  /* [xN]_t vfoo_t0().
> 
> Example: vuninitializedq.
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index a491369425c..43798fdde57 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -34,6 +34,7 @@ namespace arm_mve
>namespace shapes
>{
> 
> +extern const function_shape *const binary_opt_n;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary_convert;
> 
> --
> 2.34.1
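As a rough illustration of what one binary_opt_n group covers, the helper below lists the non-overloaded names for a single type suffix, matching the examples in the shape comment. `v0,v0,v0` is the vector-vector form and `v0,v0,s0` is the `_n` form whose second operand is a scalar; each is combined with the predications the function group allows (none, `_m` and `_x` for vaddq, per the examples). `binary_opt_n_names` is hypothetical; GCC registers these through function_builder, not by string concatenation.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Lists the non-overloaded names one binary_opt_n group covers for a
// single type suffix, assuming the group allows no predication, _m and _x.
std::vector<std::string> binary_opt_n_names(const std::string &base,
                                            const std::string &suffix) {
  std::vector<std::string> names;
  for (const char *pred : {"", "_m", "_x"}) {
    names.push_back(base + pred + "_" + suffix);    // "v0,v0,v0": vector, vector
    names.push_back(base + pred + "_n_" + suffix);  // "v0,v0,s0": vector, scalar
  }
  return names;
}
```

For `("vaddq", "s8")` this yields vaddq_s8, vaddq_n_s8, vaddq_m_s8, vaddq_m_n_s8, vaddq_x_s8 and vaddq_x_n_s8, the same spellings as the per-type definitions removed in the ChangeLog at the top of this section.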



RE: [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 04/22] arm: [MVE intrinsics] Rework vuninitialized
> 
> Implement vuninitialized using the new MVE builtins framework.
> 
> We need to keep the overloaded __arm_vuninitializedq definitions
> because their resolution depends on the result type only, which is not
> currently supported by the resolver.
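In C++ terms, the underlying difficulty is that functions cannot be overloaded on their return type alone. The overloaded form shown later in the patch, `vuninitializedq(int8x16_t t)`, sidesteps this by taking an argument whose only job is to name the result type. Below is a sketch with hypothetical stand-in types `Int8x16` and `Float32x4` (not the real MVE types). Unlike the real intrinsic, it returns a zero-initialized value, so the example has no undefined behaviour.

```cpp
#include <type_traits>  // std::is_same, used when checking the result type

// Hypothetical stand-ins for two MVE vector types.
struct Int8x16 { signed char lanes[16]; };
struct Float32x4 { float lanes[4]; };

// "Int8x16 vuninitializedq()" and "Float32x4 vuninitializedq()" could not
// both exist: overloads may not differ only in return type.  An unnamed
// parameter of the wanted type gives overload resolution something to use.
// Returns a zero-initialized value (the real intrinsic leaves it undefined).
Int8x16 vuninitializedq(Int8x16) { return Int8x16{}; }
Float32x4 vuninitializedq(Float32x4) { return Float32x4{}; }
```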

Ok.
Thanks,
Kyrill

> 
> 2022-09-08  Murray Steele  
>   Christophe Lyon  
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-mve-builtins-base.cc (class
>   vuninitializedq_impl): New.
>   * config/arm/arm-mve-builtins-base.def (vuninitializedq): New.
>   * config/arm/arm-mve-builtins-base.h (vuninitializedq): New
>   declaration.
>   * config/arm/arm-mve-builtins-shapes.cc (inherent): New.
>   * config/arm/arm-mve-builtins-shapes.h (inherent): New
>   declaration.
>   * config/arm/arm_mve_types.h (__arm_vuninitializedq): Move to ...
>   * config/arm/arm_mve.h (__arm_vuninitializedq): ... here.
>   (__arm_vuninitializedq_u8): Remove.
>   (__arm_vuninitializedq_u16): Remove.
>   (__arm_vuninitializedq_u32): Remove.
>   (__arm_vuninitializedq_u64): Remove.
>   (__arm_vuninitializedq_s8): Remove.
>   (__arm_vuninitializedq_s16): Remove.
>   (__arm_vuninitializedq_s32): Remove.
>   (__arm_vuninitializedq_s64): Remove.
>   (__arm_vuninitializedq_f16): Remove.
>   (__arm_vuninitializedq_f32): Remove.
> ---
>  gcc/config/arm/arm-mve-builtins-base.cc   |  14 ++
>  gcc/config/arm/arm-mve-builtins-base.def  |   2 +
>  gcc/config/arm/arm-mve-builtins-base.h|   1 +
>  gcc/config/arm/arm-mve-builtins-shapes.cc |  16 ++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |   7 +-
>  gcc/config/arm/arm_mve.h  |  73 ++
>  gcc/config/arm/arm_mve_types.h| 169 --
>  7 files changed, 112 insertions(+), 170 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
> index ad8d500afc6..02a3b23865c 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -65,10 +65,24 @@ class vreinterpretq_impl : public quiet
>}
>  };
> 
> +/* Implements vuninitializedq_* intrinsics.  */
> +class vuninitializedq_impl : public quiet
> +{
> +
> +  rtx
> +  expand (function_expander ) const override
> +  {
> +rtx target = e.get_reg_target ();
> +emit_clobber (copy_rtx (target));
> +return target;
> +  }
> +};
> +
>  } /* end anonymous namespace */
> 
>  namespace arm_mve {
> 
>  FUNCTION (vreinterpretq, vreinterpretq_impl,)
> +FUNCTION (vuninitializedq, vuninitializedq_impl,)
> 
>  } /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
> index 5c0c1b9cee7..f669642a259 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.def
> +++ b/gcc/config/arm/arm-mve-builtins-base.def
> @@ -19,8 +19,10 @@
> 
>  #define REQUIRES_FLOAT false
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer, none)
> +DEF_MVE_FUNCTION (vuninitializedq, inherent, all_integer_with_64, none)
>  #undef REQUIRES_FLOAT
> 
>  #define REQUIRES_FLOAT true
>  DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
> +DEF_MVE_FUNCTION (vuninitializedq, inherent, all_float, none)
>  #undef REQUIRES_FLOAT
> diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
> index 60e7bd24eda..ec309cbe572 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.h
> +++ b/gcc/config/arm/arm-mve-builtins-base.h
> @@ -24,6 +24,7 @@ namespace arm_mve {
>  namespace functions {
> 
>  extern const function_base *const vreinterpretq;
> +extern const function_base *const vuninitializedq;
> 
>  } /* end namespace arm_mve::functions */
>  } /* end namespace arm_mve */
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index d0da0ffef91..ce476aa196e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -338,6 +338,22 @@ struct overloaded_base : public function_shape
>}
>  };
> 
> +/* [xN]_t vfoo_t0().
> +
> +   Example: vuninitializedq.
> +   int8x16_t [__arm_]vuninitializedq_s8(void)
> +   int8x16_t [__arm_]vuninitializedq(int8x16_t t)  */
> +struct inherent_def : public nonoverloaded_base
> +{
> +  void
> +  build (function_builder , const function_group_info ,
> +  bool preserve_user_namespace) const override
> +  {
> +build_all (b, "t0", group, MODE_none, preserve_user_namespace);
> +  }
> +};
> +SHAPE (inherent)
> +
>  /* _t foo_t0[_t1](_t)
> 
> where the target type  must be specified explicitly but the source
> diff --git 

Re: [PATCH v5 06/11] RISC-V: Strengthen atomic stores

2023-05-02 Thread Patrick O'Neill

Discussed in the patchworks meeting with Jeff Law and decided to move
forward with the trailing fence compatibility approach. If the trailing
fence becomes a performance issue and people want to generate A.6 code,
we'll need a PSABI change to identify which mapping a binary uses. We'll
cross that bridge when/if we get to it.

Patrick

On 4/27/23 09:22, Patrick O'Neill wrote:

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.
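At the source level, the affected operation is simply a sequentially consistent store. The sketch below only shows that C++ construct. The RISC-V sequence in the comment (the Table A.6 sequence plus a trailing fence) reflects the "trailing fence" approach described above, but it is an assumption here, so check the compiler's `-S` output rather than relying on it. `publish` is a hypothetical name.

```cpp
#include <atomic>

// A sequentially consistent store.  Assumption based on the discussion
// above: with the trailing-fence mapping, GCC on RISC-V would emit roughly
//   fence rw,w ; sw ; fence rw,rw
// where Table A.6 alone needs only "fence rw,w ; sw".  Verify with -S.
void publish(std::atomic<int> &flag, int value) {
  flag.store(value, std::memory_order_seq_cst);
}
```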

2023-04-27 Patrick O'Neill 

PR 89835

gcc/ChangeLog:

* config/riscv/sync.md:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr89835.c: New test.

Signed-off-by: Patrick O'Neill 


Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Jeff Law via Gcc-patches




On 5/1/23 17:37, Roger Sayle wrote:


Jeff Law wrote:

This patch converts the xstormy16 port to LRA.  It introduces a code
quality regression in the shiftsi testcase, but it also fixes numerous
aborts/errors.  IMHO it's a good tradeoff.


I've investigated the shiftsi regression on xstormy16 and the underlying
cause appears to be an interaction between lower-subreg's "subreg3" pass
and the new LRA.  Previously, reload was not fazed by the "clobbers" that
are introduced by the decompose_multiword_subregs function, but they appear
to interfere with LRA's register assignments.

combine's make_extra_copies introduces a new pseudo-to-pseudo move,
but when subreg3 inserts a naked clobber between the original and the
new move, LRA is unable to recombine these pseudos back into the same allocno.

The shiftsi.c regression on xstormy16 is fixed by adding
-fno-split-wide-types.  In fact, if all the regression tests pass, I'd
suggest that flag_split_wide_types = false should be the default on
xstormy16 now that we've moved to LRA.  And if this works for xstormy16,
it might be useful to other targets for the LRA transition; it's a
difference in behaviour between reload and LRA that could potentially
affect multiple targets.

For reference, xstormy16 has a post-reload define_insn_and_split for movsi
(i.e. a multi-word move).  If this insn were split during split1 (i.e.
before subreg3), there wouldn't be a problem (no clobber), but alas the
target's xstormy16_split_move function has several asserts insisting that
it only be called when reload_completed.
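As a general illustration, and not a reconstruction of xstormy16_split_move itself: splitting a multi-word move after reload requires choosing a word order once the hard registers are known. If the destination pair overlaps the source pair, copying the words in the wrong order clobbers a source word before it is read. The sketch below models a 16-bit register file in C. Its layout (low word in register r, high word in r + 1) and all of its names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 16-bit register file.  A 32-bit value occupies two
   consecutive registers: regs[r] holds the low word and regs[r + 1] the
   high word (an assumption of this sketch, not the xstormy16 ABI).  */
#define NUM_REGS 8
static uint16_t regs[NUM_REGS];

/* Split the 32-bit move (dst, dst+1) = (src, src+1) into two 16-bit moves.
   When dst == src + 1, the destination's low register is the source's
   high register, so copying the low word first would overwrite the high
   source word before it is read; copy the high word first instead.  */
static void split_move_si (int dst, int src)
{
  if (dst == src + 1)
    {
      regs[dst + 1] = regs[src + 1];
      regs[dst] = regs[src];
    }
  else
    {
      regs[dst] = regs[src];
      regs[dst + 1] = regs[src + 1];
    }
}
```

Before reload the operands are pseudos, so this ordering decision cannot yet be made. That is one general reason a splitter of this kind may insist on reload_completed, although the messages above do not say whether it is xstormy16's reason.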

I hope this is useful.

It is.

FWIW, turning that on by default for xstormy results in two new fails, 
but also in two new passes:



Tests that now fail, but worked before (2 tests):

xstormy16-sim: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline "hooray[^\\n]*inline copy in test"
xstormy16-sim: gcc.dg/setjmp-1.c spurious clobbered warning (test for bogus messages, line 17)

Tests that now work, but didn't before (2 tests):

xstormy16-sim: gcc.target/xstormy16/shiftsi.c --save-temps -fno-inline-functions -L/home/jlaw/jenkins/workspace/xstormy16-elf/gcc/gcc/testsuite/gcc.target/xstormy16  scan-assembler-not mov 
xstormy16-sim: gcc.target/xstormy16/zextendhisi2.c --save-temps -fno-inline-functions -L/home/jlaw/jenkins/workspace/xstormy16-elf/gcc/gcc/testsuite/gcc.target/xstormy16  scan-assembler mov r3,#0


What I find interesting is that the two failures are in generic code.
Definitely unexpected.


jeff


[PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-05-02 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch adds three builtins for inserting and extracting the
exponent and significand of IEEE 128-bit floating-point values.
The builtins are valid for Power 9 and Power 10.

The patch has been tested on both Power 9 and Power 10.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 


--
From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Wed, 12 Apr 2023 17:46:37 -0400
Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int __builtin_extractf128_exp (__ieee128);
 __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
 __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
 __vector unsigned long long);
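For reference, the three builtins operate on the fields of the IEEE 754 binary128 format: 1 sign bit, a 15-bit biased exponent, and a 112-bit trailing significand. The portable sketch below manipulates those fields on a raw bit pattern held in two 64-bit halves. It uses neither the new builtins nor the underlying instructions, and it deliberately does not model how the hardware treats the implicit leading significand bit; the Power ISA is the authority there. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Raw binary128 bit pattern: hi holds bits 127..64 (sign, 15-bit biased
   exponent, top 48 bits of the trailing significand), lo holds the low
   64 significand bits.  */
typedef struct { uint64_t hi, lo; } f128_bits;

#define F128_EXP_MASK    0x7fffULL
#define F128_HI_SIG_MASK 0x0000ffffffffffffULL  /* low 48 bits of hi */

/* Biased exponent field (bits 126..112).  */
static uint64_t f128_exp (f128_bits x)
{
  return (x.hi >> 48) & F128_EXP_MASK;
}

/* Trailing significand field (112 bits): top 48 bits in *sig_hi, low 64
   bits in *sig_lo.  The implicit leading bit is NOT included.  */
static void f128_sig (f128_bits x, uint64_t *sig_hi, uint64_t *sig_lo)
{
  *sig_hi = x.hi & F128_HI_SIG_MASK;
  *sig_lo = x.lo;
}

/* Replace the exponent field, keeping the sign and significand.  */
static f128_bits f128_insert_exp (f128_bits x, uint64_t exp)
{
  x.hi = (x.hi & ~(F128_EXP_MASK << 48)) | ((exp & F128_EXP_MASK) << 48);
  return x;
}
```

With this layout, 1.0 has hi = 0x3fff000000000000 and lo = 0. Inserting exponent 0x4000 into it yields the bit pattern of 2.0.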

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
* config/rs6000/vsx.md (extractf128_exp_, insertf128_exp_,
extractf128_sig_): Add define_expand for new builtins.
(xsxexpqp_f128_, xsxsigqp_f128_, xsiexpqpf_f128_):
Add define_insn for new builtins.
* doc/extend.texi (__builtin_extractf128_exp, __builtin_extractf128_sig,
__builtin_insertf128_exp): Add documentation for new builtins.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 +++
 gcc/config/rs6000/vsx.md  | 66 ++-
 gcc/doc/extend.texi   | 28 
 .../powerpc/bfp/extract-exp-ieee128.c | 49 ++
 .../powerpc/bfp/extract-sig-ieee128.c | 56 
 .../powerpc/bfp/insert-exp-ieee128.c  | 58 
 6 files changed, 265 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..3247a7f7673 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2876,6 +2876,15 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+  vull __builtin_extractf128_exp (_Float128);
+EEXPKF extractf128_exp_kf {}
+
+  vuq __builtin_extractf128_sig (_Float128);
+ESIGKF extractf128_sig_kf {}
+
+  _Float128 __builtin_insertf128_exp (vuq, vull);
+IEXPKF_VULL insertf128_exp_kf {}
+
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..2a9f875ba57 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -369,7 +369,10 @@
UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
UNSPEC_XXPERMX
-  ])
+   UNSPEC_EXTRACTEXPIEEE
+   UNSPEC_EXTRACTSIGIEEE
+   UNSPEC_INSERTEXPIEEE
+])
 
 (define_int_iterator XVCVBF16  [UNSPEC_VSX_XVCVSPBF16
 UNSPEC_VSX_XVCVBF16SPN])
@@ -4155,6 +4158,38 @@
  "vinsrx %0,%1,%2"
  [(set_attr "type" "vecsimple")])
 
+(define_expand "extractf128_exp_"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand")]
+ UNSPEC_EXTRACTEXPIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsxexpqp_f128_ (operands[0], operands[1]));
+  DONE;
+})
+
+(define_expand "insertf128_exp_"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:V1TI 1 "altivec_register_operand")
+  (match_operand:V2DI 2 "altivec_register_operand")]
+ UNSPEC_INSERTEXPIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsiexpqpf_f128_ (operands[0], operands[1],
+   operands[2]));
+  DONE;
+})
+
+(define_expand "extractf128_sig_"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand")]
+ UNSPEC_EXTRACTSIGIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsxsigqp_f128_ (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "vreplace_elt_"
   [(set (match_operand:REPLACE_ELT 0 "register_operand")
   (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
@@ -5016,6 +5051,15 @@
   "xsxexpqp %0,%1"
   [(set_attr "type" "vecmove")])
 
+;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating point format
+(define_insn "xsxexpqp_f128_"
+  [(set (match_operand:V2DI 0 

Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Thread Kito Cheng via Gcc-patches
> > Pushed to trunk, thanks for catching that, that definitely should
> > use log2 whether building as C++03 or C++11,
> > but I think GCC allows the usage of C++11 according to
> > https://gcc.gnu.org/install/prerequisites.html :P
> Yes, we should be able to use C++11.  I'd like to get that to C++17 at
> some point, but I think the biggest problem is the desire to support
> bootstrapping on something like centos7/rhel7.

At least we have auto and range-based for loops; I am satisfied enough
with that.


>
> jeff


Re: [PATCH] libstdc++: Regenerate baseline_symbols.txt files for Linux

2023-05-02 Thread Jakub Jelinek via Gcc-patches
On Tue, May 02, 2023 at 04:42:52PM +0100, Jonathan Wakely wrote:
> On Tue, 2 May 2023 at 09:45, Jakub Jelinek  wrote:
> 
> > Hi!
> >
> > The following patch regenerates the ABI files (I've only changed the
> > Linux files which were updated recently (last month)).
> >
> > Tested on x86_64-linux, ok for trunk and later 13.2?
> >
> 
> OK, thanks.
> 
> I currently get:
> FAIL: libstdc++-abi/abi_check
> on powerpc64le for old glibc, with the _Float128 overloads for
> std::from_chars and std::to_chars.

I'll try to regenerate it from the latest Fedora build for ppc64le.

> Those symbols were OK when GLIBCXX_3.4.31 was the latest, because added
> symbols in the latest version are OK. Now that GLIBCXX_3.4.32 is the
> latest, we can't have additions to the older version.

Jakub



Re: [PATCH] RISC-V: Name newly added flags in changelog

2023-05-02 Thread Jeff Law via Gcc-patches




On 5/1/23 10:10, Patrick O'Neill wrote:

This patch fixes the changelog to explicitly name the command-line
flags introduced in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html

2023-05-01 Patrick O'Neill 

gcc/ChangeLog:

* ChangeLog: Name the flags added by the patch in the changelog.

OK.  You *might* not actually need a ChangeLog entry for this.  The
hooks *might* have a special case for fixing up the ChangeLog entries.


jeff


Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Segher Boessenkool
On Tue, May 02, 2023 at 10:11:27AM -0400, Paul Koning wrote:
> > On May 2, 2023, at 9:18 AM, Roger Sayle  wrote:
> > Yes, see the section -fsplit-wide-types in
> > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> Thanks.  So I'm wondering why that would be a problem.
> 
> The obvious question is whether it interacts badly with MD file entries that 
> describe wide operations, perhaps with constraints that require things like 
> odd/even register pairs.  But I would assume all that gets handled.
> 
> Along the same lines, why would a target, or a user, not do early wide 
> splitting all the time?  The documentation for that option gives no clue why 
> it would ever be bad.

In 

(the patch that created -fsplit-wide-types-early) I say "At least for
targets that do not have RTL patterns for operations on multi-register
modes it is a lot better to split patterns earlier, before combine and
all related passes."  If your target does in fact have patterns for
multi-reg modes it presumably wants those to be used preferably.

The patch did not change the default because that always is a lot of
pain.  The cost-benefit analysis did not work out here (for me!)


Segher


Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

2023-05-02 Thread Christophe Lyon via Gcc-patches




On 5/2/23 17:28, Kyrylo Tkachov wrote:




-Original Message-
From: Christophe Lyon 
Sent: Tuesday, May 2, 2023 3:05 PM
To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
Richard Earnshaw ; Richard Sandiford

Subject: Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq




On 5/2/23 12:26, Kyrylo Tkachov wrote:


Hi Christophe,


-Original Message-
From: Christophe Lyon 

Sent: Tuesday, April 18, 2023 2:46 PM
To: gcc-patches@gcc.gnu.org  ; Kyrylo Tkachov 
 ;
Richard Earnshaw 
 ; Richard Sandiford


Cc: Christophe Lyon 

Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

This patch implements vreinterpretq using the new MVE intrinsics
framework.

The old definitions for vreinterpretq are removed as a consequence.

2022-09-08  Murray Steele
            Christophe Lyon

gcc/
* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New
function.
(parse_type): Likewise.
(parse_signature): Likewise.
(build_one): Likewise.
(build_all): Likewise.
(overloaded_base): New struct.
(unary_convert_def): Likewise.
* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
macro.
(TYPES_reinterpret_unsigned1): Likewise.
(TYPES_reinterpret_integer): Likewise.
(TYPES_reinterpret_integer1): Likewise.
(TYPES_reinterpret_float1): Likewise.
(TYPES_reinterpret_float): Likewise.
(reinterpret_integer): New.
(reinterpret_float): New.
(handle_arm_mve_h): Register builtins.
* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
(vreinterpretq_s32): Likewise.
(vreinterpretq_s64): Likewise.
(vreinterpretq_s8): Likewise.
(vreinterpretq_u16): Likewise.
(vreinterpretq_u32): Likewise.
(vreinterpretq_u64): Likewise.
(vreinterpretq_u8): Likewise.
(vreinterpretq_f16): Likewise.
(vreinterpretq_f32): Likewise.
(vreinterpretq_s16_s32): Likewise.
(vreinterpretq_s16_s64): Likewise.
(vreinterpretq_s16_s8): Likewise.
(vreinterpretq_s16_u16): Likewise.
(vreinterpretq_s16_u32): Likewise.
(vreinterpretq_s16_u64): Likewise.
(vreinterpretq_s16_u8): Likewise.
(vreinterpretq_s32_s16): Likewise.
(vreinterpretq_s32_s64): Likewise.
(vreinterpretq_s32_s8): Likewise.
(vreinterpretq_s32_u16): Likewise.
(vreinterpretq_s32_u32): Likewise.
(vreinterpretq_s32_u64): Likewise.
(vreinterpretq_s32_u8): Likewise.
(vreinterpretq_s64_s16): Likewise.
(vreinterpretq_s64_s32): Likewise.
(vreinterpretq_s64_s8): Likewise.
(vreinterpretq_s64_u16): Likewise.
(vreinterpretq_s64_u32): Likewise.
(vreinterpretq_s64_u64): Likewise.
(vreinterpretq_s64_u8): Likewise.
(vreinterpretq_s8_s16): Likewise.
(vreinterpretq_s8_s32): Likewise.
(vreinterpretq_s8_s64): Likewise.
(vreinterpretq_s8_u16): Likewise.
(vreinterpretq_s8_u32): Likewise.
(vreinterpretq_s8_u64): Likewise.
(vreinterpretq_s8_u8): Likewise.

Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Thread Jeff Law via Gcc-patches




On 5/2/23 08:31, Kito Cheng via Gcc-patches wrote:

Hi Romain:

Pushed to trunk, thanks for catching that, that definitely should
use log2 whether building as C++03 or C++11,
but I think GCC allows the usage of C++11 according to
https://gcc.gnu.org/install/prerequisites.html :P

Yes, we should be able to use C++11.  I'd like to get that to C++17 at
some point, but I think the biggest problem is the desire to support 
bootstrapping on something like centos7/rhel7.


jeff


Re: [PATCH] libstdc++: Another attempt to ensure g++ 13+ compiled programs enforce gcc 13.2+ libstdc++.so.6 [PR108969]

2023-05-02 Thread Jonathan Wakely via Gcc-patches
On Fri, 28 Apr 2023 at 23:02, Jakub Jelinek  wrote:

> On Fri, Apr 28, 2023 at 09:35:49AM +0100, Jonathan Wakely wrote:
> > Yes, for both, thanks for the fix.
> >
> > After it lands on the gcc-13 branch I'll also update the manual with:
> >
> > --- a/libstdc++-v3/doc/xml/manual/abi.xml
> > +++ b/libstdc++-v3/doc/xml/manual/abi.xml
> > @@ -275,6 +275,7 @@ compatible.
> > GCC 11.1.0: libstdc++.so.6.0.29
> > GCC 12.1.0: libstdc++.so.6.0.30
> > GCC 13.1.0: libstdc++.so.6.0.31
> > +GCC 13.2.0: libstdc++.so.6.0.32
> > 
> > 
> >   Note 1: Error should be libstdc++.so.3.0.3.
>
> Don't you need to change later parts too?
> I mean adding
>   GCC 13.2.0: GLIBCXX_3.4.32, CXXABI_1.3.14
> entry.
>

Yep, and regenerate the HTML versions.


Re: [PATCH] libstdc++: Regenerate baseline_symbols.txt files for Linux

2023-05-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 May 2023 at 09:45, Jakub Jelinek  wrote:

> Hi!
>
> The following patch regenerates the ABI files (I've only changed the
> Linux files which were updated recently (last month)).
>
> Tested on x86_64-linux, ok for trunk and later 13.2?
>

OK, thanks.

I currently get:
FAIL: libstdc++-abi/abi_check
on powerpc64le for old glibc, with the _Float128 overloads for
std::from_chars and std::to_chars.
Those symbols were OK when GLIBCXX_3.4.31 was the latest, because added
symbols in the latest version are OK. Now that GLIBCXX_3.4.32 is the
latest, we can't have additions to the older version.
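The rule above can be stated mechanically: a symbol that is new relative to the baseline is acceptable only if it carries the newest GLIBCXX version. The sketch below models just that rule with hypothetical helper names; it is not how libstdc++'s abi_check is implemented. Versions have to be compared numerically. The baseline files list them in lexical order, so GLIBCXX_3.4.3 sorts between GLIBCXX_3.4.29 and GLIBCXX_3.4.30.

```c
#include <stdio.h>
#include <string.h>

/* Parse "GLIBCXX_a.b.c" into out[0..2]; missing components are 0.
   Returns 0 on success, -1 if V is not a GLIBCXX_ version.  */
static int parse_glibcxx_version (const char *v, int out[3])
{
  static const char prefix[] = "GLIBCXX_";
  const size_t len = sizeof prefix - 1;

  out[0] = out[1] = out[2] = 0;
  if (strncmp (v, prefix, len) != 0)
    return -1;
  if (sscanf (v + len, "%d.%d.%d", &out[0], &out[1], &out[2]) < 1)
    return -1;
  return 0;
}

/* Numeric comparison; returns <0, 0 or >0 like strcmp.  */
static int cmp_version (const int a[3], const int b[3])
{
  for (int i = 0; i < 3; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}

/* Index of the numerically newest of N (N >= 1) version strings.  */
static size_t latest_version (const char *const *versions, size_t n)
{
  size_t best = 0;
  int best_v[3], cur_v[3];

  parse_glibcxx_version (versions[0], best_v);
  for (size_t i = 1; i < n; i++)
    {
      parse_glibcxx_version (versions[i], cur_v);
      if (cmp_version (cur_v, best_v) > 0)
        {
          best = i;
          memcpy (best_v, cur_v, sizeof best_v);
        }
    }
  return best;
}

/* A symbol that is new relative to the baseline is acceptable only if
   its version is the newest one.  */
static int addition_ok (const char *symbol_version, const char *latest)
{
  int a[3], b[3];

  if (parse_glibcxx_version (symbol_version, a) != 0
      || parse_glibcxx_version (latest, b) != 0)
    return 0;
  return cmp_version (a, b) == 0;
}
```

Under this rule, once GLIBCXX_3.4.32 exists, a newly appearing symbol still tagged GLIBCXX_3.4.31 is rejected. That matches the powerpc64le failure described above.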




> 2023-05-02  Jakub Jelinek  
>
> * config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Update.
> * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Update.
> * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Update.
>
> --- libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt.jj	2023-04-20 09:36:09.404371212 +0200
> +++ libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt	2023-05-02 10:33:35.251718474 +0200
> @@ -4232,6 +4232,7 @@ FUNC:_ZSt21__glibcxx_assert_failPKciS0_S
>  FUNC:_ZSt21__throw_bad_exceptionv@@GLIBCXX_3.4
>  FUNC:_ZSt21__throw_runtime_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> +FUNC:_ZSt21ios_base_library_initv@@GLIBCXX_3.4.32
>  FUNC:_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>  FUNC:_ZSt22__throw_overflow_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
> @@ -4604,6 +4605,7 @@ OBJECT:0:GLIBCXX_3.4.29
>  OBJECT:0:GLIBCXX_3.4.3
>  OBJECT:0:GLIBCXX_3.4.30
>  OBJECT:0:GLIBCXX_3.4.31
> +OBJECT:0:GLIBCXX_3.4.32
>  OBJECT:0:GLIBCXX_3.4.4
>  OBJECT:0:GLIBCXX_3.4.5
>  OBJECT:0:GLIBCXX_3.4.6
> --- libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt.jj	2023-04-20 09:36:09.406371182 +0200
> +++ libstdc++-v3/config/abi/post/i486-linux-gnu/baseline_symbols.txt	2023-05-02 10:32:56.908261585 +0200
> @@ -4233,6 +4233,7 @@ FUNC:_ZSt21__glibcxx_assert_failPKciS0_S
>  FUNC:_ZSt21__throw_bad_exceptionv@@GLIBCXX_3.4
>  FUNC:_ZSt21__throw_runtime_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> +FUNC:_ZSt21ios_base_library_initv@@GLIBCXX_3.4.32
>  FUNC:_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>  FUNC:_ZSt22__throw_overflow_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
> @@ -4609,6 +4610,7 @@ OBJECT:0:GLIBCXX_3.4.29
>  OBJECT:0:GLIBCXX_3.4.3
>  OBJECT:0:GLIBCXX_3.4.30
>  OBJECT:0:GLIBCXX_3.4.31
> +OBJECT:0:GLIBCXX_3.4.32
>  OBJECT:0:GLIBCXX_3.4.4
>  OBJECT:0:GLIBCXX_3.4.5
>  OBJECT:0:GLIBCXX_3.4.6
> --- libstdc++-v3/config/abi/post/m68k-linux-gnu/baseline_symbols.txt.jj	2023-04-20 09:36:09.408371153 +0200
> +++ libstdc++-v3/config/abi/post/m68k-linux-gnu/baseline_symbols.txt	2023-05-02 10:32:37.296539368 +0200
> @@ -4232,6 +4232,7 @@ FUNC:_ZSt21__glibcxx_assert_failPKciS0_S
>  FUNC:_ZSt21__throw_bad_exceptionv@@GLIBCXX_3.4
>  FUNC:_ZSt21__throw_runtime_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> +FUNC:_ZSt21ios_base_library_initv@@GLIBCXX_3.4.32
>  FUNC:_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>  FUNC:_ZSt22__throw_overflow_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
> @@ -4604,6 +4605,7 @@ OBJECT:0:GLIBCXX_3.4.29
>  OBJECT:0:GLIBCXX_3.4.3
>  OBJECT:0:GLIBCXX_3.4.30
>  OBJECT:0:GLIBCXX_3.4.31
> +OBJECT:0:GLIBCXX_3.4.32
>  OBJECT:0:GLIBCXX_3.4.4
>  OBJECT:0:GLIBCXX_3.4.5
>  OBJECT:0:GLIBCXX_3.4.6
> --- libstdc++-v3/config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt.jj	2023-04-20 09:36:09.409371138 +0200
> +++ libstdc++-v3/config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt	2023-05-02 10:32:00.138065690 +0200
> @@ -4613,6 +4613,7 @@ FUNC:_ZSt21__glibcxx_assert_failPKciS0_S
>  FUNC:_ZSt21__throw_bad_exceptionv@@GLIBCXX_3.4
>  FUNC:_ZSt21__throw_runtime_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt21__to_chars_bfloat16_tPcS_fSt12chars_format@@GLIBCXX_3.4.31
> +FUNC:_ZSt21ios_base_library_initv@@GLIBCXX_3.4.32
>  FUNC:_ZSt22__from_chars_float16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
>  FUNC:_ZSt22__throw_overflow_errorPKc@@GLIBCXX_3.4
>  FUNC:_ZSt23__from_chars_bfloat16_tPKcS0_RfSt12chars_format@@GLIBCXX_3.4.31
> @@ -5055,6 +5056,7 @@ OBJECT:0:GLIBCXX_3.4.29
>  OBJECT:0:GLIBCXX_3.4.3
>  OBJECT:0:GLIBCXX_3.4.30
>  OBJECT:0:GLIBCXX_3.4.31
> 

Re: [PATCH v2] RISC-V: ICE for vlmul_ext_v intrinsic API

2023-05-02 Thread Kito Cheng via Gcc-patches
committed, thanks for the patch :)

On Fri, Apr 28, 2023 at 6:37 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Kindly ping for this ICE fix.
>
> Pan
>
> -Original Message-
> From: Wang, Yanzhang 
> Sent: Wednesday, April 26, 2023 9:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2 
> ; Wang, Yanzhang 
> Subject: [PATCH v2] RISC-V: ICE for vlmul_ext_v intrinsic API
>
> From: Yanzhang Wang 
>
> PR 109617
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Support VNx2HI and VNx4DI when MIN_VLEN >= 128.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/vlmul_ext-1.c: New test.
>
> Signed-off-by: Yanzhang Wang 
> Co-authored-by: Pan Li 
> ---
>  gcc/config/riscv/vector-iterators.md   |  3 ++-
>  .../gcc.target/riscv/rvv/base/vlmul_ext-1.c| 14 ++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-1.c
>
> diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
> index a8e856161d3..033659930d1 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -189,6 +189,7 @@
>(VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI "TARGET_MIN_VLEN >= 128")
>(VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 128")
>(VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI "TARGET_VECTOR_ELEN_64")
> +  (VNx4DI "TARGET_VECTOR_ELEN_64")
>(VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
>(VNx2SF "TARGET_VECTOR_ELEN_FP_32")
>(VNx4SF "TARGET_VECTOR_ELEN_FP_32")
> @@ -220,7 +221,7 @@
>
>  (define_mode_iterator VLMULEXT32 [
>(VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN >= 128")
> -  (VNx1HI "TARGET_MIN_VLEN < 128")
> +  (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN >= 128")
>  ])
>
>  (define_mode_iterator VLMULEXT64 [
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-1.c
> new file mode 100644
> index 000..501d98c5897
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include 
> +
> +vint16m8_t test_vlmul_ext_v_i16mf4_i16m8(vint16mf4_t op1) {
> +  return __riscv_vlmul_ext_v_i16mf4_i16m8(op1);
> +}
> +
> +vint64m8_t test_vlmul_ext_v_i64m2_i64m8(vint64m2_t op1) {
> +  return __riscv_vlmul_ext_v_i64m2_i64m8(op1);
> +}
> +
> +/* { dg-final { scan-assembler-times {vs8r.v\s+[,\sa-x0-9()]+} 2} } */
> --
> 2.39.2
>


RE: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, May 2, 2023 3:05 PM
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org;
> Richard Earnshaw ; Richard Sandiford
> 
> Subject: Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> 
> 
> 
> On 5/2/23 12:26, Kyrylo Tkachov wrote:
> 
> 
>   Hi Christophe,
> 
> 
>   -Original Message-
>   From: Christophe Lyon 
> 
>   Sent: Tuesday, April 18, 2023 2:46 PM
>   To: gcc-patches@gcc.gnu.org ; Kyrylo Tkachov
>  ;
>   Richard Earnshaw 
>  ; Richard Sandiford
>   
> 
>   Cc: Christophe Lyon 
> 
>   Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
>   This patch implements vreinterpretq using the new MVE intrinsics
>   framework.
> 
>   The old definitions for vreinterpretq are removed as a consequence.
> 
>   2022-09-08  Murray Steele
>               Christophe Lyon
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
>   * config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
>   * config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
>   * config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New
>   function.
>   (parse_type): Likewise.
>   (parse_signature): Likewise.
>   (build_one): Likewise.
>   (build_all): Likewise.
>   (overloaded_base): New struct.
>   (unary_convert_def): Likewise.
>   * config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
>   * config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
>   macro.
>   (TYPES_reinterpret_unsigned1): Likewise.
>   (TYPES_reinterpret_integer): Likewise.
>   (TYPES_reinterpret_integer1): Likewise.
>   (TYPES_reinterpret_float1): Likewise.
>   (TYPES_reinterpret_float): Likewise.
>   (reinterpret_integer): New.
>   (reinterpret_float): New.
>   (handle_arm_mve_h): Register builtins.
>   * config/arm/arm_mve.h (vreinterpretq_s16): Remove.
>   (vreinterpretq_s32): Likewise.
>   (vreinterpretq_s64): Likewise.
>   (vreinterpretq_s8): Likewise.
>   (vreinterpretq_u16): Likewise.
>   (vreinterpretq_u32): Likewise.
>   (vreinterpretq_u64): Likewise.
>   (vreinterpretq_u8): Likewise.
>   (vreinterpretq_f16): Likewise.
>   (vreinterpretq_f32): Likewise.
>   (vreinterpretq_s16_s32): Likewise.
>   (vreinterpretq_s16_s64): Likewise.
>   (vreinterpretq_s16_s8): Likewise.
>   (vreinterpretq_s16_u16): Likewise.
>   (vreinterpretq_s16_u32): Likewise.
>   (vreinterpretq_s16_u64): Likewise.
>   (vreinterpretq_s16_u8): Likewise.
>   (vreinterpretq_s32_s16): Likewise.
>   (vreinterpretq_s32_s64): Likewise.
>   (vreinterpretq_s32_s8): Likewise.
>   (vreinterpretq_s32_u16): Likewise.
>   (vreinterpretq_s32_u32): Likewise.
>   (vreinterpretq_s32_u64): Likewise.
>   (vreinterpretq_s32_u8): Likewise.
>   (vreinterpretq_s64_s16): Likewise.
>   (vreinterpretq_s64_s32): Likewise.
>   (vreinterpretq_s64_s8): Likewise.
>   (vreinterpretq_s64_u16): Likewise.
>   (vreinterpretq_s64_u32): Likewise.
>   (vreinterpretq_s64_u64): Likewise.
>   (vreinterpretq_s64_u8): Likewise.
>   (vreinterpretq_s8_s16): Likewise.
>   (vreinterpretq_s8_s32): Likewise.
>   (vreinterpretq_s8_s64): Likewise.
>   (vreinterpretq_s8_u16): Likewise.
>   (vreinterpretq_s8_u32): Likewise.
>   (vreinterpretq_s8_u64): Likewise.
>   (vreinterpretq_s8_u8): Likewise.
>   

[PATCH] c++: Fix up VEC_INIT_EXPR gimplification after r12-7069

2023-05-02 Thread Jakub Jelinek via Gcc-patches
Hi!

During patch backporting, I've noticed that while most cp_walk_tree calls
with the cp_fold_r callback were changed from  to cp_fold_data
, the VEC_INIT_EXPR gimplification has not been, so it still passes just
the address of a hash_set, and if during the folding we ever touch
data->flags, we read uninitialized data there.

The following patch changes it to do the same thing as cp_fold_function,
because VEC_INIT_EXPR gimplification only happens on function bodies.
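The failure mode here is the usual hazard of passing untyped void * data to a tree-walk callback: the caller and the callback must agree on what the pointer designates. The C sketch below uses hypothetical stand-ins for cp_walk_tree, cp_fold_r and cp_fold_data; it is not GCC's code. Because the set is the first member of the data struct, handing the callback a pointer to a bare set still makes the set accesses appear to work. That plausibly explains why the mismatch went unnoticed until the callback started reading flags.

```c
#include <stddef.h>

/* Hypothetical stand-ins: struct visited_set plays the role of hash_set,
   struct fold_data the role of cp_fold_data (set first, then flags).  */
struct visited_set { int count; };

struct fold_data
{
  struct visited_set pset;
  int flags;
};

typedef int (*walk_fn) (int *node, void *data);

/* A walker that forwards an opaque DATA pointer to its callback.  */
static int walk (int *nodes, size_t n, walk_fn fn, void *data)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += fn (&nodes[i], data);
  return sum;
}

/* The callback assumes DATA points at a complete struct fold_data.
   Passing the address of a bare struct visited_set instead would still
   make data->pset appear to work (it is the first member), but
   data->flags would read past that object: undefined behavior, which in
   practice means reading whatever happens to follow it in memory.  */
static int fold_cb (int *node, void *data_)
{
  struct fold_data *data = data_;
  data->pset.count++;
  return (data->flags & 1) ? *node * 2 : *node;
}
```

The fix has the same shape as the patch: construct the full data object with explicit flags and pass its address.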

Ok for trunk if it passes bootstrap/regtest?

2023-05-02  Jakub Jelinek  

* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than  to cp_walk_tree with cp_fold_r.

--- gcc/cp/cp-gimplify.cc.jj	2023-03-16 22:01:02.295090975 +0100
+++ gcc/cp/cp-gimplify.cc	2023-05-02 17:05:03.079652427 +0200
@@ -57,6 +57,13 @@ enum fold_flags {
 
 using fold_flags_t = int;
 
+struct cp_fold_data
+{
+  hash_set pset;
+  fold_flags_t flags;
+  cp_fold_data (fold_flags_t flags): flags (flags) {}
+};
+
 /* Forward declarations.  */
 
 static tree cp_genericize_r (tree *, int *, void *);
@@ -505,8 +512,8 @@ cp_gimplify_expr (tree *expr_p, gimple_s
*expr_p = expand_vec_init_expr (NULL_TREE, *expr_p,
tf_warning_or_error);
 
-   hash_set pset;
-   cp_walk_tree (expr_p, cp_fold_r, , NULL);
+   cp_fold_data data (ff_genericize | ff_mce_false);
+   cp_walk_tree (expr_p, cp_fold_r, , NULL);
cp_genericize_tree (expr_p, false);
copy_if_shared (expr_p);
ret = GS_OK;
@@ -1029,13 +1036,6 @@ struct cp_genericize_data
  in fold-const, we need to perform this before transformation to
  GIMPLE-form.  */
 
-struct cp_fold_data
-{
-  hash_set pset;
-  fold_flags_t flags;
-  cp_fold_data (fold_flags_t flags): flags (flags) {}
-};
-
 static tree
 cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
 {

Jakub



Re: [PATCH 00/22] arm: New framework for MVE intrinsics

2023-05-02 Thread Christophe Lyon via Gcc-patches




On 5/2/23 11:18, Kyrylo Tkachov wrote:

Hi Christophe,


-Original Message-
From: Christophe Lyon 
Sent: Tuesday, April 18, 2023 2:46 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw ; Richard Sandiford

Cc: Christophe Lyon 
Subject: [PATCH 00/22] arm: New framework for MVE intrinsics

Hi,

This is the beginning of a long patch series to change the way Arm MVE
intrinsics are implemented. The goal is to get rid of arm_mve.h, which
takes a long time to parse and compile.



Thanks for doing this. It is a significant improvement to the MVE intrinsics 
and should address some of the biggest maintainability and scalability issues 
we have in that area.
I'll be going through the patches one by one (I've already looked at these
offline), but the approach looks good to me at a high level.

My hope is that we'll move all the intrinsics, including the Neon ones to use 
this framework in the future, but getting the framework in place first is a 
good major first step in that direction.



Indeed. Ideally we'd probably want to make this framework more generic 
so that it supports aarch64 SVE, arm MVE and Neon, but that can be done 
later. I tried to highlight the differences I noticed compared to SVE, 
so that it helps us think what needs to be specialized for different 
targets, as opposed to what is already generic enough.


Thanks,

Christophe


Thanks,
Kyrill


Roughly speaking, it's about using a framework very similar to what is
implemented for AArch64/SVE intrinsics. I haven't converted all the
intrinsics yet, but I think it would be good to start the conversion
when stage-1 reopens.

* Factorizing names
One of the main implementation differences I noticed between SVE and
MVE is that mve.md provides only full builtin names at the moment, and
makes almost no use of "parameterized names"
(https://gcc.gnu.org/onlinedocs/gccint/Parameterized-Names.html#Parameterized-Names).

Without this, we'd need the builtin expander to use a large
switch/case of the form:

switch (code)
case VADDQ_S: insn_code = code_for_mve_vaddq_s (...)
case VADDQ_U: insn_code = code_for_mve_vaddq_u (...)
case VSUBQ_S: insn_code = code_for_mve_vsubq_s (...)
case VSUBQ_U: insn_code = code_for_mve_vsubq_u (...)


so part of the work (which I called "factorize" in the commit
messages) is about replacing

(define_insn "mve_vaddq_n_"
with
(define_insn "@mve_q_n_"
with the help of a new iterator (mve_insn).
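With an "@" prefix on a pattern name, GCC's generators emit a code_for_* function that takes the iterator values as arguments. An expander can then pass the operation through instead of switching on it. The C sketch below models only that calling convention. The enums, the dense numbering and the function names are hypothetical; the real code_for_* bodies are generated from the .md files.

```c
#include <assert.h>

/* Hypothetical stand-ins for the parameters a parameterized pattern name
   might take: the operation, its signedness and the vector mode.  */
enum op { OP_VADDQ, OP_VSUBQ, OP_VMULQ, NUM_OPS };
enum sign { SIGN_S, SIGN_U, NUM_SIGNS };
enum vmode { MODE_V16QI, MODE_V8HI, MODE_V4SI, NUM_MODES };

typedef int insn_code;

/* Model of a generated code_for_* function: one entry point taking the
   iterator values as arguments.  GCC generates the real body from the
   .md file, including the handling of combinations that have no pattern;
   the dense numbering here is only a placeholder.  */
static insn_code code_for_mve_q_n (enum op op, enum sign s, enum vmode m)
{
  assert (op < NUM_OPS && s < NUM_SIGNS && m < NUM_MODES);
  return ((int) op * NUM_SIGNS + (int) s) * NUM_MODES + (int) m;
}

/* The expander no longer needs a case per intrinsic: the operation is
   simply forwarded as one more parameter.  */
static insn_code expand_binary_n (enum op op, enum sign s, enum vmode m)
{
  return code_for_mve_q_n (op, s, m);
}
```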

Doing so makes it more obvious that some patterns are identical,
except for the instruction name. I took this opportunity to merge
them, so for instance I have a patch which merges add, sub and mul
patterns.  Although not strictly necessary for the MVE intrinsics
restructuring work, this is a good opportunity to reduce such code
duplication (I did notice a few bugs during that process, which led me
to post a few small patches in the past months).  Note that identical
patterns will probably remain after the series; they can be merged
later if we want.

This factorization also implies the introduction of new iterators, but
also means that several existing ones become useless. These patches do
not remove them because it's a bit painful to reorder patches which
remove lines at some "random" places, leading to merge conflicts. It's
much simpler to write a big cleanup patch at the end of the series to
remove all such useless iterators at once.

* Intrinsic re-implementation
After intrinsic names have been factorized, the actual
re-implementation patch is small:
- add 1 line in each of arm-mve-builtins-base.{cc,def,h} describing
   the intrinsic shape/signature, types and predicates involved,
   RTX/unspec codes
- remove the intrinsic definitions from arm_mve.h
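The "one line per intrinsic" approach relies on the X-macro idiom that GCC's own .def files use: a single list is expanded several times, each time with a different definition of the per-line macro. Below is a minimal C sketch with hypothetical macro names. The shape and type-set names are borrowed from this series for flavor (all_integer is made up), and the real .def entries may take different arguments.

```c
#include <string.h>

/* Hypothetical .def-style list: one line per intrinsic giving its name,
   shape and type set.  In a real build the list usually lives in its own
   file that is #included several times; a list macro is used here so
   the sketch fits in one file.  */
#define FUNCTION_LIST(DEF_FUNCTION)                                  \
  DEF_FUNCTION (vreinterpretq, unary_convert, reinterpret_integer)  \
  DEF_FUNCTION (vaddq, binary_opt_n, all_integer)

/* Expansion 1: an enum of function ids.  */
#define AS_ENUM(NAME, SHAPE, TYPES) FN_##NAME,
enum function_id { FUNCTION_LIST (AS_ENUM) NUM_FUNCTIONS };
#undef AS_ENUM

/* Expansion 2: a descriptor table indexed by those ids.  */
struct function_info { const char *name, *shape, *types; };
#define AS_INFO(NAME, SHAPE, TYPES) { #NAME, #SHAPE, #TYPES },
static const struct function_info function_table[] = {
  FUNCTION_LIST (AS_INFO)
};
#undef AS_INFO

/* Look up an intrinsic by name; returns its id or -1.  */
static int find_function (const char *name)
{
  for (int i = 0; i < NUM_FUNCTIONS; i++)
    if (strcmp (function_table[i].name, name) == 0)
      return i;
  return -1;
}
```

Adding an intrinsic then means adding one list line. Every expansion picks it up without further edits.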

The full series of ~140 patches is organized like this:
- patches 1 and 2 introduce the new framework
- new implementation of vreinterpretq
- new implementation of vuninitialized
- patch groups of varying size, consisting of:
   - add a new "shape" if needed (e.g. unary, binary, ternary, )
   - add framework support functions if needed
   - factorize a set of intrinsics (at minimum, just make use of
 parameterized-names)
   - actual re-implementation of the intrinsics

I kept patches small so the incremental progress is easy to follow and
check.  I'll submit the patches in small groups; this first one will
make sure we agree on the implementation.

Tested on arm-eabi with -mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.

To help reviewers, I suggest comparing arm-mve-builtins.cc with
aarch64-sve-builtins.cc.

Christophe Lyon (22):
   arm: move builtin function codes into general numberspace
   arm: [MVE intrinsics] Add new framework
   arm: [MVE intrinsics] Rework vreinterpretq
   arm: [MVE intrinsics] Rework vuninitialized
   arm: [MVE intrinsics] add binary_opt_n shape
   arm: [MVE intrinsics] add unspec_based_mve_function_exact_insn
   arm: [MVE intrinsics] factorize vadd vsubq vmulq
   arm: [MVE intrinsics] rework vaddq vmulq vsubq
   

[PATCH] do not tailcall __sanitizer_cov_trace_pc [PR90746]

2023-05-02 Thread Alexander Monakov via Gcc-patches
When instrumentation is requested via -fsanitize-coverage=trace-pc, GCC
emits a call to the __sanitizer_cov_trace_pc callback in each basic block.
This callback is supposed to be implemented by the user, and should be
able to identify the containing basic block by inspecting its return
address. Tailcalling the callback prevents that: after a tail call, the
return address points into the caller of the instrumented function rather
than into the basic block, so disallow it.
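
For illustration, a minimal user-side callback might look like the sketch
below. It assumes a hypothetical fixed-size buffer (trace_pcs,
TRACE_CAPACITY) and a hand-written stand-in for an instrumented function;
it is not taken from any real coverage tool. The callback records
__builtin_return_address (0), which identifies the calling basic block
only if the call is a real call: had the compiler emitted it as a tail
call (a jump), the recorded address would point into the instrumented
function's caller instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace buffer; real tools manage their own storage.  */
#define TRACE_CAPACITY 64
static uintptr_t trace_pcs[TRACE_CAPACITY];
static size_t trace_len;

/* User-provided callback for -fsanitize-coverage=trace-pc.  Its return
   address identifies the basic block that made the call, as long as the
   call was not turned into a tail call.  noinline keeps the demo honest
   when everything lives in one file.  */
__attribute__ ((noinline)) void
__sanitizer_cov_trace_pc (void)
{
  if (trace_len < TRACE_CAPACITY)
    trace_pcs[trace_len++] = (uintptr_t) __builtin_return_address (0);
}

/* Hand-written stand-in for an instrumented function: one callback
   call per "basic block".  */
__attribute__ ((noinline)) int
instrumented (int x)
{
  __sanitizer_cov_trace_pc ();
  if (x > 0)
    {
      __sanitizer_cov_trace_pc ();
      return 1;
    }
  return 0;
}
```

Such a callback would normally live in a translation unit built without
-fsanitize-coverage, so that it is not itself instrumented.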

gcc/ChangeLog:

PR sanitizer/90746
* calls.cc (can_implement_as_sibling_call_p): Reject calls
to __sanitizer_cov_trace_pc.

gcc/testsuite/ChangeLog:

PR sanitizer/90746
* gcc.dg/sancov/basic0.c: Verify absence of tailcall.
---
 gcc/calls.cc | 10 ++
 gcc/testsuite/gcc.dg/sancov/basic0.c |  4 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 4d7f6c3d2..c6ed2f189 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -2541,6 +2541,16 @@ can_implement_as_sibling_call_p (tree exp,
   return false;
 }
 
+  /* __sanitizer_cov_trace_pc is supposed to inspect its return address
+     to identify the caller, and therefore should not be tailcalled.  */
+  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_SANITIZER_COV_TRACE_PC)
+    {
+      /* No need for maybe_complain_about_tail_call here: the call
+         is synthesized by the compiler.  */
+      return false;
+    }
+
   /* If the called function is nested in the current one, it might access
  some of the caller's arguments, but could clobber them beforehand if
  the argument areas are shared.  */
diff --git a/gcc/testsuite/gcc.dg/sancov/basic0.c b/gcc/testsuite/gcc.dg/sancov/basic0.c
index af69b2d12..dfdaea848 100644
--- a/gcc/testsuite/gcc.dg/sancov/basic0.c
+++ b/gcc/testsuite/gcc.dg/sancov/basic0.c
@@ -1,9 +1,11 @@
 /* Basic test on number of inserted callbacks.  */
 /* { dg-do compile } */
-/* { dg-options "-fsanitize-coverage=trace-pc -fdump-tree-optimized" } */
+/* { dg-options "-fsanitize-coverage=trace-pc -fdump-tree-optimized -fdump-rtl-expand" } */
 
 void foo(void)
 {
 }
 
 /* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_pc \\(\\)" 1 "optimized" } } */
+/* The built-in should not be tail-called: */
+/* { dg-final { scan-rtl-dump-not "call_insn/j" "expand" } } */
-- 
2.39.2



Re: [PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Thread Kito Cheng via Gcc-patches
Hi Romain:

Pushed to trunk, thanks for catching that! It definitely should
use log2 regardless of C++03 or C++11,
but I think GCC allows the use of C++11 according to
https://gcc.gnu.org/install/prerequisites.html :P


On Tue, May 2, 2023 at 8:22 PM Romain Naour via Gcc-patches
 wrote:
>
> GCC should still build with GCC 4.8.3 or newer [1]
> using C++03 by default. But a recent change in
> RISC-V port introduced a C++11 feature "std::log2" [2].
>
> Use log2 from the C header, without the namespace [3].
>
> [1] https://gcc.gnu.org/install/prerequisites.html
> [2] 
> https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=7caa1ae5e451e780fbc4746a54e3f19d4f4304dc
> [3] 
> https://stackoverflow.com/questions/26733413/error-log2-is-not-a-member-of-std
>
> Fixes:
> https://gitlab.com/buildroot.org/toolchains-builder/-/jobs/4202276589
>
> gcc/ChangeLog:
> * config/riscv/genrvv-type-indexer.cc: Use log2 from the C header,
> without the namespace.
>
> Signed-off-by: Romain Naour 
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
> index e677b55290c..eebe382d1c3 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -115,9 +115,9 @@ same_ratio_eew_type (unsigned sew, int lmul_log2, unsigned eew, bool unsigned_p,
>if (sew == eew)
>  elmul_log2 = lmul_log2;
>else if (sew > eew)
> -elmul_log2 = lmul_log2 - std::log2 (sew / eew);
> +elmul_log2 = lmul_log2 - log2 (sew / eew);
>else /* sew < eew */
> -elmul_log2 = lmul_log2 + std::log2 (eew / sew);
> +elmul_log2 = lmul_log2 + log2 (eew / sew);
>
>if (float_p)
>  return floattype (eew, elmul_log2);
> --
> 2.34.3
>


[PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-02 Thread Thomas Neumann via Gcc-patches

The atomic fast path bypasses the code that releases the sort
array, which was lazily allocated during unwinding. We now
check after deregistering whether there is an array to free.
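
The ownership pattern the fix restores can be sketched as follows. The
struct below is a hypothetical simplification: it models only the lazily
built sort array and its flag (ob->u.sort and ob->s.b.sorted in the
patch), and live_sort_arrays is a demo-only counter. Whichever
deregistration path removes the object, it must also free the array if
one was built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for libgcc's struct object: only the lazily
   allocated sort array and its "sorted" flag are modelled.  */
struct frame_object
{
  bool sorted;           /* true once sort_array has been allocated */
  unsigned *sort_array;  /* built on demand, e.g. on first unwind */
};

static int live_sort_arrays;  /* demo-only count of outstanding arrays */

/* Lazily build the sort array the first time it is needed.  */
static void
ensure_sorted (struct frame_object *ob, size_t n_fdes)
{
  if (ob->sorted)
    return;
  ob->sort_array = calloc (n_fdes, sizeof *ob->sort_array);
  if (ob->sort_array != NULL)
    {
      ob->sorted = true;
      live_sort_arrays++;
    }
}

/* Called on every deregistration path (fast or slow): release the
   lazily built array, if any, so it does not leak.  */
static void
release_object (struct frame_object *ob)
{
  if (ob && ob->sorted)
    {
      assert (live_sort_arrays > 0);
      free (ob->sort_array);
      ob->sort_array = NULL;
      ob->sorted = false;
      live_sort_arrays--;
    }
}
```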

libgcc/ChangeLog:
* unwind-dw2-fde.c: Free sort array in atomic fast path.
---
 libgcc/unwind-dw2-fde.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 7b74c391ced..4d2737ff9f7 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -240,6 +240,12 @@ __deregister_frame_info_bases (const void *begin)

   // And remove
   ob = btree_remove (_frames, range[0]);
+
+  // Deallocate the sort array if any.
+  if (ob && ob->s.b.sorted)
+{
+  free (ob->u.sort);
+}
 #else
   init_object_mutex_once ();
   __gthread_mutex_lock (_mutex);
--
2.39.2



Re: [PATCH] Add emulated scatter capability to the vectorizer

2023-05-02 Thread Christophe Lyon via Gcc-patches
Hi Richard,

On Fri, 28 Apr 2023 at 14:41, Richard Biener via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> This adds a scatter vectorization capability to the vectorizer
> without target support by decomposing the offset and data vectors
> and then performing scalar stores in the order of vector lanes.
> This is aimed at cases where vectorizing the rest of the loop
> offsets the cost of vectorizing the scatter.
>
> The offset load is still vectorized and costed as such, but like
> with emulated gather those will be turned back to scalar loads
> by forwprop.
>
> Slightly fixed compared to the version posted in autumn,
> re-bootstrapped & tested on x86_64-unknown-linux-gnu and pushed.
>
> Richard.
>
> * tree-vect-data-refs.cc (vect_analyze_data_refs): Always
> consider scatters.
> * tree-vect-stmts.cc (vect_model_store_cost): Pass in the
> gather-scatter info and cost emulated scatters accordingly.
> (get_load_store_type): Support emulated scatters.
> (vectorizable_store): Likewise.  Emulate them by extracting
> scalar offsets and data, doing scalar stores.
>
> * gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
>

We are now seeing these failures after this patch was committed:
FAIL:  gcc.dg/vect/pr25413a.c -flto -ffat-lto-objects  scan-tree-dump-times
vect "vectorized 2 loops" 1
FAIL:  gcc.dg/vect/pr25413a.c scan-tree-dump-times vect "vectorized 2
loops" 1
on aarch64

Christophe
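
To make the quoted description concrete, here is a minimal C sketch of an
emulated scatter, assuming a vectorization factor of 4 and modelling a
vector register as a plain struct (VF, v4si and the function names are
hypothetical, not vectorizer internals). The offsets and data are loaded
as vectors, then each lane is extracted and stored with a scalar store,
in lane order, so that colliding offsets behave exactly as in the scalar
loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor and "vector register" type.  */
#define VF 4
typedef struct { int lane[VF]; } v4si;

/* Scalar loop being vectorized: a[idx[i]] = b[i].  */
static void
scatter_scalar (int *a, const int *idx, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[idx[i]] = b[i];
}

/* Emulated scatter of one vector: extract each lane of the offset and
   data vectors and do a scalar store, in lane order, so that when two
   lanes hit the same element the higher lane wins, as in the scalar
   loop.  */
static void
scatter_emulated (int *a, v4si offsets, v4si data)
{
  for (size_t lane = 0; lane < VF; lane++)
    a[offsets.lane[lane]] = data.lane[lane];
}

/* Vectorized loop, assuming N is a multiple of VF (no epilogue).  */
static void
scatter_vectorized (int *a, const int *idx, const int *b, size_t n)
{
  assert (n % VF == 0);
  for (size_t i = 0; i < n; i += VF)
    {
      v4si off, val;
      /* The offset and data "vector loads".  */
      for (size_t lane = 0; lane < VF; lane++)
        {
          off.lane[lane] = idx[i + lane];
          val.lane[lane] = b[i + lane];
        }
      scatter_emulated (a, off, val);
    }
}
```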


> * gcc.dg/vect/vect-71.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/pr25413a.c  |   3 +-
>  .../gcc.dg/vect/tsvc/vect-tsvc-s4113.c|   2 +-
>  .../gcc.dg/vect/tsvc/vect-tsvc-s491.c |   2 +-
>  .../gcc.dg/vect/tsvc/vect-tsvc-vas.c  |   2 +-
>  gcc/testsuite/gcc.dg/vect/vect-71.c   |   2 +-
>  gcc/tree-vect-data-refs.cc|   4 +-
>  gcc/tree-vect-stmts.cc| 117 ++
>  7 files changed, 97 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr25413a.c
> b/gcc/testsuite/gcc.dg/vect/pr25413a.c
> index e444b2c3e8e..ffb517c9ce0 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr25413a.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr25413a.c
> @@ -123,7 +123,6 @@ int main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {
> target { ! vect_scatter_store } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" {
> target vect_scatter_store } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vector alignment may not be
> reachable" 1 "vect" { target { ! vector_alignment_reachable  } } } } */
>  /* { dg-final { scan-tree-dump-times "Alignment of access forced using
> versioning" 1 "vect" { target { ! vector_alignment_reachable } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> index b64682a65df..ddb7e9dc0e8 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s4113.c
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { !
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> index 8465e137070..29e90ff0aff 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s491.c
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { !
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
> b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
> index 5ff38851f43..b72ee21a9a3 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-vas.c
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { !
> aarch64_sve }  } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c
> b/gcc/testsuite/gcc.dg/vect/vect-71.c
> index f15521176df..581473fa4a1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-71.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
> @@ -36,4 +36,4 @@ int main (void)
>return main1 ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {
> xfail { ! vect_scatter_store } } } } */
> +/* { dg-final { 

Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Paul Koning via Gcc-patches



> On May 2, 2023, at 9:18 AM, Roger Sayle  wrote:
> 
> 
> On 02 May 2023 13:40, Paul Koning wrote:
>>> On May 1, 2023, at 7:37 PM, Roger Sayle 
>> wrote:
>>> 
>>> ...
>>> The shiftsi.cc regression on xstormy16 is fixed by adding
>>> -fno-split-wide-types.
>>> In fact, if all the regression tests pass, I'd suggest that
>>> flag_split_wide_types = false should be the default on xstormy16 now
>>> that we've moved to LRA.  And if this works for xstormy16, it might be
>>> useful to other targets for the LRA transition; it's a difference in
>>> behaviour between reload and LRA that could potentially affect
>>> multiple targets.
>> 
>> Is there documentation for that flag?
> 
> Yes, see the section -fsplit-wide-types in
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Thanks.  So I'm wondering why that would be a problem.

The obvious question is whether it interacts badly with MD file entries that 
describe wide operations, perhaps with constraints that require things like 
odd/even register pairs.  But I would assume all that gets handled.

Along the same lines, why would a target, or a user, not do early wide 
splitting all the time?  The documentation for that option gives no clue why it 
would ever be bad.

paul




[r14-358 Regression] FAIL: gcc.dg/cpp/undef2.c (test for warnings, line 9) on Linux/x86_64

2023-05-02 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

e7ce7c4905fd254760b1cd187752a03bc0c148ba is the first bad commit
commit e7ce7c4905fd254760b1cd187752a03bc0c148ba
Author: Longjun Luo 
Date:   Sun Apr 30 12:28:06 2023 -0600

[PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__

caused

FAIL: c-c++-common/cpp/pr92296-2.c  -Wc++-compat   (test for warnings, line 41)
FAIL: gcc.dg/cpp/undef2.c (test for excess errors)
FAIL: gcc.dg/cpp/undef2.c  (test for warnings, line 9)

with GCC configured with

../../gcc/configure \
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-358/usr \
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=c-c++-common/cpp/pr92296-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=c-c++-common/cpp/pr92296-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=c-c++-common/cpp/pr92296-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=c-c++-common/cpp/pr92296-2.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=gcc.dg/cpp/undef2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=gcc.dg/cpp/undef2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=gcc.dg/cpp/undef2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="cpp.exp=gcc.dg/cpp/undef2.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

2023-05-02 Thread Christophe Lyon via Gcc-patches



On 5/2/23 12:26, Kyrylo Tkachov wrote:

Hi Christophe,


-----Original Message-----
From: Christophe Lyon
Sent: Tuesday, April 18, 2023 2:46 PM
To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov;
Richard Earnshaw; Richard Sandiford

Cc: Christophe Lyon
Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

This patch implements vreinterpretq using the new MVE intrinsics
framework.

The old definitions for vreinterpretq are removed as a consequence.
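
Conceptually, a vreinterpretq is a pure bit cast between 128-bit vector
types of the same size: the register contents are unchanged and only the
type differs. The sketch below uses GCC generic vector types as
hypothetical stand-ins for the MVE types (int16x8_t, uint8x16_t from
arm_mve.h); it illustrates the semantics only and does not use the new
framework.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC generic vectors standing in for 128-bit MVE types.  */
typedef int16_t v8hi __attribute__ ((vector_size (16)));
typedef uint8_t v16qu __attribute__ ((vector_size (16)));

/* A reinterpret is only meaningful between types of equal size.  */
static_assert (sizeof (v8hi) == sizeof (v16qu),
               "reinterpret requires equal sizes");

/* Bit cast via memcpy: no lane values are converted.  */
static v16qu
reinterpret_u8_s16 (v8hi x)
{
  v16qu r;
  memcpy (&r, &x, sizeof r);
  return r;
}

/* Equivalent bit cast: GCC allows casts between same-sized vectors.  */
static v8hi
reinterpret_s16_u8 (v16qu x)
{
  return (v8hi) x;
}
```

Note that the order in which the two bytes of each 16-bit lane appear
after the reinterpret depends on endianness; only the bit pattern as a
whole is preserved.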

2022-09-08  Murray Steele
Christophe Lyon

gcc/
* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New
class.
* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
* config/arm/arm-mve-builtins-base.h (vreinterpretq): New
declaration.
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New
function.
(parse_type): Likewise.
(parse_signature): Likewise.
(build_one): Likewise.
(build_all): Likewise.
(overloaded_base): New struct.
(unary_convert_def): Likewise.
* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
macro.
(TYPES_reinterpret_unsigned1): Likewise.
(TYPES_reinterpret_integer): Likewise.
(TYPES_reinterpret_integer1): Likewise.
(TYPES_reinterpret_float1): Likewise.
(TYPES_reinterpret_float): Likewise.
(reinterpret_integer): New.
(reinterpret_float): New.
(handle_arm_mve_h): Register builtins.
* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
(vreinterpretq_s32): Likewise.
(vreinterpretq_s64): Likewise.
(vreinterpretq_s8): Likewise.
(vreinterpretq_u16): Likewise.
(vreinterpretq_u32): Likewise.
(vreinterpretq_u64): Likewise.
(vreinterpretq_u8): Likewise.
(vreinterpretq_f16): Likewise.
(vreinterpretq_f32): Likewise.
(vreinterpretq_s16_s32): Likewise.
(vreinterpretq_s16_s64): Likewise.
(vreinterpretq_s16_s8): Likewise.
(vreinterpretq_s16_u16): Likewise.
(vreinterpretq_s16_u32): Likewise.
(vreinterpretq_s16_u64): Likewise.
(vreinterpretq_s16_u8): Likewise.
(vreinterpretq_s32_s16): Likewise.
(vreinterpretq_s32_s64): Likewise.
(vreinterpretq_s32_s8): Likewise.
(vreinterpretq_s32_u16): Likewise.
(vreinterpretq_s32_u32): Likewise.
(vreinterpretq_s32_u64): Likewise.
(vreinterpretq_s32_u8): Likewise.
(vreinterpretq_s64_s16): Likewise.
(vreinterpretq_s64_s32): Likewise.
(vreinterpretq_s64_s8): Likewise.
(vreinterpretq_s64_u16): Likewise.
(vreinterpretq_s64_u32): Likewise.
(vreinterpretq_s64_u64): Likewise.
(vreinterpretq_s64_u8): Likewise.
(vreinterpretq_s8_s16): Likewise.
(vreinterpretq_s8_s32): Likewise.
(vreinterpretq_s8_s64): Likewise.
(vreinterpretq_s8_u16): Likewise.
(vreinterpretq_s8_u32): Likewise.
(vreinterpretq_s8_u64): Likewise.
(vreinterpretq_s8_u8): Likewise.
(vreinterpretq_u16_s16): Likewise.
(vreinterpretq_u16_s32): Likewise.
(vreinterpretq_u16_s64): Likewise.
(vreinterpretq_u16_s8): Likewise.
(vreinterpretq_u16_u32): Likewise.
(vreinterpretq_u16_u64): Likewise.
(vreinterpretq_u16_u8): Likewise.
(vreinterpretq_u32_s16): Likewise.
(vreinterpretq_u32_s32): Likewise.
(vreinterpretq_u32_s64): Likewise.
(vreinterpretq_u32_s8): Likewise.
(vreinterpretq_u32_u16): Likewise.
(vreinterpretq_u32_u64): Likewise.
(vreinterpretq_u32_u8): Likewise.
(vreinterpretq_u64_s16): Likewise.
(vreinterpretq_u64_s32): Likewise.
(vreinterpretq_u64_s64): Likewise.
(vreinterpretq_u64_s8): Likewise.
(vreinterpretq_u64_u16): Likewise.
(vreinterpretq_u64_u32): Likewise.
(vreinterpretq_u64_u8): Likewise.
(vreinterpretq_u8_s16): Likewise.
(vreinterpretq_u8_s32): Likewise.
(vreinterpretq_u8_s64): Likewise.
(vreinterpretq_u8_s8): Likewise.
(vreinterpretq_u8_u16): Likewise.
(vreinterpretq_u8_u32): Likewise.
(vreinterpretq_u8_u64): Likewise.
(vreinterpretq_s32_f16): Likewise.
(vreinterpretq_s32_f32): Likewise.
(vreinterpretq_u16_f16): Likewise.
(vreinterpretq_u16_f32): Likewise.
(vreinterpretq_u32_f16): Likewise.
(vreinterpretq_u32_f32): Likewise.
(vreinterpretq_u64_f16): Likewise.
(vreinterpretq_u64_f32): Likewise.
(vreinterpretq_u8_f16): Likewise.
(vreinterpretq_u8_f32): Likewise.
(vreinterpretq_f16_f32): Likewise.
(vreinterpretq_f16_s16): Likewise.
(vreinterpretq_f16_s32): Likewise.
(vreinterpretq_f16_s64): Likewise.
(vreinterpretq_f16_s8): Likewise.

Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Segher Boessenkool
Hi!

On Tue, May 02, 2023 at 02:18:43PM +0100, Roger Sayle wrote:
> On 02 May 2023 13:40, Paul Koning wrote:
> > > On May 1, 2023, at 7:37 PM, Roger Sayle 
> > wrote:
> > > The shiftsi.cc regression on xstormy16 is fixed by adding
> > > -fno-split-wide-types.
> > > In fact, if all the regression tests pass, I'd suggest that
> > > flag_split_wide_types = false should be the default on xstormy16 now
> > > that we've moved to LRA.  And if this works for xstormy16, it might be
> > > useful to other targets for the LRA transition; it's a difference in
> > > behaviour between reload and LRA that could potentially affect
> > > multiple targets.
> > 
> > Is there documentation for that flag?
> 
> Yes, see the section -fsplit-wide-types in
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> Interestingly, there's a recent-ish blog describing how
> -fno-split-wide-types
> reduces executable size on AVR:
> https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-size.html
> and its interaction with (AVR) register allocation is seen in PR
> middle-end/35860.

But what causes the problem?  Is something missing somewhere, or do we
get too high register pressure?

There is also -fsplit-wide-types-early, which might help.

In principle it is always good to describe a machine model as closely as
possible to what the machine actually does.  What gets in the way here?


Segher


Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-05-02 Thread Lewis Hyatt via Gcc-patches
May I please ping this one? Thanks...
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

On Thu, Mar 2, 2023 at 6:21 PM Lewis Hyatt  wrote:
>
> The PR complains that we do not handle UTF-8 in the suffix for a user-defined
> literal, such as:
>
> bool operator ""_π (unsigned long long);
>
> In fact we don't handle any extended identifier characters there, whether
> UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
> the "" tokens is included, since then the identifier is lexed in the "normal"
> way as its own token. But when it is lexed as part of the string token, this
> is handled in lex_string() with a one-off loop that is not aware of extended
> characters.
>
> This patch fixes it by adding a new function scan_cur_identifier() that can be
> used to lex an identifier while in the middle of lexing another token.
>
> BTW, the other place that has been mis-lexing identifiers is
> lex_identifier_intern(), which is used to implement #pragma push_macro
> and #pragma pop_macro. This does not support extended characters either.
> I will add that in a subsequent patch, because it can't directly reuse the
> new function, but rather needs to lex from a string instead of a cpp_buffer.
>
> With scan_cur_identifier(), we do also correctly warn about bidi and
> normalization issues in the extended identifiers comprising the suffix.
>
> libcpp/ChangeLog:
>
> PR preprocessor/103902
> * lex.cc (identifier_diagnostics_on_lex): New function refactoring
> some common code.
> (lex_identifier_intern): Use the new function.
> (lex_identifier): Don't run identifier diagnostics here, rather let
> the call site do it when needed.
> (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
> accordingly.
> (struct scan_id_result): New struct.
> (scan_cur_identifier): New function.
> (create_literal2): New function.
> (lit_accum::create_literal2): New function.
> (is_macro): Folded into new function...
> (maybe_ignore_udl_macro_suffix): ...here.
> (is_macro_not_literal_suffix): Folded likewise.
> (lex_raw_string): Handle UTF-8 in UDL suffix via scan_cur_identifier ().
> (lex_string): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR preprocessor/103902
> * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
> * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
> ---
>
> Notes:
> Hello-
>
> This is the updated version of the patch, incorporating feedback from Jakub
> and Jason, most recently discussed here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
>
> Please let me know how it looks? It is simpler than before with the new
> approach. Thanks!
>
> One thing to note. As Jason clarified for me, a usage like this:
>
>  #pragma GCC poison _x
> const char * operator "" _x (const char *, unsigned long);
>
> The space between the "" and the _x is currently allowed but will be
> deprecated in C++23. GCC currently will complain about the poisoned use of
> _x in this case, and this patch, which is just focused on handling UTF-8
> properly, does not change this. But it seems that it would be correct
> not to apply poison in this case. I can try to follow up with a patch to do
> so, if it seems worthwhile? Given the syntax is deprecated, maybe it's not
> worth it...
>
> For the time being, this patch does add a testcase for the above and xfails
> it. For the case where no space is present, which is the part touched by the
> present patch, existing behavior is preserved correctly and no diagnostics
> such as poison are issued for the UDL suffix. (Contrary to v1 of this
> patch.)
>
> Thanks! bootstrap + regtested all languages on x86-64 Linux with
> no regressions.
>
> -Lewis
>
>  .../g++.dg/cpp0x/udlit-extended-id-1.C|  68 
>  .../g++.dg/cpp0x/udlit-extended-id-2.C|   6 +
>  .../g++.dg/cpp0x/udlit-extended-id-3.C|  15 +
>  .../g++.dg/cpp0x/udlit-extended-id-4.C|  14 +
>  libcpp/lex.cc | 382 ++
>  5 files changed, 317 insertions(+), 168 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-2.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-4.C
>
> diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C 
> b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
> new file mode 100644
> index 000..411d4fdd0ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
> @@ -0,0 +1,68 @@
> +// { dg-do run { target c++11 
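
The core of the change is that the bytes following the closing quote are
scanned with the full identifier rules instead of an ASCII-only loop. A
deliberately simplified, hypothetical C sketch of that difference follows
(ident_char_p and scan_udl_suffix are illustrative names, not libcpp
functions): it accepts '$' and any byte with the high bit set, i.e. any
byte of a multi-byte UTF-8 sequence, as identifier characters, whereas
the real scan_cur_identifier () also validates UTF-8, handles UCNs, and
runs the bidi and normalization diagnostics described in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified identifier-character test; no UTF-8 validation or
   XID_Start/XID_Continue checks are done here.  */
static bool
ident_char_p (unsigned char c, bool first)
{
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || c == '_' || c == '$')
    return true;
  if (c >= 0x80)  /* any byte of a multi-byte UTF-8 sequence */
    return true;
  return !first && c >= '0' && c <= '9';
}

/* Return the length in bytes of the identifier starting at P (the bytes
   right after a string literal's closing quote), or 0 if there is none.  */
static size_t
scan_udl_suffix (const char *p)
{
  size_t len = 0;
  assert (p != NULL);
  while (p[len] != '\0' && ident_char_p ((unsigned char) p[len], len == 0))
    len++;
  return len;
}
```

With the suffix from the PR, "_π" is UTF-8 encoded as the bytes
'_', 0xCF, 0x80, so the whole three-byte sequence is taken as the suffix.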

RE: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Roger Sayle


On 02 May 2023 13:40, Paul Koning wrote:
> > On May 1, 2023, at 7:37 PM, Roger Sayle 
> wrote:
> >
> > ...
> > The shiftsi.cc regression on xstormy16 is fixed by adding
> > -fno-split-wide-types.
> > In fact, if all the regression tests pass, I'd suggest that
> > flag_split_wide_types = false should be the default on xstormy16 now
> > that we've moved to LRA.  And if this works for xstormy16, it might be
> > useful to other targets for the LRA transition; it's a difference in
> > behaviour between reload and LRA that could potentially affect
> > multiple targets.
> 
> Is there documentation for that flag?

Yes, see the section -fsplit-wide-types in
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Interestingly, there's a recent-ish blog post describing how
-fno-split-wide-types
reduces executable size on AVR:
https://ufj.ddns.net/blog/marlin/2019/01/07/reducing-marlin-binary-size.html
and its interaction with (AVR) register allocation is seen in PR
middle-end/35860.
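
As a rough illustration of what splitting means here: -fsplit-wide-types
applies to values that occupy several registers, such as a 32-bit value
on 16-bit xstormy16. The hypothetical plain-C sketch below (not what the
RTL passes literally emit) expresses a 32-bit left shift purely in terms
of two 16-bit halves; once a value is split like this, each half can be
allocated to its own register rather than as a register pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit value split into two 16-bit words.  */
struct halves
{
  uint16_t lo, hi;
};

static struct halves
to_halves (uint32_t v)
{
  struct halves h = { (uint16_t) v, (uint16_t) (v >> 16) };
  return h;
}

static uint32_t
from_halves (struct halves h)
{
  return ((uint32_t) h.hi << 16) | h.lo;
}

/* 32-bit left shift by N (0 <= N < 32) using only 16-bit pieces.  */
static struct halves
shl_halves (struct halves x, unsigned n)
{
  struct halves r;
  assert (n < 32);
  if (n == 0)
    return x;
  if (n >= 16)
    {
      /* The low word moves entirely into the high word.  */
      r.hi = (uint16_t) (x.lo << (n - 16));
      r.lo = 0;
    }
  else
    {
      /* Bits shifted out of the low word carry into the high word.  */
      r.hi = (uint16_t) ((x.hi << n) | (x.lo >> (16 - n)));
      r.lo = (uint16_t) (x.lo << n);
    }
  return r;
}
```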

Cheers,
Roger
--




Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-02 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 2 May 2023 at 17:32, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
>> >  wrote:
>> >> > [aarch64] Improve code-gen for vector initialization with single 
>> >> > constant element.
>> >> >
>> >> > gcc/ChangeLog:
>> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak 
>> >> > condition
>> >> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >> >   and if maxv == 1, use constant element for duplicating into 
>> >> > register.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
>> >> > b/gcc/config/aarch64/aarch64.cc
>> >> > index 2b0de7ca038..f46750133a6 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> > vals)
>> >> >   and matches[X][1] with the count of duplicate elements (if X is 
>> >> > the
>> >> >   earliest element which has duplicates).  */
>> >> >
>> >> > -  if (n_var == n_elts && n_elts <= 16)
>> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >> >  {
>> >> >int matches[16][2] = {0};
>> >> >for (int i = 0; i < n_elts; i++)
>> >> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> > vals)
>> >> >vector register.  For big-endian we want that position to 
>> >> > hold
>> >> >the last element of VALS.  */
>> >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> >> > +
>> >> > +   /* If we have a single constant element, use that for 
>> >> > duplicating
>> >> > +  instead.  */
>> >> > +   if (n_var == n_elts - 1)
>> >> > + for (int i = 0; i < n_elts; i++)
>> >> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
>> >> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> >> > + {
>> >> > +   maxelement = i;
>> >> > +   break;
>> >> > + }
>> >> > +
>> >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> >> > aarch64_emit_move (target, lowpart_subreg (mode, x, 
>> >> > inner_mode));
>> >>
>> >> We don't want to force the constant into a register though.
>> > OK right, sorry.
>> > With the attached patch, for the following test-case:
>> > int64x2_t f_s64(int64_t x)
>> > {
>> >   return (int64x2_t) { x, 1 };
>> > }
>> >
>> > it loads constant from memory (same code-gen as without patch).
>> > f_s64:
>> > adrp    x1, .LC0
>> > ldr q0, [x1, #:lo12:.LC0]
>> > ins v0.d[0], x0
>> > ret
>> >
>> > Does the patch look OK ?
>> >
>> > Thanks,
>> > Prathamesh
>> > [...]
>> > [aarch64] Improve code-gen for vector initialization with single constant 
>> > element.
>> >
>> > gcc/ChangeLog:
>> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak 
>> > condition
>> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >   and if maxv == 1, use constant element for duplicating into register.
>> >
>> > gcc/testsuite/ChangeLog:
>> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 2b0de7ca038..97309ddec4f 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >   and matches[X][1] with the count of duplicate elements (if X is the
>> >   earliest element which has duplicates).  */
>> >
>> > -  if (n_var == n_elts && n_elts <= 16)
>> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>>
>> No need for the extra brackets.
> Adjusted, thanks. Sorry if this sounds like a silly question, but why
> do we need the n_elts <= 16 check ?
> Won't n_elts be always <= 16 since max number of elements in a vector
> would be 16 for V16QI ?

Was wondering the same thing :)

Let's leave it though.
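
The element-selection step under discussion can be modelled in isolation.
The sketch below is a hypothetical simplification: is_const stands in for
the CONST_INT_P / CONST_DOUBLE_P checks, and it covers only the case
where no element is duplicated (the maxv == 1 case), so the default
choice is element 0, or the last element on big-endian. It models only
which element is picked for the initial duplicate, not how it is moved
into the register.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the element to duplicate first.  IS_CONST[i]
   says whether element I of the initializer is a constant.  */
static int
pick_dup_element (const bool *is_const, int n_elts, bool big_endian)
{
  int n_var = 0;
  assert (n_elts > 0 && n_elts <= 16);
  for (int i = 0; i < n_elts; i++)
    n_var += !is_const[i];

  int maxelement = big_endian ? n_elts - 1 : 0;

  /* Exactly one constant element: prefer it for the duplicate, so that
     only the variable elements need inserting afterwards.  */
  if (n_var == n_elts - 1)
    for (int i = 0; i < n_elts; i++)
      if (is_const[i])
        {
          maxelement = i;
          break;
        }
  return maxelement;
}
```

For the { x, 1 } example in this thread, the constant is element 1, so
that element is picked.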

>> >  {
>> >int matches[16][2] = {0};
>> >for (int i = 0; i < n_elts; i++)
>> > @@ -7,8 +7,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >vector register.  For big-endian we want that position to hold
>> >the last element of VALS.  */
>> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> > -   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > -   aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>> > +
>> > +   /* If we have a single constant element, use that for duplicating
>> > +  instead.  */
>> > +   if (n_var == n_elts - 1)
>> > + for (int i = 0; i < n_elts; i++)
>> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
>> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> > +  

[pushed] c++: Add testcase for already fixed PR [PR109506]

2023-05-02 Thread Patrick Palka via Gcc-patches
The PR109666 fix r14-386-g07c52d1eec967 incidentally also fixes this PR.

PR c++/109506

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template26.C: New test.
---
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C | 22 +++
 1 file changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C 
b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C
new file mode 100644
index 000..032b7b61619
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template26.C
@@ -0,0 +1,22 @@
+// PR c++/109506
+// { dg-do link { target c++11 } }
+// { dg-additional-options "-fchecking=2" }
+
+template
+struct foo {
+  foo() { };
+};
+
+template
+class bar {
+  foo alloc_{};
+};
+
+template
+bar func1() {
+  return bar{};
+}
+
+int main() {
+  func1();
+}
-- 
2.40.1.459.g48d89b51b3



Re: [committed] Convert xstormy16 to LRA

2023-05-02 Thread Paul Koning via Gcc-patches



> On May 1, 2023, at 7:37 PM, Roger Sayle  wrote:
> 
> ...
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.
> In fact, if all the regression tests pass, I'd suggest that
> flag_split_wide_types = false
> should be the default on xstormy16 now that we've moved to LRA.  And if this
> works for xstormy16, it might be useful to other targets for the LRA
> transition;
> it's a difference in behaviour between reload and LRA that could potentially
> affect multiple targets.

Is there documentation for that flag?  

paul



Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-02 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 2 May 2023 at 17:32, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
> >  wrote:
> >> > [aarch64] Improve code-gen for vector initialization with single 
> >> > constant element.
> >> >
> >> > gcc/ChangeLog:
> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak 
> >> > condition
> >> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >> >   and if maxv == 1, use constant element for duplicating into 
> >> > register.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> >> > index 2b0de7ca038..f46750133a6 100644
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >> >   and matches[X][1] with the count of duplicate elements (if X is the
> >> >   earliest element which has duplicates).  */
> >> >
> >> > -  if (n_var == n_elts && n_elts <= 16)
> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
> >> >  {
> >> >int matches[16][2] = {0};
> >> >for (int i = 0; i < n_elts; i++)
> >> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >> >vector register.  For big-endian we want that position to hold
> >> >the last element of VALS.  */
> >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> >> > +
> >> > +   /* If we have a single constant element, use that for duplicating
> >> > +  instead.  */
> >> > +   if (n_var == n_elts - 1)
> >> > + for (int i = 0; i < n_elts; i++)
> >> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> >> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> >> > + {
> >> > +   maxelement = i;
> >> > +   break;
> >> > + }
> >> > +
> >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> >> > aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
> >>
> >> We don't want to force the constant into a register though.
> > OK right, sorry.
> > With the attached patch, for the following test-case:
> > int64x2_t f_s64(int64_t x)
> > {
> >   return (int64x2_t) { x, 1 };
> > }
> >
> > it loads constant from memory (same code-gen as without patch).
> > f_s64:
> > adrp x1, .LC0
> > ldr q0, [x1, #:lo12:.LC0]
> > ins v0.d[0], x0
> > ret
> >
> > Does the patch look OK ?
> >
> > Thanks,
> > Prathamesh
> > [...]
> > [aarch64] Improve code-gen for vector initialization with single constant element.
> >
> > gcc/ChangeLog:
> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >   and if maxv == 1, use constant element for duplicating into register.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 2b0de7ca038..97309ddec4f 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >   and matches[X][1] with the count of duplicate elements (if X is the
> >   earliest element which has duplicates).  */
> >
> > -  if (n_var == n_elts && n_elts <= 16)
> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>
> No need for the extra brackets.
Adjusted, thanks. Sorry if this sounds like a silly question, but why
do we need the n_elts <= 16 check?
Won't n_elts always be <= 16, since the maximum number of elements in a vector
is 16 for V16QI?
>
> >  {
> >int matches[16][2] = {0};
> >for (int i = 0; i < n_elts; i++)
> > @@ -7,8 +7,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >vector register.  For big-endian we want that position to hold
> >the last element of VALS.  */
> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> > -   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> > -   aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
> > +
> > +   /* If we have a single constant element, use that for duplicating
> > +  instead.  */
> > +   if (n_var == n_elts - 1)
> > + for (int i = 0; i < n_elts; i++)
> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> > + {
> > +   maxelement = i;
> > +   break;
> > + }
> > +
> > +   rtx maxval = XVECEXP (vals, 0, maxelement);
> > +   if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval)))
> > + {
> > +   

[PATCH (pushed)] docs: port documentation of VRP params

2023-05-02 Thread Martin Liška

gcc/ChangeLog:

* doc/invoke.texi: Update documentation based on param.opt file.
---
 gcc/doc/invoke.texi | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2f40c58b21c..b92b8576027 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15988,15 +15988,9 @@ The maximum number of may-defs we analyze when looking for a must-def
 specifying the dynamic type of an object that invokes a virtual call
 we may be able to devirtualize speculatively.
 
-@item evrp-sparse-threshold
-Maximum number of basic blocks before EVRP uses a sparse cache.
-
 @item ranger-debug
 Specifies the type of debug output to be issued for ranges.
 
-@item evrp-switch-limit
-Specifies the maximum number of switch cases before EVRP ignores a switch.
-
 @item unroll-jam-min-percent
 The minimum percentage of memory references that must be optimized
 away for the unroll-and-jam transformation to be considered profitable.
@@ -16055,6 +16049,15 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-sparse-threshold
+Maximum number of basic blocks before VRP uses a sparse bitmap cache.
+
+@item vrp-switch-limit
+Maximum number of outgoing edges in a switch before VRP will not process it.
+
+@item vrp-vector-threshold
+Maximum number of basic blocks for VRP to use a basic cache vector.
+
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
--

2.40.1



Re: [PATCH 09/10] arm testsuite: XFAIL or relax registers in some tests

2023-05-02 Thread Stamatis Markianos-Wright via Gcc-patches



On 28/04/2023 17:54, Kyrylo Tkachov wrote:



-----Original Message-----
From: Andrea Corallo 
Sent: Friday, April 28, 2023 12:30 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 09/10] arm testsuite: XFAIL or relax registers in some tests

From: Stam Markianos-Wright 

Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests failing due to the compiler's preference
of vector-vector comparisons, over vector-scalar comparisons. This is
due to the lack of cost model for MVE and the compiler not knowing that
the RTL vec_duplicate is free in those instructions. For now, we simply
XFAIL these checks.

I'd like to see this deficiency tracked in Bugzilla before we mark these as XFAIL.


Yep! Raised https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109697
(And I'll also update this commit message to reference that PR now)




* The tests for pr108177 had strict usage of q0 and r0 registers,
meaning that they would FAIL with -mfloat-abi=softfp. The register checks
have now been relaxed.

This part is ok.
Thanks,
Kyrill


gcc/testsuite/ChangeLog:

   * gcc.target/arm/mve/intrinsics/srshr.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/srshrl.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/uqshl.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/uqshll.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/urshr.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/urshrl.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadciq_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadciq_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadcq_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vadcq_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbciq_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbciq_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbcq_s32.c: XFAIL check.
   * gcc.target/arm/mve/intrinsics/vsbcq_u32.c: XFAIL check.
   * gcc.target/arm/mve/pr108177-1.c: Relax registers.
   * gcc.target/arm/mve/pr108177-10.c: Relax registers.
   * gcc.target/arm/mve/pr108177-11.c: Relax registers.
   * gcc.target/arm/mve/pr108177-12.c: Relax registers.
   * gcc.target/arm/mve/pr108177-13.c: Relax registers.
   * gcc.target/arm/mve/pr108177-14.c: Relax registers.
   * gcc.target/arm/mve/pr108177-2.c: Relax registers.
   * gcc.target/arm/mve/pr108177-3.c: Relax registers.
   * gcc.target/arm/mve/pr108177-4.c: Relax registers.
   * gcc.target/arm/mve/pr108177-5.c: Relax registers.
   * gcc.target/arm/mve/pr108177-6.c: Relax registers.
   * gcc.target/arm/mve/pr108177-7.c: Relax registers.
   * gcc.target/arm/mve/pr108177-8.c: Relax registers.
   * gcc.target/arm/mve/pr108177-9.c: Relax registers.
---
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
  gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
  

Re: [PATCH] PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb

2023-05-02 Thread Richard Biener via Gcc-patches
On Sun, Apr 30, 2023 at 11:14 PM Andrew Pinski via Gcc-patches wrote:
>
> While looking at differences between what minmax_replacement
> and match_simplify_replacement do, I noticed that they sometimes
> chose different edges to remove. I decided we should be able to do
> better and remove both empty basic blocks in the case of
> match_simplify_replacement, as that moves the statements.
>
> This also updates the testcases as now match_simplify_replacement
> will remove the unused MIN/MAX_EXPR and they were checking for
> those.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (copy_phi_args): New function.
> (replace_phi_edge_with_variable): Handle diamond form bb
> with forwarder only empty blocks better.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/minmax-15.c: Update test.
> * gcc.dg/tree-ssa/minmax-16.c: Update test.
> * gcc.dg/tree-ssa/minmax-3.c: Update test.
> * gcc.dg/tree-ssa/minmax-4.c: Update test.
> * gcc.dg/tree-ssa/minmax-5.c: Update test.
> * gcc.dg/tree-ssa/minmax-8.c: Update test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c |  3 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c |  9 ++--
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c  |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c  |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c  |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c  |  2 +-
>  gcc/tree-ssa-phiopt.cc| 51 ++-
>  7 files changed, 59 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> index 8a39871c938..6731f91e6c3 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-15.c
> @@ -30,5 +30,6 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> +/* There should only be two MIN_EXPR left, the 3rd one was removed. */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> index 623b12b3f74..094364e6424 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-16.c
> @@ -25,11 +25,8 @@ main (void)
>return 0;
>  }
>
> -/* After phiopt1, there really should be only 3 MIN_EXPR in the IR (including debug statements).
> -   But the way phiopt does not cleanup the CFG all the time, the PHI might still reference the
> -   alternative bb's moved statement.
> -   Note in the end, we do dce the statement and other debug statements to end up with only 2 MIN_EXPR.
> -   So check that too. */
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 4 "phiopt1" } } */
> +/* After phiopt1, will be only 2 MIN_EXPR in the IR (including debug statements). */
> +/* xk will only have the final result so the extra debug info does not change anything. */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
>  /* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> index 2af10776346..521afe3e4d9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-3.c
> @@ -25,5 +25,5 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 3 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 0 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> index 973f39bfed3..49e27185b5e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-4.c
> @@ -26,4 +26,4 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "MIN_EXPR" 0 "phiopt1" } } */
> -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 3 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> index 34e4e720511..194c881cc98 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-5.c
> @@ -25,5 +25,5 @@ main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt1" } } */
>  /* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-8.c
> index 0160e573fef..d5cb53145ea 100644

Re: [PATCH] PHIOPT: small refactoring of match_simplify_replacement.

2023-05-02 Thread Richard Biener via Gcc-patches
On Sun, Apr 30, 2023 at 11:14 PM Andrew Pinski via Gcc-patches wrote:
>
> When I added diamond shaped form bb to match_simplify_replacement,
> I copied the code to move the statement rather than factoring it
> out to a new function. This does the refactoring to a new function
> to avoid the duplicated code. It will make adding support for having
> two statements to move easier (the second statement will only be a
> conversion).
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (move_stmt): New function.
> (match_simplify_replacement): Use move_stmt instead
> of the inlined version.
> ---
>  gcc/tree-ssa-phiopt.cc | 57 ++
>  1 file changed, 24 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 024a4362093..65b3deea34a 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -643,6 +643,28 @@ empty_bb_or_one_feeding_into_p (basic_block bb,
>return true;
>  }
>
> +/* Move STMT to before GSI and insert its defining
> +   name into INSERTED_EXPRS bitmap. */
> +static void
> +move_stmt (gimple *stmt, gimple_stmt_iterator *gsi, auto_bitmap &inserted_exprs)
> +{
> +  if (!stmt)
> +    return;
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "statement un-sinked:\n");
> +      print_gimple_stmt (dump_file, stmt, 0,
> +                         TDF_VOPS|TDF_MEMSYMS);
> +    }
> +
> +  tree name = gimple_get_lhs (stmt);
> +  // Mark the name to be renamed if there is one.
> +  bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
> +  gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt);
> +  gsi_move_before (&gsi1, gsi);
> +  reset_flow_sensitive_info (name);
> +}
> +
>  /*  The function match_simplify_replacement does the main work of doing the
>  replacement using match and simplify.  Return true if the replacement is done.
>  Otherwise return false.
> @@ -727,39 +749,8 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb,
>
>/* If there was a statement to move, move it to right before
>   the original conditional.  */
> -  if (stmt_to_move)
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   {
> - fprintf (dump_file, "statement un-sinked:\n");
> - print_gimple_stmt (dump_file, stmt_to_move, 0,
> -  TDF_VOPS|TDF_MEMSYMS);
> -   }
> -
> -  tree name = gimple_get_lhs (stmt_to_move);
> -  // Mark the name to be renamed if there is one.
> -  bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
> -  gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt_to_move);
> -  gsi_move_before (&gsi1, &gsi);
> -  reset_flow_sensitive_info (name);
> -}
> -
> -  if (stmt_to_move_alt)
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   {
> - fprintf (dump_file, "statement un-sinked:\n");
> - print_gimple_stmt (dump_file, stmt_to_move_alt, 0,
> -  TDF_VOPS|TDF_MEMSYMS);
> -   }
> -
> -  tree name = gimple_get_lhs (stmt_to_move_alt);
> -  // Mark the name to be renamed if there is one.
> -  bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
> -  gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt_to_move_alt);
> -  gsi_move_before (&gsi1, &gsi);
> -  reset_flow_sensitive_info (name);
> -}
> +  move_stmt (stmt_to_move, &gsi, inserted_exprs);
> +  move_stmt (stmt_to_move_alt, &gsi, inserted_exprs);
>
>replace_phi_edge_with_variable (cond_bb, e1, phi, result, inserted_exprs);
>
> --
> 2.31.1
>


Re: [PATCH] MATCH: Port CLRSB part of builtin_zero_pattern

2023-05-02 Thread Richard Biener via Gcc-patches
On Sun, Apr 30, 2023 at 11:13 PM Andrew Pinski via Gcc-patches wrote:
>
> This ports the clrsb builtin part of builtin_zero_pattern
> to match.pd. A simple pattern to port.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> * match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
> pattern.
> ---
>  gcc/match.pd | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0e782cde71d..bf918ba70ce 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7787,6 +7787,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cond (ne @0 integer_zerop@1) (func@4 (convert? @2)) integer_zerop@3)
>@4))
>
> +/* a != 0 ? FUN(a) : CST -> Fun(a) for some CLRSB builtins
> +   where CST is precision-1. */
> +(for func (CLRSB)
> + (simplify
> +  (cond (ne @0 integer_zerop@1) (func@5 (convert?@4 @2)) INTEGER_CST@3)

As you don't seem to use @2 why not match (func@5 @4) only?

Otherwise LGTM.

> +  (if (wi::to_widest (@3) == TYPE_PRECISION (TREE_TYPE (@4)) - 1)
> +   @5)))
> +
>  #if GIMPLE
>  /* a != 0 ? CLZ(a) : CST -> .CLZ(a) where CST is the result of the internal function for 0. */
>  (for func (CLZ)
> --
> 2.31.1
>
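For context on why the pattern is safe: CLRSB counts the bits after the sign bit that are equal to it, so CLRSB (0) is the precision minus 1, which is exactly the constant the `a != 0` guard falls back to. A minimal C++ check of that identity, assuming a GCC- or Clang-compatible compiler that provides `__builtin_clrsb`; `clrsb_guarded` and `clrsb_plain` are hypothetical names for the before/after forms, and the conversion the real pattern allows on the argument is ignored here:

```cpp
#include <climits>

// Precision of int in bits (32 on common targets).
const int int_prec = static_cast<int>(sizeof(int) * CHAR_BIT);

// Before the match.pd rule: a != 0 ? CLRSB (a) : precision - 1.
int clrsb_guarded(int a)
{
  return a != 0 ? __builtin_clrsb(a) : int_prec - 1;
}

// After the rule: the guard is dropped, because CLRSB (0) already
// yields precision - 1 (every bit after the sign bit matches it).
int clrsb_plain(int a)
{
  return __builtin_clrsb(a);
}
```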


Re: [PATCH] target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64

2023-05-02 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches  writes:
> There is no canonical form for this case defined. So the aarch64 backend needs
> a pattern to match both of these forms.
>
> The forms are:
> (set (reg/i:SI 0 x0)
> (if_then_else:SI (eq (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:SI 97)
> (const_int -1 [0x])))
> and
> (set (reg/i:SI 0 x0)
> (ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
> (const_int 0 [0])))
> (reg:SI 102)))
>
> Currently the aarch64 backend matches the first form, so this
> patch adds an insn_and_split to match the second form and
> convert it to the first form.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
>
>   PR target/109657
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*cmov_insn_m1): New
>   insn_and_split pattern.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/csinv-2.c: New test.
> ---
>  gcc/config/aarch64/aarch64.md  | 20 +
>  gcc/testsuite/gcc.target/aarch64/csinv-2.c | 26 ++
>  2 files changed, 46 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/csinv-2.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index e1a2b265b20..57fe5601350 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4194,6 +4194,26 @@ (define_insn "*cmovsi_insn_uxtw"
>[(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
>  )
>  
> +;; There are two canonical forms for `cmp ? -1 : a`.
> +;; This is the second form and is here to help combine.
> +;; Support `-(cmp) | a` into `cmp ? -1 : a` to be canonical in the backend.
> +(define_insn_and_split "*cmov_insn_m1"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +(ior:GPI
> +  (neg:GPI
> +   (match_operator:GPI 1 "aarch64_comparison_operator"
> +[(match_operand 2 "cc_register" "") (const_int 0)]))
> +  (match_operand 3 "register_operand" "r")))]
> +  ""
> +  "#"
> +  "&& true"
> +  [(set (match_dup 0)
> + (if_then_else:GPI (match_dup 1)
> +  (const_int -1) (match_dup 3)))]

Sorry for the nit, but the formatting of the last two lines looks odd IMO.
How about:

(if_then_else:GPI (match_dup 1) (const_int -1) (match_dup 3))...

or:

(if_then_else:GPI (match_dup 1)
  (const_int -1)
  (match_dup 3))...

OK with that change, thanks.

Richard

> +  {}
> +  [(set_attr "type" "csel")]
> +)
> +
>  (define_insn "*cmovdi_insn_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (if_then_else:DI
> diff --git a/gcc/testsuite/gcc.target/aarch64/csinv-2.c b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
> new file mode 100644
> index 000..89132acb713
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/csinv-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* PR target/109657: (a ? -1 : 0) | b could be better */
> +
> +/* Both functions should have the same assembly of:
> +   cmp w1, 0
> +   csinv   w0, w0, wzr, eq
> +
> +   We should not get:
> +   cmp w1, 0
> +   csetm   w1, ne
> +   orr w0, w1, w0
> + */
> +/* { dg-final { scan-assembler-times "csinv\tw\[0-9\]" 2 } } */
> +/* { dg-final { scan-assembler-not "csetm\tw\[0-9\]" } } */
> +unsigned b(unsigned a, unsigned b)
> +{
> +  if(b)
> +    return -1;
> +  return a;
> +}
> +unsigned b1(unsigned a, unsigned b)
> +{
> +  unsigned t = b ? -1 : 0;
> +  return a | t;
> +}
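The split is valid because negating a 0/1 comparison result gives either 0 or all ones, so `-(cmp) | a` and `cmp ? -1 : a` agree for every input; that is what lets both test functions collapse to one `csinv`. A source-level C++ illustration of that equivalence (the function names are hypothetical, and this models only the arithmetic, not the RTL matcher):

```cpp
#include <cstdint>

// Form the backend already matches:
// (if_then_else (cond) (const_int -1) a)  ==  cond ? -1 : a
uint32_t select_form(bool cond, uint32_t a)
{
  return cond ? UINT32_MAX : a;
}

// Form the new insn_and_split rewrites:
// (ior (neg (cond)) a)  ==  -(cond) | a
// Negating 1 gives all ones and negating 0 gives 0, so the OR
// yields either all ones or a unchanged.
uint32_t ior_neg_form(bool cond, uint32_t a)
{
  return -static_cast<uint32_t>(cond) | a;
}
```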


[PATCH] RISC-V: fix build issue with gcc 4.9.x

2023-05-02 Thread Romain Naour via Gcc-patches
GCC should still build with GCC 4.8.3 or newer [1],
which compiles C++ as C++03 (gnu++98) by default. But a recent change in the
RISC-V port introduced a C++11 feature, "std::log2" [2].

Use log2 from the C header, without the namespace [3].

[1] https://gcc.gnu.org/install/prerequisites.html
[2] https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=7caa1ae5e451e780fbc4746a54e3f19d4f4304dc
[3] https://stackoverflow.com/questions/26733413/error-log2-is-not-a-member-of-std

Fixes:
https://gitlab.com/buildroot.org/toolchains-builder/-/jobs/4202276589

gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc: Use log2 from the C header, without
the namespace.

Signed-off-by: Romain Naour 
---
 gcc/config/riscv/genrvv-type-indexer.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc b/gcc/config/riscv/genrvv-type-indexer.cc
index e677b55290c..eebe382d1c3 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -115,9 +115,9 @@ same_ratio_eew_type (unsigned sew, int lmul_log2, unsigned eew, bool unsigned_p,
   if (sew == eew)
     elmul_log2 = lmul_log2;
   else if (sew > eew)
-    elmul_log2 = lmul_log2 - std::log2 (sew / eew);
+    elmul_log2 = lmul_log2 - log2 (sew / eew);
   else /* sew < eew */
-    elmul_log2 = lmul_log2 + std::log2 (eew / sew);
+    elmul_log2 = lmul_log2 + log2 (eew / sew);
 
   if (float_p)
     return floattype (eew, elmul_log2);
-- 
2.34.3
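The fix relies on the C99 `log2` that `<math.h>` declares in the global namespace, which older C++03-mode host compilers accept, whereas `std::log2` is a C++11 addition to `<cmath>`. A minimal stand-alone sketch of the same EMUL arithmetic; `elmul_log2_for` is a hypothetical name, and the sketch assumes `log2` returns exact results for the power-of-two ratios involved, which is the usual libm behaviour but not a language guarantee:

```cpp
#include <math.h>  // C header: declares ::log2, so no std:: qualification is needed

// Hypothetical stand-alone version of the calculation in
// same_ratio_eew_type: changing the element width from SEW to EEW
// while keeping SEW/LMUL fixed shifts log2(LMUL) by log2 of the
// width ratio.  Widths are powers of two, so the division is exact.
int elmul_log2_for(unsigned sew, int lmul_log2, unsigned eew)
{
  if (sew == eew)
    return lmul_log2;
  else if (sew > eew)
    return lmul_log2 - (int) log2 (sew / eew);
  else /* sew < eew */
    return lmul_log2 + (int) log2 (eew / sew);
}
```

For example, narrowing from SEW=32 to EEW=8 at LMUL=1 (log2 = 0) gives an EMUL of 1/4 (log2 = -2).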



Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-02 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 2 May 2023 at 14:56, Richard Sandiford wrote:
>> > [aarch64] Improve code-gen for vector initialization with single constant element.
>> >
>> > gcc/ChangeLog:
>> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
>> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>> >   and if maxv == 1, use constant element for duplicating into register.
>> >
>> > gcc/testsuite/ChangeLog:
>> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 2b0de7ca038..f46750133a6 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >   and matches[X][1] with the count of duplicate elements (if X is the
>> >   earliest element which has duplicates).  */
>> >
>> > -  if (n_var == n_elts && n_elts <= 16)
>> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >  {
>> >int matches[16][2] = {0};
>> >for (int i = 0; i < n_elts; i++)
>> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>> >vector register.  For big-endian we want that position to hold
>> >the last element of VALS.  */
>> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> > +
>> > +   /* If we have a single constant element, use that for duplicating
>> > +  instead.  */
>> > +   if (n_var == n_elts - 1)
>> > + for (int i = 0; i < n_elts; i++)
>> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
>> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> > + {
>> > +   maxelement = i;
>> > +   break;
>> > + }
>> > +
>> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
>> > aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>>
>> We don't want to force the constant into a register though.
> OK right, sorry.
> With the attached patch, for the following test-case:
> int64x2_t f_s64(int64_t x)
> {
>   return (int64x2_t) { x, 1 };
> }
>
> it loads constant from memory (same code-gen as without patch).
> f_s64:
> adrp x1, .LC0
> ldr q0, [x1, #:lo12:.LC0]
> ins v0.d[0], x0
> ret
>
> Does the patch look OK ?
>
> Thanks,
> Prathamesh
> [...]
> [aarch64] Improve code-gen for vector initialization with single constant element.
>
> gcc/ChangeLog:
>   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
>   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
>   and if maxv == 1, use constant element for duplicating into register.
>
> gcc/testsuite/ChangeLog:
>   * gcc.target/aarch64/vec-init-single-const.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2b0de7ca038..97309ddec4f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>   and matches[X][1] with the count of duplicate elements (if X is the
>   earliest element which has duplicates).  */
>  
> -  if (n_var == n_elts && n_elts <= 16)
> +  if ((n_var >= n_elts - 1) && n_elts <= 16)

No need for the extra brackets.

>  {
>int matches[16][2] = {0};
>for (int i = 0; i < n_elts; i++)
> @@ -7,8 +7,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>vector register.  For big-endian we want that position to hold
>the last element of VALS.  */
> maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> -   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> -   aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
> +
> +   /* If we have a single constant element, use that for duplicating
> +  instead.  */
> +   if (n_var == n_elts - 1)
> + for (int i = 0; i < n_elts; i++)
> +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> + {
> +   maxelement = i;
> +   break;
> + }
> +
> +   rtx maxval = XVECEXP (vals, 0, maxelement);
> +   if (!(CONST_INT_P (maxval) || CONST_DOUBLE_P (maxval)))
> + {
> +   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> +   aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
> + }
> +   else
> + aarch64_emit_move (target, gen_vec_duplicate (mode, maxval));
>   }
>else
>   {

This seems a bit convoluted.  It might be easier to record whether
we see a CONST_INT_P or a CONST_DOUBLE_P during the previous loop,
and if so what the constant is.  Then handle that case first,
as a separate arm of the "if".
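The suggestion amounts to remembering the constant during the loop that already counts the variable elements, then handling the single-constant case as its own arm instead of re-scanning. A simplified, GCC-independent model of that control flow; `Elem`, `scan_elems` and `choose_dup_element` are hypothetical stand-ins for the `rtx` checks (`CONST_INT_P`/`CONST_DOUBLE_P`), and the fallback ignores the duplicate-counting (`matches`) logic of the real `aarch64_expand_vector_init`:

```cpp
// Hypothetical model: Elem stands in for an rtx in VALS, and
// is_const for CONST_INT_P (x) || CONST_DOUBLE_P (x).
struct Elem
{
  bool is_const;
};

// Result of one pass that both counts the variable elements and
// remembers where the (last) constant element is.
struct ScanResult
{
  int n_var;        // number of non-constant elements
  int const_index;  // index of a constant element, or -1 if none
};

ScanResult scan_elems(const Elem *vals, int n_elts)
{
  ScanResult r = { 0, -1 };
  for (int i = 0; i < n_elts; i++)
    {
      if (vals[i].is_const)
        r.const_index = i;
      else
        r.n_var++;
    }
  return r;
}

// Pick the element to broadcast into every lane.  The single-constant
// case is handled first, as a separate arm (in the real code it would be
// duplicated directly rather than forced into a register); otherwise
// fall back to the default position used when no element repeats.
int choose_dup_element(const Elem *vals, int n_elts, bool big_endian)
{
  ScanResult r = scan_elems(vals, n_elts);
  if (r.n_var == n_elts - 1 && r.const_index >= 0)
    return r.const_index;
  return big_endian ? n_elts - 1 : 0;
}
```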

> 

Re: [PATCH] libstdc++: Another attempt to ensure g++ 13+ compiled programs enforce gcc 13.2+ libstdc++.so.6 [PR108969]

2023-05-02 Thread Rainer Orth
Hi Jakub,

> Bootstrapped/regtested on x86_64-linux, i686-linux and sparc-sun-solaris2.11
>
> On the last one I've actually checked a version which had
> defined(_GLIBCXX_SYMVER_SUN) next to defined(_GLIBCXX_SYMVER_GNU), but
> init_priority attribute doesn't seem to be supported there and so I couldn't
> actually test how this works there.  Using gas and Sun ld, Rainer, does one
> need to use gas + gld for init_priority or something else?

it's complicated, as usual ;-)  While Solaris 11.3 ld has basic
.init_priority etc. support, it doesn't know about priorities, so the
support is disabled.  This is what you see on gcc211 in the cfarm.

Solaris 11.4 ld on the other hand added everything to make the configure
test work, so HAVE_INITFINI_ARRAY_SUPPORT is enabled.  Unfortunately,
there's still PR c++/52477 (dup of PR c++/81337) where g++ makes
unwarranted assumptions which happen to hold on Linux/glibc (only?), but
often break other targets (Solaris, Darwin, FreeBSD) in unexpected
places (like libsanitizer).  I've made at least two attempts at a fix
which got me nowhere, so I'm considering disabling
HAVE_INITFINI_ARRAY_SUPPORT on Solaris for good.  This is purely a g++
issue, clang++ gets this right.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] OpenACC: Further attach/detach clause fixes for Fortran [PR109622]

2023-05-02 Thread Tobias Burnus

On 29.04.23 12:57, Julian Brown wrote:

This patch moves several tests introduced by the following patch:

   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html


I believe you intend this as a git log entry. Can you add
  ... commit r14-325-gcacf65d74463600815773255e8b82b4043432bd7
as this makes looking at the git history easier.


into the proper location for OpenACC testing (thanks to Thomas for
spotting my mistake!), and also fixes a few additional problems --
missing diagnostics for non-pointer attaches, and a case where a pointer
was incorrectly dereferenced. Tests are also adjusted for vector-length
warnings on nvidia accelerators.

Tested with offloading to nvptx. OK?

2023-04-29  Julian Brown  

  PR fortran/109622

gcc/fortran/
  * trans-openmp.cc (gfc_trans_omp_clauses): Add diagnostic for
  non-pointer/non-allocatable attach/detach.  Remove dereference for
  pointer-to-scalar derived type component attach/detach.


In general, we prefer resolution-time diagnostics to tree-translation diagnostics,
unless there is a good reason to do the latter.
At a glance, placing the check in openmp.cc should even make a single diagnostic
sufficient, instead of two.

Search for lastref in resolve_omp_clauses; I think that should do,
but otherwise something like:
 symbol_attribute attr = gfc_expr_attr (e);
 if (attr.pointer || attr.allocatable)
should work.

You currently have:


@@ -3430,6 +3432,13 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
  = TYPE_SIZE_UNIT (gfc_charlen_type_node);
  }
  }
+   else if (openacc
+&& (n->u.map_op == OMP_MAP_ATTACH
+|| n->u.map_op == OMP_MAP_DETACH))
+ gfc_error ("%qs clause argument not pointer or "
+"allocatable at %L",
+(n->u.map_op == OMP_MAP_ATTACH)
+? "attach" : "detach", );


Additionally, I think we usually use wording like: 'must be ALLOCATABLE or a POINTER'.
(This also avoids the question of whether 'neither' should be used instead of 'not'
and 'nor' instead of 'or'.)

Additionally, I think there should also be an error for:

integer :: a
!$acc enter data attach(a)
end

(Well, in that case, just looking at n->expr won't work; n->sym also needs to be
handled for list == OMP_LIST_MAP && n->u.map_op == OMP_MAP_(DE)ATTACH.)
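The requested resolution-time check can be modelled in a few lines. This is a hypothetical sketch. `struct attr_model`, `struct clause_item_model` and `attach_detach_error` are illustrative stand-ins, not gfortran's `symbol_attribute`, its clause-list types or any existing function. The message text simply follows the wording suggested above. The model takes the attributes from the expression when there is one and falls back to the symbol for a bare variable such as `attach(a)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the gfortran data; only the two attribute
   bits that matter for the check are modelled.  */
struct attr_model
{
  bool pointer;
  bool allocatable;
};
struct expr_model
{
  struct attr_model attr;  /* Attributes of the whole expression.  */
};
struct sym_model
{
  struct attr_model attr;
};
struct clause_item_model
{
  const struct expr_model *expr;  /* NULL for a bare variable, e.g. attach(a).  */
  const struct sym_model *sym;    /* Always set.  */
};

/* Returns NULL if the attach/detach operand is acceptable, otherwise the
   diagnostic text (wording as suggested in the review).  */
static const char *
attach_detach_error (const struct clause_item_model *n, bool is_attach)
{
  /* Prefer the expression (e.g. a derived-type component reference);
     fall back to the symbol when there is no expression.  */
  struct attr_model attr = n->expr ? n->expr->attr : n->sym->attr;
  if (attr.pointer || attr.allocatable)
    return NULL;
  return is_attach
         ? "'attach' clause argument must be ALLOCATABLE or a POINTER"
         : "'detach' clause argument must be ALLOCATABLE or a POINTER";
}
```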

The other changes look fine. Thanks!

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq

2023-05-02 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, April 18, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 03/22] arm: [MVE intrinsics] Rework vreinterpretq
> 
> This patch implements vreinterpretq using the new MVE intrinsics
> framework.
> 
> The old definitions for vreinterpretq are removed as a consequence.
> 
> 2022-09-08  Murray Steele  
>   Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New
> class.
>   * config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
>   * config/arm/arm-mve-builtins-base.h (vreinterpretq): New
> declaration.
>   * config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New
> function.
>   (parse_type): Likewise.
>   (parse_signature): Likewise.
>   (build_one): Likewise.
>   (build_all): Likewise.
>   (overloaded_base): New struct.
>   (unary_convert_def): Likewise.
>   * config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
>   * config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
>   macro.
>   (TYPES_reinterpret_unsigned1): Likewise.
>   (TYPES_reinterpret_integer): Likewise.
>   (TYPES_reinterpret_integer1): Likewise.
>   (TYPES_reinterpret_float1): Likewise.
>   (TYPES_reinterpret_float): Likewise.
>   (reinterpret_integer): New.
>   (reinterpret_float): New.
>   (handle_arm_mve_h): Register builtins.
>   * config/arm/arm_mve.h (vreinterpretq_s16): Remove.
>   (vreinterpretq_s32): Likewise.
>   (vreinterpretq_s64): Likewise.
>   (vreinterpretq_s8): Likewise.
>   (vreinterpretq_u16): Likewise.
>   (vreinterpretq_u32): Likewise.
>   (vreinterpretq_u64): Likewise.
>   (vreinterpretq_u8): Likewise.
>   (vreinterpretq_f16): Likewise.
>   (vreinterpretq_f32): Likewise.
>   (vreinterpretq_s16_s32): Likewise.
>   (vreinterpretq_s16_s64): Likewise.
>   (vreinterpretq_s16_s8): Likewise.
>   (vreinterpretq_s16_u16): Likewise.
>   (vreinterpretq_s16_u32): Likewise.
>   (vreinterpretq_s16_u64): Likewise.
>   (vreinterpretq_s16_u8): Likewise.
>   (vreinterpretq_s32_s16): Likewise.
>   (vreinterpretq_s32_s64): Likewise.
>   (vreinterpretq_s32_s8): Likewise.
>   (vreinterpretq_s32_u16): Likewise.
>   (vreinterpretq_s32_u32): Likewise.
>   (vreinterpretq_s32_u64): Likewise.
>   (vreinterpretq_s32_u8): Likewise.
>   (vreinterpretq_s64_s16): Likewise.
>   (vreinterpretq_s64_s32): Likewise.
>   (vreinterpretq_s64_s8): Likewise.
>   (vreinterpretq_s64_u16): Likewise.
>   (vreinterpretq_s64_u32): Likewise.
>   (vreinterpretq_s64_u64): Likewise.
>   (vreinterpretq_s64_u8): Likewise.
>   (vreinterpretq_s8_s16): Likewise.
>   (vreinterpretq_s8_s32): Likewise.
>   (vreinterpretq_s8_s64): Likewise.
>   (vreinterpretq_s8_u16): Likewise.
>   (vreinterpretq_s8_u32): Likewise.
>   (vreinterpretq_s8_u64): Likewise.
>   (vreinterpretq_s8_u8): Likewise.
>   (vreinterpretq_u16_s16): Likewise.
>   (vreinterpretq_u16_s32): Likewise.
>   (vreinterpretq_u16_s64): Likewise.
>   (vreinterpretq_u16_s8): Likewise.
>   (vreinterpretq_u16_u32): Likewise.
>   (vreinterpretq_u16_u64): Likewise.
>   (vreinterpretq_u16_u8): Likewise.
>   (vreinterpretq_u32_s16): Likewise.
>   (vreinterpretq_u32_s32): Likewise.
>   (vreinterpretq_u32_s64): Likewise.
>   (vreinterpretq_u32_s8): Likewise.
>   (vreinterpretq_u32_u16): Likewise.
>   (vreinterpretq_u32_u64): Likewise.
>   (vreinterpretq_u32_u8): Likewise.
>   (vreinterpretq_u64_s16): Likewise.
>   (vreinterpretq_u64_s32): Likewise.
>   (vreinterpretq_u64_s64): Likewise.
>   (vreinterpretq_u64_s8): Likewise.
>   (vreinterpretq_u64_u16): Likewise.
>   (vreinterpretq_u64_u32): Likewise.
>   (vreinterpretq_u64_u8): Likewise.
>   (vreinterpretq_u8_s16): Likewise.
>   (vreinterpretq_u8_s32): Likewise.
>   (vreinterpretq_u8_s64): Likewise.
>   (vreinterpretq_u8_s8): Likewise.
>   (vreinterpretq_u8_u16): Likewise.
>   (vreinterpretq_u8_u32): Likewise.
>   (vreinterpretq_u8_u64): Likewise.
>   (vreinterpretq_s32_f16): Likewise.
>   (vreinterpretq_s32_f32): Likewise.
>   (vreinterpretq_u16_f16): Likewise.
>   (vreinterpretq_u16_f32): Likewise.
>   (vreinterpretq_u32_f16): Likewise.
>   (vreinterpretq_u32_f32): Likewise.
>   (vreinterpretq_u64_f16): Likewise.
>   (vreinterpretq_u64_f32): Likewise.
>   (vreinterpretq_u8_f16): Likewise.
>   (vreinterpretq_u8_f32): Likewise.
>   (vreinterpretq_f16_f32): Likewise.
>   (vreinterpretq_f16_s16): Likewise.
>   (vreinterpretq_f16_s32): Likewise.
>   (vreinterpretq_f16_s64): Likewise.
>   (vreinterpretq_f16_s8): Likewise.
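For reference, `vreinterpretq` only relabels the lanes of a 128-bit vector, and the bits themselves are not converted. The sketch below shows that behaviour with GCC's generic vector extension instead of `arm_mve.h`, so it builds on any target. The typedef names and `reinterpret_u8_s16` are hypothetical and are not part of the MVE intrinsics framework.

```c
#include <stdint.h>

/* GCC generic vectors standing in for int16x8_t and uint8x16_t.  */
typedef int16_t s16x8 __attribute__((vector_size (16)));
typedef uint8_t u8x16 __attribute__((vector_size (16)));

/* A cast between same-sized GCC vector types keeps the 128 bits
   unchanged; no per-lane value conversion takes place.  */
static inline u8x16
reinterpret_u8_s16 (s16x8 v)
{
  return (u8x16) v;
}
```

With lane 0 set to -1 (0xFFFF), bytes 0 and 1 of the result are both 0xFF on either endianness. Lanes mixing different byte values would expose the byte order.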

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-02 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 2 May 2023 at 14:56, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 25 Apr 2023 at 16:29, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi Richard,
> >> > While digging thru aarch64_expand_vector_init, I noticed it gives
> >> > priority to loading a constant first:
> >> >  /* Initialise a vector which is part-variable.  We want to first try
> >> >  to build those lanes which are constant in the most efficient way we
> >> >  can.  */
> >> >
> >> > which results in suboptimal code-gen for following case:
> >> > int16x8_t f_s16(int16_t x)
> >> > {
> >> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> >> > }
> >> >
> >> > code-gen trunk:
> >> > f_s16:
> >> > movi v0.8h, 0x1
> >> > ins v0.h[0], w0
> >> > ins v0.h[1], w0
> >> > ins v0.h[2], w0
> >> > ins v0.h[3], w0
> >> > ins v0.h[4], w0
> >> > ins v0.h[5], w0
> >> > ins v0.h[6], w0
> >> > ret
> >> >
> >> > The attached patch tweaks the following condition:
> >> > if (n_var == n_elts && n_elts <= 16)
> >> >   {
> >> > ...
> >> >   }
> >> >
> >> > to pass if maxv >= 80% of n_elts, with 80% being an
> >> > arbitrary "high enough" threshold. The intent is to dup
> >> > the most repeating variable if its repetition
> >> > is "high enough" and insert constants which should be "better" than
> >> > loading constant first and inserting variables like in the above case.
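The `f_s16` testcase can also be written with GCC's generic vector extension, which compiles on any target and makes it easy to compare generated code. This is a sketch that assumes `v8hi` as a stand-in for `int16x8_t`. It reproduces only the initializer shape (seven copies of a variable plus one constant lane) and makes no claim about the exact instructions GCC emits for it.

```c
#include <stdint.h>

/* GCC generic vector standing in for arm_neon.h's int16x8_t.  */
typedef int16_t v8hi __attribute__((vector_size (16)));

/* Seven lanes hold the variable X and the last lane holds the constant 1.
   The proposed improvement is to broadcast X and insert the constant,
   instead of loading the constant vector and inserting X seven times.  */
v8hi
f_s16 (int16_t x)
{
  return (v8hi) { x, x, x, x, x, x, x, 1 };
}
```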
> >>
> >> I'm not too keen on the 80%.  Like you say, it seems a bit arbitrary.
> >>
> >> The case above can also be handled by relaxing n_var == n_elts to
> >> n_var >= n_elts - 1, so that if there's just one constant element,
> >> we look for duplicated variable elements.  If there are none
> >> (maxv == 1), but there is a constant element, we can duplicate
> >> the constant element into a register.
> >>
> >> The case when there's more than one constant element needs more thought
> >> (and testcases :-)).  E.g. after a certain point, it would probably be
> >> better to load the variable and constant parts separately and blend them
> >> using TBL.  It also matters whether the constants are equal or not.
> >>
> >> There are also cases that could be handled using EXT.
> >>
> >> Plus, if we're inserting many variable elements that are already
> >> in GPRs, we can probably do better by coalescing them into bigger
> >> GPR values and inserting them as wider elements.
> >>
> >> Because of things like that, I think we should stick to the
> >> single-constant case for now.
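The relaxed condition can be sketched as a stand-alone decision function. This is a hypothetical model, not GCC's RTL code. Lanes are reduced to a flag plus an integer id, and `choose_strategy` and the enum names are invented for illustration. The model decides only which lane to broadcast. How a broadcast constant is materialised is a separate question.

```c
#include <stdbool.h>

/* Hypothetical model of one vector lane: a compile-time constant, or a
   variable identified by an integer id (equal ids mean the same value).  */
struct lane
{
  bool is_const;
  int id;
};

enum init_strategy
{
  DUP_VARIABLE,    /* Broadcast the most repeated variable, insert the rest.  */
  DUP_CONSTANT,    /* Broadcast the single constant, insert the variables.  */
  CONSTANTS_FIRST  /* Existing path: build the constant lanes first.  */
};

/* Relaxed test: n_var >= n_elts - 1 instead of n_var == n_elts.
   For the DUP_* results, *DUP_LANE is set to the lane to broadcast.  */
static enum init_strategy
choose_strategy (const struct lane *v, int n_elts, int *dup_lane)
{
  int n_var = 0;
  for (int i = 0; i < n_elts; i++)
    n_var += !v[i].is_const;

  if (n_elts > 16 || n_var < n_elts - 1)
    return CONSTANTS_FIRST;

  /* maxv: number of occurrences of the most repeated variable value.  */
  int maxv = 0, max_lane = -1, const_lane = -1;
  for (int i = 0; i < n_elts; i++)
    {
      if (v[i].is_const)
        {
          const_lane = i;
          continue;
        }
      int count = 0;
      for (int j = 0; j < n_elts; j++)
        count += !v[j].is_const && v[j].id == v[i].id;
      if (count > maxv)
        {
          maxv = count;
          max_lane = i;
        }
    }

  if (max_lane < 0)
    return CONSTANTS_FIRST;  /* No variable lanes at all.  */

  /* No variable value repeats, but there is one constant: broadcast it.  */
  if (maxv == 1 && const_lane >= 0)
    {
      *dup_lane = const_lane;
      return DUP_CONSTANT;
    }

  *dup_lane = max_lane;
  return DUP_VARIABLE;
}
```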
> > Hi Richard,
> > Thanks for the suggestions. The attached patch only handles the single
> > constant case.
> > Bootstrap+test in progress on aarch64-linux-gnu.
> > Does it look OK ?
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Richard
> >
> > [aarch64] Improve code-gen for vector initialization with single constant element.
> >
> > gcc/ChangeLog:
> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak condition
> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >   and if maxv == 1, use constant element for duplicating into register.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 2b0de7ca038..f46750133a6 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >   and matches[X][1] with the count of duplicate elements (if X is the
> >   earliest element which has duplicates).  */
> >
> > -  if (n_var == n_elts && n_elts <= 16)
> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
> >  {
> >int matches[16][2] = {0};
> >for (int i = 0; i < n_elts; i++)
> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> >vector register.  For big-endian we want that position to hold
> >the last element of VALS.  */
> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> > +
> > +   /* If we have a single constant element, use that for duplicating
> > +  instead.  */
> > +   if (n_var == n_elts - 1)
> > + for (int i = 0; i < n_elts; i++)
> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> > + {
> > +   maxelement = i;
> > +   break;
> > + }
> > +
> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> > aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
>
> We don't want to force the constant into a register though.
OK right, sorry.
With the attached patch, for the following test-case:
int64x2_t f_s64(int64_t x)
{
  return (int64x2_t) { x, 1 };
}

it loads constant from memory (same code-gen as 

Re: [PATCH 02/10] arm: Fix vstrwq* backend + testsuite

2023-05-02 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> Hi Andrea,
>
> Minor comments below:
>
> On 4/28/23 13:29, Andrea Corallo via Gcc-patches wrote:
>> Hi all,
>> this patch fixes the vstrwq* MVE intrinsics failing to emit the
>> correct sequence of instructions due to a missing predicates. Also the
> nit: you have a typo, should be "predicate"

Ack thanks.

>> immediate range is fixed to be multiples of 2 in the range [-252, 252].
>
> Out of curiosity, which tests were affected by this error in the
> immediate range?

None, I'd say; so far we have no extensive tests in the testsuite checking
immediate ranges.
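A test of the immediate range mostly needs the boundary values. The sketch below assumes the constraint stated in the patch description (multiples of 2 in [-252, 252]). `imm_in_range`, the case table and `count_mismatches` are hypothetical helpers. The authoritative constraint is whatever the vstrwq* predicates in the patch enforce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if IMM is a multiple of STEP within [-LIMIT, LIMIT].
   C's % keeps the sign of IMM, so negative odd values give a nonzero
   remainder and are rejected as well.  */
static bool
imm_in_range (int imm, int step, int limit)
{
  return imm >= -limit && imm <= limit && imm % step == 0;
}

/* Boundary cases for the range quoted in the patch description
   (STEP = 2, LIMIT = 252).  */
static const struct
{
  int imm;
  bool ok;
} vstrw_imm_cases[] = {
  { -252, true },  { 252, true },  { 0, true },  { 2, true },
  { -254, false }, { 254, false }, { 1, false }, { -251, false },
};

/* Counts table entries whose expected outcome disagrees with imm_in_range.  */
static size_t
count_mismatches (int step, int limit)
{
  size_t bad = 0;
  for (size_t i = 0; i < sizeof vstrw_imm_cases / sizeof vstrw_imm_cases[0]; i++)
    bad += imm_in_range (vstrw_imm_cases[i].imm, step, limit) != vstrw_imm_cases[i].ok;
  return bad;
}
```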

BR

  Andrea

