date:20220916

Re: [PATCH] vect: Fix SLP layout handling of masked loads [PR106794]

2022-09-16 Thread Richard Biener via Gcc-patches


On Fri, 16 Sep 2022, Richard Sandiford wrote:


PR106794 shows that I'd forgotten about masked loads when
doing the SLP layout changes.  These loads can't currently
be permuted independently of their mask input, so during
construction they never get a load permutation.

(If we did support permuting masked loads in future, the mask
would need to be in the right order for the load, rather than in
the order implied by the result of the permutation.  Since masked
loads can't be partly or fully scalarised in the way that normal
permuted loads can be, there's probably no benefit to fusing the
permutation and the load.  Permutation after the fact is probably
good enough.)
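
To illustrate the ordering point, here is a minimal scalar sketch in C++ of a 4-lane masked load followed by a separate permutation.  The helper names (masked_load, permute, load_then_permute) and the zero fill for inactive lanes are assumptions made for the example; none of this is vectoriser code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lanes = std::array<int, 4>;
using Mask = std::array<bool, 4>;
using Perm = std::array<std::size_t, 4>;

// Scalar emulation of a masked load: mask[i] guards src[i]; inactive lanes
// are zero-filled here (an assumption made for the example).
inline Lanes
masked_load (const int *src, const Mask &mask)
{
  Lanes out{};
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = mask[i] ? src[i] : 0;
  return out;
}

// Lane i of the result takes lane perm[i] of the input.
inline Lanes
permute (const Lanes &in, const Perm &perm)
{
  Lanes out{};
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = in[perm[i]];
  return out;
}

// Permutation after the fact: the mask stays in the order of the load.
inline Lanes
load_then_permute (const int *src, const Mask &mask, const Perm &perm)
{
  return permute (masked_load (src, mask), perm);
}

inline void
self_check ()
{
  const int src[4] = {10, 20, 30, 40};
  const Mask mask = {true, false, true, true};
  const Perm perm = {3, 2, 1, 0};
  assert ((load_then_permute (src, mask, perm) == Lanes{40, 30, 0, 10}));
}
```

The mask is applied in memory order and the permutation only reorders the lanes that were loaded, which is why the mask would have to follow the load order rather than the order of the permuted result.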

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?


OK.

Thanks,
Richard.


Richard


gcc/
PR tree-optimization/106794
PR tree-optimization/106914
* tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
Only consider loads that already have a permutation.
(vect_optimize_slp_pass::start_choosing_layouts): Assert that
loads with permutations are leaf nodes.  Prevent any kind of grouped
access from changing layout if it doesn't have a load permutation.

gcc/testsuite/
* gcc.dg/vect/pr106914.c: New test.
* g++.dg/vect/pr106794.cc: Likewise.
---
gcc/testsuite/g++.dg/vect/pr106794.cc | 40 +++
gcc/testsuite/gcc.dg/vect/pr106914.c  | 15 ++
gcc/tree-vect-slp.cc  | 30 ++--
3 files changed, 77 insertions(+), 8 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/vect/pr106794.cc
create mode 100644 gcc/testsuite/gcc.dg/vect/pr106914.c

diff --git a/gcc/testsuite/g++.dg/vect/pr106794.cc b/gcc/testsuite/g++.dg/vect/pr106794.cc
new file mode 100644
index 000..f056563c4e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr106794.cc
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+/* { dg-additional-options "-march=bdver2" { target x86_64-*-* i?86-*-* } } */
+
+template  struct Vector3 {
+  Vector3();
+  Vector3(T, T, T);
+  T length() const;
+  T x, y, z;
+};
+template 
+Vector3::Vector3(T _x, T _y, T _z) : x(_x), y(_y), z(_z) {}
+Vector3 cross(Vector3 a, Vector3 b) {
+  return Vector3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
+a.x * b.y - a.y * b.x);
+}
+template  T Vector3::length() const { return z; }
+int generateNormals_i;
+float generateNormals_p2_0, generateNormals_p0_0;
+struct SphereMesh {
+  void generateNormals();
+  float vertices;
+};
+void SphereMesh::generateNormals() {
+  Vector3 *faceNormals = new Vector3;
+  for (int j; j; j++) {
+float *p0 =  + 3, *p1 =  + j * 3, *p2 =  + 3,
+  *p3 =  + generateNormals_i + j * 3;
+Vector3 v0(p1[0] - generateNormals_p0_0, p1[1] - 1, p1[2] - 2),
+v1(0, 1, 2);
+if (v0.length())
+  v1 = Vector3(p3[0] - generateNormals_p2_0, p3[1] - p2[1],
+  p3[2] - p2[2]);
+else
+  v1 = Vector3(generateNormals_p0_0 - p3[0], p0[1] - p3[1],
+  p0[2] - p3[2]);
+Vector3 faceNormal = cross(v0, v1);
+faceNormals[j] = faceNormal;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr106914.c b/gcc/testsuite/gcc.dg/vect/pr106914.c
new file mode 100644
index 000..9d9b3e30081
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr106914.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fprofile-generate" } */
+/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
+
+int *mask_slp_int64_t_8_2_x, *mask_slp_int64_t_8_2_y, *mask_slp_int64_t_8_2_z;
+
+void
+__attribute__mask_slp_int64_t_8_2() {
+  for (int i; i; i += 8) {
+mask_slp_int64_t_8_2_x[i + 6] =
+mask_slp_int64_t_8_2_y[i + 6] ? mask_slp_int64_t_8_2_z[i] : 1;
+mask_slp_int64_t_8_2_x[i + 7] =
+mask_slp_int64_t_8_2_y[i + 7] ? mask_slp_int64_t_8_2_z[i + 7] : 2;
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ca3422c2a1e..229f2663ebc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4494,7 +4494,8 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
  stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
  if (rep
  && STMT_VINFO_DATA_REF (rep)
-  && DR_IS_READ (STMT_VINFO_DATA_REF (rep)))
+  && DR_IS_READ (STMT_VINFO_DATA_REF (rep))
+  && SLP_TREE_LOAD_PERMUTATION (node).exists ())
{
  auto_load_permutation_t tmp_perm;
  tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
@@ -4569,8 +4570,12 @@ vect_optimize_slp_pass::start_choosing_layouts ()
  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
{
  /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
-unpermuted, record a layout that reverses this permutation.  */
- gcc_assert (partition.layout == 0);
+unpermuted, record a layout that reverses this permutation.
+
+We would need more

Re: [PATCH] vect: Fix missed gather load opportunity

2022-09-16 Thread Richard Biener via Gcc-patches





On Fri, 16 Sep 2022, Richard Sandiford wrote:


While writing a testcase for PR106794, I noticed that we failed
to vectorise the testcase in the patch for SVE.  The code that
recognises gather loads tries to optimise the point at which
the offset is calculated, to avoid unnecessary extensions or
truncations:

  /* Don't include the conversion if the target is happy with
 the current offset type.  */

But breaking only makes sense if we're at an SSA_NAME (which could
then be vectorised).  We shouldn't break on a conversion embedded
in a generic expression.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?


OK,

Thanks,
Richard.


Richard


gcc/
* tree-vect-data-refs.cc (vect_check_gather_scatter): Restrict
early-out optimisation to SSA_NAMEs.

gcc/testsuite/
* gcc.dg/vect/vect-gather-5.c: New test.
---
gcc/testsuite/gcc.dg/vect/vect-gather-5.c | 42 +++
gcc/tree-vect-data-refs.cc|  1 +
2 files changed, 43 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-5.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-5.c b/gcc/testsuite/gcc.dg/vect/vect-gather-5.c
new file mode 100644
index 000..8b5074bba88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-5.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+#ifdef __aarch64__
+#pragma GCC target "+sve"
+#endif
+
+long a[100], b[100], c[100];
+
+void g1 ()
+{
+  for (int i = 0; i < 100; i += 2)
+{
+  c[i] += a[b[i]] + 1;
+  c[i + 1] += a[b[i + 1]] + 2;
+}
+}
+
+long g2 ()
+{
+  long res = 0;
+  for (int i = 0; i < 100; i += 2)
+{
+  res += a[b[i + 1]];
+  res += a[b[i]];
+}
+  return res;
+}
+
+long g3 ()
+{
+  long res = 0;
+  for (int i = 0; i < 100; i += 2)
+{
+  res += a[b[i]];
+  res += a[b[i + 1]];
+}
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times {add new stmt[^\n]*GATHER_LOAD} 3 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-not {add new stmt[^\n]*VEC_PERM_EXPR} "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551e..e03b50498d1 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4151,6 +4151,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
  /* Don't include the conversion if the target is happy with
 the current offset type.  */
  if (use_ifn_p
+ && TREE_CODE (off) == SSA_NAME
  && !POINTER_TYPE_P (TREE_TYPE (off))
  && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
   masked_p, vectype, memory_type,
--
2.25.1

[PATCH] RISC-V: Suppress riscv-selftests.cc warning.

2022-09-16 Thread juzhe . zhong

From: Ju-Zhe Zhong 

This patch is a follow-up fix for:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601643.html

Suppress the warning as follows:

../../../riscv-gcc/gcc/poly-int.h: In function
‘poly_int64 eval_value(rtx, std::map&)’:
../../../riscv-gcc/gcc/poly-int.h:845:48: warning:
‘*((void*)& op2_val +8)’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
 POLY_SET_COEFF (C, r, i, NCa (a.coeffs[i]) + b.coeffs[i]);
^
../../../riscv-gcc/gcc/config/riscv/riscv-selftests.cc:74:23:
note: ‘*((void*)& op2_val +8)’ was declared here
   poly_int64 op1_val, op2_val;
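
The pattern behind the warning can be sketched in a few lines of C++.  This is only an illustration: eval_sketch is a hypothetical stand-in that uses plain long instead of rtx and poly_int64, and it assumes that the second operand is assigned only on the non-unary path.

```cpp
#include <cassert>

// Hypothetical stand-in for the shape of eval_value: op2_val is only
// assigned on the binary path, so both values get an initial value at their
// declaration and no path reads an unset variable.
inline long
eval_sketch (bool is_unary, long a, long b)
{
  long op1_val = 0;
  long op2_val = 0;
  if (is_unary)
    op1_val = a;
  else
    {
      op1_val = a;
      op2_val = b;
    }
  return op1_val + op2_val;
}

inline void
self_check ()
{
  assert (eval_sketch (true, 5, 7) == 5);
  assert (eval_sketch (false, 5, 7) == 12);
}
```

Without the initialisers, the compiler cannot always prove that op2_val is set before the final use, which is the situation -Wmaybe-uninitialized reports.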

gcc/ChangeLog:

* config/riscv/riscv-selftests.cc (eval_value): Add initial value.

---
 gcc/config/riscv/riscv-selftests.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-selftests.cc b/gcc/config/riscv/riscv-selftests.cc
index 167cd47c880..490b6ed6b8e 100644
--- a/gcc/config/riscv/riscv-selftests.cc
+++ b/gcc/config/riscv/riscv-selftests.cc
@@ -71,7 +71,8 @@ eval_value (rtx x, std::map _to_rtx)
   unsigned regno = REGNO (x);
   expr = regno_to_rtx[regno];
 
-  poly_int64 op1_val, op2_val;
+  poly_int64 op1_val = 0;
+  poly_int64 op2_val = 0;
   if (UNARY_P (expr))
 {
   op1_val = eval_value (XEXP (expr, 0), regno_to_rtx);
-- 
2.36.1

Re: [PATCH] C-SKY: Fix unsigned comparison warning

2022-09-16 Thread Jeff Law via Gcc-patches




On 9/12/22 06:19, Jan-Benedict Glaw wrote:

Hi!

When -mfloat-abi=hard support was added, a cast went missing that
used to silence a warning in common code:

/usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c   -g -O2   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common 
 -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. 
-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include 
-I../../gcc/gcc/../libcody  -I../../gcc/gcc/../libdecnumber 
-I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../gcc/gcc/../libbacktrace   -o builtins.o -MT builtins.o -MMD -MP -MF 
./.deps/builtins.TPo ../../gcc/gcc/builtins.cc
In file included from ./tm.h:21,
  from ../../gcc/gcc/backend.h:28,
  from ../../gcc/gcc/builtins.cc:27:
../../gcc/gcc/builtins.cc: In function 'int apply_args_size()':
../../gcc/gcc/config/csky/csky.h:421:13: error: comparison of unsigned expression in '>= 0' is always true [-Werror=type-limits]
   421 |   (((REGNO) >= CSKY_FIRST_PARM_REGNUM\
../../gcc/gcc/builtins.cc:1444:13: note: in expansion of macro 'FUNCTION_ARG_REGNO_P'
  1444 | if (FUNCTION_ARG_REGNO_P (regno))
   | ^~~~
cc1plus: all warnings being treated as errors
make[1]: *** [Makefile:1146: builtins.o] Error 1

The needed (int) cast is even mentioned in the comment above, so reinstate
it here.
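
A minimal sketch of why the cast matters, written as standalone C++.  The constants and the macro name ARG_REGNO_P_SKETCH are hypothetical; the real definitions live in csky.h and are not reproduced here.

```cpp
#include <cassert>

// Hypothetical values; the real CSKY_FIRST_PARM_REGNUM and register count
// are defined in csky.h.
constexpr int FIRST_PARM_REGNUM = 0;
constexpr int NPARM_REGS = 4;

// Casting to int before comparing keeps the lower-bound check a signed
// comparison, so it is not trivially true when the caller passes an
// unsigned register number and the first parameter register is 0.
#define ARG_REGNO_P_SKETCH(REGNO)                      \
  ((int) (REGNO) >= FIRST_PARM_REGNUM                  \
   && (int) (REGNO) < FIRST_PARM_REGNUM + NPARM_REGS)

inline bool
is_arg_regno (unsigned regno)
{
  return ARG_REGNO_P_SKETCH (regno);
}

inline void
self_check ()
{
  assert (is_arg_regno (0));
  assert (is_arg_regno (3));
  assert (!is_arg_regno (4));
}
```

Common code such as apply_args_size passes an unsigned regno, so without the cast the lower-bound test becomes an unsigned ">= 0" comparison and trips -Wtype-limits.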



2022-09-06  Jan-Benedict Glaw  

gcc/ChangeLog:
* config/csky/csky.h (FUNCTION_ARG_REGNO_P): Cast REGNO to (int)
to prevent warning.


OK

jeff

Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-09-16 Thread Jeff Law via Gcc-patches




On 9/6/22 05:39, Alexander Monakov via Gcc-patches wrote:

On Mon, 5 Sep 2022, Philipp Tomsich wrote:


+riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
+{
+  /* On 64-bit targets, SImode register values are sign-extended to DImode.  */
+  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
+return SIGN_EXTEND;

I think this leads to a counter-intuitive requirement that a hand-written
inline asm must sign-extend its output operands that are bound to either
signed or unsigned 32-bit lvalues. Will compiler users be aware of that?


Is this significantly different than on MIPS?  Hand-written code there 
also has to ensure that the results are properly sign extended and it's 
been that way for 20+ years since the introduction of mips64 IIRC.  
Though I don't think we had MODE_REP_EXTENDED that long.


Haha, MIPS is the only target that currently defines 
TARGET_MODE_REP_EXTENDED :-)






Moreover, without adjusting TARGET_TRULY_NOOP_TRUNCATION this should cause
miscompilation when a 64-bit variable is truncated to 32 bits: the pre-existing
hook says that nothing needs to be done to truncate, but the new hook says
that the result of the truncation is properly sign-extended.

The documentation for TARGET_MODE_REP_EXTENDED warns about that:

 In order to enforce the representation of mode, TARGET_TRULY_NOOP_TRUNCATION
 should return false when truncating to mode.
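
The interaction between the two hooks can be modelled with ordinary integers.  The sketch below is an assumption-laden illustration in C++, not backend code: a std::int64_t stands in for a 64-bit register, and canonical_si and is_canonical_si are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 64-bit register that must hold 32-bit values in sign-extended
// form.  Keeps the low 32 bits and sign-extends them to 64 bits.  (The
// narrowing conversion to int32_t is modular on GCC and, since C++20, in
// the standard.)
inline std::int64_t
canonical_si (std::int64_t reg)
{
  return static_cast<std::int32_t> (static_cast<std::uint32_t> (reg));
}

// True if the register already satisfies the sign-extension invariant.
inline bool
is_canonical_si (std::int64_t reg)
{
  return reg == canonical_si (reg);
}

inline void
self_check ()
{
  // A 64-bit value whose low half has bit 31 set and bit 32 set as well.
  const std::int64_t wide = 0x180000000LL;
  // Treating the truncation as a no-op leaves a non-canonical register.
  assert (!is_canonical_si (wide));
  // An explicit truncation step restores the invariant.
  assert (canonical_si (wide) == INT32_MIN);
}
```

In this model a truncation that does nothing leaves the upper bits of the 64-bit value in place, which contradicts the promise that 32-bit values are always sign-extended.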


This may well need adjusting in Philipp's patch.   I'd be surprised if 
the MIPS definition wasn't usable nearly verbatim here.



jeff

[PATCH][PUSHED] Fix for an AutoFDO test.

2022-09-16 Thread Eugene Rozenfeld via Gcc-patches

After
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c17975d81aaed49ff759c20c68b31304a6953d58
the expected inlining in the indir-call-prof-2.c test happens during the afdo
phase instead of einline.
This patch adjusts the test accordingly.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/indir-call-prof-2.c: Fix dg-final-use-autofdo.
---
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
index 594c3f34d57..1d64d9f3f62 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized -fdump-tree-einline-optimized" } */
+/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-profile-optimized -fdump-ipa-afdo-optimized" } */
 volatile int one;
 static int
 add1 (int val)
@@ -31,5 +31,5 @@ main (void)
 }
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* add1 .will resolve by ipa-profile" "profile"} } */
 /* { dg-final-use-not-autofdo { scan-ipa-dump "Indirect call -> direct call.* sub1 .will resolve by ipa-profile" "profile"} } */
-/* { dg-final-use-autofdo { scan-tree-dump "Inlining add1/1 into main/4." "einline"} } */
-/* { dg-final-use-autofdo { scan-tree-dump "Inlining sub1/2 into main/4." "einline"} } */
+/* { dg-final-use-autofdo { scan-ipa-dump "Inlining add1/1 into main/4." "afdo"} } */
+/* { dg-final-use-autofdo { scan-ipa-dump "Inlining sub1/2 into main/4." "afdo"} } */
--
2.25.1

Re: [PATCH] c++: constraint matching, TEMPLATE_ID_EXPR, current inst

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/16/22 10:59, Patrick Palka wrote:

On Fri, 16 Sep 2022, Jason Merrill wrote:


On 9/15/22 11:58, Patrick Palka wrote:

Here we're crashing during constraint matching for the instantiated
hidden friends due to two issues with dependent substitution into a
TEMPLATE_ID_EXPR naming a template from the current instantiation
(as performed from maybe_substitute_reqs_for for C<3> with T=T):

* tsubst_copy substitutes into such a TEMPLATE_DECL by looking it
  up from the substituted class scope.  But for this to not fail when
  the args are dependent, we need to pass entering_scope=true for the
  class scope substitution so that we obtain the primary template type
  A (which has TYPE_BINFO) instead of the implicit instantiation
  A (which doesn't).
* lookup_and_finish_template_variable shouldn't instantiate a
  TEMPLATE_ID_EXPR that names a TEMPLATE_DECL which has more than
  one level of (unsubstituted) parameters (such as A::C).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't
instantiate if the template's scope is dependent.
(tsubst_copy) : Pass entering_scope=true
when substituting the class scope.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend10.C: New test.
---
   gcc/cp/pt.cc  | 14 +++--
   .../g++.dg/cpp2a/concepts-friend10.C  | 21 +++
   2 files changed, 29 insertions(+), 6 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index db4e808adec..bfcbe0b8670 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10475,14 +10475,15 @@ tree
   lookup_and_finish_template_variable (tree templ, tree targs,
 tsubst_flags_t complain)
   {
-  templ = lookup_template_variable (templ, targs);
-  if (!any_dependent_template_arguments_p (targs))
+  tree var = lookup_template_variable (templ, targs);
+  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
+  && !any_dependent_template_arguments_p (targs))


I notice that finish_id_expression_1 uses the equivalent of
type_dependent_expression_p (var).  Does that work here?


Hmm, it does, but kind of by accident: type_dependent_expression_p
returns true for all variable TEMPLATE_ID_EXPRs because of their empty
TREE_TYPE (as set by finish_template_variable).  So testing t_d_e_p here
is equivalent to testing processing_template_decl, it seems -- maximally
conservative.

We can improve type_dependent_expression_p for variable TEMPLATE_ID_EXPR
by ignoring its (always empty) TREE_TYPE and just considering dependence
of its template and args directly.

Doing so exposes that value_dependent_expression_p is wrong for
(non-type-dependent) variable template specializations -- since we don't
set/track DECL_DEPENDENT_INIT_P for them,


Hmm, why not?


the VAR_DECL branch ends up
returning false even if the initializer depends on outer args.  Instead,
I suppose we can give a reasonably conservative answer by considering
dependence of its enclosing scope as we do for FUNCTION_DECL.


I wonder why we do that for functions rather than rely on the later 
any_dependent_template_arguments_p?  Perhaps because checking whether 
the scope is dependent is cached, so should be faster.  I wonder if it 
would be worthwhile to have similar dependent/dependent_valid flags on 
template arg vecs



Does the following seem reasonable?  Bootstrapped and regtested on
x86_64-pc-linux-gnu.

-- >8 --

gcc/cp/ChangeLog:

* pt.cc (finish_template_variable): Consider only the innermost
template parms since we have only the innermost args.
(lookup_and_finish_template_variable): Check
type_dependent_expression_p instead.
(tsubst_copy) : Pass entering_scope=true
when substituting the class scope.
(value_dependent_expression_p) : Move below ...
: ... here.  Fall through for variable template
specializations.
(type_dependent_expression_p): Handle variable TEMPLATE_ID_EXPR
precisely.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/noexcept1.C: Expect another ahead of time error.
* g++.dg/cpp1y/var-templ70.C: New test.
* g++.dg/cpp2a/concepts-friend10.C: New test.
---
  gcc/cp/pt.cc  | 53 ---
  gcc/testsuite/g++.dg/cpp1y/noexcept1.C|  2 +-
  gcc/testsuite/g++.dg/cpp1y/var-templ70.C  | 22 
  .../g++.dg/cpp2a/concepts-friend10.C  | 24 +
  4 files changed, 82 insertions(+), 19 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index db4e808adec..88a09891a00 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10447,10 +10447,10 @@

Re: [PATCH] c++: Implement C++23 P1169R4 - static operator() [PR106651]

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/13/22 12:42, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement the compiler side of the C++23
paper P1169R4 - static operator() (there is also a small library side,
not implemented yet).  This allows static member functions as user
operator() declarations and the static specifier on lambdas without
lambda capture.  As decl specifier parsing doesn't track the presence and
locations of all specifiers, the patch unfortunately replaces the
diagnostics about duplicate mutable with diagnostics about conflicting
specifiers, because the information whether it was mutable mutable,
mutable static, static mutable or static static is lost.
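
For readers who have not seen the feature, here is a small C++ sketch of what P1169R4 permits.  Doubler and call_through_pointer are hypothetical names chosen for the example, and the code falls back to a non-static call operator when __cpp_static_call_operator is not defined, so it should build in either mode.

```cpp
#include <cassert>

struct Doubler
{
#if defined(__cpp_static_call_operator) && __cpp_static_call_operator >= 202207L
  // C++23 (P1169R4): the call operator needs no object.
  static int operator() (int x) { return 2 * x; }
#else
  // Fallback for compilers or modes without the feature.
  int operator() (int x) const { return 2 * x; }
#endif
};

inline int
call_through_pointer (int x)
{
  // A captureless lambda converts to a plain function pointer.  The static
  // specifier is only allowed when there is no lambda capture.
#if defined(__cpp_static_call_operator) && __cpp_static_call_operator >= 202207L
  int (*fn) (int) = [] (int v) static { return v + 1; };
#else
  int (*fn) (int) = [] (int v) { return v + 1; };
#endif
  return fn (x);
}

inline void
self_check ()
{
  assert (Doubler{} (21) == 42);
  assert (call_through_pointer (1) == 2);
}
```

Calling through an object expression such as Doubler{} (21) is valid for both the static and the non-static form, so the sketch behaves the same way in either branch.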


I wonder why we don't give an error when setting the 
conflicting_specifiers_p flag in cp_parser_set_storage_class?  We should 
be able to give a better diagnostic at that point.



Beyond this, the synthesized conversion operator changes for static
lambdas: it can just return the address of the static operator() and
doesn't need to create a thunk for it.
The change I'm least sure about is the call.cc (joust) change.  One thing
is that we ICEd because we assumed that len could be different only if
both candidates are direct calls, but it can be one direct and one indirect
call,


How do you mean?


and then I'm trying to implement my understanding of the
[over.match.best.general]/1 and [over.best.ics.general] changes from
the paper (implemented both for C++23 and when the static member function
is operator() which we accept with pedwarn in earlier standards too).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-09-13  Jakub Jelinek  

PR c++/106651
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_static_call_operator=202207L for C++23.
gcc/cp/
* cp-tree.h (LAMBDA_EXPR_STATIC_P): Implement C++23
P1169R4 - static operator().  Define.
* parser.cc (CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR): Document
that it also allows static.
(cp_parser_lambda_declarator_opt): Handle static lambda specifier.
(cp_parser_decl_specifier_seq): Allow RID_STATIC for
CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
* decl.cc (grok_op_properties): If operator() isn't a method,
use a different error wording, if it is static member function,
allow it (for C++20 and older with a pedwarn unless it is
a lambda function or template instantiation).
* call.cc (joust): Don't ICE if one candidate is static member
function and the other is an indirect call.  For C++23 or if
the static member is operator() and the parameter conversion on
the other candidate is user defined conversion, ellipsis or bad
conversion, make static member function candidate a winner for
that parameter.
* lambda.cc (maybe_add_lambda_conv_op): Handle static lambdas.
* error.cc (dump_lambda_function): Print static for static lambdas.
gcc/testsuite/
* g++.dg/template/error30.C: Adjust expected diagnostics.
* g++.dg/cpp1z/constexpr-lambda13.C: Likewise.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_static_call_operator.
* g++.dg/cpp23/static-operator-call1.C: New test.
* g++.dg/cpp23/static-operator-call2.C: New test.
* g++.old-deja/g++.jason/operator.C: Adjust expected diagnostics.

--- gcc/cp/cp-tree.h.jj 2022-09-13 09:21:28.052541628 +0200
+++ gcc/cp/cp-tree.h2022-09-13 12:14:31.674733861 +0200
@@ -504,6 +504,7 @@ extern GTY(()) tree cp_global_trees[CPTI
OVL_NESTED_P (in OVERLOAD)
DECL_MODULE_EXPORT_P (in _DECL)
PACK_EXPANSION_FORCE_EXTRA_ARGS_P (in *_PACK_EXPANSION)
+  LAMBDA_EXPR_STATIC_P (in LAMBDA_EXPR)
 4: IDENTIFIER_MARKED (IDENTIFIER_NODEs)
TREE_HAS_CONSTRUCTOR (in INDIRECT_REF, SAVE_EXPR, CONSTRUCTOR,
  CALL_EXPR, or FIELD_DECL).
@@ -1488,6 +1489,10 @@ enum cp_lambda_default_capture_mode_type
  #define LAMBDA_EXPR_CAPTURE_OPTIMIZED(NODE) \
TREE_LANG_FLAG_2 (LAMBDA_EXPR_CHECK (NODE))
  
+/* Predicate tracking whether the lambda was declared 'static'.  */

+#define LAMBDA_EXPR_STATIC_P(NODE) \
+  TREE_LANG_FLAG_3 (LAMBDA_EXPR_CHECK (NODE))
+
  /* True if this TREE_LIST in LAMBDA_EXPR_CAPTURE_LIST is for an explicit
 capture.  */
  #define LAMBDA_CAPTURE_EXPLICIT_P(NODE) \
--- gcc/cp/parser.cc.jj 2022-09-13 09:21:01.276920558 +0200
+++ gcc/cp/parser.cc2022-09-13 12:14:31.683733738 +0200
@@ -1994,7 +1994,7 @@ enum
   constexpr.  */
CP_PARSER_FLAGS_ONLY_TYPE_OR_CONSTEXPR = 0x8,
/* When parsing a decl-specifier-seq, only allow mutable, constexpr or
- for C++20 consteval.  */
+ for C++20 consteval or for C++23 static.  */
CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR = 0x10,
/* When parsing a decl-specifier-seq, allow missing typename.  */
CP_PARSER_FLAGS_TYPENAME_OPTIONAL = 0x20,
@@ -11714,13 +11714,26 @@ cp_parser_lambda_declarator_opt (cp_pars
omitted_parms_loc = UNKNOWN_LOCATION;

[committed] libstdc++: Add preprocessor conditions for freestanding [PR106953]

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

This doesn't actually change anything for the freestanding build, so is
effectively a no-op. _GLIBCXX_HOSTED is always defined to 1 when these
headers are included. However, somebody else is working on installing the
additional headers for freestanding, so this is a prerequisite for that.

-- >8 --

This adds checks for _GLIBCXX_HOSTED to a number of headers which are
not currently installed for freestanding, but need to be for P1642R11
support. For example,  needs to be installed for C++23
freestanding mode, but without stream iterators and streambuf iterators.
Similarly,  needs to be installed, but without std::allocator
and std::shared_ptr. This change disables the non-freestanding parts of
those headers.

libstdc++-v3/ChangeLog:

PR libstdc++/106953
* include/backward/auto_ptr.h [!_GLIBCXX_HOSTED]: Do not define
shared_ptr members.
* include/bits/alloc_traits.h [!_GLIBCXX_HOSTED]: Do not declare
std::allocator_traits> specializations for
freestanding.
* include/bits/memoryfwd.h [!_GLIBCXX_HOSTED] (allocator): Do
not declare for freestanding.
* include/bits/stl_algo.h [!_GLIBCXX_HOSTED] (stable_partition):
Do not define for freestanding.
[!_GLIBCXX_HOSTED] (merge, stable_sort): Do not use temporary
buffers for freestanding.
* include/bits/stl_algobase.h [!_GLIBCXX_HOSTED]: Do not declare
streambuf iterators and overloaded algorithms using them.
* include/bits/stl_uninitialized.h [!_GLIBCXX_HOSTED]: Do not
define specialized overloads for std::allocator.
* include/bits/unique_ptr.h [!_GLIBCXX_HOSTED] (make_unique)
(make_unique_for_overwrite, operator<<): Do not define for
freestanding.
* include/c_global/cstdlib [!_GLIBCXX_HOSTED] (_Exit): Declare.
Use _GLIBCXX_NOTHROW instead of throw().
* include/debug/assertions.h [!_GLIBCXX_HOSTED]: Ignore
_GLIBCXX_DEBUG for freestanding.
* include/debug/debug.h [!_GLIBCXX_DEBUG]: Likewise.
* include/std/bit [!_GLIBCXX_HOSTED]: Do not use the custom
__int_traits if  is available.
* include/std/functional [!_GLIBCXX_HOSTED]: Do not include
headers that aren't valid for freestanding.
(boyer_moore_searcher, boyer_moore_horspool_searcher): Do not
define for freestanding.
* include/std/iterator [!_GLIBCXX_HOSTED]: Do not include
headers that aren't valid for freestanding.
* include/std/memory [!_GLIBCXX_HOSTED]: Likewise.
* include/std/ranges [!_GLIBCXX_HOSTED] (istream_view): Do not
define for freestanding.
(views::__detail::__is_basic_string_view) [!_GLIBCXX_HOSTED]:
Do not define partial specialization for freestanding.
---
 libstdc++-v3/include/backward/auto_ptr.h  |  4 ++-
 libstdc++-v3/include/bits/alloc_traits.h  | 13 +-
 libstdc++-v3/include/bits/memoryfwd.h |  2 ++
 libstdc++-v3/include/bits/stl_algo.h  | 25 ++-
 libstdc++-v3/include/bits/stl_algobase.h  |  4 +++
 libstdc++-v3/include/bits/stl_uninitialized.h | 17 ++---
 libstdc++-v3/include/bits/unique_ptr.h| 14 ++-
 libstdc++-v3/include/c_global/cstdlib | 24 ++
 libstdc++-v3/include/debug/assertions.h   | 16 ++--
 libstdc++-v3/include/debug/debug.h|  2 +-
 libstdc++-v3/include/std/bit  |  2 +-
 libstdc++-v3/include/std/functional   | 22 ++--
 libstdc++-v3/include/std/iterator |  6 +++--
 libstdc++-v3/include/std/memory   | 11 +---
 libstdc++-v3/include/std/ranges   |  4 +++
 15 files changed, 114 insertions(+), 52 deletions(-)

diff --git a/libstdc++-v3/include/backward/auto_ptr.h b/libstdc++-v3/include/backward/auto_ptr.h
index 184ab403466..093db5260fc 100644
--- a/libstdc++-v3/include/backward/auto_ptr.h
+++ b/libstdc++-v3/include/backward/auto_ptr.h
@@ -300,6 +300,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 } _GLIBCXX11_DEPRECATED;
 
 #if __cplusplus >= 201103L
+#if _GLIBCXX_HOSTED
   template<_Lock_policy _Lp>
   template
 inline
@@ -325,13 +326,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline
 shared_ptr<_Tp>::shared_ptr(std::auto_ptr<_Tp1>&& __r)
 : __shared_ptr<_Tp>(std::move(__r)) { }
+#endif // HOSTED
 
   template
   template
 inline
 unique_ptr<_Tp, _Dp>::unique_ptr(auto_ptr<_Up>&& __u) noexcept
 : _M_t(__u.release(), deleter_type()) { }
-#endif
+#endif // C++11
 
 #pragma GCC diagnostic pop
 
diff --git a/libstdc++-v3/include/bits/alloc_traits.h b/libstdc++-v3/include/bits/alloc_traits.h
index 35bdf6ecf98..507e8f1b6b2 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -33,9 +33,11 @@
 #include 
 #include 
 #if __cplusplus >= 201103L
-# include 
 # include 
 # include 
+# if _GLIBCXX_HOSTED
+#

[committed] libstdc++: Move allocator-related helpers to

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

The __alloc_swap and __shrink_to_fit_aux helpers are not specific to
std::allocator, so don't belong in . This also
simplifies enabling  for freestanding, as now we can just omit
the whole of  for freestanding.

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (__alloc_swap)
(__shrink_to_fit_aux): Move here, from ...
* include/bits/allocator.h: ... here.
* include/ext/alloc_traits.h: Do not include allocator.h.
---
 libstdc++-v3/include/bits/alloc_traits.h | 48 ++
 libstdc++-v3/include/bits/allocator.h| 51 
 libstdc++-v3/include/ext/alloc_traits.h  |  3 --
 3 files changed, 48 insertions(+), 54 deletions(-)

diff --git a/libstdc++-v3/include/bits/alloc_traits.h b/libstdc++-v3/include/bits/alloc_traits.h
index f9ca37fd7d6..35bdf6ecf98 100644
--- a/libstdc++-v3/include/bits/alloc_traits.h
+++ b/libstdc++-v3/include/bits/alloc_traits.h
@@ -824,6 +824,54 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @cond undocumented
 
+  // To implement Option 3 of DR 431.
+  template
+struct __alloc_swap
+{ static void _S_do_it(_Alloc&, _Alloc&) _GLIBCXX_NOEXCEPT { } };
+
+  template
+struct __alloc_swap<_Alloc, false>
+{
+  static void
+  _S_do_it(_Alloc& __one, _Alloc& __two) _GLIBCXX_NOEXCEPT
+  {
+   // Precondition: swappable allocators.
+   if (__one != __two)
+ swap(__one, __two);
+  }
+};
+
+#if __cplusplus >= 201103L
+  template,
+is_nothrow_move_constructible>::value>
+struct __shrink_to_fit_aux
+{ static bool _S_do_it(_Tp&) noexcept { return false; } };
+
+  template
+struct __shrink_to_fit_aux<_Tp, true>
+{
+  _GLIBCXX20_CONSTEXPR
+  static bool
+  _S_do_it(_Tp& __c) noexcept
+  {
+#if __cpp_exceptions
+   try
+ {
+   _Tp(__make_move_if_noexcept_iterator(__c.begin()),
+   __make_move_if_noexcept_iterator(__c.end()),
+   __c.get_allocator()).swap(__c);
+   return true;
+ }
+   catch(...)
+ { return false; }
+#else
+   return false;
+#endif
+  }
+};
+#endif
+
   /**
* Destroy a range of objects using the supplied allocator.  For
* non-default allocators we do not optimize away invocation of
diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h
index c39166e24fe..54f5acf85d7 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -279,57 +279,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Undefine.
 #undef __allocator_base
 
-  /// @cond undocumented
-
-  // To implement Option 3 of DR 431.
-  template
-struct __alloc_swap
-{ static void _S_do_it(_Alloc&, _Alloc&) _GLIBCXX_NOEXCEPT { } };
-
-  template
-struct __alloc_swap<_Alloc, false>
-{
-  static void
-  _S_do_it(_Alloc& __one, _Alloc& __two) _GLIBCXX_NOEXCEPT
-  {
-   // Precondition: swappable allocators.
-   if (__one != __two)
- swap(__one, __two);
-  }
-};
-
-#if __cplusplus >= 201103L
-  template,
-is_nothrow_move_constructible>::value>
-struct __shrink_to_fit_aux
-{ static bool _S_do_it(_Tp&) noexcept { return false; } };
-
-  template
-struct __shrink_to_fit_aux<_Tp, true>
-{
-  _GLIBCXX20_CONSTEXPR
-  static bool
-  _S_do_it(_Tp& __c) noexcept
-  {
-#if __cpp_exceptions
-   try
- {
-   _Tp(__make_move_if_noexcept_iterator(__c.begin()),
-   __make_move_if_noexcept_iterator(__c.end()),
-   __c.get_allocator()).swap(__c);
-   return true;
- }
-   catch(...)
- { return false; }
-#else
-   return false;
-#endif
-  }
-};
-#endif
-  /// @endcond
-
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
diff --git a/libstdc++-v3/include/ext/alloc_traits.h b/libstdc++-v3/include/ext/alloc_traits.h
index 1d7d9598cb2..c9547c7305c 100644
--- a/libstdc++-v3/include/ext/alloc_traits.h
+++ b/libstdc++-v3/include/ext/alloc_traits.h
@@ -32,9 +32,6 @@
 #pragma GCC system_header
 
 # include 
-#if __cplusplus < 201103L
-# include   // for __alloc_swap
-#endif
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
-- 
2.37.3

[committed] libstdc++: Make more internal headers include their own dependencies

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

This adds required headers to a few internal headers that currently
assume their deps will be included first. It's more robust to make them
include their own dependencies, so that later refactoring or reuse of
those headers in new contexts doesn't break.

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h: Include .
* include/bits/stl_tempbuf.h: Include headers for __try and
__catch macros, std::pair, and __gnu_cxx::__numeric_traits.
* include/bits/stream_iterator.h: Include  and headers
for std::addressof and std::iterator.
* include/bits/streambuf_iterator.h: Include header for
std::iterator.
* include/std/iterator: Do not include .
---
 libstdc++-v3/include/bits/stl_algo.h   | 1 +
 libstdc++-v3/include/bits/stl_tempbuf.h| 4 +++-
 libstdc++-v3/include/bits/stream_iterator.h| 3 +++
 libstdc++-v3/include/bits/streambuf_iterator.h | 1 +
 libstdc++-v3/include/std/iterator  | 1 -
 5 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 57fa1c1dc55..9cb708ab2fd 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -57,6 +57,7 @@
 #define _STL_ALGO_H 1
 
 #include 
+#include 
 #include 
 #include   // for _Temporary_buffer
 #include 
diff --git a/libstdc++-v3/include/bits/stl_tempbuf.h 
b/libstdc++-v3/include/bits/stl_tempbuf.h
index 82f2dc8055f..b13aa3b0fcc 100644
--- a/libstdc++-v3/include/bits/stl_tempbuf.h
+++ b/libstdc++-v3/include/bits/stl_tempbuf.h
@@ -57,8 +57,10 @@
 #define _STL_TEMPBUF_H 1
 
 #include 
-#include 
+#include 
 #include 
+#include 
+#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/include/bits/stream_iterator.h 
b/libstdc++-v3/include/bits/stream_iterator.h
index 86c5845b835..0a1362a2eea 100644
--- a/libstdc++-v3/include/bits/stream_iterator.h
+++ b/libstdc++-v3/include/bits/stream_iterator.h
@@ -32,6 +32,9 @@
 
 #pragma GCC system_header
 
+#include 
+#include 
+#include 
 #include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h 
b/libstdc++-v3/include/bits/streambuf_iterator.h
index 72344c63088..c26ac249e01 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -33,6 +33,7 @@
 #pragma GCC system_header
 
 #include 
+#include 
 #include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
diff --git a/libstdc++-v3/include/std/iterator 
b/libstdc++-v3/include/std/iterator
index 7f8fc50b39d..2da2fb6e4a3 100644
--- a/libstdc++-v3/include/std/iterator
+++ b/libstdc++-v3/include/std/iterator
@@ -61,7 +61,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.37.3

[ping] [PATCH] C-SKY: Fix unsigned comparison warning

2022-09-16 Thread Jan-Benedict Glaw

On Mon, 2022-09-12 14:19:23 +0200, Jan-Benedict Glaw  wrote:
> 2022-09-06  Jan-Benedict Glaw  
> 
> gcc/ChangeLog:
>   * config/csky/csky.h (FUNCTION_ARG_REGNO_P): Cast REGNO to (int)
>   to prevent warning.
> 
> diff --git a/gcc/config/csky/csky.h b/gcc/config/csky/csky.h
> index 37410f0cda4..730d1b44ef1 100644
> --- a/gcc/config/csky/csky.h
> +++ b/gcc/config/csky/csky.h
> @@ -418,7 +418,7 @@ typedef struct
> The int cast is to prevent a complaint about unsigned comparison to
> zero, since CSKY_FIRST_PARM_REGNUM is zero.  */
>  #define FUNCTION_ARG_REGNO_P(REGNO)  \
> -  (((REGNO) >= CSKY_FIRST_PARM_REGNUM\
> +  (((int)(REGNO) >= CSKY_FIRST_PARM_REGNUM   \
>  && (REGNO) < (CSKY_NPARM_REGS + CSKY_FIRST_PARM_REGNUM)) \
> || FUNCTION_VARG_REGNO_P(REGNO))
>  
> 
> Ok for HEAD?

Just wanted to give this a ping.
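
For readers who want to reproduce the warning outside GCC, here is a minimal sketch of the pattern the quoted patch addresses.  The macro names, the register count and the wrapper function are hypothetical stand-ins, not the real csky.h definitions; the only fact taken from the patch is that the first parameter register is zero.  The FUNCTION_VARG_REGNO_P part of the real macro is left out.

```cpp
// Hypothetical stand-ins for the target macros; only "the first parameter
// register is zero" comes from the comment quoted in the patch.
#define FIRST_PARM_REGNUM 0
#define NPARM_REGS 4

// With an unsigned REGNO, "(REGNO) >= 0" is always true and the compiler
// may warn about the comparison.  Casting to int silences the warning
// without changing the result for valid register numbers.
#define ARG_REGNO_P(REGNO)                          \
  ((int) (REGNO) >= FIRST_PARM_REGNUM               \
   && (REGNO) < (NPARM_REGS + FIRST_PARM_REGNUM))

// Small wrapper so the macro can be exercised with an unsigned argument.
inline bool is_arg_regno (unsigned int regno)
{
  return ARG_REGNO_P (regno);
}
```

With these assumed values, registers 0 to 3 are accepted and register 4 is rejected.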

MfG, JBG


[PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-16 Thread Mikael Morin via Gcc-patches

This adds support for clobbering of partial variable references, when
they are passed as actual argument and the associated dummy has the
INTENT(OUT) attribute.
Support includes array elements, derived type component references,
and complex real or imaginary parts.

This is done by removing the check for lack of subreferences, which is
basically a revert of r9-4911-gbd810d637041dba49a5aca3d085504575374ac6f.
This removal allows more expressions than just array elements,
components and complex parts, but the other expressions are excluded by
other conditions: substrings are excluded by the check on expression
type (CHARACTER is excluded), KIND and LEN references are rejected by
the compiler as not valid in a variable definition context.

The check for scalarness is also updated as it was only valid when there
was no subreference.

PR fortran/88364
PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Don’t check for lack
of subreference.  Check the global expression rank instead of
the root symbol dimension attribute.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_7.f90: New test.
---
 gcc/fortran/trans-expr.cc |  5 +-
 .../gfortran.dg/intent_optimize_7.f90 | 65 +++
 2 files changed, 66 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_7.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ae685157e22..f1026d7f309 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6521,10 +6521,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && !dsym->attr.allocatable
  && !dsym->attr.pointer
  && e->expr_type == EXPR_VARIABLE
- && e->ref == NULL
- && e->symtree
- && e->symtree->n.sym
- && !e->symtree->n.sym->attr.dimension
+ && e->rank == 0
  && e->ts.type != BT_CHARACTER
  && e->ts.type != BT_DERIVED
  && e->ts.type != BT_CLASS
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_7.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_7.f90
new file mode 100644
index 000..14dcfd9961b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_7.f90
@@ -0,0 +1,65 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -fno-ipa-modref -fdump-tree-optimized 
-fdump-tree-original" }
+!
+! PR fortran/41453
+! Check that the INTENT(OUT) attribute causes one clobber to be emitted in
+! the caller before each call to FOO or BAR in the *.original dump, and the
+! initialization constants to be optimized away in the *.optimized dump,
+! in the case of scalar array elements, derived type components,
+! and complex real and imaginary part.
+
+module x
+implicit none
+contains
+  subroutine foo(a)
+integer, intent(out) :: a
+a = 42
+  end subroutine foo
+  subroutine bar(a)
+real, intent(out) :: a
+a = 24.0
+  end subroutine bar
+end module x
+
+program main
+  use x
+  implicit none
+  type :: t
+integer :: c
+  end type t
+  type(t) :: dc
+  integer :: ac(3)
+  complex :: xc, xd
+
+  dc = t(123456789)
+  call foo(dc%c)
+  if (dc%c /= 42) stop 1
+
+  ac = 100
+  ac(2) = 987654321
+  call foo(ac(2))
+  if (any(ac /= [100, 42, 100])) stop 2
+
+  xc = (12345.0, 11.0)
+  call bar(xc%re)
+  if (xc /= (24.0, 11.0)) stop 3
+
+  xd = (17.0, 67890.0)
+  call bar(xd%im)
+  if (xd /= (17.0, 24.0)) stop 4
+
+end program main
+
+! { dg-final { scan-tree-dump-times "CLOBBER" 4 "original" } }
+! { dg-final { scan-tree-dump "dc\\.c = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump "ac\\\[1\\\] = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump "REALPART_EXPR  = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump "IMAGPART_EXPR  = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump "123456789" "original" } }
+! { dg-final { scan-tree-dump-not "123456789" "optimized" { target 
__OPTIMIZE__ } } }
+! { dg-final { scan-tree-dump "987654321" "original" } }
+! { dg-final { scan-tree-dump-not "987654321" "optimized" { target 
__OPTIMIZE__ } } }
+! { dg-final { scan-tree-dump "1\\.2345e\\+4" "original"  } }
+! { dg-final { scan-tree-dump-not "1\\.2345e\\+4" "optimized" { target 
__OPTIMIZE__ } } }
+! { dg-final { scan-tree-dump "6\\.789e\\+4" "original"  } }
+! { dg-final { scan-tree-dump-not "6\\.789e\\+4" "optimized" { target 
__OPTIMIZE__ } } }
-- 
2.35.1

[PATCH 07/10] fortran: Support clobbering of ASSOCIATE variables [PR87397]

2022-09-16 Thread Mikael Morin via Gcc-patches

This is in spirit a revert of:
r9-3051-gc109362313623d83fe0a5194bceaf994cf0c6ce0

That commit added a condition to avoid generating ICE with clobbers
of ASSOCIATE variables.
The test added at that point continues to pass if we remove that
condition now.

PR fortran/87397
PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Remove condition
disabling clobber generation for ASSOCIATE variables.
---
 gcc/fortran/trans-expr.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index d169df44a71..4491465c4d6 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6527,7 +6527,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && !e->symtree->n.sym->attr.dimension
  && !e->symtree->n.sym->attr.pointer
  && !e->symtree->n.sym->attr.allocatable
- && !e->symtree->n.sym->attr.associate_var
  && e->ts.type != BT_CHARACTER
  && e->ts.type != BT_DERIVED
  && e->ts.type != BT_CLASS
-- 
2.35.1

[PATCH 10/10] fortran: Support clobbering of derived types [PR41453]

2022-09-16 Thread Mikael Morin via Gcc-patches

This is probably the most risky patch in the series.

A previous version of this patch allowing all exactly matching derived
types showed two regressions.  One of them uncovered PR106817 for which
I added a fix in this series, and for the other I have excluded
types with allocatable components from clobbering.

I have additionally excluded finalizable types for similar reasons, and
parameterized derived types because they may not be constant-sized.

I hope we are safe for all the rest.

-- >8 --

This adds support for clobbering of non-polymorphic derived type
variables, when they are passed as actual argument whose associated
dummy has the INTENT(OUT) attribute.

We avoid dealing with non-constant type sizes or class descriptors by
requiring that the types are derived (not class) and strictly matching,
and by excluding parameterized derived types.

Types whose previous value is still needed in the callee are also
excluded: these are types with allocatable components (the components
will be deallocated), and finalizable types or types with finalizable
components (they will be passed to finalization routines).
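
The type-related part of the condition described above can be sketched as a standalone predicate.  This is only an illustration under simplifying assumptions: DerivedType, TypeKind, TypeSpec and type_allows_clobber are hypothetical names, not gfortran structures, and the other conditions of the real check (rank, expression kind, elemental procedures) are left out.

```cpp
// Hypothetical, simplified model of the attributes the condition looks at.
struct DerivedType
{
  bool alloc_comp;   // has allocatable components
  bool pdt_type;     // parameterized derived type
  bool finalizable;  // finalizable, or has finalizable components
};

enum class TypeKind { Integer, Character, Derived, Class };

struct TypeSpec
{
  TypeKind kind;
  const DerivedType *derived;  // only meaningful when kind == Derived
};

// Returns true if an actual argument of type ACTUAL, passed to an
// INTENT(OUT) dummy of type DUMMY, may be clobbered before the call
// as far as the types are concerned.
inline bool type_allows_clobber (const TypeSpec &actual, const TypeSpec &dummy)
{
  if (actual.kind == TypeKind::Character || actual.kind == TypeKind::Class)
    return false;
  if (actual.kind != TypeKind::Derived)
    return true;
  return dummy.kind == TypeKind::Derived
         && actual.derived == dummy.derived  // strictly matching types
         && !actual.derived->alloc_comp
         && !actual.derived->pdt_type
         && !actual.derived->finalizable;
}
```

In this model a plain derived type passed to a dummy of the same type is clobbered, while a polymorphic dummy or a type with allocatable components is not.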

PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Allow strictly
matching derived types.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_8.f90: New test.
---
 gcc/fortran/trans-expr.cc | 18 -
 .../gfortran.dg/intent_optimize_8.f90 | 67 +++
 2 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_8.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index f1026d7f309..f8fcd2d97d9 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6523,8 +6523,24 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && e->expr_type == EXPR_VARIABLE
  && e->rank == 0
  && e->ts.type != BT_CHARACTER
- && e->ts.type != BT_DERIVED
  && e->ts.type != BT_CLASS
+ && (e->ts.type != BT_DERIVED
+ || (dsym->ts.type == BT_DERIVED
+ && e->ts.u.derived == dsym->ts.u.derived
+ /* Types with allocatable components are
+excluded from clobbering because we need
+the unclobbered pointers to free the
+allocatable components in the callee.
+Same goes for finalizable types or types
+with finalizable components, we need to
+pass the unclobbered values to the
+finalization routines.
+For parameterized types, it's less clear
+but they may not have a constant size
+so better exclude them in any case.  */
+ && !e->ts.u.derived->attr.alloc_comp
+ && !e->ts.u.derived->attr.pdt_type
+ && !gfc_is_finalizable (e->ts.u.derived, 
NULL)))
  && !sym->attr.elemental)
{
  tree var;
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_8.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_8.f90
new file mode 100644
index 000..584592842e1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_8.f90
@@ -0,0 +1,67 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -fno-ipa-modref -fdump-tree-optimized 
-fdump-tree-original" }
+!
+! PR fortran/41453
+! Check that the INTENT(OUT) attribute causes in the case of non-polymorphic 
derived type arguments:
+!  - one clobber to be emitted in the caller before calls to FOO in the 
*.original dump,
+!  - no clobber to be emitted in the caller before calls to BAR in the 
*.original dump,
+!  - the initialization constants to be optimized away in the *.optimized dump.
+
+module x
+  implicit none
+  type :: t
+integer :: c
+  end type t
+  type, extends(t) :: u
+integer :: d
+  end type u
+contains
+  subroutine foo(a)
+type(t), intent(out) :: a
+a = t(42)
+  end subroutine foo
+  subroutine bar(b)
+class(t), intent(out) :: b
+b%c = 24
+  end subroutine bar
+end module x
+
+program main
+  use x
+  implicit none
+  type(t) :: tc
+  type(u) :: uc, ud
+  class(t), allocatable :: te, tf
+
+  tc = t(123456789)
+  call foo(tc)
+  if (tc%c /= 42) stop 1
+
+  uc = u(987654321, 0)
+  call foo(uc%t)
+  if (uc%c /= 42) stop 2
+  if (uc%d /= 0) stop 3
+
+  ud = u(11223344, 0)
+  call bar(ud)
+  if (ud%c /= 24) stop 4
+
+  te = t(55667788)
+  call foo(te)
+  if (te%c /= 42) stop 5
+
+  tf = t(99887766)
+  call bar(tf)
+  if (tf%c /= 24)

[PATCH 05/10] fortran: Support clobbering of reference variables [PR41453]

2022-09-16 Thread Mikael Morin via Gcc-patches

This adds support for clobbering of variables passed by reference,
when the reference is forwarded to a subroutine as actual argument
whose associated dummy has the INTENT(OUT) attribute.
This was explicitly disabled and enabling it seems to work, as
demonstrated by the new testcase.

PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Remove condition
disabling clobber generation for dummy variables.  Remove
obsolete comment.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_5.f90: New test.
---
 gcc/fortran/trans-expr.cc |  4 ---
 .../gfortran.dg/intent_optimize_5.f90 | 34 +++
 2 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_5.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 2301724729f..9b2832bdb26 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6527,8 +6527,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && !e->symtree->n.sym->attr.dimension
  && !e->symtree->n.sym->attr.pointer
  && !e->symtree->n.sym->attr.allocatable
- /* See PR 41453.  */
- && !e->symtree->n.sym->attr.dummy
  /* FIXME - PR 87395 and PR 41453  */
  && e->symtree->n.sym->attr.save == SAVE_NONE
  && !e->symtree->n.sym->attr.associate_var
@@ -6538,8 +6536,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && !sym->attr.elemental)
{
  tree var;
- /* FIXME: This fails if var is passed by reference, 
see PR
-41453.  */
  var = build_fold_indirect_ref_loc (input_location,
 parmse.expr);
  tree clobber = build_clobber (TREE_TYPE (var));
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_5.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_5.f90
new file mode 100644
index 000..1633b681fc3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_5.f90
@@ -0,0 +1,34 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -fno-ipa-modref -fdump-tree-optimized 
-fdump-tree-original" }
+!
+! PR fortran/41453
+! Check that the INTENT(OUT) attribute causes one clobber to be emitted in
+! the caller before each call to FOO in the *.original dump, and the
+! initialization constant to be optimized away in the *.optimized dump,
+! in the case of an argument passed by reference to the caller.
+
+module x
+implicit none
+contains
+  subroutine foo(a)
+integer, intent(out) :: a
+a = 42
+  end subroutine foo
+  subroutine bar(b)
+integer :: b
+b = 123456789
+call foo(b)
+  end subroutine bar
+end module x
+
+program main
+  use x
+  implicit none
+  integer :: c
+  call bar(c)
+  if (c /= 42) stop 1
+end program main
+
+! { dg-final { scan-tree-dump-times "CLOBBER" 1 "original" } }
+! { dg-final { scan-tree-dump "\\*\\\(integer\\\(kind=4\\\) \\*\\\) b = 
{CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump-not "123456789" "optimized" { target 
__OPTIMIZE__ } } }
-- 
2.35.1

[PATCH 08/10] fortran: Support clobbering of allocatables and pointers [PR41453]

2022-09-16 Thread Mikael Morin via Gcc-patches

This adds support for clobbering of allocatable and pointer scalar
variables passed as actual argument to a subroutine when the associated
dummy has the INTENT(OUT) attribute.
Support was explicitly disabled, but the clobber generation code seems
to support it well, as demonstrated by the newly added testcase.

PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Remove conditions
on ALLOCATABLE and POINTER attributes guarding clobber
generation.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_6.f90: New test.
---
 gcc/fortran/trans-expr.cc |  2 -
 .../gfortran.dg/intent_optimize_6.f90 | 42 +++
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_6.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 4491465c4d6..ae685157e22 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6525,8 +6525,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && e->symtree
  && e->symtree->n.sym
  && !e->symtree->n.sym->attr.dimension
- && !e->symtree->n.sym->attr.pointer
- && !e->symtree->n.sym->attr.allocatable
  && e->ts.type != BT_CHARACTER
  && e->ts.type != BT_DERIVED
  && e->ts.type != BT_CLASS
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_6.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_6.f90
new file mode 100644
index 000..0146dff4e20
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_6.f90
@@ -0,0 +1,42 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -fno-ipa-modref -fdump-tree-optimized 
-fdump-tree-original" }
+!
+! PR fortran/41453
+! Check that the INTENT(OUT) attribute causes one clobber to be emitted in
+! the caller before each call to FOO in the *.original dump, and the
+! initialization constants to be optimized away in the *.optimized dump,
+! in the case of scalar allocatables and pointers.
+
+module x
+implicit none
+contains
+  subroutine foo(a)
+integer, intent(out) :: a
+a = 42
+  end subroutine foo
+end module x
+
+program main
+  use x
+  implicit none
+  integer, allocatable :: ca
+  integer, target :: ct
+  integer, pointer :: cp
+
+  allocate(ca)
+  ca = 123456789
+  call foo(ca)
+  if (ca /= 42) stop 1
+  deallocate(ca)
+
+  ct = 987654321
+  cp => ct
+  call foo(cp)
+  if (ct /= 42) stop 2
+end program main
+
+! { dg-final { scan-tree-dump-times "CLOBBER" 2 "original" } }
+! { dg-final { scan-tree-dump "\\*ca = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump "\\*cp = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump-not "123456789" "optimized" { target 
__OPTIMIZE__ } } }
+! { dg-final { scan-tree-dump-not "987654321" "optimized" { target 
__OPTIMIZE__ } } }
-- 
2.35.1

[PATCH 04/10] fortran: Support clobbering with implicit interfaces [PR105012]

2022-09-16 Thread Mikael Morin via Gcc-patches

From: Harald Anlauf 

Before procedure calls, we clobber actual arguments whose associated
dummy is INTENT(OUT).  This only applies to procedures with explicit
interfaces, as the knowledge of the interface is necessary to know
whether an argument has the INTENT(OUT) attribute.

This change also enables clobber generation for procedure calls without
explicit interface, when the procedure has been defined in the same
file because we can use the dummy arguments' characteristics from the
procedure definition in that case.

The knowledge of the dummy characteristics is directly available through
gfc_actual_arglist’s associated_dummy pointers which have been populated
as a side effect of calling gfc_check_externals.
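
The fallback described above can be sketched in isolation as follows.  The types and function names (Symbol, DummyArg, ActualArg, dummy_symbol, dummy_is_intent_out) are hypothetical simplifications for illustration, not the frontend's actual data structures.

```cpp
// Hypothetical, simplified stand-ins for the frontend structures.
enum class Intent { Unknown, In, Out, InOut };

struct Symbol
{
  Intent intent;
};

struct DummyArg
{
  bool intrinsic;     // true for dummies of intrinsic procedures
  const Symbol *sym;  // symbol of a non-intrinsic dummy, may be null
};

struct ActualArg
{
  const DummyArg *associated_dummy;  // may be null
};

// Pick the symbol describing the dummy argument: the formal argument from
// the explicit interface if there is one, otherwise the associated dummy
// recorded on the actual argument (if it is non-intrinsic and has a symbol).
inline const Symbol *dummy_symbol (const Symbol *fsym, const ActualArg &arg)
{
  if (fsym != nullptr)
    return fsym;
  const DummyArg *dummy = arg.associated_dummy;
  if (dummy != nullptr && !dummy->intrinsic && dummy->sym != nullptr)
    return dummy->sym;
  return nullptr;
}

// True if the selected dummy symbol exists and is INTENT(OUT).
inline bool dummy_is_intent_out (const Symbol *fsym, const ActualArg &arg)
{
  const Symbol *dsym = dummy_symbol (fsym, arg);
  return dsym != nullptr && dsym->intent == Intent::Out;
}
```

The explicit interface always takes precedence in this sketch; the associated dummy is consulted only when no formal argument is available.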

PR fortran/105012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Use dummy
information from associated_dummy if there is no information
from the procedure interface.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_4.f90: New test.

Co-Authored-By: Mikael Morin 
---
 gcc/fortran/trans-expr.cc | 19 +++
 .../gfortran.dg/intent_optimize_4.f90 | 24 +++
 2 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_4.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index a62a3bb642d..2301724729f 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6505,10 +6505,21 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
{
  gfc_conv_expr_reference (&parmse, e);
 
- if (fsym
- && fsym->attr.intent == INTENT_OUT
- && !fsym->attr.allocatable
- && !fsym->attr.pointer
+ gfc_symbol *dsym = fsym;
+ gfc_dummy_arg *dummy;
+
+ /* Use associated dummy as fallback for formal
+argument if there is no explicit interface.  */
+ if (dsym == NULL
+ && (dummy = arg->associated_dummy)
+ && dummy->intrinsicness == GFC_NON_INTRINSIC_DUMMY_ARG
+ && dummy->u.non_intrinsic->sym)
+   dsym = dummy->u.non_intrinsic->sym;
+
+ if (dsym
+ && dsym->attr.intent == INTENT_OUT
+ && !dsym->attr.allocatable
+ && !dsym->attr.pointer
  && e->expr_type == EXPR_VARIABLE
  && e->ref == NULL
  && e->symtree
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_4.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_4.f90
new file mode 100644
index 000..2f184bf84a8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_4.f90
@@ -0,0 +1,24 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -fno-ipa-modref -fdump-tree-optimized 
-fdump-tree-original" }
+!
+! PR fortran/105012
+! Check that the INTENT(OUT) attribute causes one clobber to be emitted in
+! the caller before the call to Y in the *.original dump, and the
+! initialization constant to be optimized away in the *.optimized dump,
+! despite the non-explicit interface if the subroutine with the INTENT(OUT)
+! is declared in the same file.
+
+SUBROUTINE Y (Z)
+  integer, intent(out) :: Z
+  Z = 42
+END SUBROUTINE Y
+PROGRAM TEST
+integer :: X
+X = 123456789
+CALL Y (X)
+if (X.ne.42) STOP 1
+END PROGRAM
+
+! { dg-final { scan-tree-dump-times "CLOBBER" 1 "original" } }
+! { dg-final { scan-tree-dump "x = {CLOBBER};" "original" } }
+! { dg-final { scan-tree-dump-not "123456789" "optimized" { target 
__OPTIMIZE__ } } }
-- 
2.35.1

[PATCH 06/10] fortran: Support clobbering of SAVE variables [PR87395]

2022-09-16 Thread Mikael Morin via Gcc-patches

This is in spirit a revert of:
r9-3032-gee7fb0588c6361b4d77337ab0f7527be64fcdde2

That commit added a condition to avoid generating ICE with clobbers
of variables with the SAVE attribute.
The test added at that point continues to pass if we remove that
condition now.

PR fortran/87395
PR fortran/41453

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Remove condition
on SAVE attribute guarding clobber generation.
---
 gcc/fortran/trans-expr.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 9b2832bdb26..d169df44a71 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6527,8 +6527,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  && !e->symtree->n.sym->attr.dimension
  && !e->symtree->n.sym->attr.pointer
  && !e->symtree->n.sym->attr.allocatable
- /* FIXME - PR 87395 and PR 41453  */
- && e->symtree->n.sym->attr.save == SAVE_NONE
  && !e->symtree->n.sym->attr.associate_var
  && e->ts.type != BT_CHARACTER
  && e->ts.type != BT_DERIVED
-- 
2.35.1

[PATCH 00/10] fortran: clobber fixes [PR41453]

2022-09-16 Thread Mikael Morin via Gcc-patches

Hello,

this is a set of changes around the clobber we generate in the caller
before a procedure call, for each actual argument whose associated dummy
has the INTENT(OUT) attribute.

The first patch is a refactoring moving the clobber generation into
gfc_conv_procedure_call, where it feels more appropriate.
The second patch is a fix for the ICE originally motivating my work
on this topic.
The third patch is a fix for a wrong-code issue discovered with an
earlier version of this series.
The following patches gradually loosen the conditions to enable clobber
generation in more and more cases.

Each patch has been tested through an incremental bootstrap and a
partial testsuite run on fortran *intent* tests, and the whole lot has
been run through the full fortran regression testsuite.
OK for master?


Harald Anlauf (1):
  fortran: Support clobbering with implicit interfaces [PR105012]

Mikael Morin (9):
  fortran: Move the clobber generation code
  fortran: Fix invalid function decl clobber ICE [PR105012]
  fortran: Move clobbers after evaluation of all arguments [PR106817]
  fortran: Support clobbering of reference variables [PR41453]
  fortran: Support clobbering of SAVE variables [PR87395]
  fortran: Support clobbering of ASSOCIATE variables [PR87397]
  fortran: Support clobbering of allocatables and pointers [PR41453]
  fortran: Support clobbering of variable subreferences [PR88364]
  fortran: Support clobbering of derived types [PR41453]

 gcc/fortran/trans-expr.cc | 78 ---
 gcc/fortran/trans.h   |  3 +-
 .../gfortran.dg/intent_optimize_4.f90 | 24 ++
 .../gfortran.dg/intent_optimize_5.f90 | 34 
 .../gfortran.dg/intent_optimize_6.f90 | 42 ++
 .../gfortran.dg/intent_optimize_7.f90 | 65 
 .../gfortran.dg/intent_optimize_8.f90 | 67 
 .../gfortran.dg/intent_optimize_9.f90 | 43 ++
 gcc/testsuite/gfortran.dg/intent_out_15.f90   | 27 +++
 9 files changed, 353 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_9.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_15.f90

-- 
2.35.1

[PATCH 02/10] fortran: Fix invalid function decl clobber ICE [PR105012]

2022-09-16 Thread Mikael Morin via Gcc-patches

The Fortran frontend uses the function symbol itself as result symbol
for a function without a declared result symbol.  This caused
an invalid clobber of a function decl to be emitted, leading to an
ICE, whereas the intended behaviour was to clobber the function result
variable.  This change fixes the problem by getting the decl from the
just-retrieved variable reference after the call to
gfc_conv_expr_reference, instead of copying it from the frontend symbol.

PR fortran/105012

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Retrieve variable
from the just calculated variable reference.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_15.f90: New test.
---
 gcc/fortran/trans-expr.cc   |  3 ++-
 gcc/testsuite/gfortran.dg/intent_out_15.f90 | 27 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_15.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7902b941c2d..76c587e3d9f 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6528,7 +6528,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  tree var;
  /* FIXME: This fails if var is passed by reference, 
see PR
 41453.  */
- var = e->symtree->n.sym->backend_decl;
+ var = build_fold_indirect_ref_loc (input_location,
+parmse.expr);
  tree clobber = build_clobber (TREE_TYPE (var));
  gfc_add_modify (&se->pre, var, clobber);
}
diff --git a/gcc/testsuite/gfortran.dg/intent_out_15.f90 
b/gcc/testsuite/gfortran.dg/intent_out_15.f90
new file mode 100644
index 000..64334e6f038
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_out_15.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/105012
+! The following case was triggering an ICE because of a clobber
+! on the DERFC function decl instead of its result.
+
+module error_function
+integer, parameter :: r8 = selected_real_kind(12) ! 8 byte real
+contains
+SUBROUTINE CALERF_r8(ARG, RESULT, JINT)
+   integer, parameter :: rk = r8
+   real(rk), intent(in)  :: arg
+   real(rk), intent(out) :: result
+   IF (Y .LE. THRESH) THEN
+   END IF
+end SUBROUTINE CALERF_r8
+FUNCTION DERFC(X)
+   integer, parameter :: rk = r8 ! 8 byte real
+   real(rk), intent(in) :: X
+   real(rk) :: DERFC
+   CALL CALERF_r8(X, DERFC, JINT)
+END FUNCTION DERFC
+end module error_function
+
+! { dg-final { scan-tree-dump-times "CLOBBER" 1 "original" } }
+! { dg-final { scan-tree-dump "__result_derfc = {CLOBBER};" "original" } }
-- 
2.35.1

[PATCH 01/10] fortran: Move the clobber generation code

2022-09-16 Thread Mikael Morin via Gcc-patches

This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.

What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation.  Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.

Behaviour remains unchanged.

gcc/fortran/ChangeLog:

* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.
---
 gcc/fortran/trans-expr.cc | 58 +--
 gcc/fortran/trans.h   |  3 +-
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 850007fd2e1..7902b941c2d 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6395,7 +6395,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
&& e->symtree->n.sym->attr.pointer))
&& fsym && fsym->attr.target)
/* Make sure the function only gets called once.  */
-   gfc_conv_expr_reference (&parmse, e, false);
+   gfc_conv_expr_reference (&parmse, e);
  else if (e->expr_type == EXPR_FUNCTION
   && e->symtree->n.sym->result
   && e->symtree->n.sym->result != e->symtree->n.sym
@@ -6502,22 +6502,36 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
}
  else
{
- bool add_clobber;
- add_clobber = fsym && fsym->attr.intent == INTENT_OUT
-   && !fsym->attr.allocatable && !fsym->attr.pointer
-   && e->symtree && e->symtree->n.sym
-   && !e->symtree->n.sym->attr.dimension
-   && !e->symtree->n.sym->attr.pointer
-   && !e->symtree->n.sym->attr.allocatable
-   /* See PR 41453.  */
-   && !e->symtree->n.sym->attr.dummy
-   /* FIXME - PR 87395 and PR 41453  */
-   && e->symtree->n.sym->attr.save == SAVE_NONE
-   && !e->symtree->n.sym->attr.associate_var
-   && e->ts.type != BT_CHARACTER && e->ts.type != 
BT_DERIVED
-   && e->ts.type != BT_CLASS && !sym->attr.elemental;
+ gfc_conv_expr_reference (&parmse, e);
 
- gfc_conv_expr_reference (&parmse, e, add_clobber);
+ if (fsym
+ && fsym->attr.intent == INTENT_OUT
+ && !fsym->attr.allocatable
+ && !fsym->attr.pointer
+ && e->expr_type == EXPR_VARIABLE
+ && e->ref == NULL
+ && e->symtree
+ && e->symtree->n.sym
+ && !e->symtree->n.sym->attr.dimension
+ && !e->symtree->n.sym->attr.pointer
+ && !e->symtree->n.sym->attr.allocatable
+ /* See PR 41453.  */
+ && !e->symtree->n.sym->attr.dummy
+ /* FIXME - PR 87395 and PR 41453  */
+ && e->symtree->n.sym->attr.save == SAVE_NONE
+ && !e->symtree->n.sym->attr.associate_var
+ && e->ts.type != BT_CHARACTER
+ && e->ts.type != BT_DERIVED
+ && e->ts.type != BT_CLASS
+ && !sym->attr.elemental)
+   {
+ tree var;
+ /* FIXME: This fails if var is passed by reference, see PR
+41453.  */
+ var = e->symtree->n.sym->backend_decl;
+ tree clobber = build_clobber (TREE_TYPE (var));
+ gfc_add_modify (&se->pre, var, clobber);
+   }
}
  /* Catch base objects that are not variables.  */
  if (e->ts.type == BT_CLASS
@@ -9485,7 +9499,7 @@ gfc_conv_expr_type (gfc_se * se, gfc_expr * expr, tree type)
values only.  */
 
 void
-gfc_conv_expr_reference (gfc_se * se, gfc_expr * expr, bool add_clobber)
+gfc_conv_expr_reference (gfc_se * se, gfc_expr * expr)
 {
   gfc_ss *ss;
   tree var;
@@ -9525,16 +9539,6 @@ gfc_conv_expr_reference (gfc_se * se, gfc_expr * expr, bool add_clobber)

[PATCH 03/10] fortran: Move clobbers after evaluation of all arguments [PR106817]

2022-09-16 Thread Mikael Morin via Gcc-patches

For actual arguments whose dummy is INTENT(OUT), we used to generate
clobbers on them at the same time we generated the argument reference
for the function call.  This was wrong if the value expression of a
later argument depended on the value of the just-clobbered argument:
in that case we passed an undefined value.

With this change, clobbers are collected separately and appended
to the procedure call preliminary code after all the arguments have been
evaluated.

PR fortran/106817

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Collect all clobbers
to their own separate block.  Append the block of clobbers to
the procedure preliminary block after the argument evaluation
codes for all the arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_optimize_9.f90: New test.
---
 gcc/fortran/trans-expr.cc |  6 ++-
 .../gfortran.dg/intent_optimize_9.f90 | 43 +++
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_optimize_9.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 76c587e3d9f..a62a3bb642d 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6018,7 +6018,6 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   gfc_charlen cl;
   gfc_expr *e;
   gfc_symbol *fsym;
-  stmtblock_t post;
   enum {MISSING = 0, ELEMENTAL, SCALAR, SCALAR_POINTER, ARRAY};
   gfc_component *comp = NULL;
   int arglen;
@@ -6062,7 +6061,9 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   else
 info = NULL;
 
+  stmtblock_t post, clobbers;
   gfc_init_block (&post);
+  gfc_init_block (&clobbers);
   gfc_init_interface_mapping (&mapping);
   if (!comp)
 {
@@ -6531,7 +6532,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  var = build_fold_indirect_ref_loc (input_location,
 parmse.expr);
  tree clobber = build_clobber (TREE_TYPE (var));
- gfc_add_modify (&se->pre, var, clobber);
+ gfc_add_modify (&clobbers, var, clobber);
}
}
  /* Catch base objects that are not variables.  */
@@ -7400,6 +7401,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
   vec_safe_push (arglist, parmse.expr);
 }
+  gfc_add_block_to_block (&se->pre, &clobbers);
   gfc_finish_interface_mapping (&mapping, &se->pre, &se->post);
 
   if (comp)
diff --git a/gcc/testsuite/gfortran.dg/intent_optimize_9.f90 
b/gcc/testsuite/gfortran.dg/intent_optimize_9.f90
new file mode 100644
index 000..effbaa12a2d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_optimize_9.f90
@@ -0,0 +1,43 @@
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-final { scan-tree-dump-times "CLOBBER" 2 "original" } }
+!
+! PR fortran/106817
+! Check that for an actual argument whose dummy is INTENT(OUT),
+! the clobber that is emitted in the caller before a procedure call
+! happens after any expression depending on the argument value has been
+! evaluated.
+! 
+
+module m
+  implicit none
+contains
+  subroutine copy1(out, in)
+integer, intent(in) :: in
+integer, intent(out) :: out
+out = in
+  end subroutine copy1
+  subroutine copy2(in, out)
+integer, intent(in) :: in
+integer, intent(out) :: out
+out = in
+  end subroutine copy2
+end module m
+
+program p
+  use m
+  implicit none
+  integer :: a, b
+
+  ! Clobbering of a should happen after a+1 has been evaluated.
+  a = 3
+  call copy1(a, a+1)
+  if (a /= 4) stop 1
+
+  ! Clobbering order does not depend on the order of arguments.
+  ! It should also come last with reversed arguments.
+  b = 12
+  call copy2(b+1, b)
+  if (b /= 13) stop 2
+
+end program p
-- 
2.35.1
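
The ordering problem described in the commit message above can be
illustrated outside the compiler.  What follows is only a toy C++ model,
not gfortran code: a variable is a std::optional, a clobber empties it,
and the vector named clobbers is a hypothetical stand-in for the separate
clobber block the patch introduces.  The function models
"call copy1(a, a+1)" from the new test.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model: an empty optional stands for a clobbered (undefined) variable.
using Var = std::optional<int>;

// Models "call copy1(a, a+1)", where copy1(out, in) does out = in.
// Returns true if the value passed for 'in' was still defined.
// 'clobber_early' selects the old ordering; otherwise clobbers are
// collected and applied only after all arguments have been evaluated.
bool call_copy1(Var &a, bool clobber_early)
{
  std::vector<Var *> clobbers;  // hypothetical stand-in for the clobber block

  // Argument 1: reference to 'a'; its dummy is INTENT(OUT).
  Var *out = &a;
  if (clobber_early)
    out->reset();
  else
    clobbers.push_back(out);

  // Argument 2: the value expression a + 1; its dummy is INTENT(IN).
  Var in = a ? Var(*a + 1) : Var();

  // New ordering: run the collected clobbers once every argument is evaluated.
  for (Var *v : clobbers)
    v->reset();

  // Body of copy1.
  const bool in_defined = in.has_value();
  *out = in;
  return in_defined;
}
```

With clobber_early set, the second argument is computed from an already
emptied variable; with the deferred ordering it still sees the old value,
which is the behaviour the test checks with "if (a /= 4) stop 1".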

[committed] libstdc++: Fix compare_exchange_padding.cc test for std::atomic_ref

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

The test was only failing for me with -m32 (and not -m64), so I didn't
notice until now. That probably means we should make the test fail more
reliably if the padding isn't being cleared.

-- >8 --

This test was written assuming that std::atomic_ref clears its target's
padding on construction, but that could introduce data races. Change the
test to store a value after construction and check that its padding is
cleared by the store.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc:
Store value with non-zero padding bits after construction.
---
 .../29_atomics/atomic_ref/compare_exchange_padding.cc | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc
index 1b1a12a..e9f8a4bdf2a 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc
@@ -20,14 +20,15 @@ int
 main ()
 {
   S s;
-  fill_struct(s);
-  s.c = 'a';
-  s.s = 42;
-
   S ss{ s };
+  fill_struct(ss);
+  ss.c = 'a';
+  ss.s = 42;
+
   std::atomic_ref as{ s };
+  as.store(ss);
   auto ts = as.load();
-  VERIFY( !compare_struct(ss, ts) ); // padding cleared on construction
+  VERIFY( !compare_struct(ss, ts) ); // padding cleared on store
   as.exchange(ss);
   auto es = as.load();
   VERIFY( compare_struct(ts, es) ); // padding cleared on exchange
-- 
2.37.3

[PATCH] gcc/config/t-i386: add build dependencies on i386-builtin-types.inc

2022-09-16 Thread Sergei Trofimovich via Gcc-patches

From: Sergei Trofimovich 

i386-builtin-types.inc is included indirectly via i386-builtins.h
into 4 files: i386.cc i386-builtins.cc i386-expand.cc i386-features.cc

Only the i386.cc dependency was present in the gcc/config/i386/t-i386 makefile.

As a result parallel builds occasionally fail as:

g++ ... -o i386-builtins.o ... 
../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc
In file included from 
../../gcc-13-20220911/gcc/config/i386/i386-builtins.cc:92:
../../gcc-13-20220911/gcc/config/i386/i386-builtins.h:25:10:
 fatal error: i386-builtin-types.inc: No such file or directory
   25 | #include "i386-builtin-types.inc"
  |  ^~~~
compilation terminated.
make[3]: *** [../../gcc-13-20220911/gcc/config/i386/t-i386:54: 
i386-builtins.o]
  Error 1 shuffle=1663349189

gcc/
* config/i386/t-i386: Add build-time dependencies against
i386-builtin-types.inc to i386-builtins.o, i386-expand.o,
i386-features.o.
---
 gcc/config/i386/t-i386 | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/t-i386 b/gcc/config/i386/t-i386
index 4e2a0efc615..ffdbbdfe8ce 100644
--- a/gcc/config/i386/t-i386
+++ b/gcc/config/i386/t-i386
@@ -62,7 +62,12 @@ i386-features.o: $(srcdir)/config/i386/i386-features.cc
$(COMPILE) $<
$(POSTCOMPILE)
 
+# i386-builtin-types.inc is included into i386-builtins.h.
+# Below are direct users of i386-builtins.h:
 i386.o: i386-builtin-types.inc
+i386-builtins.o: i386-builtin-types.inc
+i386-expand.o: i386-builtin-types.inc
+i386-features.o: i386-builtin-types.inc
 
 i386-builtin-types.inc: s-i386-bt ; @true
 s-i386-bt: $(srcdir)/config/i386/i386-builtin-types.awk \
-- 
2.37.2

Re: [PATCH] c++: modules ICE with typename friend declaration

2022-09-16 Thread Bernhard Reutner-Fischer via Gcc-patches

On 16 September 2022 17:54:32 CEST, Patrick Palka via Gcc-patches wrote:

>+++ b/gcc/testsuite/g++.dg/modules/typename-friend_a.C
>@@ -0,0 +1,11 @@
>+// { dg-additional-options "-fmodules-ts" }
>+export module foo;
>+// { dg-module-cmi foo }
>+

If that's a constant, repeating pain, you could instrument the test suite so 
that upon seeing -fmodules-ts (or maybe a later std) it greps for export module 
and calls dg-module-cmi on those names automatically on its own.
We do the same for stuff like PCH or fortran module .mod or many other cleanup 
stuff because such manual annotations are IMHO a waste of developer resources..

Just a thought..
cheers,

Re: [PATCH] c++: Implement P1467R9 - Extended floating-point types and standard names compiler part except for bfloat16 [PR106652]

2022-09-16 Thread Jakub Jelinek via Gcc-patches

On Fri, Sep 16, 2022 at 01:48:54PM +0200, Jason Merrill wrote:
> On 9/12/22 04:05, Jakub Jelinek wrote:
> > The following patch implements the compiler part of C++23
> > P1467R9 - Extended floating-point types and standard names compiler part
> > by introducing _Float{16,32,64,128} as keywords and builtin types
> > like they are implemented for C already since GCC 7.
> > It doesn't introduce _Float{32,64,128}x for C++, those remain C only
> > for now, mainly because 
> > https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
> > has mangling for:
> > ::= DF <number> _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
> > but doesn't for _FloatNx.  And it doesn't add anything for bfloat16_t
> > support, see below.
> > Regarding mangling, I think mangling _FloatNx as DF <number> x _ would be
> > possible, but would need to be discussed and voted in.
> 
> As you've seen, I opened a pull request for these.  I think we can go ahead
> and implement that and make sure it's resolved before the GCC 13 release.
> 
> Or we could temporarily mangle them as an extension, i.e. u9_Float32x.
> 
> I would expect _Float64x, at least, to be fairly popular.

If we get the mangling for _FloatNx agreed on, whether it is DFx_ as
you proposed, or DFx or DFx_, sure, I agree we should just enable
those too and will tweak the patch.  It would then fix also PR85518.

Though, when we add support for _FloatNx and declare they are extended
floating-point types, the question is about subrank comparisons: shall those
have a lower conversion subrank than the _FloatN type with the same conversion
rank, or the other way around?  Say on x86_64, where _Float32x and _Float64
have the same rank, shall (_Float32x) x + 0.0f64 have _Float32x type or
_Float64?
And, shall we support f32x etc. constant literal suffixes (with pedwarn
always even in C++23)?

> > The patch wants to keep backwards compatibility with how __float128 has
> > been handled in C++ before, both for mangling and behavior in binary
> > operations, overload resolution etc.  So, there are some backend changes
> > where for C __float128 and _Float128 are the same type (float128_type_node
> > and float128t_type_node are the same pointer), but for C++ they are distinct
> > types which mangle differently and _Float128 is treated as extended
> > floating-point type while __float128 is treated as non-standard floating
> > point type.
> 
> How important do you think this backwards compatibility is?
> 
> As I mentioned in the ABI proposal, I think it makes sense to make
> __float128 an alias for std::float128_t, and continue using the current
> mangling for __float128.

I thought it is fairly important because __float128 has been around in GCC
for 19 years already.  To be precise, I think e.g. for x86_64 GCC 3.4
introduced it, but mangling was implemented only in GCC 4.1 (2006); before
that we ICEd on those.  Until glibc 2.26 (2017) one had to use libquadmath when
math library functions were needed, but since then one can just use libm.
__float128 is on some targets (e.g. PA) just another name for long double,
not a distinct type.

Another issue is the PowerPC __ieee128 and __ibm128 types; I think for the
former we can't make it the same type as _Float128, because e.g. libstdc++
code relies on __ieee128 and __ibm128 being long double type of the other
ABI, so they should mangle as long double of the other ABI.  But in that
case they can't act as distinct types when long double should mangle the
same as they do.  And it would be weird if those types in one
-mabi=*longdouble mode worked as standard floating-point type and in another
as extended floating-point type, rather than just types which are neither
standard nor extended as before.

> I don't think we want the two types to have different semantics.  If we want
> to support existing __float128 code that relies on implicit narrowing
> conversions, we could allow them generally with a pedwarn using the 'bad'
> conversion machinery.  That's probably useful anyway for better diagnostics.

So you mean instead of
+  if (fcode == REAL_TYPE
+ && tcode == REAL_TYPE
+ && (extended_float_type_p (from)
+ || extended_float_type_p (to))
+ && cp_compare_floating_point_conversion_ranks (from, to) >= 2)
+   return NULL;
before conv = build_conv (ck_std, to, conv); do those checks in else if
after:
  if ((same_type_p (to, type_promotes_to (from))
   || (underlying_type && same_type_p (to, underlying_type)))
  && next_conversion (conv)->rank <= cr_promotion)
conv->rank = cr_promotion;
and pedwarn there (or somewhere later?) and set conv->bad_p = true;?
I can certainly try that and see what it does on the tests in the patch.

> > The patch predefines __STDCPP_FLOAT{16,32,64,128}_T__ macros when
> > those types are available, but only for C++23, while the underlying types
> > are available in C++98 and later including the {f,F}{16,32,64,128} literal
> > suffixes (but those with a pedwarn for C++20

[committed] libstdc++: Fix tr1::variate_generator::engine_value_type

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

The tr1/5_numerical_facilities/random/variate_generator/37986.cc test
fails for strict -std=c++98 mode because _Adaptor(const _Engine&) is
ill-formed in C++98 when _Engine is a reference type.

Rather than attempt to make the _Adaptor handle references and pointers,
just strip references and pointers from the _Engine type before we adapt
it. That removes the need for the _Adaptor<_Engine*> partial
specialization and avoids the reference-to-reference problem for c++98
mode.

While looking into this I noticed that the TR1 spec requires the
variate_generator::engine_value_type to be the underlying engine
type, whereas we make it the _Adaptor type that wraps the engine.

libstdc++-v3/ChangeLog:

* include/tr1/random.h (__detail::_Adaptor::_BEngine): Remove.
(__detail::_Adaptor::_M_g): Make public.
(__detail::_Adaptor<_Engine*, _Dist>): Remove partial
specialization.
(variate_generator::_Value): New helper to simplify handling of
_Engine* and _Engine& template arguments.
(variate_generator::engine_value_type): Define to underlying
engine type, not adapted type.
(variate_generator::engine()): Return underlying engine instead
of adaptor.
* 
testsuite/tr1/5_numerical_facilities/random/variate_generator/37986.cc:
Fix comment.
* 
testsuite/tr1/5_numerical_facilities/random/variate_generator/requirements/typedefs.cc:
Check member typedefs have the correct types.
---
 libstdc++-v3/include/tr1/random.h | 115 ++
 .../random/variate_generator/37986.cc |   2 +-
 .../requirements/typedefs.cc  |  51 ++--
 3 files changed, 84 insertions(+), 84 deletions(-)

diff --git a/libstdc++-v3/include/tr1/random.h 
b/libstdc++-v3/include/tr1/random.h
index 535f142a004..6061649c5a4 100644
--- a/libstdc++-v3/include/tr1/random.h
+++ b/libstdc++-v3/include/tr1/random.h
@@ -81,9 +81,8 @@ namespace tr1
 template
   struct _Adaptor
   { 
-   typedef typename remove_reference<_Engine>::type _BEngine;
-   typedef typename _BEngine::result_type   _Engine_result_type;
-   typedef typename _Distribution::input_type   result_type;
+   typedef typename _Engine::result_type   _Engine_result_type;
+   typedef typename _Distribution::input_type  result_type;
 
   public:
_Adaptor(const _Engine& __g)
@@ -146,72 +145,8 @@ namespace tr1
  return __return_value;
}
 
-  private:
_Engine _M_g;
   };
-
-// Specialization for _Engine*.
-template
-  struct _Adaptor<_Engine*, _Distribution>
-  {
-   typedef typename _Engine::result_type  _Engine_result_type;
-   typedef typename _Distribution::input_type result_type;
-
-  public:
-   _Adaptor(_Engine* __g)
-   : _M_g(__g) { }
-
-   result_type
-   min() const
-   {
- result_type __return_value;
- if (is_integral<_Engine_result_type>::value
- && is_integral::value)
-   __return_value = _M_g->min();
- else
-   __return_value = result_type(0);
- return __return_value;
-   }
-
-   result_type
-   max() const
-   {
- result_type __return_value;
- if (is_integral<_Engine_result_type>::value
- && is_integral::value)
-   __return_value = _M_g->max();
- else if (!is_integral::value)
-   __return_value = result_type(1);
- else
-   __return_value = std::numeric_limits::max() - 1;
- return __return_value;
-   }
-
-   result_type
-   operator()()
-   {
- result_type __return_value;
- if (is_integral<_Engine_result_type>::value
- && is_integral::value)
-   __return_value = (*_M_g)();
- else if (!is_integral<_Engine_result_type>::value
-  && !is_integral::value)
-   __return_value = result_type((*_M_g)() - _M_g->min())
- / result_type(_M_g->max() - _M_g->min());
- else if (is_integral<_Engine_result_type>::value
-  && !is_integral::value)
-   __return_value = result_type((*_M_g)() - _M_g->min())
- / result_type(_M_g->max() - _M_g->min() + result_type(1));
- else
-   __return_value = *_M_g)() - _M_g->min()) 
-  / (_M_g->max() - _M_g->min()))
- * std::numeric_limits::max());
- return __return_value;
-   }
-
-  private:
-   _Engine* _M_g;
-  };
   } // namespace __detail
 
   /**
@@ -223,17 +158,45 @@ namespace tr1
   template
 class variate_generator
 {
-  // Concept requirements.
-  __glibcxx_class_requires(_Engine, _CopyConstructibleConcept)
-  //  __glibcxx_class_requires(_Engine, _EngineConcept)
-  //  __glibcxx_class_requires(_Dist, _EngineConcept)
+
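
The approach described in the commit message above (strip references and
pointers from the engine type before adapting it) can be sketched with a
small trait.  This is only an illustration under assumed names:
strip_ref_ptr, toy_engine and toy_generator are hypothetical and are not
the libstdc++ identifiers, and the patch's own _Value helper is not shown
in the part of the diff quoted here.

```cpp
#include <type_traits>

// Hypothetical helper: maps Engine, Engine& and Engine* to Engine.
template<typename T> struct strip_ref_ptr     { typedef T type; };
template<typename T> struct strip_ref_ptr<T&> { typedef T type; };
template<typename T> struct strip_ref_ptr<T*> { typedef T type; };

// Minimal stand-in for a random number engine.
struct toy_engine
{
  typedef unsigned result_type;
  unsigned state;
  result_type operator()() { return ++state; }
};

// Hypothetical generator: whatever form the engine argument takes,
// engine_value_type names the underlying engine type, which is what
// the commit message says TR1 requires.
template<typename EngineArg>
struct toy_generator
{
  typedef EngineArg                                 engine_type;
  typedef typename strip_ref_ptr<EngineArg>::type   engine_value_type;
};
```

Because the trait is applied once, up front, an adaptor built on top of
engine_value_type never sees a reference or pointer type, so no separate
specialization for pointers is needed in this sketch.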

[committed] libstdc++: Remove __alloc_neq helper

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

This class template and partial specialization were added 15 years ago
to optimize allocator equality comparisons in std::list. I think it's
safe to assume that GCC is now capable of optimizing an inline
operator!= that just returns false at least as well as an inline member
function that just returns false.

libstdc++-v3/ChangeLog:

* include/bits/allocator.h (__alloc_neq): Remove.
* include/bits/stl_list.h (list::_M_check_equal_allocators):
Compare allocators directly, without __alloc_neq.
---
 libstdc++-v3/include/bits/allocator.h | 17 -
 libstdc++-v3/include/bits/stl_list.h  |  5 ++---
 2 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/bits/allocator.h 
b/libstdc++-v3/include/bits/allocator.h
index 28abf13eba9..c39166e24fe 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -298,23 +298,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 };
 
-  // Optimize for stateless allocators.
-  template
-struct __alloc_neq
-{
-  static bool
-  _S_do_it(const _Alloc&, const _Alloc&)
-  { return false; }
-};
-
-  template
-struct __alloc_neq<_Alloc, false>
-{
-  static bool
-  _S_do_it(const _Alloc& __one, const _Alloc& __two)
-  { return __one != __two; }
-};
-
 #if __cplusplus >= 201103L
   template,
diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index b8bd46191fc..a73ca60df5a 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -2026,10 +2026,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   // To implement the splice (and merge) bits of N1599.
   void
-  _M_check_equal_allocators(list& __x) _GLIBCXX_NOEXCEPT
+  _M_check_equal_allocators(const list& __x) _GLIBCXX_NOEXCEPT
   {
-   if (std::__alloc_neq::
-   _S_do_it(_M_get_Node_allocator(), __x._M_get_Node_allocator()))
+   if (_M_get_Node_allocator() != __x._M_get_Node_allocator())
  __builtin_abort();
   }
 
-- 
2.37.3
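
For readers who have not seen the removed helper, here is a rough sketch of
the two styles being compared.  The names alloc_neq, arena_alloc and
differ_directly are hypothetical, and the default template argument uses
std::is_empty as an assumption about how "stateless" might be detected; it
is not a copy of the libstdc++ code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for the removed helper: the primary template is
// used for empty (stateless) allocators and never compares them.
template<typename Alloc, bool = std::is_empty<Alloc>::value>
struct alloc_neq
{
  static bool do_it(const Alloc&, const Alloc&) { return false; }
};

// Stateful allocators fall back to a real comparison.
template<typename Alloc>
struct alloc_neq<Alloc, false>
{
  static bool do_it(const Alloc& one, const Alloc& two)
  { return one != two; }
};

// Toy stateful allocator-like type, used only to exercise the comparison.
struct arena_alloc
{
  int arena_id;
  friend bool operator==(const arena_alloc& a, const arena_alloc& b)
  { return a.arena_id == b.arena_id; }
  friend bool operator!=(const arena_alloc& a, const arena_alloc& b)
  { return !(a == b); }
};

// The direct style the patch switches to: just compare the allocators.
template<typename Alloc>
bool differ_directly(const Alloc& a, const Alloc& b)
{ return a != b; }
```

Both styles give the same answer for stateless and stateful allocators; the
only question is whether the compiler folds the inline comparison as well as
the hand-written shortcut, which is the assumption the commit message makes.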

[committed] libstdc++: Do not use nullptr in C++03-compatible code

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

This has to be valid as C++98/C++03.

libstdc++-v3/ChangeLog:

* include/debug/formatter.h [_GLIBCXX_DEBUG_BACKTRACE]
(_Error_formatter): Use 0 as null pointer constant.
---
 libstdc++-v3/include/debug/formatter.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/debug/formatter.h 
b/libstdc++-v3/include/debug/formatter.h
index b4b72383e22..f120163c6d4 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -609,8 +609,7 @@ namespace __gnu_debug
 , _M_function(__function)
 #if _GLIBCXX_HAVE_STACKTRACE
 # ifdef _GLIBCXX_DEBUG_BACKTRACE
-, _M_backtrace_state(
-  __glibcxx_backtrace_create_state(nullptr, 0, nullptr, nullptr))
+, _M_backtrace_state(__glibcxx_backtrace_create_state(0, 0, 0, 0))
 , _M_backtrace_full(&__glibcxx_backtrace_full)
 # else
 , _M_backtrace_state()
-- 
2.37.3

[committed] libstdc++: Fix Doxygen commands

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

Remove the bogus -D__allocator_base=std::__new_allocator macro
definition for Doxygen, because that's an alias template for C++11 and
later, not a macro.

Fix the @cond/@endcond pair that spans the end of an @addtogroup group.
Add another @endcond inside the group, and another @cond after it.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (PREDEFINED): Remove __allocator_base.
* include/bits/allocator.h: Fix nesting of Doxygen commands.
---
 libstdc++-v3/doc/doxygen/user.cfg.in  | 1 -
 libstdc++-v3/include/bits/allocator.h | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index 57270bdeb7a..834ad9e4fd5 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -2407,7 +2407,6 @@ PREDEFINED = __cplusplus=202002L \
  _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED \
  _GLIBCXX_HAVE_BUILTIN_LAUNDER \
 "_GLIBCXX_DOXYGEN_ONLY(X)=X " \
-__allocator_base=std::__new_allocator \
 __exception_ptr=__unspecified__ \
 
 # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
diff --git a/libstdc++-v3/include/bits/allocator.h 
b/libstdc++-v3/include/bits/allocator.h
index aec0b374fd1..28abf13eba9 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -265,6 +265,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef _Tp value_type;
   template allocator(const allocator<_Up>&) { }
 };
+  /// @endcond
 
   /// @} group allocator
 
@@ -278,6 +279,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Undefine.
 #undef __allocator_base
 
+  /// @cond undocumented
+
   // To implement Option 3 of DR 431.
   template
 struct __alloc_swap
-- 
2.37.3

Re: [PATCH] c++: modules ICE with typename friend declaration

2022-09-16 Thread Patrick Palka via Gcc-patches

On Fri, 16 Sep 2022, Nathan Sidwell wrote:

> Thanks, this looks right. Sigh, templates can mess up one's mental invariants!
> The test case should really be a foo_[ab].C kind, to test both sides of the
> streaming. Bonus points for using the template after importing.  And you need
> the dg-module-cmi annotation to check /and then delete/ the gcm file produced.

Aha, thanks very much for the pointers, I redid the testcase using
lang-3_[abc].C as an example.  How does the following look?

-- >8 --

gcc/cp/ChangeLog:

* module.cc (friend_from_decl_list): Don't consider
CLASSTYPE_TEMPLATE_INFO for a TYPENAME_TYPE friend.
(trees_in::read_class_def): Don't add to
CLASSTYPE_BEFRIENDING_CLASSES for a TYPENAME_TYPE friend.

gcc/testsuite/ChangeLog:

* g++.dg/modules/typename-friend_a.C: New test.
* g++.dg/modules/typename-friend_b.C: New test.
---
 gcc/cp/module.cc |  5 +++--
 gcc/testsuite/g++.dg/modules/typename-friend_a.C | 11 +++
 gcc/testsuite/g++.dg/modules/typename-friend_b.C |  6 ++
 3 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/typename-friend_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/typename-friend_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f27f4d091e5..1a1ff5be574 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -4734,7 +4734,8 @@ friend_from_decl_list (tree frnd)
   if (TYPE_P (frnd))
{
  res = TYPE_NAME (frnd);
- if (CLASSTYPE_TEMPLATE_INFO (frnd))
+ if (CLASS_TYPE_P (frnd)
+ && CLASSTYPE_TEMPLATE_INFO (frnd))
tmpl = CLASSTYPE_TI_TEMPLATE (frnd);
}
   else if (DECL_TEMPLATE_INFO (frnd))
@@ -12121,7 +12122,7 @@ trees_in::read_class_def (tree defn, tree 
maybe_template)
{
  tree f = TREE_VALUE (friend_classes);
 
- if (TYPE_P (f))
+ if (CLASS_TYPE_P (f))
{
  CLASSTYPE_BEFRIENDING_CLASSES (f)
= tree_cons (NULL_TREE, type,
diff --git a/gcc/testsuite/g++.dg/modules/typename-friend_a.C 
b/gcc/testsuite/g++.dg/modules/typename-friend_a.C
new file mode 100644
index 000..aa426fe6cf0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/typename-friend_a.C
@@ -0,0 +1,11 @@
+// { dg-additional-options "-fmodules-ts" }
+export module foo;
+// { dg-module-cmi foo }
+
+template
+struct A {
+  friend typename T::type;
+  friend void f(A) { }
+private:
+  static constexpr int value = 42;
+};
diff --git a/gcc/testsuite/g++.dg/modules/typename-friend_b.C 
b/gcc/testsuite/g++.dg/modules/typename-friend_b.C
new file mode 100644
index 000..97da9d82e11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/typename-friend_b.C
@@ -0,0 +1,6 @@
+// { dg-additional-options "-fmodules-ts" }
+module foo;
+
+struct C;
+struct B { using type = C; };
+struct C { static_assert(A::value == 42); };
-- 
2.38.0.rc0

> 
> nathan
> 
> On Thu, Sep 15, 2022, 22:16 Patrick Palka  wrote:
>   A couple of xtreme-header-* modules tests began ICEing in C++23 mode
>   ever since r13-2650-g5d84a4418aa962 introduced into  the
>   dependently scoped friend declaration
> 
>     friend /* typename */ _OuterIter::value_type;
> 
>   ultimately because the streaming code assumes a TYPE_P friend must
>   be a class type, but here it's a TYPENAME_TYPE, which doesn't have
>   a TEMPLATE_INFO or CLASSTYPE_BEFRIENDING_CLASSES.  This patch tries
>   to correct this in a minimal way.
> 
>   Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
>   gcc/cp/ChangeLog:
> 
>           * module.cc (friend_from_decl_list): Don't consider
>           CLASSTYPE_TEMPLATE_INFO for a TYPENAME_TYPE friend.
>           (trees_in::read_class_def): Don't add to
>           CLASSTYPE_BEFRIENDING_CLASSES for a TYPENAME_TYPE friend.
> 
>   gcc/testsuite/ChangeLog:
> 
>           * g++.dg/modules/typename-friend.C: New test.
>   ---
>    gcc/cp/module.cc                               | 5 +++--
>    gcc/testsuite/g++.dg/modules/typename-friend.C | 9 +
>    2 files changed, 12 insertions(+), 2 deletions(-)
>    create mode 100644 gcc/testsuite/g++.dg/modules/typename-friend.C
> 
>   diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
>   index f27f4d091e5..1a1ff5be574 100644
>   --- a/gcc/cp/module.cc
>   +++ b/gcc/cp/module.cc
>   @@ -4734,7 +4734,8 @@ friend_from_decl_list (tree frnd)
>          if (TYPE_P (frnd))
>           {
>             res = TYPE_NAME (frnd);
>   -         if (CLASSTYPE_TEMPLATE_INFO (frnd))
>   +         if (CLASS_TYPE_P (frnd)
>   +             && CLASSTYPE_TEMPLATE_INFO (frnd))
>               tmpl = CLASSTYPE_TI_TEMPLATE (frnd);
>           }
>          else if (DECL_TEMPLATE_INFO (frnd))
>   @@ -12121,7 +12122,7 @@ trees_in::read_class_def (tree

Re: [PATCH] c++: constraint matching, TEMPLATE_ID_EXPR, current inst

2022-09-16 Thread Patrick Palka via Gcc-patches

On Fri, 16 Sep 2022, Patrick Palka wrote:

> On Fri, 16 Sep 2022, Jason Merrill wrote:
> 
> > On 9/15/22 11:58, Patrick Palka wrote:
> > > Here we're crashing during constraint matching for the instantiated
> > > hidden friends due to two issues with dependent substitution into a
> > > TEMPLATE_ID_EXPR naming a template from the current instantiation
> > > (as performed from maybe_substitute_reqs_for for C<3> with T=T):
> > > 
> > >* tsubst_copy substitutes into such a TEMPLATE_DECL by looking it
> > >  up from the substituted class scope.  But for this to not fail when
> > >  the args are dependent, we need to pass entering_scope=true for the
> > >  class scope substitution so that we obtain the primary template type
> > >  A (which has TYPE_BINFO) instead of the implicit instantiation
> > >  A (which doesn't).
> > >* lookup_and_finish_template_variable shouldn't instantiate a
> > >  TEMPLATE_ID_EXPR that names a TEMPLATE_DECL which has more than
> > >  one level of (unsubstituted) parameters (such as A::C).
> > > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * pt.cc (lookup_and_finish_template_variable): Don't
> > >   instantiate if the template's scope is dependent.
> > >   (tsubst_copy) : Pass entering_scope=true
> > >   when substituting the class scope.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/cpp2a/concepts-friend10.C: New test.
> > > ---
> > >   gcc/cp/pt.cc  | 14 +++--
> > >   .../g++.dg/cpp2a/concepts-friend10.C  | 21 +++
> > >   2 files changed, 29 insertions(+), 6 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
> > > 
> > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > index db4e808adec..bfcbe0b8670 100644
> > > --- a/gcc/cp/pt.cc
> > > +++ b/gcc/cp/pt.cc
> > > @@ -10475,14 +10475,15 @@ tree
> > >   lookup_and_finish_template_variable (tree templ, tree targs,
> > >tsubst_flags_t complain)
> > >   {
> > > -  templ = lookup_template_variable (templ, targs);
> > > -  if (!any_dependent_template_arguments_p (targs))
> > > +  tree var = lookup_template_variable (templ, targs);
> > > +  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
> > > +  && !any_dependent_template_arguments_p (targs))
> > 
> > I notice that finish_id_expression_1 uses the equivalent of
> > type_dependent_expression_p (var).  Does that work here?
> 
> Hmm, it does, but kind of by accident: type_dependent_expression_p
> returns true for all variable TEMPLATE_ID_EXPRs because of their empty
> TREE_TYPE (as set by finish_template_variable).

... as set by lookup_template_variable, rather

> So testing t_d_e_p here
> is equivalent to testing processing_template_decl, it seems -- maximally
> conservative.
> 
> We can improve type_dependent_expression_p for variable TEMPLATE_ID_EXPR
> by ignoring its (always empty) TREE_TYPE and just considering dependence
> of its template and args directly.
> 
> Doing so exposes that value_dependent_expression_p is wrong for
> (non-type-dependent) variable template specializations -- since we don't
> set/track DECL_DEPENDENT_INIT_P for them, the VAR_DECL branch ends up
> returning false even if the initializer depends on outer args.  Instead,
> I suppose we can give a reasonably conservative answer by considering
> dependence of its enclosing scope as we do for FUNCTION_DECL.
> 
> Does the following seem reasonable?  Bootstrapped and regtested on
> x86_64-pc-linux-gnu.
> 
> -- >8 --
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (finish_template_variable): Consider only the innermost
>   template parms since we have only the innermost args.
>   (lookup_and_finish_template_variable): Check
>   type_dependent_expression_p instead.
>   (tsubst_copy) : Pass entering_scope=true
>   when substituting the class scope.
>   (value_dependent_expression_p) : Move below ...
>   : ... here.  Fall through for variable template
>   specializations.
>   (type_dependent_expression_p): Handle variable TEMPLATE_ID_EXPR
>   precisely.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/noexcept1.C: Expect another ahead of time error.
>   * g++.dg/cpp1y/var-templ70.C: New test.
>   * g++.dg/cpp2a/concepts-friend10.C: New test.
> ---
>  gcc/cp/pt.cc  | 53 ---
>  gcc/testsuite/g++.dg/cpp1y/noexcept1.C|  2 +-
>  gcc/testsuite/g++.dg/cpp1y/var-templ70.C  | 22 
>  .../g++.dg/cpp2a/concepts-friend10.C  | 24 +
>  4 files changed, 82 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index db4e808adec..88a09891a00 100644
> --- a/gcc/cp/pt.cc

Re: [PATCH] c++: constraint matching, TEMPLATE_ID_EXPR, current inst

2022-09-16 Thread Patrick Palka via Gcc-patches

On Fri, 16 Sep 2022, Jason Merrill wrote:

> On 9/15/22 11:58, Patrick Palka wrote:
> > Here we're crashing during constraint matching for the instantiated
> > hidden friends due to two issues with dependent substitution into a
> > TEMPLATE_ID_EXPR naming a template from the current instantiation
> > (as performed from maybe_substitute_reqs_for for C<3> with T=T):
> > 
> >* tsubst_copy substitutes into such a TEMPLATE_DECL by looking it
> >  up from the substituted class scope.  But for this to not fail when
> >  the args are dependent, we need to pass entering_scope=true for the
> >  class scope substitution so that we obtain the primary template type
> >  A (which has TYPE_BINFO) instead of the implicit instantiation
> >  A (which doesn't).
> >* lookup_and_finish_template_variable shouldn't instantiate a
> >  TEMPLATE_ID_EXPR that names a TEMPLATE_DECL which has more than
> >  one level of (unsubstituted) parameters (such as A::C).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (lookup_and_finish_template_variable): Don't
> > instantiate if the template's scope is dependent.
> > (tsubst_copy) : Pass entering_scope=true
> > when substituting the class scope.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-friend10.C: New test.
> > ---
> >   gcc/cp/pt.cc  | 14 +++--
> >   .../g++.dg/cpp2a/concepts-friend10.C  | 21 +++
> >   2 files changed, 29 insertions(+), 6 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index db4e808adec..bfcbe0b8670 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -10475,14 +10475,15 @@ tree
> >   lookup_and_finish_template_variable (tree templ, tree targs,
> >  tsubst_flags_t complain)
> >   {
> > -  templ = lookup_template_variable (templ, targs);
> > -  if (!any_dependent_template_arguments_p (targs))
> > +  tree var = lookup_template_variable (templ, targs);
> > +  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
> > +  && !any_dependent_template_arguments_p (targs))
> 
> I notice that finish_id_expression_1 uses the equivalent of
> type_dependent_expression_p (var).  Does that work here?

Hmm, it does, but kind of by accident: type_dependent_expression_p
returns true for all variable TEMPLATE_ID_EXPRs because of their empty
TREE_TYPE (as set by finish_template_variable).  So testing t_d_e_p here
is equivalent to testing processing_template_decl, it seems -- maximally
conservative.

We can improve type_dependent_expression_p for variable TEMPLATE_ID_EXPR
by ignoring its (always empty) TREE_TYPE and just considering dependence
of its template and args directly.

Doing so exposes that value_dependent_expression_p is wrong for
(non-type-dependent) variable template specializations -- since we don't
set/track DECL_DEPENDENT_INIT_P for them, the VAR_DECL branch ends up
returning false even if the initializer depends on outer args.  Instead,
I suppose we can give a reasonably conservative answer by considering
dependence of its enclosing scope as we do for FUNCTION_DECL.

Does the following seem reasonable?  Bootstrapped and regtested on
x86_64-pc-linux-gnu.

-- >8 --

gcc/cp/ChangeLog:

* pt.cc (finish_template_variable): Consider only the innermost
template parms since we have only the innermost args.
(lookup_and_finish_template_variable): Check
type_dependent_expression_p instead.
(tsubst_copy) : Pass entering_scope=true
when substituting the class scope.
(value_dependent_expression_p) : Move below ...
: ... here.  Fall through for variable template
specializations.
(type_dependent_expression_p): Handle variable TEMPLATE_ID_EXPR
precisely.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/noexcept1.C: Expect another ahead of time error.
* g++.dg/cpp1y/var-templ70.C: New test.
* g++.dg/cpp2a/concepts-friend10.C: New test.
---
 gcc/cp/pt.cc  | 53 ---
 gcc/testsuite/g++.dg/cpp1y/noexcept1.C|  2 +-
 gcc/testsuite/g++.dg/cpp1y/var-templ70.C  | 22 
 .../g++.dg/cpp2a/concepts-friend10.C  | 24 +
 4 files changed, 82 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index db4e808adec..88a09891a00 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10447,10 +10447,10 @@ finish_template_variable (tree var, tsubst_flags_t 
complain)
   tree templ = TREE_OPERAND (var, 0);
   tree arglist = TREE_OPERAND (var, 1);

-  tree parms = DECL_TEMPLATE_PARMS (templ);
-  arglist =

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/16/22 06:19, Thomas Neumann wrote:

The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a fast-path that avoided taking the mutex if no
frame was registered at all.

This commit eliminates both the mutex and the sorted list from
the atomic fast path, and replaces it with a btree that uses
optimistic lock coupling during lookup. This allows for fully parallel
unwinding and is essential to scale exception handling to large
core counts.
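
As a rough illustration of the version-lock idea behind optimistic lock coupling, here is a minimal C++ sketch. It is not the code from the patch: the type name VersionLock and its member functions are hypothetical, and it assumes the bit layout described in the patch comments (lowest bit marks an exclusive lock, second bit marks waiters, remaining bits count changes).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a version lock: bit 0 = exclusively locked,
// bit 1 = waiters (unused in this sketch), remaining bits = change counter.
struct VersionLock {
  std::atomic<std::uintptr_t> word{0};

  // Try to take the exclusive lock without blocking.
  bool try_lock_exclusive() {
    std::uintptr_t state = word.load();
    if (state & 1)
      return false;
    return word.compare_exchange_strong(state, state | 1);
  }

  // Release the exclusive lock and bump the counter so that readers
  // holding an older snapshot fail validation.
  void unlock_exclusive() {
    std::uintptr_t state = word.load();
    word.store((state + 4) & ~static_cast<std::uintptr_t>(3));
  }

  // Begin an optimistic read; fails while a writer holds the lock.
  bool lock_optimistic(std::uintptr_t &snapshot) const {
    snapshot = word.load();
    return !(snapshot & 1);
  }

  // An optimistic read is valid only if the word has not changed.
  bool validate(std::uintptr_t snapshot) const {
    return word.load() == snapshot;
  }
};
```

A reader calls lock_optimistic, reads the node, and then calls validate; if validation fails, it restarts the lookup instead of ever blocking a writer.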

Changes since v3:
- Avoid code duplication by adding query mode to classify_object_over_fdes
- Adjust all comments as requested

libgcc/ChangeLog:

     * unwind-dw2-fde.c (release_registered_frames): Cleanup at 
shutdown.
     (__register_frame_info_table_bases): Use btree in atomic fast 
path.

     (__deregister_frame_info_bases): Likewise.
     (_Unwind_Find_FDE): Likewise.
     (base_from_object): Make parameter const.
     (classify_object_over_fdes): Add query-only mode.
     (get_pc_range): Compute PC range for lookup.
     * unwind-dw2-fde.h (last_fde): Make parameter const.
     * unwind-dw2-btree.h: New file.
---
  libgcc/unwind-dw2-btree.h | 953 ++
  libgcc/unwind-dw2-fde.c   | 195 ++--
  libgcc/unwind-dw2-fde.h   |   2 +-
  3 files changed, 1098 insertions(+), 52 deletions(-)
  create mode 100644 libgcc/unwind-dw2-btree.h

diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h
new file mode 100644
index 000..8853f0eab48
--- /dev/null
+++ b/libgcc/unwind-dw2-btree.h
@@ -0,0 +1,953 @@
+/* Lock-free btree for manually registered unwind frames.  */
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Thomas Neumann
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#ifndef GCC_UNWIND_DW2_BTREE_H
+#define GCC_UNWIND_DW2_BTREE_H
+
+#include 
+
+// Common logic for version locks.
+struct version_lock
+{
+  // The lock itself. The lowest bit indicates an exclusive lock,
+  // the second bit indicates waiting threads. All other bits are
+  // used as counter to recognize changes.
+  // Overflows are okay here, we must only prevent overflow to the
+  // same value within one lock_optimistic/validate
+  // range. Even on 32 bit platforms that would require 1 billion
+  // frame registrations within the time span of a few assembler
+  // instructions.
+  uintptr_t version_lock;
+};
+
+#ifdef __GTHREAD_HAS_COND
+// We should never get contention within the tree as it rarely changes.
+// But if we ever do get contention we use these for waiting.
+static __gthread_mutex_t version_lock_mutex = __GTHREAD_MUTEX_INIT;
+static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT;
+#endif
+
+// Initialize in locked state.
+static inline void
+version_lock_initialize_locked_exclusive (struct version_lock *vl)
+{
+  vl->version_lock = 1;
+}
+
+// Try to lock the node exclusive.
+static inline bool
+version_lock_try_lock_exclusive (struct version_lock *vl)
+{
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (state & 1)
+    return false;
+  return __atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+                                      false, __ATOMIC_SEQ_CST,
+                                      __ATOMIC_SEQ_CST);
+}
+
+// Lock the node exclusive, blocking as needed.
+static void
+version_lock_lock_exclusive (struct version_lock *vl)
+{
+#ifndef __GTHREAD_HAS_COND
+restart:
+#endif
+
+  // We should virtually never get contention here, as frame
+  // changes are rare.
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (!(state & 1))
+    {
+      if (__atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+                                       false, __ATOMIC_SEQ_CST,
+                                       __ATOMIC_SEQ_CST))
+        return;
+    }
+
+    // We did get contention, wait properly.
+#ifdef __GTHREAD_HAS_COND
+

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-16 Thread Alexander Monakov via Gcc-patches

On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote:

> On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
>  wrote:
> >
> >
> > On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > > There's a peephole2, submitted in the 1990s, which splits cmp mem, 0 into
> > > load mem, reg + test reg, reg. I don't know the exact reason why GCC does this.
> > >
> > > For latest x86 processors, ciscization should help processor frontend
> > > also codesize, for processor backend, they should be the same(has same
> > > uops).
> > >
> > > So the patch deletes the peephole2, and also modifies another splitter to
> > > generate more cmp mem, 0 for 32-bit targets.
> > >
> > > It will help instruction fetch.
> > >
> > > minmax-1.c, minmax-2.c, minmax-10.c and pr96891.c are supposed to scan
> > > that there's no comparison to 1 or -1, so adjust the testcases since for
> > > 32-bit targets we now generate cmp mem, 0 instead of load + test.
> > >
> > > Similar for pr78035.c.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > No performance impact for SPEC2017 on ICX/Znver3.
> > >
> > It was almost certainly for PPro/P2 given it was rth's work from
> > 1999.  Probably should have been conditionalized on PPro/P2 at the
> > time.   No worries losing it now...
> 
> Please add a tune flag in x86-tune.def under "Historical relics" and
> use it in the relevant peephole2 instead of deleting it.

When the next instruction after 'load mem; test reg, reg' is a conditional
branch, this disables macro-op fusion because Intel CPUs do not macro-fuse
'cmp mem, imm; jcc'.

It would be nice to rephrase the commit message to acknowledge this (the
statement 'has same uops' is not always true with this considered).

AMD CPUs can fuse some 'cmp mem, imm; jcc' under some conditions, so this
should be beneficial for AMD.

Alexander
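
To make the two instruction sequences under discussion concrete, here is a small C++ sketch. The function name is_zero is hypothetical, and the assembly in the comments is an illustrative AT&T-syntax rendering of the two forms, not actual compiler output; which form a given GCC emits depends on the version and tuning flags.

```cpp
#include <cassert>

// Returns true when the int stored at *p is zero.  On x86 the branch on
// this comparison can be emitted in either of two ways:
//
//   cmpl  $0, (%rdi)      // compare directly against memory
//   je    .Lzero
//
//   movl  (%rdi), %eax    // load, then test the register
//   testl %eax, %eax
//   je    .Lzero
//
// The first form is smaller; per the message above, the second form is
// the one Intel CPUs can macro-fuse with the conditional branch.
bool is_zero(const int *p) {
  return *p == 0;
}
```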

Re: [PATCH] Rewrite NAN and sign handling in frange

2022-09-16 Thread Aldy Hernandez via Gcc-patches

On Fri, Sep 16, 2022 at 10:33 AM Richard Sandiford
 wrote:
>
> Aldy Hernandez via Gcc-patches  writes:
> > On Thu, Sep 15, 2022 at 9:06 AM Richard Biener
> >  wrote:
> >>
> >> On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez  wrote:
> >> >
> >> > Hi Richard.  Hi all.
> >> >
> >> > The attached patch rewrites the NAN and sign handling, dropping both
> >> > tristates in favor of a pair of boolean flags for NANs, and nothing at
> >> > all for signs.  The signs are tracked in the range itself, so now it's
> >> > possible to describe things like [-0.0, +0.0] +NAN, [+0, +0], [-5, +0],
> >> > [+0, 3] -NAN, etc.
> >> >
> >> > There are a lot of changes, as the tristate was quite pervasive.  I
> >> > could use another pair of eyes.  The code IMO is cleaner and handles
> >> > all the cases we discussed.
> >> >
> >> > Here is an example of the various ranges and how they are displayed:
> >> >
> >> > [frange] float VARYING NAN ;; Varying includes NAN
> >> > [frange] UNDEFINED  ;; Empty set as always
> >> > [frange] float [] NAN   ;; Unknown sign NAN
> >> > [frange] float [] -NAN  ;; -NAN
> >> > [frange] float [] +NAN  ;; +NAN
> >> > [frange] float [-0.0, 0.0]  ;; All zeros.
> >> > [frange] float [-0.0, -0.0] NAN ;; -0 or NAN.
> >> > [frange] float [-5.0e+0, -1.0e+0] +NAN  ;; [-5, -1] or +NAN
> >> > [frange] float [-5.0e+0, -0.0] NAN  ;; [-5, -0] or +-NAN
> >> > [frange] float [-5.0e+0, -0.0]  ;; [-5, -0]
> >> > [frange] float [5.0e+0, 1.0e+1] ;; [5, 10]
> >> >
> >> > We could represent an unknown sign with +NAN -NAN if preferred.
> >>
> >> maybe -+NAN or +-NAN?  I prefer to somehow show both signs for clarity
> >
> > Sure.
> >
> >>
> >> >
> >> > Notice the NAN signs are decoupled from the range, so we can represent
> >> > a negative range with a positive NAN.  For this range,
> >> > frange::known_bit() would return false, as only when the signs of the
> >> > NANs and range agree can we be certain.
> >> >
> >> > There is no longer any pessimization of ranges for intersects
> >> > involving NANs.  Also, union and intersect work with signed zeros:
> >> >
> >> > //   [-0,  x] U [+0,  x] => [-0,  x]
> >> > //   [ x, -0] U [ x, +0] => [ x, +0]
> >> > //   [-0,  x] ^ [+0,  x] => [+0,  x]
> >> > //   [ x, -0] ^ [ x, +0] => [ x, -0]
> >> >
> >> > The special casing for signed zeros in the singleton code is gone in
> >> > favor of just making sure the signs in the range agree, that is
> >> > [-0, -0] for example.
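
The four signed-zero rules quoted above can be sketched with a toy interval type. This is only an illustration under stated assumptions: Interval, less_with_sign, unite and intersect are hypothetical names, the type is not GCC's frange, NANs are ignored, and the two intervals are assumed to overlap.

```cpp
#include <cassert>
#include <cmath>

// Toy closed interval of doubles; a stand-in for illustration only.
struct Interval { double lo, hi; };

// Order doubles as usual, except that -0.0 sorts before +0.0.
static bool less_with_sign(double a, double b) {
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) && !std::signbit(b);
  return a < b;
}

static double min_s(double a, double b) { return less_with_sign(b, a) ? b : a; }
static double max_s(double a, double b) { return less_with_sign(a, b) ? b : a; }

// Union widens both ends; intersection narrows them.
Interval unite(Interval a, Interval b) {
  return {min_s(a.lo, b.lo), max_s(a.hi, b.hi)};
}
Interval intersect(Interval a, Interval b) {
  return {max_s(a.lo, b.lo), min_s(a.hi, b.hi)};
}
```

Because -0.0 == +0.0 compares equal, an ordinary min/max would pick an arbitrary zero; ordering the zeros by sign bit is what lets the endpoints carry the sign information.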
> >> >
> >> > I have removed the idea that a known NAN is a "range", so a NAN is no
> >> > longer in the endpoints itself.  Requesting the bound of a known NAN
> >> > is a hard fail.  For that matter, we don't store the actual NAN in the
> >> > range.  The only information we have are the set of boolean flags.
> >> > This way we make sure nothing seeps into the frange.  This also means
> >> > it's explicit that we don't track anything but the sign in NANs.  We
> >> > can revisit this if we desire to track signalling or whatever
> >> > concoction y'all can imagine.
> >> >
> >> > All in all, I'm quite happy with this.  It does look better, and we
> >> > handle all the corner cases we couldn't before.  Thanks for the
> >> > suggestion.
> >> >
> >> > Regstrapped with mpfr tests on x86-64 and ppc64le Linux.  Selftests
> >> > were also run with -ffinite-math-only on x86-64.
> >> >
> >> > At Jakub's suggestion, I built lapack with associated tests.  They
> >> > pass on x86-64 and ppc64le Linux with no regressions from mainline.
> >> > As a sanity check, I also ran them for -ffinite-math-only on x86 which
> >> > (as expected) returned:
> >> >
> >> > NaN arithmetic did not perform per the ieee spec
> >> >
> >> > Otherwise, all tests pass for -ffinite-math-only.
> >> >
> >> > How does this look?
> >>
> >> Overall it looks good.
> >>
> >> Reading ::intersect and ::union I find it less clear to spread out the _nan
> >> cases into separate functions.
> >
> > OK, will inline them.
> >
> >>
> >> Can you add a comment to frange that its representation is
> >> a single value-range specified by m_type, m_min, m_max
> >> unioned with the set of { -NaN, +NaN }?  Because somehow
> >> the ::undefined_p vs. m_type == VR_UNDEFINED checks are
> >> a bit confusing to the occasional reader can we instead use
> >> ::nan_p to complement ::undefined_p?
> >
> > Wouldn't that just make nan_p the same as known_nan?  Speaking of
> > which, I'm not a big fan of known_nan.  Perhaps we should rename all
> > the known_foo variants to foo_p variants?  Or...maybe even:
> >
> >   // fpclassify like API
> >   bool isfinite () const;
> >   bool isinf () const;
> >   bool maybe_isinf () const;
> >   bool isnan () const;
> >   bool maybe_isnan () const;
> >   bool signbit_p (bool ) const;
> >
> > That would make it clear how they map to the fpclassify API.  And the
> > signbit_p() follows what we do for

[pushed] c++: member fn in omp loc list [PR106858]

2022-09-16 Thread Jason Merrill via Gcc-patches

this->f names a member function, which isn't an addressable lvalue.  Give a
helpful error instead of crashing.  The first hunk makes the error range
cover the whole expression.
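
For readers unfamiliar with the distinction, here is a small C++ sketch (outside OpenMP) of why a member function name is not an addressable lvalue while a data member is. The struct and helper names are hypothetical and only illustrate the language rule.

```cpp
#include <cassert>

struct A {
  int data = 0;
  int f() { return 42; }
};

// A data member access such as a.data is an lvalue with an address.
int *address_of_data(A &a) { return &a.data; }

// An expression like a.f or this->f names a member function and can only
// be called; the function itself is referred to with a pointer to member.
int call_via_pointer_to_member(A &a) {
  int (A::*pmf)() = &A::f;
  return (a.*pmf)();
}
```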

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/106858

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_var_list_no_open): Pass the
initial token location down.
* semantics.cc (finish_omp_clauses): Check
invalid_nonstatic_memfn_p.
* typeck.cc (invalid_nonstatic_memfn_p): Handle null TREE_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/map-3.C: New test.
---
 gcc/cp/parser.cc  | 7 +++
 gcc/cp/semantics.cc   | 4 
 gcc/cp/typeck.cc  | 3 ++-
 gcc/testsuite/g++.dg/gomp/map-3.C | 9 +
 4 files changed, 18 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/map-3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 841ba6ed997..3cbe0d69de1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -36938,10 +36938,9 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, 
enum omp_clause_code kind,
  cp_id_kind idk = CP_ID_KIND_NONE;
  cp_lexer_consume_token (parser->lexer);
  decl = convert_from_reference (decl);
- decl
-   = cp_parser_postfix_dot_deref_expression (parser, ttype,
- decl, false,
- , loc);
+ decl = (cp_parser_postfix_dot_deref_expression
+ (parser, ttype, cp_expr (decl, token->location),
+  false, , loc));
}
  /* FALLTHROUGH.  */
case OMP_CLAUSE_AFFINITY:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 427b1ab5ebc..86562071612 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -8119,6 +8119,10 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
  t = TREE_OPERAND (t, 1);
  STRIP_NOPS (t);
}
+ if (TREE_CODE (t) == COMPONENT_REF
+ && invalid_nonstatic_memfn_p (EXPR_LOCATION (t), t,
+   tf_warning_or_error))
+   remove = true;
  indir_component_ref_p = false;
  if (TREE_CODE (t) == COMPONENT_REF
  && (TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 3e461d5cdcb..22d834d3a58 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -2196,7 +2196,8 @@ invalid_nonstatic_memfn_p (location_t loc, tree expr, 
tsubst_flags_t complain)
 return false;
   if (is_overloaded_fn (expr) && !really_overloaded_fn (expr))
 expr = get_first_fn (expr);
-  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (expr))
+  if (TREE_TYPE (expr)
+  && DECL_NONSTATIC_MEMBER_FUNCTION_P (expr))
 {
   if (complain & tf_error)
{
diff --git a/gcc/testsuite/g++.dg/gomp/map-3.C 
b/gcc/testsuite/g++.dg/gomp/map-3.C
new file mode 100644
index 000..c45f8509521
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gomp/map-3.C
@@ -0,0 +1,9 @@
+// PR c++/106858
+// { dg-additional-options "-fopenmp -fsanitize=undefined" }
+
+class A {
+  void f() {
+#pragma omp target map(this->f) // { dg-error "member function" }
+;
+  }
+};

base-commit: 3e8c4b925a9825fdb8c81f47b621f63108894362
-- 
2.31.1

Re: [PATCH] testsuite: Disable zero-scratch-regs-{7, 9, 11}.c on arm

2022-09-16 Thread Torbjorn SVENSSON via Gcc-patches


Hi all,

It appears that this is only a problem for GCC 11 (and perhaps GCC 12?).
Master already has the needed implementation, so the patch below is not
needed.


Sorry for the buzz.

Kind regards,
Torbjörn

On 2022-09-15 08:54, Torbjörn SVENSSON wrote:

-fzero-call-used-regs=all and -fzero-call-used-regs=all-gpr are not
supported on arm*. On arm-none-eabi, the testcases fail with:

   sorry, unimplemented: '-fzero-call-used-regs' not supported on this target

2022-09-15  Torbjörn SVENSSON  

gcc/testsuite/ChangeLog:

* c-c++-common/zero-scratch-regs-7.c: Skip on arm.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
* c-c++-common/zero-scratch-regs-11.c: Likewise.

Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
  gcc/testsuite/c-c++-common/zero-scratch-regs-11.c | 2 +-
  gcc/testsuite/c-c++-common/zero-scratch-regs-7.c  | 1 +
  gcc/testsuite/c-c++-common/zero-scratch-regs-9.c  | 2 +-
  3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c 
b/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
index b7739b2c6f6..6fd2a1dc382 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* arm*-*-* nvptx*-*-* s390*-*-* loongarch64*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* nvptx*-*-* s390*-*-* loongarch64*-*-* } } } */
  /* { dg-options "-O2 -fzero-call-used-regs=all" } */
  
  #include "zero-scratch-regs-10.c"

diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-7.c 
b/gcc/testsuite/c-c++-common/zero-scratch-regs-7.c
index 2a4c8b2e73d..c684b4a02f9 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-7.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-7.c
@@ -1,5 +1,6 @@
  /* { dg-do run } */
  /* { dg-skip-if "not implemented" { ia64*-*-* } } */
+/* { dg-skip-if "not implemented" { arm*-*-* } } */
  /* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
  
  #include "zero-scratch-regs-1.c"

diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c 
b/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
index ea83bc146b7..0e8922053e8 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* arm*-*-* nvptx*-*-* s390*-*-* loongarch64*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
aarch64*-*-* nvptx*-*-* s390*-*-* loongarch64*-*-* } } } */
  /* { dg-options "-O2 -fzero-call-used-regs=all" } */
  
  #include "zero-scratch-regs-1.c"

[PATCH v2] MIPS: improve -march=native arch detection

2022-09-16 Thread YunQiang Su

If we cannot get the CPU from the options or /proc/cpuinfo, we fall back to:
  1. getauxval(AT_BASE_PLATFORM), available since Linux 5.7
  2. _MIPS_ARCH from the host compiler.

The -mnan=2008 option is also passed if __mips_nan2008 is defined.
This fixes the wrong loader being used on r5/r6 platforms with
-march=native.
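
The fallback order described above can be sketched as follows. This is a simplified illustration, not the driver code from the patch: pick_cpu, base_platform, native_option and the SKETCH_HAVE_GETAUXVAL macro are hypothetical names, and the sketch assumes a Linux libc that provides <sys/auxv.h>.

```cpp
#include <cassert>
#include <string>

#if defined(__linux__) && defined(__has_include)
#  if __has_include(<sys/auxv.h>)
#    include <sys/auxv.h>
#    define SKETCH_HAVE_GETAUXVAL 1
#  endif
#endif

// Return the first non-null candidate, mirroring the fallback order:
// /proc/cpuinfo, then the auxiliary vector, then the architecture the
// host compiler was built for.
const char *pick_cpu(const char *from_cpuinfo, const char *from_auxv,
                     const char *from_compiler) {
  if (from_cpuinfo)
    return from_cpuinfo;
  if (from_auxv)
    return from_auxv;
  return from_compiler;
}

// Query AT_BASE_PLATFORM where available; null otherwise.
const char *base_platform() {
#if defined(SKETCH_HAVE_GETAUXVAL) && defined(AT_BASE_PLATFORM)
  return reinterpret_cast<const char *>(getauxval(AT_BASE_PLATFORM));
#else
  return nullptr;
#endif
}

// Build an option such as "-march=<cpu>"; empty if no CPU is known.
std::string native_option(const char *which, const char *cpu) {
  return cpu ? std::string("-m") + which + "=" + cpu : std::string();
}
```

getauxval returns 0 when the kernel does not supply the requested entry, so a null result simply moves on to the next candidate.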

gcc/ChangeLog:
* config.gcc: set with_arch to default_mips_arch if no defined.
* config/mips/driver-native.cc (host_detect_local_cpu):
  try getauxval(AT_BASE_PLATFORM) and _MIPS_ARCH, too.
  pass -mnan=2008 if __mips_nan2008__ is defined.
* config.in: define HAVE_SYS_AUXV_H and HAVE_GETAUXVAL.
* configure.ac: detect sys/auxv.h and getauxval.
* configure: regenerated.
---
 gcc/config.gcc   |  2 ++
 gcc/config.in| 10 ++
 gcc/config/mips/driver-native.cc | 25 ++---
 gcc/configure|  4 ++--
 gcc/configure.ac |  4 ++--
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f4e757bd853..181a062825d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5590,6 +5590,8 @@ case ${target} in
esac
if test x$with_arch != x; then
default_mips_arch=$with_arch
+   else
+   with_arch=$default_mips_arch
fi
if test x$with_abi != x; then
default_mips_abi=$with_abi
diff --git a/gcc/config.in b/gcc/config.in
index 6ac17be189e..cc217b94e0c 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1939,6 +1939,12 @@
 #endif
 
 
+/* Define to 1 if you have the  header file. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_SYS_AUXV_H
+#endif
+
+
 /* Define to 1 if you have the  header file. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_SYS_FILE_H
@@ -2672,3 +2678,7 @@
 #undef vfork
 #endif
 
+/* Define to 1 if you have the `getauxval' function. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GETAUXVAL
+#endif
diff --git a/gcc/config/mips/driver-native.cc b/gcc/config/mips/driver-native.cc
index 47627f85ce1..327ad255c3e 100644
--- a/gcc/config/mips/driver-native.cc
+++ b/gcc/config/mips/driver-native.cc
@@ -23,6 +23,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#ifdef HAVE_SYS_AUXV_H
+#include 
+#endif
 
 /* This will be called by the spec parser in gcc.cc when it sees
a %:local_cpu_detect(args) construct.  Currently it will be called
@@ -41,6 +44,7 @@ const char *
 host_detect_local_cpu (int argc, const char **argv)
 {
   const char *cpu = NULL;
+  char *ret = NULL;
   char buf[128];
   FILE *f;
   bool arch;
@@ -54,7 +58,7 @@ host_detect_local_cpu (int argc, const char **argv)
 
   f = fopen ("/proc/cpuinfo", "r");
   if (f == NULL)
-return NULL;
+goto fallback_cpu;
 
   while (fgets (buf, sizeof (buf), f) != NULL)
 if (startswith (buf, "cpu model"))
@@ -84,8 +88,23 @@ host_detect_local_cpu (int argc, const char **argv)
 
   fclose (f);
 
+fallback_cpu:
+#if defined (__mips_nan2008)
+  ret = reconcat (ret, " -mnan=2008 ", NULL);
+#endif
+
+#ifdef HAVE_GETAUXVAL
   if (cpu == NULL)
-return NULL;
+cpu = (const char *) getauxval (AT_BASE_PLATFORM);
+#endif
+
+#if defined (_MIPS_ARCH)
+  if (cpu == NULL)
+cpu = _MIPS_ARCH;
+#endif
+
+  if (cpu)
+ret = reconcat (ret, ret, "-m", argv[0], "=", cpu, NULL);
 
-  return concat ("-m", argv[0], "=", cpu, NULL);
+  return ret;
 }
diff --git a/gcc/configure b/gcc/configure
index 817d765568e..a419ac66576 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -9327,7 +9327,7 @@ $as_echo "#define GWINSZ_IN_SYS_IOCTL 1" >>confdefs.h
 fi
 
 for ac_header in limits.h stddef.h string.h strings.h stdlib.h time.h iconv.h \
-fcntl.h ftw.h unistd.h sys/file.h sys/time.h sys/mman.h \
+fcntl.h ftw.h unistd.h sys/auxv.h sys/file.h sys/time.h 
sys/mman.h \
 sys/resource.h sys/param.h sys/times.h sys/stat.h 
sys/locking.h \
 direct.h malloc.h langinfo.h ldfcn.h locale.h wchar.h
 do :
@@ -10622,7 +10622,7 @@ fi
 for ac_func in times clock kill getrlimit setrlimit atoq \
popen sysconf strsignal getrusage nl_langinfo \
gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
-   clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked 
fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked  
 putchar_unlocked putc_unlocked madvise mallinfo mallinfo2 fstatat
+   clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked fputc_unlocked 
fputs_unlocked   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked  
 putchar_unlocked putc_unlocked madvise mallinfo mallinfo2 fstatat getauxval
 do :
   as_ac_var=`$as_echo

Re: [PATCH] c++: 'mutable' within constexpr [PR92505]

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/15/22 14:03, Patrick Palka wrote:

This patch permits accessing 'mutable' members of local objects during
constexpr evaluation (which other compilers seem to accept in C++14
mode, while we reject), while continuing to reject it for global objects
(as in the last line of cpp0x/constexpr-mutable1.C, which other
compilers also reject).  To distinguish between the two cases, it looks
like we just need to additionally check CONSTRUCTOR_MUTABLE_POISON
alongside DECL_MUTABLE_P in cxx_eval_component_reference before
rejecting the access.
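
A minimal sketch of the local-object case, modelled on the new cpp1y test in the patch. The function name bump_twice is hypothetical, and the version guard is an assumption that the change ships in GCC 13; older GCC releases are expected to reject the compile-time check.

```cpp
#include <cassert>

struct S { mutable int m; };

// The object s is local to the constant evaluation, so reading and
// writing its mutable member cannot observe any outside state.
constexpr int bump_twice() {
  S s = {40};
  s.m++;
  const S &cs = s;
  ++cs.m;  // allowed through a const reference because m is mutable
  return cs.m;
}

// Assumption: GCC 13 or later accepts this at compile time.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 13
static_assert(bump_twice() == 42, "local mutable member is usable");
#endif
```

A namespace-scope constexpr object with a mutable member is the case that stays rejected, since its mutable member could be changed at run time after the constant was evaluated.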

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/92505

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_component_reference): Test non_constant_p
earlier.  In C++14 or later, reject DECL_MUTABLE_P member
	accesses only if CONSTRUCTOR_MUTABLE_POISON is also set.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable3.C: New test.
* g++.dg/cpp1y/constexpr-mutable1.C: New test.
---
  gcc/cp/constexpr.cc | 11 +++
  gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C |  7 +++
  gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C | 16 
  3 files changed, 30 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 57283eabf3c..10639876d9c 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4088,6 +4088,8 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, 
tree t,
tree whole = cxx_eval_constant_expression (ctx, orig_whole,
 lval,
 non_constant_p, overflow_p);
+  if (*non_constant_p)
+return t;
if (INDIRECT_REF_P (whole)
&& integer_zerop (TREE_OPERAND (whole, 0)))
  {
@@ -4108,20 +4110,21 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, 
tree t,
whole, part, NULL_TREE);
/* Don't VERIFY_CONSTANT here; we only want to check that we got a
   CONSTRUCTOR.  */
-  if (!*non_constant_p && TREE_CODE (whole) != CONSTRUCTOR)
+  if (TREE_CODE (whole) != CONSTRUCTOR)
  {
if (!ctx->quiet)
error ("%qE is not a constant expression", orig_whole);
*non_constant_p = true;
+  return t;
  }
-  if (DECL_MUTABLE_P (part))
+  if ((cxx_dialect < cxx14 || CONSTRUCTOR_MUTABLE_POISON (whole))
+  && DECL_MUTABLE_P (part))
  {
if (!ctx->quiet)
error ("mutable %qD is not usable in a constant expression", part);
*non_constant_p = true;
+  return t;
  }
-  if (*non_constant_p)
-return t;
bool pmf = TYPE_PTRMEMFUNC_P (TREE_TYPE (whole));
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (whole), i, field, value)
  {
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
new file mode 100644
index 000..46c9d8437be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable3.C
@@ -0,0 +1,7 @@
+// PR c++/92505
+// { dg-do compile { target c++11 } }
+
+struct A { mutable int m; };
+constexpr int f(A a) { return a.m; }
+static_assert(f({42}) == 42, "");
+// { dg-error "non-constant|mutable" "" { target c++11_only } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C
new file mode 100644
index 000..6c47988c01a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-mutable1.C
@@ -0,0 +1,16 @@
+// PR c++/92505
+// { dg-do compile { target c++14 } }
+
+struct S { mutable int m; };
+
+static_assert(S{42}.m == 42, "");
+
+constexpr int f() {
+  S s = {40};
+  s.m++;
+  const auto& cs = s;
+  ++cs.m;
+  return cs.m;
+}
+
+static_assert(f() == 42, "");

Re: [PATCH] c++: constraint matching, TEMPLATE_ID_EXPR, current inst

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/15/22 11:58, Patrick Palka wrote:

Here we're crashing during constraint matching for the instantiated
hidden friends due to two issues with dependent substitution into a
TEMPLATE_ID_EXPR naming a template from the current instantiation
(as performed from maybe_substitute_reqs_for for C<3> with T=T):

   * tsubst_copy substitutes into such a TEMPLATE_DECL by looking it
 up from the substituted class scope.  But for this to not fail when
 the args are dependent, we need to pass entering_scope=true for the
 class scope substitution so that we obtain the primary template type
 A (which has TYPE_BINFO) instead of the implicit instantiation
 A (which doesn't).
   * lookup_and_finish_template_variable shouldn't instantiate a
 TEMPLATE_ID_EXPR that names a TEMPLATE_DECL which has more than
 one level of (unsubstituted) parameters (such as A::C).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't
instantiate if the template's scope is dependent.
(tsubst_copy) : Pass entering_scope=true
when substituting the class scope.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend10.C: New test.
---
  gcc/cp/pt.cc  | 14 +++--
  .../g++.dg/cpp2a/concepts-friend10.C  | 21 +++
  2 files changed, 29 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index db4e808adec..bfcbe0b8670 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10475,14 +10475,15 @@ tree
  lookup_and_finish_template_variable (tree templ, tree targs,
 tsubst_flags_t complain)
  {
-  templ = lookup_template_variable (templ, targs);
-  if (!any_dependent_template_arguments_p (targs))
+  tree var = lookup_template_variable (templ, targs);
+  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
+  && !any_dependent_template_arguments_p (targs))


I notice that finish_id_expression_1 uses the equivalent of 
type_dependent_expression_p (var).  Does that work here?



  {
-  templ = finish_template_variable (templ, complain);
-  mark_used (templ);
+  var = finish_template_variable (var, complain);
+  mark_used (var);
  }
  
-  return convert_from_reference (templ);

+  return convert_from_reference (var);
  }
  
  /* If the set of template parameters PARMS contains a template parameter

@@ -17282,7 +17283,8 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 TEMPLATE_DECL with `D' as its DECL_CONTEXT.  Now we
 have to substitute this with one having context `D'.  */
  
-	  tree context = tsubst (DECL_CONTEXT (t), args, complain, in_decl);

+ tree context = tsubst_aggr_type (DECL_CONTEXT (t), args, complain,
+  in_decl, /*entering_scope=*/true);
  return lookup_field (context, DECL_NAME(t), 0, false);
}
else


This hunk is OK.


diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
new file mode 100644
index 000..4b21a379f59
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend10.C
@@ -0,0 +1,21 @@
+// { dg-do compile { target c++20 } }
+// Verify we don't crash during constraint matching containing
+// a TEMPLATE_ID_EXPR referring to a template from the current
+// instantiation.
+
+template
+struct A {
+  template static constexpr bool C = sizeof(T) > N;
+  friend constexpr void f(A) requires C<3> { }
+  friend constexpr void f(A) requires C<3> || true { }
+};
+
+template
+struct A {
+  template static constexpr bool C = sizeof(T) > N;
+  friend constexpr void g(A) requires C<3> { }
+  friend constexpr void g(A) requires C<3> || true { }
+};
+
+template struct A;
+template struct A;

Re: [PATCH] Introduce -nolibstdc++ option

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/16/22 07:52, Jason Merrill wrote:

On 6/24/22 01:23, Alexandre Oliva via Gcc-patches wrote:

On Jun 23, 2022, Alexandre Oliva  wrote:


Here's the patch.  Regstrapped on x86_64-linux-gnu, also tested with a
cross to aarch64-rtems6.  Ok to install?



Introduce -nostdlib++ option


Uhh, I went ahead and installed this.  The earlier patch was approved if
nobody objected, and so, having overcome the objection to the option
spelling, it ended up in my "approved" patchset.

In case there are objections to it, please let me know, and I'll revert
it promptly, but I guess it makes little sense to revert it on the odd
chance that someone does.  Thanks for your understanding.


I'm getting failures from pure-virtual1.C with

xg++: error: unrecognized command-line option '-nostdlib++'

I guess that's because it isn't handled by the specs in the way nostdlib 
and nodefaultlibs are.  Maybe the solution is to set SKIPOPT in the driver?


Are you not seeing this problem?


Now of course I notice that it's been months since you installed the 
patch, I wonder what broke it...

Re: [PATCH] Introduce -nolibstdc++ option

2022-09-16 Thread Jason Merrill via Gcc-patches


On 6/24/22 01:23, Alexandre Oliva via Gcc-patches wrote:

On Jun 23, 2022, Alexandre Oliva  wrote:


Here's the patch.  Regstrapped on x86_64-linux-gnu, also tested with a
cross to aarch64-rtems6.  Ok to install?



Introduce -nostdlib++ option


Uhh, I went ahead and installed this.  The earlier patch was approved if
nobody objected, and so, having overcome the objection to the option
spelling, it ended up in my "approved" patchset.

In case there are objections to it, please let me know, and I'll revert
it promptly, but I guess it makes little sense to revert it on the odd
chance that someone does.  Thanks for your understanding.


I'm getting failures from pure-virtual1.C with

xg++: error: unrecognized command-line option '-nostdlib++'

I guess that's because it isn't handled by the specs in the way nostdlib 
and nodefaultlibs are.  Maybe the solution is to set SKIPOPT in the driver?


Are you not seeing this problem?

From e648ff579bb4b4a690553d4c6f4a3a1b7ff7a287 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Fri, 16 Sep 2022 13:51:49 +0200
Subject: [PATCH] c++: fix -nostdlib++
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* g++spec.cc (lang_specific_driver): Set SKIPOPT on -nostdlib++.
---
 gcc/cp/g++spec.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/g++spec.cc b/gcc/cp/g++spec.cc
index b63d8350ba1..31345f7869e 100644
--- a/gcc/cp/g++spec.cc
+++ b/gcc/cp/g++spec.cc
@@ -158,8 +158,10 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options,
 
   switch (decoded_options[i].opt_index)
 	{
-	case OPT_nostdlib:
 	case OPT_nostdlib__:
+	  args[i] |= SKIPOPT;
+	  /* FALLTHRU */
+	case OPT_nostdlib:
 	case OPT_nodefaultlibs:
 	  library = -1;
 	  break;
-- 
2.31.1

Re: [PATCH] c++: Implement P1467R9 - Extended floating-point types and standard names compiler part except for bfloat16 [PR106652]

2022-09-16 Thread Jason Merrill via Gcc-patches


On 9/12/22 04:05, Jakub Jelinek wrote:

Hi!

The following patch implements the compiler part of C++23
P1467R9 - Extended floating-point types and standard names compiler part
by introducing _Float{16,32,64,128} as keywords and builtin types
like they are implemented for C already since GCC 7.
It doesn't introduce _Float{32,64,128}x for C++, those remain C only
for now, mainly because 
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
has mangling for:
::= DF  _ # ISO/IEC TS 18661 binary floating point type _FloatN (N bits)
but doesn't for _FloatNx.  And it doesn't add anything for bfloat16_t
support, see below.
Regarding mangling, I think mangling _FloatNx as DF  x _ would be
possible, but would need to be discussed and voted in.


As you've seen, I opened a pull request for these.  I think we can go 
ahead and implement that and make sure it's resolved before the GCC 13 
release.


Or we could temporarily mangle them as an extension, i.e. u9_Float32x.

I would expect _Float64x, at least, to be fairly popular.


As there is no _FloatNx support for C++, I think it is wrong to announce
it through __FLT{32,64,128}X_*__ predefined macros (so the patch disables
those for C++; unfortunately g++ 7 to 12 will predefine those and also
__FLT{32,64,128}_*__ even when _FloatN support isn't implemented).



The patch wants to keep backwards compatibility with how __float128 has
been handled in C++ before, both for mangling and behavior in binary
operations, overload resolution etc.  So, there are some backend changes
where for C __float128 and _Float128 are the same type (float128_type_node
and float128t_type_node are the same pointer), but for C++ they are distinct
types which mangle differently and _Float128 is treated as extended
floating-point type while __float128 is treated as non-standard floating
point type.


How important do you think this backwards compatibility is?

As I mentioned in the ABI proposal, I think it makes sense to make 
__float128 an alias for std::float128_t, and continue using the current 
mangling for __float128.


I don't think we want the two types to have different semantics.  If we 
want to support existing __float128 code that relies on implicit 
narrowing conversions, we could allow them generally with a pedwarn 
using the 'bad' conversion machinery.  That's probably useful anyway for 
better diagnostics.



The various C++23 changes about how floating-point types
are changed are actually implemented as written in the spec only if at least
one of the types involved is _Float{16,32,64,128} and kept previous behavior
otherwise.  For float/double/long double the rules are actually written that
they behave the same as before.
There is some backwards incompatibility at least on x86 regarding _Float16,
because that type was already used by that name and with the DF16_ mangling
(but only since GCC 12 and I think it isn't that widely used in the wild
yet).  E.g. config/i386/avx512fp16intrin.h shows the issues, where
in C or in GCC 12 in C++ one could pass 0.0f to a builtin taking _Float16
argument, but with the changes that is not possible anymore, one needs
to either use 0.0f16 or (_Float16) 0.0f.
We have also a problem with glibc headers, where since glibc 2.27
math.h and complex.h aren't compilable with these changes.  One gets
errors like:
In file included from /usr/include/math.h:43,
  from abc.c:1:
/usr/include/bits/floatn.h:86:9: error: multiple types in one declaration
86 | typedef __float128 _Float128;
   | ^~
/usr/include/bits/floatn.h:86:20: error: declaration does not declare anything 
[-fpermissive]
86 | typedef __float128 _Float128;
   |^
In file included from /usr/include/bits/floatn.h:119:
/usr/include/bits/floatn-common.h:214:9: error: multiple types in one 
declaration
   214 | typedef float _Float32;
   | ^
/usr/include/bits/floatn-common.h:214:15: error: declaration does not declare 
anything [-fpermissive]
   214 | typedef float _Float32;
   |   ^~~~
/usr/include/bits/floatn-common.h:251:9: error: multiple types in one 
declaration
   251 | typedef double _Float64;
   | ^~
/usr/include/bits/floatn-common.h:251:16: error: declaration does not declare 
anything [-fpermissive]
   251 | typedef double _Float64;
   |^~~~
This is from snippets like:
/* The remaining of this file provides support for older compilers.  */
# if __HAVE_FLOAT128

/* The type _Float128 exists only since GCC 7.0.  */
#  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
typedef __float128 _Float128;
#  endif
where it hardcodes that C++ doesn't have _Float{16,32,64,128} support nor
{f,F}{16,32,64,128} literal suffixes nor _Complex _Float{16,32,64,128}.
The patch fixincludes this for now and hopefully if this is committed, then
glibc can change those.  Right now the patch changes those
#  if !__GNUC_PREREQ (7, 0) || defined __cplusplus

[PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-16 Thread Thomas Neumann via Gcc-patches


The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a fast-path that avoided taking the mutex if no
frame was registered at all.

This commit eliminates both the mutex and the sorted list from
the atomic fast path, and replaces it with a btree that uses
optimistic lock coupling during lookup. This allows for fully parallel
unwinding and is essential to scale exception handling to large
core counts.
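As a rough illustration of the optimistic read that such a version lock allows, here is a minimal C++ sketch. It is not the code from the patch: the names VersionLock, Node, read_payload and write_payload are hypothetical, the waiting-threads bit is left unused, and only the single-node read/validate step is shown, not the coupling across parent and child nodes during a btree descent.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not the patch's code.  Bit 0 marks an exclusive
// lock, bit 1 is reserved (unused here), the remaining bits count changes.
struct VersionLock {
  std::atomic<std::uintptr_t> word{0};
};

// Try to take the exclusive lock without blocking.
inline bool try_lock_exclusive(VersionLock &vl) {
  std::uintptr_t state = vl.word.load();
  if (state & 1)
    return false;
  return vl.word.compare_exchange_strong(state, state | 1);
}

// Release the exclusive lock and bump the counter so that optimistic
// readers that started earlier fail validation.
inline void unlock_exclusive(VersionLock &vl) {
  std::uintptr_t state = vl.word.load();
  assert(state & 1);
  vl.word.store((state + 4) & ~std::uintptr_t(3));
}

// Start an optimistic read; fails if a writer currently holds the lock.
inline bool lock_optimistic(const VersionLock &vl, std::uintptr_t &snapshot) {
  snapshot = vl.word.load();
  return (snapshot & 1) == 0;
}

// The read was consistent only if the lock word is unchanged.
inline bool validate(const VersionLock &vl, std::uintptr_t snapshot) {
  return vl.word.load() == snapshot;
}

struct Node {
  VersionLock lock;
  std::atomic<int> payload{0};  // atomic so the racy read is well defined
};

// Reader: never writes to shared state, retries if a writer interfered.
inline int read_payload(const Node &n) {
  for (;;) {
    std::uintptr_t snapshot;
    if (!lock_optimistic(n.lock, snapshot))
      continue;
    int value = n.payload.load(std::memory_order_relaxed);
    if (validate(n.lock, snapshot))
      return value;
  }
}

// Writer: spins for the exclusive lock, updates, then publishes.
inline void write_payload(Node &n, int value) {
  while (!try_lock_exclusive(n.lock)) {
  }
  n.payload.store(value, std::memory_order_relaxed);
  unlock_exclusive(n.lock);
}
```

The point of the design is that readers only load the lock word and never store to it, so concurrent lookups do not contend on a shared cache line the way they would on a mutex.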

Changes since v3:
- Avoid code duplication by adding query mode to classify_object_over_fdes
- Adjust all comments as requested

libgcc/ChangeLog:

* unwind-dw2-fde.c (release_registered_frames): Cleanup at shutdown.
(__register_frame_info_table_bases): Use btree in atomic fast path.
(__deregister_frame_info_bases): Likewise.
(_Unwind_Find_FDE): Likewise.
(base_from_object): Make parameter const.
(classify_object_over_fdes): Add query-only mode.
(get_pc_range): Compute PC range for lookup.
* unwind-dw2-fde.h (last_fde): Make parameter const.
* unwind-dw2-btree.h: New file.
---
 libgcc/unwind-dw2-btree.h | 953 ++
 libgcc/unwind-dw2-fde.c   | 195 ++--
 libgcc/unwind-dw2-fde.h   |   2 +-
 3 files changed, 1098 insertions(+), 52 deletions(-)
 create mode 100644 libgcc/unwind-dw2-btree.h

diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h
new file mode 100644
index 000..8853f0eab48
--- /dev/null
+++ b/libgcc/unwind-dw2-btree.h
@@ -0,0 +1,953 @@
+/* Lock-free btree for manually registered unwind frames.  */
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Thomas Neumann
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UNWIND_DW2_BTREE_H
+#define GCC_UNWIND_DW2_BTREE_H
+
+#include <stdint.h>
+
+// Common logic for version locks.
+struct version_lock
+{
+  // The lock itself. The lowest bit indicates an exclusive lock,
+  // the second bit indicates waiting threads. All other bits are
+  // used as counter to recognize changes.
+  // Overflows are okay here, we must only prevent overflow to the
+  // same value within one lock_optimistic/validate
+  // range. Even on 32 bit platforms that would require 1 billion
+  // frame registrations within the time span of a few assembler
+  // instructions.
+  uintptr_t version_lock;
+};
+
+#ifdef __GTHREAD_HAS_COND
+// We should never get contention within the tree as it rarely changes.
+// But if we ever do get contention we use these for waiting.
+static __gthread_mutex_t version_lock_mutex = __GTHREAD_MUTEX_INIT;
+static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT;
+#endif
+
+// Initialize in locked state.
+static inline void
+version_lock_initialize_locked_exclusive (struct version_lock *vl)
+{
+  vl->version_lock = 1;
+}
+
+// Try to lock the node exclusive.
+static inline bool
+version_lock_try_lock_exclusive (struct version_lock *vl)
+{
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (state & 1)
+return false;
+  return __atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+ false, __ATOMIC_SEQ_CST,
+ __ATOMIC_SEQ_CST);
+}
+
+// Lock the node exclusive, blocking as needed.
+static void
+version_lock_lock_exclusive (struct version_lock *vl)
+{
+#ifndef __GTHREAD_HAS_COND
+restart:
+#endif
+
+  // We should virtually never get contention here, as frame
+  // changes are rare.
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (!(state & 1))
+{
+  if (__atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+  false, __ATOMIC_SEQ_CST,
+  __ATOMIC_SEQ_CST))
+   return;
+}
+
+// We did get contention, wait properly.
+#ifdef __GTHREAD_HAS_COND
+

[committed] libstdc++: Document new libstdc++.so symbol versions

2022-09-16 Thread Jonathan Wakely via Gcc-patches

Pushed to trunk. I'll backport the first line to gcc-12 too.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/abi.xml: Document GLIBCXX_3.4.30 and
GLIBCXX_3.4.31 versions.
* doc/html/manual/abi.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/abi.html | 2 +-
 libstdc++-v3/doc/xml/manual/abi.xml   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index c2c0c028a8b..0153395f477 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -348,6 +348,8 @@ compatible.
 GCC 9.3.0: GLIBCXX_3.4.28, CXXABI_1.3.12
 GCC 10.1.0: GLIBCXX_3.4.28, CXXABI_1.3.12
 GCC 11.1.0: GLIBCXX_3.4.29, CXXABI_1.3.13
+GCC 12.1.0: GLIBCXX_3.4.30, CXXABI_1.3.13
+GCC 13.1.0: GLIBCXX_3.4.31, CXXABI_1.3.13
 
 
 
-- 
2.37.3

Re: [PATCH] Rewrite NAN and sign handling in frange

2022-09-16 Thread Richard Sandiford via Gcc-patches

Aldy Hernandez via Gcc-patches  writes:
> On Thu, Sep 15, 2022 at 9:06 AM Richard Biener
>  wrote:
>>
>> On Thu, Sep 15, 2022 at 7:41 AM Aldy Hernandez  wrote:
>> >
>> > Hi Richard.  Hi all.
>> >
>> > The attached patch rewrites the NAN and sign handling, dropping both
>> > tristates in favor of a pair of boolean flags for NANs, and nothing at
>> > all for signs.  The signs are tracked in the range itself, so now it's
>> > possible to describe things like [-0.0, +0.0] +NAN, [+0, +0], [-5, +0],
>> > [+0, 3] -NAN, etc.
>> >
>> > There are a lot of changes, as the tristate was quite pervasive.  I
>> > could use another pair of eyes.  The code IMO is cleaner and handles
>> > all the cases we discussed.
>> >
>> > Here is an example of the various ranges and how they are displayed:
>> >
>> > [frange] float VARYING NAN ;; Varying includes NAN
>> > [frange] UNDEFINED  ;; Empty set as always
>> > [frange] float [] NAN   ;; Unknown sign NAN
>> > [frange] float [] -NAN  ;; -NAN
>> > [frange] float [] +NAN  ;; +NAN
>> > [frange] float [-0.0, 0.0]  ;; All zeros.
>> > [frange] float [-0.0, -0.0] NAN ;; -0 or NAN.
>> > [frange] float [-5.0e+0, -1.0e+0] +NAN  ;; [-5, -1] or +NAN
>> > [frange] float [-5.0e+0, -0.0] NAN  ;; [-5, -0] or +-NAN
>> > [frange] float [-5.0e+0, -0.0]  ;; [-5, -0]
>> > [frange] float [5.0e+0, 1.0e+1] ;; [5, 10]
>> >
>> > We could represent an unknown sign with +NAN -NAN if preferred.
>>
>> maybe -+NAN or +-NAN?  I prefer to somehow show both signs for clarity
>
> Sure.
>
>>
>> >
>> > Notice the NAN signs are decoupled from the range, so we can represent
>> > a negative range with a positive NAN.  For this range,
>> > frange::known_bit() would return false, as only when the signs of the
>> > NANs and range agree can we be certain.
>> >
>> > There is no longer any pessimization of ranges for intersects
>> > involving NANs.  Also, union and intersect work with signed zeros:
>> >
>> > //   [-0,  x] U [+0,  x] => [-0,  x]
>> > //   [ x, -0] U [ x, +0] => [ x, +0]
>> > //   [-0,  x] ^ [+0,  x] => [+0,  x]
>> > //   [ x, -0] ^ [ x, +0] => [ x, -0]
>> >
>> > The special casing for signed zeros in the singleton code is gone in
>> > favor of just making sure the signs in the range agree, that is
>> > [-0, -0] for example.
>> >
>> > I have removed the idea that a known NAN is a "range", so a NAN is no
>> > longer in the endpoints itself.  Requesting the bound of a known NAN
>> > is a hard fail.  For that matter, we don't store the actual NAN in the
>> > range.  The only information we have are the set of boolean flags.
>> > This way we make sure nothing seeps into the frange.  This also means
>> > it's explicit that we don't track anything but the sign in NANs.  We
>> > can revisit this if we desire to track signalling or whatever
>> > concoction y'all can imagine.
>> >
>> > All in all, I'm quite happy with this.  It does look better, and we
>> > handle all the corner cases we couldn't before.  Thanks for the
>> > suggestion.
>> >
>> > Regstrapped with mpfr tests on x86-64 and ppc64le Linux.  Selftests
>> > were also run with -ffinite-math-only on x86-64.
>> >
>> > At Jakub's suggestion, I built lapack with associated tests.  They
>> > pass on x86-64 and ppc64le Linux with no regressions from mainline.
>> > As a sanity check, I also ran them for -ffinite-math-only on x86 which
>> > (as expected) returned:
>> >
>> > NaN arithmetic did not perform per the ieee spec
>> >
>> > Otherwise, all tests pass for -ffinite-math-only.
>> >
>> > How does this look?
>>
>> Overall it looks good.
>>
>> Reading ::intersect and ::union I find it less clear to spread out the _nan
>> cases into separate functions.
>
> OK, will inline them.
>
>>
>> Can you add a comment to frange that its representation is
>> a single value-range specified by m_type, m_min, m_max
>> unioned with the set of { -NaN, +NaN }?  Because somehow
>> the ::undefined_p vs. m_type == VR_UNDEFINED checks are
>> a bit confusing to the occasional reader can we instead use
>> ::nan_p to complement ::undefined_p?
>
> Wouldn't that just make nan_p the same as known_nan?  Speaking of
> which, I'm not a big fan of known_nan.  Perhaps we should rename all
> the known_foo variants to foo_p variants?  Or...maybe even:
>
>   // fpclassify like API
>   bool isfinite () const;
>   bool isinf () const;
>   bool maybe_isinf () const;
>   bool isnan () const;
>   bool maybe_isnan () const;
>   bool signbit_p (bool ) const;
>
> That would make it clear how they map to the fpclassify API.  And the
> signbit_p() follows what we do for singleton_p(tree *).
>
> isnan() would be your nan_p suggestion.
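
The signed-zero union and intersect rules quoted above can be modelled with an endpoint ordering that puts -0.0 strictly below +0.0. The following is only a sketch under that assumption; Range, endpoint_less, range_union and range_intersect are hypothetical names and not the frange implementation, NANs are excluded from the endpoints, and the two ranges are assumed to overlap.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a range with signed-zero-aware endpoints.
struct Range {
  double lo;
  double hi;
};

// Order endpoints so that -0.0 sorts strictly below +0.0.
// Endpoints must not be NAN in this sketch.
inline bool endpoint_less(double a, double b) {
  assert(!std::isnan(a) && !std::isnan(b));
  if (a < b)
    return true;
  if (a > b)
    return false;
  return std::signbit(a) && !std::signbit(b);
}

inline double endpoint_min(double a, double b) {
  return endpoint_less(b, a) ? b : a;
}

inline double endpoint_max(double a, double b) {
  return endpoint_less(a, b) ? b : a;
}

// Union widens: smallest lower bound, largest upper bound.
inline Range range_union(Range a, Range b) {
  return {endpoint_min(a.lo, b.lo), endpoint_max(a.hi, b.hi)};
}

// Intersection narrows: largest lower bound, smallest upper bound.
inline Range range_intersect(Range a, Range b) {
  return {endpoint_max(a.lo, b.lo), endpoint_min(a.hi, b.hi)};
}
```

With this ordering, [-0, x] U [+0, x] keeps the -0 lower bound and [-0, x] ^ [+0, x] keeps the +0 lower bound, matching the four rules listed in the quoted message.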

FWIW, the reason I didn't do this with the poly_int stuff is that
it makes negative conditions harder to reason about.  It's easy for
tired eyes to read:

   !isfinite()

as meaning "is infinite", especially since there

[PATCH] vect: Fix SLP layout handling of masked loads [PR106794]

2022-09-16 Thread Richard Sandiford via Gcc-patches

PR106794 shows that I'd forgotten about masked loads when
doing the SLP layout changes.  These loads can't currently
be permuted independently of their mask input, so during
construction they never get a load permutation.

(If we did support permuting masked loads in future, the mask
would need to be in the right order for the load, rather than in
the order implied by the result of the permutation.  Since masked
loads can't be partly or fully scalarised in the way that normal
permuted loads can be, there's probably no benefit to fusing the
permutation and the load.  Permutation after the fact is probably
good enough.)
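
To make the mask-ordering point concrete, here is a small scalar model in C++. It is only a sketch: masked_load, permute and mask_in_load_order are hypothetical helpers, not vectorizer functions, the lane count is fixed at 4, inactive lanes are modelled as 0, and the permutation is assumed to be a bijection.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<bool, N>;
using Perm = std::array<std::size_t, N>;

// Scalar model of a masked load: inactive lanes are never read
// and are modelled as 0 here.
inline Vec masked_load(const int *ptr, const Mask &mask) {
  Vec out{};
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = ptr[i];
  return out;
}

// Permutation applied after the load: out[i] = in[perm[i]].
inline Vec permute(const Vec &in, const Perm &perm) {
  Vec out{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(perm[i] < N);
    out[i] = in[perm[i]];
  }
  return out;
}

// Convert a mask given in result (post-permutation) order into the
// order the load itself needs.  Assumes perm is a bijection.
inline Mask mask_in_load_order(const Mask &result_mask, const Perm &perm) {
  Mask load_mask{};
  for (std::size_t i = 0; i < N; ++i)
    load_mask[perm[i]] = result_mask[i];
  return load_mask;
}
```

In this model, loading with the mask reordered for the load and permuting afterwards gives the intended lanes, whereas reusing the result-order mask for the load activates the wrong elements.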

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/106794
PR tree-optimization/106914
* tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
Only consider loads that already have a permutation.
(vect_optimize_slp_pass::start_choosing_layouts): Assert that
loads with permutations are leaf nodes.  Prevent any kind of grouped
access from changing layout if it doesn't have a load permutation.

gcc/testsuite/
* gcc.dg/vect/pr106914.c: New test.
* g++.dg/vect/pr106794.cc: Likewise.
---
 gcc/testsuite/g++.dg/vect/pr106794.cc | 40 +++
 gcc/testsuite/gcc.dg/vect/pr106914.c  | 15 ++
 gcc/tree-vect-slp.cc  | 30 ++--
 3 files changed, 77 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr106794.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr106914.c

diff --git a/gcc/testsuite/g++.dg/vect/pr106794.cc 
b/gcc/testsuite/g++.dg/vect/pr106794.cc
new file mode 100644
index 000..f056563c4e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr106794.cc
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+/* { dg-additional-options "-march=bdver2" { target x86_64-*-* i?86-*-* } } */
+
+template  struct Vector3 {
+  Vector3();
+  Vector3(T, T, T);
+  T length() const;
+  T x, y, z;
+};
+template 
+Vector3::Vector3(T _x, T _y, T _z) : x(_x), y(_y), z(_z) {}
+Vector3 cross(Vector3 a, Vector3 b) {
+  return Vector3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
+a.x * b.y - a.y * b.x);
+}
+template  T Vector3::length() const { return z; }
+int generateNormals_i;
+float generateNormals_p2_0, generateNormals_p0_0;
+struct SphereMesh {
+  void generateNormals();
+  float vertices;
+};
+void SphereMesh::generateNormals() {
+  Vector3 *faceNormals = new Vector3;
+  for (int j; j; j++) {
+float *p0 =  + 3, *p1 =  + j * 3, *p2 =  + 3,
+  *p3 =  + generateNormals_i + j * 3;
+Vector3 v0(p1[0] - generateNormals_p0_0, p1[1] - 1, p1[2] - 2),
+v1(0, 1, 2);
+if (v0.length())
+  v1 = Vector3(p3[0] - generateNormals_p2_0, p3[1] - p2[1],
+  p3[2] - p2[2]);
+else
+  v1 = Vector3(generateNormals_p0_0 - p3[0], p0[1] - p3[1],
+  p0[2] - p3[2]);
+Vector3 faceNormal = cross(v0, v1);
+faceNormals[j] = faceNormal;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr106914.c 
b/gcc/testsuite/gcc.dg/vect/pr106914.c
new file mode 100644
index 000..9d9b3e30081
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr106914.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fprofile-generate" } */
+/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
+
+int *mask_slp_int64_t_8_2_x, *mask_slp_int64_t_8_2_y, *mask_slp_int64_t_8_2_z;
+
+void
+__attribute__mask_slp_int64_t_8_2() {
+  for (int i; i; i += 8) {
+mask_slp_int64_t_8_2_x[i + 6] =
+mask_slp_int64_t_8_2_y[i + 6] ? mask_slp_int64_t_8_2_z[i] : 1;
+mask_slp_int64_t_8_2_x[i + 7] =
+mask_slp_int64_t_8_2_y[i + 7] ? mask_slp_int64_t_8_2_z[i + 7] : 2;
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ca3422c2a1e..229f2663ebc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4494,7 +4494,8 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree 
node, int in_layout_i,
   stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
   if (rep
   && STMT_VINFO_DATA_REF (rep)
-  && DR_IS_READ (STMT_VINFO_DATA_REF (rep)))
+  && DR_IS_READ (STMT_VINFO_DATA_REF (rep))
+  && SLP_TREE_LOAD_PERMUTATION (node).exists ())
 {
   auto_load_permutation_t tmp_perm;
   tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
@@ -4569,8 +4570,12 @@ vect_optimize_slp_pass::start_choosing_layouts ()
   if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
{
  /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
-unpermuted, record a layout that reverses this permutation.  */
- gcc_assert (partition.layout == 0);
+unpermuted, record a layout that reverses this permutation.
+
+We would need more work to cope with loads that are internally
+

[PATCH] vect: Fix missed gather load opportunity

2022-09-16 Thread Richard Sandiford via Gcc-patches

While writing a testcase for PR106794, I noticed that we failed
to vectorise the testcase in the patch for SVE.  The code that
recognises gather loads tries to optimise the point at which
the offset is calculated, to avoid unnecessary extensions or
truncations:

  /* Don't include the conversion if the target is happy with
 the current offset type.  */

But breaking only makes sense if we're at an SSA_NAME (which could
then be vectorised).  We shouldn't break on a conversion embedded
in a generic expression.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vect-data-refs.cc (vect_check_gather_scatter): Restrict
early-out optimisation to SSA_NAMEs.

gcc/testsuite/
* gcc.dg/vect/vect-gather-5.c: New test.
---
 gcc/testsuite/gcc.dg/vect/vect-gather-5.c | 42 +++
 gcc/tree-vect-data-refs.cc|  1 +
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-5.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-5.c
new file mode 100644
index 000..8b5074bba88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-5.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+#ifdef __aarch64__
+#pragma GCC target "+sve"
+#endif
+
+long a[100], b[100], c[100];
+
+void g1 ()
+{
+  for (int i = 0; i < 100; i += 2)
+{
+  c[i] += a[b[i]] + 1;
+  c[i + 1] += a[b[i + 1]] + 2;
+}
+}
+
+long g2 ()
+{
+  long res = 0;
+  for (int i = 0; i < 100; i += 2)
+{
+  res += a[b[i + 1]];
+  res += a[b[i]];
+}
+  return res;
+}
+
+long g3 ()
+{
+  long res = 0;
+  for (int i = 0; i < 100; i += 2)
+{
+  res += a[b[i]];
+  res += a[b[i + 1]];
+}
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times {add new stmt[^\n]*GATHER_LOAD} 3 "vect" 
{ target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-not {add new stmt[^\n]*VEC_PERM_EXPR} "vect" { 
target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551e..e03b50498d1 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4151,6 +4151,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
  /* Don't include the conversion if the target is happy with
 the current offset type.  */
  if (use_ifn_p
+ && TREE_CODE (off) == SSA_NAME
  && !POINTER_TYPE_P (TREE_TYPE (off))
  && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
   masked_p, vectype, memory_type,
-- 
2.25.1

Re: [PATCH] c++: modules ICE with typename friend declaration

2022-09-16 Thread Nathan Sidwell via Gcc-patches

Thanks, this looks right. Sigh, templates can mess up one's mental invariants!

The test case should really be a foo_[ab].C kind, to test both sides of the
streaming. Bonus points for using the template after importing.  And you
need the dg-module-cmi annotation to check /and then delete/ the gcm file
produced.

nathan

On Thu, Sep 15, 2022, 22:16 Patrick Palka  wrote:

> A couple of xtreme-header-* modules tests began ICEing in C++23 mode
> ever since r13-2650-g5d84a4418aa962 introduced into  the
> dependently scoped friend declaration
>
>   friend /* typename */ _OuterIter::value_type;
>
> ultimately because the streaming code assumes a TYPE_P friend must
> be a class type, but here it's a TYPENAME_TYPE, which doesn't have
> a TEMPLATE_INFO or CLASSTYPE_BEFRIENDING_CLASSES.  This patch tries
> to correct this in a minimal way.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> gcc/cp/ChangeLog:
>
> * module.cc (friend_from_decl_list): Don't consider
> CLASSTYPE_TEMPLATE_INFO for a TYPENAME_TYPE friend.
> (trees_in::read_class_def): Don't add to
> CLASSTYPE_BEFRIENDING_CLASSES for a TYPENAME_TYPE friend.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/modules/typename-friend.C: New test.
> ---
>  gcc/cp/module.cc   | 5 +++--
>  gcc/testsuite/g++.dg/modules/typename-friend.C | 9 +
>  2 files changed, 12 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/typename-friend.C
>
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index f27f4d091e5..1a1ff5be574 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -4734,7 +4734,8 @@ friend_from_decl_list (tree frnd)
>if (TYPE_P (frnd))
> {
>   res = TYPE_NAME (frnd);
> - if (CLASSTYPE_TEMPLATE_INFO (frnd))
> + if (CLASS_TYPE_P (frnd)
> + && CLASSTYPE_TEMPLATE_INFO (frnd))
> tmpl = CLASSTYPE_TI_TEMPLATE (frnd);
> }
>else if (DECL_TEMPLATE_INFO (frnd))
> @@ -12121,7 +12122,7 @@ trees_in::read_class_def (tree defn, tree
> maybe_template)
> {
>   tree f = TREE_VALUE (friend_classes);
>
> - if (TYPE_P (f))
> + if (CLASS_TYPE_P (f))
> {
>   CLASSTYPE_BEFRIENDING_CLASSES (f)
> = tree_cons (NULL_TREE, type,
> diff --git a/gcc/testsuite/g++.dg/modules/typename-friend.C
> b/gcc/testsuite/g++.dg/modules/typename-friend.C
> new file mode 100644
> index 000..d8faf7955c3
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/typename-friend.C
> @@ -0,0 +1,9 @@
> +// { dg-additional-options "-fmodules-ts" }
> +
> +export module x;
> +
> +template<class T>
> +struct A {
> +  friend typename T::type;
> +  friend void f(A) { }
> +};
> --
> 2.37.3.662.g36f8e7ed7d
>
>
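
For readers unfamiliar with the construct, here is a minimal standalone sketch of a dependently scoped friend declaration of the kind discussed above. It is not taken from the patch: the names Holder, Traits and Peer are made up for illustration, and only the `friend typename T::type;` form mirrors the testcase. The befriended type is unknown until the template is instantiated, which is why the front end represents it as something other than a class type at parse time.

```cpp
// Hypothetical names throughout; only the friend declaration form
// mirrors the quoted testcase.
template<class T>
struct Holder {
  // The befriended type is only known once T is substituted.
  friend typename T::type;
private:
  int secret = 42;
};

struct Peer;                           // forward declaration
struct Traits { using type = Peer; };  // makes T::type name Peer

struct Peer {
  // Allowed because Holder<Traits> befriends Traits::type, i.e. Peer.
  static int peek (const Holder<Traits> &h) { return h.secret; }
};
```

With these definitions, Peer::peek can read the private member of a Holder<Traits> object, while any other type would be rejected at compile time.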

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Kong, Lingling 
> Sent: Friday, September 16, 2022 3:40 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> 
> Hi,
> 
> > >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> > >  : mode == V16HFmode ? V8HFmode
> > > +: mode == V16BFmode ? V8BFmode
> > Can it be written as switch case?
> Sure, I fixed it in the new patch. Thanks again for taking a look.
> OK for master?
+ switch (mode)
+   {
+ case V16HImode:
+   hvmode = V8HImode;
+   break;
+ case V16HFmode:
+   hvmode = V8HFmode;
+   break;
+ case V16BFmode:
+   hvmode = V8BFmode;
+   break;
+ case V32QImode:
+   hvmode = V16QImode;
+   break;
+ default:
+   gcc_unreachable ();
+   }

For the format, should case align with {?
Otherwise LGTM.
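
The formatting point can be illustrated with a small standalone sketch. It assumes the usual GNU layout, where the braces of a switch body are indented two columns from `switch` and each `case` label sits in the same column as the braces. The enum and function names below are hypothetical stand-ins, not GCC's machine_mode or the code in the patch, and the assert only approximates gcc_unreachable ().

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's machine_mode values.
enum vec_mode { V16HI, V16HF, V16BF, V32QI, V8HI, V8HF, V8BF, V16QI };

// Map a 256-bit vector mode to its 128-bit half, laid out GNU style:
// `case` labels share a column with the braces of the switch body.
inline vec_mode
half_vector_mode (vec_mode mode)
{
  switch (mode)
    {
    case V16HI:
      return V8HI;
    case V16HF:
      return V8HF;
    case V16BF:
      return V8BF;
    case V32QI:
      return V16QI;
    default:
      assert (!"unexpected mode");  // stand-in for gcc_unreachable ()
      return mode;
    }
}
```

An explicit default that aborts, as in the patch, also makes an unhandled mode fail loudly rather than silently falling back to V16QImode as the old conditional chain did.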

> Thanks,
> Lingling
> 
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Thursday, September 15, 2022 11:46 AM
> > To: Kong, Lingling 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> > Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> >
> > On Thu, Sep 15, 2022 at 11:36 AM Kong, Lingling via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > Hi
> > >
> > > The patch fixes vec_init_dup_v16bf by adding correct handling for
> > > v16bf mode in ix86_expand_vector_init_duplicate.
> > > Add a testcase with sse2 without avx2.
> > >
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/106887
> > > * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
> > > Fixed V16BF mode case.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/106887
> > > * gcc.target/i386/vect-bfloat16-2c.c: New test.
> > > ---
> > >  gcc/config/i386/i386-expand.cc|  1 +
> > >  .../gcc.target/i386/vect-bfloat16-2c.c| 76 +++
> > >  2 files changed, 77 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.cc
> > > b/gcc/config/i386/i386-expand.cc index d7b49c99dc8..9451c561489
> > > 100644
> > > --- a/gcc/config/i386/i386-expand.cc
> > > +++ b/gcc/config/i386/i386-expand.cc
> > > @@ -15111,6 +15111,7 @@ ix86_expand_vector_init_duplicate (bool
> > mmx_ok, machine_mode mode,
> > > {
> > >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> > >  : mode == V16HFmode ? V8HFmode
> > > +: mode == V16BFmode ? V8BFmode
> > Can it be written as switch case?
> > >  : V16QImode);
> > >   rtx x = gen_reg_rtx (hvmode);
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > new file mode 100644
> > > index 000..bead94e46a1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > > @@ -0,0 +1,76 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-mf16c -msse2 -mno-avx2 -O2" } */
> > > +
> > > +typedef __bf16 v8bf __attribute__ ((__vector_size__ (16))); typedef
> > > +__bf16 v16bf __attribute__ ((__vector_size__ (32)));
> > > +
> > > +#define VEC_EXTRACT(V,S,IDX)   \
> > > +  S\
> > > +  __attribute__((noipa))   \
> > > +  vec_extract_##V##_##IDX (V v)\
> > > +  {\
> > > +return v[IDX]; \
> > > +  }
> > > +
> > > +#define VEC_SET(V,S,IDX)   \
> > > +  V\
> > > +  __attribute__((noipa))   \
> > > +  vec_set_##V##_##IDX (V v, S s)   \
> > > +  {\
> > > +v[IDX] = s;\
> > > +return v;  \
> > > +  }
> > > +
> > > +v8bf
> > > +vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8) {
> > > +return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8}; }
> > > +
> > > +v16bf
> > > +vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
> > > +  __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
> > > +  __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16) {
> > > +return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
> > > + a9, a10, a11, a12, a13, a14, a15,
> > > +a16}; }
> > > +
> > > +v8bf
> > > +vec_init_dup_v8bf (__bf16 a1)
> > > +{
> >

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Kong, Lingling via Gcc-patches

Hi,
 
> >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> >  : mode == V16HFmode ? V8HFmode
> > +: mode == V16BFmode ? V8BFmode
> Can it be written as switch case?
Sure, I fixed it in the new patch. Thanks again for taking a look.
OK for master?

Thanks,
Lingling

> -Original Message-
> From: Hongtao Liu 
> Sent: Thursday, September 15, 2022 11:46 AM
> To: Kong, Lingling 
> Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao 
> Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]
> 
> On Thu, Sep 15, 2022 at 11:36 AM Kong, Lingling via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi
> >
> > The patch fixes vec_init_dup_v16bf by adding correct handling for
> > v16bf mode in ix86_expand_vector_init_duplicate.
> > Add a testcase with sse2 without avx2.
> >
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > PR target/106887
> > * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
> > Fixed V16BF mode case.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/106887
> > * gcc.target/i386/vect-bfloat16-2c.c: New test.
> > ---
> >  gcc/config/i386/i386-expand.cc|  1 +
> >  .../gcc.target/i386/vect-bfloat16-2c.c| 76 +++
> >  2 files changed, 77 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> >
> > diff --git a/gcc/config/i386/i386-expand.cc
> > b/gcc/config/i386/i386-expand.cc index d7b49c99dc8..9451c561489 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -15111,6 +15111,7 @@ ix86_expand_vector_init_duplicate (bool
> mmx_ok, machine_mode mode,
> > {
> >   machine_mode hvmode = (mode == V16HImode ? V8HImode
> >  : mode == V16HFmode ? V8HFmode
> > +: mode == V16BFmode ? V8BFmode
> Can it be written as switch case?
> >  : V16QImode);
> >   rtx x = gen_reg_rtx (hvmode);
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > new file mode 100644
> > index 000..bead94e46a1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-bfloat16-2c.c
> > @@ -0,0 +1,76 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mf16c -msse2 -mno-avx2 -O2" } */
> > +
> > +typedef __bf16 v8bf __attribute__ ((__vector_size__ (16))); typedef
> > +__bf16 v16bf __attribute__ ((__vector_size__ (32)));
> > +
> > +#define VEC_EXTRACT(V,S,IDX)   \
> > +  S\
> > +  __attribute__((noipa))   \
> > +  vec_extract_##V##_##IDX (V v)\
> > +  {\
> > +return v[IDX]; \
> > +  }
> > +
> > +#define VEC_SET(V,S,IDX)   \
> > +  V\
> > +  __attribute__((noipa))   \
> > +  vec_set_##V##_##IDX (V v, S s)   \
> > +  {\
> > +v[IDX] = s;\
> > +return v;  \
> > +  }
> > +
> > +v8bf
> > +vec_init_v8bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8) {
> > +return __extension__ (v8bf) {a1, a2, a3, a4, a5, a6, a7, a8}; }
> > +
> > +v16bf
> > +vec_init_v16bf (__bf16 a1, __bf16 a2, __bf16 a3, __bf16 a4,
> > +  __bf16 a5,  __bf16 a6, __bf16 a7, __bf16 a8,
> > +  __bf16 a9,  __bf16 a10, __bf16 a11, __bf16 a12,
> > +  __bf16 a13,  __bf16 a14, __bf16 a15, __bf16 a16) {
> > +return __extension__ (v16bf) {a1, a2, a3, a4, a5, a6, a7, a8,
> > + a9, a10, a11, a12, a13, a14, a15,
> > +a16}; }
> > +
> > +v8bf
> > +vec_init_dup_v8bf (__bf16 a1)
> > +{
> > +return __extension__ (v8bf) {a1, a1, a1, a1, a1, a1, a1, a1}; }
> > +
> > +v16bf
> > +vec_init_dup_v16bf (__bf16 a1)
> > +{
> > +return __extension__ (v16bf) {a1, a1, a1, a1, a1, a1, a1, a1,
> > + a1, a1, a1, a1, a1, a1, a1, a1}; }
> > +
> > +/* { dg-final { scan-assembler-times "vpunpcklwd" 12 } } */
> > +/* { dg-final { scan-assembler-times "vpunpckldq" 6 } } */
> > +/* { dg-final { scan-assembler-times "vpunpcklqdq" 3 } } */
> > +
> > +VEC_EXTRACT (v8bf, __bf16, 0);
> > +VEC_EXTRACT (v8bf, __bf16, 4);
> > +VEC_EXTRACT (v16bf, __bf16, 0);
> > +VEC_EXTRACT (v16bf, __bf16, 3);
> > +VEC_EXTRACT (v16bf, __bf16, 8);
> > +VEC_EXTRACT (v16bf, __bf16, 15);
> > +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$8" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$6" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpsrldq\[\t ]*\\\$14" 1 } } */
> > +/* { dg-final { scan-assembler-times

Re: [PATCH] Modernize ix86_builtin_vectorized_function with corresponding expanders.

2022-09-16 Thread Uros Bizjak via Gcc-patches

On Fri, Sep 16, 2022 at 2:55 AM liuhongt via Gcc-patches
 wrote:
>
> For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround, when the size of
> in_mode is not equal to that of out_mode, the vectorizer doesn't go the
> internal fn way, so that part is still left in
> ix86_builtin_vectorized_function.
>
> Remove the other builtins and add corresponding expanders.
> Note the patch just refactors the code; it doesn't solve the related case
> in the PR, which needs an extra expander for 64-bit vectors.
>
> Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/106910
> * config/i386/i386-builtins.cc
> (ix86_builtin_vectorized_function): Modernized with
> corresponding expanders.
> * config/i386/sse.md (lrint2): New
> expander.
> (floor2): Ditto.
> (lfloor2): Ditto.
> (ceil2): Ditto.
> (lceil2): Ditto.
> (btrunc2): Ditto.
> (lround2): Ditto.
> (exp22): Ditto.

LGTM.

Thanks,
Uros.
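
As an illustration of the case the patch keeps in ix86_builtin_vectorized_function, the sketch below shows a source pattern where the input and output element sizes differ (double in, int out). It is only the source side: whether GCC treats the conversion as ifloor/lfloor and vectorizes the loop depends on the options in use. The quoted code requires SSE4.1 and non-trapping math, and the function name here is hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Source pattern only: a narrowing floor conversion from double to int,
// so a vector of inputs holds fewer elements than a same-sized vector
// of outputs.
inline void
floor_to_int (int *out, const double *in, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<int> (std::floor (in[i]));
}
```

A float-to-int variant of the same loop would have equal element sizes, which is the case the patch moves over to the new expanders.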

> ---
>  gcc/config/i386/i386-builtins.cc | 185 +--
>  gcc/config/i386/sse.md   |  80 +
>  2 files changed, 84 insertions(+), 181 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtins.cc 
> b/gcc/config/i386/i386-builtins.cc
> index 6a04fb57e65..af2faee245b 100644
> --- a/gcc/config/i386/i386-builtins.cc
> +++ b/gcc/config/i386/i386-builtins.cc
> @@ -1540,21 +1540,16 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>
>switch (fn)
>  {
> -CASE_CFN_EXP2:
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
> -   }
> -  break;
> -
>  CASE_CFN_IFLOOR:
>  CASE_CFN_LFLOOR:
> -CASE_CFN_LLFLOOR:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
>
> +  /* PR106910, currently vectorizer doesn't go direct internal fn way
> +when out_n != in_n, so let's still keep this.
> +Otherwise, it relies on expander of
> +lceilmn2/lfloormn2/lroundmn2/lrintmn2.  */
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1564,20 +1559,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_FLOORPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_ICEIL:
>  CASE_CFN_LCEIL:
> -CASE_CFN_LLCEIL:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1591,20 +1576,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CEILPS_SFIX512);
> -   }
>break;
>
>  CASE_CFN_IRINT:
>  CASE_CFN_LRINT:
> -CASE_CFN_LLRINT:
>if (out_mode == SImode && in_mode == DFmode)
> {
>   if (out_n == 4 && in_n == 2)
> @@ -1614,20 +1589,10 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 && in_n == 8)
> return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX512);
> }
> -  if (out_mode == SImode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CVTPS2DQ512);
> -   }
>break;
>
>  CASE_CFN_IROUND:
>  CASE_CFN_LROUND:
> -CASE_CFN_LLROUND:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_SSE4_1)
> break;
> @@ -1641,150 +1606,8 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>   else if (out_n == 16 &&

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-16 Thread Uros Bizjak via Gcc-patches

On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
 wrote:
>
>
> On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > There's a peephole2, submitted in the 1990s, which splits cmp mem, 0 into
> > load mem, reg + test reg, reg. I don't know the exact reason why GCC
> > does this.
> >
> > For the latest x86 processors, ciscization should help the processor
> > frontend and also code size; for the processor backend, the two forms
> > should be the same (same uops).
> >
> > So the patch deletes the peephole2, and also modifies another splitter to
> > generate more cmp mem, 0 for 32-bit targets.
> >
> > It will help instruction fetch.
> >
> > minmax-1.c, minmax-2.c, minmax-10.c and pr96861.c are supposed to scan
> > that there's no comparison to 1 or -1, so adjust the testcases since
> > under a 32-bit target we now generate cmp mem, 0 instead of load + test.
> >
> > Similar for pr78035.c.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > No performance impact for SPEC2017 on ICX/Znver3.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386.md (*3_1): Replace
> >   register_operand with nonimmediate_operand for operand 1. Also
> >   force_reg it when mode is QImode.
> >   (define_peephole2): Deleted related peephole2.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1
> >   or -1, also don't scan-assembler test for ia32.
> >   * gcc.target/i386/minmax-10.c: Ditto.
> >   * gcc.target/i386/minmax-2.c: Ditto.
> >   * gcc.target/i386/pr78035.c: Ditto.
> >   * gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.
>
> It was almost certainly for PPro/P2 given it was rth's work from
> 1999.  Probably should have been conditionalized on PPro/P2 at the
> time.   No worries losing it now...

Please add a tune flag in x86-tune.def under "Historical relics" and
use it in the relevant peephole2 instead of deleting it.

Uros.
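
For context, the kind of source that exercises this choice is a comparison of a value in memory against zero. The sketch below is a hypothetical helper, not one of the testcases named in the ChangeLog. Which of the two instruction sequences a given compiler emits for it depends on its version, target and tuning, so the comment lists both possibilities rather than claiming either.

```cpp
// Hypothetical helper: tests a value in memory against zero.
// On x86 a compiler may emit either
//   cmp DWORD PTR [mem], 0
// or a load into a register followed by
//   test reg, reg
// Both sequences yield the same result for the equality test.
inline bool
is_zero (const int *p)
{
  return *p == 0;
}
```

Keeping the old behaviour behind a tune flag, as requested above, would let targets that benefited from the split form retain it.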

[PATCH] [x86] Adjust issue_rate for latest Intel processors.

2022-09-16 Thread liuhongt via Gcc-patches

For Skylake-based processors, the decoder is 4-way.
For Sunny Cove and Willow Cove, the decoder is 5-way.
For Golden Cove, the decoder is 6-way.

Bootstrapped and regtested on x86-64-pc-linux-gnu{-m32,}.
Ready to install.

gcc/ChangeLog:

* config/i386/x86-tune-sched.cc (ix86_issue_rate): Adjust for
latest Intel processors.
---
 gcc/config/i386/x86-tune-sched.cc | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/i386/x86-tune-sched.cc 
b/gcc/config/i386/x86-tune-sched.cc
index 1ffaeef037c..e2765f81902 100644
--- a/gcc/config/i386/x86-tune-sched.cc
+++ b/gcc/config/i386/x86-tune-sched.cc
@@ -73,10 +73,24 @@ ix86_issue_rate (void)
 case PROCESSOR_SANDYBRIDGE:
 case PROCESSOR_HASWELL:
 case PROCESSOR_TREMONT:
+case PROCESSOR_SKYLAKE:
+case PROCESSOR_SKYLAKE_AVX512:
+case PROCESSOR_CASCADELAKE:
+case PROCESSOR_CANNONLAKE:
 case PROCESSOR_ALDERLAKE:
 case PROCESSOR_GENERIC:
   return 4;
 
+case PROCESSOR_ICELAKE_CLIENT:
+case PROCESSOR_ICELAKE_SERVER:
+case PROCESSOR_TIGERLAKE:
+case PROCESSOR_COOPERLAKE:
+case PROCESSOR_ROCKETLAKE:
+  return 5;
+
+case PROCESSOR_SAPPHIRERAPIDS:
+  return 6;
+
 default:
   return 1;
 }
-- 
2.18.1
