[pushed] c++: C++20 class NTTP trailing zero-init [PR100079]

2021-04-15 Thread Jason Merrill via Gcc-patches
The new testcase was breaking because constexpr evaluation was simplifying
Bar{Baz<42>{}} to Bar{empty}, but then we weren't treating them as
equivalent.  Poking at this revealed that the code for eliding trailing
zero-initialization in class non-type template argument mangling was pretty
broken, including the test, mangle71.

I dealt with the FIXME to support RANGE_EXPR, and fixed the confusion
between a list-initialized temporary mangled as written (i.e. in the
signature of a function template) and a template parameter object mangled as
the value representation of the object.  I'm distinguishing between these
using COMPOUND_LITERAL_P.  A later patch will adjust the use of
COMPOUND_LITERAL_P to be more useful for this distinction, but it works now
for distinguishing these cases in mangling.
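
For readers unfamiliar with the idea of eliding trailing zero-initialization, the following is a minimal standalone sketch of the concept, not the code in mangle.c.  The names Entry, entry_nelts and significant_elements are hypothetical, and plain integers stand in for the trees GCC actually walks.  An entry covers an inclusive index range, loosely in the spirit of a RANGE_EXPR, and the function reports how many leading elements would still need to be written once every trailing zero element is dropped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one initializer entry: `value` repeated for the
// inclusive index range lo..hi (a single element when lo == hi).
struct Entry {
  unsigned long lo;
  unsigned long hi;
  int value;
};

// Number of elements spanned by one entry (bounds are inclusive).
unsigned long entry_nelts(const Entry& e) {
  assert(e.lo <= e.hi);
  return e.hi - e.lo + 1;
}

// Number of leading elements that remain after dropping all trailing
// zero elements.  Entries are assumed to be contiguous and in order.
unsigned long significant_elements(const std::vector<Entry>& init) {
  unsigned long total = 0;  // elements seen so far
  unsigned long kept = 0;   // position just past the last nonzero element
  for (const Entry& e : init) {
    total += entry_nelts(e);
    if (e.value != 0)
      kept = total;
  }
  return kept;
}
```

Under this model an initializer whose only nonzero element is the first one keeps a single element, however long the zero tail is.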

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/100079
* cp-tree.h (first_field): Declare.
* mangle.c (range_expr_nelts): New.
(write_expression): Improve class NTTP mangling.
* pt.c (get_template_parm_object): Clear TREE_HAS_CONSTRUCTOR.
* tree.c (zero_init_expr_p): Improve class NTTP handling.
* decl.c: Adjust comment.

gcc/testsuite/ChangeLog:

PR c++/100079
* g++.dg/abi/mangle71.C: Fix expected mangling.
* g++.dg/abi/mangle77.C: New test.
* g++.dg/cpp2a/nontype-class-union1.C: Likewise.
* g++.dg/cpp2a/nontype-class-equiv1.C: Removed.
* g++.dg/cpp2a/nontype-class44.C: New test.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl.c |  2 +-
 gcc/cp/mangle.c   | 40 ++-
 gcc/cp/pt.c   |  3 ++
 gcc/cp/tree.c | 28 -
 gcc/testsuite/g++.dg/abi/mangle71.C   | 12 +++---
 gcc/testsuite/g++.dg/abi/mangle77.C   | 31 ++
 .../g++.dg/cpp2a/nontype-class-equiv1.C   | 25 
 .../g++.dg/cpp2a/nontype-class-union1.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/nontype-class44.C  | 25 
 10 files changed, 117 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle77.C
 delete mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class-equiv1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class44.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index e42b82ae5a4..23a77a2b2e0 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6695,6 +6695,7 @@ extern void initialize_artificial_var (tree, 
vec *);
 extern tree check_var_type (tree, tree, location_t);
 extern tree reshape_init(tree, tree, tsubst_flags_t);
 extern tree next_initializable_field (tree);
+extern tree first_field(const_tree);
 extern tree fndecl_declared_return_type(tree);
 extern bool undeduced_auto_decl(tree);
 extern bool require_deduced_type   (tree, tsubst_flags_t = 
tf_warning_or_error);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1cb47313923..d40b7a7da5f 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6152,7 +6152,7 @@ struct reshape_iter
 
 static tree reshape_init_r (tree, reshape_iter *, tree, tsubst_flags_t);
 
-/* FIELD is a FIELD_DECL or NULL.  In the former case, the value
+/* FIELD is an element of TYPE_FIELDS or NULL.  In the former case, the value
returned is the next FIELD_DECL (possibly FIELD itself) that can be
initialized.  If there are no more such fields, the return value
will be NULL.  */
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 4399165ee23..49f1266bef3 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2940,6 +2940,16 @@ write_base_ref (tree expr, tree base = NULL_TREE)
   return true;
 }
 
+/* The number of elements spanned by a RANGE_EXPR.  */
+
+unsigned HOST_WIDE_INT
+range_expr_nelts (tree expr)
+{
+  tree lo = TREE_OPERAND (expr, 0);
+  tree hi = TREE_OPERAND (expr, 1);
+  return tree_to_uhwi (hi) - tree_to_uhwi (lo) + 1;
+}
+
 /*  ::=  
::=   
::= 
@@ -3284,8 +3294,14 @@ write_expression (tree expr)
  write_type (etype);
}
 
-  bool nontriv = !trivial_type_p (etype);
-  if (nontriv || !zero_init_expr_p (expr))
+  /* If this is an undigested initializer, mangle it as written.
+COMPOUND_LITERAL_P doesn't actually distinguish between digested and
+undigested braced casts, but it should work to use it to distinguish
+between braced casts in a template signature (undigested) and template
+parm object values (digested), and all CONSTRUCTORS that get here
+should be one of those two cases.  */
+  bool undigested = braced_init || COMPOUND_LITERAL_P (expr);
+  if (undigested || !zero_init_expr_p (expr))
{
  /* Convert braced initializer lists to STRING_CSTs so that
 

Re: [PATCH] c++: ICE with bogus late return type [PR99803]

2021-04-15 Thread Jason Merrill via Gcc-patches

On 4/15/21 10:02 PM, Marek Polacek wrote:

On Thu, Apr 15, 2021 at 03:31:24PM -0400, Jason Merrill wrote:

On 4/14/21 9:21 PM, Marek Polacek wrote:

Here we ICE when compiling this code in C++20, because we're trying to
slam a 'typename' after the ->.  The cp_parser_template_id call just
before the spot I'm changing parsed A::template A as a BASELINK
that contains a constructor, but make_typename_type crashes on that.

My fix is the same as c++/88325, add an is_overloaded_fn check.


Instead of handling this in various callers, maybe make_typename_type should
handle the error, like it already does for e.g.

   struct A {  template  static T t; };
   A(unsigned) -> A:: template t;


Okay, done.
  

Incidentally, in the testcase for 88325:


A::A () // { dg-error "partial specialization" }


this error is misleading; this isn't a partial specialization, it's
redundant template arguments while trying to define the primary template.  I
wouldn't worry about fixing the compiler now, but let's not test for the
wrong error.


Makes sense.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
Here we ICE when compiling this code in C++20, because we're trying to
slam a 'typename' after the ->.  The cp_parser_template_id call just
before the spot I'm changing parsed A::template A as a BASELINK
that contains a constructor, but make_typename_type crashes on that.

This patch makes make_typename_type more robust instead of checking
for is_overloaded_fn prior to calling it.

gcc/cp/ChangeLog:

PR c++/99803
* decl.c (make_typename_type): Give an error and return when
name is is_overloaded_fn.
* parser.c (cp_parser_class_name): Don't check is_overloaded_fn
before calling make_typename_type.

gcc/testsuite/ChangeLog:

PR c++/99803
* g++.dg/cpp2a/typename14.C: Don't expect particular error
messages.
* g++.dg/cpp2a/typename19.C: New test.
---
  gcc/cp/decl.c   | 8 +++-
  gcc/cp/parser.c | 4 +---
  gcc/testsuite/g++.dg/cpp2a/typename14.C | 4 ++--
  gcc/testsuite/g++.dg/cpp2a/typename19.C | 5 +
  4 files changed, 15 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/typename19.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1cb47313923..6668a65acbf 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4055,6 +4055,12 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
error ("%qD used without template arguments", name);
return error_mark_node;
  }
+  else if (is_overloaded_fn (name))
+{
+  if (complain & tf_error)
+   error ("%qD is a function, not a type", name);
+  return error_mark_node;
+}
gcc_assert (identifier_p (name));
gcc_assert (TYPE_P (context));
  
@@ -4066,7 +4072,7 @@ make_typename_type (tree context, tree name, enum tag_types tag_type,

error ("%q#T is not a class", context);
return error_mark_node;
  }
-
+
/* When the CONTEXT is a dependent type,  NAME could refer to a
   dependent base class of CONTEXT.  But look inside it anyway
   if CONTEXT is a currently open scope, in case it refers to a
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 940751b5f05..3640ae51331 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -24730,9 +24730,7 @@ cp_parser_class_name (cp_parser *parser,
decl = cp_parser_maybe_treat_template_as_class (decl, class_head_p);
  
/* If this is a typename, create a TYPENAME_TYPE.  */

-  if (typename_p
-  && decl != error_mark_node
-  && !is_overloaded_fn (decl))
+  if (typename_p && decl != error_mark_node)
  {
decl = make_typename_type (scope, decl, typename_type,
 /*complain=*/tf_error);
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename14.C 
b/gcc/testsuite/g++.dg/cpp2a/typename14.C
index 8d82b6b8d34..ba7dad8245f 100644
--- a/gcc/testsuite/g++.dg/cpp2a/typename14.C
+++ b/gcc/testsuite/g++.dg/cpp2a/typename14.C
@@ -8,7 +8,7 @@ template struct A
  
  template

  template
-A::A () // { dg-error "partial specialization" }
+A::A () // { dg-error "" }
  {
  }
  
@@ -19,7 +19,7 @@ template struct B
  
  template

  template
-B::foo(int) // { dg-error "partial specialization|declaration" }
+B::foo(int) // { dg-error "" }
  {
return 1;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename19.C 
b/gcc/testsuite/g++.dg/cpp2a/typename19.C
new file mode 100644
index 000..320a14d6a0c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/typename19.C
@@ -0,0 +1,5 @@
+// PR c++/99803
+// { dg-do compile { target c++20 } }
+
+struct A { template A(T); };
+auto A(unsigned) -> A::template A; // { dg-error "not a type" }

base-commit: ee351f7fdbd82f8947fe9a0e74cea65d216a8549





Re: [PATCH] c++: ICE with bogus late return type [PR99803]

2021-04-15 Thread Marek Polacek via Gcc-patches
On Thu, Apr 15, 2021 at 03:31:24PM -0400, Jason Merrill wrote:
> On 4/14/21 9:21 PM, Marek Polacek wrote:
> > Here we ICE when compiling this code in C++20, because we're trying to
> > slam a 'typename' after the ->.  The cp_parser_template_id call just
> > before the spot I'm changing parsed A::template A as a BASELINK
> > that contains a constructor, but make_typename_type crashes on that.
> > 
> > My fix is the same as c++/88325, add an is_overloaded_fn check.
> 
> Instead of handling this in various callers, maybe make_typename_type should
> handle the error, like it already does for e.g.
> 
>   struct A {  template  static T t; };
>   A(unsigned) -> A:: template t;

Okay, done.
 
> Incidentally, in the testcase for 88325:
> 
> > A::A () // { dg-error "partial specialization" }
> 
> this error is misleading; this isn't a partial specialization, it's
> redundant template arguments while trying to define the primary template.  I
> wouldn't worry about fixing the compiler now, but let's not test for the
> wrong error.

Makes sense.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE when compiling this code in C++20, because we're trying to
slam a 'typename' after the ->.  The cp_parser_template_id call just
before the spot I'm changing parsed A::template A as a BASELINK
that contains a constructor, but make_typename_type crashes on that.

This patch makes make_typename_type more robust instead of checking
for is_overloaded_fn prior to calling it.

gcc/cp/ChangeLog:

PR c++/99803
* decl.c (make_typename_type): Give an error and return when
name is is_overloaded_fn.
* parser.c (cp_parser_class_name): Don't check is_overloaded_fn
before calling make_typename_type.

gcc/testsuite/ChangeLog:

PR c++/99803
* g++.dg/cpp2a/typename14.C: Don't expect particular error
messages.
* g++.dg/cpp2a/typename19.C: New test.
---
 gcc/cp/decl.c   | 8 +++-
 gcc/cp/parser.c | 4 +---
 gcc/testsuite/g++.dg/cpp2a/typename14.C | 4 ++--
 gcc/testsuite/g++.dg/cpp2a/typename19.C | 5 +
 4 files changed, 15 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/typename19.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1cb47313923..6668a65acbf 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4055,6 +4055,12 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
error ("%qD used without template arguments", name);
   return error_mark_node;
 }
+  else if (is_overloaded_fn (name))
+{
+  if (complain & tf_error)
+   error ("%qD is a function, not a type", name);
+  return error_mark_node;
+}
   gcc_assert (identifier_p (name));
   gcc_assert (TYPE_P (context));
 
@@ -4066,7 +4072,7 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
error ("%q#T is not a class", context);
   return error_mark_node;
 }
-  
+
   /* When the CONTEXT is a dependent type,  NAME could refer to a
  dependent base class of CONTEXT.  But look inside it anyway
  if CONTEXT is a currently open scope, in case it refers to a
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 940751b5f05..3640ae51331 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -24730,9 +24730,7 @@ cp_parser_class_name (cp_parser *parser,
   decl = cp_parser_maybe_treat_template_as_class (decl, class_head_p);
 
   /* If this is a typename, create a TYPENAME_TYPE.  */
-  if (typename_p
-  && decl != error_mark_node
-  && !is_overloaded_fn (decl))
+  if (typename_p && decl != error_mark_node)
 {
   decl = make_typename_type (scope, decl, typename_type,
 /*complain=*/tf_error);
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename14.C 
b/gcc/testsuite/g++.dg/cpp2a/typename14.C
index 8d82b6b8d34..ba7dad8245f 100644
--- a/gcc/testsuite/g++.dg/cpp2a/typename14.C
+++ b/gcc/testsuite/g++.dg/cpp2a/typename14.C
@@ -8,7 +8,7 @@ template struct A
 
 template
 template
-A::A () // { dg-error "partial specialization" }
+A::A () // { dg-error "" }
 {
 }
 
@@ -19,7 +19,7 @@ template struct B
 
 template
 template
-B::foo(int) // { dg-error "partial specialization|declaration" }
+B::foo(int) // { dg-error "" }
 {
   return 1;
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename19.C 
b/gcc/testsuite/g++.dg/cpp2a/typename19.C
new file mode 100644
index 000..320a14d6a0c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/typename19.C
@@ -0,0 +1,5 @@
+// PR c++/99803
+// { dg-do compile { target c++20 } }
+
+struct A { template A(T); };
+auto A(unsigned) -> A::template A; // { dg-error "not a type" }

base-commit: ee351f7fdbd82f8947fe9a0e74cea65d216a8549
-- 
2.30.2



Re: [PATCH v10] Practical improvement to libgcc complex divide

2021-04-15 Thread Patrick McGehearty via Gcc-patches

- ping

[A sincere and special thanks for the sustained perseverance of the
reviewers in pointing me in the proper direction to get this patch
polished. The original proposal was June 1, 2020 and only covered
double precision. The current version is dramatically better, both
from extending coverage to most precisions, improving the computation
for accuracy and speed, and from improving the code maintainability.
- Patrick McGehearty]


On 4/7/2021 3:21 PM, Patrick McGehearty via Gcc-patches wrote:

Changes in this version from Version 9:

Replaced all uses of alloca with XALLOCAVEC in
c_cpp_builtins() in c-cppbuiltin.c
Did not replace alloca elsewhere in the same file.

Fixed type of name_len.
Fixed prototypes for abort & exit in new test programs.
Fixed spelling errors and omitted words in comments.
Changed XMTYPE to AMTYPE to avoid confusion with extended types.
Removed XCTYPE as unused. (A for additional)

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.
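
As a point of reference, the following is a minimal sketch of the textbook formula and of Smith's method for double precision.  It is not the libgcc code: the names Cplx, naive_div and smith_div are hypothetical, and the C99 NaN handling and the new scaling tests described in this patch are deliberately left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical value type for the sketch; libgcc uses the built-in
// complex types instead.
struct Cplx {
  double re;
  double im;
};

// Textbook formula for (a + bi) / (c + di).  The denominator c*c + d*d
// can overflow or underflow even when the true quotient is representable.
Cplx naive_div(double a, double b, double c, double d) {
  double den = c * c + d * d;
  return {(a * c + b * d) / den, (b * c - a * d) / den};
}

// Smith's method (1962): divide through by the larger of |c| and |d| so
// that the ratio r has magnitude at most 1 and c*c + d*d is never formed.
Cplx smith_div(double a, double b, double c, double d) {
  if (std::fabs(c) < std::fabs(d)) {
    double r = c / d;
    double den = c * r + d;
    return {(a * r + b) / den, (b * r - a) / den};
  }
  double r = d / c;
  double den = c + d * r;
  return {(a + b * r) / den, (b - a * r) / den};
}
```

With all four inputs equal to 1e200, smith_div returns exactly 1 + 0i, whereas c*c + d*d in naive_div exceeds the range of an IEEE double.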

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate
computations led to the various tests of input values and scaling
to avoid under/overflow. The current patch does not handle some of the
rare and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed changing the half precision algorithm
to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no
loss of precision and modest improvement in performance.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.  This
use is made generic by defining appropriate __LIBGCC2_* macros in
c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough 

Re: [PATCH] c++: Fix up handling of structured bindings in extract_locals_r [PR99833]

2021-04-15 Thread Jason Merrill via Gcc-patches

On 4/14/21 3:10 PM, Jakub Jelinek wrote:

Hi!

The following testcase ICEs in tsubst_decomp_names because the assumption
that the structured binding artificial var is followed in DECL_CHAIN by
the corresponding structured binding vars is violated.
I've tracked it to extract_locals* which is done for the constexpr
IF_STMT.  extract_locals_r when it sees a DECL_EXPR adds that decl
into a hash set so that such decls aren't returned from extract_locals*,
but in the case of a structured binding that just means the artificial var
and not the vars corresponding to structured binding identifiers.
The following patch fixes it by pushing not just the artificial var
for structured bindings but also the other vars.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2021-04-14  Jakub Jelinek  

PR c++/99833
* pt.c (extract_locals_r): When handling DECL_EXPR of a structured
binding, add to data.internal also all corresponding structured
binding decls.

* g++.dg/cpp1z/pr99833.C: New test.
* g++.dg/cpp2a/pr99833.C: New test.

--- gcc/cp/pt.c.jj  2021-04-14 10:48:41.322103670 +0200
+++ gcc/cp/pt.c 2021-04-14 12:52:53.116896754 +0200
@@ -12811,7 +12811,27 @@ extract_locals_r (tree *tp, int */*walk_
  tp = &TYPE_NAME (*tp);
  
if (TREE_CODE (*tp) == DECL_EXPR)

-data.internal.add (DECL_EXPR_DECL (*tp));
+{
+  tree decl = DECL_EXPR_DECL (*tp);
+  data.internal.add (decl);
+  if (VAR_P (decl)
+ && DECL_DECOMPOSITION_P (decl)
+ && TREE_TYPE (decl) != error_mark_node)
+   {
+ gcc_assert (DECL_NAME (decl) == NULL_TREE);
+ for (tree decl2 = DECL_CHAIN (decl);
+  decl2
+  && VAR_P (decl2)
+  && DECL_DECOMPOSITION_P (decl2)
+  && DECL_NAME (decl2)
+  && TREE_TYPE (decl2) != error_mark_node;
+  decl2 = DECL_CHAIN (decl2))
+   {
+ gcc_assert (DECL_DECOMP_BASE (decl2) == decl);
+ data.internal.add (decl2);
+   }
+   }
+}
else if (TREE_CODE (*tp) == LAMBDA_EXPR)
  {
/* Since we defer implicit capture, look in the parms and body.  */
--- gcc/testsuite/g++.dg/cpp1z/pr99833.C.jj 2021-04-14 13:03:14.654879632 +0200
+++ gcc/testsuite/g++.dg/cpp1z/pr99833.C 2021-04-14 13:03:39.599598004 +0200
@@ -0,0 +1,11 @@
+// PR c++/99833
+// { dg-do compile { target c++17 } }
+
+struct S { int a, b; };
+template 
+void
+foo ()
+{
+  [](auto d) { if constexpr (auto [a, b]{d}; sizeof (a) > 0) a++; } (S{});
+}
+template void foo ();
--- gcc/testsuite/g++.dg/cpp2a/pr99833.C.jj 2021-04-14 13:04:08.975266383 +0200
+++ gcc/testsuite/g++.dg/cpp2a/pr99833.C 2021-04-14 13:04:23.191105881 +0200
@@ -0,0 +1,18 @@
+// PR c++/99833
+// { dg-do compile { target c++20 } }
+
+#include <tuple>
+
+auto f(auto&& x)
+{
+  [&](auto...) {
+auto y = std::tuple{ "what's happening here?", x };
+if constexpr (auto [_, z] = y; requires { z; })
+  return;
+  }();
+}
+
+int main()
+{
+  f(42);
+}

Jakub





[PATCH] PR fortran/63797 - Bogus ambiguous reference to 'sqrt'

2021-04-15 Thread Harald Anlauf via Gcc-patches
Hello everybody,

we currently write the interface for intrinsic procedures to module
files although that should not be necessary.  (F2018:15.4.2.1 actually
states that interfaces e.g. of intrinsic procedures are 'explicit'.)
This led to bogus errors due to an apparent ambiguity.
A simple solution is to just avoid writing that (redundant) information
to the module file.

Regtested on x86_64-pc-linux-gnu.  OK for (current) mainline?
Or should it rather wait until after the 11 release?

Thanks,
Harald


PR fortran/63797 - Bogus ambiguous reference to 'sqrt'

The interface of an intrinsic procedure is automatically explicit.
Do not write it to the module file.

gcc/fortran/ChangeLog:

* module.c (write_symtree): Do not write interface of intrinsic
procedure to module file.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr63797.f90: New test.

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 4db0a3ac76d..b4b7b437f86 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -6218,6 +6218,9 @@ write_symtree (gfc_symtree *st)
   if (check_unique_name (st->name))
 return;

+  if (strcmp (sym->module, "(intrinsic)") == 0)
+return;
+
   p = find_pointer (sym);
   if (p == NULL)
 gfc_internal_error ("write_symtree(): Symbol not written");
diff --git a/gcc/testsuite/gfortran.dg/pr63797.f90 b/gcc/testsuite/gfortran.dg/pr63797.f90
new file mode 100644
index 000..1131e8167b1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr63797.f90
@@ -0,0 +1,60 @@
+! { dg-do compile }
+! PR63797 - Bogus ambiguous reference to 'sqrt'
+
+module mod1
+  implicit none
+  real, parameter :: z = sqrt (0.0)
+  real:: w = sqrt (1.0)
+  interface
+ pure real function sqrt_ifc (x)
+   real, intent(in) :: x
+ end function sqrt_ifc
+  end interface
+contains
+  pure function myroot () result (f)
+procedure(sqrt_ifc), pointer :: f
+intrinsic :: sqrt
+f => sqrt
+  end function myroot
+end module mod1
+
+module mod2
+  implicit none
+  type t
+ real :: a = 0.
+  end type
+  interface sqrt
+ module procedure sqrt
+  end interface
+contains
+  elemental function sqrt (a)
+type(t), intent(in) :: a
+type(t) :: sqrt
+sqrt% a = a% a
+  end function sqrt
+end module mod2
+
+module mod3
+  implicit none
+  abstract interface
+ function real_func (x)
+   real  :: real_func
+   real, intent (in) :: x
+ end function real_func
+  end interface
+  intrinsic :: sqrt
+  procedure(real_func), pointer :: real_root => sqrt
+end module mod3
+
+program test
+  use mod1
+  use mod2
+  use mod3
+  implicit none
+  type(t) :: x, y
+  procedure(sqrt_ifc), pointer :: root
+  root => myroot ()
+  y= sqrt (x)
+  y% a = sqrt (x% a) + z - w + root (x% a)
+  y% a = real_root (x% a)
+end program test


Committed: gcc.dg/pr84877.c: Xfail for cris-*-*

2021-04-15 Thread Hans-Peter Nilsson via Gcc-patches
Unfortunately it appears that this PR is on nobody's radar.
Xfailing it to get an arguably artificial zero regression
state (since T0=2007-01-05) helps my autotester.

Caveat: the pass/fail state of this test, as long as stack
alignment isn't adjusted, is dependent on the alignment of
the stack at the entry of main, so depending on the target,
e.g. the size and number of environment variables at
invocation time can affect the result (including simulator
runs where environment variables are propagated to the
target).

gcc/testsuite:
PR middle-end/84877
* gcc.dg/pr84877.c: Xfail for cris-*-*.
---
 gcc/testsuite/gcc.dg/pr84877.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr84877.c b/gcc/testsuite/gcc.dg/pr84877.c
index 8a34dd4fb66d..8551d27bcbb8 100644
--- a/gcc/testsuite/gcc.dg/pr84877.c
+++ b/gcc/testsuite/gcc.dg/pr84877.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { xfail { cris-*-* } } } */
 /* { dg-options "-O2" } */
 
 #include 
-- 
2.11.0



[pushed] c++: constexpr and volatile member function [PR80456]

2021-04-15 Thread Jason Merrill via Gcc-patches
When calling a static member function we still need to evaluate an explicit
object argument.  But we don't want to force a load of the entire object
if the argument is volatile, so we take its address.  If as a result it no
longer has any side-effects, we don't need to evaluate it after all.
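
The language rule behind the first sentence can be seen in a small standalone sketch.  The names Counter, get_instance, call_through_object and evaluations are hypothetical and only illustrate the rule; this is not the new testcase.  The object expression on the left of the dot is evaluated even though the static member function never uses it.

```cpp
#include <cassert>

struct Counter {
  static int answer() { return 42; }
};

int evaluations = 0;  // how many times the object expression was evaluated
Counter instance;

// An object expression with a visible side effect.
Counter& get_instance() {
  ++evaluations;
  return instance;
}

// Calls a static member function through an object expression; the call
// to get_instance() still happens although answer() needs no object.
int call_through_object() {
  return get_instance().answer();
}
```

When the object expression has no side effects at all, evaluating it and discarding the result changes nothing, which is the case the extra TREE_SIDE_EFFECTS check covers.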

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/80456
* call.c (build_new_method_call_1): Check again for side-effects
with a volatile object.

gcc/testsuite/ChangeLog:

PR c++/80456
* g++.dg/cpp0x/constexpr-volatile3.C: New test.
---
 gcc/cp/call.c|  3 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-volatile3.C | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-volatile3.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c9a8c0d305f..678e120a165 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -10793,7 +10793,8 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
  tree a = instance;
  if (TREE_THIS_VOLATILE (a))
a = build_this (a);
- call = build2 (COMPOUND_EXPR, TREE_TYPE (call), a, call);
+ if (TREE_SIDE_EFFECTS (a))
+   call = build2 (COMPOUND_EXPR, TREE_TYPE (call), a, call);
}
  else if (call != error_mark_node
   && DECL_DESTRUCTOR_P (cand->fn)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-volatile3.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-volatile3.C
new file mode 100644
index 000..5c1e865e0ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-volatile3.C
@@ -0,0 +1,15 @@
+// PR c++/80456
+// { dg-do compile { target c++11 } }
+
+struct A {
+  static constexpr bool test() noexcept { return true; }
+
+  void f() volatile {
+constexpr bool b = test();
+  }
+};
+
+void g() {
+  A a;
+  a.f();
+}

base-commit: 2efbbba16a0630fac8cadcd6d9e0ffaabfadb79f
-- 
2.27.0



Re: [PATCH] propagate attributes to local redeclaration (PR 99420)

2021-04-15 Thread Joseph Myers
On Thu, 8 Apr 2021, Martin Sebor via Gcc-patches wrote:

> There's another similar piece of code in pushdecl() that I didn't
> touch, although  I couldn't come up with a test case showing it's
> necessary.  Both hunks go back ages so I wonder if they might have
> been obviated by other improvements.

The other similar code in pushdecl is executed in cases where there are 
multiple declarations of the same identifier in the same scope, e.g.:

int f (void);
void
g (void)
{
  int f (void);
  int f (void);
}

That particular example isn't interesting, but the idea is that the type 
the declaration ends up getting is based on only visible type information 
(if an intermediate scope had a variable "int f;" with automatic scope, 
for example, the file-scope declaration wouldn't be visible, so the type 
within the scope of the inner declarations should be the composite only of 
the types of those declarations and not that of the file-scope 
declaration).

I expect that the attribute handling currently there for built-in 
functions only is because there were problems in some cases if a built-in 
function were referenced without its built-in attributes.  As the 
attributes don't affect the function type in standard terms, it's 
certainly OK, and improves diagnostic quality, to include attributes from 
declarations that aren't visible.

It's possible that the piece of code you're changing always ensures that 
the attributes are copied from the built-in function to the first 
declaration in the inner scope (even when any file scope / external scope 
declaration is shadowed), and, if composite_type and duplicate_decls 
always preserve attributes, this might mean the code you're not changing 
doesn't actually need its attribute handling because the attributes are 
always present (for all functions, given your patch, or for built-in 
functions, without it) even without that handling.  So if something 
changed that did make that code unnecessary, it might have been a fix in 
composite_type or duplicate_decls.

The patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] add tests for Bug 89230

2021-04-15 Thread Martin Sebor via Gcc-patches

The false positives have disappeared thanks to
g:520d5ad337eaa15860a5a964daf7ca46cf31c029.  I have added the two
test cases in the attached diff in r11-8202 after testing on aarch64,
arm, powerpc64le, and x86_64, out of an abundance of caution.

Martin
commit 2dbbbe893f75f587c48111ab4c97cf5e74fb91bb
Author: Martin Sebor 
Date:   Thu Apr 15 14:09:56 2021 -0600

PR middle-end/89230 - Bogus uninited usage warning with printf

gcc/testsuite/ChangeLog:
* gcc.dg/uninit-pr89230-1.c: New test.
* gcc.dg/uninit-pr89230-2.c: Same.

diff --git a/gcc/testsuite/gcc.dg/uninit-pr89230-1.c b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
new file mode 100644
index 000..1c07c4f6d78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr89230-1.c
@@ -0,0 +1,25 @@
+/* PR middle-end/89230 - Bogus uninited usage warning with printf
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+struct S { int i, j; };
+
+/* attribute__ ((malloc)) */ struct S* f (void);
+
+int g (void)
+{
+  struct S *p = f (), *q;
+
+  if (p->i || !(q = f ()) || p->j != q->i)
+   {
+ __builtin_printf ("%i", p->i);
+
+ if (p->i)
+   return 1;
+
+ if (!q)// { dg-bogus "\\\[-Wmaybe-uninitialized" }
+   return 2;
+   }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/uninit-pr89230-2.c b/gcc/testsuite/gcc.dg/uninit-pr89230-2.c
new file mode 100644
index 000..473d2da5d3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr89230-2.c
@@ -0,0 +1,54 @@
+/* PR middle-end/89230 - Bogus uninited usage warning with printf
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+extern void* memset (void*, int, size_t);
+extern int printf (const char*, ...);
+extern int rand (void);
+
+struct S
+{
+  int a;
+  int b;
+};
+
+struct H
+{
+  int c;
+  int d;
+};
+
+void getblk (void* blk)
+{
+  struct S* s = (struct S*) blk;
+  memset (blk, 0, 512);
+  s->a = rand () & 1;
+}
+
+struct H* gethdr (void* blk)
+{
+  memset (blk, 0, 512);
+  return rand () & 1 ? (struct H*) blk : 0;
+}
+
+int main (void)
+{
+  char blk[512], tmp[512];
+  struct S *s = (struct S*) blk;
+  struct H *h;
+
+  getblk (blk);
+
+  if (s->a  ||  !(h = gethdr (tmp))  ||  s->a != h->d) {
+
+printf ("%d\n", s->b);
+if (s->a)
+  printf ("s->a = %d\n", s->a);
+else if (!h)
+  printf ("!h\n");
+else
+  printf ("h->d = %d\n", h->d);
+  }
+}


Re: [PATCH] c++: partially initialized constexpr array [PR99699]

2021-04-15 Thread Jason Merrill via Gcc-patches

On 4/15/21 3:51 PM, Patrick Palka wrote:

Here, reduced_constant_expression_p is incorrectly returning true for a
partially initialized array CONSTRUCTOR, because when the
CONSTRUCTOR_NO_CLEARING flag is set the predicate doesn't check that
every array element is initialized by the CONSTRUCTOR, it just checks
that every initializer within the CONSTRUCTOR is a valid constant
expression.  This patch makes reduced_constant_expression_p check both
conditions in a single iteration over the CONSTRUCTOR elements.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

PR c++/99700
* constexpr.c (reduced_constant_expression_p): Return false
if an array CONSTRUCTOR is missing an element at some array
index when the CONSTRUCTOR_NO_CLEARING flag is set.

gcc/testsuite/ChangeLog:

PR c++/99700
* g++.dg/cpp2a/constexpr-init21.C: New test.
---
  gcc/cp/constexpr.c| 24 +++--
  gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C | 27 +++
  2 files changed, 49 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index c8d9dae36fb..b74bbac3cd2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -46,6 +46,7 @@ do {  
\
  
  static HOST_WIDE_INT find_array_ctor_elt (tree ary, tree dindex,

  bool insert = false);
+static int array_index_cmp (tree key, tree index);
  
  /* Returns true iff FUN is an instantiation of a constexpr function

 template or a defaulted constexpr function.  */
@@ -2910,9 +2911,27 @@ reduced_constant_expression_p (tree t)
/* An initialized vector would have a VECTOR_CST.  */
return false;
  else if (cxx_dialect >= cxx20
-  /* An ARRAY_TYPE doesn't have any TYPE_FIELDS.  */
   && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   field = NULL_TREE;
+   {
+ /* There must be a valid constant initializer at every array
+index.  */
+ tree min = TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (t)));
+ tree max = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (t)));
+ tree cursor = min;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (t), i, idx, val)
+   {
+ if (!reduced_constant_expression_p (val))
+   return false;
+ if (array_index_cmp (cursor, idx) != 0)
+   return false;
+ if (TREE_CODE (idx) == RANGE_EXPR)
+   cursor = TREE_OPERAND (idx, 1);
+ cursor = int_const_binop (PLUS_EXPR, cursor, size_one_node);
+   }
+ if (find_array_ctor_elt (t, max) == -1)
+   return false;
+ goto ok;
+   }
  else if (cxx_dialect >= cxx20
   && TREE_CODE (TREE_TYPE (t)) == UNION_TYPE)
{
@@ -2946,6 +2965,7 @@ reduced_constant_expression_p (tree t)
for (; field; field = next_initializable_field (DECL_CHAIN (field)))
if (!is_really_empty_class (TREE_TYPE (field), /*ignore_vptr*/false))
  return false;
+ok:
if (CONSTRUCTOR_NO_CLEARING (t))
/* All the fields are initialized.  */
CONSTRUCTOR_NO_CLEARING (t) = false;
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C
new file mode 100644
index 000..47b4bff21b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C
@@ -0,0 +1,27 @@
+// PR c++/99700
+// { dg-do compile { target c++20 } }
+
+template 
+struct A {
+  T c[5];
+  constexpr A(int skip = -1) {
+for (int i = 0; i < 5; i++)
+  if (skip != i)
+c[i] = {};
+  }
+};
+
+constexpr A a;
+constexpr A a0(0); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a1(1); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a2(2); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a3(3); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a4(4); // { dg-error "not a constant expression|incompletely initialized" }
+
+struct s { int n; };
+constexpr A b;
+constexpr A b0(0); // {  dg-error "not a constant expression|incompletely initialized" }
+
+struct empty {};
+constexpr A c;
+constexpr A c0(0);





[PATCH] c++: partially initialized constexpr array [PR99699]

2021-04-15 Thread Patrick Palka via Gcc-patches
Here, reduced_constant_expression_p is incorrectly returning true for a
partially initialized array CONSTRUCTOR, because when the
CONSTRUCTOR_NO_CLEARING flag is set the predicate doesn't check that
every array element is initialized by the CONSTRUCTOR, it just checks
that every initializer within the CONSTRUCTOR is a valid constant
expression.  This patch makes reduced_constant_expression_p check both
conditions in a single iteration over the CONSTRUCTOR elements.
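
The single-pass check described above can be sketched outside GCC as follows. This is only an illustration under stated assumptions: `Elt` and `covers_every_index` are hypothetical names, and plain integers stand in for the tree index constants and RANGE_EXPR nodes that the real code handles.

```cpp
#include <vector>

// Hypothetical stand-in for one CONSTRUCTOR element: it initializes the
// index range [lo, hi] (lo == hi for a single index; lo < hi models a
// RANGE_EXPR) with a value that either is or is not a valid constant.
struct Elt {
  long lo;
  long hi;
  bool is_constant;
};

// Returns true iff every index in [min, max] is initialized by a constant.
// Assumes the elements are sorted by index, as they are in a CONSTRUCTOR.
bool covers_every_index(long min, long max, const std::vector<Elt> &elts) {
  long cursor = min;  // next index that must have an initializer
  for (const Elt &e : elts) {
    if (!e.is_constant)
      return false;  // an initializer is not a constant expression
    if (cursor < e.lo || cursor > e.hi)
      return false;  // gap: the index at the cursor has no initializer
    cursor = e.hi + 1;  // step past the index (or range) just seen
  }
  return cursor > max;  // the last array index must have been reached
}
```

In this model a constructor that skips index 2 of a five-element array is rejected as soon as the element for index 3 is reached, because the cursor is still at 2.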

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/99700
* constexpr.c (reduced_constant_expression_p): Return false
if an array CONSTRUCTOR is missing an element at some array
index when the CONSTRUCTOR_NO_CLEARING flag is set.

gcc/testsuite/ChangeLog:

PR c++/99700
* g++.dg/cpp2a/constexpr-init21.C: New test.
---
 gcc/cp/constexpr.c| 24 +++--
 gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C | 27 +++
 2 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index c8d9dae36fb..b74bbac3cd2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -46,6 +46,7 @@ do {  
\
 
 static HOST_WIDE_INT find_array_ctor_elt (tree ary, tree dindex,
  bool insert = false);
+static int array_index_cmp (tree key, tree index);
 
 /* Returns true iff FUN is an instantiation of a constexpr function
template or a defaulted constexpr function.  */
@@ -2910,9 +2911,27 @@ reduced_constant_expression_p (tree t)
/* An initialized vector would have a VECTOR_CST.  */
return false;
  else if (cxx_dialect >= cxx20
-  /* An ARRAY_TYPE doesn't have any TYPE_FIELDS.  */
   && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   field = NULL_TREE;
+   {
+ /* There must be a valid constant initializer at every array
+index.  */
+ tree min = TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (t)));
+ tree max = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (t)));
+ tree cursor = min;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (t), i, idx, val)
+   {
+ if (!reduced_constant_expression_p (val))
+   return false;
+ if (array_index_cmp (cursor, idx) != 0)
+   return false;
+ if (TREE_CODE (idx) == RANGE_EXPR)
+   cursor = TREE_OPERAND (idx, 1);
+ cursor = int_const_binop (PLUS_EXPR, cursor, size_one_node);
+   }
+ if (find_array_ctor_elt (t, max) == -1)
+   return false;
+ goto ok;
+   }
  else if (cxx_dialect >= cxx20
   && TREE_CODE (TREE_TYPE (t)) == UNION_TYPE)
{
@@ -2946,6 +2965,7 @@ reduced_constant_expression_p (tree t)
   for (; field; field = next_initializable_field (DECL_CHAIN (field)))
if (!is_really_empty_class (TREE_TYPE (field), /*ignore_vptr*/false))
  return false;
+ok:
   if (CONSTRUCTOR_NO_CLEARING (t))
/* All the fields are initialized.  */
CONSTRUCTOR_NO_CLEARING (t) = false;
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C
new file mode 100644
index 000..47b4bff21b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-init21.C
@@ -0,0 +1,27 @@
+// PR c++/99700
+// { dg-do compile { target c++20 } }
+
+template 
+struct A {
+  T c[5];
+  constexpr A(int skip = -1) {
+for (int i = 0; i < 5; i++)
+  if (skip != i)
+c[i] = {};
+  }
+};
+
+constexpr A a;
+constexpr A a0(0); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a1(1); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a2(2); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a3(3); // { dg-error "not a constant expression|incompletely initialized" }
+constexpr A a4(4); // { dg-error "not a constant expression|incompletely initialized" }
+
+struct s { int n; };
+constexpr A b;
+constexpr A b0(0); // {  dg-error "not a constant expression|incompletely initialized" }
+
+struct empty {};
+constexpr A c;
+constexpr A c0(0);
-- 
2.31.1.298.g54a3917115



Re: [PATCH] c++: Fix up C++23 [] <...> requires primary -> type {} parsing [PR99850]

2021-04-15 Thread Jason Merrill via Gcc-patches

On 4/14/21 3:18 PM, Jakub Jelinek wrote:

The requires-clause parsing has code that suggests wrapping non-primary
expressions in (): if it parses a primary expression and sees that it is
followed by ++, --, ., ( or ->, among other things, it tries to reparse
it as an assignment expression or similar and, if that works, suggests
wrapping it in parens.
When the requires-clause follows the template parameter list of a lambda
([] <...>), it already has an exception from that, because ( can occur
in a valid C++20 expression there: it starts the parameters of the lambda.
In C++23 another case can occur: because the parameters with the ()s can
be omitted, requires C can be followed immediately by ->, which starts a
trailing return type.  Even in that case, we don't want to parse that
as C->...



Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux (with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b ), ok for trunk?

2021-04-14  Jakub Jelinek  

PR c++/99850
* parser.c (cp_parser_constraint_requires_parens) <case CPP_DEREF>:
If lambda_p, return pce_ok for C++23 or later instead of
pce_maybe_postfix.

* g++.dg/cpp23/lambda-specifiers2.C: New test.

--- gcc/cp/parser.c.jj  2021-04-14 10:48:41.318103715 +0200
+++ gcc/cp/parser.c 2021-04-14 14:52:03.220235527 +0200
@@ -28526,7 +28526,19 @@ cp_parser_constraint_requires_parens (cp
case CPP_PLUS_PLUS:
case CPP_MINUS_MINUS:
case CPP_DOT:
+   /* Unenclosed postfix operator.  */
+   return pce_maybe_postfix;
+
case CPP_DEREF:
+   /* A primary constraint that precedes the lambda-declarator of a
+  lambda expression is followed by trailing return type.
+
+ [] requires C -> void {}
+
+  Don't try to re-parse this as a postfix expression in
+  C++23 and later.  In C++20 ( needs to come in between.  */
+   if (lambda_p && cxx_dialect >= cxx23)
+ return pce_ok;


I think let's not make this depend on cxx_dialect, since we allow this 
in C++20 mode as well with a pedwarn.  OK with that change.



/* Unenclosed postfix operator.  */
return pce_maybe_postfix;
 }
--- gcc/testsuite/g++.dg/cpp23/lambda-specifiers2.C.jj  2021-04-14 15:01:41.728714721 +0200
+++ gcc/testsuite/g++.dg/cpp23/lambda-specifiers2.C 2021-04-14 15:01:32.959813534 +0200
@@ -0,0 +1,7 @@
+// PR c++/99850
+// P1102R2 - Down with ()!
+// { dg-do compile { target c++23 } }
+
+auto l = [] requires true -> void {};
+template  concept C = true;
+auto m = [] requires (C && ...) -> void {};

Jakub
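
The decision discussed in this thread can be sketched as a small standalone function. This is a simplified model, not the parser code: the `Token` enumeration and `classify_following_token` are hypothetical, only `pce_ok` and `pce_maybe_postfix` are names taken from the patch, and the sketch follows the review comment by not depending on the language dialect.

```cpp
// Hypothetical, reduced token set; the real parser handles many more cases.
enum Token { TOK_PLUS_PLUS, TOK_MINUS_MINUS, TOK_DOT, TOK_DEREF, TOK_OTHER };

// Result names as used in the patch.
enum PrimaryConstraintError { pce_ok, pce_maybe_postfix };

// Classify the token that follows a primary constraint expression.
// lambda_p is true when the requires-clause precedes a lambda-declarator.
PrimaryConstraintError classify_following_token(Token next, bool lambda_p) {
  switch (next) {
    case TOK_PLUS_PLUS:
    case TOK_MINUS_MINUS:
    case TOK_DOT:
      // Unenclosed postfix operator.
      return pce_maybe_postfix;
    case TOK_DEREF:
      // In a lambda, -> starts the trailing return type:
      //   [] requires C -> void {}
      // so do not try to re-parse it as a postfix expression.
      if (lambda_p)
        return pce_ok;
      return pce_maybe_postfix;
    default:
      return pce_ok;
  }
}
```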





Re: [PATCH] c++: ICE with bogus late return type [PR99803]

2021-04-15 Thread Jason Merrill via Gcc-patches

On 4/14/21 9:21 PM, Marek Polacek wrote:

Here we ICE when compiling this code in C++20, because we're trying to
slam a 'typename' after the ->.  The cp_parser_template_id call just
before the spot I'm changing parsed A::template A as a BASELINK
that contains a constructor, but make_typename_type crashes on that.

My fix is the same as c++/88325, add an is_overloaded_fn check.


Instead of handling this in various callers, maybe make_typename_type 
should handle the error, like it already does for e.g.


  struct A {  template  static T t; };
  A(unsigned) -> A:: template t;

Incidentally, in the testcase for 88325:


A::A () // { dg-error "partial specialization" }


this error is misleading; this isn't a partial specialization, it's 
redundant template arguments while trying to define the primary 
template.  I wouldn't worry about fixing the compiler now, but let's not 
test for the wrong error.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/99803
* parser.c (cp_parser_simple_type_specifier): Don't call
cp_parser_make_typename_type for is_overloaded_fn.

gcc/testsuite/ChangeLog:

PR c++/99803
* g++.dg/cpp2a/typename19.C: New test.
---
  gcc/cp/parser.c | 2 +-
  gcc/testsuite/g++.dg/cpp2a/typename19.C | 5 +
  2 files changed, 6 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/typename19.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 3a107206318..3c506d891c9 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18903,7 +18903,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  if (TREE_CODE (type) != TYPE_DECL)
{
  /* ...unless we pretend we have seen 'typename'.  */
- if (typename_p)
+ if (typename_p && !is_overloaded_fn (type))
type = cp_parser_make_typename_type (parser, type,
 token->location);
  else
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename19.C b/gcc/testsuite/g++.dg/cpp2a/typename19.C
new file mode 100644
index 000..bd7e5110e00
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/typename19.C
@@ -0,0 +1,5 @@
+// PR c++/99803
+// { dg-do compile { target c++20 } }
+
+struct A { template A(T); };
+auto A(unsigned) -> A::template A; // { dg-error "not name a type" }

base-commit: a87d3f964df31d4fbceb822c6d293e85c117d992





[pushed] c++: noexcept error recursion [PR100101]

2021-04-15 Thread Jason Merrill via Gcc-patches
Here instantiating the noexcept-specifier for bar() means
instantiating A::value, which complains about the conversion from 0 to
int* in the default argument of foo.  Since my patch for PR99583, printing
the error context involves looking at C::type, which again wants to
instantiate A::value, which breaks.  For now at least, let's break
this recursion by avoiding looking into the noexcept-specifier in
find_typenames, and limit that to just the uses_parameter_packs case that
PR99583 cares about.
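
The shape of this change can be sketched with a toy tree, purely as an illustration: the `Node` type and all function names below are hypothetical and do not correspond to GCC's tree API. The generic walker no longer descends into the exception specification of a function type, while the parameter-pack search does so explicitly.

```cpp
#include <vector>

enum Kind { K_PACK, K_TYPENAME, K_FUNCTION_TYPE, K_OTHER };

// Hypothetical toy tree node.
struct Node {
  Kind kind;
  std::vector<const Node *> operands;  // visited by the generic walker
  const Node *raises;                  // exception specification, may be null
};

// Generic walk: counts nodes of kind K reachable through operands only.
// It deliberately does not look at the exception specification.
int count_generic(const Node *n, Kind k) {
  if (!n)
    return 0;
  int total = (n->kind == k) ? 1 : 0;
  for (const Node *op : n->operands)
    total += count_generic(op, k);
  return total;
}

// Models a typename search that relies on the generic walk alone.
int count_typenames(const Node *n) { return count_generic(n, K_TYPENAME); }

// Models the pack search: the same walk, plus an explicit step into the
// exception specification of function types.
bool uses_parameter_packs(const Node *n) {
  if (!n)
    return false;
  if (n->kind == K_PACK)
    return true;
  for (const Node *op : n->operands)
    if (uses_parameter_packs(op))
      return true;
  return n->kind == K_FUNCTION_TYPE && uses_parameter_packs(n->raises);
}
```

With a function type whose exception specification mentions both a pack and a typename, the pack search still finds the pack, while the typename search never enters the specification.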

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/100101
PR c++/99583
* pt.c (find_parameter_packs_r) [FUNCTION_TYPE]: Walk into
TYPE_RAISES_EXCEPTIONS here.
* tree.c (cp_walk_subtrees): Not here.

gcc/testsuite/ChangeLog:

PR c++/100101
* g++.dg/cpp0x/noexcept67.C: New test.
---
 gcc/cp/pt.c | 11 +++
 gcc/cp/tree.c   |  5 -
 gcc/testsuite/g++.dg/cpp0x/noexcept67.C | 26 +
 3 files changed, 37 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept67.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0f119a55272..2190f83882a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3890,6 +3890,10 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data)
 (struct find_parameter_pack_data*)data;
   bool parameter_pack_p = false;
 
+#define WALK_SUBTREE(NODE) \
+  cp_walk_tree (&(NODE), &find_parameter_packs_r,  \
+   ppd, ppd->visited)  \
+
   /* Don't look through typedefs; we are interested in whether a
  parameter pack is actually written in the expression/type we're
  looking at, not the target type.  */
@@ -4070,10 +4074,17 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data)
ppd, ppd->visited);
   return NULL_TREE;
 
+case FUNCTION_TYPE:
+case METHOD_TYPE:
+  WALK_SUBTREE (TYPE_RAISES_EXCEPTIONS (t));
+  break;
+
 default:
   return NULL_TREE;
 }
 
+#undef WALK_SUBTREE
+
   return NULL_TREE;
 }
 
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 13cc61c3123..dca947bf52a 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5415,11 +5415,6 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, walk_tree_fn func,
}
   break;
 
-case FUNCTION_TYPE:
-case METHOD_TYPE:
-  WALK_SUBTREE (TYPE_RAISES_EXCEPTIONS (*tp));
-  break;
-
 default:
   return NULL_TREE;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept67.C b/gcc/testsuite/g++.dg/cpp0x/noexcept67.C
new file mode 100644
index 000..7f061034323
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept67.C
@@ -0,0 +1,26 @@
+// PR c++/100101
+// { dg-do compile { target c++11 } }
+
+template  struct A
+{
+template  static char foo(U*, int* = 0);
+static const bool value = sizeof(foo(static_cast(nullptr))) > 0;
+};
+
+template  struct B
+{
+static const bool value = b;
+};
+
+template  struct C
+{
+typedef B::value> type;
+};
+
+template 
+void bar() noexcept(A::value && C::type::value) {}
+
+void baz()
+{
+  bar();
+}

base-commit: a25590f29d07a88f6bf1b2c1ab0e4e012725db98
-- 
2.27.0



Re: [PATCH] aarch64: Fix up 2 other combine opt regressions vs. GCC8 [PR100075]

2021-04-15 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 15, 2021 at 07:11:11PM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > --- gcc/config/aarch64/aarch64.md.jj 2021-04-15 10:45:02.798853095 +0200
> > +++ gcc/config/aarch64/aarch64.md   2021-04-15 13:28:04.734754364 +0200
> > @@ -3572,6 +3572,18 @@ (define_insn "*neg__si2_uxtw"
> >[(set_attr "autodetect_type" "alu_shift__op2")]
> >  )
> >  
> > +(define_insn "*neg_asr_si2_extr"
> > +  [(set (match_operand:SI 0 "register_operand" "r")
> > +   (neg:SI (match_operator 4 "subreg_lowpart_operator"
> 
> Very minor, but it might be better to have the :SI on the match_operator
> too, like in the pattern below.

Fixed, thanks for catching that (and the "r" -> "=r").  I had actually
tested a patch that didn't have any constraints on the first
define_insn, because I started with a define_split that didn't work;
it happened to work during testing, and I only noticed I had missed them
afterwards.

> > @@ -5382,6 +5394,22 @@ (define_insn "*extrsi5_insn_uxtw_alt"
> >"extr\\t%w0, %w1, %w2, %4"
> >[(set_attr "type" "rotate_imm")]
> >  )
> > +
> > +(define_insn "*extrsi5_insn_di"
> > +  [(set (match_operand:SI 0 "register_operand" "=r")
> > +   (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
> > +  (match_operand 3 "const_int_operand" "n"))
> > +   (match_operator:SI 6 "subreg_lowpart_operator"
> > + [(zero_extract:DI
> > +(match_operand:DI 2 "register_operand" "r")
> > +(match_operand 5 "const_int_operand" "n")
> > +(match_operand 4 "const_int_operand" "n"))])))]
> > +  "UINTVAL (operands[3]) < 32
> > +   && UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32
> > +   && UINTVAL (operands[4]) + UINTVAL (operands[5]) - 32 <= 64"
> 
> Could you explain this condition?  With operand 5 being the size
> and operand 4 being the position, I was expecting something like:

The first two conditions are like those on *extr<mode>5_insn,
except that GET_MODE_BITSIZE (<MODE>mode) is hardcoded as 32.
As in *extr<mode>5_insn, we have two shift counts: because
aarch64 is !BITS_BIG_ENDIAN, the last operand of zero_extract
is a shift count too.

The
UINTVAL (operands[4]) + UINTVAL (operands[5]) - 32 <= 64
(should have been <= 32 actually) was meant to test
IN_RANGE (UINTVAL (operands[4]) + UINTVAL (operands[5]), 32, 64)
or maybe just
UINTVAL (operands[4]) + UINTVAL (operands[5]) >= 32
because I mistakenly thought it was enough to test that the
zero_extract extracts at least as many bits, so that it doesn't mask
any bits above that.  But the subreg:SI actually applies to the
zero_extract result, and therefore to operands[2] >> operands[4],
rather than before that, so you're right it needs to be == 32 rather
than >= 32.

So, either it can be
  "UINTVAL (operands[3]) < 32
   && UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32
   && UINTVAL (operands[4]) + UINTVAL (operands[5]) == 32"
or what you wrote:

>   "UINTVAL (operands[3]) < 32
>&& UINTVAL (operands[3]) == UINTVAL (operands[5])
>&& UINTVAL (operands[4]) + UINTVAL (operands[5]) == 32"

or it could be:
  "UINTVAL (operands[3]) < 32
   && UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32
   && INTVAL (operands[3]) == INTVAL (operands[5])"

Jakub
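
The reasoning about the condition can be checked numerically with a small model. This is a sketch under stated assumptions, not GCC code: `extr32`, `pattern_value` and `condition_holds` are hypothetical helpers that model the 32-bit extr instruction, the RTL pattern (ior of a left shift with the low 32 bits of a zero_extract from a 64-bit register), and the first form of the corrected condition.

```cpp
#include <cstdint>

// Model of "extr w0, w1, w2, #lsb": take 32 bits starting at bit lsb of
// the 64-bit concatenation w1:w2.  Assumes 0 <= lsb <= 31.
uint32_t extr32(uint32_t w1, uint32_t w2, unsigned lsb) {
  uint64_t cat = (static_cast<uint64_t>(w1) << 32) | w2;
  return static_cast<uint32_t>(cat >> lsb);
}

// Model of the pattern: (op1 << op3) | lowpart32 (zero_extract (op2, size
// op5, position op4)), with op2 a 64-bit register.
// Assumes op3 < 32, op4 < 64 and 0 < op5 < 64.
uint32_t pattern_value(uint32_t op1, uint64_t op2, unsigned op3, unsigned op4,
                       unsigned op5) {
  uint64_t mask = (static_cast<uint64_t>(1) << op5) - 1;
  uint64_t extracted = (op2 >> op4) & mask;
  return (op1 << op3) | static_cast<uint32_t>(extracted);
}

// The corrected insn condition, in its first form from the message above.
bool condition_holds(unsigned op3, unsigned op4, unsigned op5) {
  return op3 < 32 && op3 + op4 == 32 && op4 + op5 == 32;
}
```

When the upper half of the 64-bit source register is nonzero, an extraction that is one bit too wide pulls in a bit from that upper half, which is why the sum has to be exactly 32 rather than at least 32.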



[Patch, fortran] PR fortran/100103 - Automatic reallocation fails inside select rank

2021-04-15 Thread José Rui Faustino de Sousa via Gcc-patches

Hi All!

Proposed patch to:

PR100103 - Automatic reallocation fails inside select rank

Patch tested only on x86_64-pc-linux-gnu.

Add select rank temporary associated names as possible targets of 
automatic reallocation.


The patch depends on PR100097 and PR100098.

Thank you very much.

Best regards,
José Rui

Fortran: Fix automatic reallocation inside select rank [PR100103]

gcc/fortran/ChangeLog:

PR fortran/100103
* trans-array.c (gfc_is_reallocatable_lhs): Add select rank
temporary associate names as possible targets of automatic
reallocation.

gcc/testsuite/ChangeLog:

PR fortran/100103
* gfortran.dg/PR100103.f90: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index be5eb89350f..99225e70d5d 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10048,7 +10048,7 @@ gfc_is_reallocatable_lhs (gfc_expr *expr)
 
   /* An allocatable class variable with no reference.  */
   if (sym->ts.type == BT_CLASS
-  && !sym->attr.associate_var
+  && (!sym->attr.associate_var || sym->attr.select_rank_temporary)
   && CLASS_DATA (sym)->attr.allocatable
   && expr->ref
   && ((expr->ref->type == REF_ARRAY && expr->ref->u.ar.type == AR_FULL
@@ -10063,7 +10063,7 @@ gfc_is_reallocatable_lhs (gfc_expr *expr)
 
   /* An allocatable variable.  */
   if (sym->attr.allocatable
-  && !sym->attr.associate_var
+  && (!sym->attr.associate_var || sym->attr.select_rank_temporary)
   && expr->ref
   && expr->ref->type == REF_ARRAY
   && expr->ref->u.ar.type == AR_FULL)
diff --git a/gcc/testsuite/gfortran.dg/PR100103.f90 b/gcc/testsuite/gfortran.dg/PR100103.f90
new file mode 100644
index 000..756fd5824c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100103.f90
@@ -0,0 +1,81 @@
+! { dg-do run }
+!
+! Test the fix for PR100103
+!
+
+program main_p
+
+  implicit none
+
+  integer:: i
+  integer, parameter :: n = 11
+  
+  type :: foo_t
+integer :: i
+  end type foo_t
+  
+  type(foo_t), parameter :: a(*) = [(foo_t(i), i=1,n)]
+
+  type(foo_t),  allocatable :: bar_d(:)
+  class(foo_t), allocatable :: bar_p(:)
+  class(*), allocatable :: bar_u(:)
+
+
+  call foo_d(bar_d)
+  if(.not.allocated(bar_d)) stop 1
+  if(any(bar_d%i/=a%i)) stop 2
+  deallocate(bar_d)
+  call foo_p(bar_p)
+  if(.not.allocated(bar_p)) stop 3
+  if(any(bar_p%i/=a%i)) stop 4
+  deallocate(bar_p)
+  call foo_u(bar_u)
+  if(.not.allocated(bar_u)) stop 5
+  select type(bar_u)
+  type is(foo_t)
+if(any(bar_u%i/=a%i)) stop 6
+  class default
+stop 7
+  end select
+  deallocate(bar_u)
+  stop
+
+contains
+
+  subroutine foo_d(that)
+type(foo_t), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 8
+end select
+return
+  end subroutine foo_d
+
+  subroutine foo_p(that)
+class(foo_t), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 9
+end select
+return
+  end subroutine foo_p
+
+  subroutine foo_u(that)
+class(*), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 10
+end select
+return
+  end subroutine foo_u
+
+end program main_p


Re: [PATCH] aarch64: Fix up 2 other combine opt regressions vs. GCC8 [PR100075]

2021-04-15 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> --- gcc/config/aarch64/aarch64.md.jj  2021-04-15 10:45:02.798853095 +0200
> +++ gcc/config/aarch64/aarch64.md 2021-04-15 13:28:04.734754364 +0200
> @@ -3572,6 +3572,18 @@ (define_insn "*neg__si2_uxtw"
>[(set_attr "autodetect_type" "alu_shift__op2")]
>  )
>  
> +(define_insn "*neg_asr_si2_extr"
> +  [(set (match_operand:SI 0 "register_operand" "r")
> + (neg:SI (match_operator 4 "subreg_lowpart_operator"

Very minor, but it might be better to have the :SI on the match_operator
too, like in the pattern below.

> +   [(sign_extract:DI
> +  (match_operand:DI 1 "register_operand" "r")
> +  (match_operand 3 "aarch64_simd_shift_imm_offset_si" "n")
> +  (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n"))])))]
> +  "INTVAL (operands[2]) + INTVAL (operands[3]) == 32"
> +  "neg\\t%w0, %w1, asr %2"
> +  [(set_attr "autodetect_type" "alu_shift_asr_op2")]
> +)
> +
>  (define_insn "mul3"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>   (mult:GPI (match_operand:GPI 1 "register_operand" "r")
> @@ -5382,6 +5394,22 @@ (define_insn "*extrsi5_insn_uxtw_alt"
>"extr\\t%w0, %w1, %w2, %4"
>[(set_attr "type" "rotate_imm")]
>  )
> +
> +(define_insn "*extrsi5_insn_di"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> + (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
> +(match_operand 3 "const_int_operand" "n"))
> + (match_operator:SI 6 "subreg_lowpart_operator"
> +   [(zero_extract:DI
> +  (match_operand:DI 2 "register_operand" "r")
> +  (match_operand 5 "const_int_operand" "n")
> +  (match_operand 4 "const_int_operand" "n"))])))]
> +  "UINTVAL (operands[3]) < 32
> +   && UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32
> +   && UINTVAL (operands[4]) + UINTVAL (operands[5]) - 32 <= 64"

Could you explain this condition?  With operand 5 being the size
and operand 4 being the position, I was expecting something like:

  "UINTVAL (operands[3]) < 32
   && UINTVAL (operands[3]) == UINTVAL (operands[5])
   && UINTVAL (operands[4]) + UINTVAL (operands[5]) == 32"

i.e. the %w1 shift must equal the size of the %w2 extraction
and the %w2 extraction must align with the top of the register.
Or, writing it in more the style of the original condition,
the final line would be:

   && UINTVAL (operands[3]) == UINTVAL (operands[5])"

instead of:

   && UINTVAL (operands[4]) + UINTVAL (operands[5]) - 32 <= 64"

Not tested though, and it's late, so I could have got that completely
wrong :-)

Thanks,
Richard

> +  "extr\\t%w0, %w1, %w2, %4"
> +  [(set_attr "type" "rotate_imm")]
> +)
>  
>  (define_insn "*ror3_insn"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> --- gcc/testsuite/gcc.target/aarch64/pr100075.c.jj 2021-04-15 13:23:31.188852983 +0200
> +++ gcc/testsuite/gcc.target/aarch64/pr100075.c 2021-04-15 13:23:10.612086048 +0200
> @@ -0,0 +1,20 @@
> +/* PR target/100075 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\tsbfx\tx[0-9]+, x[0-9]+, 16, 16} } } */
> +/* { dg-final { scan-assembler {\tneg\tw[0-9]+, w[0-9]+, asr 16} } } */
> +/* { dg-final { scan-assembler {\textr\tw[0-9]+, w[0-9]+, w[0-9]+, 16} } } */
> +
> +struct S { short x, y; };
> +
> +struct S
> +f1 (struct S p)
> +{
> +  return (struct S) { -p.y, p.x };
> +}
> +
> +struct S
> +f2 (struct S p)
> +{
> +  return (struct S) { p.y, -p.x };
> +}
>
>   Jakub


Re: [PATCH v2] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Thu, Apr 15, 2021 at 6:51 PM H.J. Lu  wrote:
>
> On Thu, Apr 15, 2021 at 9:34 AM Uros Bizjak  wrote:
> >
> > On Thu, Apr 15, 2021 at 6:26 PM H.J. Lu  wrote:
> > >
> > > On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> > > > >
> > > > > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > > > > without enabling SSE vector instructions.
> > > >
> > > > There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> > > > situation reflects that correctly.
> > >
> > > CRC32 is similar to POPCNT which was originally in SSE4.2.   Now POPCNT
> >
> > It is not similar, POPCNT has its own CPUID flag and can be enabled
> > independently of SSE4.2.
> >
> > > is a separate feature which is also enabled by SSE4.2.   Enabling CRC32 only
> > > with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.   This
> > > patch addresses this issue the same way as POPCNT.
> >
> > CRC32 doesn't have its own CPUID flag, so PTA_CRC32 is pointless.
>
> PTA_CRC32 shouldn't be added.
>
> > OTOH, the situation is similar with MONITOR and MWAIT. These are
>
> There are no intrinsics for  MONITOR nor MWAIT.
>
> > enabled with SSE3 and don't use XMM registers. Also somewhat similar
> > is FISTTP, but there is no intrinsic for this insn.
>
> True.
>
> Here is the v2 patch without PTA_CRC32.

--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
   /* GNU_PROPERTY_X86_ISA_1_V2.  */
   if (TARGET_CMPXCHG16B
   || (TARGET_64BIT && TARGET_SAHF)
+  || TARGET_CRC32
   || TARGET_POPCNT
   || TARGET_SSE3
   || TARGET_SSSE3

This is not needed. CRC32 is not an ISA, and if someone uses
-mx86-64-v2 -mno-crc32 it does what the documentation says - disables
builtin function.

Otherwise OK, but please also obtain RM's approval at this stage.

Thanks,
Uros.

> --
> H.J.


[PATCH] testsuite: Enable zero-scratch-regs-{8,9,10,11}.c on s390*

2021-04-15 Thread Stefan Schulze Frielinghaus via Gcc-patches
On s390* the only missing part for the mentioned testcases was a load of
a double floating-point zero via a move (in particular for quite old
machines) which was added in commit 46c47420a5fefd4d9d02b0db347235dd74e20fb2.
Common code implementation is sufficient in order to clear volatile
GPRs, FPRs, and VRs.  Access registers a0 and a1 are nonvolatile and not
cleared.  Therefore, target hook TARGET_ZERO_CALL_USED_REGS is not
implemented for s390*.

Added a target specific test in order to ensure that all call clobbered
GPRs, FPRs, and VRs are zeroed and all call saved registers are kept.

Ok for mainline?

gcc/testsuite/ChangeLog:

* c-c++-common/zero-scratch-regs-8.c: Enable on s390*.
* c-c++-common/zero-scratch-regs-9.c: Likewise.
* c-c++-common/zero-scratch-regs-10.c: Likewise.
* c-c++-common/zero-scratch-regs-11.c: Likewise.
* gcc.target/s390/zero-scratch-regs-1.c: New test.
---
 .../c-c++-common/zero-scratch-regs-10.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-11.c   |  2 +-
 .../c-c++-common/zero-scratch-regs-8.c|  2 +-
 .../c-c++-common/zero-scratch-regs-9.c|  2 +-
 .../gcc.target/s390/zero-scratch-regs-1.c | 65 +++
 5 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zero-scratch-regs-1.c

diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
index ab17143bc4b..96e0b79b328 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-10.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* nvptx*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* nvptx*-*-* s390*-*-* } } } */
 /* { dg-options "-O2" } */
 
 #include 
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
index 6642a377798..0714f95a04f 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-11.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fzero-call-used-regs=all" } */
 
 #include "zero-scratch-regs-10.c"
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-8.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-8.c
index 867c6bdce2c..aceda7e5cb8 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-8.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-8.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
 
 #include "zero-scratch-regs-1.c"
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
index 4b45d7061df..f3152a7a732 100644
--- a/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-9.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* } } } */
+/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* aarch64*-*-* arm*-*-* nvptx*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fzero-call-used-regs=all" } */
 
 #include "zero-scratch-regs-1.c"
diff --git a/gcc/testsuite/gcc.target/s390/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/s390/zero-scratch-regs-1.c
new file mode 100644
index 000..c394c4b69e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/zero-scratch-regs-1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=z13" } */
+
+/* Ensure that all call clobbered GPRs, FPRs, and VRs are zeroed and all call
+   saved registers are kept. */
+
+void foo (void) { }
+
+/* { dg-final { scan-assembler-times "lhi\t" 6 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r0,0" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r1,0" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r2,0" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r3,0" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r4,0" { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lhi\t%r5,0" { target { ! lp64 } } } } */
+
+/* { dg-final { scan-assembler-times "lzdr\t" 14 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler "lzdr\t%f0" { target { ! lp64 } } } } */
+/* 

Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]

2021-04-15 Thread Richard Sandiford via Gcc-patches
Looks good in general, but like you say, it's GCC 12 material.

Alex Coplan  writes:
> diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
> index 936649769c7..943fa3c1097 100644
> --- a/gcc/config/aarch64/aarch64-bti-insert.c
> +++ b/gcc/config/aarch64/aarch64-bti-insert.c
> @@ -120,6 +120,13 @@ aarch64_pac_insn_p (rtx x)
>return false;
>  }
>  
> +static bool
> +aarch64_bti_j_insn_p (rtx_insn *insn)
> +{
> +  rtx pat = PATTERN (insn);
> +  return GET_CODE (pat) == UNSPEC_VOLATILE && XINT (pat, 1) == UNSPECV_BTI_J;
> +}
> +

Nit, but even a simple function like this should have a comment. :-)

>  /* Insert the BTI instruction.  */
>  /* This is implemented as a late RTL pass that runs before branch
> shortening and does the following.  */
> @@ -165,6 +172,9 @@ rest_of_insert_bti (void)
> for (j = GET_NUM_ELEM (vec) - 1; j >= 0; --j)
>   {
> label = as_a <rtx_insn *> (XEXP (RTVEC_ELT (vec, j), 0));
> +   if (aarch64_bti_j_insn_p (next_nonnote_insn (label)))
> + continue;
> +

This should be next_nonnote_nondebug_insn (quite the mouthful),
otherwise debug instructions could affect the choice.

The thing returned by next_nonnote_nondebug_insn isn't in general
guaranteed to be an insn (unlike next_real_nondebug_insn).  It might
also be null in very odd cases.  I think we should therefore check
for null and INSN_P before checking PATTERN.

Thanks,
Richard

> bti_insn = gen_bti_j ();
> emit_insn_after (bti_insn, label);
>   }
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c b/gcc/testsuite/gcc.target/aarch64/pr99988.c
> new file mode 100644
> index 000..2d87f41a717
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c
> @@ -0,0 +1,66 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mbranch-protection=standard" } */
> +/* { dg-final { scan-assembler-times {bti j} 13 } } */
> +int a;
> +int c();
> +int d();
> +int e();
> +int f();
> +int g();
> +void h() {
> +  switch (a) {
> +  case 0:
> +  case 56:
> +  case 57:
> +break;
> +  case 58:
> +  case 59:
> +  case 61:
> +  case 62:
> +c();
> +  case 64:
> +  case 63:
> +d();
> +  case 66:
> +  case 65:
> +d();
> +  case 68:
> +  case 67:
> +d();
> +  case 69:
> +  case 70:
> +d();
> +  case 71:
> +  case 72:
> +  case 88:
> +  case 87:
> +d();
> +  case 90:
> +  case 89:
> +d();
> +  case 92:
> +  case 1:
> +d();
> +  case 93:
> +  case 73:
> +  case 4:
> +e();
> +  case 76:
> +  case 5:
> +f();
> +  case 7:
> +  case 8:
> +  case 84:
> +  case 85:
> +break;
> +  case 6:
> +  case 299:
> +  case 9:
> +  case 80:
> +  case 2:
> +  case 3:
> +e();
> +  default:
> +g();
> +  }
> +}


Re: [PATCH v2] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread H.J. Lu via Gcc-patches
On Thu, Apr 15, 2021 at 9:53 AM Uros Bizjak  wrote:
>
> On Thu, Apr 15, 2021 at 6:51 PM H.J. Lu  wrote:
> >
> > On Thu, Apr 15, 2021 at 9:34 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Apr 15, 2021 at 6:26 PM H.J. Lu  wrote:
> > > >
> > > > On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> > > > > >
> > > > > > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > > > > > without enabling SSE vector instructions.
> > > > >
> > > > > There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> > > > > situation reflects that correctly.
> > > >
> > > > CRC32 is similar to POPCNT which was originally in SSE4.2.   Now POPCNT
> > >
> > > It is not similar, POPCNT has its own CPUID flag and can be enabled
> > > independently of SSE4.2.
> > >
> > > > is a separate feature which is also enabled by SSE4.2.   Enable CRC32 only
> > > > with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.   This
> > > > patch addresses this issue the same way as POPCNT.
> > >
> > > CRC32 doesn't have its own CPUID flag, so PTA_CRC32 is pointless.
> >
> > PTA_CRC32 shouldn't be added.
> >
> > > OTOH, the situation is similar with MONITOR and MWAIT. These are
> >
> > There are no intrinsics for  MONITOR nor MWAIT.
>
> pmmintrin.h:
>
> extern __inline void __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
> {
>   __builtin_ia32_monitor (__P, __E, __H);
> }
>
> extern __inline void __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_mwait (unsigned int __E, unsigned int __H)
> {
>   __builtin_ia32_mwait (__E, __H);
> }

They can be moved to mwaitintrin.h with -mmwait.

> >
> > > enabled with SSE3 and don't use XMM registers. Also somewhat similar
> > > is FISTTP, but there is no intrinsic for this insn.
> >
> > True.
> >
> > Here is the v2 patch without PTA_CRC32.
> >
> > --
> > H.J.



-- 
H.J.


Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-15 Thread Thomas Schwinge
Hi!

On 2021-02-26T04:34:50-0800, Julian Brown  wrote:
> This patch

Thanks, Julian, for continuing to improve these changes!

This has iterated through several conceptually different designs and
implementations, by several people, over the past several years.

It's now been made my task to finish it up -- but I'll very much
appreciate your input (Julian's, primarily) on the following remarks,
which are basically my open work items.


> implements a method to track the "private-ness" of
> OpenACC variables declared in offload regions in gang-partitioned,
> worker-partitioned or vector-partitioned modes. Variables declared
> implicitly in scoped blocks and those declared "private" on enclosing
> directives (e.g. "acc parallel") are both handled. Variables that are
> e.g. gang-private can then be adjusted so they reside in GPU shared
> memory.
>
> The reason for doing this is twofold: correct implementation of OpenACC
> semantics

ACK, and as mentioned before, this very much relates to
 "OpenACC: predetermined private levels for
variables declared in blocks" (plus the corresponding use of 'private'
clauses, implicit/explicit, including 'firstprivate') and
 "Predetermined private levels for variables
declared in OpenACC accelerator routines", which we thus should refer to in
testcases/ChangeLog/commit log, as appropriate.  I do understand we're
not yet addressing all of that (and that's fine!), but we should capture
remaining work items of the PRs and Cesar's list in
),
as appropriate.


I was surprised that we didn't really have to fix up any existing libgomp
testcases, because there seem to be quite some that contain a pattern
(exemplified by the 'tmp' variable) as follows:

int main()
{
#define N 123
  int data[N];
  int tmp;

#pragma acc parallel // implicit 'firstprivate(tmp)'
  {
// 'tmp' now conceptually made gang-private here.
#pragma acc loop gang
for (int i = 0; i < 123; ++i)
  {
tmp = i + 234;
data[i] = tmp;
  }
  }

  for (int i = 0; i < 123; ++i)
if (data[i] != i + 234)
  __builtin_abort ();

  return 0;
}

With the code changes as posted, this actually now does *not* use
gang-private memory for 'tmp', but instead continues to use
"thread-private registers", as before.

Same for:

--- s3.c    2021-04-13 17:26:49.628739379 +0200
+++ s3_2.c  2021-04-13 17:29:43.484579664 +0200
@@ -4,6 +4,6 @@
   int data[N];
-  int tmp;

-#pragma acc parallel // implicit 'firstprivate(tmp)'
+#pragma acc parallel
   {
+int tmp;
 // 'tmp' now conceptually made gang-private here.
 #pragma acc loop gang

I suppose that's due to conditionalizing this transformation on
'TREE_ADDRESSABLE' (as you're doing), so we should be mostly "safe"
regarding such existing testcases (but I haven't verified that yet in
detail).
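To make the 'TREE_ADDRESSABLE' case concrete, here is a minimal sketch of a variant of the example above in which the address of 'tmp' is taken.  This is not one of the existing libgomp testcases; the function name 'run_addressable_tmp' is made up for illustration, and whether 'tmp' really ends up in gang-private memory depends on the posted changes and on the optimizers not removing the address-taking, which I'm only assuming here:

```c
#define N 123

/* Hypothetical testcase sketch: returns 0 if all elements are as expected.  */
static int
run_addressable_tmp (void)
{
  int data[N];

#pragma acc parallel copyout(data)
  {
    /* Block-scope variable: conceptually gang-private.  */
    int tmp;
    /* Taking the address marks 'tmp' addressable in the front end; the
       assumption is that this makes it a candidate for the transformation.  */
    int *p = &tmp;
#pragma acc loop gang
    for (int i = 0; i < N; ++i)
      {
        *p = i + 234;
        data[i] = tmp;
      }
  }

  for (int i = 0; i < N; ++i)
    if (data[i] != i + 234)
      return 1;

  return 0;
}
```

Compiled without '-fopenacc' the pragmas are ignored and this runs as plain host code, so it only checks the expected values; confirming the memory placement would still need the dump scanning mentioned below.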

That needs to be documented in testcases, with some kind of dump scanning
(host compilation-side even; see below).

A note for later: if this weren't just a 'gang' loop, but 'gang' plus
'worker' and/or 'vector', we'd actually be fixing up user code with
undefined behavior into "correct" code (by *not* making 'tmp'
gang-private, but thread-private), right?

As that may not be obvious to the reader, I'd like to have the
'TREE_ADDRESSABLE' conditionalization be documented in the code.  You had
explained that in
: "a
non-addressable variable [...]".


> and optimisation, since shared memory might be faster than
> the main memory on a GPU.

Do we potentially have a problem that making more use of (scarce)
gang-private memory may negatively affect performance, because potentially
fewer OpenACC gangs may then be launched to the GPU hardware in parallel?
(Of course, OpenACC semantics conformance firstly is more important than
performance, but there may be ways to be conformant and performant;
"quality of implementation".)  Have you run any such performance testing
with the benchmarking codes that we've got set up?

(As I'm more familiar with that, I'm using nvptx offloading examples in
the following, whilst assuming that similar discussion may apply for GCN
offloading, which uses similar hardware concepts, as far as I remember.)

Looking at the existing 'libgomp.oacc-c-c++-common/private-variables.c'
(random example), for nvptx offloading, '-O0', we see the following PTX
JIT compilation changes (word-'diff' of 'GOMP_DEBUG=1' at run-time):

info: Function properties for 'local_g_1$_omp_fn$0':
info: used 27 registers, 32 stack, [-176-]{+256+} bytes smem, 328 bytes cmem[0], 0 bytes lmem
info: Function properties for 'local_w_1$_omp_fn$0':
info: used 40 registers, 48 stack, [-176-]{+256+} bytes smem, 328 bytes 

Re: [PATCH v2] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Thu, Apr 15, 2021 at 6:51 PM H.J. Lu  wrote:
>
> On Thu, Apr 15, 2021 at 9:34 AM Uros Bizjak  wrote:
> >
> > On Thu, Apr 15, 2021 at 6:26 PM H.J. Lu  wrote:
> > >
> > > On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> > > > >
> > > > > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > > > > without enabling SSE vector instructions.
> > > >
> > > > There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> > > > situation reflects that correctly.
> > >
> > > CRC32 is similar to POPCNT which was originally in SSE4.2.   Now POPCNT
> >
> > It is not similar, POPCNT has its own CPUID flag and can be enabled
> > independently of SSE4.2.
> >
> > > is a separate feature which is also enabled by SSE4.2.   Enable CRC32 only
> > > with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.   This
> > > patch addresses this issue the same way as POPCNT.
> >
> > CRC32 doesn't have its own CPUID flag, so PTA_CRC32 is pointless.
>
> PTA_CRC32 shouldn't be added.
>
> > OTOH, the situation is similar with MONITOR and MWAIT. These are
>
> There are no intrinsics for  MONITOR nor MWAIT.

pmmintrin.h:

extern __inline void __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
{
  __builtin_ia32_monitor (__P, __E, __H);
}

extern __inline void __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_mwait (unsigned int __E, unsigned int __H)
{
  __builtin_ia32_mwait (__E, __H);
}

>
> > enabled with SSE3 and don't use XMM registers. Also somewhat similar
> > is FISTTP, but there is no intrinsic for this insn.
>
> True.
>
> Here is the v2 patch without PTA_CRC32.
>
> --
> H.J.


[PATCH v2] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread H.J. Lu via Gcc-patches
On Thu, Apr 15, 2021 at 9:34 AM Uros Bizjak  wrote:
>
> On Thu, Apr 15, 2021 at 6:26 PM H.J. Lu  wrote:
> >
> > On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
> > >
> > > On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> > > >
> > > > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > > > without enabling SSE vector instructions.
> > >
> > > There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> > > situation reflects that correctly.
> >
> > CRC32 is similar to POPCNT which was originally in SSE4.2.   Now POPCNT
>
> It is not similar, POPCNT has its own CPUID flag and can be enabled
> independently of SSE4.2.
>
> > is a separate feature which is also enabled by SSE4.2.   Enable CRC32 only
> > with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.   This
> > patch addresses this issue the same way as POPCNT.
>
> CRC32 doesn't have its own CPUID flag, so PTA_CRC32 is pointless.

PTA_CRC32 shouldn't be added.

> OTOH, the situation is similar with MONITOR and MWAIT. These are

There are no intrinsics for MONITOR or MWAIT.

> enabled with SSE3 and don't use XMM registers. Also somewhat similar
> is FISTTP, but there is no intrinsic for this insn.

True.

Here is the v2 patch without PTA_CRC32.

-- 
H.J.
From 918c28ae8843df90d1d73838e7afe05ccdfb4cbf Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 15 Apr 2021 05:59:48 -0700
Subject: [PATCH v2] x86: Use crc32 target option for CRC32 intrinsics

Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
without enabling SSE vector instructions.

	* config/i386/gnu-property.c
	(file_end_indicate_exec_stack_and_gnu_property): Also check
	TARGET_CRC32 for GNU_PROPERTY_X86_ISA_1_V2.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__CRC32__ for -mcrc32.
	* config/i386/i386-options.c (ix86_option_override_internal):
	Enable crc32 instruction for -msse4.2.
	* config/i386/i386.md (sse4_2_crc32): Remove TARGET_SSE4_2
	check.
	(sse4_2_crc32di): Likewise.
	* config/i386/ia32intrin.h: Use crc32 target option for CRC32
	intrinsics.
---
 gcc/config/i386/gnu-property.c |  1 +
 gcc/config/i386/i386-c.c   |  2 ++
 gcc/config/i386/i386-options.c |  5 +
 gcc/config/i386/i386.md|  4 ++--
 gcc/config/i386/ia32intrin.h   | 28 ++--
 5 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
index 4ba04403002..b6a3bdf62ce 100644
--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
   /* GNU_PROPERTY_X86_ISA_1_V2.  */
   if (TARGET_CMPXCHG16B
 	  || (TARGET_64BIT && TARGET_SAHF)
+	  || TARGET_CRC32
 	  || TARGET_POPCNT
 	  || TARGET_SSE3
 	  || TARGET_SSSE3
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index be46d0506ad..5ed0de006fb 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -532,6 +532,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__LZCNT__");
   if (isa_flag & OPTION_MASK_ISA_TBM)
 def_or_undef (parse_in, "__TBM__");
+  if (isa_flag & OPTION_MASK_ISA_CRC32)
+def_or_undef (parse_in, "__CRC32__");
   if (isa_flag & OPTION_MASK_ISA_POPCNT)
 def_or_undef (parse_in, "__POPCNT__");
   if (isa_flag & OPTION_MASK_ISA_FSGSBASE)
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 91da2849c49..7e59ccd988d 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2617,6 +2617,11 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_ix86_isa_flags
   |= OPTION_MASK_ISA_POPCNT & ~opts->x_ix86_isa_flags_explicit;
 
+  /* Enable crc32 instruction for -msse4.2.  */
+  if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
+opts->x_ix86_isa_flags
+  |= OPTION_MASK_ISA_CRC32 & ~opts->x_ix86_isa_flags_explicit;
+
   /* Enable lzcnt instruction for -mabm.  */
   if (TARGET_ABM_P(opts->x_ix86_isa_flags))
 opts->x_ix86_isa_flags
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9ff35d9a607..1f1d74e6275 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20998,7 +20998,7 @@
 	  [(match_operand:SI 1 "register_operand" "0")
 	   (match_operand:SWI124 2 "nonimmediate_operand" "m")]
 	  UNSPEC_CRC32))]
-  "TARGET_SSE4_2 || TARGET_CRC32"
+  "TARGET_CRC32"
   "crc32{}\t{%2, %0|%0, %2}"
   [(set_attr "type" "sselog1")
(set_attr "prefix_rep" "1")
@@ -21019,7 +21019,7 @@
 	  [(match_operand:DI 1 "register_operand" "0")
 	   (match_operand:DI 2 "nonimmediate_operand" "rm")]
 	  UNSPEC_CRC32))]
-  "TARGET_64BIT && (TARGET_SSE4_2 || TARGET_CRC32)"
+  "TARGET_64BIT && TARGET_CRC32"
   "crc32{q}\t{%2, %0|%0, %2}"
   [(set_attr "type" "sselog1")
(set_attr "prefix_rep" "1")
diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h
index 591394076cc..5422b0fc9e0 100644
--- 

Re: [PATCH] libstdc++: Add -latomic to test flags for 32-bit sparc-linux

2021-04-15 Thread Jonathan Wakely via Gcc-patches

On 15/04/21 18:23 +0200, Eric Botcazou wrote:
>> Without this I see a number of tests failing when -m32 is used.
>>
>> libstdc++-v3/ChangeLog:
>>
>> 	* testsuite/lib/dg-options.exp (add_options_for_libatomic): Also
>> 	add libatomic options for 32-bit sparc*-*-linux-gnu.
>>
>> Eric, are you OK with this? It adds -latomic and the appropriate -L
>> options to tests that use { dg-add-options libatomic }, because
>> sparcv8 needs libatomic for some std::atomic ops.
>
>Sure, thanks.

Thanks, pushed to trunk.





[pushed] c++: lambda in default type template-argument [PR100091]

2021-04-15 Thread Jason Merrill via Gcc-patches
My patch for 99478 relied on local_variables_forbidden_p for distinguishing
between a template parameter and its default argument, but that flag wasn't
set for a default type template-argument.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/100091
PR c++/99478
* parser.c (cp_parser_default_type_template_argument): Set
parser->local_variables_forbidden_p.

gcc/testsuite/ChangeLog:

PR c++/100091
* g++.dg/cpp2a/lambda-uneval15.C: New test.
---
 gcc/cp/parser.c  | 4 
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval15.C | 5 +
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval15.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 3a107206318..940751b5f05 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16923,6 +16923,10 @@ cp_parser_default_type_template_argument (cp_parser *parser)
 
   cp_token *token = cp_lexer_peek_token (parser->lexer);
 
+  /* Tell cp_parser_lambda_expression this is a default argument.  */
+  auto lvf = make_temp_override (parser->local_variables_forbidden_p);
+  parser->local_variables_forbidden_p = LOCAL_VARS_AND_THIS_FORBIDDEN;
+
   /* Parse the default-argument.  */
   push_deferring_access_checks (dk_no_deferred);
   tree default_argument = cp_parser_type_id (parser,
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-uneval15.C b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval15.C
new file mode 100644
index 000..ae72ea3c56b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-uneval15.C
@@ -0,0 +1,5 @@
+// PR c++/100091
+// { dg-do compile { target c++20 } }
+
+template
+void f() {}

base-commit: b5f644a98b3f3543d3a8d2dfea7785c22879013f
-- 
2.27.0



Re: [PATCH] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Thu, Apr 15, 2021 at 6:26 PM H.J. Lu  wrote:
>
> On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
> >
> > On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> > >
> > > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > > without enabling SSE vector instructions.
> >
> > There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> > situation reflects that correctly.
>
> CRC32 is similar to POPCNT which was originally in SSE4.2.   Now POPCNT

It is not similar, POPCNT has its own CPUID flag and can be enabled
independently of SSE4.2.

> is a separate feature which is also enabled by SSE4.2.   Enable CRC32 only
> with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.   This
> patch addresses this issue the same way as POPCNT.

CRC32 doesn't have its own CPUID flag, so PTA_CRC32 is pointless.

OTOH, the situation is similar with MONITOR and MWAIT. These are
enabled with SSE3 and don't use XMM registers. Also somewhat similar
is FISTTP, but there is no intrinsic for this insn.
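
To illustrate the CPUID point, here is a minimal sketch (not part of any patch in this thread) using GCC's x86 builtins.  The helper names cpu_has_popcnt and cpu_has_crc32_insn are made up for illustration; POPCNT is reported in CPUID.01H:ECX bit 23, while for crc32 the only thing to test is the SSE4.2 bit (ECX bit 20):

```c
/* Nonzero if the running CPU advertises POPCNT (its own CPUID bit).
   On non-x86 targets, or without the GCC builtins, this returns 0.  */
static int
cpu_has_popcnt (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("popcnt") != 0;
#else
  return 0;
#endif
}

/* Nonzero if the crc32 instruction can be used.  There is no separate
   CPUID bit for it, so this tests the SSE4.2 feature bit instead.  */
static int
cpu_has_crc32_insn (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.2") != 0;
#else
  return 0;
#endif
}
```

So a run-time dispatcher can ask for POPCNT by name, but for crc32 it has to key off SSE4.2 regardless of how the compile-time option ends up being spelled.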

Uros.

>
> > [1] https://en.wikipedia.org/wiki/SSE4
> >
> > Uros.
> >
> > > * config/i386/gnu-property.c
> > > (file_end_indicate_exec_stack_and_gnu_property): Also check
> > > TARGET_CRC32 for GNU_PROPERTY_X86_ISA_1_V2.
> > > * config/i386/i386-c.c (ix86_target_macros_internal): Define
> > > __CRC32__ for -mcrc32.
> > > * config/i386/i386-options.c (ix86_option_override_internal):
> > > Handle PTA_CRC32.  Enable crc32 instruction for -msse4.2.
> > > * config/i386/i386.h (PTA_CRC32): New.
> > > (PTA_X86_64_V2): Add PTA_CRC32.
> > > (PTA_NEHALEM): Likewise.
> > > * config/i386/i386.md (sse4_2_crc32): Remove TARGET_SSE4_2
> > > check.
> > > (sse4_2_crc32di): Likewise.
> > > * config/i386/ia32intrin.h: Use crc32 target option for CRC32
> > > intrinsics.
> > > ---
> > >  gcc/config/i386/gnu-property.c |  1 +
> > >  gcc/config/i386/i386-c.c   |  2 ++
> > >  gcc/config/i386/i386-options.c |  8 
> > >  gcc/config/i386/i386.h |  6 --
> > >  gcc/config/i386/i386.md|  4 ++--
> > >  gcc/config/i386/ia32intrin.h   | 28 ++--
> > >  6 files changed, 31 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
> > > index 4ba04403002..b6a3bdf62ce 100644
> > > --- a/gcc/config/i386/gnu-property.c
> > > +++ b/gcc/config/i386/gnu-property.c
> > > @@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
> > >/* GNU_PROPERTY_X86_ISA_1_V2.  */
> > >if (TARGET_CMPXCHG16B
> > >   || (TARGET_64BIT && TARGET_SAHF)
> > > + || TARGET_CRC32
> > >   || TARGET_POPCNT
> > >   || TARGET_SSE3
> > >   || TARGET_SSSE3
> > > diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> > > index be46d0506ad..5ed0de006fb 100644
> > > --- a/gcc/config/i386/i386-c.c
> > > +++ b/gcc/config/i386/i386-c.c
> > > @@ -532,6 +532,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> > >  def_or_undef (parse_in, "__LZCNT__");
> > >if (isa_flag & OPTION_MASK_ISA_TBM)
> > >  def_or_undef (parse_in, "__TBM__");
> > > +  if (isa_flag & OPTION_MASK_ISA_CRC32)
> > > +def_or_undef (parse_in, "__CRC32__");
> > >if (isa_flag & OPTION_MASK_ISA_POPCNT)
> > >  def_or_undef (parse_in, "__POPCNT__");
> > >if (isa_flag & OPTION_MASK_ISA_FSGSBASE)
> > > diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> > > index 91da2849c49..959ee163d2f 100644
> > > --- a/gcc/config/i386/i386-options.c
> > > +++ b/gcc/config/i386/i386-options.c
> > > @@ -2162,6 +2162,9 @@ ix86_option_override_internal (bool main_args_p,
> > > if (((processor_alias_table[i].flags & PTA_CX16) != 0)
> > > > && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_CX16))
> > >   opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CX16;
> > > +   if (((processor_alias_table[i].flags & PTA_CRC32) != 0)
> > > +   && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_CRC32))
> > > + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CRC32;
> > > > if (((processor_alias_table[i].flags & (PTA_POPCNT | PTA_ABM)) != 0)
> > > > && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_POPCNT))
> > >   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_POPCNT;
> > > @@ -2617,6 +2620,11 @@ ix86_option_override_internal (bool main_args_p,
> > >  opts->x_ix86_isa_flags
> > >|= OPTION_MASK_ISA_POPCNT & ~opts->x_ix86_isa_flags_explicit;
> > >
> > > +  /* Enable crc32 instruction for -msse4.2.  */
> > > +  if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
> > > +opts->x_ix86_isa_flags
> > > +  |= OPTION_MASK_ISA_CRC32 & ~opts->x_ix86_isa_flags_explicit;
> > > +
> > >/* Enable lzcnt instruction for -mabm.  */
> > >if 

Re: [PATCH] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread H.J. Lu via Gcc-patches
On Thu, Apr 15, 2021 at 9:14 AM Uros Bizjak  wrote:
>
> On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
> >
> > Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> > without enabling SSE vector instructions.
>
> There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
> situation reflects that correctly.

CRC32 is similar to POPCNT, which was originally in SSE4.2.  Now POPCNT
is a separate feature which is also enabled by SSE4.2.  Enabling CRC32 only
with SSE4.2 makes it impossible to use CRC32 with -mgeneral-regs-only.  This
patch addresses this issue the same way as POPCNT.
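
For reference, a minimal sketch (not part of the patch) of what the crc32 instruction computes per byte, written in portable C so that it needs neither SSE nor -mcrc32.  It only uses general-purpose integer operations, which is the point being made about -mgeneral-regs-only.  The names crc32c_update_u8 and crc32c_buf are made up for illustration; the instruction implements CRC-32C, whose reflected polynomial is 0x82F63B78:

```c
#include <stddef.h>
#include <stdint.h>

/* Portable equivalent of one crc32 byte step (the operation behind
   _mm_crc32_u8): CRC-32C, reflected polynomial 0x82F63B78, with no
   initial or final XOR.  Hypothetical helper, not a GCC function.  */
static uint32_t
crc32c_update_u8 (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    /* Shift out the low bit and fold in the polynomial if it was set.  */
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

/* Conventional CRC-32C of a buffer: start from ~0, invert the result.  */
static uint32_t
crc32c_buf (const void *data, size_t len)
{
  const uint8_t *p = (const uint8_t *) data;
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i)
    crc = crc32c_update_u8 (crc, p[i]);
  return ~crc;
}
```

With the patch applied, the per-byte step would presumably be available through the existing intrinsic under -mcrc32 alone; the software loop above is just the fallback one would otherwise have to write.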

> [1] https://en.wikipedia.org/wiki/SSE4
>
> Uros.
>
> > * config/i386/gnu-property.c
> > (file_end_indicate_exec_stack_and_gnu_property): Also check
> > TARGET_CRC32 for GNU_PROPERTY_X86_ISA_1_V2.
> > * config/i386/i386-c.c (ix86_target_macros_internal): Define
> > __CRC32__ for -mcrc32.
> > * config/i386/i386-options.c (ix86_option_override_internal):
> > Handle PTA_CRC32.  Enable crc32 instruction for -msse4.2.
> > * config/i386/i386.h (PTA_CRC32): New.
> > (PTA_X86_64_V2): Add PTA_CRC32.
> > (PTA_NEHALEM): Likewise.
> > * config/i386/i386.md (sse4_2_crc32): Remove TARGET_SSE4_2
> > check.
> > (sse4_2_crc32di): Likewise.
> > * config/i386/ia32intrin.h: Use crc32 target option for CRC32
> > intrinsics.
> > ---
> >  gcc/config/i386/gnu-property.c |  1 +
> >  gcc/config/i386/i386-c.c   |  2 ++
> >  gcc/config/i386/i386-options.c |  8 
> >  gcc/config/i386/i386.h |  6 --
> >  gcc/config/i386/i386.md|  4 ++--
> >  gcc/config/i386/ia32intrin.h   | 28 ++--
> >  6 files changed, 31 insertions(+), 18 deletions(-)
> >
> > diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
> > index 4ba04403002..b6a3bdf62ce 100644
> > --- a/gcc/config/i386/gnu-property.c
> > +++ b/gcc/config/i386/gnu-property.c
> > @@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
> >/* GNU_PROPERTY_X86_ISA_1_V2.  */
> >if (TARGET_CMPXCHG16B
> >   || (TARGET_64BIT && TARGET_SAHF)
> > + || TARGET_CRC32
> >   || TARGET_POPCNT
> >   || TARGET_SSE3
> >   || TARGET_SSSE3
> > diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> > index be46d0506ad..5ed0de006fb 100644
> > --- a/gcc/config/i386/i386-c.c
> > +++ b/gcc/config/i386/i386-c.c
> > @@ -532,6 +532,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
> >  def_or_undef (parse_in, "__LZCNT__");
> >if (isa_flag & OPTION_MASK_ISA_TBM)
> >  def_or_undef (parse_in, "__TBM__");
> > +  if (isa_flag & OPTION_MASK_ISA_CRC32)
> > +def_or_undef (parse_in, "__CRC32__");
> >if (isa_flag & OPTION_MASK_ISA_POPCNT)
> >  def_or_undef (parse_in, "__POPCNT__");
> >if (isa_flag & OPTION_MASK_ISA_FSGSBASE)
> > diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> > index 91da2849c49..959ee163d2f 100644
> > --- a/gcc/config/i386/i386-options.c
> > +++ b/gcc/config/i386/i386-options.c
> > @@ -2162,6 +2162,9 @@ ix86_option_override_internal (bool main_args_p,
> > if (((processor_alias_table[i].flags & PTA_CX16) != 0)
> > && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_CX16))
> >   opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CX16;
> > +   if (((processor_alias_table[i].flags & PTA_CRC32) != 0)
> > +   && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_CRC32))
> > + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CRC32;
> > if (((processor_alias_table[i].flags & (PTA_POPCNT | PTA_ABM)) != 0)
> > && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_POPCNT))
> >   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_POPCNT;
> > @@ -2617,6 +2620,11 @@ ix86_option_override_internal (bool main_args_p,
> >  opts->x_ix86_isa_flags
> >|= OPTION_MASK_ISA_POPCNT & ~opts->x_ix86_isa_flags_explicit;
> >
> > +  /* Enable crc32 instruction for -msse4.2.  */
> > +  if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
> > +opts->x_ix86_isa_flags
> > +  |= OPTION_MASK_ISA_CRC32 & ~opts->x_ix86_isa_flags_explicit;
> > +
> >/* Enable lzcnt instruction for -mabm.  */
> >if (TARGET_ABM_P(opts->x_ix86_isa_flags))
> >  opts->x_ix86_isa_flags
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 97700d797a7..c50f9ab24fa 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -2504,12 +2504,14 @@ constexpr wide_int_bitmask PTA_HRESET (0, HOST_WIDE_INT_1U << 23);
> >  constexpr wide_int_bitmask PTA_KL (0, HOST_WIDE_INT_1U << 24);
> >  constexpr wide_int_bitmask PTA_WIDEKL (0, HOST_WIDE_INT_1U << 25);
> >  constexpr wide_int_bitmask PTA_AVXVNNI (0, HOST_WIDE_INT_1U << 26);
> > +constexpr wide_int_bitmask PTA_CRC32 (0, HOST_WIDE_INT_1U << 27);
> >
> >  

Re: [PATCH] libstdc++: Add -latomic to test flags for 32-bit sparc-linux

2021-04-15 Thread Eric Botcazou
> Without this I see a number of tests failing when -m32 is used.
> 
> libstdc++-v3/ChangeLog:
> 
>   * testsuite/lib/dg-options.exp (add_options_for_libatomic): Also
>   add libatomic options for 32-bit sparc*-*-linux-gnu.
> 
> Eric, are you OK with this? It adds -latomic and the appropriate -L
> options to tests that use { dg-add-options libatomic }, because
> sparcv8 needs libatomic for some std::atomic ops.

Sure, thanks.

-- 
Eric Botcazou





Re: [PATCH] Deprecate gimple-builder.h API

2021-04-15 Thread Richard Biener
On April 15, 2021 6:08:44 PM GMT+02:00, Martin Sebor  wrote:
>On 4/15/21 5:01 AM, Richard Biener wrote:
>> This adds a deprecation note to the undocumented gimple-builder.h
>> API only used by asan and sancov.
>> 
>> Pushed.
>> 
>> 2021-04-15  Richard Biener  
>> 
>>  * gimple-builder.h: Add deprecation note.
>> ---
>>   gcc/gimple-builder.h | 2 ++
>>   1 file changed, 2 insertions(+)
>> 
>> diff --git a/gcc/gimple-builder.h b/gcc/gimple-builder.h
>> index 61cf08c8dcb..ae273ce9041 100644
>> --- a/gcc/gimple-builder.h
>> +++ b/gcc/gimple-builder.h
>> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>>   #ifndef GCC_GIMPLE_BUILDER_H
>>   #define GCC_GIMPLE_BUILDER_H
>>   
>> +/* ???  This API is legacy and should not be used in new code.  */
>
>What do the question marks mean? (IMO, they're misleading and might
>make the reader wonder whether the note really means what it says.)

It means the API should be removed instead, which shouldn't be too hard...

Richard. 

>Martin
>
>
>> +
>>   gassign *build_assign (enum tree_code, tree, int, tree lhs = NULL_TREE);
>>   gassign *build_assign (enum tree_code, gimple *, int, tree lhs = NULL_TREE);
>>   gassign *build_assign (enum tree_code, tree, tree, tree lhs = NULL_TREE);
>> 



Re: [PATCH] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Thu, Apr 15, 2021 at 5:11 PM H.J. Lu  wrote:
>
> Use crc32 target option for CRC32 intrinsics to support CRC32 intrinsics
> without enabling SSE vector instructions.

There is no CRC32 ISA. crc32 is part of SSE4.2 [1] and current
situation reflects that correctly.

[1] https://en.wikipedia.org/wiki/SSE4

Uros.
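
For readers following the thread, here is a minimal sketch of what the
crc32 instruction computes and how code typically selects it today.  It
keys off the existing __SSE4_2__ macro; the __CRC32__ macro and -mcrc32
option exist only in the proposed patch, so they are deliberately not
used here.  The function names crc32c_byte_sw and crc32c are made up
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE4_2__)
#include <nmmintrin.h>   /* _mm_crc32_u8 */
#endif

/* One CRC-32C (Castagnoli) step over a single byte, bit by bit, using
   the reflected polynomial 0x82F63B78.  Hypothetical helper name.  */
uint32_t crc32c_byte_sw(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* CRC-32C over a buffer: hardware step when SSE4.2 is enabled at
   compile time, software step otherwise.  Both give the same result.  */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;          /* conventional initial value */
    for (size_t i = 0; i < len; i++) {
#if defined(__SSE4_2__)
        crc = _mm_crc32_u8(crc, p[i]);
#else
        crc = crc32c_byte_sw(crc, p[i]);
#endif
    }
    return crc ^ 0xFFFFFFFFu;            /* conventional final inversion */
}
```

Nothing in the loop touches a vector register, which seems to be the
motivation for wanting the instruction without the rest of SSE.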

> * config/i386/gnu-property.c
> (file_end_indicate_exec_stack_and_gnu_property): Also check
> TARGET_CRC32 for GNU_PROPERTY_X86_ISA_1_V2.
> * config/i386/i386-c.c (ix86_target_macros_internal): Define
> __CRC32__ for -mcrc32.
> * config/i386/i386-options.c (ix86_option_override_internal):
> Handle PTA_CRC32.  Enable crc32 instruction for -msse4.2.
> * config/i386/i386.h (PTA_CRC32): New.
> (PTA_X86_64_V2): Add PTA_CRC32.
> (PTA_NEHALEM): Likewise.
> * config/i386/i386.md (sse4_2_crc32): Remove TARGET_SSE4_2
> check.
> (sse4_2_crc32di): Likewise.
> * config/i386/ia32intrin.h: Use crc32 target option for CRC32
> intrinsics.
> ---
>  gcc/config/i386/gnu-property.c |  1 +
>  gcc/config/i386/i386-c.c   |  2 ++
>  gcc/config/i386/i386-options.c |  8 
>  gcc/config/i386/i386.h |  6 --
>  gcc/config/i386/i386.md|  4 ++--
>  gcc/config/i386/ia32intrin.h   | 28 ++--
>  6 files changed, 31 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
> index 4ba04403002..b6a3bdf62ce 100644
> --- a/gcc/config/i386/gnu-property.c
> +++ b/gcc/config/i386/gnu-property.c
> @@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
>/* GNU_PROPERTY_X86_ISA_1_V2.  */
>if (TARGET_CMPXCHG16B
>   || (TARGET_64BIT && TARGET_SAHF)
> + || TARGET_CRC32
>   || TARGET_POPCNT
>   || TARGET_SSE3
>   || TARGET_SSSE3
> diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
> index be46d0506ad..5ed0de006fb 100644
> --- a/gcc/config/i386/i386-c.c
> +++ b/gcc/config/i386/i386-c.c
> @@ -532,6 +532,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>  def_or_undef (parse_in, "__LZCNT__");
>if (isa_flag & OPTION_MASK_ISA_TBM)
>  def_or_undef (parse_in, "__TBM__");
> +  if (isa_flag & OPTION_MASK_ISA_CRC32)
> +def_or_undef (parse_in, "__CRC32__");
>if (isa_flag & OPTION_MASK_ISA_POPCNT)
>  def_or_undef (parse_in, "__POPCNT__");
>if (isa_flag & OPTION_MASK_ISA_FSGSBASE)
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 91da2849c49..959ee163d2f 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2162,6 +2162,9 @@ ix86_option_override_internal (bool main_args_p,
> if (((processor_alias_table[i].flags & PTA_CX16) != 0)
> && !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_CX16))
>   opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CX16;
> +   if (((processor_alias_table[i].flags & PTA_CRC32) != 0)
> +   && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_CRC32))
> + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CRC32;
> if (((processor_alias_table[i].flags & (PTA_POPCNT | PTA_ABM)) != 0)
> && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_POPCNT))
>   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_POPCNT;
> @@ -2617,6 +2620,11 @@ ix86_option_override_internal (bool main_args_p,
>  opts->x_ix86_isa_flags
>|= OPTION_MASK_ISA_POPCNT & ~opts->x_ix86_isa_flags_explicit;
>
> +  /* Enable crc32 instruction for -msse4.2.  */
> +  if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
> +opts->x_ix86_isa_flags
> +  |= OPTION_MASK_ISA_CRC32 & ~opts->x_ix86_isa_flags_explicit;
> +
>/* Enable lzcnt instruction for -mabm.  */
>if (TARGET_ABM_P(opts->x_ix86_isa_flags))
>  opts->x_ix86_isa_flags
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 97700d797a7..c50f9ab24fa 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2504,12 +2504,14 @@ constexpr wide_int_bitmask PTA_HRESET (0, 
> HOST_WIDE_INT_1U << 23);
>  constexpr wide_int_bitmask PTA_KL (0, HOST_WIDE_INT_1U << 24);
>  constexpr wide_int_bitmask PTA_WIDEKL (0, HOST_WIDE_INT_1U << 25);
>  constexpr wide_int_bitmask PTA_AVXVNNI (0, HOST_WIDE_INT_1U << 26);
> +constexpr wide_int_bitmask PTA_CRC32 (0, HOST_WIDE_INT_1U << 27);
>
>  constexpr wide_int_bitmask PTA_X86_64_BASELINE = PTA_64BIT | PTA_MMX | 
> PTA_SSE
>| PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR;
>  constexpr wide_int_bitmask PTA_X86_64_V2 = (PTA_X86_64_BASELINE
> & (~PTA_NO_SAHF))
> -  | PTA_CX16 | PTA_POPCNT | PTA_SSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_SSSE3;
> +  | PTA_CRC32 | PTA_CX16 | PTA_POPCNT | PTA_SSE3 | PTA_SSE4_1 | PTA_SSE4_2
> +  | PTA_SSSE3;
>  constexpr wide_int_bitmask PTA_X86_64_V3 = PTA_X86_64_V2
>| PTA_AVX | PTA_AVX2 | PTA_BMI | 

[committed] libstdc++: Move atomic functions to libsupc++ [PR 96657]

2021-04-15 Thread Jonathan Wakely via Gcc-patches
The changes for PR libstdc++/64735 mean that libsupc++ functions might
now depend on the __exchange_and_add and __atomic_add functions defined
in config/cpu/*/atomicity.h which is not compiled into libsupc++. This
causes a link failure for some targets when trying to use libsupc++
without the rest of libstdc++.

This patch simply moves the definitions of those functions into
libsupc++ so that they are available there.
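
As a rough sketch of what the two helpers do (not the libstdc++
sources), the following C11 code mirrors their semantics: an atomic
add that returns the previous value, and an atomic add that discards
it.  The toy_ names and the acquire-release ordering are assumptions
made for this illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for __exchange_and_add: atomically add VAL to
   *MEM and return the value *MEM held before the addition.  */
int toy_exchange_and_add(atomic_int *mem, int val)
{
    return atomic_fetch_add_explicit(mem, val, memory_order_acq_rel);
}

/* Hypothetical stand-in for __atomic_add: same update, old value
   discarded.  */
void toy_atomic_add(atomic_int *mem, int val)
{
    (void) atomic_fetch_add_explicit(mem, val, memory_order_acq_rel);
}
```

Reference-count updates are the typical use of this pair: the add for
taking a reference, the exchange-and-add for dropping one and checking
whether it was the last.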

libstdc++-v3/ChangeLog:

PR libstdc++/96657
* libsupc++/Makefile.am: Add atomicity.cc here.
* src/c++98/Makefile.am: Remove it from here.
* libsupc++/Makefile.in: Regenerate.
* src/c++98/Makefile.in: Regenerate.
* testsuite/18_support/exception_ptr/96657.cc: New test.

Tested powerpc64le-linux and sparc64-linux (-m32/-m64).

Committed to trunk.

commit 6c0c7fc6236470a533675cd3cd1ebb1cc3dd112c
Author: Jonathan Wakely 
Date:   Wed Apr 14 20:48:54 2021

libstdc++: Move atomic functions to libsupc++ [PR 96657]

The changes for PR libstdc++/64735 mean that libsupc++ functions might
now depend on the __exchange_and_add and __atomic_add functions defined
in config/cpu/*/atomicity.h which is not compiled into libsupc++. This
causes a link failure for some targets when trying to use libsupc++
without the rest of libstdc++.

This patch simply moves the definitions of those functions into
libsupc++ so that they are available there.

libstdc++-v3/ChangeLog:

PR libstdc++/96657
* libsupc++/Makefile.am: Add atomicity.cc here.
* src/c++98/Makefile.am: Remove it from here.
* libsupc++/Makefile.in: Regenerate.
* src/c++98/Makefile.in: Regenerate.
* testsuite/18_support/exception_ptr/96657.cc: New test.

diff --git a/libstdc++-v3/libsupc++/Makefile.am 
b/libstdc++-v3/libsupc++/Makefile.am
index 3563a3b421d..10ac4bb0124 100644
--- a/libstdc++-v3/libsupc++/Makefile.am
+++ b/libstdc++-v3/libsupc++/Makefile.am
@@ -48,6 +48,7 @@ sources = \
array_type_info.cc \
atexit_arm.cc \
atexit_thread.cc \
+   atomicity.cc \
bad_alloc.cc \
bad_array_length.cc \
bad_array_new.cc \
@@ -127,6 +128,9 @@ cp-demangle.lo: cp-demangle.c
 cp-demangle.o: cp-demangle.c
$(C_COMPILE) -DIN_GLIBCPP_V3 -Wno-error -c $<
 
+atomicity_file = ${glibcxx_srcdir}/$(ATOMICITY_SRCDIR)/atomicity.h
+atomicity.cc: ${atomicity_file}
+   $(LN_S) ${atomicity_file} ./atomicity.cc || true
 
 # AM_CXXFLAGS needs to be in each subdirectory so that it can be
 # modified in a per-library or per-sub-library way.  Need to manually
diff --git a/libstdc++-v3/src/c++98/Makefile.am 
b/libstdc++-v3/src/c++98/Makefile.am
index 2a9fc1b5f5d..0fa6ab95fb4 100644
--- a/libstdc++-v3/src/c++98/Makefile.am
+++ b/libstdc++-v3/src/c++98/Makefile.am
@@ -39,7 +39,6 @@ endif
 # particular host.
 host_sources = \
$(cow_string_host_sources) \
-   atomicity.cc \
codecvt_members.cc \
collate_members.cc \
messages_members.cc \
@@ -65,10 +64,6 @@ numeric_members.cc: ${glibcxx_srcdir}/$(CNUMERIC_CC)
 time_members.cc: ${glibcxx_srcdir}/$(CTIME_CC)
$(LN_S) ${glibcxx_srcdir}/$(CTIME_CC) . || true
 
-atomicity_file = ${glibcxx_srcdir}/$(ATOMICITY_SRCDIR)/atomicity.h
-atomicity.cc: ${atomicity_file}
-   $(LN_S) ${atomicity_file} ./atomicity.cc || true
-
 if ENABLE_DUAL_ABI
 collate_members_cow.cc: ${glibcxx_srcdir}/$(CCOLLATE_CC)
$(LN_S) ${glibcxx_srcdir}/$(CCOLLATE_CC) ./$@ || true
diff --git a/libstdc++-v3/testsuite/18_support/exception_ptr/96657.cc 
b/libstdc++-v3/testsuite/18_support/exception_ptr/96657.cc
new file mode 100644
index 000..61572668385
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/exception_ptr/96657.cc
@@ -0,0 +1,17 @@
+// { dg-options "-nodefaultlibs -lsupc++ -lgcc_s -lc" { target 
sparc*-*-linux-gnu } }
+// { dg-do link { target c++11 } }
+
+#include 
+
+void
+test01()
+{
+  // PR libstdc++/96657 undefined references in libsupc++
+  std::make_exception_ptr(1);
+}
+
+int
+main()
+{
+  test01();
+}


Re: [PATCH] Deprecate gimple-builder.h API

2021-04-15 Thread Martin Sebor via Gcc-patches

On 4/15/21 5:01 AM, Richard Biener wrote:
> This adds a deprecation note to the undocumented gimple-builder.h
> API only used by asan and sancov.
>
> Pushed.
>
> 2021-04-15  Richard Biener  
>
> * gimple-builder.h: Add deprecation note.
> ---
>   gcc/gimple-builder.h | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/gcc/gimple-builder.h b/gcc/gimple-builder.h
> index 61cf08c8dcb..ae273ce9041 100644
> --- a/gcc/gimple-builder.h
> +++ b/gcc/gimple-builder.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>   #ifndef GCC_GIMPLE_BUILDER_H
>   #define GCC_GIMPLE_BUILDER_H
>   
> +/* ???  This API is legacy and should not be used in new code.  */

What do the question marks mean? (IMO, they're misleading and might
make the reader wonder whether the note really means what it says.)

Martin

> +
>   gassign *build_assign (enum tree_code, tree, int, tree lhs = NULL_TREE);
>   gassign *build_assign (enum tree_code, gimple *, int, tree lhs = NULL_TREE);
>   gassign *build_assign (enum tree_code, tree, tree, tree lhs = NULL_TREE);





[PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.

2021-04-15 Thread Michael Meissner via Gcc-patches
[PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.

This patch adds the support for power10 IEEE 128-bit floating point conditional
move and for automatically generating min/max.

In this patch, I simplified things.  Instead of allowing any of the four modes
to be used for the comparison while the move itself uses a different mode, I
restricted the conditional move to use the same mode for both.
I.e. you can do:

_Float128 a, b, c, d, e, r;

r = (a == b) ? c : d;

But you can't do:

_Float128 c, d, r;
double a, b;

r = (a == b) ? c : d;

or:

_Float128 a, b;
double c, d, r;

r = (a == b) ? c : d;

This eliminates a lot of the complexity of the code, because you don't have to
worry about the sizes being different, and the IEEE 128-bit types being
restricted to Altivec registers, while the SF/DF modes can use any VSX
register.

I did not modify the existing support that allowed conditional moves where
SFmode operands are compared and DFmode operands are moved (and vice versa).

This simplification also eliminates having to insert a XXPERMDI instruction if
you are comparing 64-bit values and doing a conditional move with 128-bit
values.

I modified the test cases that I added to reflect this change.  I have also
fixed the test for not equal to use '!=' instead of '=='.

I have built bootstrap compilers on both a little endian power9 Linux system
and a big endian power8 Linux system.  There were no regressions in either
build in the test suites.  Can I check these changes into the trunk for gcc 11?

gcc/
2021-04-14 Michael Meissner  

* config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
128-bit floating point types.
* config/rs6000/rs6000.md (movcc, IEEE128 iterator): New insn.
(movcc_p10, IEEE128 iterator): New insn.
(movcc_invert_p10, IEEE128 iterator): New insn.
(fpmask, IEEE128 iterator): New insn.
(xxsel, IEEE128 iterator): New insn.

gcc/testsuite/
2021-04-14  Michael Meissner  

* gcc.target/powerpc/float128-cmove.c: New test.
* gcc.target/powerpc/float128-minmax-3.c: New test.
---
 gcc/config/rs6000/rs6000.c|   8 +-
 gcc/config/rs6000/rs6000.md   | 106 ++
 .../gcc.target/powerpc/float128-cmove.c   |  58 ++
 .../gcc.target/powerpc/float128-minmax-3.c|  15 +++
 4 files changed, 185 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8d00f99e9fd..f979d7320ad 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15706,8 +15706,8 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, 
rtx op_false,
   return 1;
 }
 
-/* Possibly emit the xsmaxcdp and xsmincdp instructions to emit a maximum or
-   minimum with "C" semantics.
+/* Possibly emit the xsmaxc{dp,qp} and xsminc{dp,qp} instructions to emit a
+   maximum or minimum with "C" semantics.
 
Unless you use -ffast-math, you can't use these instructions to replace
conditions that implicitly reverse the condition because the comparison
@@ -15843,6 +15843,10 @@ have_compare_and_set_mask (machine_mode mode)
 case E_DFmode:
   return TARGET_P9_MINMAX;
 
+case E_KFmode:
+case E_TFmode:
+  return TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode);
+
 default:
   break;
 }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 17b2fdc1cdd..c74fc8ce6fc 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5429,6 +5429,112 @@ (define_insn "*xxsel"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])
 
+;; Support for ISA 3.1 IEEE 128-bit conditional move.  The mode used in the
+;; comparison must be the same as used in the conditional move.
+(define_expand "movcc"
+   [(set (match_operand:IEEE128 0 "gpc_reg_operand")
+(if_then_else:IEEE128 (match_operand 1 "comparison_operator")
+  (match_operand:IEEE128 2 "gpc_reg_operand")
+  (match_operand:IEEE128 3 "gpc_reg_operand")))]
+  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+{
+  if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
+DONE;
+  else
+FAIL;
+})
+
+(define_insn_and_split "*movcc_p10"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=,v")
+   (if_then_else:IEEE128
+(match_operator:CCFP 1 "fpmask_comparison_operator"
+   [(match_operand:IEEE128 2 "altivec_register_operand" "v,v")
+(match_operand:IEEE128 3 "altivec_register_operand" "v,v")])
+(match_operand:IEEE128 4 "altivec_register_operand" "v,v")
+(match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
+   (clobber 

[PATCH] libstdc++: Add -latomic to test flags for 32-bit sparc-linux

2021-04-15 Thread Jonathan Wakely via Gcc-patches
Without this I see a number of tests failing when -m32 is used.

libstdc++-v3/ChangeLog:

* testsuite/lib/dg-options.exp (add_options_for_libatomic): Also
add libatomic options for 32-bit sparc*-*-linux-gnu.

Eric, are you OK with this? It adds -latomic and the appropriate -L
options to tests that use { dg-add-options libatomic }, because
sparcv8 needs libatomic for some std::atomic ops.



commit bbee87be3bfaa1ef19521870df998386f83d7ac2
Author: Jonathan Wakely 
Date:   Thu Apr 15 16:39:55 2021

libstdc++: Add -latomic to test flags for 32-bit sparc-linux

Without this I see a number of tests failing when -m32 is used.

libstdc++-v3/ChangeLog:

* testsuite/lib/dg-options.exp (add_options_for_libatomic): Also
add libatomic options for 32-bit sparc*-*-linux-gnu.

diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp 
b/libstdc++-v3/testsuite/lib/dg-options.exp
index 5160e4a72d1..7894973bcca 100644
--- a/libstdc++-v3/testsuite/lib/dg-options.exp
+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
@@ -264,6 +264,7 @@ proc add_options_for_libatomic { flags } {
 if { [istarget hppa*-*-hpux*]
 || ([istarget powerpc*-*-*] && [check_effective_target_ilp32])
 || [istarget riscv*-*-*]
+|| ([istarget sparc*-*-linux-gnu] && [check_effective_target_ilp32])
} {
return "$flags -L../../libatomic/.libs -latomic"
 }


[PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.

2021-04-15 Thread Michael Meissner via Gcc-patches
[PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.

This patch adds the support for the IEEE 128-bit floating point C minimum and
maximum instructions.  The next patch will add the support for using the
compare and set mask instruction to implement conditional moves.

I removed the FLOAT128_MIN_MAX_FPMASK_P macro that was in the last iteration of
the patch.

I have built bootstrap compilers on both a little endian power9 Linux system
and a big endian power8 Linux system.  There were no regressions in either
build in the test suites.  Can I check these changes into the trunk for gcc 11?

gcc/
2021-04-14  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA
3.1 IEEE 128-bit floating point xsmaxcqp and xsmincqp instructions.
* config/rs6000/rs6000.md (s3, IEEE128 iterator):
New insns.

gcc/testsuite/
2021-04-14  Michael Meissner  

* gcc.target/powerpc/float128-minmax-2.c: New test.
---
 gcc/config/rs6000/rs6000.c|  3 ++-
 gcc/config/rs6000/rs6000.md   | 11 +++
 .../gcc.target/powerpc/float128-minmax-2.c| 15 +++
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 48b8efd732b..8d00f99e9fd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16111,7 +16111,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx 
op0, rtx op1)
   /* VSX/altivec have direct min/max insns.  */
   if ((code == SMAX || code == SMIN)
   && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
- || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode
+ || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
+ || (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode
 {
   emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c8cdc42533c..17b2fdc1cdd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5194,6 +5194,17 @@ (define_insn "*s3_vsx"
 }
   [(set_attr "type" "fp")])
 
+;; Min/max for ISA 3.1 IEEE 128-bit floating point
+(define_insn "s3"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
+   (fp_minmax:IEEE128
+(match_operand:IEEE128 1 "altivec_register_operand" "v")
+(match_operand:IEEE128 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10"
+  "xscqp %0,%1,%2"
+  [(set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 ;; The conditional move instructions allow us to perform max and min operations
 ;; even when we don't have the appropriate max/min instruction using the FSEL
 ;; instruction.
diff --git a/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c 
b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
new file mode 100644
index 000..c71ba08c9f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
+
+#ifndef TYPE
+#define TYPE _Float128
+#endif
+
+/* Test that the fminf128/fmaxf128 functions generate if/then/else and not a
+   call.  */
+TYPE f128_min (TYPE a, TYPE b) { return __builtin_fminf128 (a, b); }
+TYPE f128_max (TYPE a, TYPE b) { return __builtin_fmaxf128 (a, b); }
+
+/* { dg-final { scan-assembler {\mxsmaxcqp\M} } } */
+/* { dg-final { scan-assembler {\mxsmincqp\M} } } */
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH 0/2] Add IEEE 128-bit min/max/conditional move

2021-04-15 Thread Michael Meissner via Gcc-patches
These patches add support for the XSMAXCQP, XSMINCQP, XSCMPEQQP, XSCMPGTQP, and
XSCMPGEQP instructions that were added to the PowerPC ISA 3.1 (power10).

These patches address the comments raised from the last version of the patches.

In this iteration, I simplified the first patch, eliminating a new macro.

In the second patch, I removed the support for conditional moves where the
mode of the operands being compared differs from the mode of the operands
being moved, because this greatly complicated the patch.  This means you can do:

_Float128 a, b, c, d, r;

r = (a == b) ? c : d;

But you can't do:

_Float128 c, d, r;
double a, b;

r = (a == b) ? c : d;

I did leave in the existing support for doing this mixed conditional move
between float/double, but I did not extend it to the two IEEE 128-bit
types.

I modified the test cases that I added to reflect this change.  I have also
fixed the test for not equal to use '!=' instead of '=='.

I have built bootstrap compilers on both a little endian power9 Linux system
and a big endian power8 Linux system.  There were no regressions in either
build in the test suites.  Can I check these changes into the trunk for gcc 11?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] testsuite: Move gimplefe40.c and gimplefe41.c

2021-04-15 Thread Robin Dapp via Gcc-patches

Hi,

The gimplefe-40.c and gimplefe-41.c tests expect vector capabilities 
(vect_float etc.) yet are not in the vect subdirectory.  This causes 
both to be run unconditionally, without the target-specific vector 
setup normally performed by vect/vect.exp.


There is a target-specific option for powerpc

 /* { dg-additional-options "-maltivec" { target { powerpc*-*-* && 
powerpc_altivec_ok } } } */


which enables Altivec on supported targets but I'd rather not create 
another special case for s390.  I suppose the better solution is to move 
these tests to the vect subdirectory.


Is this OK?

Regards
 Robin

--

gcc/testsuite/ChangeLog:

* gcc.dg/gimplefe-40.c: Moved to...
* gcc.dg/vect/gimplefe-40.c: ...here.
* gcc.dg/gimplefe-41.c: Moved to...
* gcc.dg/vect/gimplefe-41.c: ...here.
commit 1acd4d01c41aeea78ec3c188beb25b245dda4c8b
Author: Robin Dapp 
Date:   Wed Mar 17 09:08:42 2021 +0100

[PATCH] testsuite: Move gimplefe-4[0|1] tests.

The gimplefe-40.c and gimplefe-41.c test cases require vect_* effective
targets even though they reside in gcc.dg.  By default e.g.
DEFAULT_VECTCFLAGS which is used to add target-specific machine or build
flags is only applied in the ./vect subdirectory.  Move these tests
there.

diff --git a/gcc/testsuite/gcc.dg/gimplefe-40.c b/gcc/testsuite/gcc.dg/vect/gimplefe-40.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/gimplefe-40.c
rename to gcc/testsuite/gcc.dg/vect/gimplefe-40.c
diff --git a/gcc/testsuite/gcc.dg/gimplefe-41.c b/gcc/testsuite/gcc.dg/vect/gimplefe-41.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/gimplefe-41.c
rename to gcc/testsuite/gcc.dg/vect/gimplefe-41.c


[PATCH] aarch64: Fix up 2 other combine opt regressions vs. GCC8 [PR100075]

2021-04-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The testcase used to be compiled at -O2 by GCC8 and earlier to:
f1:
neg w1, w0, asr 16
and w1, w1, 65535
orr w0, w1, w0, lsl 16
ret
f2:
neg w1, w0
extr w0, w1, w0, 16
ret
but since GCC9 (r9-3594 for f1 and r9-6926 for f2) we compile it into:
f1:
mov w1, w0
sbfx x0, x1, 16, 16
neg w0, w0
bfi w0, w1, 16, 16
ret
f2:
neg w1, w0
sbfx x0, x0, 16, 16
bfi w0, w1, 16, 16
ret
instead, i.e. one insn longer each.  With this patch we get:
f1:
mov w1, w0
neg w0, w1, asr 16
bfi w0, w1, 16, 16
ret
f2:
neg w1, w0
extr w0, w1, w0, 16
ret
i.e. identical f2 and same number of insns as in GCC8 in f1.
The combiner unfortunately doesn't try splitters when doing 2 -> 1
combination, so it can't be implemented as combine splitters, but
it could be implemented as define_insn_and_split if desirable.
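
To make the instruction sequences above easier to check by hand, here
is a sketch of f1 and f2 rewritten as plain 32-bit arithmetic.  It
assumes the little-endian layout in which p.x occupies the low 16 bits
of w0 and p.y the high 16 bits; the names f1_bits and f2_bits are made
up for this illustration.

```c
#include <stdint.h>

/* f1 returns { -p.y, p.x }: the low half becomes -y (mod 2^16) and
   the high half becomes x.  This is the neg ..., asr 16 followed by
   an insert of the old low half at bit 16.  */
uint32_t f1_bits(uint32_t w)
{
    uint32_t y = w >> 16;
    uint32_t neg_y = (0u - y) & 0xffffu;
    return neg_y | (w << 16);
}

/* f2 returns { p.y, -p.x }: the low half becomes y and the high half
   becomes -x (mod 2^16).  This is what neg followed by extr #16
   computes on the register pair.  */
uint32_t f2_bits(uint32_t w)
{
    uint32_t x = w & 0xffffu;
    uint32_t neg_x = (0u - x) & 0xffffu;
    return (w >> 16) | (neg_x << 16);
}
```

For example, with x = 2 and y = 1 (w = 0x00010002), f1_bits yields
0x0002ffff and f2_bits yields 0xfffe0001.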

Bootstrapped/regtested on aarch64-linux, ok for trunk?

2021-04-15  Jakub Jelinek  

PR target/100075
* config/aarch64/aarch64.md (*neg_asr_si2_extr, *extrsi5_insn_di): New
define_insn patterns.

* gcc.target/aarch64/pr100075.c: New test.

--- gcc/config/aarch64/aarch64.md.jj 2021-04-15 10:45:02.798853095 +0200
+++ gcc/config/aarch64/aarch64.md   2021-04-15 13:28:04.734754364 +0200
@@ -3572,6 +3572,18 @@ (define_insn "*neg__si2_uxtw"
   [(set_attr "autodetect_type" "alu_shift__op2")]
 )
 
+(define_insn "*neg_asr_si2_extr"
+  [(set (match_operand:SI 0 "register_operand" "r")
+   (neg:SI (match_operator 4 "subreg_lowpart_operator"
+ [(sign_extract:DI
+(match_operand:DI 1 "register_operand" "r")
+(match_operand 3 "aarch64_simd_shift_imm_offset_si" "n")
+(match_operand 2 "aarch64_simd_shift_imm_offset_si" 
"n"))])))]
+  "INTVAL (operands[2]) + INTVAL (operands[3]) == 32"
+  "neg\\t%w0, %w1, asr %2"
+  [(set_attr "autodetect_type" "alu_shift_asr_op2")]
+)
+
 (define_insn "mul3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
(mult:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5382,6 +5394,22 @@ (define_insn "*extrsi5_insn_uxtw_alt"
   "extr\\t%w0, %w1, %w2, %4"
   [(set_attr "type" "rotate_imm")]
 )
+
+(define_insn "*extrsi5_insn_di"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
+  (match_operand 3 "const_int_operand" "n"))
+   (match_operator:SI 6 "subreg_lowpart_operator"
+ [(zero_extract:DI
+(match_operand:DI 2 "register_operand" "r")
+(match_operand 5 "const_int_operand" "n")
+(match_operand 4 "const_int_operand" "n"))])))]
+  "UINTVAL (operands[3]) < 32
+   && UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32
+   && UINTVAL (operands[4]) + UINTVAL (operands[5]) - 32 <= 64"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "type" "rotate_imm")]
+)
 
 (define_insn "*ror3_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
--- gcc/testsuite/gcc.target/aarch64/pr100075.c.jj  2021-04-15 
13:23:31.188852983 +0200
+++ gcc/testsuite/gcc.target/aarch64/pr100075.c 2021-04-15 13:23:10.612086048 
+0200
@@ -0,0 +1,20 @@
+/* PR target/100075 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\tsbfx\tx[0-9]+, x[0-9]+, 16, 16} } } */
+/* { dg-final { scan-assembler {\tneg\tw[0-9]+, w[0-9]+, asr 16} } } */
+/* { dg-final { scan-assembler {\textr\tw[0-9]+, w[0-9]+, w[0-9]+, 16} } } */
+
+struct S { short x, y; };
+
+struct S
+f1 (struct S p)
+{
+  return (struct S) { -p.y, p.x };
+}
+
+struct S
+f2 (struct S p)
+{
+  return (struct S) { p.y, -p.x };
+}

Jakub



[Patch, fortran] PR fortran/100097 PR fortran/100098 - [Unlimited] polymorphic pointers and allocatables have incorrect rank

2021-04-15 Thread José Rui Faustino de Sousa via Gcc-patches

Hi All!

Proposed patch to:

PR100097 - Unlimited polymorphic pointers and allocatables have 
incorrect rank

PR100098 - Polymorphic pointers and allocatables have incorrect rank

Patch tested only on x86_64-pc-linux-gnu.

Pointers and allocatables must carry TKR (type, kind, rank) information 
even when undefined. The patch adds code that initializes the class 
descriptor's element size, rank and type, for both pointers and 
allocatables, as early as possible.
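
The idea can be sketched with a toy descriptor in C.  This is a
hypothetical, simplified model, not libgfortran's actual descriptor
layout; every toy_ name is invented for the illustration.  It shows
the rank, type and element size being filled in while the data pointer
is still null, i.e. before any allocation or pointer association.

```c
#include <stddef.h>

/* Hypothetical type codes for the toy model.  */
enum toy_type { TOY_UNKNOWN = 0, TOY_CLASS = 1 };

/* Toy counterpart of the descriptor's dtype field.  */
struct toy_dtype {
    size_t elem_len;
    signed char rank;
    signed char type;
};

/* Toy array descriptor: base_addr stays NULL while the pointer or
   allocatable is undefined.  */
struct toy_descriptor {
    void *base_addr;
    struct toy_dtype dtype;
};

/* Fill in the TKR-related fields without touching base_addr, so that
   rank queries give the declared answer on an undefined object.  */
void toy_init_tkr(struct toy_descriptor *d, int rank, int type,
                  size_t elem_len)
{
    d->dtype.elem_len = elem_len;
    d->dtype.rank = (signed char) rank;
    d->dtype.type = (signed char) type;
}
```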


Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization to class variables [PR100097, 
PR100098]


gcc/fortran/ChangeLog:

PR fortran/100097
PR fortran/100098
* trans-array.c (gfc_trans_class_array): new function to
initialize class descriptor's TKR information.
* trans-array.h (gfc_trans_class_array): add function prototype.
* trans-decl.c (gfc_trans_deferred_vars): add calls to the new
function for both pointers and allocatables.

gcc/testsuite/ChangeLog:

PR fortran/100097
* gfortran.dg/PR100097.f90: New test.

PR fortran/100098
* gfortran.dg/PR100098.f90: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index be5eb89350f..acd44a347e2 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10808,6 +10808,52 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 }
 
 
+/* Initialize class descriptor's TKR infomation.  */
+
+void
+gfc_trans_class_array (gfc_symbol * sym, gfc_wrapped_block * block)
+{
+  tree type, etype;
+  tree tmp;
+  tree descriptor;
+  stmtblock_t init;
+  locus loc;
+  int rank;
+
+  /* Make sure the frontend gets these right.  */
+  gcc_assert (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
+	  && (CLASS_DATA (sym)->attr.class_pointer
+		  || CLASS_DATA (sym)->attr.allocatable));
+
+  gcc_assert (VAR_P (sym->backend_decl)
+	  || TREE_CODE (sym->backend_decl) == PARM_DECL);
+
+  if (sym->attr.dummy)
+return;
+
+  descriptor = gfc_class_data_get (sym->backend_decl);
+  type = TREE_TYPE (descriptor);
+
+  if (type == NULL || !GFC_DESCRIPTOR_TYPE_P (type))
+return;
+
+  gfc_save_backend_locus ();
+  gfc_set_backend_locus (>declared_at);
+  gfc_init_block ();
+
+  rank = CLASS_DATA (sym)->as ? (CLASS_DATA (sym)->as->rank) : (0);
+  gcc_assert (rank>=0);
+  tmp = gfc_conv_descriptor_dtype (descriptor);
+  etype = gfc_get_element_type (type);
+  tmp = fold_build2_loc (input_location, MODIFY_EXPR, TREE_TYPE (tmp), tmp,
+			 gfc_get_dtype_rank_type (rank, etype));
+  gfc_add_expr_to_block (, tmp);
+
+  gfc_add_init_cleanup (block, gfc_finish_block (), NULL_TREE);
+  gfc_restore_backend_locus ();
+}
+
+
 /* NULLIFY an allocatable/pointer array on function entry, free it on exit.
Do likewise, recursively if necessary, with the allocatable components of
derived types.  This function is also called for assumed-rank arrays, which
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index e4d443d7118..d2768f1be61 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -67,6 +67,8 @@ tree gfc_check_pdt_dummy (gfc_symbol *, tree, int, gfc_actual_arglist *);
 
 tree gfc_alloc_allocatable_for_assignment (gfc_loopinfo*, gfc_expr*, gfc_expr*);
 
+/* Add initialization for class descriptors  */
+void gfc_trans_class_array (gfc_symbol *, gfc_wrapped_block *);
 /* Add initialization for deferred arrays.  */
 void gfc_trans_deferred_array (gfc_symbol *, gfc_wrapped_block *);
 /* Generate an initializer for a static pointer or allocatable array.  */
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 34a0d49bae7..6a0d80bccb0 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4929,7 +4929,7 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
   else if ((!sym->attr.dummy || sym->ts.deferred)
 		&& (sym->ts.type == BT_CLASS
 		&& CLASS_DATA (sym)->attr.class_pointer))
-	continue;
+	gfc_trans_class_array (sym, block);
   else if ((!sym->attr.dummy || sym->ts.deferred)
 		&& (sym->attr.allocatable
 		|| (sym->attr.pointer && sym->attr.result)
@@ -5013,6 +5013,10 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 		  tmp = NULL_TREE;
 		}
 
+	  /* Initialize descriptor's TKR information.  */
+	  if (sym->ts.type == BT_CLASS)
+		gfc_trans_class_array (sym, block);
+
 	  /* Deallocate when leaving the scope. Nullifying is not
 		 needed.  */
 	  if (!sym->attr.result && !sym->attr.dummy && !sym->attr.pointer
diff --git a/gcc/testsuite/gfortran.dg/PR100097.f90 b/gcc/testsuite/gfortran.dg/PR100097.f90
new file mode 100644
index 000..926eb6cc779
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100097.f90
@@ -0,0 +1,41 @@
+! { dg-do run }
+!
+! Test the fix for PR100097
+!
+
+program main_p
+
+  implicit none
+
+  class(*), pointer :: bar_p(:)
+  class(*), allocatable :: bar_a(:)
+
+  call foo_p(bar_p)
+  call 

[committed] Make SVE ACLE tests work with --with-cpu

2021-04-15 Thread Richard Sandiford via Gcc-patches
This patch follows on from a previous one and adds -mtune=generic
to the SVE ACLE assembler tests.  These tests are pure assembly
tests (execution tests are elsewhere) and they already require
dg-additional-options to be used to add new options.  We therefore
don't need aarch64-with-arch-dg-options.

Tested on an aarch64-linux-gnu toolchain configured with
--with-cpu=a64fx, pushed to trunk.

Richard


gcc/testsuite/
* g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add
-mtune=generic to the SVE flags.
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.
---
 .../g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp  | 2 +-
 .../g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp| 4 
 .../gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp  | 2 +-
 .../gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp| 4 
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
index 84ae95e2ccc..070a049c149 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
@@ -39,7 +39,7 @@ if { [check_effective_target_aarch64_sve] } {
 
 # Turn off any codegen tweaks by default that may affect expected assembly.
 # Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -moverride=tune=none"
+set sve_flags "$sve_flags -mtune=generic -moverride=tune=none"
 
 global gcc_runtest_parallelize_limit_minor
 if { [info exists gcc_runtest_parallelize_limit_minor] } {
diff --git a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index c3a3a01a7ed..4989818664c 100644
--- a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -37,6 +37,10 @@ if { [check_effective_target_aarch64_sve2] } {
 set sve2_flags "-march=armv8.5-a+sve2"
 }
 
+# Turn off any codegen tweaks by default that may affect expected assembly.
+# Tests relying on those should turn them on explicitly.
+set sve2_flags "$sve2_flags -mtune=generic -moverride=tune=none"
+
 set gcc_subdir [string replace $subdir 0 2 gcc]
 lappend extra_flags "-fno-ipa-icf" "-I$srcdir/$gcc_subdir/../../sve/acle/asm"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
index fcd07aaa040..35229910da8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp
@@ -39,7 +39,7 @@ if { [check_effective_target_aarch64_sve] } {
 
 # Turn off any codegen tweaks by default that may affect expected assembly.
 # Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -moverride=tune=none"
+set sve_flags "$sve_flags -mtune=generic -moverride=tune=none"
 
 global gcc_runtest_parallelize_limit_minor
 if { [info exists gcc_runtest_parallelize_limit_minor] } {
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index 632d3508e32..67f817dd21f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -37,6 +37,10 @@ if { [check_effective_target_aarch64_sve2] } {
 set sve2_flags "-march=armv8.5-a+sve2"
 }
 
+# Turn off any codegen tweaks by default that may affect expected assembly.
+# Tests relying on those should turn them on explicitly.
+set sve2_flags "$sve2_flags -mtune=generic -moverride=tune=none"
+
 lappend extra_flags "-fno-ipa-icf"
 
 global gcc_runtest_parallelize_limit_minor


[committed] Make SVE tests work with --with-cpu

2021-04-15 Thread Richard Sandiford via Gcc-patches
A lot of the SVE assembly tests are for generic-tuned SVE codegen
and so can fail when run on a toolchain configured with --with-cpu.

This could easily be solved by forcing -mtune=generic, like we already
do for -moverride=tune=none.  However, the testsuite also has some
useful execution tests that it would be better to run with as
few flag changes as possible.  Also, the flags in $sve_flags are
printed as part of the test results, so each change to $sve_flags
results in a change to the test summaries.

This patch instead intercepts dg-options and tailors the list
of additional options based on what the test is trying to do.
It also gets rid of DEFAULT_CFLAGS, which are never useful
for these tests.

Tested on an aarch64-linux-gnu toolchain configured with
--with-cpu=a64fx, pushed to trunk.

Richard


gcc/testsuite/
* lib/gcc-defs.exp (aarch64-arch-dg-options): New procedure.
(aarch64-with-arch-dg-options): Likewise.
* g++.target/aarch64/sve/aarch64-sve.exp: Run the tests inside
aarch64-with-arch-dg-options.  Move the default architecture
flags to the final dg-runtest argument.
* gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.  Dispense with
DEFAULT_CFLAGS.
* gcc.target/aarch64/sve2/aarch64-sve2.exp: Likewise.
---
 .../g++.target/aarch64/sve/aarch64-sve.exp| 10 ++-
 .../gcc.target/aarch64/sve/aarch64-sve.exp| 19 ++
 .../gcc.target/aarch64/sve2/aarch64-sve2.exp  | 14 ++---
 gcc/testsuite/lib/gcc-defs.exp| 62 +++
 4 files changed, 76 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/sve/aarch64-sve.exp b/gcc/testsuite/g++.target/aarch64/sve/aarch64-sve.exp
index d4761f2d807..2b850232229 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/aarch64-sve.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve/aarch64-sve.exp
@@ -38,12 +38,10 @@ if { [check_effective_target_aarch64_sve] } {
 set sve_flags "-march=armv8.2-a+sve"
 }
 
-# Turn off any codegen tweaks by default that may affect expected assembly.
-# Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -moverride=tune=none"
-
-# Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] $sve_flags ""
+aarch64-with-arch-dg-options $sve_flags {
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] "" $sve_flags
+}
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp b/gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp
index 1d3f56690e6..439a012ce43 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp
@@ -28,26 +28,15 @@ if {![istarget aarch64*-*-*] } then {
 # Load support procs.
 load_lib gcc-dg.exp
 
-# If a testcase doesn't have special options, use these.
-global DEFAULT_CFLAGS
-if ![info exists DEFAULT_CFLAGS] then {
-set DEFAULT_CFLAGS " -ansi -pedantic-errors"
-}
-
 # Initialize `dg'.
 dg-init
 
-# Force SVE if we're not testing it already.
 if { [check_effective_target_aarch64_sve] } {
 set sve_flags ""
 } else {
 set sve_flags "-march=armv8.2-a+sve"
 }
 
-# Turn off any codegen tweaks by default that may affect expected assembly.
-# Tests relying on those should turn them on explicitly.
-set sve_flags "$sve_flags -moverride=tune=none"
-
 # Most of the code-quality tests are written for LP64.  Just do the
 # correctness tests for ILP32.
 if { [check_effective_target_ilp32] } {
@@ -56,9 +45,11 @@ if { [check_effective_target_ilp32] } {
 set pattern "*"
 }
 
-# Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/$pattern.\[cCS\]]] \
-$sve_flags $DEFAULT_CFLAGS
+aarch64-with-arch-dg-options $sve_flags {
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/$pattern.\[cCS\]]] \
+   "" $sve_flags
+}
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/aarch64-sve2.exp b/gcc/testsuite/gcc.target/aarch64/sve2/aarch64-sve2.exp
index fcff0d21899..28d61555ff2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/aarch64-sve2.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/aarch64-sve2.exp
@@ -28,12 +28,6 @@ if {![istarget aarch64*-*-*] } then {
 # Load support procs.
 load_lib gcc-dg.exp
 
-# If a testcase doesn't have special options, use these.
-global DEFAULT_CFLAGS
-if ![info exists DEFAULT_CFLAGS] then {
-set DEFAULT_CFLAGS " -ansi -pedantic-errors"
-}
-
 # Initialize `dg'.
 dg-init
 
@@ -44,9 +38,11 @@ if { [check_effective_target_aarch64_sve2] } {
 set sve2_flags "-march=armv8.5-a+sve2"
 }
 
-# Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
-$sve2_flags $DEFAULT_CFLAGS
+aarch64-with-arch-dg-options $sve2_flags {
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+   "" $sve2_flags
+}
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/lib/gcc-defs.exp 

[PATCH] x86: Use crc32 target option for CRC32 intrinsics

2021-04-15 Thread H.J. Lu via Gcc-patches
Use the crc32 target option for the CRC32 intrinsics so that they can be
used without enabling SSE vector instructions.

* config/i386/gnu-property.c
(file_end_indicate_exec_stack_and_gnu_property): Also check
TARGET_CRC32 for GNU_PROPERTY_X86_ISA_1_V2.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__CRC32__ for -mcrc32.
* config/i386/i386-options.c (ix86_option_override_internal):
Handle PTA_CRC32.  Enable crc32 instruction for -msse4.2.
* config/i386/i386.h (PTA_CRC32): New.
(PTA_X86_64_V2): Add PTA_CRC32.
(PTA_NEHALEM): Likewise.
* config/i386/i386.md (sse4_2_crc32): Remove TARGET_SSE4_2
check.
(sse4_2_crc32di): Likewise.
* config/i386/ia32intrin.h: Use crc32 target option for CRC32
intrinsics.
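
As a rough illustration of what the new __CRC32__ macro makes possible,
the sketch below selects the hardware crc32 instruction when the macro is
defined and falls back to a bitwise software loop otherwise.  The helper
names (crc32c_byte, crc32c) are made up for this example and are not part
of the patch; __builtin_ia32_crc32qi is the existing GCC builtin behind the
byte-wide intrinsic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): fold one byte into a running
// CRC-32C value, without any initial or final inversion.
static inline std::uint32_t crc32c_byte(std::uint32_t crc, std::uint8_t byte)
{
#if defined(__CRC32__) && (defined(__i386__) || defined(__x86_64__))
  // __CRC32__ is defined when the crc32 ISA is enabled, e.g. by -mcrc32
  // or, as described above, implicitly by -msse4.2.
  return __builtin_ia32_crc32qi(crc, byte);
#else
  // Portable fallback: reflected Castagnoli polynomial, one bit at a time.
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
#endif
}

// Hypothetical helper: conventional CRC-32C with an initial value of all
// ones and a final inversion.
static inline std::uint32_t crc32c(const void *data, std::size_t len)
{
  const std::uint8_t *p = static_cast<const std::uint8_t *>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = crc32c_byte(crc, p[i]);
  return ~crc;
}
```

Both paths are expected to agree, so code written this way should not need
to change when it is later built with -mcrc32 alone.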
---
 gcc/config/i386/gnu-property.c |  1 +
 gcc/config/i386/i386-c.c   |  2 ++
 gcc/config/i386/i386-options.c |  8 
 gcc/config/i386/i386.h |  6 --
 gcc/config/i386/i386.md|  4 ++--
 gcc/config/i386/ia32intrin.h   | 28 ++--
 6 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
index 4ba04403002..b6a3bdf62ce 100644
--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -92,6 +92,7 @@ file_end_indicate_exec_stack_and_gnu_property (void)
   /* GNU_PROPERTY_X86_ISA_1_V2.  */
   if (TARGET_CMPXCHG16B
  || (TARGET_64BIT && TARGET_SAHF)
+ || TARGET_CRC32
  || TARGET_POPCNT
  || TARGET_SSE3
  || TARGET_SSSE3
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index be46d0506ad..5ed0de006fb 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -532,6 +532,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__LZCNT__");
   if (isa_flag & OPTION_MASK_ISA_TBM)
 def_or_undef (parse_in, "__TBM__");
+  if (isa_flag & OPTION_MASK_ISA_CRC32)
+def_or_undef (parse_in, "__CRC32__");
   if (isa_flag & OPTION_MASK_ISA_POPCNT)
 def_or_undef (parse_in, "__POPCNT__");
   if (isa_flag & OPTION_MASK_ISA_FSGSBASE)
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 91da2849c49..959ee163d2f 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2162,6 +2162,9 @@ ix86_option_override_internal (bool main_args_p,
if (((processor_alias_table[i].flags & PTA_CX16) != 0)
&& !(opts->x_ix86_isa_flags2_explicit & OPTION_MASK_ISA2_CX16))
  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CX16;
+   if (((processor_alias_table[i].flags & PTA_CRC32) != 0)
+   && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_CRC32))
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_CRC32;
if (((processor_alias_table[i].flags & (PTA_POPCNT | PTA_ABM)) != 0)
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_POPCNT))
  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_POPCNT;
@@ -2617,6 +2620,11 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_ix86_isa_flags
   |= OPTION_MASK_ISA_POPCNT & ~opts->x_ix86_isa_flags_explicit;
 
+  /* Enable crc32 instruction for -msse4.2.  */
+  if (TARGET_SSE4_2_P (opts->x_ix86_isa_flags))
+opts->x_ix86_isa_flags
+  |= OPTION_MASK_ISA_CRC32 & ~opts->x_ix86_isa_flags_explicit;
+
   /* Enable lzcnt instruction for -mabm.  */
   if (TARGET_ABM_P(opts->x_ix86_isa_flags))
 opts->x_ix86_isa_flags
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 97700d797a7..c50f9ab24fa 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2504,12 +2504,14 @@ constexpr wide_int_bitmask PTA_HRESET (0, HOST_WIDE_INT_1U << 23);
 constexpr wide_int_bitmask PTA_KL (0, HOST_WIDE_INT_1U << 24);
 constexpr wide_int_bitmask PTA_WIDEKL (0, HOST_WIDE_INT_1U << 25);
 constexpr wide_int_bitmask PTA_AVXVNNI (0, HOST_WIDE_INT_1U << 26);
+constexpr wide_int_bitmask PTA_CRC32 (0, HOST_WIDE_INT_1U << 27);
 
 constexpr wide_int_bitmask PTA_X86_64_BASELINE = PTA_64BIT | PTA_MMX | PTA_SSE
   | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR;
 constexpr wide_int_bitmask PTA_X86_64_V2 = (PTA_X86_64_BASELINE
& (~PTA_NO_SAHF))
-  | PTA_CX16 | PTA_POPCNT | PTA_SSE3 | PTA_SSE4_1 | PTA_SSE4_2 | PTA_SSSE3;
+  | PTA_CRC32 | PTA_CX16 | PTA_POPCNT | PTA_SSE3 | PTA_SSE4_1 | PTA_SSE4_2
+  | PTA_SSSE3;
 constexpr wide_int_bitmask PTA_X86_64_V3 = PTA_X86_64_V2
   | PTA_AVX | PTA_AVX2 | PTA_BMI | PTA_BMI2 | PTA_F16C | PTA_FMA | PTA_LZCNT
   | PTA_MOVBE | PTA_XSAVE;
@@ -2519,7 +2521,7 @@ constexpr wide_int_bitmask PTA_X86_64_V4 = PTA_X86_64_V3
 constexpr wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2
   | PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR;
 constexpr wide_int_bitmask PTA_NEHALEM = PTA_CORE2 | PTA_SSE4_1 | PTA_SSE4_2
-  | PTA_POPCNT;
+  | PTA_CRC32 | 

[PATCH] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2021-04-15 Thread Alex Coplan via Gcc-patches
Hi all,

The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.

The LRA ICE occurs because the
@atomic_compare_and_swap_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.

The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).

It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).

Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.

The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).

Another notable change here is that we only do:

  emit_move_insn (neg_bval, const1_rtx);

for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.
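
For orientation, the kind of source that reaches this path is a
compare-and-swap whose expected value is a negative constant.  The sketch
below is not the testcase added by the patch: the function names and the
constants are chosen only for illustration, on the assumption that a small
negative value such as -1 satisfies L while a larger one such as -200
satisfies only J.  The last helper models in plain C++ why the ADDS result
can double as the failure flag.

```cpp
#include <cstdint>

// Expected value is a small negative constant (assumed to satisfy L).
bool cas_small_negative(int *p)
{
  return __sync_bool_compare_and_swap(p, -1, 2);
}

// Expected value is a larger negative constant (assumed to satisfy J only).
bool cas_large_negative(int *p)
{
  return __sync_bool_compare_and_swap(p, -200, 2);
}

// Plain C++ model of the flag logic: adding the negated expected value to
// the loaded value yields zero exactly when the two compare equal, so any
// nonzero result already signals "comparison failed".
std::uint32_t adds_model(std::int32_t loaded, std::int32_t expected)
{
  return static_cast<std::uint32_t>(loaded)
	 - static_cast<std::uint32_t>(expected);
}
```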

Testing:
 * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
 * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base, no
 regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this config.

OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.

gcc/testsuite/ChangeLog:

PR target/99977
* gcc.target/arm/pr99977.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 475fb0d827f..8d19b8a73fd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx operands[])
 }
   else
 {
-  emit_move_insn (neg_bval, const1_rtx);
   cond = gen_rtx_NE (VOIDmode, rval, oldval);
   if (thumb1_cmpneg_operand (oldval, SImode))
-   emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
-   label2, cond));
+   {
+ rtx src = rval;
+ if (!satisfies_constraint_L (oldval))
+   {
+ gcc_assert (satisfies_constraint_J (oldval));
+
+ /* For such immediates, ADDS needs the source and destination regs
+to be the same.
+
+Normally this would be handled by RA, but this is all happening
+after RA.  */
+ emit_move_insn (neg_bval, rval);
+ src = neg_bval;
+   }
+
+ emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval,
+  label2, cond));
+   }
   else
-   emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+   {
+ emit_move_insn (neg_bval, const1_rtx);
+ emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+   }
 }
 
   arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release);
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index e4682c039b9..b9fa8702606 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -187,20 +187,20 @@
 ;; Constraints of this pattern must be at least as strict as those of the
 ;; cbranchsi operations in thumb1.md and aim to be 

Re: [PATCH] testsuite: Fix unroll-and-jam.c on IBM Z

2021-04-15 Thread Richard Biener via Gcc-patches
On Thu, Apr 15, 2021 at 2:51 PM Stefan Schulze Frielinghaus via
Gcc-patches  wrote:
>
> For z10 and newer, inner loops are completely unrolled, which leaves no
> inner loops to jam and causes this testcase to fail.  Reverting
> max-completely-peel-times to the default value fixes this testcase.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/unroll-and-jam.c: Revert max-completely-peel-times to
> the default value on IBM Z.
>
> Ok for mainline?

OK.
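
To make the failure mode concrete, here is a hand-written sketch of what
unroll-and-jam does.  It is not taken from the testcase, and the function
names and array sizes are invented.  The outer loop is unrolled by two and
the two copies of the inner loop are fused ("jammed") into one.  If the
inner loop has already been completely peeled, there is no inner loop left
to fuse.

```cpp
#include <cstddef>

constexpr std::size_t N = 8;   // outer trip count (even, for simplicity)
constexpr std::size_t M = 32;  // inner trip count

// Original loop nest.
void scale_rows(int (&a)[N][M], const int (&b)[M])
{
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      a[i][j] += b[j] * static_cast<int>(i);
}

// Same computation after unroll-and-jam by a factor of two: the outer loop
// steps by 2 and both row updates share a single inner loop.
void scale_rows_uaj(int (&a)[N][M], const int (&b)[M])
{
  for (std::size_t i = 0; i < N; i += 2)
    for (std::size_t j = 0; j < M; ++j)
      {
	a[i][j] += b[j] * static_cast<int>(i);
	a[i + 1][j] += b[j] * static_cast<int>(i + 1);
      }
}
```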

> ---
>  gcc/testsuite/gcc.dg/unroll-and-jam.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/unroll-and-jam.c b/gcc/testsuite/gcc.dg/unroll-and-jam.c
> index 7eb64217a05..b8f4f16dc74 100644
> --- a/gcc/testsuite/gcc.dg/unroll-and-jam.c
> +++ b/gcc/testsuite/gcc.dg/unroll-and-jam.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-O3 -floop-unroll-and-jam -fno-tree-loop-im --param unroll-jam-min-percent=0 -fdump-tree-unrolljam-details" } */
> +/* { dg-additional-options "--param max-completely-peel-times=16" { target { s390*-*-* } } } */
>  /* { dg-require-effective-target int32plus } */
>
>  #include 
> --
> 2.23.0
>


Re: [pushed] c++: debug location of variable cleanups [PR88742]

2021-04-15 Thread Christophe Lyon via Gcc-patches
Hi,

On Wed, 14 Apr 2021 at 02:20, Jason Merrill via Gcc-patches
 wrote:
>
> PR49951 complained about the debugger jumping back to the declaration of a
> local variable when we run its destructor.  That was fixed in 4.7, but broke
> again in 4.8.  PR58123 fixed an inconsistency in the behavior, but not the
> jumping around.  This patch addresses the issue by setting EXPR_LOCATION on
> a cleanup destructor call to the location of the closing brace of the
> compound-statement, or whatever token ends the scope of the variable.
>
> The change to cp_parser_compound_statement is so input_location is the }
> rather than the ; of the last substatement.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> gcc/cp/ChangeLog:
>
> PR c++/88742
> PR c++/49951
> PR c++/58123
> * semantics.c (set_cleanup_locs): New.
> (do_poplevel): Call it.
> * parser.c (cp_parser_compound_statement): Consume the }
> before finish_compound_stmt.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/88742
> * g++.dg/debug/cleanup1.C: New test.
> * c-c++-common/Wimplicit-fallthrough-6.c: Adjust diagnostic line.
> * c-c++-common/Wimplicit-fallthrough-7.c: Likewise.
> * g++.dg/cpp2a/constexpr-dtor3.C: Likewise.

This change is causing a regression on arm:
FAIL: g++.dg/cpp2a/constexpr-dtor3.C  -std=c++2a  (test for warnings, line 154)

the logs contain:
[...]
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:131:22:   in 'constexpr'
expansion of '((W5*)(& w13))->W5::~W5()'
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:72:34: error: inline
assembly is not a constant expression
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:72:34: note: only
unevaluated inline assembly is allowed in a 'constexpr' function in
C++20
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:133:35:   in 'constexpr'
expansion of '((W6*))->W6::~W6()'
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:80:34: error: inline
assembly is not a constant expression
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:80:34: note: only
unevaluated inline assembly is allowed in a 'constexpr' function in
C++20
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C: At global scope:
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:156:23:   in 'constexpr'
expansion of 'f4()'
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:152:12:   in 'constexpr'
expansion of '(& w13)->W7::~W7()'
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:88:34: error: inline
assembly is not a constant expression
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:88:34: note: only
unevaluated inline assembly is allowed in a 'constexpr' function in
C++20
/gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C:165:23:   in 'constexpr'
expansion of 'f5()'
[...]

Can you check?

Thanks

Christophe

> * g++.dg/ext/constexpr-attr-cleanup1.C: Likewise.
> * g++.dg/tm/inherit2.C: Likewise.
> * g++.dg/tm/unsafe1.C: Likewise.
> * g++.dg/warn/Wimplicit-fallthrough-1.C: Likewise.
> * g++.dg/gcov/gcov-2.C: Adjust coverage counts.
> ---
>  gcc/cp/parser.c   |  5 ++-
>  gcc/cp/semantics.c| 19 +
>  .../c-c++-common/Wimplicit-fallthrough-6.c| 16 
>  .../c-c++-common/Wimplicit-fallthrough-7.c|  4 +-
>  gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C  |  4 +-
>  gcc/testsuite/g++.dg/debug/cleanup1.C | 41 +++
>  .../g++.dg/ext/constexpr-attr-cleanup1.C  |  4 +-
>  gcc/testsuite/g++.dg/gcov/gcov-2.C|  4 +-
>  gcc/testsuite/g++.dg/tm/inherit2.C|  4 +-
>  gcc/testsuite/g++.dg/tm/unsafe1.C |  4 +-
>  .../g++.dg/warn/Wimplicit-fallthrough-1.C |  4 +-
>  11 files changed, 85 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/debug/cleanup1.C
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 8b7801b2be7..aec3aa3587f 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -12126,11 +12126,12 @@ cp_parser_compound_statement (cp_parser *parser, tree in_statement_expr,
>if (function_body)
>  maybe_splice_retval_cleanup (compound_stmt);
>
> -  /* Finish the compound-statement.  */
> -  finish_compound_stmt (compound_stmt);
>/* Consume the `}'.  */
>braces.require_close (parser);
>
> +  /* Finish the compound-statement.  */
> +  finish_compound_stmt (compound_stmt);
> +
>return compound_stmt;
>  }
>
> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> index 8eaaaefe2d6..125772238d3 100644
> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -602,6 +602,22 @@ add_decl_expr (tree decl)
>add_stmt (r);
>  }
>
> +/* Set EXPR_LOCATION of the cleanups of any CLEANUP_STMT in STMTS to LOC.  */
> +
> +static void
> +set_cleanup_locs (tree stmts, location_t loc)
> +{
> +  if (TREE_CODE (stmts) == CLEANUP_STMT)
> +{
> +  protected_set_expr_location (CLEANUP_EXPR (stmts), loc);
> +  set_cleanup_locs (CLEANUP_BODY (stmts), loc);
> +}
> +  else if (TREE_CODE 

[PATCH][pushed] docs: remove itemx for a param

2021-04-15 Thread Martin Liška
gcc/ChangeLog:

* doc/invoke.texi: Other params don't use it, remove it.
---
 gcc/doc/invoke.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17551246477..096cebc8562 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13049,7 +13049,6 @@ also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
 @item max-rtl-if-conversion-predictable-cost
-@itemx max-rtl-if-conversion-unpredictable-cost
 RTL if-conversion will try to remove conditional branches around a block
 and replace them with conditionally executed instructions.  These parameters
 give the maximum permissible cost for the sequence that would be generated
-- 
2.31.1



Re: [PATCH] [libstdc++] Refactor/cleanup of atomic wait implementation

2021-04-15 Thread Jonathan Wakely via Gcc-patches

On 23/03/21 12:00 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

* This patch addresses jwakely's previous feedback.
* This patch also subsumes thiago.macie...@intel.com 's 'Uncontroversial


If this part is intended as part of the commit msg let's put Thiago's
name rather than email address, but I'm assuming this preamble isn't
intended for the commit anyway.


 improvements to C++20 wait-related implementation'.
* This patch also changes the atomic semaphore implementation to avoid
 checking for any waiters before a FUTEX_WAKE op.

This is a substantial rewrite of the atomic wait/notify (and timed wait
counterparts) implementation.

The previous __platform_wait looped on EINTR; however, this behavior is
not required by the standard. A new _GLIBCXX_HAVE_PLATFORM_WAIT macro
now controls whether wait/notify are implemented using a platform
specific primitive or with a platform agnostic mutex/condvar. This
patch only supplies a definition for Linux futexes. A future update
could add support for __ulock_wait/wake on Darwin, for instance.

The members of __waiters were lifted to a new base class. The members
are now arranged such that overall sizeof(__waiters_base) fits in two
cache lines (on platforms with at least 64 byte cache lines). The
definition will also use destructive_interference_size for this if it
is available.


N.B. that makes the ABI potentially different with different
compilers, e.g. if you compile it today it will use 64, but then you
compile it with some future version of Clang that defines the
interference sizes it might use a different value. That's OK for now,
but is something to be aware of and remember.



The __waiters type is now specific to untimed waits. Timed waits have a
corresponding __timed_waiters type. Much of the code has been moved from
the previous __atomic_wait() free function to the __waiter_base template
and a __waiter derived type is provided to implement the un-timed wait
operations. A similar change has been made to the timed wait
implementation.


While reading this code I keep getting confused between __waiter
singular and __waiters plural. Would something like __waiter_pool or
__waiters_mgr work instead of __waiters?


The __atomic_spin code has been extended to take a spin policy which is
invoked after the initial busy wait loop. The default policy is to
return from the spin. The timed wait code adds a timed backoff spinning
policy. The code from  which implements this_thread::sleep_for,
sleep_until has been moved to a new  header
which allows the thread sleep code to be consumed without pulling in the
whole of .


The new header is misnamed. The existing  headers all
define std::foo, but this doesn't define std::thread::sleep* or
std::thread_sleep*. I think  would be fine, or
 if you prefer that.

The original reason I introduced  was that
 seemed too likely to clash with something in glibc or
another project using "bits" as a prefix, so I figured std_mutex.h for
std::mutex would be safer. I had the same concern for 
and so that's  too, but I think thread_sleep is
probably sufficiently un-clashy, and this_thread_sleep definitely so.




The entry points into the wait/notify code have been restructured to
support either -
  * Testing the current value of the atomic stored at the given address
and waiting on a notification.
  * Applying a predicate to determine if the wait was satisfied.
The entry points were renamed to make it clear that the wait and wake
operations operate on addresses. The first variant takes the expected
value and a function which returns the current value that should be used
in comparison operations; these operations are named with a _v suffix
(i.e. 'value'). All atomic<_Tp> wait/notify operations use the first
variant. Barriers, latches and semaphores use the predicate variant.

This change also centralizes what it means to compare values for the
purposes of atomic::wait rather than scattering through individual
predicates.


I like this a lot more, thanks.
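
The two entry-point shapes described in the patch summary can be sketched
roughly as follows.  This is not the libstdc++ code: every name here
(sketch::wait_until, sketch::wait_while_equal, sketch::notify_waiters,
sketch::same_value) is hypothetical, the address argument is dropped, and a
single mutex/condvar pair stands in for the platform-agnostic fallback.
Comparing object representations with memcmp is likewise only an assumption
about what a centralized comparison might look like.

```cpp
#include <condition_variable>
#include <cstring>
#include <mutex>

namespace sketch {

// One shared mutex/condvar pair standing in for the pool of waiters.
inline std::mutex &mtx() { static std::mutex m; return m; }
inline std::condition_variable &cv()
{ static std::condition_variable c; return c; }

// Single place that decides whether two values count as "the same".
template <typename T>
bool same_value(const T &a, const T &b)
{ return std::memcmp(&a, &b, sizeof(T)) == 0; }

// Predicate variant: block until pred() returns true.
template <typename Pred>
void wait_until(Pred pred)
{
  std::unique_lock<std::mutex> lock(mtx());
  cv().wait(lock, pred);
}

// Value variant: block while the current value, obtained from vfn(),
// still compares equal to the expected old value.
template <typename T, typename ValFn>
void wait_while_equal(T old, ValFn vfn)
{
  wait_until([&] { return !same_value(vfn(), old); });
}

// Wake every waiter.  Taking the lock first means a waiter is either
// already blocked or will observe the updated state when it checks.
inline void notify_waiters()
{
  { std::lock_guard<std::mutex> lock(mtx()); }
  cv().notify_all();
}

} // namespace sketch
```

With this split, an atomic-like type would pass a lambda that reloads its
value to the value variant, while a latch or semaphore would pass its own
condition to the predicate variant.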



diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 2dc00676054..2e46691c59a 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1017,8 +1015,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  wait(const _Tp* __ptr, _Val<_Tp> __old,
   memory_order __m = memory_order_seq_cst) noexcept
  {
-   std::__atomic_wait(__ptr, __old,
-   [=]() { return load(__ptr, __m) == __old; });
+   std::__atomic_wait_address_v(__ptr, __old,
+   [__ptr, __m]() { return load(__ptr, __m); });


Pre-existing, but __ptr is dependent here so this needs to call
__atomic_impl::load to prevent ADL.
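
A small standalone illustration of the concern, with made-up namespaces
(impl standing in for __atomic_impl, user for the namespace of some _Tp):
because the pointer type is dependent, an unqualified call to load also
searches the namespaces associated with the pointee type, whereas the
qualified call does not.

```cpp
namespace impl {
  // The intended callee.
  template <typename T>
  int load(const T *) { return 1; }

  // Unqualified call: argument-dependent lookup is performed at
  // instantiation time, so overloads in T's namespace take part.
  template <typename T>
  int unqualified(const T *p) { return load(p); }

  // Qualified call: only impl::load is considered.
  template <typename T>
  int qualified(const T *p) { return impl::load(p); }
}

namespace user {
  struct widget {};

  // A non-template overload in the type's own namespace.  It is found by
  // ADL and preferred over the template in overload resolution.
  int load(const widget *) { return 2; }
}
```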




diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index a0c5ef4374e..4b876236d2b 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -36,6 +36,7 @@

#if 

Re: [Patch, fortran] PR fortran/100094 - Undefined pointers have incorrect rank when using optimization

2021-04-15 Thread Tobias Burnus

On 15.04.21 13:56, José Rui Faustino de Sousa via Gcc-patches wrote:


Proposed patch to:
PR100094 - Undefined pointers have incorrect rank when using optimization
Patch tested only on x86_64-pc-linux-gnu.


LGTM - thanks!

Tobias


Pointers and allocatables must carry TKR information even when
undefined. The patch adds code to initialize the element size, rank and
type of both pointers and allocatables as early as possible. Later
initialization would work for allocatables, but not for pointers,
since one cannot meaningfully test the association status of
undefined pointers.

Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization [PR100094]

gcc/fortran/ChangeLog:

PR fortran/100094
* trans-array.c (gfc_trans_deferred_array): Add code to initialize
pointers and allocatables with correct TKR parameters.

gcc/testsuite/ChangeLog:

PR fortran/100094
* gfortran.dg/PR100094.f90: New test.




Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-04-15 Thread David Malcolm via Gcc-patches
On Thu, 2021-04-15 at 11:45 +0200, Jan Hubicka wrote:
> Hi,
> this is a patch fixing the underlying issue of functions missing
> lto_prepare_function_for_streaming because gimple_has_body_p is not
> the same thing as node.has_gimple_body (which needs to be clarified
> next stage1 by finding better names for this, I suppose).
> 
> I committed it to gcc 11 even though we already have your workaround
> since it is small and safe and it may save some pain when backporting
> changes to the branch in future - basically all passes at WPA
> renumbering statements would hit this issue, which is not that obvious
> to debug, as we found :)

I think it's just the analyzer that's affected in gcc 11 (and plugins,
I suppose), hence I went with the localized fix, but it's your call.


> We may backport it to gcc10 too if you prefer it over your fix - I
> think both are fine in general for release branches.
> 

The analyzer started changing stmt uids in gcc 11 (specifically in
b0702ac5588333e27d7ec43d21d704521f7a05c6, on 2020-10-27), so I think
the fix would only affect plugins in older releases.

Dave


> lto-bootstrapped/regtested x86_64-linux.
> 
> Honza
> 
> 2021-04-15  Jan Hubicka  
> 
> PR lto/98599
> * lto.c (lto_wpa_write_files): Fix handling of clones.
> 
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index ceb61bb300b..5903f75ac23 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -306,7 +306,7 @@ lto_wpa_write_files (void)
>    cgraph_node *node;
>    /* Do body modifications needed for streaming before we fork out
>   worker processes.  */
> -  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> +  FOR_EACH_FUNCTION (node)
>  if (!node->clone_of && gimple_has_body_p (node->decl))
>    lto_prepare_function_for_streaming (node);
>  
> 




P1 patch ping

2021-04-15 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch; it is one of the last 4 P1s we have for GCC 11.

Thanks.

On Thu, Apr 08, 2021 at 04:15:42PM -0600, Martin Sebor via Gcc-patches wrote:
> PR c/99420 - bogus -Warray-parameter on a function redeclaration in function scope
> PR c/99972 - missing -Wunused-result on a call to a locally redeclared warn_unused_result function
> 
> gcc/c/ChangeLog:
> 
>   PR c/99420
>   PR c/99972
>   * c-decl.c (pushdecl): Always propagate type attribute.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/99420
>   PR c/99972
>   * gcc.dg/Warray-parameter-9.c: New test.
>   * gcc.dg/Wnonnull-6.c: New test.
>   * gcc.dg/Wreturn-type3.c: New test.
>   * gcc.dg/Wunused-result.c: New test.
>   * gcc.dg/attr-noreturn.c: New test.
>   * gcc.dg/attr-returns-nonnull.c: New test.

Jakub



[committed] testsuite: enable pr86058.c also on i?86-*-* [PR100073]

2021-04-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Apr 14, 2021 at 07:50:37PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 10:49:42AM -0600, Martin Sebor via Gcc-patches wrote:
> > Apparently the IL GCC emits on some targets (arm and aarch64 with
> > -mabi=ilp32, and powerpc64 to name the three where the failures have
> > been pointed out) isn't handled by the uninit pass and so it doesn't
> > issue the expected warning.  That might be a new (as in previously
> > unknown) limitation in the warning or one I don't remember coming
> > across.
> > 
> > I don't see excess warnings with my arm-eabi cross-compiler.  What
> > are they in your environment?
> > 
> > I have limited the test to just x86_64 for now and repurposed pr100073
> > where the same failure was reported on powerpc64 to track the missing
> > warning on these targets.
> 
> +   The test fails on a number of non-x86_64 targets due to pr100073.
> +   { dg-do compile { target x86_64-*-* } }
> 
> change is incorrect.

I have tested it and the test works the same for -m64/-m32/-mx32, therefore
I chose:
> or you mean x86_64 -m64/-mx32/-m32, then it should be
> { i?86-*-* x86_64-*-* }

Tested on x86_64-linux and i686-linux, committed to trunk.

2021-04-15  Jakub Jelinek  

PR testsuite/100073
* gcc.dg/pr86058.c: Enable also on i?86-*-*.

--- gcc/testsuite/gcc.dg/pr86058.c.jj   2021-04-15 10:40:33.449919170 +0200
+++ gcc/testsuite/gcc.dg/pr86058.c  2021-04-15 14:04:02.247335188 +0200
@@ -1,7 +1,7 @@
 /* PR middle-end/86058 - TARGET_MEM_REF causing incorrect message for
-Wmaybe-uninitialized warning
-   The test fails on a number of non-x86_64 targets due to pr100073.
-   { dg-do compile { target x86_64-*-* } }
+   The test fails on a number of non-x86 targets due to pr100073.
+   { dg-do compile { target i?86-*-* x86_64-*-* } }
{ dg-options "-O2 -Wuninitialized -Wmaybe-uninitialized" } */
 
 extern void foo (int *);


Jakub



[PATCH] testsuite: Fix unroll-and-jam.c on IBM Z

2021-04-15 Thread Stefan Schulze Frielinghaus via Gcc-patches
On z10 and newer, inner loops are completely unrolled, which leaves no
inner loops to jam and makes this testcase fail.  Reverting
max-completely-peel-times to the default value fixes this testcase.
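
To illustrate why a completely unrolled inner loop leaves nothing to jam,
here is a minimal C sketch of the transformation itself.  The function
names, array sizes and loop body are hypothetical and are not taken from
unroll-and-jam.c; the sketch only shows the shape of loop nest that the
pass needs to find.

```c
#include <assert.h>
#include <string.h>

#define N 8 /* outer trip count, assumed even for the unroll factor of 2 */
#define M 4 /* inner trip count */

/* Reference loop nest: the inner j-loop must still exist for jamming.  */
void nest_plain (int a[N][M], int b[N][M])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      a[i][j] = b[i][j] * 2 + i;
}

/* Outer loop unrolled by 2, with the two inner-loop copies fused
   ("jammed") into a single inner loop.  */
void nest_unroll_and_jam (int a[N][M], int b[N][M])
{
  for (int i = 0; i < N; i += 2)
    for (int j = 0; j < M; j++)
      {
        a[i][j] = b[i][j] * 2 + i;
        a[i + 1][j] = b[i + 1][j] * 2 + (i + 1);
      }
}

/* Return 1 if both versions compute the same result, 0 otherwise.  */
int results_match (void)
{
  int b[N][M], plain[N][M], jammed[N][M];

  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      b[i][j] = i * M + j;

  nest_plain (plain, b);
  nest_unroll_and_jam (jammed, b);
  return memcmp (plain, jammed, sizeof plain) == 0;
}
```

If the j-loop had already been completely unrolled, only straight-line
code would remain inside the i-loop and there would be no inner loop left
to fuse, which is the situation described above for z10 and newer.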

gcc/testsuite/ChangeLog:

* gcc.dg/unroll-and-jam.c: Revert max-completely-peel-times to
the default value on IBM Z.

Ok for mainline?

---
 gcc/testsuite/gcc.dg/unroll-and-jam.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/unroll-and-jam.c b/gcc/testsuite/gcc.dg/unroll-and-jam.c
index 7eb64217a05..b8f4f16dc74 100644
--- a/gcc/testsuite/gcc.dg/unroll-and-jam.c
+++ b/gcc/testsuite/gcc.dg/unroll-and-jam.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -floop-unroll-and-jam -fno-tree-loop-im --param unroll-jam-min-percent=0 -fdump-tree-unrolljam-details" } */
+/* { dg-additional-options "--param max-completely-peel-times=16" { target { s390*-*-* } } } */
 /* { dg-require-effective-target int32plus } */
 
 #include 
-- 
2.23.0



Re: [Patch, fortran] 99307 - FAIL: gfortran.dg/class_assign_4.f90 execution test

2021-04-15 Thread Paul Richard Thomas via Gcc-patches
Pushed to master in commit 9a0e09f3dd5339bb18cc47317f2298d9157ced29

Thanks

Paul


On Wed, 14 Apr 2021 at 14:51, Tobias Burnus  wrote:

> On 11.04.21 09:05, Paul Richard Thomas wrote:
> > Tobias noticed a major technical fault with the resubmission below: I
> > forgot to attach the patch :-(
>
> LGTM. Plus as remarked in the first review: 'trans-expr_c' typo needs to
> be fixed (ChangeLog).
>
> Tobias
>
> >
> > Please find it attached this time.
> >
> > Paul
> >
> > On Tue, 6 Apr 2021 at 18:08, Paul Richard Thomas wrote:
> >
> > Hi Tobias,
> >
> > I believe that the attached fixes the problems that you found with
> > gfc_find_and_cut_at_last_class_ref.
> >
> > I will test:
> >type1%type%array_class2 → NULL is returned  (why?)
> >class1%type%array_class2 → ts = class1 but array2_class is used
> > later on (ups!)
> >class1%...%scalar_class2 → ts = class1 but scalar_class2 is used
> >
> > The ChangeLogs remain the same, apart from the date.
> >
> > Regtests OK on FC33/x86_64.
> >
> > Paul
> >
> >
> > On Mon, 29 Mar 2021 at 14:58, Tobias Burnus wrote:
> >
> > Hi all,
> >
> > as a preliminary remark, I want to note that the testcase class_assign_4.f90
> > was added for PR83118/PR96012 (fixes problems in handling
> > class objects, Dec 18, 2020)
> > and got revised for PR99124 (class defined operators, Feb 23, 2021).
> > Both patches were then also applied to GCC 9 and 10.
> >
> > On 26.03.21 17:30, Paul Richard Thomas via Gcc-patches wrote:
> > > This patch comes in two versions: submit.diff with Change.Logs or
> > > submit2.diff with Change2.Logs.
> > > The first fixes the problem by changing array temporaries from class
> > > expressions into class temporaries. This permits the use of
> > > gfc_get_class_from_expr to obtain the vptr for these temporaries and all
> > > the good things that come with that when handling dynamic types. The second
> > > part of the fix is to use the array element length from the class
> > > descriptor, when reallocating on assignment. This is needed because the
> > > vptr is being set too early. I will set about trying to track down why this
> > > is happening and fix it after release.
> > >
> > > The second version does the same as the first but puts in place a load of
> > > tidying up that is permitted by the fix to class array temporaries.
> >
> > > I couldn't readily see how to prepare a testcase - ideas?
> > > Both regtest on FC33/x86_64. The first was tested by Dominique (see the
> > > PR). OK for master?
> >
> > Typo – underscore-'c' should be a dot-'c' – both changelog files
> >
> > >   * trans-expr_c (gfc_trans_scalar_assign): Make use of pre and
> >
> > I think the second longer version is nicer in general, but at least for
> > GCC 9/GCC10 the first version is simpler and, hence, less error prone.
> >
> > As you only ask about mainline, I would prefer the second one.
> >
> > However, I am not happy about gfc_find_and_cut_at_last_class_ref:
> >
> > > + of refs following. If ts is non-null the cut is at the class entity
> > > + or component that is followed by an array reference, which is not
> > > + an element. */
> > > ...
> > > +
> > > +  if (ts)
> > > +    {
> > > +      if (e->symtree
> > > +          && e->symtree->n.sym->ts.type == BT_CLASS)
> > > +        *ts = &e->symtree->n.sym->ts;
> > > +      else
> > > +        *ts = NULL;
> > > +    }
> > > +
> > >    for (ref = e->ref; ref; ref = ref->next)
> > >      {
> > > +      if (ts && ref->type == REF_COMPONENT
> > > +          && ref->u.c.component->ts.type == BT_CLASS
> > > +          && ref->next && ref->next->type == REF_COMPONENT
> > > +          && strcmp (ref->next->u.c.component->name, "_data") == 0
> > > +          && ref->next->next
> > > +          && ref->next->next->type == REF_ARRAY
> > > +          && ref->next->next->u.ar.type != AR_ELEMENT)
> > > +        {
> > > +          *ts = &ref->u.c.component->ts;
> > > +          class_ref = ref;
> > > +          break;
> > > +        }
> > > +
> > > +      if (ts && *ts == NULL)
> > > +        return NULL;
> > > +
> > Namely, if there is:
> >    type1%array_class2 → array_class2 is used for 'ts' and later (ok)
> >    type1%type%array_class2 → NULL is returned  (why?)
> >    class1%type%array_class2 → ts = class1 but array2_class is used later on (ups!)
> >    class1%...%scalar_class2 → ts = class1 but scalar_class2 is used
> > etc.
> >
> > Thus this either needs to be cleaned up (separate 'ref' loop for
> > ts != NULL) – including 

[Patch, fortran] PR fortran/100094 - Undefined pointers have incorrect rank when using optimization

2021-04-15 Thread José Rui Faustino de Sousa via Gcc-patches

Hi All!

Proposed patch to:

PR100094 - Undefined pointers have incorrect rank when using optimization

Patch tested only on x86_64-pc-linux-gnu.

Pointers and allocatables must carry TKR (type, kind, rank) information
even when undefined. The patch adds code to initialize the element size,
rank and type of both pointers and allocatables as early as possible.
Later initialization would work for allocatables, but not for pointers,
since one cannot meaningfully test the association status of undefined
pointers.
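
As a rough illustration of the idea, the following C sketch models a
descriptor whose type/kind/rank part is filled in up front while the data
pointer is left alone.  The struct layout and all names (toy_descriptor,
toy_init_tkr, toy_rank) are hypothetical and are NOT gfortran's actual
descriptor or API; the point is only that a rank dispatch can rely on the
dtype part without ever inspecting the possibly undefined data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor; NOT gfortran's real layout.  */
struct toy_dtype
{
  size_t elem_len; /* element size, standing in for the kind */
  int rank;
  int type;
};

struct toy_descriptor
{
  void *base_addr; /* may be undefined for a pointer; never read here */
  struct toy_dtype dtype;
};

enum { TOY_TYPE_INTEGER = 1 };

/* Record type, element size and rank without touching base_addr.  */
void toy_init_tkr (struct toy_descriptor *d, size_t elem_len, int rank,
                   int type)
{
  d->dtype.elem_len = elem_len;
  d->dtype.rank = rank;
  d->dtype.type = type;
}

/* What a rank dispatch would read: only the dtype, never the data
   pointer.  */
int toy_rank (const struct toy_descriptor *d)
{
  return d->dtype.rank;
}
```

In this toy model a rank-1 integer pointer would be set up with
toy_init_tkr (&d, sizeof (int), 1, TOY_TYPE_INTEGER) at declaration time,
so that a later rank query gives 1 regardless of the association status.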


Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization [PR100094]

gcc/fortran/ChangeLog:

PR fortran/100094
* trans-array.c (gfc_trans_deferred_array): Add code to initialize
pointers and allocatables with correct TKR parameters.

gcc/testsuite/ChangeLog:

PR fortran/100094
* gfortran.dg/PR100094.f90: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index be5eb89350f..2bd69724366 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10920,6 +10920,20 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block)
 	}
 }
 
+  /* Set initial TKR for pointers and allocatables */
+  if (GFC_DESCRIPTOR_TYPE_P (type)
+  && (sym->attr.pointer || sym->attr.allocatable))
+{
+  tree etype;
+
+  gcc_assert (sym->as && sym->as->rank>=0);
+  tmp = gfc_conv_descriptor_dtype (descriptor);
+  etype = gfc_get_element_type (type);
+  tmp = fold_build2_loc (input_location, MODIFY_EXPR,
+  			 TREE_TYPE (tmp), tmp,
+  			 gfc_get_dtype_rank_type (sym->as->rank, etype));
+  gfc_add_expr_to_block (&init, tmp);
+}
   gfc_restore_backend_locus (&loc);
   gfc_init_block (&cleanup);
 
diff --git a/gcc/testsuite/gfortran.dg/PR100094.f90 b/gcc/testsuite/gfortran.dg/PR100094.f90
new file mode 100644
index 000..f2f7f1631dc
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100094.f90
@@ -0,0 +1,37 @@
+! { dg-do run }
+!
+! Test the fix for PR100094
+!
+
+program foo_p
+
+  implicit none
+
+  integer, parameter :: n = 11
+  
+  integer, pointer :: pout(:)
+  integer,  target :: a(n)
+  integer  :: i
+  
+  a = [(i, i=1,n)]
+  call foo(pout)
+  if(.not.associated(pout)) stop 1
+  if(.not.associated(pout, a)) stop 2
+  if(any(pout/=a)) stop 3
+  stop
+
+contains
+
+  subroutine foo(that)
+integer, pointer, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that => a
+rank default
+  stop 4
+end select
+return
+  end subroutine foo
+
+end program foo_p


Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-15 Thread Richard Biener
On Thu, 15 Apr 2021, Xionghu Luo wrote:

> Thanks,
> 
> On 2021/4/14 14:41, Richard Biener wrote:
> >> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
> >> but it moves #538 first, then #235; there is a strong dependency here. It
> >> doesn't seem to be like the LCM framework, which could solve everything
> >> and do the delete-insert in one iteration.
> > So my question was whether we want to do both within the LCM store
> > sinking framework.  The LCM dataflow is also used by RTL PRE which
> > handles both loads and non-loads so in principle it should be able
> > to handle stores and non-stores for the sinking case (PRE on the
> > reverse CFG).
> > 
> > A global dataflow is more powerful than any local ad-hoc method.
> 
> My biggest concern is whether the LCM DF framework could support sinking
> *multiple* reverse-dependent non-store instructions together with *one*
> call of LCM DF.   If this is not supported, we need to run LCM multiple
> times until there are no new changes, which would obviously be time
> consuming (unless compile time is not important here).

As I said, it is used for PRE, and there it most definitely can do that.

> 
> > 
> > Richard.
> > 
> >> However, there are still some common methods that could be shared, like
> >> the def-use check (though store-motion is per bb, rtl-sink is per loop),
> >> insert_store, commit_edge_insertions etc.
> >>
> >>
> >>508: L508:
> >>507: NOTE_INSN_BASIC_BLOCK 34
> >> 12: r139:DI=r140:DI
> >>REG_DEAD r140:DI
> >>240: L240:
> >>231: NOTE_INSN_BASIC_BLOCK 35
> >>232: r142:DI=zero_extend(r139:DI#0)
> >>233: r371:SI=r142:DI#0-0x1
> >>234: r243:DI=zero_extend(r371:SI)
> >>REG_DEAD r371:SI
> >>235: r452:DI=r262:DI+r139:DI
> >>538: r194:DI=r452:DI
> >>236: r372:CCUNS=cmp(r142:DI#0,r254:DI#0)
> 
> 
> Like here: each instruction's dest reg is calculated in the input vector
> bitmap; after solving the equations by calling pre_edge_rev_lcm,
> #538 is moved out of the loop by the first call, then #235 is moved out
> by a second call... 4 repeated calls are needed in total here. Is the LCM
> framework smart enough to move all 4 instructions within one iteration?
> I am worried that the input vector bitmap couldn't solve the dependency
> problem for two back-chained instructions.
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] Remove gimplify_buildN API use from complex lowering

2021-04-15 Thread Richard Biener
This removes the legacy gimplify_buildN API use from complex lowering.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, queued for
stage1.
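
The pattern the patch moves to is "collect new statements in a pending
sequence, then insert the whole sequence once before the current
statement".  The following C sketch is only a toy model of that idea: the
seq type and the seq_push and seq_insert_before helpers are hypothetical
stand-ins, not GCC's gimple_seq, gimple_build or gsi_insert_seq_before.

```c
#include <assert.h>
#include <string.h>

#define MAX_STMTS 16

/* Toy stand-in for a statement sequence; not GCC's gimple_seq.  */
struct seq
{
  const char *stmt[MAX_STMTS];
  int len;
};

/* Append one statement to a pending sequence (in spirit, what building
   into a local sequence does).  Silently ignores overflow.  */
void seq_push (struct seq *s, const char *stmt)
{
  if (s->len < MAX_STMTS)
    s->stmt[s->len++] = stmt;
}

/* Splice all PENDING statements into BODY before position POS, keeping
   their order.  Returns 0 on success, -1 if POS is invalid or there is
   no room.  */
int seq_insert_before (struct seq *body, int pos, const struct seq *pending)
{
  if (pos < 0 || pos > body->len || body->len + pending->len > MAX_STMTS)
    return -1;

  /* Make room by shifting the tail, then copy the pending statements.  */
  memmove (&body->stmt[pos + pending->len], &body->stmt[pos],
           (size_t) (body->len - pos) * sizeof body->stmt[0]);
  memcpy (&body->stmt[pos], pending->stmt,
          (size_t) pending->len * sizeof pending->stmt[0]);
  body->len += pending->len;
  return 0;
}
```

Collecting first and inserting once keeps the insertion point in a single
place, which is how the hunks in this patch end each lowering function.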

2021-04-15  Richard Biener  

* tree-complex.c: Include gimple-fold.h.
(expand_complex_addition): Use gimple_build.
(expand_complex_multiplication_components): Likewise.
(expand_complex_multiplication): Likewise.
(expand_complex_div_straight): Likewise.
(expand_complex_div_wide): Likewise.
(expand_complex_division): Likewise.
(expand_complex_conjugate): Likewise.
(expand_complex_comparison): Likewise.
---
 gcc/tree-complex.c | 232 ++---
 1 file changed, 132 insertions(+), 100 deletions(-)

diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c
index b11da01a58b..d7d991714de 100644
--- a/gcc/tree-complex.c
+++ b/gcc/tree-complex.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-hasher.h"
 #include "cfgloop.h"
 #include "cfganal.h"
+#include "gimple-fold.h"
 
 
 /* For each complex ssa name, a lattice value.  We're interested in finding
@@ -916,25 +917,27 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
 complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
+  gimple_seq stmts = NULL;
+  location_t loc = gimple_location (gsi_stmt (*gsi));
 
   switch (PAIR (al, bl))
 {
 case PAIR (ONLY_REAL, ONLY_REAL):
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = ai;
   break;
 
 case PAIR (ONLY_REAL, ONLY_IMAG):
   rr = ar;
   if (code == MINUS_EXPR)
-   ri = gimplify_build2 (gsi, MINUS_EXPR, inner_type, ai, bi);
+   ri = gimple_build (&stmts, loc, MINUS_EXPR, inner_type, ai, bi);
   else
ri = bi;
   break;
 
 case PAIR (ONLY_IMAG, ONLY_REAL):
   if (code == MINUS_EXPR)
-   rr = gimplify_build2 (gsi, MINUS_EXPR, inner_type, ar, br);
+   rr = gimple_build (&stmts, loc, MINUS_EXPR, inner_type, ar, br);
   else
rr = br;
   ri = ai;
@@ -942,23 +945,23 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
 
 case PAIR (ONLY_IMAG, ONLY_IMAG):
   rr = ar;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (VARYING, ONLY_REAL):
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = ai;
   break;
 
 case PAIR (VARYING, ONLY_IMAG):
   rr = ar;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (ONLY_REAL, VARYING):
   if (code == MINUS_EXPR)
 goto general;
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
   ri = bi;
   break;
 
@@ -966,19 +969,20 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type,
   if (code == MINUS_EXPR)
goto general;
   rr = br;
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 case PAIR (VARYING, VARYING):
 general:
-  rr = gimplify_build2 (gsi, code, inner_type, ar, br);
-  ri = gimplify_build2 (gsi, code, inner_type, ai, bi);
+  rr = gimple_build (&stmts, loc, code, inner_type, ar, br);
+  ri = gimple_build (&stmts, loc, code, inner_type, ai, bi);
   break;
 
 default:
   gcc_unreachable ();
 }
 
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
   update_complex_assignment (gsi, rr, ri);
 }
 
@@ -1059,26 +1063,26 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai,
components of the result into RR and RI.  */
 
 static void
-expand_complex_multiplication_components (gimple_stmt_iterator *gsi,
-tree type, tree ar, tree ai,
-tree br, tree bi,
-tree *rr, tree *ri)
+expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
+ tree type, tree ar, tree ai,
+ tree br, tree bi,
+ tree *rr, tree *ri)
 {
   tree t1, t2, t3, t4;
 
-  t1 = gimplify_build2 (gsi, MULT_EXPR, type, ar, br);
-  t2 = gimplify_build2 (gsi, MULT_EXPR, type, ai, bi);
-  t3 = gimplify_build2 (gsi, MULT_EXPR, type, ar, bi);
+  t1 = gimple_build (stmts, loc, MULT_EXPR, type, ar, br);
+  t2 = gimple_build (stmts, loc, MULT_EXPR, type, ai, bi);
+  t3 = gimple_build (stmts, loc, MULT_EXPR, type, ar, bi);
 
   /* Avoid expanding redundant multiplication for the common
  case of squaring 

[PATCH] Remove gimplify_buildN API use from phiopt

2021-04-15 Thread Richard Biener
This removes use of the legacy gimplify_buildN API from phiopt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1

2021-04-15  Richard Biener  

* tree-ssa-phiopt.c (two_value_replacement): Remove use
of legacy gimplify_buildN API.
---
 gcc/tree-ssa-phiopt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 13e5c4971d2..35ce51e5977 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -752,16 +752,16 @@ two_value_replacement (basic_block cond_bb, basic_block middle_bb,
 }
 
   tree arg = wide_int_to_tree (type, a);
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
-    lhs = gimplify_build1 (&gsi, NOP_EXPR, type, lhs);
+  gimple_seq stmts = NULL;
+  lhs = gimple_convert (&stmts, type, lhs);
   tree new_rhs;
   if (code == PLUS_EXPR)
-    new_rhs = gimplify_build2 (&gsi, PLUS_EXPR, type, lhs, arg);
+    new_rhs = gimple_build (&stmts, PLUS_EXPR, type, lhs, arg);
   else
-    new_rhs = gimplify_build2 (&gsi, MINUS_EXPR, type, arg, lhs);
-  if (!useless_type_conversion_p (TREE_TYPE (arg0), type))
-    new_rhs = gimplify_build1 (&gsi, NOP_EXPR, TREE_TYPE (arg0), new_rhs);
+    new_rhs = gimple_build (&stmts, MINUS_EXPR, type, arg, lhs);
+  new_rhs = gimple_convert (&stmts, TREE_TYPE (arg0), new_rhs);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
 
   replace_phi_edge_with_variable (cond_bb, e1, phi, new_rhs);
 
-- 
2.26.2


[PATCH] Deprecate gimple-builder.h API

2021-04-15 Thread Richard Biener
This adds a deprecation note to the undocumented gimple-builder.h
API only used by asan and sancov.

Pushed.

2021-04-15  Richard Biener  

* gimple-builder.h: Add deprecation note.
---
 gcc/gimple-builder.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/gimple-builder.h b/gcc/gimple-builder.h
index 61cf08c8dcb..ae273ce9041 100644
--- a/gcc/gimple-builder.h
+++ b/gcc/gimple-builder.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_GIMPLE_BUILDER_H
 #define GCC_GIMPLE_BUILDER_H
 
+/* ???  This API is legacy and should not be used in new code.  */
+
 gassign *build_assign (enum tree_code, tree, int, tree lhs = NULL_TREE);
 gassign *build_assign (enum tree_code, gimple *, int, tree lhs = NULL_TREE);
 gassign *build_assign (enum tree_code, tree, tree, tree lhs = NULL_TREE);
-- 
2.26.2


Re: [Patch, fortran] PR fortran/84006, PR fortran/100027 - ICE on storage_size with polymorphic argument

2021-04-15 Thread Tobias Burnus

Hi José,

first, I think you did not yet commit the approved patch for PR100018,
did you?

On 11.04.21 02:34, José Rui Faustino de Sousa via Fortran wrote:

> Proposed patch to:
> PR84006 - [8/9/10/11 Regression] ICE in storage_size() with CLASS entity
> PR100027 - ICE on storage_size with polymorphic argument
>
> Patch tested only on x86_64-pc-linux-gnu.


LGTM – however, I think it would be useful to also test polymorphic
components – and to check whether the result comes out right, especially
as you already have a dg-do run test.

Hence, how about replacing that testcase by the extended attached testcase?

Tobias


> Add branch to if clause to handle polymorphic objects, not sure if I
> got all possible variations...
>
> Thank you very much.
>
> Best regards,
> José Rui
>
> Fortran: Fix ICE using storage_size intrinsic [PR84006, PR100027]
>
> gcc/fortran/ChangeLog:
>
> PR fortran/84006
> PR fortran/100027
> * trans-intrinsic.c (gfc_conv_intrinsic_storage_size): add if
> clause branch to handle polymorphic objects.
>
> gcc/testsuite/ChangeLog:
>
> PR fortran/84006
> * gfortran.dg/PR84006.f90: New test.
>
> PR fortran/100027
> * gfortran.dg/PR100027.f90: New test.
>

! { dg-do run }
!

program foo_p

  implicit none

  integer, parameter :: n = 11
  integer, parameter :: foo_size = storage_size(n)*4
  integer, parameter :: bar_size = storage_size(n)*(4+8)
  
  type :: foo_t
integer :: arr1(4)
  end type foo_t

  type, extends(foo_t) :: bar_t
integer :: arr2(8)
  end type bar_t

  type box_t
class(foo_t), allocatable :: x, y(:)
  end type box_t

  class(*), pointer :: apu(:)
  class(foo_t), pointer :: apf(:)
  class(bar_t), pointer :: apb(:)
  type(foo_t),  target :: atf(n)
  type(bar_t),  target :: atb(n)
  type(box_t), target :: aa, bb

  integer :: m
  
  apu => atb
  m = storage_size(apu)
  if (m /= bar_size) stop
  apu => atf
  m = storage_size(apu)
  if (m /= foo_size) stop
  apf => atb
  m = storage_size(apf)
  if (m /= bar_size) stop
  apf => atf
  m = storage_size(apf)
  if (m /= foo_size) stop
  apb => atb
  m = storage_size(apb)
  if (m /= bar_size) stop

  allocate(foo_t :: aa%x, aa%y(1))
  allocate(bar_t :: bb%x, bb%y(1))
  if (storage_size(aa%x) /= foo_size) stop
  if (storage_size(aa%y) /= foo_size) stop
  if (storage_size(bb%x) /= bar_size) stop
  if (storage_size(bb%y) /= bar_size) stop

  apu => bb%y
  m = storage_size(apu)
  if (m /= bar_size) stop
  apu => aa%y
  m = storage_size(apu)
  if (m /= foo_size) stop
  apf => bb%y
  m = storage_size(apf)
  if (m /= bar_size) stop
  apf => aa%y
  m = storage_size(apf)
  if (m /= foo_size) stop

end program foo_p


Re: [committed] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-04-15 Thread Jan Hubicka
Hi,
this is a patch fixing the underlying issue of a function missing
lto_prepare_function_for_streaming because gimple_has_body_p is not the
same thing as node.has_gimple_body (which needs to be clarified next
stage1 by finding better names for this I suppose).

I committed it to gcc 11 even though we already have your workaround
since it is small and safe and it may save some pain when backporting
changes to the branch in future - basically all passes at WPA
renumbering statements would hit this issue which is not that obvious to
debug as we found :)

We may backport it to gcc10 too if you prefer it over your fix - I
think both are fine in general for release branches.

lto-bootstrapped/regtested x86_64-linux.

Honza

2021-04-15  Jan Hubicka  

PR lto/98599
* lto.c (lto_wpa_write_files): Fix handling of clones.

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index ceb61bb300b..5903f75ac23 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -306,7 +306,7 @@ lto_wpa_write_files (void)
   cgraph_node *node;
   /* Do body modifications needed for streaming before we fork out
  worker processes.  */
-  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  FOR_EACH_FUNCTION (node)
 if (!node->clone_of && gimple_has_body_p (node->decl))
   lto_prepare_function_for_streaming (node);
 


Re: [WIP] Re: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]

2021-04-15 Thread Thomas Schwinge
Hi!

On 2021-04-09T13:00:39+0200, I wrote:
> On 2021-03-25T12:02:15+0100, I wrote:
>> On 2021-03-11T17:52:55+0100, I wrote:
>>> On 2021-02-23T22:52:38+0100, Jakub Jelinek via Gcc-patches wrote:
>>>> On Tue, Feb 23, 2021 at 09:43:51PM +, Kwok Cheung Yeung wrote:
>>>>> On 19/02/2021 7:12 pm, Kwok Cheung Yeung wrote:
>>>>> > I have included the current state of my patch. All task-detach-* tests
>>>>> > pass when executed without offloading or with offloading to GCN, but
>>>>> > with offloading to Nvidia, task-detach-6.* hangs consistently but
>>>>> > everything else passes (probably because of the missing
>>>>> > gomp_team_barrier_done?).
>>>>>
>>>>> It looks like the hang has nothing to do with the detach patch - this
>>>>> hangs consistently for me when offloaded to NVPTX:
>>>>>
>>>>> #include 
>>>>>
>>>>> int main (void)
>>>>> {
>>>>> #pragma omp target
>>>>>   #pragma omp parallel
>>>>> #pragma omp task
>>>>>   ;
>>>>> }
>>>>>
>>>>> This doesn't hang when offloaded to GCN or the host device, or if
>>>>> num_threads(1) is specified on the omp parallel.
>>>
>>> So, I reproduced this the hard way;
>>>  :-/
>>>
>>> Please always file issues when you run into such things.  I've now filed
>>> PR99555 "[OpenMP/nvptx] Execution-time hang for simple nested OpenMP
>>> 'target'/'parallel'/'task' constructs".
>>>
>>>> Then it can be solved separately, I'll try to have a look if I see
>>>> something bad from the dumps, but I admit I don't have much experience
>>>> with debugging NVPTX offloaded code...
>>>
>>> Any luck?
>>>
>>>
>>> Until this gets resolved properly, OK to push something like the attached
>>> (currently testing) "Avoid OpenMP/nvptx execution-time hangs for simple
>>> nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]"?
>>
>> As posted, I've now pushed "Avoid OpenMP/nvptx execution-time hangs for
>> simple nested OpenMP 'target'/'parallel'/'task' constructs [PR99555]" to
>> master branch in commit d99111fd8e12deffdd9a965ce17e8a760d531ec3, see
>> attached.  "... awaiting proper resolution, of course."
>
>> +  if (on_device_arch_nvptx ())
>> +__builtin_abort (); //TODO Until resolved, skip, with error status.
>
> Actually, we can do better: do try to execute this trivial OpenMP code
> (expected to complete in no time), but for nvptx offloading "make sure
> that we exit quickly, with error status", and XFAIL that.  So that we'll
> get XFAIL -> XPASS when this starts to work for nvptx offloading.

Pushed "XFAIL OpenMP/nvptx execution-time hangs for simple nested OpenMP
'target'/'parallel'/'task' constructs [PR99555]" to master branch in
commit 4dd9e1c541e0eb921d62c8652c854b1259e56aac, see attached.


Grüße
 Thomas


From 4dd9e1c541e0eb921d62c8652c854b1259e56aac Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Apr 2021 10:36:36 +0200
Subject: [PATCH] XFAIL OpenMP/nvptx execution-time hangs for simple nested
 OpenMP 'target'/'parallel'/'task' constructs [PR99555]

... still awaiting proper resolution, of course.

	libgomp/
	PR target/99555
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_nvptx): New.
	* testsuite/libgomp.c/pr99555-1.c : Until
	resolved, make sure that we exit quickly, with error status,
	XFAILed.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise.
	* testsuite/libgomp.fortran/task-detach-6.f90: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp| 12 
 .../testsuite/libgomp.c-c++-common/task-detach-6.c   |  5 -
 libgomp/testsuite/libgomp.c/pr99555-1.c  |  5 -
 libgomp/testsuite/libgomp.fortran/task-detach-6.f90  |  3 ++-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 72d001186a5..14dcfdfd00a 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -401,6 +401,18 @@ proc check_effective_target_offload_device_shared_as { } {
 } ]
 }
 
+# Return 1 if using nvptx offload device.
+proc check_effective_target_offload_device_nvptx { } {
+return [check_runtime_nocache offload_device_nvptx {
+  #include 
+  #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+  int main ()
+	{
+	  return !on_device_arch_nvptx ();
+	}
+} ]
+}
+
 # Return 1 if at least one Nvidia GPU is accessible.
 
 proc check_effective_target_openacc_nvidia_accel_present { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
index 119d7f52f8f..f18b57bf047 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -2,6 +2,8 @@
 
 #include 
 #include 
+#include  // For 

Re: [PATCH] [GCC-9] backport -march=tigerlake to GCC9 [PR target/100009]

2021-04-15 Thread Uros Bizjak via Gcc-patches
On Wed, Apr 14, 2021 at 3:30 AM Hongtao Liu  wrote:
>
> On Tue, Apr 13, 2021 at 6:38 PM Uros Bizjak  wrote:
> >
> > On Tue, Apr 13, 2021 at 12:18 PM Hongtao Liu  wrote:
> > >
> > > Hi:
> > > >   As described in the PR, we introduced the tigerlake string in
> > > > driver-i386.c by r9-8652 without supporting -march/tune=tigerlake,
> > > > which causes an error when using -march/tune=native with GCC9 on a
> > > > tigerlake machine.
> > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >   Ok for GCC9?
> > >
> > > gcc/
> > > * common/config/i386/i386-common.c
> > > (processor_names): Add tigerlake.
> > > (processor_alias_table): Ditto.
> > > * config.gcc: Document -march=tigerlake.
> >
> > Nope. Better.
> >
> > (x86_64_archs): Ditto.
> >
> > > * config/i386/driver-i386.c
> > > (host_detect_local_cpu): Detect tigerlake, add "has_avx" to
> > > classify processor.
> > > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > > tigerlake.
> >
> > Handle PROCESSOR_TIGERLAKE.
> >
> > > * config/i386/i386.c (m_TIGERLAKE)  : Define.
> > > (m_CORE_AVX512): Ditto.
> >
> > You don't define this macro, but you add m_TIGERLAKE to m_CORE_AVX512.
> > Please correct this confusion.
> >
> > > (processor_cost_table): Add tigerlake.
> >
> > Please correct the above. You added skylake_cost.
> >
> > > (ix86_option_override_internal): Handle PTA_MOVDIRI, 
> > > PTA_MOVDIR64B.
> >
> > Where?
> >
> > > (processor_model): Add M_INTEL_COREI7_TIGERLAKE.
> > > (arch_names_table): Add tigerlake.
> > > (get_builtin_code_for_version) : Handle PROCESSOR_TIGERLAKE.
> > > * config/i386/i386.h (TARGET_TIGERLAKE): Define.
> > > (processor_type) : Add PROCESSOR_TIGERLAKE.
> >
> > (enum processor_type)
> >
> > > (PTA_TIGERLAKE)  : Ditto.
> >
> > Ditto what? This is a new define.
> >
> > > * doc/extend.texi: Add tigerlake.
> > > * doc/invoke.texi: Add tigerlake.
> >
> > Added where? To which section?
> >
> > > gcc/testsuite/
> > > * gcc.target/i386/funcspec-56.inc: Handle new march.
> > > * g++.target/i386/mv16.C: Handle new march
> >
> > Dot.
> >
> > >
> > > libgcc/
> > > * config/i386/cpuinfo.h: Add INTEL_COREI7_TIGERLAKE.
> >
> > (enum processor_subtypes)
> > >
> > > From-SVN: r274693
> >
> > Please repost with improved/corrected ChangeLog.
> >
> > Uros.
> >
> > > --
> > > BR,
> > > Hongtao
>
> updated.
>
> gcc/
> * common/config/i386/i386-common.c
> (processor_names): Add tigerlake.
> (processor_alias_table): Ditto.
> * config.gcc (x86_64_archs): Ditto.
> * config/i386/driver-i386.c
> (host_detect_local_cpu): Detect tigerlake, add "has_avx" to
> classify processor.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> PROCESSOR_TIGERLAKE.
> * config/i386/i386.c (m_TIGERLAKE): Define.
> (m_CORE_AVX512): Add m_TIGERLAKE.
> (processor_cost_table): Add skylake_cost for tigerlake.
> (processor_model): Add M_INTEL_COREI7_TIGERLAKE.
> (arch_names_table): Add tigerlake.
> (get_builtin_code_for_version): Handle PROCESSOR_TIGERLAKE.
> * config/i386/i386.h (TARGET_TIGERLAKE): Define.
> (enum processor_type): Add PROCESSOR_TIGERLAKE.
> (PTA_TIGERLAKE): Define.
> * doc/extend.texi (__builtin_cpu_is): Add tigerlake.
> * doc/invoke.texi (-march=cpu-type): Ditto.
>
> gcc/testsuite/
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> * g++.target/i386/mv16.C: Handle new march.
>
> libgcc/
> * config/i386/cpuinfo.h (enum processor_subtypes): Add
> INTEL_COREI7_TIGERLAKE.

OK.

Thanks,
Uros.

>
>
> --
> BR,
> Hongtao


Re: [PATCH V6 2/7] dwarf: new dwarf_debuginfo_p predicate

2021-04-15 Thread Richard Biener via Gcc-patches
On Wed, Apr 14, 2021 at 4:07 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> This patch introduces a dwarf_debuginfo_p predicate that abstracts and
> replaces complex checks on write_symbols.

OK once stage1 opens (can be pushed independently of the rest).

Richard.
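
For readers following the diff below, here is a minimal sketch of what such a
predicate might look like. It is inferred only from the write_symbols checks
that the hunks replace; the actual definition added to opts.c is not shown in
this message. The enum and the global are simplified stand-ins, not GCC's
real declarations.

```cpp
// Simplified stand-ins for GCC's debug_info_type and the write_symbols
// global.  Only the values needed for this illustration are listed.
enum debug_info_type
{
  NO_DEBUG,
  DWARF2_DEBUG,
  VMS_DEBUG,
  VMS_AND_DWARF2_DEBUG
};

static debug_info_type write_symbols = NO_DEBUG;

// Assumed shape of the predicate: true when DWARF debug info is being
// generated, i.e. the two-way comparison that each call site used to spell
// out by hand.
static bool
dwarf_debuginfo_p ()
{
  return (write_symbols == DWARF2_DEBUG
	  || write_symbols == VMS_AND_DWARF2_DEBUG);
}
```

With a helper of this shape, a call site such as the one in init_c_lex only
needs `&& dwarf_debuginfo_p ()` instead of repeating both comparisons.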

> 2021-04-14  Indu Bhagat  
>
> gcc/ChangeLog
>
> * flags.h (dwarf_debuginfo_p): New function declaration.
> * opts.c (dwarf_debuginfo_p): New function definition.
> * config/c6x/c6x.c (c6x_output_file_unwind): Likewise.
> * dwarf2cfi.c (cfi_label_required_p): Likewise.
> (dwarf2out_do_frame): Likewise.
> * final.c (dwarf2_debug_info_emitted_p): Likewise.
> (final_scan_insn_1): Likewise.
> * targhooks.c (default_debug_unwind_info): Likewise.
> * toplev.c (process_options): Likewise.
>
> gcc/c-family/ChangeLog
>
> * c-lex.c (init_c_lex): Use dwarf_debuginfo_p.
> ---
>  gcc/c-family/c-lex.c |  4 ++--
>  gcc/config/c6x/c6x.c |  3 +--
>  gcc/dwarf2cfi.c  |  9 -
>  gcc/final.c  | 15 ++-
>  gcc/flags.h  |  3 +++
>  gcc/opts.c   |  8 
>  gcc/targhooks.c  |  2 +-
>  gcc/toplev.c |  6 ++
>  8 files changed, 27 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
> index 6374b72ed2d..5174b22c303 100644
> --- a/gcc/c-family/c-lex.c
> +++ b/gcc/c-family/c-lex.c
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stor-layout.h"
>  #include "c-pragma.h"
>  #include "debug.h"
> +#include "flags.h"
>  #include "file-prefix-map.h" /* remap_macro_filename()  */
>  #include "langhooks.h"
>  #include "attribs.h"
> @@ -87,8 +88,7 @@ init_c_lex (void)
>
>/* Set the debug callbacks if we can use them.  */
>if ((debug_info_level == DINFO_LEVEL_VERBOSE
> -   && (write_symbols == DWARF2_DEBUG
> -  || write_symbols == VMS_AND_DWARF2_DEBUG))
> +   && dwarf_debuginfo_p ())
>|| flag_dump_go_spec != NULL)
>  {
>cb->define = cb_define;
> diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
> index f9ad1e5f6c5..a10e2f8d662 100644
> --- a/gcc/config/c6x/c6x.c
> +++ b/gcc/config/c6x/c6x.c
> @@ -439,8 +439,7 @@ c6x_output_file_unwind (FILE * f)
>  {
>if (flag_unwind_tables || flag_exceptions)
> {
> - if (write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG)
> + if (dwarf_debuginfo_p ())
> asm_fprintf (f, "\t.cfi_sections .debug_frame, .c6xabi.exidx\n");
>   else
> asm_fprintf (f, "\t.cfi_sections .c6xabi.exidx\n");
> diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
> index 362ff3fdac2..c27ac1960b0 100644
> --- a/gcc/dwarf2cfi.c
> +++ b/gcc/dwarf2cfi.c
> @@ -39,7 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "expr.h"  /* init_return_column_size */
>  #include "output.h"/* asm_out_file */
>  #include "debug.h" /* dwarf2out_do_frame, dwarf2out_do_cfi_asm */
> -
> +#include "flags.h" /* dwarf_debuginfo_p */
>
>  /* ??? Poison these here until it can be done generically.  They've been
> totally replaced in this file; make sure it stays that way.  */
> @@ -2289,8 +2289,7 @@ cfi_label_required_p (dw_cfi_ref cfi)
>
>if (dwarf_version == 2
>&& debug_info_level > DINFO_LEVEL_TERSE
> -  && (write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG))
> +  && dwarf_debuginfo_p ())
>  {
>switch (cfi->dw_cfi_opc)
> {
> @@ -3557,9 +3556,9 @@ bool
>  dwarf2out_do_frame (void)
>  {
>/* We want to emit correct CFA location expressions or lists, so we
> - have to return true if we're going to output debug info, even if
> + have to return true if we're going to generate debug info, even if
>   we're not going to output frame or unwind info.  */
> -  if (write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
> +  if (dwarf_debuginfo_p ())
>  return true;
>
>if (saved_do_cfi_asm > 0)
> diff --git a/gcc/final.c b/gcc/final.c
> index daae115fef5..cae692062b4 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -1442,7 +1442,8 @@ asm_str_count (const char *templ)
>  static bool
>  dwarf2_debug_info_emitted_p (tree decl)
>  {
> -  if (write_symbols != DWARF2_DEBUG && write_symbols != VMS_AND_DWARF2_DEBUG)
> +  /* When DWARF2 debug info is not generated internally.  */
> +  if (!dwarf_debuginfo_p ())
>  return false;
>
>if (DECL_IGNORED_P (decl))
> @@ -2330,10 +2331,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED,
>   break;
>
> case NOTE_INSN_BLOCK_BEG:
> - if (debug_info_level == DINFO_LEVEL_NORMAL
> - || debug_info_level == DINFO_LEVEL_VERBOSE
> - || write_symbols == DWARF2_DEBUG
> - || write_symbols == VMS_AND_DWARF2_DEBUG
> + if 

Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-15 Thread Xionghu Luo via Gcc-patches
Thanks,

On 2021/4/14 14:41, Richard Biener wrote:
>> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
>> but it moves #538 first, then #235, as there is a strong dependency here.
>> This seems unlike the LCM framework, which could solve everything and do
>> the delete-insert in one iteration.
> So my question was whether we want to do both within the LCM store
> sinking framework.  The LCM dataflow is also used by RTL PRE which
> handles both loads and non-loads so in principle it should be able
> to handle stores and non-stores for the sinking case (PRE on the
> reverse CFG).
> 
> A global dataflow is more powerful than any local ad-hoc method.

My biggest concern is whether the LCM DF framework can support sinking
*multiple* reverse-dependent non-store instructions together in *one*
call of LCM DF.  If this is not supported, we need to run LCM multiple
times until there are no new changes, which would obviously be time
consuming (unless compile time is not important here).

> 
> Richard.
> 
>> However, there are still some common methods that could be shared, like
>> the def-use check (though store-motion is per bb, rtl-sink is per loop),
>> insert_store, commit_edge_insertions, etc.
>>
>>
>>508: L508:
>>507: NOTE_INSN_BASIC_BLOCK 34
>> 12: r139:DI=r140:DI
>>REG_DEAD r140:DI
>>240: L240:
>>231: NOTE_INSN_BASIC_BLOCK 35
>>232: r142:DI=zero_extend(r139:DI#0)
>>233: r371:SI=r142:DI#0-0x1
>>234: r243:DI=zero_extend(r371:SI)
>>REG_DEAD r371:SI
>>235: r452:DI=r262:DI+r139:DI
>>538: r194:DI=r452:DI
>>236: r372:CCUNS=cmp(r142:DI#0,r254:DI#0)


Like here: each instruction's dest reg is calculated in the input vector
bitmap.  After solving the equations by calling pre_edge_rev_lcm, we
move #538 out of the loop on the first call, then move #235 out of the loop
after a second call... 4 repeated calls are needed in total here.  Is the LCM
framework smart enough to move all 4 instructions within one iteration?
I am worried that the input vector bitmap cannot solve the dependency
problem for two back-chained instructions.
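
To make the concern concrete, here is a toy model. It is not GCC code and
does not model the LCM dataflow itself. It assumes a hypothetical pass that,
on each call, may only sink an instruction whose result is no longer read by
anything still inside the loop, judged on the state at the start of that
call. All names (ToyInsn, sink_one_round, sink_to_fixpoint, make_chain) are
made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction: users lists the indices of the insns that read its result.
struct ToyInsn
{
  std::vector<std::size_t> users;
  bool in_loop = true;
};

// An insn is sinkable only if none of its users is still inside the loop.
static bool
sinkable (const std::vector<ToyInsn> &insns, std::size_t i)
{
  for (std::size_t u : insns[i].users)
    if (insns[u].in_loop)
      return false;
  return true;
}

// One "call" of the hypothetical pass: decide on the state at entry, then
// move everything that was picked.  Returns the number of insns moved.
static std::size_t
sink_one_round (std::vector<ToyInsn> &insns)
{
  std::vector<std::size_t> picked;
  for (std::size_t i = 0; i < insns.size (); ++i)
    if (insns[i].in_loop && sinkable (insns, i))
      picked.push_back (i);
  for (std::size_t i : picked)
    insns[i].in_loop = false;
  return picked.size ();
}

// Repeat until nothing changes; returns how many rounds moved something.
static unsigned
sink_to_fixpoint (std::vector<ToyInsn> &insns)
{
  unsigned rounds = 0;
  while (sink_one_round (insns) != 0)
    ++rounds;
  return rounds;
}

// Build a chain of n insns in which insn i is read only by insn i + 1.
static std::vector<ToyInsn>
make_chain (std::size_t n)
{
  std::vector<ToyInsn> insns (n);
  for (std::size_t i = 0; i + 1 < n; ++i)
    insns[i].users.push_back (i + 1);
  return insns;
}
```

In this toy, a chain built with make_chain (4) needs 4 rounds before every
insn has left the loop, because each round only exposes the next link of the
chain. Whether the real bitmap-based solver can treat the whole chain as one
unit is the open question above.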


-- 
Thanks,
Xionghu